The problem
If you're trying to improve as a photographer, getting useful feedback is surprisingly hard. Friends say "nice shot." Online forums give contradictory advice. Professional critiques cost money and take days. I kept running into the same wall: I knew my photos could be better, but I couldn't figure out why or what to change.
The real frustration wasn't the lack of feedback; it was the lack of specific, actionable feedback. Generic AI tools would say things like "good composition" without explaining what made it good. Or worse, they'd give advice that contradicted basic photography principles. I wanted something that could look at a photo and tell me exactly what was working and what I could improve, grounded in actual photography knowledge.
How I built the solution
AI Photo Critique started with a simple question: what if I could give an AI the same knowledge base that a professional photography instructor has? Instead of hoping a generic model knew about the rule of thirds or leading lines, I could ensure it did by feeding it curated photography principles at the moment of critique.
This is where Retrieval-Augmented Generation (RAG) comes in. Rather than fine-tuning a model (expensive, inflexible, and a maintenance headache), RAG lets you inject relevant knowledge into the prompt dynamically. The AI stays current because you update the knowledge base, not the model.
Here's what I built:
- A multimodal analysis pipeline: When you upload a photo, GPT-4o Vision first analyzes the image and generates a detailed text description. This isn't just "a bridge in fog"; it identifies composition techniques, lighting conditions, mood, and technical elements. That description becomes the search query for the next step.
- A curated photography knowledge base: I built a vector database in Pinecone containing principles from composition guides, lighting tutorials, and genre-specific advice. When the image description comes in, the system finds the most relevant chunks, maybe the rule of thirds for a landscape, or catchlight positioning for a portrait.
- Context-aware critique generation: The retrieved knowledge and image description are combined into an augmented prompt. This gives GPT-4o everything it needs to generate feedback that's grounded in actual photography principles, not hallucinated advice.
- LangChain.js for orchestration: The whole pipeline is coordinated through LangChain.js, which handles the chunking, embedding, retrieval, and prompt assembly. The frontend is built with Next.js and Tailwind CSS for a clean, responsive experience.
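Condensed, the whole flow is three steps. The sketch below is illustrative rather than the production code: the helper names are hypothetical, and each one is fleshed out in the walkthrough further down.

```typescript
// Hypothetical helpers; each is sketched in the walkthrough below.
declare function describeImage(imageBase64: string): Promise<string>;
declare function retrievePrinciples(description: string): Promise<string[]>;
declare function generateCritique(
  description: string,
  principles: string[]
): Promise<string>;

// The pipeline end to end: photo in, structured Markdown critique out.
async function critiquePhoto(imageBase64: string): Promise<string> {
  const description = await describeImage(imageBase64);     // GPT-4o Vision
  const principles = await retrievePrinciples(description); // Pinecone retrieval
  return generateCritique(description, principles);         // augmented prompt
}
```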
Designing the experience
Before writing any code, I thought about what a photographer actually needs from a critique tool. The answer: clarity and structure. A wall of text doesn't help anyone improve. Neither does vague praise.
I landed on a two-part format: "What Works Well" and "Areas for Improvement." This isn't arbitrary; it mirrors how professional portfolio reviews are structured. You acknowledge strengths before discussing growth areas. It also makes the feedback scannable; you can jump straight to what you want to work on.
The critique is rendered in Markdown for readability. Headers, bullet points, and emphasis make it easy to digest. I explicitly instruct the AI to use this format in the system prompt. Without that constraint, models tend to ramble. Structured output took iteration to get right, but it made the difference between feedback that feels professional and feedback that feels like a chatbot response.
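As a sketch of what that constraint looks like, here is the persona and format instruction from the walkthrough below, folded into a single system prompt for brevity (the app's actual prompt may split or word these differently):

```typescript
// Persona plus a hard format constraint, taken from the walkthrough below
// and folded into one system prompt for illustration.
const SYSTEM_PROMPT = `You are an expert, objective photography critic. Your goal is to provide honest, professional feedback to help a photographer improve their craft.

Generate a critique formatted in Markdown. Structure your response with two main headings: '## What Works Well' and '## Areas for Improvement'.`;
```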
I also considered the upload experience. Photographers want to see their image alongside the critique, not navigate away from it. The UI keeps the photo visible while the analysis loads, with clear progress indication so you know something is happening. Small touches, but they make the tool feel responsive rather than frustrating.
Peeking under the hood: the RAG pipeline
To make this process tangible, here's a look at the actual data flowing through the system for a sample photo of the Golden Gate Bridge:
Step 1: Vision Model Generates a Description
The system first sends the image to GPT-4o Vision. The model returns a detailed text description that serves as the basis for the search query.
```json
{
  "image_description": "A vertical photograph of the Golden Gate Bridge, partially obscured by a thick layer of fog. The iconic red-orange tower on the right is prominent, rising above the low-hanging clouds. The foreground is the dark, choppy water of the bay. The composition follows the rule of thirds, with the main tower positioned off-center. The mood is somber and atmospheric due to the fog."
}
```
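A call like the following produces that description. This is a sketch using the OpenAI Node SDK; the exact prompt wording is my assumption, not the app's verbatim text:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Step 1 sketch: ask GPT-4o to describe the photo in photographic terms.
async function describeImage(imageBase64: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Describe this photograph in detail: composition techniques, lighting conditions, mood, and technical elements.",
          },
          {
            type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${imageBase64}` },
          },
        ],
      },
    ],
  });
  return response.choices[0].message.content ?? "";
}
```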
Step 2: RAG Pipeline Retrieves Relevant Context
This description is converted to a vector and used to search the Pinecone database. The system retrieves the most relevant chunks of photography knowledge.
```json
{
  "retrieved_chunks": [
    {
      "source": "composition_guide.pdf",
      "text": "The Rule of Thirds is a fundamental principle. By placing key elements along the lines or at their intersections, you create a more balanced and engaging photograph than simply centering the subject."
    },
    {
      "source": "landscape_photography.pdf",
      "text": "Atmospheric conditions like fog or mist can be powerful tools for creating mood and a sense of depth. Use them to obscure parts of the scene, adding mystery and drawing focus to your primary subject."
    }
  ]
}
```
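In code, the retrieval step is roughly the following. The embedding model and index name are placeholders for illustration, and the app actually wires this through LangChain.js rather than calling the SDKs directly:

```typescript
import OpenAI from "openai";
import { Pinecone } from "@pinecone-database/pinecone";

const openai = new OpenAI();
const pinecone = new Pinecone(); // reads PINECONE_API_KEY from the environment

// Step 2 sketch: embed the description and pull the closest knowledge chunks.
async function retrievePrinciples(description: string): Promise<string[]> {
  const embedded = await openai.embeddings.create({
    model: "text-embedding-3-small", // assumed model
    input: description,
  });

  const index = pinecone.index("photography-knowledge"); // assumed index name
  const results = await index.query({
    vector: embedded.data[0].embedding,
    topK: 3,
    includeMetadata: true,
  });

  return results.matches.map((match) => String(match.metadata?.text ?? ""));
}
```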
Step 3: The Augmented Prompt Is Sent to the LLM
Finally, the system combines the description and the retrieved context into a single, powerful prompt. This gives the LLM all the information it needs to generate a high-quality, specific critique.
```json
{
  "system_prompt": "You are an expert, objective photography critic. Your goal is to provide honest, professional feedback to help a photographer improve their craft.",
  "context": [
    "The Rule of Thirds is a fundamental principle. By placing key elements along the lines or at their intersections, you create a more balanced and engaging photograph than simply centering the subject.",
    "Atmospheric conditions like fog or mist can be powerful tools for creating mood and a sense of depth. Use them to obscure parts of the scene, adding mystery and drawing focus to your primary subject."
  ],
  "image_description": "A vertical photograph of the Golden Gate Bridge, partially obscured by a thick layer of fog. The iconic red-orange tower on the right is prominent, rising above the low-hanging clouds. The composition follows the rule of thirds, with the main tower positioned off-center.",
  "instructions": "Generate a critique formatted in Markdown. Structure your response with two main headings: '## What Works Well' and '## Areas for Improvement'."
}
```
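Assembling that prompt and generating the critique looks roughly like this. It reuses the `openai` client from the Step 1 sketch; exactly how the fields are concatenated into the user message is my assumption:

```typescript
// Step 3 sketch: combine the retrieved context and image description into an
// augmented prompt, mirroring the JSON structure above.
async function generateCritique(
  description: string,
  principles: string[]
): Promise<string> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "You are an expert, objective photography critic. Your goal is to provide honest, professional feedback to help a photographer improve their craft.",
      },
      {
        role: "user",
        content: [
          "Context:",
          ...principles.map((p) => `- ${p}`),
          "",
          `Image description: ${description}`,
          "",
          "Generate a critique formatted in Markdown. Structure your response with two main headings: '## What Works Well' and '## Areas for Improvement'.",
        ].join("\n"),
      },
    ],
  });
  return response.choices[0].message.content ?? "";
}
```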
Handling the real world
A public-facing AI tool needs more than just a good pipeline. It needs to handle abuse, failures, and edge cases gracefully. I learned this early when I soft-launched the app and watched my API costs spike from a few automated requests.
Keeping costs sustainable
For a project using pay-per-use APIs like OpenAI and Pinecone, rate limiting isn't optional; it's survival. I implemented IP-based rate limiting with Upstash Redis. Each IP gets a limited number of critiques per time window. When you hit the limit, you get a clear message explaining when you can try again, not a cryptic error.
This also protects against bots and scrapers. Without rate limiting, a single bad actor could burn through your monthly budget in minutes.
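The @upstash/ratelimit package makes this a few lines. The quota below (five critiques per hour) is a placeholder, not the app's actual limit:

```typescript
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// Sliding-window, IP-based rate limiting. The quota is illustrative.
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL / _TOKEN
  limiter: Ratelimit.slidingWindow(5, "1 h"),
});

export async function checkLimit(ip: string): Promise<string | null> {
  const { success, reset } = await ratelimit.limit(ip);
  if (success) return null;
  // Tell the user when they can try again instead of a cryptic error.
  const retryAt = new Date(reset).toLocaleTimeString();
  return `You've reached the limit. Try again after ${retryAt}.`;
}
```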
Image validation
Not every upload is a valid photograph. The app validates file types and sizes before processing. Oversized images get rejected with a helpful message rather than silently failing or timing out. This saves API costs (vision models charge per token, and larger images mean more tokens) and gives users immediate feedback.
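A sketch of that check; the allowed types and size cap are illustrative, not the app's actual thresholds:

```typescript
// Validate uploads before spending any API tokens on them.
const ALLOWED_TYPES = ["image/jpeg", "image/png", "image/webp"];
const MAX_BYTES = 5 * 1024 * 1024; // 5 MB cap (placeholder value)

function validateUpload(file: File): string | null {
  if (!ALLOWED_TYPES.includes(file.type)) {
    return "Please upload a JPEG, PNG, or WebP image.";
  }
  if (file.size > MAX_BYTES) {
    return "That image is too large. Please upload a file under 5 MB.";
  }
  return null; // valid: hand off to the critique pipeline
}
```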
When things go wrong
External APIs fail. Networks are flaky. The app handles these gracefully:
- If the vision model times out, users see a friendly error message with a retry option, not a stack trace.
- If Pinecone is unreachable, the system still generates a critique; it just won't be augmented with retrieved knowledge. Graceful degradation over complete failure (sketched after this list).
- Loading states keep users informed. Nothing is worse than wondering if the app is frozen or actually working.
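The degradation path is a small wrapper around the retrieval sketch from Step 2; this illustrates the idea rather than the app's exact code:

```typescript
// If retrieval fails, return an empty context instead of failing the request;
// the critique is still generated, just without retrieved knowledge.
async function retrieveWithFallback(description: string): Promise<string[]> {
  try {
    return await retrievePrinciples(description);
  } catch (error) {
    console.error("Pinecone unreachable; degrading gracefully:", error);
    return [];
  }
}
```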
What I learned
Building AI Photo Critique taught me a lot about working with multimodal AI and building public-facing tools. A few things stuck with me:
RAG beats fine-tuning for domain knowledge. I considered fine-tuning a model on photography critiques, but RAG turned out to be the better choice. It's cheaper, more flexible, and easier to update. When I find a new photography principle worth including, I add it to the vector database, no retraining required. The knowledge base evolves with the tool.
Prompt engineering is invisible but essential. Users don't see the system prompt, but it's doing most of the heavy lifting. The difference between "critique this photo" and a carefully crafted prompt with persona, format instructions, and constraints is enormous. I spent more time iterating on prompts than I expected, and it was worth every minute.
Multimodal pipelines are trickier than single-modality ones. Coordinating an image analysis model with text embedding and retrieval introduces complexity. The vision model's description becomes the input for the next stage, so if it misses something important in the image, the retrieved context will be off too. Testing required real images across different genres, not just synthetic examples.
Public APIs need protection from day one. I naively launched without rate limiting and watched my costs spike. Lesson learned: implement abuse prevention before you need it, not after. Upstash Redis made this straightforward, but I should have planned for it from the start.
Structure makes AI output useful. Unstructured AI responses feel like chatbot noise. Explicitly requesting Markdown formatting with specific sections transformed the output from "interesting" to "actionable." Users can skim, find what they need, and actually improve their photography. The format is as important as the content.
Curating knowledge is ongoing work. The vector database isn't a "set and forget" system. As I use the tool and see what kinds of photos people upload, I discover gaps in the knowledge base. Portrait lighting principles were underrepresented. Street photography advice was too vague. Maintenance is part of the product.
