The Problem
If you've ever tried to get honest, actionable feedback on your photos, you know how tough it can be. Workshops and peer reviews are great, but they're often slow, subjective, or just not available when you need them. I kept wondering: could a modern AI actually give instant, high-quality critiques that help photographers grow?
How I Built the Solution
So I built AI Photo Critique, a web app that uses a Retrieval-Augmented Generation (RAG) pipeline to deliver nuanced, context-aware feedback. Instead of relying on the model's general knowledge alone, I supplemented its analysis with a curated knowledge base of photography principles.
Here's how it works:
- Image Analysis: You upload an image. OpenAI's GPT-4o model, which accepts image input, analyzes the photo and writes a detailed text description.
- Smart Search: That description is converted to an embedding vector and used to query a Pinecone vector database of photography principles.
- Enhanced Generation: The most relevant principles from the database are combined with the image description into an "augmented" prompt, which is then sent to GPT-4o to generate the final, context-rich critique.
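The three steps above can be sketched as one small function. This is a minimal illustration, not the app's actual code: the names `critiquePhoto`, `CritiqueDeps`, `describeImage`, `retrieveChunks`, and `generateCritique` are hypothetical, and the three dependencies are passed in so the flow can be read (and tested) without calling GPT-4o or Pinecone.

```typescript
// Hypothetical shapes; the real app wires these up with LangChain.js.
interface Chunk {
  source: string;
  text: string;
}

interface CritiqueDeps {
  // Step 1: image -> text description (GPT-4o in the real app).
  describeImage: (imageUrl: string) => Promise<string>;
  // Step 2: description -> most relevant knowledge chunks (Pinecone in the real app).
  retrieveChunks: (query: string, topK: number) => Promise<Chunk[]>;
  // Step 3: description + retrieved context -> final critique (GPT-4o in the real app).
  generateCritique: (description: string, context: string[]) => Promise<string>;
}

// topK = 2 is an assumed default, chosen only to match the two chunks in the sample log.
async function critiquePhoto(
  imageUrl: string,
  deps: CritiqueDeps,
  topK: number = 2
): Promise<string> {
  const description = await deps.describeImage(imageUrl);
  const chunks = await deps.retrieveChunks(description, topK);
  const context = chunks.map((chunk) => chunk.text);
  return deps.generateCritique(description, context);
}
```

Injecting the three steps as functions keeps the orchestration independent of any one provider, so each stage can be swapped or stubbed on its own.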
Peeking Under the Hood: A Log of the RAG Pipeline
To make this process tangible, here's a look at the actual data flowing through the system for a sample photo of the Golden Gate Bridge:
Step 1: Vision Model Generates a Description
The system first sends the image to GPT-4o. The model returns a detailed text description that serves as the basis for the search query.
{
  "image_description": "A vertical photograph of the Golden Gate Bridge, partially obscured by a thick layer of fog. The iconic red-orange tower on the right is prominent, rising above the low-hanging clouds. The foreground is the dark, choppy water of the bay. The composition follows the rule of thirds, with the main tower positioned off-center. The mood is somber and atmospheric due to the fog."
}
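For readers curious what the request behind this step might look like, here is a hedged sketch of a payload for OpenAI's Chat Completions API, which accepts a user message made of text parts and `image_url` parts. The function name `buildDescriptionRequest` and the instruction text are assumptions for illustration; the post does not show the actual prompt used to produce the description.

```typescript
// One part of a multimodal user message: either text or an image reference.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
}

// Builds the request body only; sending it (and reading the reply) is left out.
function buildDescriptionRequest(imageUrl: string): VisionRequest {
  return {
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            // Assumed wording, not the app's real instruction.
            text: "Describe this photograph in detail: subject, composition, lighting, and mood.",
          },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}
```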
Step 2: RAG Pipeline Retrieves Relevant Context
This description is converted to an embedding vector and used to query the Pinecone database, which returns the chunks of photography knowledge most similar to it.
{
  "retrieved_chunks": [
    {
      "source": "composition_guide.pdf",
      "text": "The Rule of Thirds is a fundamental principle. By placing key elements along the lines or at their intersections, you create a more balanced and engaging photograph than simply centering the subject."
    },
    {
      "source": "landscape_photography.pdf",
      "text": "Atmospheric conditions like fog or mist can be powerful tools for creating mood and a sense of depth. Use them to obscure parts of the scene, adding mystery and drawing focus to your primary subject."
    }
  ]
}
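To show what "most relevant" means in a vector search, here is a small in-memory sketch that ranks chunks by cosine similarity to a query vector. It is a stand-in for illustration only: Pinecone does this at scale with an index, the post does not say which similarity metric the index uses (cosine is assumed here), and `StoredChunk`, `cosineSimilarity`, and `topKChunks` are hypothetical names.

```typescript
interface StoredChunk {
  source: string;
  text: string;
  embedding: number[]; // vector produced by an embedding model
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction, so treat it as unrelated
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk against the query and returns the k best, highest score first.
function topKChunks(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

A brute-force scan like this is fine for a handful of chunks; a vector database earns its place once the knowledge base grows too large to compare against every entry on each request.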
Step 3: The Augmented Prompt is Sent to the LLM
Finally, the system combines the system prompt, the retrieved context, the image description, and the formatting instructions into a single augmented prompt. This gives the LLM the specific principles and scene details it needs to write a critique of this photo rather than generic advice.
{
  "system_prompt": "You are an expert, objective photography critic. Your goal is to provide honest, professional feedback to help a photographer improve their craft.",
  "context": [
    "The Rule of Thirds is a fundamental principle. By placing key elements along the lines or at their intersections, you create a more balanced and engaging photograph than simply centering the subject.",
    "Atmospheric conditions like fog or mist can be powerful tools for creating mood and a sense of depth. Use them to obscure parts of the scene, adding mystery and drawing focus to your primary subject."
  ],
  "image_description": "A vertical photograph of the Golden Gate Bridge, partially obscured by a thick layer of fog. The iconic red-orange tower on the right is prominent, rising above the low-hanging clouds. The composition follows the rule of thirds, with the main tower positioned off-center.",
  "instructions": "Generate a critique formatted in Markdown. Structure your response with two main headings: '## What Works Well' and '## Areas for Improvement'."
}
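One way the four fields above could be turned into chat messages is sketched below. The layout is an assumption: the post shows the pieces of the prompt but not how they are arranged, so the section labels, the numbered context list, and the names `AugmentedPrompt`, `ChatMessage`, and `buildMessages` are all hypothetical.

```typescript
interface AugmentedPrompt {
  systemPrompt: string;
  context: string[];
  imageDescription: string;
  instructions: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Puts the persona in the system message and everything else in one user message.
function buildMessages(prompt: AugmentedPrompt): ChatMessage[] {
  // Number each retrieved chunk so the critique can refer back to it.
  const contextBlock = prompt.context
    .map((text, index) => `[${index + 1}] ${text}`)
    .join("\n");

  const userContent = [
    "Relevant photography principles:",
    contextBlock,
    "",
    "Image description:",
    prompt.imageDescription,
    "",
    prompt.instructions,
  ].join("\n");

  return [
    { role: "system", content: prompt.systemPrompt },
    { role: "user", content: userContent },
  ];
}
```

Keeping the retrieved principles clearly separated from the image description makes it easier to see, when a critique goes wrong, whether the problem came from retrieval or from generation.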
The whole process is orchestrated with LangChain.js and served through a clean, responsive UI built with Next.js and Tailwind CSS.
Reflections & Key Takeaways
Taking this project from idea to live tool taught me a lot:
- Prompt Engineering is Essential: The quality of the AI's output is directly tied to the precision of the system prompt. Iterating on the AI's persona and explicitly requesting Markdown formatting were critical steps to achieving a professional result.
- The Power of RAG: Integrating a RAG pipeline was the single most impactful architectural decision. It grounded the AI's responses in a curated source of truth and dramatically improved the relevance and accuracy of its critiques.
- Cost and Abuse Mitigation: For a public-facing app using pay-per-use APIs, I had to implement IP-based rate limiting with Upstash Redis to prevent abuse and keep the project sustainable.
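The idea behind IP-based rate limiting can be shown with a small in-memory sketch. This is not the app's implementation: the post uses Upstash Redis, while the class below keeps counters in a local `Map`, and the fixed-window algorithm, the class name `FixedWindowRateLimiter`, and the limits in the usage note are assumptions chosen for illustration.

```typescript
interface WindowEntry {
  windowStart: number; // timestamp (ms) when the current window began
  count: number; // requests seen in the current window
}

class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits: Map<string, WindowEntry>;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = new Map<string, WindowEntry>();
  }

  // Returns true if the request from this IP is allowed, false if it should be rejected.
  // `now` is injectable so the behavior can be checked without waiting on a real clock.
  allow(ip: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(ip);
    // First request from this IP, or the previous window has expired: start a new window.
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      this.hits.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}
```

For example, `new FixedWindowRateLimiter(5, 60_000)` would allow each IP five critiques per minute. An in-memory map only works inside a single long-lived process; on serverless hosting, where each request may hit a fresh instance, the counters need to live in a shared store, which is the role Redis plays in the setup described above.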