问题
想提升摄影水平时,真正高质量的反馈并不容易获得:
- 熟人反馈往往过于礼貌
- 社区意见经常相互矛盾
- 专业点评成本高、周期长
更关键的是,很多 AI 工具给的是“泛化表扬”,缺少可执行建议。我需要的是:为什么这张图成立?下一步应该具体怎么改?
解决思路
AI Photo Critique 的核心不是“让模型猜”,而是“给模型正确知识”。
我选择了 RAG,而不是微调:
- 微调成本高、更新慢
- RAG 可以随时更新知识库,持续改进输出质量
整体链路如下:
- 用户上传照片
- GPT-4o Vision 生成结构化图像描述
- 用描述去 Pinecone 检索摄影知识片段
- 将检索结果与图像描述一起喂给模型,生成结构化点评
输出格式设计
为了让反馈真正可用,我把输出固定为两部分:
## 表现好的部分## 可改进的部分
这和专业摄影评审的表达方式一致,也更便于快速阅读和执行。
RAG 流水线示例
步骤 1:视觉模型生成描述
{
"image_description": "A vertical photograph of the Golden Gate Bridge, partially obscured by a thick layer of fog. The iconic red-orange tower on the right is prominent, rising above the low-hanging clouds. The foreground is the dark, choppy water of the bay. The composition follows the rule of thirds, with the main tower positioned off-center. The mood is somber and atmospheric due to the fog."
}
步骤 2:检索相关知识
{
"retrieved_chunks": [
{
"source": "composition_guide.pdf",
"text": "The Rule of Thirds is a fundamental principle. By placing key elements along the lines or at their intersections, you create a more balanced and engaging photograph than simply centering the subject."
},
{
"source": "landscape_photography.pdf",
"text": "Atmospheric conditions like fog or mist can be powerful tools for creating mood and a sense of depth. Use them to obscure parts of the scene, adding mystery and drawing focus to your primary subject."
}
]
}
步骤 3:构造增强提示并生成点评
{
"system_prompt": "You are an expert, objective photography critic. Your goal is to provide honest, professional feedback to help a photographer improve their craft.",
"context": [
"The Rule of Thirds is a fundamental principle. By placing key elements along the lines or at their intersections, you create a more balanced and engaging photograph than simply centering the subject.",
"Atmospheric conditions like fog or mist can be powerful tools for creating mood and a sense of depth. Use them to obscure parts of the scene, adding mystery and drawing focus to your primary subject."
],
"image_description": "A vertical photograph of the Golden Gate Bridge, partially obscured by a thick layer of fog. The iconic red-orange tower on the right is prominent, rising above the low-hanging clouds. The composition follows the rule of thirds, with the main tower positioned off-center.",
"instructions": "Generate a critique formatted in Markdown. Structure your response with two main headings: '## What Works Well' and '## Areas for Improvement'."
}
工程化保障
成本与滥用控制
使用 Upstash Redis 做按 IP 限流,避免恶意调用和成本失控。
上传校验
提前校验图片类型与大小,减少无效请求和超时。
异常兜底
- 视觉模型失败:返回友好错误并支持重试
- Pinecone 不可用:降级为无检索模式,保证核心流程可用
收获
- RAG 在“领域知识 + 持续更新”场景下比微调更实用
- Prompt 设计对输出质量影响极大
- 多模态链路需要端到端监控和清晰降级策略
