AI Photo Critique

通过 LangChain、Pinecone 向量数据库与 GPT-4o Vision 设计 RAG 流水线,输出更专业、可执行的摄影点评。

AI/MLRAGLangChain向量数据库提示词工程
AI Photo Critique

问题

想提升摄影水平时,真正高质量的反馈并不容易获得:

  • 熟人反馈往往过于礼貌
  • 社区意见经常相互矛盾
  • 专业点评成本高、周期长

更关键的是,很多 AI 工具给的是“泛化表扬”,缺少可执行建议。我需要的是:为什么这张图成立?下一步应该具体怎么改?

解决思路

AI Photo Critique 的核心不是“让模型猜”,而是“给模型正确知识”。

我选择了 RAG,而不是微调:

  • 微调成本高、更新慢
  • RAG 可以随时更新知识库,持续改进输出质量

整体链路如下:

  1. 用户上传照片
  2. GPT-4o Vision 生成结构化图像描述
  3. 用描述去 Pinecone 检索摄影知识片段
  4. 将检索结果与图像描述一起喂给模型,生成结构化点评

输出格式设计

为了让反馈真正可用,我把输出固定为两部分:

  • ## 表现好的部分
  • ## 可改进的部分

这和专业摄影评审的表达方式一致,也更便于快速阅读和执行。

RAG 流水线示例

步骤 1:视觉模型生成描述

{
  "image_description": "A vertical photograph of the Golden Gate Bridge, partially obscured by a thick layer of fog. The iconic red-orange tower on the right is prominent, rising above the low-hanging clouds. The foreground is the dark, choppy water of the bay. The composition follows the rule of thirds, with the main tower positioned off-center. The mood is somber and atmospheric due to the fog."
}

步骤 2:检索相关知识

{
  "retrieved_chunks": [
    {
      "source": "composition_guide.pdf",
      "text": "The Rule of Thirds is a fundamental principle. By placing key elements along the lines or at their intersections, you create a more balanced and engaging photograph than simply centering the subject."
    },
    {
      "source": "landscape_photography.pdf",
      "text": "Atmospheric conditions like fog or mist can be powerful tools for creating mood and a sense of depth. Use them to obscure parts of the scene, adding mystery and drawing focus to your primary subject."
    }
  ]
}

步骤 3:构造增强提示并生成点评

{
  "system_prompt": "You are an expert, objective photography critic. Your goal is to provide honest, professional feedback to help a photographer improve their craft.",
  "context": [
    "The Rule of Thirds is a fundamental principle. By placing key elements along the lines or at their intersections, you create a more balanced and engaging photograph than simply centering the subject.",
    "Atmospheric conditions like fog or mist can be powerful tools for creating mood and a sense of depth. Use them to obscure parts of the scene, adding mystery and drawing focus to your primary subject."
  ],
  "image_description": "A vertical photograph of the Golden Gate Bridge, partially obscured by a thick layer of fog. The iconic red-orange tower on the right is prominent, rising above the low-hanging clouds. The composition follows the rule of thirds, with the main tower positioned off-center.",
  "instructions": "Generate a critique formatted in Markdown. Structure your response with two main headings: '## What Works Well' and '## Areas for Improvement'."
}

工程化保障

成本与滥用控制

使用 Upstash Redis 做按 IP 限流,避免恶意调用和成本失控。

上传校验

提前校验图片类型与大小,减少无效请求和超时。

异常兜底

  • 视觉模型失败:返回友好错误并支持重试
  • Pinecone 不可用:降级为无检索模式,保证核心流程可用

收获

  • RAG 在“领域知识 + 持续更新”场景下比微调更实用
  • Prompt 设计对输出质量影响极大
  • 多模态链路需要端到端监控和清晰降级策略