🧠 阿头学 · 🪞 Uota学 · 💬 讨论题

GPT-5.5 提示词指南：从过程控制转向结果优先

这份指南最有价值的判断是：GPT-5.5 时代，提示词应少管过程、多定义结果，但它把平台内经验包装成通用最佳实践，证据力度明显不够。
打开原文 ↗

2026-04-30 原文链接 ↗

阅读简报

双语对照

完整翻译

原文

讨论归档

核心观点

结果优先比过程堆叠更有效 文档明确主张，GPT-5.5 在“目标、成功标准、约束、证据、停止条件”清楚时表现最好，这个判断大体站得住，因为它抓住了现代模型更擅长自主选路径的事实。
旧 prompt 资产可能已经变成负债 文档反复强调不要照搬旧时代的细粒度 prompt stack，这个判断是反直觉但重要的；如果仍把每一步写死，模型搜索空间会被压缩，回答会更机械，这对升级模型的团队是现实风险。
真正可落地的是停止条件与检索预算 相比“更短更好”这种容易被误读的口号，文档里最扎实的部分其实是“何时停止搜索、何时追加证据、证据不足时如何处理”，因为这直接影响准确性、成本和延迟。
提示词正在从文案技巧变成运行策略 文档把 personality、phase、commentary、verification、retrieval budget 统一纳入 prompt 设计，这个判断很关键：prompt 不再只是让模型“会说”，而是在规定 agent “怎么运行”。
这不是中立方法论，而是带生态导向的操作手册 文档大量围绕 Responses API、assistant item、phase、Codex 迁移展开，这说明它首先服务于 OpenAI 自家工作流；把它当普适真理会高估其外部泛化能力。

跟我们的关联

对 ATou 意味着什么、下一步怎么用 ATou 如果在做 prompt 体系升级，最该做的不是继续加规则，而是删掉历史噪音，重写成 Goal / Success criteria / Constraints / Stop rules 结构；下一步应挑 3 个高频场景做 A/B 测试，而不是全量迁移。
对 Neta 意味着什么、下一步怎么用 Neta 如果关注知识工作流，这篇最值得吸收的是“检索预算”而不是“更短提示词”；下一步应用在研究、摘要、问答链路上，明确什么情况下必须继续查、什么情况下够证据就停。
对 Uota 意味着什么、下一步怎么用 Uota 如果重视用户体验，这篇关于 commentary 和前言的判断很实用：用户感知速度常常比真实速度更重要；下一步可以把“先给一句进度更新”接入流式或多工具任务，直接优化首屏体验。
对 ATou/Neta/Uota 共同意味着什么、下一步怎么用 三者都应警惕把“少写过程”误读成“少写约束”；下一步应把硬性安全规则、证据规则、失败回退、输出边界保留下来，只删那些为了旧模型而写的僵硬流程控制。

讨论引子

1. “结果优先”在哪些场景是真升级，在哪些高风险场景反而会失控？ 2. 检索预算和早停规则会提升效率，但会不会系统性提高漏检率？ 3. 平台方的最佳实践文档，应该默认信到什么程度，必须自己补哪些验证？

最新：GPT-5.5 GPT-5.5 提示词指南 GPT-5.5 GPT-5.4 GPT-5.3 Codex GPT-5.2 GPT-5.1 GPT-4.1 GPT-5.5 提示词指南

GPT-5.5 相比 GPT-5.4 的新变化更短、结果优先的提示词通常比过程堆叠很重的提示词效果更好。更高效的推理意味着，在升级到更高强度之前，应先重新评估 low 和 medium effort 是否已经足够。对于大量使用工具的 Responses 工作流，前言、phase 处理以及 assistant item 回放仍然很重要。明确的人格设定、检索预算和验证规则，有助于塑造面向客户和智能体式的用户体验。

GPT-5.5 在提示词明确结果目标，同时给模型留出选择高效解法路径的空间时效果最好。与更早的模型相比，你通常可以使用更短、更偏结果导向的提示词。描述清楚什么样算好，哪些约束重要，有哪些证据可用，以及最终答案应包含什么。

不要把旧提示词堆栈里的每条指令都照搬过来。遗留提示词往往会把过程规定得过细，因为早期模型更需要外部约束来保持方向。到了 GPT-5.5，这样做反而可能带来噪音，缩小模型的搜索空间，或导致回答过于机械。

如果想进一步了解 GPT-5.5 的行为变化，先从 Using GPT-5.5 指南看起。本指南关注的是由这些行为变化引出的提示词调整。

这里的模式只是起点。要根据你的产品界面、工具、评测和用户体验目标进行调整。

使用 Codex 自动迁移

Codex 可以借助 OpenAI Docs Skill 实现本指南中的改动。

$openai-docs migrate this project to gpt-5.5

如果想在其他编码智能体中使用这个技能，可以从 OpenAI skills 仓库下载。

个性与行为

GPT-5.5 的默认风格高效、直接、以任务为导向。这对生产系统很有用。回答会更聚焦，行为更容易引导，模型也会避免不必要的对话填充。

对于面向客户的助手、支持工作流、辅导体验以及其他对话型产品，应该同时定义人格设定和协作风格。

人格设定控制助手说话的方式，包括语气、温度、直接程度、正式度、幽默感、共情能力和润色程度。协作风格控制助手如何做事，包括何时提问、何时做假设、应当多主动、给出多少上下文、何时检查工作，以及如何处理不确定性或风险。

这两部分都应保持简短。人格设定应塑造用户体验。协作说明应塑造任务行为。两者都不应取代清晰的目标、成功标准、工具规则或停止条件。

一个适用于稳定、任务导向型助手的人格设定示例：

Personality

你是一位有能力的协作者，亲切、稳定、直接。默认假设用户有能力且出于善意，并以耐心、尊重和切实有用的方式回应。

当请求已经足够清楚、可以开始尝试时，优先推动进展，而不是停下来寻求澄清。利用上下文和合理假设继续推进。只有当缺失的信息会实质性改变答案或带来明显风险时，才提出澄清，而且问题要尽量收窄。

保持简洁，但不要显得生硬。提供足够的上下文，让用户能够理解并信任答案，然后就停止。必要时使用示例、对比或简单类比，帮助对方更容易抓住重点。纠正用户或表达不同意见时，要坦率但有建设性。有人指出错误时，直接承认，把重点放在修正上。

在专业边界内匹配用户的语气。默认避免使用表情符号和粗口，除非用户明确要求这种风格，或者对话中已经清楚建立了这种风格是合适的。

一个适用于更有表现力、协作型助手的人格设定示例：

Personality

采用鲜活的对话存在感。聪明、好奇、在合适时带一点俏皮，并关注用户的思路。问题模糊时提出好问题，在有足够上下文之后果断起来。

保持温暖、协作和讲究。对话应轻松自然、有生命力，但不是为了聊天而聊天。不要只是镜像用户，而应提出真正的观点，同时始终响应他们的目标和约束。

当任务需要综合判断或建议时，要显得周到且立足事实。在上下文足够时给出明确建议，解释重要权衡，并指出不确定性，但不要变得闪烁其词。

对于更强调表达感的产品，可以明确加入温度、好奇心、幽默感或观点，但整段仍要保持简短。用人格设定塑造体验，而不是拿它去弥补目标不清或任务说明缺失。

用前言改善首个可见 token 的到达时间

在流式应用中，用户会注意到第一个可见响应出现前要等多久。GPT-5.5 可能会在输出可见文本之前花时间推理、规划或准备工具调用。

对于较长或大量使用工具的任务，可以提示模型先给出一小段前言。也就是一条简短、可见的进度更新，用来确认请求并说明第一步。这可以提升用户感知到的响应速度，而不改变底层任务本身。

当任务可能需要多于一步、需要调用工具，或涉及长时间运行的智能体工作流时，可以使用这种模式。

对于多步骤任务，在进行任何工具调用之前，先发送一条用户可见的简短更新，确认请求并说明第一步。控制在一到两句话内。

对于区分不同消息阶段的编码智能体，你可以写得更明确：

如果任务需要调用工具，那么在 analysis channel 中输出任何内容之前，必须先给出一条中间更新。给用户的这条更新应确认请求，并说明你的第一步。结果优先的提示词与停止条件

当提示词定义了目标结果、成功标准、约束和可用上下文，然后让模型自己选择路径时，GPT-5.5 的表现最强。

对于很多任务，描述终点而不是每一步。这会给模型留出空间，让它为任务选择合适的搜索、工具或推理策略。

优先使用这种写法：

端到端解决客户的问题。

成功意味着： - 基于现有政策和账户数据做出资格判断 - 在回复前完成任何允许执行的操作 - 最终答案包含 completed_actions、customer_message 和 blockers - 如果证据缺失，只询问最小缺失字段

避免不必要的绝对规则。旧提示词常用诸如 ALWAYS、NEVER、must 和 only 之类的严格指令来控制模型行为。只有在真正不可变的场景下才使用这些词，比如安全规则、必需的输出字段，或绝不能发生的动作。对于需要判断的场景，比如何时搜索、何时请求澄清、何时使用工具或继续迭代，更适合使用决策规则。

除非每一步都确实必不可少，否则避免这种风格的指令：

先检查 A，再检查 B，然后比较每一个字段，再思考所有可能的例外情况，再决定该调用哪个工具，再调用工具，然后向用户解释整个过程。

加入明确的停止条件：

用尽可能少但仍然有用的工具循环解决用户问题，但不要让减少循环次数压过正确性、可访问的备用证据、计算过程，或事实性主张所需的引用标记。

在每次得到结果后，问自己，Can I answer the user's core request now with useful evidence and citations for the factual claims? 如果可以，就回答。

定义证据缺失时的行为：

使用足以正确作答的最少证据，精确引用，然后停止。格式

GPT-5.5 在输出格式和结构上非常容易引导。当这样做能提升理解效果或更贴合产品形态时，就应利用这一点。

设置 text.verbosity，说明期望的输出形态，并把更重的结构留给那些确实能提升理解或产品界面需要稳定产物的情况。text.verbosity 的 API 默认值是 medium；如果你希望回答更短、更简洁，可以使用 low。

纯对话式格式：

让格式服务于理解。普通对话、解释、报告、文档和技术写作，默认使用自然段落的朴素格式。让呈现保持干净易读，不要让结构显得比内容本身更沉重。

谨慎使用标题、粗体、项目符号和编号列表。只有在用户明确要求、答案需要清晰比较或排序，或信息写成散文会更难浏览时才使用。除此之外，优先采用短段落和自然过渡。

尊重用户的格式偏好。如果他们要求简短回答、极简格式、不要项目符号、不要标题，或指定某种结构，就按那个偏好执行，除非有充分理由不这样做。

加入明确的受众和长度说明：

面向高级商务受众写作。答案控制在 400 词以内。使用短段落，只有在确实有助于浏览时才使用项目符号。优先给出结论，然后是推理，再然后是注意事项。

对于编辑、改写、摘要或面向客户的消息，在要求模型改善风格之前，先说明哪些内容必须保留。当你希望润色但不希望内容扩张时，这种模式尤其有用。

优先保留所要求的成品形态、长度、结构和文体。悄悄改善清晰度、流畅度和正确性。除非明确要求，否则不要添加新主张、额外章节，或更强的宣传口吻。依据、引用与检索预算

对于基于依据的回答，引用行为应成为提示词的一部分。定义哪些内容需要支持，什么算足够证据，以及证据缺失时模型应如何表现。没有证据不应自动变成事实性的 no。更多细节和示例，见 citation formatting guide。

加入明确的检索预算

检索预算就是搜索的停止规则。它告诉模型，什么时候证据已经足够。

对于普通问答，先用简短但有区分度的关键词做一次宽泛搜索。如果最顶部的结果已经为核心问题提供了足够、可引用的支持，就直接基于这些结果回答，而不是再次搜索。

只有在以下情况下才进行另一轮检索调用： - 顶部结果没有回答核心问题。 - 缺少必需的事实、参数、负责人、日期、ID 或来源。 - 用户要求穷尽式覆盖、比较或完整列表。 - 必须读取特定文档、URL、邮件、会议、记录或代码产物。 - 否则答案中会包含重要但没有依据支持的事实性主张。

不要为了润色措辞、添加示例、给非关键细节补引用，或给本可以安全写得更泛化的表述补支持，而再次搜索。创意写作护栏

对于写作任务，要告诉模型哪些主张必须来自来源，哪些部分可以发挥创意来写。这对幻灯片、发布文案、客户总结、讲稿、领导简介和叙事包装尤为重要。

对于幻灯片、领导简介、外联文案、用于分享的摘要、讲稿或叙事包装等创意或生成式请求，要区分哪些是有来源支撑的事实，哪些只是创意表达。

具体的产品、客户、指标、路线图、日期、能力和竞争性主张，应使用检索到或已提供的事实，并对这些主张加上引用。
不要为了让文案显得更强，就编造具体名称、第一方数据主张、指标、路线图状态、客户成果或产品能力。
如果几乎没有或完全没有可引用的支持，就写一个有用但泛化的草稿，使用占位符或明确标注的假设，而不是加入没有根据的细节。前端工程与视觉品味

对于前端工作，可参考示例说明，了解如何在实践中引导界面质量。内容包括产品和用户上下文、设计系统一致性、首屏可用性、熟悉的控件、预期状态、响应式行为，以及应避免的常见生成式界面默认模式，比如通用英雄区、嵌套卡片、装饰性渐变、可见的说明性文字和损坏的布局。

提示模型检查自己的工作

当可以进行验证时，给 GPT-5.5 提供能够检查输出的工具。

对于编码智能体，要求给出具体的验证命令：

完成修改后，运行最相关的可用验证： - 针对变更行为的定向单元测试 - 适用时进行类型检查或 lint 检查 - 针对受影响包的构建检查 - 当完整验证成本过高时，执行最小化冒烟测试

如果无法运行验证，解释原因，并说明次优但最接近的检查方式。

对于可视化产物，要求在渲染后检查：

在定稿前先渲染产物。检查渲染输出中的布局、裁切、间距、内容缺失和视觉一致性。反复修订，直到渲染结果符合要求。

对于工程和规划任务，让实施计划可追踪：

对于实施计划，包含以下内容： - 需求以及每项需求在哪里得到处理 - 涉及的具名资源、文件、API 或系统 - 相关的状态转换或数据流 - 验证命令或检查方式 - 失败时的行为 - 隐私与安全考量 - 会实质影响实施的开放问题 Phase 参数

从 GPT-5.4 开始，长时间运行或大量使用工具的 Responses 工作流可以使用 assistant item 的 phase 值来区分中间更新和最终答案。GPT-5.5 使用同样的模式。

如果你使用 previous_response_id，API 会自动保留此前的 assistant 状态。如果你的应用会手动把 assistant 输出项回放到下一个请求中，就要保留每个原始 phase 值，并在回放时原样传回。这在响应包含前言、重复工具调用，或在中间更新之后再给出最终答案时尤其重要。

如果手动回放 assistant items： - 精确保留 assistant 的 phase 值。 - 对用户可见的中间更新使用 phase: "commentary"。 - 对完成后的答案使用 phase: "final_answer"。 - 不要给用户消息添加 phase。建议的提示词结构

将以下结构作为复杂提示词的起点。每个部分保持简短。只有在确实会改变行为时才补充细节。

Latest: GPT-5.5 GPT-5.5 prompting guide Improve time to first visible token with a preamble Outcome-first prompts and stopping conditions Grounding, citations, and retrieval budgets GPT-5.5 GPT-5.4 GPT-5.3 Codex GPT-5.2 GPT-5.1 GPT-4.1 GPT-5.5 prompting guide

New in GPT-5.5 vs GPT-5.4 Shorter, outcome-first prompts usually work better than process-heavy prompt stacks. More efficient reasoning means low and medium effort should be re-evaluated before escalating. Preambles, phase handling, and assistant-item replay remain important for tool-heavy Responses workflows. Explicit personality, retrieval budgets, and validation rules help shape customer-facing and agentic UX.

GPT-5.5 works best when prompts define the outcome and leave room for the model to choose an efficient solution path. Compared with earlier models, you can often use shorter, more outcome-oriented prompts: describe what good looks like, what constraints matter, what evidence is available, and what the final answer should contain.

Avoid carrying over every instruction from an older prompt stack. Legacy prompts often over-specify the process because earlier models needed more help staying on track. With GPT-5.5, that can add noise, narrow the model’s search space, or lead to overly mechanical answers.

For more detail on GPT-5.5 behavior changes, start with the Using GPT-5.5 guide. This guide focuses on prompt changes that follow from those behavior changes.

The patterns here are starting points. Adapt them to your product surface, tools, evals, and user experience goals.

Automated migration with Codex

Codex can implement the changes from this guide with the OpenAI Docs Skill.

$openai-docs migrate this project to gpt-5.5

To use this skill in other coding agents, download it from the OpenAI skills repository.

Personality and behavior

GPT-5.5’s default style is efficient, direct, and task-oriented. This is useful for production systems: responses stay focused, behavior is easier to steer, and the model avoids unnecessary conversational padding.

For customer-facing assistants, support workflows, coaching experiences, and other conversational products, define both personality and collaboration style.

Personality controls how the assistant sounds: tone, warmth, directness, formality, humor, empathy, and level of polish. Collaboration style controls how the assistant works: when it asks questions, when it makes assumptions, how proactive it should be, how much context it gives, when it checks work, and how it handles uncertainty or risk.

Keep both short. Personality instructions should shape the user experience. Collaboration instructions should shape task behavior. Neither should replace clear goals, success criteria, tool rules, or stopping conditions.

Example personality block for a steady task-focused assistant:

Personality

You are a capable collaborator: approachable, steady, and direct. Assume the user is competent and acting in good faith, and respond with patience, respect, and practical helpfulness.

Prefer making progress over stopping for clarification when the request is already clear enough to attempt. Use context and reasonable assumptions to move forward. Ask for clarification only when the missing information would materially change the answer or create meaningful risk, and keep any question narrow.

Stay concise without becoming curt. Give enough context for the user to understand and trust the answer, then stop. Use examples, comparisons, or simple analogies when they make the point easier to grasp. When correcting the user or disagreeing, be candid but constructive. When an error is pointed out, acknowledge it plainly and focus on fixing it.

Match the user's tone within professional bounds. Avoid emojis and profanity by default, unless the user explicitly asks for that style or has clearly established it as appropriate for the conversation.

Example personality block for an expressive collaborative assistant:

Personality

Adopt a vivid conversational presence: intelligent, curious, playful when appropriate, and attentive to the user's thinking. Ask good questions when the problem is blurry, then become decisive once there is enough context.

Be warm, collaborative, and polished. Conversation should feel easy and alive, but not chatty for its own sake. Offer a real point of view rather than merely mirroring the user, while staying responsive to their goals and constraints.

Be thoughtful and grounded when the task calls for synthesis or advice. State a clear recommendation when you have enough context, explain important tradeoffs, and name uncertainty without becoming evasive.

For more expressive products, add warmth, curiosity, humor, or point of view explicitly, but keep the block short. Use personality to shape the experience, not to compensate for unclear goals or missing task instructions.

Improve time to first visible token with a preamble

In streaming applications, users notice how long it takes before the first visible response appears. GPT-5.5 may spend time reasoning, planning, or preparing tool calls before emitting visible text.

For longer or tool-heavy tasks, prompt the model to start with a short preamble: a brief visible update that acknowledges the request and states the first step. This can improve perceived responsiveness without changing the underlying task.

Use this pattern when the task may take more than one step, require tool calls, or involve a long-running agent workflow.

Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences.

For coding agents that expose separate message phases, you can be more explicit:

You must always start with an intermediary update before any content in the analysis channel if the task will require calling tools. The user update should acknowledge the request and explain your first step. Outcome-first prompts and stopping conditions

GPT-5.5 is strongest when the prompt defines the target outcome, success criteria, constraints, and available context, then lets the model choose the path.

For many tasks, describe the destination rather than every step. This gives the model room to choose the right search, tool, or reasoning strategy for the task.

Prefer this:

Resolve the customer's issue end to end.

Success means: - the eligibility decision is made from the available policy and account data - any allowed action is completed before responding - the final answer includes completed_actions, customer_message, and blockers - if evidence is missing, ask for the smallest missing field

Avoid unnecessary absolute rules. Older prompts often use strict instructions like ALWAYS, NEVER, must, and only to control model behavior. Use those words for true invariants, such as safety rules, required output fields, or actions that should never happen. For judgment calls, such as when to search, ask for clarification, use a tool, or keep iterating, prefer decision rules instead.

Avoid this style of instruction unless every step is truly required:

First inspect A, then inspect B, then compare every field, then think through all possible exceptions, then decide which tool to call, then call the tool, then explain the entire process to the user.

Add explicit stopping conditions:

Resolve the user query in the fewest useful tool loops, but do not let loop minimization outrank correctness, accessible fallback evidence, calculations, or required citation tags for factual claims.

After each result, ask: "Can I answer the user's core request now with useful evidence and citations for the factual claims?" If yes, answer.

Define missing-evidence behavior:

Use the minimum evidence sufficient to answer correctly, cite it precisely, then stop. Formatting

GPT-5.5 is highly steerable on output format and structure. Use that control when it improves comprehension or product fit.

Set text.verbosity, describe the expected output shape, and reserve heavier structure for cases where it improves comprehension or your product UI needs a stable artifact. The API default for text.verbosity is medium; use low when you prefer shorter, more concise responses.

Plain conversational formatting:

Let formatting serve comprehension. Use plain paragraphs as the default format for normal conversation, explanations, reports, documentation, and technical writeups. Keep the presentation clean and readable without making the structure feel heavier than the content.

Use headers, bold text, bullets, and numbered lists sparingly. Reach for them when the user requests them, when the answer needs clear comparison or ranking, or when the information would be harder to scan as prose. Otherwise, favor short paragraphs and natural transitions.

Respect formatting preferences from the user. If they ask for a terse answer, minimal formatting, no bullets, no headers, or a specific structure, follow that preference unless there is a strong reason not to.

Add explicit audience and length guidance:

Write for a senior business audience. Keep the answer under 400 words. Use short paragraphs and only include bullets when they improve scannability. Prioritize the conclusion first, then the reasoning, then caveats.

For editing, rewriting, summaries, or customer-facing messages, tell the model what to preserve before asking it to improve style. This pattern is useful when you want polish without expansion.

Preserve the requested artifact, length, structure, and genre first. Quietly improve clarity, flow, and correctness. Do not add new claims, extra sections, or a more promotional tone unless explicitly requested. Grounding, citations, and retrieval budgets

For grounded answers, citation behavior should be part of the prompt. Define what needs support, what counts as enough evidence, and how the model should behave when evidence is missing. Absence of evidence shouldn’t automatically become a factual “no.” For more details and examples, see the citation formatting guide.

Add an explicit retrieval budget

Retrieval budgets are stopping rules for search. They tell the model when enough evidence is enough.

For ordinary Q&A, start with one broad search using short, discriminative keywords. If the top results contain enough citable support for the core request, answer from those results instead of searching again.

Make another retrieval call only when: - The top results do not answer the core question. - A required fact, parameter, owner, date, ID, or source is missing. - The user asked for exhaustive coverage, a comparison, or a comprehensive list. - A specific document, URL, email, meeting, record, or code artifact must be read. - The answer would otherwise contain an important unsupported factual claim.

Do not search again to improve phrasing, add examples, cite nonessential details, or support wording that can safely be made more generic. Creative drafting guardrails

For drafting tasks, tell the model which claims must come from sources and which parts may be creatively written. This is especially important for slides, launch copy, customer summaries, talk tracks, leadership blurbs, and narrative framing.

For creative or generative requests such as slides, leadership blurbs, outbound copy, summaries for sharing, talk tracks, or narrative framing, distinguish source-backed facts from creative wording.

Use retrieved or provided facts for concrete product, customer, metric, roadmap, date, capability, and competitive claims, and cite those claims.
Do not invent specific names, first-party data claims, metrics, roadmap status, customer outcomes, or product capabilities to make the draft sound stronger.
If there is little or no citable support, write a useful generic draft with placeholders or clearly labeled assumptions rather than unsupported specifics. Frontend engineering and visual taste

For frontend work, refer to the example instructions for practical ways to steer UI quality. They cover product and user context, design-system alignment, first-screen usability, familiar controls, expected states, responsive behavior, and common generated-UI defaults to avoid, such as generic heroes, nested cards, decorative gradients, visible instructional text, and broken layouts.

Prompt the model to check its work

Give GPT-5.5 access to tools that let it check outputs when validation is possible.

For coding agents, ask for concrete validation commands:

After making changes, run the most relevant validation available: - targeted unit tests for changed behavior - type checks or lint checks when applicable - build checks for affected packages - a minimal smoke test when full validation is too expensive

If validation cannot be run, explain why and describe the next best check.

For visual artifacts, ask for inspection after rendering:

Render the artifact before finalizing. Inspect the rendered output for layout, clipping, spacing, missing content, and visual consistency. Revise until the rendered output matches the requirements.

For engineering and planning tasks, make implementation plans traceable:

For implementation plans, include: - requirements and where each is addressed - named resources, files, APIs, or systems involved - state transitions or data flow where relevant - validation commands or checks - failure behavior - privacy and security considerations - open questions that materially affect implementation Phase parameter

Starting with GPT-5.4, long-running or tool-heavy Responses workflows can use assistant-item phase values to distinguish intermediate updates from final answers. GPT-5.5 uses the same pattern.

If you use previous_response_id, the API preserves prior assistant state automatically. If your application manually replays assistant output items into the next request, preserve each original phase value and pass it back unchanged. This matters most when a response includes preambles, repeated tool calls, or a final answer after intermediate assistant updates.

If manually replaying assistant items: - Preserve assistant phase values exactly. - Use phase: "commentary" for intermediate user-visible updates. - Use phase: "final_answer" for the completed answer. - Do not add phase to user messages. Suggested prompt structure

Use this structure as a starting point for complex prompts. Keep each section short. Add detail only where it changes behavior.

Role: [1-2 sentences defining the model's function, context, and job]

Personality

[tone, demeanor, and collaboration style]

Goal

[user-visible outcome]

Success criteria

[what must be true before the final answer]

Constraints

[policy, safety, business, evidence, and side-effect limits]

Output

[sections, length, and tone]

Stop rules

[when to retry, fallback, abstain, ask, or stop]

最新：GPT-5.5 GPT-5.5 提示词指南 GPT-5.5 GPT-5.4 GPT-5.3 Codex GPT-5.2 GPT-5.1 GPT-4.1 GPT-5.5 提示词指南

如果想进一步了解 GPT-5.5 的行为变化，先从 Using GPT-5.5 指南看起。本指南关注的是由这些行为变化引出的提示词调整。

这里的模式只是起点。要根据你的产品界面、工具、评测和用户体验目标进行调整。

使用 Codex 自动迁移

Codex 可以借助 OpenAI Docs Skill 实现本指南中的改动。

$openai-docs migrate this project to gpt-5.5

如果想在其他编码智能体中使用这个技能，可以从 OpenAI skills 仓库下载。

个性与行为

GPT-5.5 的默认风格高效、直接、以任务为导向。这对生产系统很有用。回答会更聚焦，行为更容易引导，模型也会避免不必要的对话填充。

对于面向客户的助手、支持工作流、辅导体验以及其他对话型产品，应该同时定义人格设定和协作风格。

这两部分都应保持简短。人格设定应塑造用户体验。协作说明应塑造任务行为。两者都不应取代清晰的目标、成功标准、工具规则或停止条件。

一个适用于稳定、任务导向型助手的人格设定示例：

Personality

你是一位有能力的协作者，亲切、稳定、直接。默认假设用户有能力且出于善意，并以耐心、尊重和切实有用的方式回应。

在专业边界内匹配用户的语气。默认避免使用表情符号和粗口，除非用户明确要求这种风格，或者对话中已经清楚建立了这种风格是合适的。

一个适用于更有表现力、协作型助手的人格设定示例：

Personality

采用鲜活的对话存在感。聪明、好奇、在合适时带一点俏皮，并关注用户的思路。问题模糊时提出好问题，在有足够上下文之后果断起来。

当任务需要综合判断或建议时，要显得周到且立足事实。在上下文足够时给出明确建议，解释重要权衡，并指出不确定性，但不要变得闪烁其词。

用前言改善首个可见 token 的到达时间

在流式应用中，用户会注意到第一个可见响应出现前要等多久。GPT-5.5 可能会在输出可见文本之前花时间推理、规划或准备工具调用。

当任务可能需要多于一步、需要调用工具，或涉及长时间运行的智能体工作流时，可以使用这种模式。

对于多步骤任务，在进行任何工具调用之前，先发送一条用户可见的简短更新，确认请求并说明第一步。控制在一到两句话内。

对于区分不同消息阶段的编码智能体，你可以写得更明确：

当提示词定义了目标结果、成功标准、约束和可用上下文，然后让模型自己选择路径时，GPT-5.5 的表现最强。

对于很多任务，描述终点而不是每一步。这会给模型留出空间，让它为任务选择合适的搜索、工具或推理策略。

优先使用这种写法：

端到端解决客户的问题。

除非每一步都确实必不可少，否则避免这种风格的指令：

先检查 A，再检查 B，然后比较每一个字段，再思考所有可能的例外情况，再决定该调用哪个工具，再调用工具，然后向用户解释整个过程。

加入明确的停止条件：

用尽可能少但仍然有用的工具循环解决用户问题，但不要让减少循环次数压过正确性、可访问的备用证据、计算过程，或事实性主张所需的引用标记。

在每次得到结果后，问自己，Can I answer the user's core request now with useful evidence and citations for the factual claims? 如果可以，就回答。

定义证据缺失时的行为：

使用足以正确作答的最少证据，精确引用，然后停止。格式

GPT-5.5 在输出格式和结构上非常容易引导。当这样做能提升理解效果或更贴合产品形态时，就应利用这一点。

纯对话式格式：

加入明确的受众和长度说明：

加入明确的检索预算

检索预算就是搜索的停止规则。它告诉模型，什么时候证据已经足够。

不要为了润色措辞、添加示例、给非关键细节补引用，或给本可以安全写得更泛化的表述补支持，而再次搜索。创意写作护栏

对于幻灯片、领导简介、外联文案、用于分享的摘要、讲稿或叙事包装等创意或生成式请求，要区分哪些是有来源支撑的事实，哪些只是创意表达。

具体的产品、客户、指标、路线图、日期、能力和竞争性主张，应使用检索到或已提供的事实，并对这些主张加上引用。
不要为了让文案显得更强，就编造具体名称、第一方数据主张、指标、路线图状态、客户成果或产品能力。
如果几乎没有或完全没有可引用的支持，就写一个有用但泛化的草稿，使用占位符或明确标注的假设，而不是加入没有根据的细节。前端工程与视觉品味

提示模型检查自己的工作

当可以进行验证时，给 GPT-5.5 提供能够检查输出的工具。

对于编码智能体，要求给出具体的验证命令：

如果无法运行验证，解释原因，并说明次优但最接近的检查方式。

对于可视化产物，要求在渲染后检查：

在定稿前先渲染产物。检查渲染输出中的布局、裁切、间距、内容缺失和视觉一致性。反复修订，直到渲染结果符合要求。

对于工程和规划任务，让实施计划可追踪：

从 GPT-5.4 开始，长时间运行或大量使用工具的 Responses 工作流可以使用 assistant item 的 phase 值来区分中间更新和最终答案。GPT-5.5 使用同样的模式。

将以下结构作为复杂提示词的起点。每个部分保持简短。只有在确实会改变行为时才补充细节。

For more detail on GPT-5.5 behavior changes, start with the Using GPT-5.5 guide. This guide focuses on prompt changes that follow from those behavior changes.

The patterns here are starting points. Adapt them to your product surface, tools, evals, and user experience goals.

Automated migration with Codex

Codex can implement the changes from this guide with the OpenAI Docs Skill.

$openai-docs migrate this project to gpt-5.5

To use this skill in other coding agents, download it from the OpenAI skills repository.

Personality and behavior

For customer-facing assistants, support workflows, coaching experiences, and other conversational products, define both personality and collaboration style.

Example personality block for a steady task-focused assistant:

Personality

You are a capable collaborator: approachable, steady, and direct. Assume the user is competent and acting in good faith, and respond with patience, respect, and practical helpfulness.

Example personality block for an expressive collaborative assistant:

Personality

Improve time to first visible token with a preamble

In streaming applications, users notice how long it takes before the first visible response appears. GPT-5.5 may spend time reasoning, planning, or preparing tool calls before emitting visible text.

Use this pattern when the task may take more than one step, require tool calls, or involve a long-running agent workflow.

Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences.

For coding agents that expose separate message phases, you can be more explicit:

GPT-5.5 is strongest when the prompt defines the target outcome, success criteria, constraints, and available context, then lets the model choose the path.

For many tasks, describe the destination rather than every step. This gives the model room to choose the right search, tool, or reasoning strategy for the task.

Prefer this:

Resolve the customer's issue end to end.

Avoid this style of instruction unless every step is truly required:

First inspect A, then inspect B, then compare every field, then think through all possible exceptions, then decide which tool to call, then call the tool, then explain the entire process to the user.

Add explicit stopping conditions:

Resolve the user query in the fewest useful tool loops, but do not let loop minimization outrank correctness, accessible fallback evidence, calculations, or required citation tags for factual claims.

After each result, ask: "Can I answer the user's core request now with useful evidence and citations for the factual claims?" If yes, answer.

Define missing-evidence behavior:

Use the minimum evidence sufficient to answer correctly, cite it precisely, then stop. Formatting

GPT-5.5 is highly steerable on output format and structure. Use that control when it improves comprehension or product fit.

Plain conversational formatting:

Add explicit audience and length guidance:

For editing, rewriting, summaries, or customer-facing messages, tell the model what to preserve before asking it to improve style. This pattern is useful when you want polish without expansion.

Add an explicit retrieval budget

Retrieval budgets are stopping rules for search. They tell the model when enough evidence is enough.

Do not search again to improve phrasing, add examples, cite nonessential details, or support wording that can safely be made more generic. Creative drafting guardrails

For creative or generative requests such as slides, leadership blurbs, outbound copy, summaries for sharing, talk tracks, or narrative framing, distinguish source-backed facts from creative wording.

Use retrieved or provided facts for concrete product, customer, metric, roadmap, date, capability, and competitive claims, and cite those claims.
Do not invent specific names, first-party data claims, metrics, roadmap status, customer outcomes, or product capabilities to make the draft sound stronger.
If there is little or no citable support, write a useful generic draft with placeholders or clearly labeled assumptions rather than unsupported specifics. Frontend engineering and visual taste

Prompt the model to check its work

Give GPT-5.5 access to tools that let it check outputs when validation is possible.

For coding agents, ask for concrete validation commands:

If validation cannot be run, explain why and describe the next best check.

For visual artifacts, ask for inspection after rendering:

Render the artifact before finalizing. Inspect the rendered output for layout, clipping, spacing, missing content, and visual consistency. Revise until the rendered output matches the requirements.

For engineering and planning tasks, make implementation plans traceable:

Starting with GPT-5.4, long-running or tool-heavy Responses workflows can use assistant-item phase values to distinguish intermediate updates from final answers. GPT-5.5 uses the same pattern.

Use this structure as a starting point for complex prompts. Keep each section short. Add detail only where it changes behavior.

Role: [1-2 sentences defining the model's function, context, and job]

Personality

[tone, demeanor, and collaboration style]

Goal

[user-visible outcome]

Success criteria

[what must be true before the final answer]

Constraints

[policy, safety, business, evidence, and side-effect limits]

Output

[sections, length, and tone]

Stop rules

[when to retry, fallback, abstain, ask, or stop]

📋 讨论归档

讨论进行中…