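🧠 阿头学 · 💬 Discussion topic

Codex is turning from a coding assistant into a workflow operating system

The most valuable judgment in this article is that Codex's real gain lies not in "writing code better" but in "keeping workflows moving across tools and over time." The article does, however, read as product evangelism, and it deliberately plays down security, cost, and loss-of-control risks.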

2026-05-21

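Key points

  • Continuity matters more than cleverness. The most defensible part of the article is how it ties durable threads, steering, queuing, thread automations, and Goals into one mechanism for sustained progress. That is closer to real work than a single-turn chat assistant, because real tasks get interrupted, handed off, and revised repeatedly.
  • The agent's boundary is spilling from code into all computer work. The author's core judgment is right: once the browser, desktop GUI, Slack, Gmail, Calendar, MCP, and document review are all connected, Codex no longer competes only with code-completion tools but with workflow systems in a much broader sense.
  • Goals must be bound to a verifier, and this point is rock solid. "Ambition without verification is just a wish" is the firmest principle in the piece, because tests, benchmarks, reproduction cases, and end-to-end flows are currently the most effective way to keep an agent from running wild. A long-running goal with no verifier will most likely produce only hallucination and rework.
  • Shared memory should be externalized rather than entrusted to model memory. Writing long-term context into a vault, a folder, and AGENTS.md, instead of leaving it in chat history, is a pragmatic call: explicit memory that can be reviewed, migrated, and synced is clearly more reliable than a black-box "the model remembered."
  • The article seriously sidesteps risk. The author promotes running Slack, Gmail, the desktop, and the browser unattended in the background while saying almost nothing about permission isolation, mistaken actions, privacy leaks, error accumulation, or maintenance cost. This is not a minor omission; it is the most critical blind spot in the product pitch.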

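How this relates to us

  • What it means for ATou. If ATou is building an agent workbench of their own, this article offers a product skeleton rather than a feature list: durable threads carry context, steering and queuing handle process control, and Goals plus verifiers make results converge. The next step should be to design a task system in which state persists, not to keep piling on single-turn Q&A capability.
  • What it means for Neta. If Neta cares about long-term collaboration and knowledge accumulation, this article shows that externalized shared memory is a necessary foundation. A reasonable next step is to try a minimal vault structure that records TODOs, project status, key people, and blockers as explicit files instead of continuing to rely on chat history.
  • What it means for Uota. If Uota is studying how AI can actually take over day-to-day work, this article suggests that future value lies less in flashier generation and more in stable asynchronous progress. The thing to test next is not "can it do this" but "after 24 hours unattended, has it drifted, polluted its context, or triggered a high-privilege action by mistake."
  • What it means for investment judgment. The potential market for this kind of product is indeed larger than that for coding assistants, because it is trying to capture the richer "operating system for knowledge work" segment. When evaluating projects, though, demos and integration counts are not enough: look hard at the verification loop, security isolation, permission control, and real retention, or it is easy to overestimate commercial maturity.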

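Discussion starters

1. If an agent's core value is "keeping work moving," should the product's first priority be a stronger model, or stronger state management and verification?
2. Once Slack, Gmail, the browser, and the desktop are all handed to an agent, where should the line between efficiency gains and loss of control be drawn, and who bears responsibility?
3. Will "shared memory written to a vault" become the standard pattern for individuals and teams using agents, or is it only a high-maintenance approach for power users?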

Most developers first use coding agents for code: inspect a repository, make a diff, run tests, and open a pull request.

That’s still the center of gravity for Codex. But much of the work on a computer is already mediated by code: executing shell commands, browsing web pages, calling APIs, exporting documents, responding to events, and triggering automations. As those surfaces become available to Codex, it starts to feel less like a coding assistant in the narrow sense and more like a system for getting computer work done.

The Codex app makes that shift concrete. A thread can keep context, use tools, surface artifacts, and continue across prompts instead of resetting after each exchange.

Getting more out of Codex means using these capabilities together:

  • durable threads that preserve context

  • voice, steering, and queuing while the user is still in the loop

  • browser, computer-use, MCP servers, and connectors that let Codex act beyond a repo

  • thread automations and Goals that continue the work while the user is away

  • the side panel, where users can review code, documents, decks, and other artifacts

Durable threads

Durable threads: Long-running Codex threads that preserve working context across repeated sessions.

Pinned threads are one way to keep durable threads close at hand. They’re useful for recurring work streams such as:

  • a Chief of Staff thread

  • a release thread

  • a documentation review thread

  • a thread dedicated to external monitoring

These are persistent workspaces, not short chats. Codex can revisit them over time, preserving prior decisions, preferences, and working context that would otherwise need to be rebuilt from scratch.

Pinned-thread shortcuts make this practical. Command-1 through Command-9 jump directly into saved threads.

Voice input

Voice input is valuable because it captures the rough version of a thought before it’s compressed into polished prose.

Codex has built-in voice input. It works especially well for vague starting points that are natural to say but awkward to type:

I think someone named Ben mentioned this in Slack. I do not remember the details. Please go look.

For an agent that can search, gather context, and report back, that’s often enough.

It also works well for a two- or three-minute thought dump before the task is fully formed.

Transcripts work the same way. A raw meeting transcript or dictated planning note often provides better source material than a short summary because it preserves uncertainty, emphasis, and unfinished lines of thought.

Steering and queuing

Voice becomes even more useful when paired with explicit control over an active task.

Steering: Interrupting an in-flight Codex task with new direction before the current step finishes.

Steering is useful when the agent is heading the wrong way and needs a correction before it finishes. During a website review, for example, the user can interrupt the work while annotating the surface in the side panel:

  • make this smaller

  • the spacing between these two elements feels off

  • this copy is wrong

Queuing: Adding work for Codex to do after the current step completes.

Queuing is different. It doesn’t interrupt the task in progress. It adds the next task to the line. A user might say:

Once the work is done, send the preview link to the reviewer in Slack.

Steering changes what Codex is doing now. Queuing changes what should happen next. Both keep the user close to the work while it’s unfolding.
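One way to picture the difference is as two operations on a single line of work. The sketch below is a toy model under stated assumptions, not a description of how Codex is built: `ThreadModel`, `steer`, `queue`, and `finish_current` are hypothetical names, and steering is simplified to swapping the active instruction.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class ThreadModel:
    """Toy model of a thread: one active instruction plus a line of pending ones."""

    current: Optional[str] = None
    pending: Deque[str] = field(default_factory=deque)

    def steer(self, instruction: str) -> None:
        # Steering changes what is happening now; queued work is left untouched.
        self.current = instruction

    def queue(self, instruction: str) -> None:
        # Queuing never interrupts the active instruction; it joins the line.
        if self.current is None:
            self.current = instruction
        else:
            self.pending.append(instruction)

    def finish_current(self) -> Optional[str]:
        # When the active instruction completes, the next queued one becomes active.
        self.current = self.pending.popleft() if self.pending else None
        return self.current
```

In this model, steering with "make this smaller" while a preview-link message is queued changes only `current`; the queued message still runs once the active work finishes.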

Tools and reach

Once a thread has continuity, the next question is what it can act on. Codex can move outward in layers:

  • $browser for the in-app browser in the side panel, where Codex can inspect and annotate web surfaces

  • @chrome for signed-in browser state and Chrome-based workflows

  • @computer for work that only exists through a desktop GUI

$browser fits side-panel browser review. @chrome fits signed-in browser work that depends on the user’s Chrome context. @computer fits tasks that only exist through a desktop GUI.

MCP servers and connectors extend the same idea into the rest of a workflow. Slack, Gmail, and Calendar matter because many important tasks first appear as messages, inbox items, or scheduling problems before they ever become code.

Skills make repeated workflows reusable. Once a workflow proves useful, package it as a skill so Codex can run it again without relearning the routine from scratch.

Work from anywhere

The Codex mobile app changes when the user has to be at the desk. A task can start on a Mac where the files, permissions, and local setup already live, then continue while the user checks in from a phone.

That matters in small moments. Someone can leave the desk while Codex runs a longer task, answer a question from outside, approve the next step, or redirect the thread before they get back. The local environment stays in place; the user doesn’t have to.

Automations

Automations run Codex work on a schedule. Use a scheduled automation when the recurring job should start fresh from a workspace, such as a daily report or a regular repository check. Use a thread automation when the schedule should return to an active conversation with its running context.

Thread automations: Heartbeat-style recurring wake-up calls that return to the same Codex thread on a schedule.

Pinned threads are useful, but they still wait for the user to return. A thread automation can check on something every few minutes or every few hours, continue until it meets a condition, and adjust the cadence over time.

A Chief of Staff thread might run every 30 minutes:

Every 30 minutes, check Slack and Gmail for unanswered messages that need my attention. Help me prioritize what matters most. If someone asks me a question, research the answer as deeply as you can and draft a reply for me, but do not send it.

When the user returns, the expensive part of gathering context is often done. The human still decides what gets sent.
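The heartbeat idea (check on a schedule, stop when a condition is met, adjust the cadence over time) can be sketched as a small loop. This is an illustrative model only: `run_heartbeat` and its parameters are hypothetical, the 30-minute default simply mirrors the example above, and a real thread automation is configured in the app rather than written by hand.

```python
import time
from typing import Callable, List


def run_heartbeat(
    check: Callable[[], bool],
    interval_s: float = 1800.0,       # 30 minutes, as in the example above
    backoff: float = 2.0,             # assumed policy: slow down after each miss
    max_interval_s: float = 4 * 3600.0,
    max_ticks: int = 48,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Call `check` on a schedule until it returns True or the tick budget runs out.

    Returns the waits taken between checks so the cadence can be inspected.
    """
    waits: List[float] = []
    for _ in range(max_ticks):
        if check():
            break
        waits.append(interval_s)
        sleep(interval_s)
        # Adjust the cadence over time, capped at max_interval_s.
        interval_s = min(interval_s * backoff, max_interval_s)
    return waits
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. The tick budget is a deliberate design choice: an unattended loop should have a hard stop even if its condition is never met.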

Thread automations also fit feedback loops. A thread automation can watch pull request comments, Google Docs comments, or Slack replies and keep the surrounding work moving while the user is away.

Consider an animation workflow where a reviewer shares a video in Slack. A thread automation can check the thread on a schedule, render an updated version when comments arrive, and reply in the same thread tagging the reviewer. If one integration can’t complete the final upload, desktop automation can finish the step through the GUI.

The loop spans Slack for feedback, the codebase for rendering, and desktop automation for the final upload.

Goals

Goals: Longer-running Codex tasks with a finish line the agent can keep working toward over time.

Goals are most powerful when the task has a real finish line that the agent can keep pushing toward. A weak goal is:

Implement the plan in this Markdown file.

A stronger goal has a measurable success criterion.

For example, an engineer might migrate an internal tool from Python to Rust by setting up the new directory, defining the goal, and making the finish line explicit: the new implementation isn’t done until the unit tests pass.

A goal combines ongoing execution with a verifier. The user defines the outcome, the stopping condition, and the signal that says whether Codex is getting closer.

Useful verifiers include:

  • a test suite

  • a benchmark

  • a bug reproduction

  • a validation matrix

  • an end-to-end workflow that must keep passing

Ambition matters, but without verification it’s just a wish.
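The pairing of ongoing execution with a verifier can be sketched as a loop that alternates work and checking. This is a minimal illustration under stated assumptions, not Codex's implementation: `pursue_goal` and `command_verifier` are hypothetical helpers, and the verifier here is simply "a command that exits with status 0."

```python
import subprocess
from typing import Callable, Sequence


def command_verifier(cmd: Sequence[str]) -> Callable[[], bool]:
    """Build a verifier that passes when `cmd` exits with status 0."""

    def verify() -> bool:
        return subprocess.run(cmd, capture_output=True).returncode == 0

    return verify


def pursue_goal(
    step: Callable[[int], None],
    verify: Callable[[], bool],
    max_steps: int = 20,
) -> int:
    """Alternate work and verification until the verifier passes.

    Returns the number of steps taken; raises if the step budget is exhausted,
    so an unverifiable goal stops instead of running forever.
    """
    for n in range(1, max_steps + 1):
        step(n)
        if verify():
            return n
    raise RuntimeError(f"goal not verified after {max_steps} steps")
```

For the Python-to-Rust migration described above, the verifier could plausibly be `command_verifier(["cargo", "test"])`, assuming the new implementation's unit tests run that way.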

The side panel

The side panel keeps the work beside the conversation that produced it. Instead of exporting an artifact and switching contexts, the user can review it in place. The output might be code, but it might also be a deck, a PDF, a browser page, a table, or another artifact created along the way.

It supports four jobs especially well:

  1. Inspect artifacts

  2. Annotate what needs to change

  3. Operate web surfaces

  4. Review changes

The side panel lets users review Markdown, spreadsheets, data tables, documents, and slides in place. They can inspect, mark up, and revise artifacts without breaking the loop.

https://developers.openai.com/codex/app/browser

The deck or PDF can stay open beside the thread that produced it, ready for direct review and repair.

The in-app browser lets Codex inspect a rendered page, control it, and respond to annotations directly on the surface under review. Comments on a page or artifact stay inside the working loop instead of becoming a separate handoff.

The web becomes both output and control surface. Codex can build an artifact, open it in the side panel, inspect it, debug it, and keep refining the same object in place.

https://developers.openai.com/codex/app/chrome-extension

These surfaces work especially well:

  • index.html for lightweight static artifacts

  • Storybook for UI review

  • Remotion Studio for programmatic animation

  • browser-based slide decks for presentations

  • data apps for analysis workflows

A single index.html file can become a durable interactive artifact with no server required. Thread automations can also refresh static artifacts over time so a thread has something new waiting when the user returns.
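As a rough illustration of a serverless artifact, the sketch below writes one self-contained page with its data and script inline, so it opens directly from disk. The function names and the page layout are assumptions made for this example; nothing here is prescribed by Codex.

```python
import html
import json
from pathlib import Path
from typing import Mapping


def render_index_html(title: str, data: Mapping[str, float]) -> str:
    """Return a single HTML page with markup, data, and script all inline."""
    # Escape "</" so the embedded JSON cannot close the script tag early.
    payload = json.dumps(dict(data)).replace("</", "<\\/")
    safe_title = html.escape(title)
    return f"""<!doctype html>
<meta charset="utf-8">
<title>{safe_title}</title>
<h1>{safe_title}</h1>
<ul id="rows"></ul>
<script>
const data = {payload};
for (const [k, v] of Object.entries(data)) {{
  const li = document.createElement("li");
  li.textContent = k + ": " + v;
  document.getElementById("rows").append(li);
}}
</script>
"""


def write_artifact(directory: Path, title: str, data: Mapping[str, float]) -> Path:
    """Write (or refresh) index.html in `directory` and return its path."""
    out = directory / "index.html"
    out.write_text(render_index_html(title, data), encoding="utf-8")
    return out
```

Because the page has no external dependencies, re-running `write_artifact` on a schedule is enough to refresh it in place.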

Shared memory

Long-running threads become more useful when they share memory outside any one conversation.

Shared memory: Durable context stored outside a single thread so future work can resume from something explicit and reviewable.

One durable pattern is to anchor persistent threads in an Obsidian vault. In practice, that means a folder of plain files that stays straightforward to inspect, edit, move, and keep for a long time. Teams can store that folder in cloud storage, Git, Dropbox, Google Drive, or another sync layer that fits their workflow.

A vault might look like this:

vault/
├── TODO.md
├── people/
├── projects/
├── agent/
└── notes/

At the top level, AGENTS.md can define how Codex should update that workspace as it learns more about people, projects, decisions, and open loops.

Don’t copy one exact vault structure. Teach the agent where durable context should live, what context to preserve, and when not to create churn.

A practical AGENTS.md might say:

  • Treat ~/vault as durable work memory.
  • Prefer canonical notes over note sprawl.
  • Route TODOs, people, projects, daily summaries, and scratch notes explicitly.
  • Preserve decisions, blockers, owners, dates, and useful links.
  • If nothing meaningful changed, do not churn the vault.
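A vault like this can be bootstrapped with a few lines of script. The sketch below is one possible starting point, not a required layout: `scaffold_vault` is a hypothetical helper, the folder names are taken from the example tree above, and the AGENTS.md text reuses the rules just listed.

```python
from pathlib import Path
from typing import List

AGENTS_RULES = [
    "Treat ~/vault as durable work memory.",
    "Prefer canonical notes over note sprawl.",
    "Route TODOs, people, projects, daily summaries, and scratch notes explicitly.",
    "Preserve decisions, blockers, owners, dates, and useful links.",
    "If nothing meaningful changed, do not churn the vault.",
]


def scaffold_vault(root: Path) -> List[Path]:
    """Create the example layout under `root`, never overwriting what exists.

    Returns only the paths created on this call, so a second run is a no-op.
    """
    created: List[Path] = []
    root.mkdir(parents=True, exist_ok=True)
    for name in ("people", "projects", "agent", "notes"):
        folder = root / name
        if not folder.exists():
            folder.mkdir()
            created.append(folder)
    files = {
        "TODO.md": "# TODO\n",
        "AGENTS.md": "".join(f"- {rule}\n" for rule in AGENTS_RULES),
    }
    for name, text in files.items():
        path = root / name
        if not path.exists():
            path.write_text(text, encoding="utf-8")
            created.append(path)
    return created
```

Skipping anything that already exists follows the same principle as the last rule: if nothing meaningful changed, the vault is left alone.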

Repositories hold code. The vault holds rolling context: the people involved, what changed, what’s blocked, what needs follow-up, and what would otherwise disappear between sessions.

Important context shouldn’t live only inside a conversation transcript. Write it down somewhere the next thread can pick back up.

Codex also has first-party memory features in Settings > Personalization > Memories. They provide a local recall layer for preferences, recurring workflows, and known pitfalls. They complement explicit written context rather than replacing it. Chronicle pushes in the same direction by helping Codex build memory from recent screen context.

From code outward

Codex still starts from code. But more of the work around code is now reachable through the same system: MCP servers, browser surfaces, desktop controls, thread automations, and reviewable artifacts.

That changes the control model. Steering interrupts the work in progress. Queuing lines up the next task. Thread automations keep a thread active when the user steps away. Goals add a concrete finish line that Codex can keep working toward.

Codex can now carry a workflow from instruction to execution to artifact review, even when the work leaves the repo.
