
Single-agent AI coding has hit a ceiling; multi-agent workflows look more like a real engineering solution

This article's core judgment is right: the real bottleneck in complex AI coding is not prompting technique but how tasks are organized. But the author stretches "multi-agent works better" into "the single-agent era is over," which clearly overstates the conclusion.
2026-04-17

Key takeaways

  • The real problem is the workflow, not just the model. The article's most defensible claim is that many AI coding failures are not purely about model capability: casting one agent as requirements analyst, coder, tester, fixer, and documenter all at once is an organizational choice that gets dragged down by context bloat and error propagation.
  • The value of multi-agent lies in verifiable division of labor. The "head chef / line cook / verifier" structure essentially replicates mature engineering process: decompose the task, execute, then verify. That is more reliable than letting one agent run wild in a long thread. It is not a new idea, but it works.
  • Parallelism suits loosely coupled tasks, not all engineering tasks. Design variants, page generation, codebase exploration, and test backfill parallelize naturally, but tightly coupled backend logic, database migrations, and cross-module refactors do not get simpler just because there are more agents. The author clearly reports only the good news here.
  • "A bigger context" and "a stronger capability" are not the same thing. Summing sub-agent windows into "effective context from 200K to 25M+" sells well but is technically loose: summary handoffs lose information, and the subtasks never share a truly unified understanding.
  • The article is both an experience report and ecosystem marketing. Its five-pattern taxonomy has practical value, but the repeated mentions of Codex, Codex Spark, Cerebras, MCP, and Factory.ai make clear this is not neutral research; it carries product-education and market-funnel intent.

What this means for us

  • For ATou: it reinforces a practical management judgment: stop chasing "one super-agent end to end." The next step is to make "contract first, execute, then verify" the default AI workflow, especially for product iteration, prototyping, and content production.
  • For Neta: if Neta is building agent, automation, or workflow tools, the real opportunity is not "a stronger model wrapper" but task decomposition, state management, verification loops, and failure recovery. A concrete next step is to design the product skeleton directly around orchestrator / executor / verifier roles.
  • For Uota: the article offers a clear stance: AI's value is not replacing judgment but expanding the candidate space. In design, branding, and overseas-growth work, "parallel generation plus human curation" fits better than trusting AI's built-in taste.
  • For all three: the thing worth absorbing is not the label "multi-agent" but the discipline that unverified output must not flow downstream. Whatever comes next (code, research, or content), add explicit independent review and testing steps.

Discussion prompts

1. Does multi-agent actually raise throughput, or does it just move complexity from one thread into the orchestration layer?
2. For which task types is a single agent actually cheaper and more stable than multiple agents?
3. If "builder-verifier separation" became the default discipline, which of today's team processes would have to be rewritten?

Created by @MilksandMatcha and @0xSero

I pay my upfront subscription ($200/month), write what I hope is the right prompt (prompt AND context engineer), and wait. 35 minutes later, the agent is still "synthesizing," "perusing," "effecting," and "germinating" (who came up with these?).

By the end, I have files of bad code, a bloated context window, and I'm counting the remaining tokens on my left hand.

Okay, I grab an apple, compact, type some heavy-handed verbal abuse, re-explain everything from scratch, and pray the next attempt gets further than the last one… only to be disappointed by the same result.

By now, the spark and joy of AI coding are long dead.

Stop being a one-shot Sloperator

This is the single-agent ceiling. Every developer building with AI agents hits it the moment their project graduates from a 3D HTML snake game to anything more practical. This happens for two reasons:

  1. We expect too much from a single agent.

  2. We don't break problems into simple enough, verifiable tasks.

And while this is when most people will sell you (a) a useless course on prompt engineering, (b) another SaaS tool that manages your context, or (c) ask why you haven't tried the new model that came out seconds ago, we won't be doing that today.

Instead, we're going to walk you through what actually works: running a proper back of house. Multi-agent workflows.

Welcome to the back of house

There are a few reasons multi-agent workflows have become much more practical in recent weeks: underlying models have gotten better, and popular AI coding agents have made multi-agent orchestration easier to set up. In the last quarter, OpenAI rolled out deeper orchestration in Codex workflows, while Anthropic continued expanding Claude Code and the MCP ecosystem.

The biggest unlock, though, is speed. One of OpenAI's latest models, Codex Spark (powered by @cerebras), runs at roughly 1,200 tokens/second, which makes it practical to introduce parallel and verification steps that would otherwise be too time-costly to run.

For an example task using Codex and the Figma MCP to copy a website into Figma, the single-agent workflow averaged 36.5 minutes per run with an average of 12 manual interventions (and a 100% failure rate), while the multi-agent workflow leveraging Codex Spark finished in 5.2 minutes with 2 manual interventions and succeeded on the first try.

What is a multi-agent workflow?

Multi-agent workflows fix the single-agent ceiling at the architecture level. Instead of one cook doing everything, you have a head chef who takes the order, breaks it into scoped, verifiable tickets, and hands each one to a line cook to execute.

The Head Chef (Orchestrator):

The Head Chef's job is to take the order from the human, break it into a working list of tickets, then call line cooks to each complete one smaller, scoped job. The orchestrator is responsible for planning, coordination, and task decomposition. Its only tool is delegate_task, and it sees only high-level goals plus summaries of subagent outputs.

The Line Cooks (Subagents):

The Line Cook's job is to take the ticket (task assignment) from the Head Chef and get the job done, no questions asked. Each line cook gets its own fresh station (context window), does its work, returns the plate, and clocks out. Subagents can read, write, use MCPs, and call any other tools they need. They see only their assigned prompt and a fresh context window (no prior history).

The trick to keeping things orderly: the line cook doesn't get the full order history. It also doesn't get your 15,000-token master-plan document; it doesn't need to see all that. It gets the minimum viable context to cook one specific dish.

In AI agents like Codex, you create a line cook by literally telling your agent to "use subagents." The new instance gets a prompt, a set of files it can access, and any context it needs.
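
The split above can be sketched in a few lines. Everything here (run_subagent, delegate_task, the ticket shape) is an illustrative assumption, not Codex's actual API; the point is only the information flow: the orchestrator sees summaries, never raw work.

```python
# Hypothetical sketch of the head-chef / line-cook split. run_subagent is a
# stand-in for whatever your framework does when you say "use subagents".

def run_subagent(prompt: str, files: list[str]) -> dict:
    """A line cook: fresh context, one scoped job, returns a plate."""
    result = f"completed: {prompt}"            # stand-in for real agent work
    return {"summary": result[:200], "files_touched": files}

def delegate_task(prompt: str, files: list[str]) -> str:
    """The orchestrator's only tool: spawn a line cook, keep the summary."""
    return run_subagent(prompt, files)["summary"]

def orchestrate(order: str, tickets: list[dict]) -> list[str]:
    """The head chef: sees the order and summaries, never raw files."""
    summaries = []
    for ticket in tickets:
        summaries.append(delegate_task(ticket["prompt"], ticket["files"]))
    return summaries

tickets = [
    {"prompt": "implement the login form", "files": ["login.tsx"]},
    {"prompt": "write tests for the login form", "files": ["login.test.tsx"]},
]
print(orchestrate("build auth UI", tickets))
```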

Three immediate wins from running a back of house

There are three clear wins you get from orchestrators and subagents, instead of trying to one-shot whatever you're building or sticking to a single frontier, expensive model.

1. Tokens: your effective context window goes from ~200K to 25M+

Here's how the human, orchestrator, and sub-agents interact:

  • The human talks exclusively to the orchestrator.

  • The orchestrator is stripped of all tools other than delegate_task.

  • If the orchestrator wants to take an action, it spawns a sub-agent via delegate_task.

  • Each sub-agent has its own fresh context window, starting only with a prompt.

  • Sub-agents can read, write, use MCPs, and call any other tools.

  • Sub-agents return a summary of their work to the Head Chef.

This means the orchestrator never has to read files, write files, or see tool-call results directly, effectively extending its context window across as many sub-agents as it can spawn. You can work all day without losing context, compacting, or starting over.

https://huggingface.co/moonshotai/Kimi-K2.5
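
To see where a headline number like 25M+ might come from, here is a back-of-envelope sketch. The assumptions (a 200K-token orchestrator window, ~1.5K tokens kept per subagent summary, 50K reserved for goals and scripting) are ours, not the article's, and summaries are lossy, so treat this as order-of-magnitude only:

```python
# Rough arithmetic behind the "effective context" claim, under the
# assumptions stated above (not measured values from the article).
ORCH_WINDOW = 200_000     # orchestrator's own context window
SUMMARY_TOKENS = 1_500    # assumed cost of one subagent's returned summary
OVERHEAD = 50_000         # assumed tokens reserved for goals + scripting

# How many subagents fit before the orchestrator's window fills with summaries,
# and how much total subagent context that represents.
max_subagents = (ORCH_WINDOW - OVERHEAD) // SUMMARY_TOKENS
effective_context = max_subagents * 200_000
print(max_subagents, effective_context)
```

Under these assumptions you get roughly 100 subagents and ~20M tokens of total subagent context, which is the right ballpark for the article's figure, though "effective" is doing a lot of work given how lossy summaries are.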

2. Control: you can enforce sequential workflows at each turn of the agentic loop

Instead of one agent doing the exploration, cooking, tasting, and plating, each step becomes a precise, sequential ticket. This is also a great place to use different models for different tasks. With significantly faster models like Codex Spark (~1,200 tokens/sec), we can add validation and QA steps that would normally be too time-costly.

The orchestrator follows a script, spawning one sub-agent per phase:

  1. Sub-agent A breaks the order into a "contract" with subtasks and acceptance criteria.

  2. Sub-agent B explores the next subtask.

  3. Sub-agent C tests the code generated in the prior subtask. If the tests pass the validation criteria, move on; otherwise respawn the coding line cook to fix the identified issues.

  4. Sub-agent D documents the subtask and updates the scope checklist.

  5. If any subtasks remain, continue from step 2. Otherwise, service is done.

In internal trials, this sequential loop reduced manual interventions by 84.3% compared to single-agent runs on the same brief.
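
The scripted loop above can be sketched as a toy simulation. The subagent calls are stand-ins (no real agents are spawned), and a flaky-first-try flag makes the respawn path in step 3 actually execute:

```python
# Toy version of the A/B/C/D service loop. "Testing" is simulated so that
# the first attempt at each subtask fails and triggers a fix pass.

def sequential_service(subtasks: list[str]) -> list[str]:
    log, attempts = [], {}
    # Sub-agent A: break the order into a contract with criteria.
    contract = [{"task": t, "criteria": f"{t} verified"} for t in subtasks]
    queue = list(contract)
    while queue:                                  # step 5: loop until done
        item = queue.pop(0)
        log.append(f"explore:{item['task']}")     # sub-agent B
        attempts[item["task"]] = attempts.get(item["task"], 0) + 1
        passed = attempts[item["task"]] >= 2      # simulated first-try failure
        if not passed:                            # sub-agent C: respawn coder
            log.append(f"fix:{item['task']}")
            queue.insert(0, item)
            continue
        log.append(f"document:{item['task']}")    # sub-agent D
    return log

print(sequential_service(["parse config", "render page"]))
```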

3. Speed: you can run well-defined tasks in parallel

If your task permits it, you can spawn multiple sub-agents in parallel. This works well for:

  1. Generating logos, images, mascots, assets, mockups, designs, or tests

  2. Exploring a massive codebase orders of magnitude faster

  3. Building multiple pages quickly, where each subagent works on a separate part of the codebase and doesn't overwrite the others

Running five parallel mascot generations took roughly one minute versus five minutes sequentially: about a 5x speedup on taste-driven exploration tasks.
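
A minimal sketch of the fan-out, using plain Python threads in place of real subagent calls (generate_mascot is a hypothetical stand-in that just sleeps):

```python
# Parallel fan-out: N scoped, independent jobs run side by side, so total
# wall-clock time is close to one job's time instead of N jobs' time.
import time
from concurrent.futures import ThreadPoolExecutor

def generate_mascot(variant: int) -> str:
    time.sleep(0.05)                      # pretend this is a slow agent call
    return f"mascot-{variant}.svg"

start = time.monotonic()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(generate_mascot, range(5)))
elapsed = time.monotonic() - start
print(results, round(elapsed, 2))
```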

5 Patterns That Actually Work

Over the past few weeks, we've tried dozens of workflows and setups across different AI agents. Below are five patterns we've found success with when building multi-agent workflows.

If you're new to this, start at the top and work your way down :)

Pattern 1: The Prep Line

Before service, a professional kitchen doesn't have one cook slowly dicing every single vegetable. It has a row of prep cooks each working independently at the same station: one dicing onions, one breaking down shallots, one portioning proteins. At the end, the sous chef inspects the results and picks what makes the cut.

This is the right shape for tasks like design exploration, code variations, or test generation. Have your line cooks each generate many options, then manually pick the best ones. Every line cook works on the same brief independently, and you (yes, you still have one small task) curate the results. This is the easiest way to get your feet wet with multi-agent workflows, because every task is fully independent: no file conflicts, dependency graphs, or merge logic.

As an example: we wanted to create 50 variations of a mascot for Parchi, so we dispatched 5 Codex Spark sub-agents with 10 variations each. Then we cherry-picked the ones we liked and tossed the rest.

https://factory.ai/news/missions

The best part of this pattern: it's also a great way to inject taste into your AI workflow. Models today have very little taste.

Chances are, you might also lack taste. For most developers, the brute-force solution is sourcing examples of design or graphic patterns, or giving the AI coding agent so many style guidelines that you might as well have written the HTML/CSS yourself. Instead of that tedious manual process, have your Head Chef call in a brigade of line cooks, then cherry-pick your favorite.

https://github.com/0xsero/parchi
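
The prep line in miniature: five stations get the same brief, produce options independently, and a filtering stub stands in for the human pick. All names and the 5 x 10 split are illustrative:

```python
# Fan out the same brief to independent stations, then curate the results.
import random

def prep_cook(brief: str, n: int, seed: int) -> list[str]:
    rng = random.Random(seed)        # independent station, no shared state
    styles = ["flat", "3d", "pixel"]
    return [f"{brief} v{seed}.{i} ({rng.choice(styles)})" for i in range(n)]

brief = "Parchi mascot"
options = [opt for seed in range(5) for opt in prep_cook(brief, 10, seed)]
keepers = [o for o in options if "pixel" in o]   # stand-in for human curation
print(len(options), len(keepers))
```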

Pattern 2: The Dinner Rush

During a Friday night dinner rush, every station in the kitchen (sauté, grill, garde manger, pastry) is firing simultaneously. Each line cook owns a different job, but they're all plating at once, all contributing to the same ticket.

This is the concept behind "swarms," pioneered by MoonshotAI when they trained Kimi-K2.5. In a swarm, each line cook is responsible for a single, scoped, distinct task. The line cooks run simultaneously, all contributing to one shared goal.

Good fits: building multiple independent components of an app, writing tests for different modules, or porting pages from one framework to another.

The setup requires a few things to go right:

  • You need a deeply specific scope of work.

  • That scope needs to break into individual, verifiable steps.

  • Each task must have clearly documented dependencies.

  • Each task should only require a predefined set of files to change, so line cooks don't overwrite each other.

Important: the key requirement is that tasks don't share files. The moment two line cooks need to edit the same file, you need a different pattern.
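
The no-shared-files rule is mechanically checkable before you fire the swarm. The task-to-files mapping below is an assumed shape, not a real Codex schema:

```python
# Detect file overlaps between swarm tasks before dispatching any agents.

def find_file_conflicts(tasks: dict[str, list[str]]) -> set[str]:
    """Return the set of files claimed by more than one task."""
    seen, conflicts = {}, set()
    for task, files in tasks.items():
        for f in files:
            if f in seen and seen[f] != task:
                conflicts.add(f)
            seen[f] = task
    return conflicts

tasks = {
    "build-navbar": ["navbar.tsx", "navbar.css"],
    "build-footer": ["footer.tsx"],
    "theme-pass":   ["navbar.css", "theme.css"],  # overlaps with build-navbar
}
print(find_file_conflicts(tasks))
```

Any non-empty result means those tasks need a different pattern (or a re-scope) before they can run as a swarm.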

Pattern 3: Courses in Sequence

A tasting menu doesn't come out all at once. The amuse-bouche goes out before anyone fires the appetizer, appetizers clear before entrées start plating, and dessert waits its turn. But within a single course, every station is cooking in parallel.

This is the idea behind phased parallel execution. You break your project into courses (or "waves") where each course strictly depends on the one before it. Within each course, any number of tasks and line cooks can run in parallel. This is perfect for bigger projects like full app rebuilds or large refactors.

To make this pattern work, you need a dependency tree, strict ordering, and refined prompts. It's worth referencing https://factory.ai/news/missions to see how they've handled this.

Here's a real example from rebuilding an entire UI. Course 1 explored and mapped everything, while Course 2 built on top of that shared understanding. Neither course's line cooks needed the full conversation history; they got exactly the context brief relevant to their ticket.

As the human, you clearly define what's needed. The course structure gives you both parallelism and sequencing, which is why it scales to real projects better than pure swarms.
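
Courses are just dependency waves, computable from a dependency map (Python's standard-library graphlib offers similar machinery; this standalone sketch makes the wave grouping explicit, with hypothetical task names):

```python
# Group tasks into "courses": a course only starts once everything it
# depends on is done; tasks within a course can run in parallel.

def plan_courses(deps: dict[str, set[str]]) -> list[set[str]]:
    remaining, done, courses = dict(deps), set(), []
    while remaining:
        ready = {t for t, d in remaining.items() if d <= done}
        if not ready:
            raise ValueError("dependency cycle")
        courses.append(ready)
        done |= ready
        for t in ready:
            del remaining[t]
    return courses

deps = {
    "map-ui": set(), "audit-styles": set(),           # course 1: exploration
    "rebuild-nav": {"map-ui"},
    "rebuild-pages": {"map-ui", "audit-styles"},      # course 2: build
    "final-qa": {"rebuild-nav", "rebuild-pages"},     # course 3: verify
}
print(plan_courses(deps))
```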

Pattern 4: The Prep-to-Plate Assembly

Your line cooks don't each build a dish from scratch. One station trims and seasons the protein, the next sears it, the next finishes it in the oven, and the expediter plates and garnishes. Each station has one clear job, hands off cleanly, and nothing drags sauce from the previous ticket into the next.

In this pattern, line cooks operate sequentially down the pass. Each cook does one smaller task, validates it, then hands the workpiece to the next station.

This pattern is perfect for long-horizon tasks with clear, observable, verifiable outcomes, research-heavy tasks, or multi-step pipelines. The core principle: do not keep dragging unrelated history through one giant thread. Each phase gets enough context to do its part, hands off cleanly, and moves on. State lives in files and task queues, not in conversation history.

Here's an example where the goal was to run a custom model on specific hardware. Pay special attention to how each line cook had a clear, bounded job.
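
A sketch of the pass, with the workpiece living in a file rather than in any station's conversation. The station names and transforms are illustrative stand-ins for agent calls, loosely modeled on the run-a-custom-model example:

```python
# Each station reads the workpiece from disk, does one bounded job,
# validates it, and writes it back. State lives in the file.
import json, pathlib, tempfile

def station(path, name, transform, check):
    state = json.loads(path.read_text())      # pick up the workpiece
    state = transform(state)
    assert check(state), f"{name} failed validation"
    state["log"].append(name)                 # clean, auditable handoff
    path.write_text(json.dumps(state))

work = pathlib.Path(tempfile.mkstemp(suffix=".json")[1])
work.write_text(json.dumps({"model": "custom-net", "log": []}))

station(work, "convert-weights", lambda s: {**s, "format": "gguf"},
        lambda s: s["format"] == "gguf")
station(work, "quantize", lambda s: {**s, "bits": 4},
        lambda s: s["bits"] in (4, 8))
station(work, "smoke-test", lambda s: {**s, "ran": True},
        lambda s: s["ran"])

final = json.loads(work.read_text())
print(final["log"])
```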

Pattern 5: Here comes Gordon Ramsay

In a professional kitchen, the chef makes the dish, but it does not go straight to the customer (we wish). It passes through inspection first. One person checks whether it was cooked properly, while another checks whether it matches the order and is plated correctly.

This final pattern isn't a project architecture so much as a discipline: you separate the line cooks that write code from the line cooks that check code. One builder cooks, while two verifiers (a code reviewer and a visual/functional tester) run in parallel to validate the output. If either verifier flags an issue, the builder gets another pass. Especially with near-instant coding models like Codex Spark available, adding verification is practically free.

In this workflow, only one builder writes at a time, but multiple verifiers can run simultaneously. This is the single most important rule for avoiding merge conflicts and context drift, and it applies inside every other pattern on this list.

When to use it: always. Whatever pattern you're running, layer this on top. Separating build from verify catches failures before they cascade into downstream tasks. Use browser automation, screenshots, and deterministic tests for the verify step. The goal: no line cook's output makes it onto the pass without evidence that it works.
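
The discipline reduces to a small loop: build, check with independent verifiers, rebuild on failure. The builder and verifiers below are toy stand-ins, and they run sequentially here for brevity where the pattern runs verifiers in parallel:

```python
# Builder/verifier separation: nothing ships without passing both checks.

def builder(spec: str, attempt: int) -> str:
    code = f"render('{spec}')"
    return code if attempt >= 2 else code + "  # TODO"  # sloppy first pass

def code_reviewer(code: str) -> bool:
    return "TODO" not in code             # stand-in for a review agent

def visual_tester(code: str) -> bool:
    return code.startswith("render(")     # stand-in for a functional test

def build_with_verification(spec: str, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        candidate = builder(spec, attempt)
        if code_reviewer(candidate) and visual_tester(candidate):
            return candidate, attempt     # evidence first, then the pass
    raise RuntimeError("verification never passed")

code, attempts = build_with_verification("pricing page")
print(code, attempts)
```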

Where this is heading

If you take one thing from this reflection, let it be this: the era of the solo-agent one-shot is over. We're still early, and these patterns will keep evolving as models get faster, context windows get longer, and tooling matures.

Take off the apron and put on the chef's coat. You're running the kitchen now, and your brigade is waiting. You can read more about how to get started with Codex and Codex Spark here.

Thanks to Zhenwei Gao and James Wang for their input, and to @brickywhat, who first introduced us to the term "sloperator." Illustrations by @halleychangg.

Created by @MilksandMatcha and @0xSero

由 @MilksandMatcha 和 @0xSero 创作

I pay my upfront subscription ($200/month), write what I hope is the right prompt (prompt AND context engineer), and wait. 35 minutes later, the agent is still "synthesizing," "perusing," "effecting," and "germinating" (who came up with these).

我先付了订阅费(每月 200 美元),写下一段自认为正确的提示词(既要做提示词工程师,又要做上下文工程师),然后等待。35 分钟过去,智能体还在“综合”、“细读”、“实施”、“酝酿”(这些词到底是谁想出来的)。

By the end, I have files of bad code, a bloated context window, and I'm counting the remaining tokens on my left hand.

到最后,我手里只剩下一堆糟糕代码、一个臃肿的上下文窗口,还要掰着左手数剩下的 token。

Okay, I grab an apple, compact, type some heavy handed verbal abuse, re-explain everything from scratch, and pray the next attempt gets further than the last one… only to be disappointed by the same result.

好吧,我拿个苹果,压缩上下文,输入几句重口味的语言暴力,从头再解释一遍所有内容,然后祈祷下一次尝试能比上一次走得更远……结果还是同样令人失望。

By now, the spark and joys of AI coding are long dead.

到这时候,AI 编程最初的火花和快乐早就死光了。

Stop being a one-shot Sloperator

别再做一次性 Sloperator

This is the single-agent ceiling. Every developer building with AI agents hits it the moment their project graduates from a 3D HTML snake game to anything more practical. This happens for two reasons:

这就是单智能体的天花板。每个用 AI 智能体做开发的人,都会在项目从 3D HTML 贪吃蛇游戏升级到任何更实用的东西时撞上它。原因有两个:

  1. we expect too much from a single agent
  1. 我们对单个智能体期待太多
  1. we do not break problems into simple enough, verifiable tasks
  1. 我们没有把问题拆成足够简单、可以验证的任务

And while this is when most people will sell you (a) a useless course on prompt engineering, (b) another SaaS tool that manages your context, (c) or ask why you haven't tried out the new model that came out seconds ago, we won't be doing that today.

而到了这一步,大多数人会开始向你推销:(a)一门没用的提示词工程课程,(b)又一个帮你管理上下文的 SaaS 工具,(c)或者问你为什么还没试试几秒钟前刚发布的新模型。但今天我们不做这些。

Instead, we're going to walk you through what actually works: running a proper back of house. Multi-agent workflows.

相反,我们会带你看真正有效的做法:把后厨搭起来。也就是多智能体工作流。

Welcome to the back of house

欢迎来到后厨

There are a few reasons why multi-agent workflows have become much more practical in recent weeks: underlying models have gotten better, and popular AI coding agents have made multi-agent orchestration easier to set up. In the last quarter, OpenAI rolled out deeper orchestration in Codex workflows, while Anthropic continued expanding Claude Code and the MCP ecosystem.

最近几周,多智能体工作流变得实用得多,原因有几个:底层模型更强了,主流 AI 编程智能体也让多智能体编排更容易搭建。上个季度,OpenAI 在 Codex 工作流中推出了更深层的编排能力,Anthropic 也继续扩展 Claude Code 和 MCP 生态。

The biggest unlock, though, is speed. One of OpenAI's latest models, Codex Spark (powered by @cerebras) runs at roughly 1,200 tokens/second, which makes it practical to introduce parallel and verification steps that would otherwise be too time-costly to run.

但最大的突破是速度。OpenAI 最新模型之一 Codex Spark(由 @cerebras 驱动)的速度大约是每秒 1,200 token,这让并行步骤和验证步骤变得可行。否则,这些步骤通常太耗时间,不值得运行。

For an example task using Codex and the Figma MCP to copy a website into Figma, the single agent workflow had a 36.5 min/run average with an average of 12 interventions (and 100% failure rate) while the multi-agent workflow leveraging CodeX Spark had a 5.2 minute run, 2 manual interventions, and success on the first try.

以一个使用 Codex 和 Figma MCP 把网站复制进 Figma 的任务为例,单智能体工作流平均每次运行 36.5 分钟,平均需要 12 次人工介入,而且失败率是 100%;而利用 CodeX Spark 的多智能体工作流只用了 5.2 分钟,人工介入 2 次,并且第一次就成功了。

What is a multi-agent workflow?

什么是多智能体工作流?

Multi-agent workflows fix the single-agent ceiling at the architecture level. Instead of one cook doing everything, you have a head chef who takes the order, breaks it into scoped, verifiable tickets, and hands each one to a line cook to execute.

多智能体工作流从架构层面解决单智能体天花板。不是让一个厨师做所有事,而是有一位主厨接单,把任务拆成有边界、可验证的工单,再把每一张工单交给一名线厨执行。

***The Head Chef (Orchestrator): ***

主厨(编排者):

The Head Chef's job is to take the order from the human, break it into a working list of tickets, then call line cooks to each go out and complete one smaller, scoped job. The orchestrator is responsible for planning, coordination, and task decomposition. Its only tool is delegate_task, and it only sees high-level goals plus summaries of subagent outputs.

主厨的工作,是从人类那里接单,把它拆成一份可执行的工单列表,然后叫来线厨,让每个人去完成一个更小、更明确的任务。编排者负责规划、协调和任务拆解。它唯一的工具是 delegate_task,并且只能看到高层目标和子智能体输出的摘要。

***The Line Cooks (Subagents): ***

线厨(子智能体):

The Line Cook's job is to take the ticket (task assignment) given by the Head Chef and get the job done, no questions asked. Each line cook gets its own fresh station (context window), does its work, returns the plate, and clocks out. Subagents can read, write, use MCPs, and any other tools needed. They only see their assigned prompt and a fresh context window (no prior history).

线厨的工作,是接过主厨给的工单(任务分配),然后把活干完,不多问。每个线厨都有自己全新的工位(上下文窗口),完成自己的工作,交出成品,然后下班。子智能体可以读取、写入、使用 MCP,以及调用任何需要的工具。它们只会看到自己被分配到的提示词和一个新的上下文窗口(没有之前的历史)。

The trick to keeping things orderly: the line cook doesn't get the full order history. It also doesn't get your 15,000-token master plan document, it doesn't need to see all that. It gets the minimum viable context to cook one specific dish.

让事情保持有序的诀窍是:线厨不会拿到完整的订单历史。它也不会拿到你那份 15,000 token 的总体规划文档,没必要看那些。它只拿到完成某一道具体菜所需的最小可行上下文。

In AI agents like Codex, you create a line cook by literally telling your agent to "use subagents." The new instance gets a prompt, a set of files it can access, and any context it needs.

在 Codex 这样的 AI 智能体里,你只需要直接告诉智能体“使用子智能体”,就能创建一个线厨。新的实例会拿到一段提示词、一组它可以访问的文件,以及它需要的上下文。

Three immediate wins from running a back of house

搭起后厨后立刻得到的三个收益

There are three clear wins you get from Orchestrators and Subagents, instead of trying to one-shot whatever you are building or sticking to a single, frontier, expensive model.

用编排者和子智能体,而不是试图一次性完成你正在构建的东西,或者死守一个单一、前沿、昂贵的模型,你会立刻得到三个明确收益。

1. Tokens: your effective context window goes from ~200K to 25M+

1. Token:你的有效上下文窗口会从约 200K 变成 2500 万以上

Here's how the human, orchestrator, and sub-agents interact:

人类、编排者和子智能体之间的交互方式如下:

  • The human talks exclusively to the orchestrator.
  • 人类只和编排者对话。
  • The orchestrator is stripped of all tools other than delegate_task.
  • 编排者除了 delegate_task 以外,没有任何工具。
  • If the orchestrator wants to take an action, it spawns a sub-agent via delegate_task.
  • 如果编排者想采取行动,就通过 delegate_task 生成一个子智能体。
  • Each sub-agent has its own fresh context window, starting only with a prompt.
  • 每个子智能体都有自己全新的上下文窗口,一开始只有一段提示词。
  • Sub-agents can read, write, use MCPs, and any other tools.
  • 子智能体可以读取、写入、使用 MCP,以及调用任何其他工具。
  • Sub-agents return a summary of their work back to the Head Chef.
  • 子智能体把自己的工作摘要返回给主厨。

This means the orchestrator never has to read files, write files, or see tool-call results directly, effectively extending its context window to as many sub-agents as it can spawn. You can work all day without losing context, compacting, or starting over.

这意味着,编排者永远不需要直接读取文件、写入文件,也不需要直接看到工具调用结果。它的上下文窗口实际上被扩展到了它能生成的所有子智能体之和。你可以一整天工作,而不需要丢失上下文、压缩上下文或从头开始。

2. Control: you can enforce sequential workflows at each turn of the agentic loop

2. 控制:你可以在智能体循环的每一轮强制执行顺序工作流

Instead of one agent doing the exploration, cooking, tasting, and plating, each step becomes a precise, sequential ticket. This is also a great place to use different models for different tasks. With significantly faster models like Codex Spark (~1,200 toks/sec), we can add validation and QA steps that would normally be too time-costly.

不是让一个智能体同时负责探索、烹饪、试味和装盘,而是把每一步变成一张精确、顺序执行的工单。这也很适合针对不同任务使用不同模型。借助 Codex Spark 这种速度显著更快的模型(约每秒 1,200 token),我们可以加入验证和 QA 步骤,而这些步骤通常会因为太耗时而被省掉。

The orchestrator follows a script, spawning one sub-agent per phase:

编排者按脚本执行,每个阶段生成一个子智能体:

  1. Sub-agent A breaks the order into a "contract" with subtasks and criteria.
  1. 子智能体 A 把订单拆成一份“契约”,里面包含子任务和验收标准。
  1. Sub-agent B explores the next subtask.
  1. 子智能体 B 探索下一个子任务。
  1. Sub-agent C tests the code generated in the prior subtask. If tests pass the validation criteria, move on. Otherwise respawn the coding line cook to fix identified issues.
  1. 子智能体 C 测试上一个子任务生成的代码。如果测试通过验证标准,就继续。否则重新生成负责编码的线厨,让它修复已发现的问题。
  1. Sub-agent D documents the subtask and updates the scope checklist.
  1. 子智能体 D 记录子任务,并更新范围检查清单。
  1. If any subtasks remain, continue from step 2. Otherwise, service is done.
  1. 如果还有子任务,就从第 2 步继续。否则,服务结束。

In internal trials, this sequential loop reduced manual interventions by 84.3% compared to single-agent runs on the same brief.

在内部试验中,和同一份任务说明下的单智能体运行相比,这个顺序循环将人工介入减少了 84.3%。

3. Speed: you can run well-defined tasks in parallel

3. 速度:你可以并行运行定义清楚的任务

If your task permits it, you can spawn multiple sub-agents in parallel. This works well for:

如果任务允许,你可以并行生成多个子智能体。这很适合:

  1. generating logos, images, mascots, assets, mockups, designs, or tests
  1. 生成 logo、图片、吉祥物、素材、样机、设计或测试
  1. exploring a massive codebase orders of magnitude faster
  1. 以数量级更快的速度探索大型代码库
  1. building multiple pages quickly, where each subagent works on separate parts of a codebase and doesn't overwrite each other.
  1. 快速构建多个页面,每个子智能体负责代码库中不同的部分,并且不会互相覆盖。

Running five parallel mascot generations took roughly one minute versus five minutes sequentially, about a 5x speedup on taste-driven exploration tasks.

五个并行吉祥物生成任务大约用了一分钟,而顺序执行需要五分钟。对于依赖审美探索的任务来说,这大约是 5 倍加速。

5 Patterns That Actually Work

5 个真正有效的模式

Over the past few weeks, we've tried dozens of workflows and setups across different AI agents. Below are five patterns we've found success with for building multi-agent workflows.

过去几周里,我们在不同 AI 智能体上尝试了几十种工作流和搭建方式。下面是我们在构建多智能体工作流时发现有效的五种模式。

If you're new to this, start at the top and work your way down :)

如果你刚开始接触,可以从最上面开始,一路往下试 :)

Pattern 1: The Prep Line

模式 1:备菜线

Before service, a professional kitchen doesn't have one cook slowly dicing every single vegetable. It has a row of prep cooks each working independently on the same station, one dicing onions, one breaking down shallots, one portioning proteins. At the end, the sous chef inspects and picks what makes the cut.

正式营业前,专业厨房不会让一个厨师慢慢切完每一颗蔬菜。它会有一排备菜厨,各自在同一个工位独立工作:一个切洋葱,一个处理青葱,一个分装蛋白质。最后,副厨检查成果,选出能用的部分。

This is the right shape for tasks like design exploration, code variations, or test generation. Have your line cooks each generate many options, then manually pick the best ones. Every line cook works on the same brief independently, and you (yes, you do have one small task) curate the results. This is the easiest way to get your feet wet with multi-agent workflows because every task is fully independent, with no file conflicts, dependency graphs, or merge logic.

这种形态适合设计探索、代码变体或测试生成。让你的线厨们各自生成很多选项,然后你手动挑出最好的。每个线厨都独立处理同一份任务说明,而你(是的,你还是有一个小任务)负责筛选结果。这是入门多智能体工作流最简单的方式,因为每个任务完全独立,没有文件冲突、依赖图或合并逻辑。

As an example: we wanted to create 50 variations of a mascot for Parchi, so we dispatched 5 Codex spark sub-agents with 10 variations each. Then we cherry-picked the ones we liked and tossed the rest.

举个例子:我们想为 Parchi 创建 50 个吉祥物变体,所以派出了 5 个 Codex Spark 子智能体,每个生成 10 个变体。然后我们挑出喜欢的,扔掉剩下的。

The best part of this pattern: it's also a great way to inject taste into your AI workflow. ***Models today have very little taste. ***

这个模式最好的地方在于:它也是把审美注入 AI 工作流的绝佳方式。今天的模型几乎没有审美。

Chances are, you might also lack taste. For most developers, the brute-force solution is sourcing examples of design or graphic patterns, or giving the AI coding agent enough style guidelines that you might as well have written the html/css yourself. Instead of that tedious manual process, have your Head Chef call a brigade of line cooks, then cherry-pick your favorite.

当然,你也可能缺审美。对大多数开发者来说,蛮力解法是收集设计或图形模式案例,或者给 AI 编程智能体塞足够多的风格指南,多到你还不如自己写 html/css。与其做这种乏味的手工流程,不如让主厨召来一队线厨,然后从中挑出你最喜欢的。

Pattern 2: The Dinner Rush

模式 2:晚餐高峰

During a Friday night dinner rush, every station in the kitchen, sauté, grill, garde manger, pastry, is firing simultaneously. Each line cook owns a different job, but they're all plating at once, all contributing to the same ticket.

周五晚餐高峰时,厨房里的每个工位,煎炒、烧烤、冷菜、甜点,都在同时开火。每个线厨负责不同的工作,但他们都在同时出盘,都在为同一张订单贡献成果。

This is the concept behind "swarms," pioneered by MoonshotAI when they trained Kimi-K2.5. With swarms, each line cook is responsible for a single, scoped, distinct task. These line cooks run simultaneously, all contributing to one shared goal.

这就是“群体智能”的概念,最早由 MoonshotAI 在训练 Kimi-K2.5 时开创。在群体智能中,每个线厨负责一个单一、有边界、相互区分的任务。这些线厨同时运行,共同服务于一个共享目标。

Good fits: building multiple independent components of an app, writing tests for different modules, or porting pages from one framework to another.

适合的场景包括:构建应用中的多个独立组件,为不同模块编写测试,或者把页面从一个框架迁移到另一个框架。

The setup requires a few things to go right:

这种设置需要满足几件事:

  • you need a deeply specific scope of work
  • 你需要一份极其具体的工作范围
  • that scope needs to break into individual, verifiable steps
  • 这个范围需要能拆成独立、可验证的步骤
  • each task must have clearly documented dependencies
  • 每个任务都必须有清楚记录的依赖关系
  • each task should only require a predefined set of files to change, so line cooks don't overwrite each other.
  • 每个任务应该只需要修改一组预先定义好的文件,这样线厨之间就不会互相覆盖。

Important: The key requirement is that tasks don't share files. The moment two line cooks need to edit the same file, you need a different pattern.

重要:核心要求是任务不能共享文件。一旦两个线厨需要编辑同一个文件,你就需要换一种模式。

Pattern 3: Courses in Sequence

模式 3:按道上菜

A tasting menu doesn't come out all at once. The amuse-bouche goes out before anyone fires the appetizer, appetizers clear before entrées start plating, and dessert waits its turn. But within a single course, every station is cooking in parallel.

品鉴菜单不会一次性全部端出来。开胃小点先上,然后才开始做前菜;前菜撤下后,主菜才开始装盘;甜点则等轮到它的时候再上。但在同一道菜内部,每个工位可以并行烹饪。

This is the idea behind phased parallel execution. You break your project into courses (or "waves") where each course strictly depends on the one before it. Within each course, any number of tasks and line cooks can run in parallel. This is perfect for bigger projects like full app rebuilds or large refactors.

这就是分阶段并行执行的思路。你把项目拆成一道道菜(或者“一波一波”的任务),每一道都严格依赖前一道。每一道内部,可以有任意数量的任务和线厨并行运行。这非常适合更大的项目,比如完整应用重建或大型重构。

To make this pattern work, you need a dependency tree, strict ordering, and refined prompts. It's worth referencing https://factory.ai/news/missions to see how they've handled this.

要让这个模式运转起来,你需要依赖树、严格顺序和打磨过的提示词。值得参考 https://factory.ai/news/missions,看看他们是怎么处理的。

Here's a real example from rebuilding an entire UI. Course 1 explored and mapped everything, while Course 2 built on top of that shared understanding. Neither course's line cooks needed the full conversation history. They got exactly the context brief relevant to their ticket.

下面是一个重建整套 UI 的真实例子。第 1 道菜负责探索并映射所有内容,第 2 道菜建立在这份共享理解之上。任何一道菜里的线厨,都不需要完整的对话历史。它们只拿到与自己工单相关的上下文简报。

As the human, you clearly define what is needed. The course structure gives you parallelism and sequencing, which is why it scales to real projects better than pure swarms.

作为人类,你要清楚定义需要什么。分道结构同时带来并行和顺序控制,这也是它比纯群体智能更能扩展到真实项目的原因。

Pattern 4: The Prep-to-Plate Assembly

模式 4:从备菜到装盘的流水线

Your line cooks don't each build a dish from scratch. One station trims and seasons the protein, the next sears it, the next finishes it in the oven, and the expediter plates and garnishes. Each station has one clear job, hands off cleanly, and nothing drags sauce from the previous ticket into the next.

你的线厨不会每个人都从零开始做一道菜。一个工位修整并调味蛋白质,下一个工位煎封,再下一个工位放进烤箱完成,最后由传菜口负责人装盘并点缀。每个工位都有一个明确任务,干净交接,不会把上一张订单的酱汁拖到下一张里。

In this pattern, line cooks operate sequentially down the pass. Each cook does one smaller task, validates it, then hands the workpiece to the next station.

在这个模式里,线厨沿着出菜口顺序作业。每个厨师完成一个更小的任务,验证它,然后把工作成果交给下一个工位。

This pattern is perfect for long-horizon tasks with clear, observable, and verifiable outcomes, research-heavy tasks, or multi-step pipelines. The core principle: do not keep dragging unrelated history through one giant thread. Each phase gets enough context to do its part, then hands off. State lives in files and task queues, not in conversation.

这个模式非常适合周期较长、结果清楚可观察且可验证的任务,研究量大的任务,或多步骤流水线。核心原则是:不要把无关历史一直拖进一个巨大的线程里。每个阶段拿到足够完成自己部分的上下文,然后交接。状态存在文件和任务队列里,而不是存在对话里。

Clean handoff between phases. State lives in files and task queues, not in conversation history. In one example where the goal was to run a custom model on specific hardware, each line cook had a clear, bounded job.

阶段之间要干净交接。状态存在文件和任务队列中,而不是存在对话历史里。在一个目标是让自定义模型运行在特定硬件上的例子中,每个线厨都有清楚、有边界的任务。

Here's an example where the goal was to run a custom model on specific hardware. Pay special attention to how each line cook had a clear, bounded job.

下面这个例子的目标,是在特定硬件上运行一个自定义模型。请特别注意,每个线厨的任务都是清楚且有边界的。

Pattern 5: Here comes Gordon Ramsay

模式 5:戈登·拉姆齐来了

In a professional kitchen, the chef makes the dish, but it does not go straight to the customer (we wish). Instead, it passes through inspection first. One person checks whether it was cooked properly while another checks whether it matches the order and is plated correctly.

在专业厨房里,厨师做好一道菜后,它不会直接送到顾客面前(我们 倒是希望如此)。它会先经过检查。一个人检查烹饪是否到位,另一个人检查它是否符合订单、装盘是否正确。

This final pattern isn't a project architecture so much as a discipline: you separate the line cooks that write code from the line cooks that check code. One builder cooks, while two verifiers (a code reviewer and a visual/functional tester) run in parallel to validate the output. If either verifier flags an issue, the builder gets another pass. Especially with the availability of near-instant coding models like Codex Spark, adding verification is practically free.

最后这个模式与其说是项目架构,不如说是一种纪律:把写代码的线厨和检查代码的线厨分开。一个构建者负责烹饪,两个验证者(代码审查员和视觉/功能测试员)并行运行,验证输出。如果任一验证者指出问题,构建者就再做一轮。尤其是在 Codex Spark 这种几乎瞬时响应的编程模型出现后,加入验证几乎是免费的。

In this workflow, only one builder writes at a time, but multiple verifiers can run simultaneously. This is the single most important rule for avoiding merge conflicts and context drift, and it applies inside every other pattern on this list.

在这个工作流里,同一时间只有一个构建者写代码,但多个验证者可以同时运行。这是避免合并冲突和上下文漂移最重要的一条规则,而且它适用于这份列表里的所有其他模式。

When to use it: Always. Whatever pattern you're running, layer this on top. Separating build from verify catches failures before they cascade into downstream tasks. Use browser automation, screenshots, and deterministic tests for the verify step. The goal is that no line cook's output makes it onto the pass without evidence that it works.

什么时候使用它:永远使用。无论你正在运行哪种模式,都把它叠加上去。把构建和验证分离,可以在失败向下游任务扩散之前抓住它。验证步骤可以使用浏览器自动化、截图和确定性测试。目标是:任何线厨的输出,在没有证据证明它能工作之前,都不能送上出菜口。

Where this is heading

接下来会走向哪里

If you take one thing from this reflection, let it be this: the era of the solo-agent one-shot is over. We're still early, and these patterns will keep evolving as models get faster, context windows get longer, and tooling matures.

如果你只从这篇反思中带走一件事,那就记住这个:单智能体一次性完成的时代已经结束了。我们仍然处在早期阶段,随着模型更快、上下文窗口更长、工具链更成熟,这些模式还会继续演化。

Take off the apron and put on the chef's coat. You're running the kitchen now, and your brigade is waiting. You can read more about how to get started with Codex and Codex Spark here.

脱下围裙,穿上主厨外套。现在你在经营厨房,你的队伍已经等着了。你可以在这里阅读更多关于如何开始使用 Codex 和 Codex Spark 的内容。

Thanks to input from Zhenwei Gao and James Wang, and @brickywhat who first introduced us to the term 'sloperator'. Illustrations by @halleychangg.

感谢 Zhenwei Gao 和 James Wang 的意见,也感谢 @brickywhat 最早向我们介绍了“sloperator”这个词。插图由 @halleychangg 绘制。

Created by @MilksandMatcha and @0xSero

I pay my upfront subscription ($200/month), write what I hope is the right prompt (prompt AND context engineer), and wait. 35 minutes later, the agent is still "synthesizing," "perusing," "effecting," and "germinating" (who came up with these).

By the end, I have files of bad code, a bloated context window, and I'm counting the remaining tokens on my left hand.

Okay, I grab an apple, compact, type some heavy handed verbal abuse, re-explain everything from scratch, and pray the next attempt gets further than the last one… only to be disappointed by the same result.

By now, the spark and joys of AI coding are long dead.

Stop being a one-shot Sloperator

This is the single-agent ceiling. Every developer building with AI agents hits it the moment their project graduates from a 3D HTML snake game to anything more practical. This happens for two reasons:

  1. we expect too much from a single agent

  2. we do not break problems into simple enough, verifiable tasks

And while this is when most people will sell you (a) a useless course on prompt engineering, (b) another SaaS tool that manages your context, (c) or ask why you haven't tried out the new model that came out seconds ago, we won't be doing that today.

Instead, we're going to walk you through what actually works: running a proper back of house. Multi-agent workflows.

Welcome to the back of house

There are a few reasons why multi-agent workflows have become much more practical in recent weeks: underlying models have gotten better, and popular AI coding agents have made multi-agent orchestration easier to set up. In the last quarter, OpenAI rolled out deeper orchestration in Codex workflows, while Anthropic continued expanding Claude Code and the MCP ecosystem.

The biggest unlock, though, is speed. One of OpenAI's latest models, Codex Spark (powered by @cerebras) runs at roughly 1,200 tokens/second, which makes it practical to introduce parallel and verification steps that would otherwise be too time-costly to run.

For an example task using Codex and the Figma MCP to copy a website into Figma, the single agent workflow had a 36.5 min/run average with an average of 12 interventions (and 100% failure rate) while the multi-agent workflow leveraging CodeX Spark had a 5.2 minute run, 2 manual interventions, and success on the first try.

What is a multi-agent workflow?

Multi-agent workflows fix the single-agent ceiling at the architecture level. Instead of one cook doing everything, you have a head chef who takes the order, breaks it into scoped, verifiable tickets, and hands each one to a line cook to execute.

***The Head Chef (Orchestrator): ***

The Head Chef's job is to take the order from the human, break it into a working list of tickets, then call line cooks to each go out and complete one smaller, scoped job. The orchestrator is responsible for planning, coordination, and task decomposition. Its only tool is delegate_task, and it only sees high-level goals plus summaries of subagent outputs.

***The Line Cooks (Subagents): ***

The Line Cook's job is to take the ticket (task assignment) given by the Head Chef and get the job done, no questions asked. Each line cook gets its own fresh station (context window), does its work, returns the plate, and clocks out. Subagents can read, write, use MCPs, and any other tools needed. They only see their assigned prompt and a fresh context window (no prior history).

The trick to keeping things orderly: the line cook doesn't get the full order history. It also doesn't get your 15,000-token master plan document, it doesn't need to see all that. It gets the minimum viable context to cook one specific dish.

In AI agents like Codex, you create a line cook by literally telling your agent to "use subagents." The new instance gets a prompt, a set of files it can access, and any context it needs.

Three immediate wins from running a back of house

There are three clear wins you get from Orchestrators and Subagents, instead of trying to one-shot whatever you are building or sticking to a single, frontier, expensive model.

1. Tokens: your effective context window goes from ~200K to 25M+

Here's how the human, orchestrator, and sub-agents interact:

  • The human talks exclusively to the orchestrator.

  • The orchestrator is stripped of all tools other than delegate_task.

  • If the orchestrator wants to take an action, it spawns a sub-agent via delegate_task.

  • Each sub-agent has its own fresh context window, starting only with a prompt.

  • Sub-agents can read, write, use MCPs, and any other tools.

  • Sub-agents return a summary of their work back to the Head Chef.

This means the orchestrator never has to read files, write files, or see tool-call results directly, effectively extending its context window to as many sub-agents as it can spawn. You can work all day without losing context, compacting, or starting over.

https://huggingface.co/moonshotai/Kimi-K2.5

2. Control: you can enforce sequential workflows at each turn of the agentic loop

Instead of one agent doing the exploration, cooking, tasting, and plating, each step becomes a precise, sequential ticket. This is also a great place to use different models for different tasks. With significantly faster models like Codex Spark (~1,200 toks/sec), we can add validation and QA steps that would normally be too time-costly.

The orchestrator follows a script, spawning one sub-agent per phase:

  1. Sub-agent A breaks the order into a "contract" with subtasks and criteria.

  2. Sub-agent B explores the next subtask.

  3. Sub-agent C tests the code generated in the prior subtask. If tests pass the validation criteria, move on. Otherwise respawn the coding line cook to fix identified issues.

  4. Sub-agent D documents the subtask and updates the scope checklist.

  5. If any subtasks remain, continue from step 2. Otherwise, service is done.

In internal trials, this sequential loop reduced manual interventions by 84.3% compared to single-agent runs on the same brief.

3. Speed: you can run well-defined tasks in parallel

If your task permits it, you can spawn multiple sub-agents in parallel. This works well for:

  1. generating logos, images, mascots, assets, mockups, designs, or tests

  2. exploring a massive codebase orders of magnitude faster

  3. building multiple pages quickly, where each subagent works on separate parts of a codebase and doesn't overwrite each other.

Running five parallel mascot generations took roughly one minute versus five minutes sequentially, about a 5x speedup on taste-driven exploration tasks.
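For independent tasks like those mascot batches, the fan-out is trivial to sketch. Here a thread pool stands in for real parallel sub-agents, and `generate_batch` is a hypothetical stub for an agent producing one batch of variations in its own context.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_batch(batch_id: int, n: int) -> list[str]:
    """Hypothetical sub-agent: produce n asset variations in a fresh context."""
    return [f"mascot-{batch_id}-{i}" for i in range(n)]

# Five line cooks, ten variations each, all firing at once.
with ThreadPoolExecutor(max_workers=5) as pool:
    batches = list(pool.map(lambda b: generate_batch(b, 10), range(5)))

candidates = [v for batch in batches for v in batch]
# 50 candidates land on the pass; the human cherry-picks.
```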

5 Patterns That Actually Work

Over the past few weeks, we've tried dozens of workflows and setups across different AI agents. Below are five patterns we've found success with for building multi-agent workflows.

If you're new to this, start at the top and work your way down :)

Pattern 1: The Prep Line

Before service, a professional kitchen doesn't have one cook slowly dicing every single vegetable. It has a row of prep cooks, each working independently at their own station: one dicing onions, one breaking down shallots, one portioning proteins. At the end, the sous chef inspects and picks what makes the cut.

This is the right shape for tasks like design exploration, code variations, or test generation. Have your line cooks each generate many options, then manually pick the best ones. Every line cook works on the same brief independently, and you (yes, you do have one small task) curate the results. This is the easiest way to get your feet wet with multi-agent workflows because every task is fully independent, with no file conflicts, dependency graphs, or merge logic.

As an example: we wanted to create 50 variations of a mascot for Parchi, so we dispatched five Codex Spark sub-agents with 10 variations each. Then we cherry-picked the ones we liked and tossed the rest.


The best part of this pattern: it's a great way to inject taste into your AI workflow. **Models today have very little taste.**

Chances are, you might also lack taste. For most developers, the brute-force solution is sourcing examples of design or graphic patterns, or giving the AI coding agent so many style guidelines that you might as well have written the HTML/CSS yourself. Instead of that tedious manual process, have your Head Chef call a brigade of line cooks, then cherry-pick your favorite.

The Parchi repo: https://github.com/0xsero/parchi

Pattern 2: The Dinner Rush

During a Friday night dinner rush, every station in the kitchen (sauté, grill, garde manger, pastry) is firing simultaneously. Each line cook owns a different job, but they're all plating at once, all contributing to the same ticket.

This is the concept behind "swarms," pioneered by MoonshotAI when they trained Kimi-K2.5. With swarms, each line cook is responsible for a single, scoped, distinct task. These line cooks run simultaneously, all contributing to one shared goal.

Good fits: building multiple independent components of an app, writing tests for different modules, or porting pages from one framework to another.

The setup requires a few things to go right:

  • you need a deeply specific scope of work

  • that scope needs to break into individual, verifiable steps

  • each task must have clearly documented dependencies

  • each task should only require a predefined set of files to change, so line cooks don't overwrite each other.

Important: The key requirement is that tasks don't share files. The moment two line cooks need to edit the same file, you need a different pattern.
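That last requirement is easy to check mechanically before dispatching a swarm. A small sketch, where the task manifest is purely illustrative:

```python
def find_shared_files(tasks: dict[str, set[str]]) -> set[str]:
    """Return any file claimed by more than one task's predeclared file set."""
    owner: dict[str, str] = {}
    shared: set[str] = set()
    for task, files in tasks.items():
        for path in files:
            if path in owner:
                shared.add(path)  # two line cooks would fight over this file
            owner[path] = task
    return shared

manifest = {
    "navbar": {"src/Navbar.tsx", "src/theme.css"},
    "footer": {"src/Footer.tsx"},
    "theme":  {"src/theme.css"},  # overlaps with "navbar"
}
shared = find_shared_files(manifest)
# A non-empty result means: don't swarm this; reach for a sequenced pattern instead.
```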

Pattern 3: Courses in Sequence

A tasting menu doesn't come out all at once. The amuse-bouche goes out before anyone fires the appetizer, appetizers clear before entrées start plating, and dessert waits its turn. But within a single course, every station is cooking in parallel.

This is the idea behind phased parallel execution. You break your project into courses (or "waves") where each course strictly depends on the one before it. Within each course, any number of tasks and line cooks can run in parallel. This is perfect for bigger projects like full app rebuilds or large refactors.

To make this pattern work, you need a dependency tree, strict ordering, and refined prompts. It's worth referencing https://factory.ai/news/missions to see how they've handled this.

Here's a real example from rebuilding an entire UI. Course 1 explored and mapped everything, while Course 2 built on top of that shared understanding. Neither course's line cooks needed the full conversation history. They got exactly the context brief relevant to their ticket.

As the human, you clearly define what is needed. The course structure gives you parallelism and sequencing, which is why it scales to real projects better than pure swarms.
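Under the hood, the course structure is just a dependency graph batched into waves. A sketch (the graph here is invented; a real run would derive it from the contract):

```python
def into_courses(deps: dict[str, set[str]]) -> list[set[str]]:
    """Batch tasks into sequential courses; tasks within a course run in parallel."""
    courses: list[set[str]] = []
    done: set[str] = set()
    remaining = dict(deps)
    while remaining:
        # A task is ready once everything it depends on has been served.
        ready = {t for t, d in remaining.items() if d <= done}
        if not ready:
            raise ValueError("circular dependency in the contract")
        courses.append(ready)
        done |= ready
        for t in ready:
            del remaining[t]
    return courses

deps = {
    "map codebase":  set(),
    "design tokens": set(),
    "rebuild pages": {"map codebase", "design tokens"},
    "visual QA":     {"rebuild pages"},
}
courses = into_courses(deps)
# Course 1 explores in parallel; each later course waits on the one before.
```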

Pattern 4: The Prep-to-Plate Assembly

Your line cooks don't each build a dish from scratch. One station trims and seasons the protein, the next sears it, the next finishes it in the oven, and the expediter plates and garnishes. Each station has one clear job, hands off cleanly, and nothing drags sauce from the previous ticket into the next.

In this pattern, line cooks operate sequentially down the pass. Each cook does one smaller task, validates it, then hands the workpiece to the next station.

This pattern is perfect for long-horizon tasks with clear, observable, and verifiable outcomes, research-heavy tasks, or multi-step pipelines. The core principle: do not keep dragging unrelated history through one giant thread. Each phase gets enough context to do its part, then hands off. State lives in files and task queues, not in conversation.

The key is a clean handoff between phases. Here's an example where the goal was to run a custom model on specific hardware; pay special attention to how each line cook had a clear, bounded job.

Pattern 5: Here comes Gordon Ramsay

In a professional kitchen, the chef makes the dish, but it does not go straight to the customer (we wish). Instead, it passes through inspection first. One person checks whether it was cooked properly while another checks whether it matches the order and is plated correctly.

This final pattern isn't a project architecture so much as a discipline: you separate the line cooks that write code from the line cooks that check code. One builder cooks, while two verifiers (a code reviewer and a visual/functional tester) run in parallel to validate the output. If either verifier flags an issue, the builder gets another pass. Especially with the availability of near-instant coding models like Codex Spark, adding verification is practically free.

In this workflow, only one builder writes at a time, but multiple verifiers can run simultaneously. Separating the builder from the verifiers is the single most important rule for avoiding merge conflicts and context drift, and it applies inside every other pattern on this list.

When to use it: Always. Whatever pattern you're running, layer this on top. Separating build from verify catches failures before they cascade into downstream tasks. Use browser automation, screenshots, and deterministic tests for the verify step. The goal is that no line cook's output makes it onto the pass without evidence that it works.
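A sketch of the loop, with the builder and both verifiers stubbed out (a real code reviewer and visual tester would run linters, test suites, and browser screenshots):

```python
from concurrent.futures import ThreadPoolExecutor

def builder(issues: list[str]) -> str:
    """Stub builder: a real agent would edit code to address flagged issues."""
    return "patched build" if issues else "first build"

def code_reviewer(build: str) -> list[str]:
    """Stub verifier: flags an issue on the first pass only."""
    return [] if build == "patched build" else ["missing error handling"]

def visual_tester(build: str) -> list[str]:
    """Stub verifier: e.g. diff screenshots against the design brief."""
    return []

issues: list[str] = []
passes = 0
while True:
    passes += 1
    build = builder(issues)
    with ThreadPoolExecutor() as pool:  # verifiers fire simultaneously
        reports = list(pool.map(lambda verify: verify(build),
                                [code_reviewer, visual_tester]))
    issues = [i for report in reports for i in report]
    if not issues:
        break  # both verifiers signed off; the dish goes out
```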

Where this is heading

If you take one thing from this reflection, let it be this: the era of the solo-agent one-shot is over. We're still early, and these patterns will keep evolving as models get faster, context windows get longer, and tooling matures.

Take off the apron and put on the chef's coat. You're running the kitchen now, and your brigade is waiting. You can read more about how to get started with Codex and Codex Spark here.

Thanks to input from Zhenwei Gao and James Wang, and @brickywhat who first introduced us to the term 'sloperator'. Illustrations by @halleychangg.

📋 Discussion Archive

Discussion in progress…