🧠 ATou Study · 💬 Discussion Topics

Lessons from Building Claude Code: Seeing the World Like an Agent

Anthropic wraps one core judgment in four product-iteration stories: the essence of agent tool design is not piling on features, but continually removing old tools that no longer match the model's capabilities. That said, the article itself is carefully crafted technical PR, short on any quantitative evidence.

2026-02-28 · Original link ↗

Core Takeaways

  • Tools are cognitive load, not assets. Every additional tool adds another layer of decision-making for the model. Claude Code stays at roughly 20 tools, with a very high bar for adding new ones. The core question in designing an action space is not "how many tools can we build" but "which tools will the current model actually use well."
  • Structured tool interfaces beat natural-language format conventions. The three iterations of AskUserQuestion show that getting a large model to strictly follow an ad-hoc text format is extremely fragile; abstracting the interaction into a schema-backed tool call is the stable solution. This is close to industry consensus by now.
  • As models get stronger, old scaffolding becomes a constraint. The Todo→Task replacement shows that hard guardrails originally meant to keep the model on track can, once capabilities improve, limit its autonomous planning and subagent coordination. Developers must actively tear down guardrails, not just add features.
  • "Progressive disclosure" is replacing "one-shot stuffing" as the dominant paradigm for context management. From RAG passively feeding data, to giving the model grep/skills so it can search on its own, to subagents retrieving on demand: the core shift is turning the agent from a passive recipient of context into an active builder of it. The article, however, sidesteps the latency and token costs of multi-round retrieval.
  • "Seeing like an agent" sounds profound but in practice means reading traces, inspecting outputs, and iterating. The title carries more philosophy than methodology; at bottom this is traditional product/UX observation, with no technical detail about the model's internal decision process.

Relevance to Us

  • Direct takeaway for ATou's agent products: the AskUserQuestion iterations show that the ceiling for agent products lies not in AI intelligence but in human-machine communication bandwidth. "Blocking interaction + structured options" lowers the user's cost of replying, and this design pattern transfers directly to any scenario where an agent needs to ask the user questions.
  • For Neta's knowledge management: don't cram every rule into one giant prompt; build a "progressive disclosure" information architecture instead. Layer 0 holds the minimum necessary information, layer 1 an index, layer 2 modular instructions, layer 3 a retrieval subagent. A next step is reorganizing the steering files and skills directory around this framework.
  • A team-collaboration metaphor for Uota: the Todo→Task evolution is a replay of "micromanagement vs. management by objectives." For highly capable team members, rigid checklists suppress initiative; give them a dependency-aware task network they can modify themselves. Next step: audit which "fool-proofing" mechanisms in current processes have become constraints.
  • A metacognitive reminder for everyone: after each underlying model upgrade, don't just "use the new capabilities"; actively review which old tools, prompts, and workflows have turned from help into noise. This habit of periodic cleanup matters more than any single trick.

Discussion Prompts

  • In your own workflow, have you found a once-useful tool, process, or rule that now drags you down? How did you notice, and how did you decide to remove it?
  • "Progressive disclosure" sounds great, but are the latency and cost of multi-round retrieval acceptable in practice? When should you stuff context up front, and when should you let the agent explore on demand? Where is the boundary?
  • The article calls tool design "an art, not a science." Is that profound honesty, or an excuse for the lack of standardized evaluation methods?


One of the hardest parts of building an agent harness is constructing its action space.

Claude acts through Tool Calling, but there are a number of ways tools can be constructed in the Claude API, with primitives like bash, skills, and, most recently, code execution (read more about programmatic tool calling on the Claude API in @RLanceMartin's new article).

Given all these options, how do you design the tools of your agent? Do you need just one tool like code execution or bash? What if you had 50 tools, one for each use case your agent might run into?

To put myself in the mind of the model I like to imagine being given a difficult math problem. What tools would you want in order to solve it? It would depend on your own skills!

Paper would be the minimum, but you’d be limited by manual calculations. A calculator would be better, but you would need to know how to operate the more advanced options. The fastest and most powerful option would be a computer, but you would have to know how to use it to write and execute code.

This is a useful framework for designing your agent. You want to give it tools that are shaped to its own abilities. But how do you know what those abilities are? You pay attention, read its outputs, experiment. You learn to see like an agent.

Here are some lessons we’ve learned from paying attention to Claude while building Claude Code.

Improving Elicitation & the AskUserQuestion tool

https://x.com/trq212/status/2014480496013803643

When building the AskUserQuestion tool, our goal was to improve Claude’s ability to ask questions (often called elicitation).

While Claude could just ask questions in plain text, we found that answering those questions took users an unnecessary amount of time. How could we lower this friction and increase the bandwidth of communication between the user and Claude?

Attempt #1 - Editing the ExitPlanTool

The first thing we tried was adding a parameter to the ExitPlanTool to have an array of questions alongside the plan. This was the easiest thing to implement, but it confused Claude because we were simultaneously asking for a plan and a set of questions about the plan. What if the user’s answers conflicted with what the plan said? Would Claude need to call the ExitPlanTool twice? We needed another approach.

(You can read more about why we made the ExitPlanTool in our post on prompt caching.)

Attempt #2 - Changing Output Format

Next, we tried modifying Claude's output instructions so it would ask questions in a slightly modified Markdown format. For example, we could ask it to output a list of bullet-point questions with alternatives in brackets. We could then parse those questions and render them as UI for the user.

While this was the most general change we could make, and Claude even seemed to be okay at outputting this format, it was not reliable. Claude would append extra sentences, omit options, or use a different format altogether.

Attempt #3 - The AskUserQuestion Tool

https://x.com/RLanceMartin/status/2027450018513490419

Finally, we landed on creating a tool that Claude could call at any point, though it was specifically prompted to do so during plan mode. When the tool triggered, we would show a modal displaying the questions and block the agent's loop until the user answered.

This tool allowed us to prompt Claude for a structured output, and it helped us ensure that Claude gave the user multiple options. It also gave users ways to compose this functionality, for example calling it in the Agent SDK or referring to it in skills.
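The final design is easy to picture as a schema-backed tool definition plus a blocking handler. The sketch below is illustrative only: the field names, schema shape, and `handle_tool_call` helper are assumptions, not Claude Code's actual AskUserQuestion implementation.

```python
# A sketch of a structured question tool, in the style of Anthropic tool
# definitions. The schema and field names are illustrative guesses, not
# the actual AskUserQuestion schema used by Claude Code.
ASK_USER_QUESTION = {
    "name": "AskUserQuestion",
    "description": "Ask the user one or more questions and wait for answers.",
    "input_schema": {
        "type": "object",
        "properties": {
            "questions": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "question": {"type": "string"},
                        "options": {
                            "type": "array",
                            "items": {"type": "string"},
                            "minItems": 2,
                        },
                    },
                    "required": ["question", "options"],
                },
            }
        },
        "required": ["questions"],
    },
}

def handle_tool_call(tool_input, prompt_user):
    """Block the agent loop until the user answers every question.

    In Claude Code the prompt is a modal; here `prompt_user` is any
    callable mapping (question, options) to the chosen option.
    """
    answers = [
        prompt_user(q["question"], q["options"])
        for q in tool_input["questions"]
    ]
    return {"answers": answers}
```

Because the questions arrive as validated structured input rather than free text, the UI layer never has to parse Markdown, which is exactly the failure mode of Attempt #2.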

Most importantly, Claude seemed to like calling this tool and we found its outputs worked well. Even the best designed tool doesn’t work if Claude doesn’t understand how to call it.

Is this the final form of elicitation in Claude Code? We’re not sure. As you’ll see in the next example, what works for one model may not be the best for another.

Updating with Capabilities - Tasks & Todos

https://x.com/trq212/status/2024574133011673516

When we first launched Claude Code, we realized that the model needed a Todo list to keep it on track. Todos could be written at the start and checked off as the model did work. To do this we gave Claude the TodoWrite tool, which would write or update Todos and display them to the user.

But even then we often saw Claude forgetting what it had to do. To adapt, we inserted system reminders every 5 turns that reminded Claude of its goal.
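The reminder mechanism is simple to sketch: every N turns, inject a synthetic message restating the goal. The message format and turn counting below are assumptions, not the actual Claude Code system-reminder text.

```python
def with_reminders(turns, goal, every=5):
    """Interleave a synthetic goal reminder into a turn list every `every` turns.

    `turns` is a list of message dicts. The reminder wording is invented
    for illustration; only the injection pattern matters.
    """
    out = []
    for i, turn in enumerate(turns, 1):
        out.append(turn)
        if i % every == 0:
            out.append({"role": "system", "content": f"Reminder: your goal is {goal}."})
    return out
```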

But as models improved, they not only did not need to be reminded of the Todo List but could find it limiting. Being sent reminders of the todo list made Claude think that it had to stick to the list instead of modifying it. We also saw Opus 4.5 get much better at using subagents, but how could subagents coordinate on a shared Todo List?

Seeing this, we replaced TodoWrite with the Task Tool (read more on Tasks here). Whereas Todos were about keeping the model on track, Tasks were more about helping agents communicate with each other. Tasks could include dependencies, share updates across subagents and the model could alter and delete them.
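The Todo-to-Task shift can be illustrated with a toy task board. Everything here (`Task`, `TaskBoard`, the `ready` rule) is a hypothetical sketch of the idea of dependency-aware, mutable, shared tasks, not Claude Code's actual Task tool.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    description: str
    depends_on: set[str] = field(default_factory=set)
    done: bool = False

class TaskBoard:
    """Shared task graph: any subagent can add, update, or delete tasks."""

    def __init__(self):
        self.tasks: dict[str, Task] = {}

    def add(self, task: Task):
        self.tasks[task.id] = task

    def delete(self, task_id: str):
        # Unlike a rigid checklist, tasks may be removed entirely.
        self.tasks.pop(task_id, None)

    def ready(self):
        # A task is ready when all its still-existing dependencies are done.
        return [
            t for t in self.tasks.values()
            if not t.done
            and all(self.tasks[d].done for d in t.depends_on if d in self.tasks)
        ]
```

The dependency edges are what let subagents pick up independent tasks in parallel, where a flat shared Todo list would force serialization.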

As model capabilities increase, the tools that your models once needed might now be constraining them. It's important to constantly revisit previous assumptions about which tools are needed. This is also why it's useful to support only a small set of models with a fairly similar capability profile.

Designing a Search Interface

A particularly important set of tools for Claude are the search tools that can be used to build its own context.

When Claude Code first came out, we used a RAG vector database to find context for Claude. While RAG was powerful and fast, it required indexing and setup and could be fragile across a host of different environments. More importantly, Claude was given this context instead of finding the context itself.

But if Claude could search on the web, why not search your codebase? By giving Claude a Grep tool, we could let it search for files and build context itself.
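A bare-bones version of such a search tool fits in a few lines. The real Grep tool handles globs, ignore rules, and binary files far more carefully, so treat this as a minimal sketch of the agent-facing interface.

```python
import os
import re

def grep(pattern: str, root: str, max_results: int = 50):
    """Search files under `root` for a regex, returning (path, line_no, line).

    A toy stand-in for an agent-facing Grep tool: the agent calls it
    repeatedly to build its own context instead of receiving RAG chunks.
    """
    rx = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    for i, line in enumerate(f, 1):
                        if rx.search(line):
                            hits.append((path, i, line.rstrip("\n")))
                            if len(hits) >= max_results:
                                return hits
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
    return hits
```

Capping `max_results` matters: the tool's output lands directly in the model's context, so the interface has to bound how much the agent can pull in per call.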

This is a pattern we've seen: as Claude gets smarter, it becomes increasingly good at building its own context if it's given the right tools.

When we introduced Agent Skills we formalized the idea of progressive disclosure, which allows agents to incrementally discover relevant context through exploration.

Claude could read skill files and those files could then reference other files that the model could read recursively. In fact, a common use of skills is to add more search capabilities to Claude like giving it instructions on how to use an API or query a database.
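The recursive-reference pattern can be sketched as follows. The `@file.md` link syntax is an invented placeholder, not the actual skill file format; the point is that one small entry file can fan out to further files only when they are referenced.

```python
import re
from pathlib import Path

# Hypothetical "@other.md" reference syntax, invented for this sketch.
REF = re.compile(r"@(\S+\.md)")

def disclose(path: Path, seen=None):
    """Read a file, then recursively read the files it references.

    Progressive disclosure: the agent starts from one small entry point
    and only pulls in further files that are actually referenced,
    instead of loading everything up front.
    """
    if seen is None:
        seen = set()
    if path in seen or not path.exists():
        return []
    seen.add(path)
    text = path.read_text(encoding="utf-8")
    chunks = [(str(path), text)]
    for ref in REF.findall(text):
        chunks += disclose(path.parent / ref, seen)
    return chunks
```

The `seen` set guards against reference cycles, which matters once skill files start linking to each other.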

Over the course of a year Claude went from not really being able to build its own context, to being able to do nested search across several layers of files to find the exact context it needed.

Progressive disclosure is now a common technique we use to add new functionality without adding a tool.

Progressive Disclosure - The Claude Code Guide Agent

Claude Code currently has ~20 tools, and we are constantly asking ourselves if we need all of them. The bar to add a new tool is high, because this gives the model one more option to think about.

For example, we noticed that Claude did not know enough about how to use Claude Code. If you asked it how to add an MCP or what a slash command did, it would not be able to reply.

We could have put all of this information in the system prompt, but given that users rarely asked about this, it would have added context rot and interfered with Claude Code’s main job: writing code.

Instead, we tried a form of progressive disclosure. We gave Claude a link to its docs, which it could then load to search for more information. This worked, but we found that Claude would load a lot of results into context to find the right answer, when really all you needed was the answer.

So we built the Claude Code Guide subagent, which Claude is prompted to call when you ask about Claude Code itself; the subagent has extensive instructions on how to search the docs well and what to return.
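The context-isolation benefit of the subagent can be shown in a few lines. Both helper callables below are invented stand-ins for the subagent's own tool calls; the point is that bulky intermediate search results stay inside the subagent and only a short answer reaches the main agent's context.

```python
def guide_subagent(question, search_docs, answer_with):
    """Run a doc search in an isolated context; return only the answer.

    `search_docs` and `answer_with` are hypothetical stand-ins for the
    subagent's retrieval and summarization steps. The (possibly large)
    `results` list never enters the caller's context.
    """
    results = search_docs(question)        # may be many large chunks
    return answer_with(question, results)  # short final answer only
```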

While this isn’t perfect (Claude can still get confused when you ask it how to set itself up), it is much better than it used to be! We were able to add things to Claude's action space without adding a tool.

An Art, not a Science

If you were hoping for a set of rigid rules on how to build your tools, unfortunately this is not that guide. Designing the tools for your models is as much an art as it is a science. It depends heavily on the model you're using, the goal of the agent, and the environment it's operating in.

Experiment often, read your outputs, try new things. See like an agent.

Link: http://x.com/i/article/2027446899310313472

