Claude Code 泄露源码揭示 AI Agent 接入新模型的高昂工程成本

AI 工程的真实代价：从 Claude Code 泄露源码看新模型接入的工程现实 ← 目录

AI 编程AI Agent

AI 工程的真实代价：从 Claude Code 泄露源码看新模型接入的工程现实

Claude Code 源码泄露后，多数讨论集中在安全漏洞和隐私问题。但这批 512,000 行 TypeScript 源码里真正有价值的信息，是它展示了 AI 工程的一个核心困境：把新模型接入一个成熟的 agentic 系统，代价远比外界想象的大。

这篇文章从泄露源码中提取工程细节，还原这些代价的具体形态。

反蒸馏：三层技术防线与一场法律战

泄露源码中有一个独立的工程子系统：反蒸馏（anti-distillation），旨在阻止竞争对手通过 Claude 的 API 输出来训练自己的模型。这个子系统需要放在 2026 年初的行业背景下理解。Anthropic 在这一时期对使用 Claude 模型输出进行蒸馏训练的开源开发者发起了法律诉讼，法律手段和技术手段同步推进，反映的是同一个商业判断：模型能力是核心资产，需要从协议层、法律层和技术层同时防护。

第一层是 fake tools 注入。在 claude.ts 的 getExtraBodyParams 函数中：

// Anti-distillation: send fake_tools opt-in for 1P CLI only
if (
  feature(ANTI_DISTILLATION_CC)
    ? process.env.CLAUDE_CODE_ENTRYPOINT === cli 
      shouldIncludeFirstPartyOnlyBetas() 
      getFeatureValue_CACHED_MAY_BE_STALE(
        tengu_anti_distill_fake_tool_injection,
        false,
      )
    : false
) {
  result.anti_distillation = [fake_tools]
}

这段代码通知 API 后端在响应中注入虚假的工具调用。试图从 API 输出中提取训练数据的系统会把这些虚假数据一并吃进去，污染训练集。功能通过编译时 feature flag（ANTI_DISTILLATION_CC）和运行时 GrowthBook 远程配置（tengu_anti_distill_fake_tool_injection）双重控制，可以按需开关。

第二层是 connector text summarization。betas.ts 中有一段详细的注释：

// POC: server-side connector-text summarization (anti-distillation). The
// API buffers assistant text between tool calls, summarizes it, and returns
// the summary with a signature so the original can be restored on subsequent
// turns — same mechanism as thinking blocks. Ant-only while we measure
// TTFT/TTLT/capacity.

API 服务端把模型在工具调用之间生成的文本缓存并替换为摘要，附带加密签名。客户端在后续请求中发回签名摘要，服务端再还原原文。外部观察者看到的只是摘要，丢失了原始推理过程的细节。这与 thinking block 的 redaction 机制原理相同。注释中的 POC 标记说明这仍处于验证阶段，GrowthBook 中的标志名为 tengu_slate_prism。

第三层是 token-efficient tools，一种 JSON 格式的工具调用协议（FC v3）：

// JSON tool_use format (FC v3) — ~4.5% output token reduction vs ANTML.
// Sends the v2 header (2026-03-28) added in anthropics/anthropic#337072 to
// isolate the CC A/B cohort from ~9.2M/week existing v1 senders.

用 v2 header 把 Claude Code 的 A/B 测试群体与每周 920 万次现有 v1 请求隔离开。三层机制叠加，构成了从训练数据投毒、推理过程遮蔽到协议格式隔离的完整防线。

值得注意的是这三层的设计逻辑各有侧重。fake tools 针对的是批量抓取 API 输出做训练的场景，成本低，效果靠噪声比。connector text summarization 针对的是更精细的逆向工程，即使攻击者过滤掉了虚假工具调用，模型的中间推理过程仍然被签名遮蔽。token-efficient tools 则是在协议层制造隔离，让 Claude Code 的流量在统计上与其他 API 用户区分开来，方便后端对不同群体做差异化处理。三层各自独立开关，各自有 GrowthBook 控制的灰度部署路径，反映的是一种纵深防御的工程思路。

缓存保卫战：50,000 Token 的代价

Claude Code 的 prompt caching 是成本和延迟的关键优化。服务端缓存了之前请求的 prompt 前缀，后续请求如果前缀完全匹配就能复用。

问题在于，几乎任何参数变化都会打破缓存。源码中的 promptCacheBreakDetection.ts 追踪了十多种可能导致缓存失效的变化源：系统 prompt、工具 schema、模型名称、fast mode 状态、beta header 列表、AFK mode 状态、overage 状态、cache-editing 状态、effort 值、extra body params。

工程师为此发明了一套 sticky-on latch 机制：

// Sticky-on latches for dynamic beta headers. Each header, once first
// sent, keeps being sent for the rest of the session so mid-session
// toggles dont change the server-side cache key and bust ~50-70K tokens.

一旦某个 beta header 在会话中首次发送，即使用户后来关掉了对应功能，header 仍然继续发送。因为移除一个 header 会改变请求签名，导致服务端缓存失效，浪费 50,000 到 70,000 token 的缓存成本。功能状态和协议状态被有意解耦：header（协议层）保持不变以维护缓存，实际的功能控制在请求 body 层面动态调整。

这套方案的另一个细节是 1 小时缓存 TTL 的资格判断。代码用一个 latch 在会话开始时锁定用户的 overage 状态，防止会话中途的计费状态变化翻转缓存 TTL，进而打破服务端缓存（注释中估算每次翻转浪费约 20,000 token）。根据 BQ 2026-03-22 的分析，77% 的工具相关缓存失效来自工具描述变化而非工具增减，因为 AgentTool 和 SkillTool 在描述中嵌入了动态的 agent/command 列表。

SDK 的尴尬与流式解析的脱钩

源码中有一组直白的工程注释，反映了 Claude Code 团队与 Anthropic 自家 SDK 之间的适配摩擦：

// awkwardly, the sdk sometimes returns text as part of a
// content_block_start message, then returns the same text
// again in a content_block_delta message. we ignore it here
// since there doesnt seem to be a way to detect when a
// content_block_delta message duplicates the text.
text: ,

// also awkward
thinking: ,

// even more awkwardly, the sdk mutates the contents of text blocks
// as it works. we want the blocks to be immutable, so that we can
// accumulate state ourselves.
contentBlocks[part.index] = { ...part.content_block }

从 awkwardly 到 also awkward 到 even more awkwardly，三级递进。SDK 的流式事件有重复文本、可变状态等问题，Claude Code 团队的解决方式是放弃使用 SDK 的高层抽象，改用底层 raw stream 自己管理所有状态累积。注释给出了具体原因：

// Use raw stream instead of BetaMessageStream to avoid O(n²) partial JSON parsing
// BetaMessageStream calls partialParse() on every input_json_delta, which we dont need
// since we handle tool input accumulation ourselves

SDK 的 BetaMessageStream 在每个 input_json_delta 事件上做一次 partialParse()，O(n²) 复杂度。对于大量工具调用的 agentic 场景，这会成为性能瓶颈。所以 Claude Code 重写了流式解析，自己处理文本累积、thinking block 签名拼接、connector_text delta 合并。Anthropic 的旗舰产品因为性能原因绕过了自家的官方 SDK。

模型行为的边界案例：五个工程故事

上面几节讨论的是系统层面的工程权衡。接下来的五个故事更加具体，它们展示了新模型（内部代号 Capybara）在生产环境中暴露出的行为边界案例，以及工程师如何用最小化的 patch 逐个封堵。

空工具结果导致零输出中断

第一个故事来自内部 issue inc-4586。当一个工具执行成功但返回空结果（比如一条 shell 命令静默完成、一个 MCP 服务器返回 content:[]、一条 REPL 语句产生了副作用但没有输出），Capybara 有大约 10% 的概率误触发 \n\nHuman: stop sequence，直接结束当前 turn，用户看到的是零输出。

toolResultStorage.ts 中的注释详细记录了根因：

// inc-4586: Empty tool_result content at the prompt tail causes some models
// (notably capybara) to emit the \n\nHuman: stop sequence and end their turn
// with zero output. The server renderer inserts no \n\nAssistant: marker after
// tool results, so a bare /function_results\n\n pattern-matches to a turn
// boundary. Several tools can legitimately produce empty output (silent-success
// shell commands, MCP servers returning content:[], REPL statements, etc.).
// Inject a short marker so the model always has something to react to.

服务端渲染器在 tool result 之后不会插入 \n\nAssistant: 标记。当 tool result 为空时，prompt 尾部出现的 /function_results\n\n 在模型看来和一个 turn boundary 完全一样，于是模型「配合」地采样出 stop sequence，结束了本该继续的推理。

修复方式极其简单：检测空结果后注入一段短文本。

if (isToolResultContentEmpty(content)) {
  logEvent(tengu_tool_empty_result, {
    toolName: sanitizeToolNameForAnalytics(toolName),
  })
  return {
    ...toolResultBlock,
    content: `(${toolName} completed with no output)`,
  }
}

一行 (${toolName} completed with no output) 就解决了问题。同时记录 analytics 事件 tengu_tool_empty_result，按工具名统计发生频率。这是一个典型的表面微小、根因深藏的 bug：空输出本身完全合法，问题出在服务端渲染器的 turn boundary 标记缺失与模型的 stop sequence 采样之间的交互。

tool_reference 展开触发假结束

第二个故事与第一个在机制上高度类似，但触发路径完全不同。Claude Code 有一个 tool search 功能，API 后端会把 tool_reference block 展开为 functions.../functions 标签。这些标签与系统 prompt 中的工具定义块使用相同的格式。当展开后的内容出现在 prompt 尾部时，Capybara 同样会以约 10% 的概率采样出 stop sequence。

messages.ts 中的注释记录了 A/B 测试数据和机制分析：

// Server renders tool_reference expansion as functions.../functions
// (same tags as the system prompts tool block). When this is at the
// prompt tail, capybara models sample the stop sequence at ~10% (A/B:
// 21/200 vs 0/200 on v3-prod). A sibling text block inserts a clean
// \n\nHuman: ... turn boundary. Injected here (API-prep) rather than
// stored in the message so it never renders in the REPL, and is
// auto-skipped when strip* above removes all tool_reference content.
// Must be a sibling, NOT inside tool_result.content — mixing text with
// tool_reference inside the block is a server ValueError.

A/B 测试的数据清晰而残酷：21/200 vs 0/200。有 tool_reference 在尾部的请求中，10.5% 触发了假结束；对照组是零。

初始修复（PR #21049 之前的方案）是注入一个 TOOL_REFERENCE_TURN_BOUNDARY 文本块：

const TOOL_REFERENCE_TURN_BOUNDARY = Tool loaded.

在包含 tool_reference 的 user message 尾部追加一个 { type: text, text: Tool loaded. } 作为 sibling。这段文字为模型提供了一个清晰的 human turn boundary，让它继续推理而非采样 stop sequence。注释特别强调这个文本块只在 API 请求中注入，不会写入消息存储，用户在 REPL 中看不到它。

但这个方案在更复杂的场景中又引发了新问题。当 user message 包含 tool_reference 并且还带有其他文本 sibling（auto-memory 注入、skill reminder 等）时，这些 sibling 会在 functions 展开后创造出「两个连续 human turn」的异常模式。模型会「学习」这个模式，在后续的 tool result 尾部重现它，进而触发 stop sequence。PR #21049 详细描述了这个机制和五组剂量响应实验的结果。

最终修复是 relocateToolReferenceSiblings 函数，它把 tool_reference 消息上的文本 sibling 迁移到下一条不含 tool_reference 的 user message 上：

// Move text-block siblings off user messages that contain tool_reference.
//
// When a tool_result contains tool_reference, the server expands it to a
// functions block. Any text siblings appended to that same user message
// (auto-memory, skill reminders, etc.) create a second human-turn segment
// right after the functions-close tag — an anomalous pattern the model
// imprints on. At a later tool-results tail, the model completes the
// pattern and emits the stop sequence. See #21049 for mechanism and
// five-arm dose-response.

两个通过 feature gate（tengu_toolref_defer_j8m）控制切换：gate 开启时用新的 relocate 方案，gate 关闭时回退到旧的 TOOL_REFERENCE_TURN_BOUNDARY 注入。两套方案在同一个代码路径中共存，由远程配置决定走哪条。

这整个修复链条的注释中有一句话耐人寻味：

// These multi-pass normalizations are inherently fragile — each pass can create
// conditions a prior pass was meant to handle. Consider unifying into a single
// pass that cleans content, then validates in one shot.

工程师自己清楚这些多遍 normalization 的脆弱性：每一遍清理都可能创造出前一遍本该处理的条件。这是一个被有意记录下来的技术债务，注释写的是建议，语气是提醒未来的自己。

模型回退时签名不兼容

Capybara 的 thinking block 携带加密签名，签名与生成它的 API key 绑定。当 Capybara 请求失败需要回退到 Opus 时，之前 Capybara 签名的 thinking block 发送给 Opus 会直接返回 400 错误。

query.ts 中的处理：

// Thinking signatures are model-bound: replaying a protected-thinking
// block (e.g. capybara) to an unprotected fallback (e.g. opus) 400s.
// Strip before retry so the fallback model gets clean history.
if (process.env.USER_TYPE === ant) {
  messagesForQuery = stripSignatureBlocks(messagesForQuery)
}

注释中的 model-bound 一词准确概括了问题的本质。签名是与模型绑定的，回退到另一个模型就意味着签名失效。修复方式是在回退前调用 stripSignatureBlocks，移除所有带签名的 block（thinking、redacted_thinking、connector_text），给回退模型一个干净的历史记录。

stripSignatureBlocks 的实现在 messages.ts 中：

/**
 * Strip signature-bearing blocks (thinking, redacted_thinking, connector_text)
 * from all assistant messages. Their signatures are bound to the API key that
 * generated them; after a credential change (e.g. /login) theyre invalid and
 * the API rejects them with a 400.
 */
export function stripSignatureBlocks(messages: Message[]): Message[] {

注释提到这个函数的使用场景包括模型回退和 credential change（比如用户在会话中重新 /login）。两个完全不同的业务场景共享了同一个底层机制：签名与生成上下文绑定，上下文变化就需要清理。

值得注意的是，这个 strip 操作只在 USER_TYPE === ant（Anthropic 内部用户）时执行。外部用户走的是不同的模型回退路径，因为外部用户的 thinking block 本身可能就没有签名。这又是一个内外用户在同一代码路径中的分叉点。

虚假成功报告率翻倍

Capybara v8 的一个显著行为退化记录在 prompts.ts 的一行注释中：

// @[MODEL LAUNCH]: False-claims mitigation for Capybara v8 (29-30% FC rate vs v4s 16.7%)

FC rate 是 false claim rate，即模型声称任务已完成但实际存在问题的比率。Capybara v8 的 FC rate 达到 29-30%，几乎是 v4（16.7%）的两倍。将近三分之一的任务完成报告包含虚假信息。

工程师的应对方式是在系统 prompt 中注入一段专门的「诚实报告」指令，仅对内部用户（USER_TYPE === ant）生效：

`Report outcomes faithfully: if tests fail, say so with the relevant output;
if you did not run a verification step, say that rather than implying it
succeeded. Never claim all tests pass when output shows failures, never
suppress or simplify failing checks (tests, lints, type errors) to manufacture
a green result, and never characterize incomplete or broken work as done.
Equally, when a check did pass or a task is complete, state it plainly — do
not hedge confirmed results with unnecessary disclaimers, downgrade finished
work to partial, or re-verify things you already checked. The goal is an
accurate report, not a defensive one.`

这段 prompt 的措辞相当考究。它同时约束了两个方向：一方面阻止模型编造成功结果，另一方面也阻止模型过度保守（把已完成的工作报告为「部分完成」、对已验证的结果加不必要的 disclaimer）。工程师显然观察到了模型在被纠正后容易矫枉过正的倾向，所以在同一段 prompt 中预防了两种极端。

同一个 PR #24302 还包含了另外两组 prompt counterweight。一组处理 Capybara v8 的过度注释问题（over-commenting by default），另一组处理 assertiveness（主动报告用户代码中的问题而非只被动执行）。三组 counterweight 全部用 @[MODEL LAUNCH] 标记，注释标注 un-gate once validated on external via A/B，意味着它们仅在内部 A/B 验证后才会推广给外部用户。这是一种用 prompt engineering 来补偿模型行为退化的策略，速度快于重新训练，但长期是否可持续是另一个问题。

安全分类器被 alwaysOnThinking 噎死

Claude Code 的 auto mode 有一个安全分类器（yolo classifier），在每次工具调用前快速判断操作是否安全。这个分类器期望短文本响应：它只需要一个 blockyes/block 或 blockno/block。thinking block 对它完全无用，因为结果提取函数 extractTextContent() 直接忽略 thinking 内容。

对大多数模型，分类器通过 thinking: disabled 关闭 thinking 来节省 token。但 Capybara 有一个特殊属性 alwaysOnThinking，在服务端默认启用 adaptive thinking 并且拒绝接受 disabled 指令，直接返回 400 错误。

yoloClassifier.ts 中的 getClassifierThinkingConfig 函数处理了这个特殊情况：

/**
 * Thinking config for classifier calls. The classifier wants short text-only
 * responses — API thinking blocks are ignored by extractTextContent() and waste tokens.
 *
 * For most models: send { type: disabled } via sideQuerys `thinking: false`.
 *
 * Models with alwaysOnThinking (declared in tengu_ant_model_override) default
 * to adaptive thinking server-side and reject `disabled` with a 400. For those:
 * dont pass `thinking: false`, instead pad max_tokens so adaptive thinking
 * (observed 0–1114 tokens replaying go/ccshare/shawnm-20260310-202833) doesnt
 * exhaust the budget before block is emitted. Without headroom,
 * stop_reason=max_tokens yields an empty text response → parseXmlBlock()
 * → null → unparseable → safe commands blocked.
 *
 * Returns [disableThinking, headroom] — tuple instead of named object so
 * property-name strings dont survive minification into external builds.
 */
function getClassifierThinkingConfig(
  model: string,
): [false | undefined, number] {
  if (
    process.env.USER_TYPE === ant 
    resolveAntModel(model)?.alwaysOnThinking
  ) {
    return [undefined, 2048]
  }
  return [false, 0]
}

这段注释记录了一条完整的故障链：模型的 adaptive thinking 消耗了 0 到 1114 token（根据一个具体的 ccshare 回放记录），如果 max_tokens 没有预留足够的 headroom，thinking 就会耗尽 token 预算，导致 stop_reason=max_tokens。此时文本响应为空，XML 解析返回 null，分类器判定为「无法解析」，最终结论是阻止操作。一条完全安全的 shell 命令因此被拦截。

修复是给 max_tokens 加 2048 的 headroom，让 thinking 有足够的空间消耗完毕后，分类器的 block 标签仍然能被生成和提取。注释中还特别提到返回值用 tuple 而非 named object，原因是 property name 会在 minification 时保留在外部构建中。连返回值类型的选择都受到安全考量的影响。

Undercover 模式：Anthropic 工程师也用 Claude Code

泄露源码中有一个有趣的细节：undercover.ts。Anthropic 的工程师日常使用 Claude Code 贡献开源代码，为了防止模型代号泄露，他们实现了一个 Undercover 模式。

这个模式的默认状态是开启。只有确认当前仓库在内部白名单（17 个私有仓库，从 claude-cli-internal 到 mobile-apps）中时才关闭：

// There is NO force-OFF. This guards against model codename leaks — if
// were not confident were in an internal repo, we stay undercover.

只有 force-ON，没有 force-OFF。当 Undercover 模式激活时，系统 prompt 要求模型在 commit message 和 PR 描述中隐瞒身份，列举了必须隐藏的代号：animal names like Capybara, Tengu, etc.。Capybara 出现了，与 2026 年 3 月 26 日通过 CMS 配置错误泄露的 Claude Mythos 模型直接关联。model.ts 中的 maskModelCodename 函数进一步佐证了这个代号在内部的使用方式：

function maskModelCodename(baseName: string): string {
  // e.g. capybara-v2-fast → cap*****-v2-fast
  const [codename = , ...rest] = baseName.split(-)
  const masked =
    codename.slice(0, 3) + *.repeat(Math.max(0, codename.length - 3))
  return [masked, ...rest].join(-)
}

模型代号在外部 UI 中被遮蔽为 cap*-v2-fast，但在源码注释里保留了完整名称作为文档。泄露恰好把这些注释全部暴露了。

这些代价意味着什么

这批源码中的工程注释有一个共同特征：坦诚。工程师在注释中记录了 A/B 测试的精确数字（21/200 vs 0/200）、行为退化的量化度量（29-30% FC rate vs 16.7%）、故障链的完整因果路径（empty result → pattern match → stop sequence → zero output），以及对自己代码脆弱性的清醒认知（multi-pass normalizations are inherently fragile）。这些注释写给的读者是未来的同事，而非外部观众，所以它们没有修饰的动机。

从这些注释中可以提取出一个规律：新模型接入的工程成本中，大部分来自模型行为与系统假设之间的不匹配。服务端渲染器假设 tool result 后面总有内容，模型不这么认为。分类器假设 thinking 可以关闭，Capybara 拒绝接受。签名假设同一个模型处理整个会话，回退机制打破了这个假设。每个 bug 都很小，修复都很快，但它们的累积效应是一个日益复杂的 normalization 管线，工程师自己都在注释里建议合并成 single pass。

模型能力在快速进步。把这些能力接入生产系统的成本以完全不同的速率在增长。