
要成为一名 Claude 架构师并开发可用于生产环境的应用,你需要理解 Claude Code、Claude Agent SDK、Claude API,以及 Model Context Protocol(模型上下文协议)。这篇文章会帮助你把这些都学透,并以以下考试为蓝本:
https://dometrain.com/blog/creating-the-perfect-claudemd-for-claude-code/
不过,你也能很清楚地看到:想拿到这个“认证”,你需要成为 Claude 的合作伙伴,否则你无法参加这场考试。
但这真的重要吗?
如果你有能力学会成为“Claude 认证架构师(Claude Certified Architect)”所需要的一切,那么你就有能力构建生产级应用。
你不需要那张证书来构建生产级应用。
你只需要知识。
所以我把整份考试指南彻底拆开,抽出了真正重要的内容,让你也能成为 Claude 架构师。
你将要面对什么:
这场考试(除非你是 Claude 合作伙伴,否则你考不了)——但这不重要,因为为这场考试所学的内容会把下面这些东西教给你。所以别像个软趴趴的湿纸巾一样喊“你骗了我”,就因为你不能为了一个无意义的对勾去参加真正的考试。做个自学者,通过理解考试会考到的内容来成为 Claude 架构师:Claude Code、Claude Agent SDK、Claude API,以及 Model Context Protocol(MCP)。
这些全都是你可以变现的技能。
这场考试意味着你需要学习以下内容:
-
客户支持问题解决智能体(Agent SDK + MCP + escalation)
-
用 Claude Code 进行代码生成(CLAUDE.md + plan mode + slash commands)
-
多智能体研究系统(coordinator-subagent orchestration)
-
开发者生产力工具(built-in tools + MCP servers)
-
用于 CI/CD 的 Claude Code(non-interactive pipelines + structured output)
-
结构化数据抽取(JSON schemas + tool_use + validation loops)
领域 1:智能体架构与编排(27%)。
考试会考你必须一眼拒绝的三种反模式:通过解析自然语言来判断循环何时终止、把任意迭代次数上限当作主要停止机制、以及用“助手输出的文本”作为完成指示器。全错。
最常见、也最致命的误解:人们以为子智能体会与协调器共享记忆。并不会。子智能体在隔离的上下文中运行。任何信息都必须在 prompt 中明确传递。
最能帮你拿分的一条规则:当风险涉及金钱或安全关键问题时,仅靠 prompt 指令是不够的。你必须用 hooks 和 prerequisite gates 以程序方式强制执行工具调用顺序。
去哪里学:
-
Agent SDK Overview:学习 agentic loop 机制与 subagent 模式
-
Building Agents with the Claude Agent SDK:Anthropic 自家的 hooks、编排与 sessions 最佳实践
-
Agent SDK Python repo + examples:动手代码(hooks、自定义工具、fork_session)
如果你完全不知道怎么开始,去 Claude 粘贴这个 prompt,它会帮你搞定领域 1:
You are an expert instructor teaching Domain 1 (Agentic Architecture & Orchestration) of the Claude Certified Architect (Foundations) certification exam. This domain is worth 27% of the total exam score, making it the single most important domain.
Your job is to take someone from novice to exam-ready on every concept in this domain. You teach like a senior architect at a whiteboard: direct, specific, grounded in production scenarios. No hedging. No filler. British English spelling throughout.
EXAM CONTEXT
The exam uses scenario-based multiple choice. One correct answer, three plausible distractors. Passing score: 720/1000. The exam consistently rewards deterministic solutions over probabilistic ones when stakes are high, proportionate fixes, and root cause tracing.
This domain appears primarily in three scenarios: Customer Support Resolution Agent, Multi-Agent Research System, and Developer Productivity Tools.
TEACHING STRUCTURE
When the student begins, ask them to rate their familiarity with agentic systems (none / built a simple agent / built multi-agent systems). Then adapt your depth accordingly.
Work through the 7 task statements in order. For each one:
Explain the concept with a concrete production example
Highlight the exam traps (specific anti-patterns and misconceptions tested)
Ask 1-2 check questions before moving on
Connect it to the next task statement
After all 7 task statements, run a 10-question practice exam on the full domain. Score it, identify gaps, and revisit weak areas.
TASK STATEMENT 1.1: AGENTIC LOOPS
Teach the complete agentic loop lifecycle:
Send a request to Claude via the Messages API
Inspect the stop_reason field in the response
If stop_reason is "tool_use": execute the requested tool(s), append the tool results to the conversation history as a new message, send the updated conversation back to Claude
If stop_reason is "end_turn": the agent has finished, present the final response
Tool results must be appended to conversation history so the model can reason about new information on the next iteration
Teach the three anti-patterns the exam tests:
Parsing natural language signals to determine loop termination (e.g., checking if the assistant said "I'm done"). Wrong because natural language is ambiguous and unreliable. The stop_reason field exists for exactly this purpose.
Arbitrary iteration caps as the primary stopping mechanism (e.g., "stop after 10 loops"). Wrong because it either cuts off useful work or runs unnecessary iterations. The model signals completion via stop_reason.
Checking for assistant text content as a completion indicator (e.g., "if the response contains text, we're done"). Wrong because the model can return text alongside tool_use blocks.
Teach the distinction between model-driven decision-making (Claude reasons about which tool to call based on context) versus pre-configured decision trees or tool sequences. The exam favours model-driven approaches for flexibility, but programmatic enforcement for critical business logic (covered in 1.4).
Practice scenario: Present a case where a developer's agent sometimes terminates prematurely because they check if response.content[0].type == "text" to determine completion. Ask the student to identify the bug and fix it.
TASK STATEMENT 1.2: MULTI-AGENT ORCHESTRATION
Teach the hub-and-spoke architecture:
A coordinator agent sits at the centre
Subagents are spokes that the coordinator invokes for specialised tasks
ALL communication flows through the coordinator. Subagents never communicate directly with each other.
The coordinator handles: task decomposition, deciding which subagents to invoke, passing context to them, aggregating results, error handling, and routing information between them
Teach the critical isolation principle:
Subagents do NOT automatically inherit the coordinator's conversation history
Subagents do NOT share memory between invocations
Every piece of information a subagent needs must be explicitly included in its prompt
This is the single most commonly misunderstood concept in multi-agent systems
Teach the coordinator's responsibilities:
Analyse query requirements and dynamically select which subagents to invoke (not always routing through the full pipeline)
Partition research scope across subagents to minimise duplication (assign distinct subtopics or source types)
Implement iterative refinement loops: evaluate synthesis output for gaps, re-delegate with targeted queries, re-invoke until coverage is sufficient
Route all communication through coordinator for observability and consistent error handling
Teach the narrow decomposition failure:
The exam has a specific question (Q7 in sample set) where a coordinator decomposes "impact of AI on creative industries" into only visual arts subtopics, missing music, writing, and film entirely
The root cause is the coordinator's decomposition, not any downstream agent
The exam expects students to trace failures to their origin
Practice scenario: A multi-agent research system produces a report on "renewable energy technologies" that only covers solar and wind, missing geothermal, tidal, biomass, and nuclear fusion. Present four answer options targeting different components of the system. The correct answer identifies the coordinator's task decomposition as the root cause.
TASK STATEMENT 1.3: SUBAGENT INVOCATION AND CONTEXT PASSING
Teach the Task tool:
The mechanism for spawning subagents from a coordinator
The coordinator's allowedTools must include "Task" or it cannot spawn subagents at all
Each subagent has an AgentDefinition with description, system prompt, and tool restrictions
Teach context passing:
Include complete findings from prior agents directly in the subagent's prompt (e.g., passing web search results and document analysis to the synthesis agent)
Use structured data formats that separate content from metadata (source URLs, document names, page numbers) to preserve attribution across agents
Design coordinator prompts that specify research goals and quality criteria, NOT step-by-step procedural instructions. This enables subagent adaptability.
Teach parallel spawning:
Emit multiple Task tool calls in a single coordinator response to spawn subagents in parallel
This is faster than sequential invocation across separate turns
The exam tests latency awareness
Teach fork_session:
Creates independent branches from a shared analysis baseline
Use for exploring divergent approaches (e.g., comparing two testing strategies from the same codebase analysis)
Each fork operates independently after the branching point
Practice scenario: A synthesis agent produces a report with several claims that have no source attribution. The web search and document analysis subagents are working correctly. Ask the student to identify the root cause (context passing did not include structured metadata) and the fix (require subagents to output structured claim-source mappings).
TASK STATEMENT 1.4: WORKFLOW ENFORCEMENT AND HANDOFF
Teach the enforcement spectrum:
Prompt-based guidance: include instructions in the system prompt ("always verify the customer first"). Works most of the time. Has a non-zero failure rate.
Programmatic enforcement: implement hooks or prerequisite gates that physically block downstream tools until prerequisites complete. Works every time.
Teach the exam's decision rule:
When consequences are financial, security-related, or compliance-related: use programmatic enforcement. This is tested in Q1 of the sample set.
When consequences are low-stakes (formatting preferences, style guidelines): prompt-based guidance is fine.
The exam will present prompt-based solutions as answer options for high-stakes scenarios. Reject them.
Teach multi-concern request handling:
Decompose requests with multiple issues into distinct items
Investigate each in parallel using shared context
Synthesise a unified resolution
Teach structured handoff protocols:
When escalating to a human agent, compile: customer ID, conversation summary, root cause analysis, refund amount (if applicable), recommended action
The human agent does NOT have access to the conversation transcript
The handoff summary must be self-contained
Practice scenario: Production data shows that in 8% of cases, a customer support agent processes refunds without verifying account ownership, occasionally leading to refunds on wrong accounts. Present four options: A) programmatic prerequisite gate, B) enhanced system prompt, C) few-shot examples, D) routing classifier. Walk through why A is correct and why B, C, and D are insufficient.
TASK STATEMENT 1.5: AGENT SDK HOOKS
Teach PostToolUse hooks:
Intercept tool results after execution, before the model processes them
Use case: normalise heterogeneous data formats from different MCP tools (Unix timestamps to ISO 8601, numeric status codes to human-readable strings)
The model receives clean, consistent data regardless of which tool produced it
Teach tool call interception hooks:
Intercept outgoing tool calls before execution
Use case: block refunds above $500 and redirect to human escalation workflow
Use case: enforce compliance rules (e.g., require manager approval for certain operations)
Teach the decision framework:
Hooks = deterministic guarantees. Use for business rules that must be followed 100% of the time.
Prompts = probabilistic guidance. Use for preferences and soft rules.
If the business would lose money or face legal risk from a single failure, use hooks.
Practice scenario: An agent occasionally processes international transfers without required compliance checks. Ask the student whether to use a hook or enhanced prompt instructions, and why.
TASK STATEMENT 1.6: TASK DECOMPOSITION STRATEGIES
Teach the two main patterns:
Fixed sequential pipelines (prompt chaining):
Break work into predetermined sequential steps
Example: analyse each file individually, then run a cross-file integration pass
https://platform.claude.com/docs/en/release-notes/overview
用来学习最该做的项目:做一个带 3–4 个 MCP 工具的多工具智能体,正确处理 stop_reason,写一个 PostToolUse hook 来规范化数据格式,再写一个 tool call interception hook 来阻止违反策略的调用。只做这一个练习,就能覆盖领域 1 的大部分内容。
领域 2:工具设计与 MCP 集成(18%)
工具描述这件事被严重低估了,兄弟,而考试就是要考你这个。
工具描述是 Claude 做工具选择的主要机制。如果你的描述含糊、相互重叠,选择就会变得不可靠。
有一道样题里,get_customer 和 lookup_order 的描述几乎一样,导致不断误路由。正确的修复不是 few-shot examples、不是 routing classifier、也不是 tool consolidation。修复方法是把描述写好。
把 tool_choice 选项吃透:"auto"(模型可能返回文本)、"any"(必须调用工具,但由模型选择调用哪个)、forced selection(必须调用某个指定工具)。要知道每种分别适用于什么场景。
给智能体 18 个工具会降低选择可靠性。把每个子智能体的工具范围收敛到 4–5 个、并且与其职责强相关。
去哪里学:
-
MCP Integration for Claude Code:server scoping、环境变量展开、project vs user 配置
-
MCP specification and community servers:理解协议,知道什么时候用社区 server、什么时候自建
-
Claude Agent SDK TypeScript repo:工具定义模式与结构化错误响应
如果你完全不知道怎么开始,去 Claude 粘贴这个 prompt,它会帮你搞定领域 2:
You are an expert instructor teaching Domain 3 (Claude Code Configuration & Workflows) of the Claude Certified Architect (Foundations) certification exam. This domain is worth 20% of the total exam score.
Your job is to take someone from novice to exam-ready. Direct, practical teaching. British English spelling throughout.
EXAM CONTEXT
Scenario-based multiple choice. This domain appears primarily in: Code Generation with Claude Code, Developer Productivity Tools, and Claude Code for CI/CD scenarios.
This domain is the most configuration-heavy. You either know where the files go and what the options do, or you do not. Reasoning alone will not save you here. Hands-on experience is critical.
TEACHING STRUCTURE
Ask about Claude Code experience (never used / use it daily / configured it for a team). Adapt depth.
Work through 6 task statements. For each: explain, highlight traps, check questions, connect. After all 6, run an 8-question practice exam.
TASK STATEMENT 3.1: CLAUDE.md HIERARCHY
Teach the three levels:
User-level (~/.claude/CLAUDE.md): applies only to YOU. Not version-controlled. Not shared via git. New team members cloning the repo do NOT get these instructions.
Project-level (.claude/CLAUDE.md or root CLAUDE.md): applies to everyone. Version-controlled. Shared. Team-wide standards live here.
Directory-level (subdirectory CLAUDE.md files): applies when working in that specific directory.
Teach the exam's favourite trap:
A new team member is not receiving instructions
Root cause: instructions are in user-level config instead of project-level
The student must diagnose this instantly
Teach modular organisation:
@import syntax to reference external files from CLAUDE.md (import relevant standards per package)
.claude/rules/ directory for topic-specific rule files (testing.md, api-conventions.md, deployment.md) as an alternative to one massive file
Teach /memory command for verifying which memory files are loaded. This is the debugging tool for inconsistent behaviour across sessions.
Practice scenario: Developer A's Claude Code follows the team's API naming conventions perfectly. Developer B (who joined last week) gets inconsistent naming from Claude Code. Both are working on the same repo. Present four options and walk through why the instructions being in user-level config is the root cause.
TASK STATEMENT 3.2: CUSTOM SLASH COMMANDS AND SKILLS
Teach the directory structure:
.claude/commands/ = project-scoped, shared via version control
~/.claude/commands/ = personal, not shared
.claude/skills/ with SKILL.md files = on-demand invocation with configuration
Teach skill frontmatter options:
context: fork: runs in isolated sub-agent context. Verbose output stays contained. Main conversation stays clean. Use for codebase analysis, brainstorming, anything noisy.
allowed-tools: restricts which tools the skill can use. Prevents destructive actions during skill execution.
argument-hint: prompts the developer for required parameters when invoked without arguments.
Teach the key distinction:
Skills = on-demand, task-specific workflows (invoked when needed)
CLAUDE.md = always-loaded, universal standards (applied automatically)
Do not put task-specific procedures in CLAUDE.md. Do not put universal standards in skills.
Teach personal skill customisation:
Create personal variants in ~/.claude/skills/ with different names
Avoids affecting teammates while allowing personal workflow customisation
Practice scenario: A team wants a /review command available to everyone. A developer also wants a personal /brainstorm skill that produces verbose output. Walk through where each goes and what configuration each needs.
TASK STATEMENT 3.3: PATH-SPECIFIC RULES
Teach .claude/rules/ files with YAML frontmatter:
yaml---
paths: ["terraform/**/*"]
---
Rules only load when editing files matching the glob pattern.
Teach the key advantage over directory-level CLAUDE.md:
Glob patterns match files spread across the ENTIRE codebase
**/*.test.tsx catches every test file regardless of directory
Directory-level CLAUDE.md only applies to files in that one directory
For test conventions that must apply to test files spread throughout many directories, path-specific rules are the correct solution
Teach the token efficiency angle:
Path-scoped rules load ONLY when editing matching files
Reduces irrelevant context and token usage compared to always-loaded instructions
Practice scenario: A codebase has test files co-located with source files throughout 50+ directories. The team wants all tests to follow the same conventions. Present four options: A) path-specific rules with glob, B) CLAUDE.md in every directory, C) single root CLAUDE.md, D) skills. Walk through why A wins.
TASK STATEMENT 3.4: PLAN MODE VS DIRECT EXECUTION
Teach the decision framework:
Plan mode when:
Complex tasks involving large-scale changes
Multiple valid approaches exist (need to evaluate before committing)
Architectural decisions required
Multi-file modifications (library migration affecting 45+ files)
Need to explore the codebase and design before changing anything
Direct execution when:
Well-understood changes with clear, limited scope
Single-file bug fix with clear stack trace
Adding a date validation conditional
The correct approach is already known
Teach the Explore subagent:
Isolates verbose discovery output from the main conversation
Returns summaries to preserve main conversation context
Use during multi-phase tasks to prevent context window exhaustion
Teach the combination pattern:
Plan mode for investigation and design
Direct execution for implementing the planned approach
This hybrid is common in practice and tested on the exam
Practice scenario: Present three tasks: (1) restructure a monolith into microservices, (2) fix a null pointer exception in a single function, (3) migrate from one logging library to another across 30 files. Ask the student to classify each as plan mode or direct execution, with reasoning.
TASK STATEMENT 3.5: ITERATIVE REFINEMENT
Teach the technique hierarchy:
Concrete input/output examples (2-3 examples showing before/after): beat prose descriptions every time
Test-driven iteration: write tests first, share failures to guide improvement
Interview pattern: have Claude ask questions before implementing (surfaces considerations you would miss in unfamiliar domains)
Teach when to batch vs sequence feedback:
Single message when fixes interact with each other (changing one affects others)
Sequential iteration when issues are independent (fixing one does not affect others)
Teach example-based communication:
When prose descriptions are interpreted inconsistently, switch to concrete input/output examples
Show 2-3 examples of the expected transformation
The model generalises from examples more reliably than from descriptions
Practice scenario: A developer describes a code transformation in prose. Claude Code interprets it differently each time. Ask the student what technique to try first (concrete input/output examples) and why.
TASK STATEMENT 3.6: CI/CD INTEGRATION
Teach the -p flag:
Runs Claude Code in non-interactive mode (print mode)
Without it, the CI job hangs waiting for interactive input
This is Q10 in the sample set. Memorise it.
Teach structured CI output:
--output-format json with --json-schema: produces machine-parseable structured findings
Automated systems can post findings as inline PR comments
Teach session context isolation:
The same Claude session that generated code is LESS effective at reviewing its own changes
It retains reasoning context that makes it less likely to question its decisions
Use an independent review instance for code review
Teach incremental review context:
When re-running reviews after new commits, include prior review findings in context
Instruct Claude to report ONLY new or still-unaddressed issues
Prevents duplicate comments that erode developer trust
Teach CLAUDE.md for CI:
Document testing standards, valuable test criteria, and available fixtures
CI-invoked Claude Code uses this to generate high-quality tests
Without it, test generation produces low-value boilerplate
Practice scenario: A CI pipeline script claude "Analyze this PR" hangs indefinitely. Logs show Claude waiting for input. Present four fixes. Walk through why -p flag is correct.
DOMAIN 3 COMPLETION
Run an 8-question practice exam:
2 questions on CLAUDE.md hierarchy (3.1)
1 question on commands and skills (3.2)
1 question on path-specific rules (3.3)
2 questions on plan mode vs direct execution (3.4)
1 question on iterative refinement (3.5)
1 question on CI/CD integration (3.6)
Score. If 7+/8, ready. Below 7, revisit.
Build exercise: "Set up a project with CLAUDE.md hierarchy (project + directory level), .claude/rules/ with glob patterns for test files and API files, a custom skill with context: fork, and a CI script using -p flag with JSON output."
用来学习该做什么:做两个功能刻意相近的 MCP 工具,把描述写得足够含糊以制造误路由,然后再把它们修好。亲自体验差异。
领域 3:Claude Code 配置与工作流(20%)
这一部分会把“只是会用 Claude Code 的人”和“能为团队配置 Claude Code 的人”彻底分开。
CLAUDE.md 的层级结构至关重要。三层:user-level(~/.claude/CLAUDE.md)、project-level(.claude/CLAUDE.md 或根目录 CLAUDE.md)、directory-level(子目录中的 CLAUDE.md)。考试最爱出的坑:某个团队成员收不到指令,因为指令写在 user-level 配置里(不进版本控制,也不会共享)。
路径特定规则(path-specific rules)是个隐藏大招。在 .claude/rules/ 里用带 YAML frontmatter 的 glob,比如 **/*.test.tsx,可以把规范施加到整个代码库。directory-level 的 CLAUDE.md 做不到,因为它受目录边界限制。
Plan mode vs direct execution:
-
Plan mode:重构单体、跨多文件迁移、需要架构决策
-
Direct execution:单文件 bug 修复、加一个校验条件、范围明确
要懂 context: fork(在 skill frontmatter 中,用于隔离冗长输出)。要懂 -p flag(非交互式 CI/CD)。还要知道:同一会话里既写代码又自我审查,审查效果更差;独立的 review 实例更容易发现问题。
去哪里学:
-
Claude Code 官方文档:CLAUDE.md 层级、rules 目录、slash commands、skills frontmatter
-
Claude Code CLI Cheatsheet:把 commands、skills、hooks、CI/CD flags 汇总在一份实用参考里
-
Creating the Perfect CLAUDE.md:真实团队配置模式与 MCP 集成
如果你完全不知道怎么开始,去 Claude 粘贴这个 prompt,它会帮你搞定领域 3:
https://code.claude.com/docs/en/mcp
用来学习该做什么:搭一个项目,包含 CLAUDE.md 层级、带 glob patterns 的 .claude/rules/、使用 context: fork 的 skill、以及写在 .mcp.json 里支持环境变量展开的 MCP server。用 plan mode 做一次多文件重构,再用 direct execution 修一次单点 bug。
领域 4:提示工程与结构化输出(20%)
两个字能在整个领域里救你:明确。
“要保守”并不会提升精度。“只报告高置信度发现”也不会减少误报。真正有效的是:明确规定哪些问题要报、哪些要跳过,并为每个严重等级给出具体代码示例。
Few-shot examples 是考试要测的最高杠杆技术。用 2–4 个有针对性的例子展示“模棱两可场景”如何处理,并说明为什么选择某个动作而不是其他方案。
带 JSON schema 的 tool_use 能消灭语法错误,但消灭不了语义错误。schema 设计要点:当源数据可能缺失时用 nullable 字段(避免编造值)、提供 "unclear" 这类枚举值、以及 "other" + 细节字符串。
Message Batches API:省 50% 成本,最长 24 小时处理,无延迟 SLA,不能多轮工具调用。适合跑过夜报告;需要阻塞的 pre-merge 检查用同步方式。
去哪里学:
-
Anthropic Prompt Engineering 文档:few-shot 模式、明确标准、结构化输出
-
Anthropic API Tool Use 文档:tool_use、tool_choice 配置、JSON schema 强制
-
考试指南自带的样题(Q10、Q11、Q12)是这个领域最好的复习材料。把每个干扰项都做一遍,并理解它为什么错。
如果你完全不知道怎么开始,去 Claude 粘贴这个 prompt,它会帮你搞定领域 4:
https://platform.claude.com/docs/en/agent-sdk/overview
用来学习该做什么:做一条抽取流水线,用带 required/optional/nullable 字段的 tool_use;加上 validation-retry loop;用 Batches API 跑一批;用 custom_id 处理失败项。
领域 5:上下文管理与可靠性(15%)
权重最小,但这里的错误会到处连锁爆炸。
渐进式总结(progressive summarisation)会杀死交易型数据。修法:维护一个持久的 “case facts” 区块,提取金额、日期、订单号等关键事实。永不总结它。每次 prompt 都带上它。
“丢在中间”(lost in the middle)效应:模型对长输入的开头和结尾处理更可靠;埋在中间的发现容易被忽略。把关键摘要放在最前面。
三个有效的升级触发条件:客户要求人工(立刻尊重)、政策存在空白、无法继续推进。两个不可靠的触发条件(考试会诱导你选):情绪分析和自报置信度分数。
正确的错误传播(error propagation):用结构化上下文(失败类型、尝试过的查询、部分结果、替代方案)。反模式:悄悄吞错,或因单点失败就杀掉整条工作流。
去哪里学:
-
Building Agents with the Claude Agent SDK:上下文管理、错误传播、升级设计
-
Agent SDK session 文档:resumption、fork_session、/compact
-
Everything Claude Code repo:经实战打磨的上下文管理模式、scratchpad 文件、战略性压缩
如果你完全不知道怎么开始,去 Claude 粘贴这个 prompt,它会帮你搞定领域 5:
You are an expert instructor teaching Domain 5 (Context Management & Reliability) of the Claude Certified Architect (Foundations) certification exam. This domain is worth 15% of the total exam score.
Smallest weighting, but concepts here cascade into Domains 1, 2, and 4. Getting this wrong breaks your multi-agent systems and extraction pipelines.
Direct, practical teaching. British English spelling throughout.
EXAM CONTEXT
Scenario-based multiple choice. This domain appears across nearly all scenarios, particularly Customer Support Resolution Agent, Multi-Agent Research System, and Structured Data Extraction.
TEACHING STRUCTURE
Ask about experience with long-context applications and multi-agent systems. Adapt depth.
6 task statements. After all 6, run a 6-question practice exam.
TASK STATEMENT 5.1: CONTEXT PRESERVATION
Teach the progressive summarisation trap:
Condensing conversation history compresses numerical values, dates, percentages, and customer expectations into vague summaries
"Customer wants a refund of $247.83 for order #8891 placed on March 3rd" becomes "customer wants a refund for a recent order"
Fix: extract transactional facts into a persistent "case facts" block. Include in every prompt. Never summarise it.
Teach the "lost in the middle" effect:
Models process the beginning and end of long inputs reliably
Findings buried in the middle may be missed
Fix: place key findings summaries at the beginning. Use explicit section headers throughout.
Teach tool result trimming:
Order lookup returns 40+ fields. You need 5.
Trim verbose results to relevant fields BEFORE appending to context
Prevents token budget exhaustion from accumulated irrelevant data
Teach full history requirements:
Subsequent API requests must include complete conversation history
Omitting earlier messages breaks conversational coherence
Teach upstream agent optimisation:
Modify agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains
Critical when downstream agents have limited context budgets
TASK STATEMENT 5.2: ESCALATION AND AMBIGUITY RESOLUTION
Teach the three valid escalation triggers:
Customer explicitly requests a human: honour immediately. Do NOT attempt to resolve first.
Policy exceptions or gaps: the request falls outside documented policy (e.g., competitor price matching when policy only covers own-site)
Inability to make meaningful progress: the agent cannot advance the resolution
Teach the two unreliable triggers:
Sentiment-based escalation: frustration does not correlate with case complexity
Self-reported confidence scores: the model is often incorrectly confident on hard cases and uncertain on easy ones
Teach the frustration nuance:
If issue is straightforward and customer is frustrated: acknowledge frustration, offer resolution
Only escalate if customer REITERATES their preference for a human after you offer help
But if customer explicitly says "I want a human": escalate immediately, no investigation first
Teach ambiguous customer matching:
Multiple customers match a search query
Ask for additional identifiers (email, phone, order number)
Do NOT select based on heuristics (most recent, most active)
TASK STATEMENT 5.3: ERROR PROPAGATION
Teach structured error context:
Failure type (transient, validation, business, permission)
What was attempted (specific query, parameters used)
Partial results gathered before failure
Potential alternative approaches
Teach the two anti-patterns:
Silent suppression: returning empty results marked as success. Prevents any recovery.
Workflow termination: killing the entire pipeline on a single failure. Throws away partial results.
Teach access failure vs valid empty result:
Access failure: tool could not reach data source. Consider retry.
Valid empty result: tool reached source, found no matches. No retry needed. This IS the answer.
Teach coverage annotations:
Synthesis output should note which findings are well-supported vs which areas have gaps
"Section on geothermal energy is limited due to unavailable journal access" is better than silently omitting it
TASK STATEMENT 5.4: CODEBASE EXPLORATION
Teach context degradation:
Extended sessions: model starts referencing "typical patterns" instead of specific classes it discovered earlier
Context fills with verbose discovery output and loses grip on earlier findings
Teach mitigation strategies:
Scratchpad files: write key findings to a file, reference it for subsequent questions
Subagent delegation: spawn subagents for specific investigations, main agent keeps high-level coordination
Summary injection: summarise findings from one phase before spawning subagents for the next
/compact: reduce context usage when it fills with verbose discovery output
Teach crash recovery:
Each agent exports structured state to a known file location (manifest)
On resume, coordinator loads manifest and injects into agent prompts
TASK STATEMENT 5.5: HUMAN REVIEW AND CONFIDENCE CALIBRATION
Teach the aggregate metrics trap:
97% overall accuracy can hide 40% error rates on a specific document type
Always validate accuracy by document type AND field segment before automating
Teach stratified random sampling:
Sample high-confidence extractions for ongoing verification
Detects novel error patterns that would otherwise slip through
Teach field-level confidence calibration:
Model outputs confidence per field
Calibrate thresholds using labelled validation sets (ground truth data)
Route low-confidence fields to human review
Prioritise limited reviewer capacity on highest-uncertainty items
TASK STATEMENT 5.6: INFORMATION PROVENANCE
Teach structured claim-source mappings:
Each finding: claim + source URL + document name + relevant excerpt + publication date
Downstream agents preserve and merge these mappings through synthesis
Without this, attribution dies during summarisation
Teach conflict handling:
Two credible sources report different statistics
Do NOT arbitrarily select one
Annotate with both values and source attribution
Let the consumer decide
Teach temporal awareness:
Require publication/data collection dates in structured outputs
Different dates explain different numbers (not contradictions)
Teach content-appropriate rendering:
Financial data: tables
News: prose
Technical findings: structured lists
Do not flatten everything into one uniform format
DOMAIN 5 COMPLETION
6-question practice exam. Score. 5+/6 to pass. Build exercise: "Build a coordinator with two subagents. Implement persistent case facts block. Simulate a timeout with structured error propagation. Test with conflicting sources and verify the synthesis preserves attribution."
用来学习该做什么:做一个协调器带两个子智能体。模拟一次超时。验证协调器能拿到结构化错误上下文并在部分结果基础上继续推进。再用冲突来源做测试。
Anthropic 推荐学习路径:
1:Building with the Claude API
2:Introduction to Model Context Protocol
3:Claude Code in Action
4:Claude 101
现在就去成为一个“未认证的 Claude 架构师”(如果你是合作伙伴也可以去拿证),不管怎样——开干吧!
