Stop treating agents as scripts: what you have to ship is a runtime that can survive

An agent's ceiling is set not by how well it can "think" but by how well your runtime can control, isolate, recover, and audit.

2026-03-02

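Key points

  • "Build is fast; Serve is the graveyard." With so many frameworks around, wiring up a prompt + tools loop is no longer a skill. What actually kills products is Serve: multi-user, multi-session, streaming, background execution, retry semantics, and rate limits and outages in external dependencies. Until those are covered, a demo never reaches production.
  • The six pillars are the minimum bar for production, not decoration. The value of the Durability / Isolation / Governance / Persistence / Scale / Composability checklist is that you can use it right away for a gap analysis of how far you are from launch, instead of sinking more time into prompt tuning.
  • Governance will go from "optional" to "the product itself." An agent that can take real actions must support switching among three states: execute automatically, ask for information (elicitation), and wait for approval. Without a runtime that can pause and resume, governance is reduced to the self-deception of "just add a popup."
  • The author is right, but he is also selling something: beware of a "platform narrative" that maxes out complexity in one go. "Agents are distributed systems" is directionally correct, but what many teams actually need is idempotent tools + explicit retry semantics + a task queue + audit logs + layered permissions. Fix the worst incident points first, then decide whether you need a whole platform.
  • The missing teeth: observability and evaluation. The article says almost nothing about tracing/metrics/SLOs, regression evaluation (an eval harness), cost and latency budgets, or security and compliance details, and these often decide whether you can scale more than "one more pillar noun" does.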

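How this relates to us

  • 🧠 Neta. What it means: if agents are going to touch real user data and actions (content production, social interaction, payments/refunds, and so on), isolation and governance come first. Next step: tier tools by risk (free-run / user-approve / admin-signoff), and start by making every irreversible or high-risk tool auditable and approval-gated.
  • 🪞 Uota. What it means: design the agent runtime as a "distributed workflow engine" (state machine, pause/resume, idempotency, retry policy). Next step: record every tool call as an event (with input, output, and a side-effect flag), and at minimum reach "failures can be replayed, and a replay never charges twice."
  • 👤 ATou. What it means: becoming someone who "directs AI" is not about writing prompts; it is about designing constraints and recovery. Next step: write a one-pager for the 5 tools you use most, covering permission level, idempotency, retry policy, audit fields, and failure fallback.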
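
As a rough illustration of "a replay never charges twice", here is a minimal sketch of a tool-call event log keyed by an idempotency key. Everything in it (`ToolEvent`, `EventLog`, `issue_refund`, the in-memory dict) is hypothetical and not taken from the article or from any framework. It also assumes the key is derived only from the tool name and arguments; a real system would likely add a run or step identifier so that two legitimate refunds with identical arguments are not collapsed into one.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ToolEvent:
    """One recorded tool call (hypothetical schema, not from the article)."""
    tool: str
    args: dict
    has_side_effect: bool
    result: object = None


@dataclass
class EventLog:
    """In-memory stand-in for a durable event store."""
    events: dict = field(default_factory=dict)

    @staticmethod
    def key(tool: str, args: dict) -> str:
        # Same tool + same arguments -> same idempotency key.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def call(self, tool: str, fn, args: dict, has_side_effect: bool):
        k = self.key(tool, args)
        if has_side_effect and k in self.events:
            # Replay: return the recorded result instead of re-running.
            return self.events[k].result
        result = fn(**args)
        self.events[k] = ToolEvent(tool, args, has_side_effect, result)
        return result


charges = []

def issue_refund(order_id: str, amount: int) -> str:
    charges.append((order_id, amount))  # stands in for the real side effect
    return f"refunded {amount} for {order_id}"


log = EventLog()
first = log.call("issue_refund", issue_refund, {"order_id": "A1", "amount": 30}, True)
replayed = log.call("issue_refund", issue_refund, {"order_id": "A1", "amount": 30}, True)
```

The second call returns the recorded result and `charges` still holds a single entry, which is the property the one-pager's "idempotency" and "audit fields" rows are meant to pin down.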

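Discussion starters

  1. What is your most realistic incident risk right now: data leakage, an erroneous action, or runaway cost? Which pillar is really your "first pillar"?
  2. Could the line "agents are distributed systems" push a team into over-engineering too early? What metrics should decide when to move from a monolith to a queue to a multi-tenant platform?
  3. Where do you draw the governance boundary without killing the experience: which actions can run automatically by default, and which must be approved? Who defines your "authorization matrix"?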

Note: this post is about building your own agents (agentic software engineering), not about using coding agents.

By now you've probably used a few agents, or at least heard of Claude Code, Codex, or OpenClaw. Ever wondered what it takes to build your own?

Most people think of agents as prompts + tools in a loop. That's a reasonable assumption, but it's not production architecture.

The moment your agent needs to know who it's talking to, maintain state, handle concurrent requests, take sensitive actions, and survive failing tool calls, it stops being an "LLM + tools" and becomes a distributed system.

Building agents is the easy part. There are 75 frameworks that help you do that. The hard part is the runtime: the harness around the agent that makes it work in the real world. That's what agentic software engineering is all about.

Build. Serve. Connect.

Here's how I think about shipping agentic software.

  1. Build the agent. Define the model, tools, knowledge base, memory, storage, and guardrails. This is the layer that most frameworks give you.

  2. Serve it as an API. User-scoped, session-scoped, horizontally scalable. Add persistent storage, streaming, background execution, retry semantics. This is where most agentic products stall. Not because the agent doesn't work, but because it doesn't have the infrastructure around it to work reliably at scale.

  3. Connect it to where users live. Your product, Slack, Discord, MCP, wherever. An agent in a notebook is an experiment. An agent where your users are is a product.
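
To make "user-scoped, session-scoped" in step 2 concrete, here is a minimal sketch of a conversation store that cannot be read without both identifiers. The `SessionStore` class and its methods are hypothetical names for illustration, and the in-memory dict stands in for whatever database a real service would use.

```python
from collections import defaultdict


class SessionStore:
    """In-memory sketch; a real service would back this with a database."""

    def __init__(self) -> None:
        self._messages: dict[tuple[str, str], list[str]] = defaultdict(list)

    def append(self, user_id: str, session_id: str, message: str) -> None:
        self._messages[(user_id, session_id)].append(message)

    def history(self, user_id: str, session_id: str) -> list[str]:
        # Both keys are required; there is no "all sessions" read path.
        return list(self._messages.get((user_id, session_id), []))


store = SessionStore()
store.append("alice", "s1", "Where is my order?")
store.append("bob", "s1", "Cancel my plan.")
```

Two users can reuse the same session id without seeing each other's history, because the user id is part of the key rather than an optional filter applied afterwards.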

The 6 Pillars of Agentic Software

Building an agent is AI engineering. Running it in production is software engineering. Together, they form agentic software engineering: the practice of building, running, and scaling agents as production services.

Here are the six pillars that hold it up:

  1. Durability. Agents reason across multiple steps, call tools that time out, and fail halfway through. If your agent crashes on step 12 of 15, restarting might duplicate a side effect or lose critical context. Agentic software needs to pause, resume, checkpoint, and recover gracefully. Durability turns failure into resumption, not a full restart.

  2. Isolation. Agentic software serves thousands of users simultaneously. Each user needs their own session, their own memory, their own context. Passing a user_id with each request is easy. Isolating every resource the agent touches is where the engineering comes in. Your database, your vector store, your model provider, all need to respect user boundaries. One missing filter becomes a data breach.

  3. Governance. Agents that can act can also cause damage. Looking up a record is harmless. Deleting a record or issuing a refund needs approval. Agentic software needs layered authority: what runs automatically, what needs human approval, and what needs admin sign-off. Today, most agents auto-execute with minimal oversight. As they get more capable, governance becomes the product.

  4. Persistence. An agent without persistent storage can't learn, can't build context, can't improve. We need to store sessions, memory, knowledge in a database. Persistent state is what turns a chatbot into a product. Every conversation makes the next one better.

  5. Scale. A thousand users hit your agent at the same time. Requests queue, you hit model rate limits, and tool calls compete for resources. Traditional services call your own backends. Agentic software calls external model APIs and third-party tools, which means you inherit their rate limits, latency, and downtime. Scaling agentic software means scaling around dependencies you don't control.

  6. Composability. When an agent is a service, other agents can call it. Your frontend can call it. Your Slack bot can call it. MCP clients can discover it. It becomes a building block in your architecture, and every new integration becomes a standard API call. That's how single-agent tools become multi-agent systems.
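
The durability pillar can be sketched with a small checkpointing loop. This is an assumption-laden illustration, not the article's implementation: `run_with_checkpoint`, the JSON file format, and the simulated timeout are all made up for the example, and only the index of the next step is persisted.

```python
import json
import os
import tempfile


def run_with_checkpoint(steps, path):
    """Run steps in order, persisting the index of the next step to run."""
    start = 0
    if os.path.exists(path):
        with open(path) as f:
            start = json.load(f)["next_step"]
    for i in range(start, len(steps)):
        steps[i]()  # may raise; the checkpoint then still points at step i
        with open(path, "w") as f:
            json.dump({"next_step": i + 1}, f)


executed = []
crash_once = {"armed": True}

def make_step(n):
    def step():
        # Hypothetical failure: step 2 times out the first time it runs.
        if n == 2 and crash_once["armed"]:
            crash_once["armed"] = False
            raise TimeoutError("tool call timed out")
        executed.append(n)
    return step

steps = [make_step(n) for n in range(4)]
checkpoint = os.path.join(tempfile.mkdtemp(), "run.json")

try:
    run_with_checkpoint(steps, checkpoint)  # fails at step 2
except TimeoutError:
    pass
run_with_checkpoint(steps, checkpoint)      # resumes at step 2
```

Steps 0 and 1 run once, not twice. A crash between a step's side effect and the checkpoint write would still re-run that step, so checkpointing reduces the need for idempotent tools but does not remove it.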

None of this is new. We've been building reliable distributed systems for decades. The AI industry just hasn't brought those lessons along yet, and we're feeling it in every failed deployment.
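
One of those long-standing lessons, relevant to the Scale pillar's "dependencies you don't control", is retrying transient failures with exponential backoff and jitter. The sketch below is a generic version of that technique; `call_with_backoff`, its parameters, and `flaky_model_call` are hypothetical, and the choice of which exceptions count as transient is an assumption you would tune per provider.

```python
import random
import time


def call_with_backoff(fn, *, attempts=5, base_delay=0.5, max_delay=8.0,
                      retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call fn(), retrying transient errors with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # "full jitter" spreads out retries


calls = {"n": 0}

def flaky_model_call():
    # Stand-in for an external model API that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream rate limit or timeout")
    return "ok"

waits = []
result = call_with_backoff(flaky_model_call, sleep=waits.append)
```

Passing `sleep=waits.append` keeps the example instant and makes the delays inspectable; in a service you would leave the default `time.sleep` or use an async equivalent.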

From Theory to Practice

As always, I come bearing code. Here's how you can start building your own agentic service today.

The template gives you a containerized service with persistent storage (Postgres), two starter agents (a knowledge agent using Agentic RAG and an MCP agent for external tool use), and a REST API you can connect to from anywhere.

I'm using Docker for this template because Docker runs everywhere: your laptop, AWS, GCP, Azure, Railway. The same container you develop locally is the one you deploy to production. The README covers everything you need to get started.

After running the service:

  1. Open http://localhost:8000/docs to see your API.

  2. Connect to the web UI at os.agno.com where you can chat with your agents, trace runs, manage knowledge, create schedules and approve sensitive tool calls. One UI for your agentic software.

https://docs.agno.com/deploy

Adding your own agent is a few lines of Python and a restart. Swap models with a one-line change. Add tools from 100+ integrations. The template is a starting point. Read the Agno docs to learn more.

Governance & Elicitation

Most agents run tool calls with minimal oversight or auditability. In practice, we need layered authority:

  1. Tools that run freely

  2. Tools that need user approval

  3. Tools that need admin approval
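
The three tiers above can be expressed as a small registry that a runtime consults before executing a tool. This is a framework-agnostic sketch and does not use Agno's API: `Tier`, `tool`, `TOOL_TIERS`, `may_run`, and the example tools are hypothetical names, and treating unknown tools as admin-only is a design assumption.

```python
from enum import Enum


class Tier(Enum):
    FREE = "free"
    USER_APPROVAL = "user_approval"
    ADMIN_APPROVAL = "admin_approval"


TOOL_TIERS = {}  # tool name -> Tier

def tool(tier):
    """Register a function as a tool with a required authority tier."""
    def decorator(fn):
        TOOL_TIERS[fn.__name__] = tier
        return fn
    return decorator


@tool(Tier.FREE)
def lookup_order(order_id):
    return {"order_id": order_id, "status": "delivered"}

@tool(Tier.USER_APPROVAL)
def update_address(order_id, address):
    return f"address for {order_id} set to {address}"

@tool(Tier.ADMIN_APPROVAL)
def issue_refund(order_id, amount):
    return f"refunded {amount} for {order_id}"


def may_run(tool_name, approvals):
    """approvals is the set of roles that have signed off on this call."""
    tier = TOOL_TIERS.get(tool_name, Tier.ADMIN_APPROVAL)  # unknown -> strictest
    if tier is Tier.FREE:
        return True
    if tier is Tier.USER_APPROVAL:
        return bool(approvals & {"user", "admin"})
    return "admin" in approvals
```

For example, `may_run("lookup_order", set())` is allowed, while `may_run("issue_refund", {"user"})` is not until an admin signs off.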

Agents also need to ask questions (often called elicitation). The Claude Code team shared a great article on the AskUserQuestion tool used by Claude.

This is available in Agno as UserFeedbackTools. Here's a support agent that can look up orders freely, ask the customer structured questions when it needs more information, and wait for user approval before issuing a refund:

https://docs.agno.com/

Watch what happens when a customer asks for a refund.

  • The agent looks up the order on its own, no permission needed.

  • Then it hits a decision point: why does the customer want the refund?

  • Instead of guessing, it presents a structured question with clear options: defective, wrong item, changed mind.

  • The customer picks one. Now the agent calls the refund tool, but because refunds carry real consequences, it pauses for user approval.

  • Once approved, the agent runs the refund tool.
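
The act / ask / wait sequence above can be modelled as a run that pauses at each human decision point. The sketch below uses a plain Python generator for that; it is not Agno's UserFeedbackTools or approval API, and `Ask`, `NeedApproval`, `refund_flow`, and the hard-coded order are hypothetical. A production runtime would persist the paused state rather than hold it in memory.

```python
from dataclasses import dataclass


@dataclass
class Ask:
    question: str
    options: list

@dataclass
class NeedApproval:
    tool: str
    args: dict


def refund_flow(order_id):
    """Generator: each yield is a point where the run pauses for a human."""
    order = {"order_id": order_id, "total": 30}  # act: free lookup (stubbed)
    reason = yield Ask("Why do you want a refund?",
                       ["defective", "wrong item", "changed mind"])
    approved = yield NeedApproval("issue_refund",
                                  {"order_id": order_id, "amount": order["total"]})
    if not approved:
        return f"refund for {order_id} was declined"
    return f"refunded {order['total']} for {order_id} ({reason})"


run = refund_flow("A1")
first_pause = next(run)               # runs the lookup, pauses on the question
second_pause = run.send("defective")  # resumes, pauses for approval
try:
    run.send(True)                    # approval granted; the run finishes
except StopIteration as done:
    outcome = done.value
```

The caller drives the transitions: it inspects what the run is waiting for, collects the answer or approval from the right person, and sends it back in.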

Three levels of agency in one conversation. You can view the full code here.

https://x.com/trq212/status/2027463795355095314

The agent knows when to act, when to ask, and when to wait. That's what governance looks like in practice. The runtime has to support all three modes, and the transitions between them have to feel natural.

Note: the approvals flow on the UI is actively being developed. The refund should wait for admin approval, not user approval. This is implemented on the SDK but not the UI yet. This is being fixed this week.

Agents are distributed systems

The 5 Levels describe how agentic software grows in capability (and complexity). The 7 Sins describe how it fails in production. The 6 Pillars describe what it takes to build it right.

The consistent message across all three: agentic software engineering is a discipline. The teams that internalize this early will ship great products. The teams that keep treating agents as scripts will continue to miss the mark.

Clone the repo. Build your first agent. Ship it where your users are.