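🧠 阿头学 · 💬 Discussion topics

The harness is not a bolt-on; it should be folded directly into the backend

The most valuable judgment in this article is that "an agent harness should not exist independently of the backend". But the author packages that directional claim almost directly as the inevitable case for their own product, iii, and the argument clearly runs ahead of the evidence.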

2026-05-01

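Key points

  • The real dispute is not the model but the infrastructure boundary: The author argues that the key question for AI systems today is not which vendor's model to pick, but how much "orchestration, tools, memory, state, and error handling" to encode into the system. That judgment is right: however capable the model, it does not automatically solve the debuggability and governability of a production-grade system.
  • Thin versus thick harness is at bottom an allocation of control: The article places Anthropic, OpenAI, CrewAI, and LangGraph on one spectrum, and the division holds: the thinner the harness, the more trust goes to the model; the thicker, the more goes to explicit logic. The author goes further and claims these are just "different patterns over the same backend primitives", which is suggestive but not yet adequately demonstrated.
  • Treating the agent as an ordinary execution unit is the most defensible part of the piece: The author argues an agent should be treated as a worker, like an API service, a queue consumer, or a crawler, sharing the same trigger, state, trace, and handoff mechanisms. This is a strong claim because it hits the biggest problem in today's agent systems: the split between the AI orchestration layer and the business execution layer.
  • "Everything is a worker" is an elegant abstraction, but the complexity does not disappear: The article claims that three primitives (worker, trigger, function) can flatten infrastructure categories, and the abstraction is indeed concise. But a unified abstraction is not a unified runtime semantics: queues, HTTP, sandboxes, browsers, and IoT devices still have entirely different failure modes, permission models, and SLAs. Here the author clearly overstates.
  • iii reads more like a product worldview than a validated industry conclusion: The article lines up the industry problem, the architectural abstraction, and iii's product capabilities seamlessly, and the narrative is skillful. But it is closer to "defining the battlefield and declaring oneself the best fit" than to "proving with data that one is the best".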

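Relevance to us

  • What it means for ATou and how to use it next: If ATou is evaluating agent products or AI infra, the "prompt layer" and the "backend layer" should no longer be assessed separately; look first at whether state, retries, traces, and permissions are unified. As a next step, re-decompose the existing agent stack by "who is responsible for function / trigger / worker" and see where the complexity actually piles up.
  • What it means for Neta and how to use it next: Neta can use this article directly as a framework for architectural judgment: the more a system treats agents as a special species, the harder it is to govern in the long run. As a next step, use the "three-primitive model" to review current product designs, separating the complexity the business genuinely requires from pseudo-complexity created merely by historical tool categories.
  • What it means for Uota and how to use it next: When discussing agent deployment, Uota should not just argue "LangGraph or a native loop", but ask whether the agent call chain enters unified observability and unified execution semantics. As a next step, use this article as a debate motion: do unified primitives reduce complexity, or do they hide it in the lower layers?
  • What it means for investment judgment and how to use it next: The article points to a direction worth betting on: the moat in agent infra may lie not in the planner but in a unified execution context. When evaluating projects, press on whether they are "adding one more layer of AI orchestration" or "genuinely rewriting the backend's default interface".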

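Discussion prompts

1. In the claim that "the harness is the backend", which parts are a trend and which are just iii's sales pitch?
2. Does abstracting everything into a worker actually reduce system complexity, or does it mask the real cost of different runtime semantics?
3. If models get stronger and agents need less explicit orchestration, will this kind of unified runtime matter more, or will it lose its value?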

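The most important architectural question in AI infrastructure right now is not which model to use. The real question is how much infrastructure it takes to build something useful with it.

Anthropic, OpenAI, CrewAI, and LangChain all call that wrapping the agent harness. The harness includes the orchestration loop, tools, memory, context management, and the error handling that makes a model genuinely useful. They all agree on one point: the model is not the product. The infrastructure is. Where they disagree is how much of that infrastructure there should be.

Anthropic keeps its harness thin. It is an elegant loop: assemble the prompt, call the model, execute tool calls, and repeat. The model makes every decision. OpenAI adds more structure: instruction stacks, orchestration modes, and explicit handoff patterns. CrewAI takes a multi-pronged approach, with deterministic Flows for routing and validation and autonomous agents for the rest. LangGraph has the largest harness. Every decision is a node, every state transition is a defined edge, and the entire workflow is encoded in the harness.

This is a continuous spectrum, from strongly trusting the model and weakly encoding logic, to trusting the model less and strongly encoding logic. Every team building systems with agents has to decide how large a harness it needs.

But buried in this debate is a premise almost nobody questions: everyone assumes the harness is something outside the traditional backend.

The agent's loop, tools, and memory live in one layer, the harness. Queues, state, HTTP routing, server-side rendering, observability, and every other backend component live in another layer, the so-called backend.

This distinction is only temporary. It is just a small step before agentic infrastructure is truly absorbed into the backend.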

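How agents work today

This is how most agent architectures operate today. The harness is a Python process, or TypeScript, or some managed framework, that wraps the model. When the agent decides to act, the harness translates a tool call into an HTTP request, and that request triggers something in the backend, such as publishing a message to a queue or writing to a database. The backend is its own world, always separate from the agent.

The harness retries on its own schedule, the queue retries on its own conditions, and the HTTP layer handles its own timeouts. No trace directly connects them. When the system breaks, debugging means correlating logs across systems and piecing the observed behavior back together bit by bit. That is common in backend engineering, but earlier systems were largely deterministic, while agents are stochastic at best.

With every additional agent, the probability space grows. At the most basic level, it scales as agents^2 * services. Put another way, 1 agent plus 5 backend systems means 5 stochastic paths to debug; 4 agents plus 5 backend systems means 80.

There is no good way to make agents more deterministic. Much of their basic capability is meant to give different answers to similar or even identical inputs. They are not stochastic by accident but by intention, because that is precisely what makes computers useful in a brand-new way for the first time. The billion-dollar question is how to handle agents well, with the right harness in the right context.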

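Taking a step back

The core promise of today's harnesses is an attempt to run a new paradigm inside an old one, that is, to run stochastic LLMs inside deterministic backends. The problem is not that building an agent harness is inherently wrong; it is that a truly effective solution has to begin by taking apart what a backend is.

Until very recently, most of us, myself included, took the backend and the way it works for granted. Without agents and the LLMs that power them, I probably would never have thought about this question. So I set out to trace things back to the root and work out what the most basic building blocks of a backend are.

At first I thought a backend was a collection of services belonging to different product categories, stitched together with libraries, integrations, architecture diagrams, and orchestration code, and the list kept growing. Eventually I realized I had been thinking about the problem top-down rather than bottom-up. Once that clicked, the backend suddenly became very simple.

A backend consists of three core elements: workers that orchestrate work, triggers that invoke these services, and functions inside the services that do the actual work.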

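Abstracting the backend

Once I understood this, it was clear that I and the very talented people on my team could build a backend on this abstraction. This is not just an exercise on paper. We found the abstraction very useful in the agent world and just as useful more broadly, because it fully encapsulates the backend's execution context. So we built iii to make the abstraction available to everyone.

iii works exactly as described above.

It defines a Function as a unit of work with a stable identifier, such as orders::validate, that receives input and can optionally return output. It can live in any process and be written in any language.

A Trigger is what makes a function run. It can be a direct function call, an HTTP endpoint, a cron schedule, a queue subscription, a state change, a stream event, or anything else. Triggers are declarative. The worker only has to say that this function runs when this thing happens, and iii handles the routing, serialization, and delivery.

A Worker is any process that connects to the engine and registers functions and triggers.

A TypeScript API service is a worker. A Python ML pipeline is a worker. A Rust microservice is a worker. An agent is a worker too.

This is the idea that changes everything. An agent connects to the engine, registers functions and triggers, persists context through state::set, hands off work through queue-backed triggers, and broadcasts results via pub/sub. It does not call the backend through a separate integration layer. Like everything else, it participates directly in the same system, using the same primitives.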

const iii = registerWorker('ws://localhost:49134', { workerName: 'agentic-backend' })

iii.registerFunction('agents::researcher', async (data) => { // the unit of work
  // Python Worker: requests + duckduckgo-search
  const sources = await iii.trigger({
    function_id: 'web::search',
    payload: { query: data.topic, limit: 10 }
  })
  // Rust Worker: scraper + tokio, fetched in parallel
  const pages = await iii.trigger({
    function_id: 'web::scrape',
    payload: { urls: sources.map(s => s.url) }
  })
  // TypeScript Worker: wraps the OpenAI SDK
  const findings = await iii.trigger({
    function_id: 'llm::summarize',
    payload: { topic: data.topic, documents: pages }
  })
  await iii.trigger({ // Rust Worker: persist to shared state
    function_id: 'state::set',
    payload: { scope: 'research-tasks', key: data.task_id, value: findings }
  })
  iii.trigger({ // TypeScript Worker: hand off to the critic
    function_id: 'agents::critic',
    payload: { task_id: data.task_id },
    action: TriggerAction.Enqueue({ queue: 'agent-tasks' }) // run in the queue
  })
  return findings
})

iii.registerTrigger({ // HTTP entrypoint
  type: 'http',
  function_id: 'agents::researcher',
  config: { api_path: '/agents/research', http_method: 'POST' }
})

iii.registerTrigger({ // also runs on a pending state row
  type: 'state',
  function_id: 'agents::researcher',
  config: { scope: 'research-tasks', condition: 'status == "pending"' }
})

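Three calls. registerFunction defines the work. registerTrigger binds it to the real world, in this example to an HTTP endpoint and a state-change trigger, both mapped to the same function. The researcher can now be invoked with a POST request, and it also fires automatically when a research task enters the pending state. Add one more trigger and it can also run on a cron schedule. The function itself does not change. Triggers compose freely.

The agent stores state with the same trigger() call a payment service uses. It hands work to the critic through the same queue mechanism an order pipeline uses. The agent's tools are functions. Its memory is state. Its orchestration is triggers and composition. No special agent infrastructure is needed here, because there is simply no need for it.

The harness is the backend.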

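Workers all the way down

This goes a level deeper than agents fitting into the backend. The key is what iii treats as a primitive, and what happens when a single primitive, in a few lines of code, answers every question.

On most platforms, each additional capability is an additional category. Need a queue? Go evaluate queue products. Need stream processing? That is a different product. Need a sandbox? Yet another. Each has its own internals, its own lifecycle, and its own way of integrating. The platform is a product catalog, and your job is to pick from it and put the pieces together.

In iii, the answer to almost every question is the same: add a worker, and let it register triggers and functions.

Want a sandbox? Add a worker. Want an agent that researches topics? Add a worker. Want real-time stream processing? Add a worker. Want go-to-market capabilities such as lead scoring, email sequences, and CRM sync? Add a worker. Want cron scheduling? It is already a worker. Want observability? Also already a worker.

A worker connects to the system and registers what it can do, and the system absorbs it: live, discoverable, observable. The answer does not change with the kind of capability you add. It does not change with the language. It does not change with whether it is infrastructure or business logic. It does not change with whether a human wrote it or an agent created it. Add a worker.

This is not just architectural uniformity; it is the collapse of the categories themselves. In traditional systems, every capability lives in its own separate ontology. Queues have broker semantics, HTTP has routing semantics, cron has scheduling semantics, and agents have orchestration semantics. In iii they all become the same thing: a process that registers functions and triggers. The semantics live in the functions, not in the infrastructure.

Paradigm shifts in software never come from adding features; they come from flattening categories. "Everything is a file" gave Unix its composability. "Components are functions" made React's mental model stick. In iii the answer is always to add a worker. That is the primitive. That is the whole model.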

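A living system

Because everything is a worker, three properties emerge naturally that traditional architectures cannot deliver.

Live discovery. When a worker connects, it receives the full catalog of functions registered on all other workers. When a new function appears, every worker is notified. When a worker disconnects, every worker is notified as well. The engine is the single source of truth.

For agents, this is also cognitive infrastructure. An agent can see exactly what the whole system can do at this moment. There is no risk of an agent receiving stale context.

Live extensibility. You can add new workers and capabilities to a running iii system without redeploying or redrawing the architecture diagram. No configuration changes and no restarts are needed, because the system extends at runtime.

This is how agent systems actually want to operate. Adding a new capability never requires interrupting production. Connect a new worker and its functions are distributed across the system; any agent able to use them can call them directly, and agents can even extend the system with new workers of their own.

Live observability. iii's observability is built on OpenTelemetry. Every function invocation carries a trace ID. Every trigger() call propagates it across workers, across languages, and across queue handoffs. Every log emitted through the iii Logger is automatically correlated with the current trace and span, output as structured OpenTelemetry LogRecords, and routed to whichever backend you use, such as iii Console, Grafana, Jaeger, or Datadog. This is not a component that has to be installed and integrated separately; it is just another worker. Traces, metrics, and structured logs are produced by the engine itself rather than patched in by application-level middleware.

When an agent calls a tool, the tool puts a message on a queue, the message triggers a downstream function, and that function writes its result to state, the whole chain is one trace. Not three independent systems that can only be tied together by timestamp correlation or manually tracked trace IDs, but one trace, across languages, across workers, and across the boundary between agent and backend. You can jump from a slow waterfall span straight to the correlated logs that explain what happened.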

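Agents that create workers

This is where the model becomes truly recursive.

iii supports hardware-isolated microVM workers. The sandbox capability is itself a worker, with its own filesystem, network stack, and process tree. You can create a worker with a single command, iii worker add ./my-worker. The sandbox worker connects to the engine, registers functions and triggers, and participates in the system like every other worker.

Now consider what happens when an agent can do this too.

An agent worker can also spin up a new sandbox worker at runtime. The sandbox gets its own isolated environment. It registers its own functions and triggers. Those functions appear in the live catalog immediately. Other agents and services can call them. When the sandbox is no longer needed, it disconnects and its functions are unregistered along with it.

The sandbox is not a separate sandbox product. It is just a worker that uses the same primitives as everything else and happens to provide hardware isolation. An agent creating a sandbox worker is, in essence, just one worker creating another.

This is what it looks like when infrastructure becomes a design pattern rather than a product category. Need isolated execution for untrusted code? That is a sandbox worker. Need a temporary specialist agent? Spin up a worker, register functions, and shut it down when done. Need a fleet of parallel task executors? Have one worker spin up others. The primitive stays the same; the pattern varies.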

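The distinction disappears

Return to the harness debate. Anthropic says thin. LangGraph says thick. They are arguing over how much cognitive structure to encode around the model. Thin versus thick does matter, but it is a question inside the design space, not a question about the design space itself.

When agents are workers, thin versus thick comes down to how many functions you register and how you compose them. A thin harness is an agent worker with only a few functions that lets the model decide what to trigger() next. A thick harness is an agent worker with more functions, explicit approval gates, and conditional logic that exerts control before the next step is enqueued. The primitives and the system are the same; only the pattern differs.

The scaffolding metaphor shifts as well. When the industry talks about harness scaffolding, it treats it as temporary: once models get stronger, you take it down. Manus has said it rebuilt Claude's agent framework four times, each rewrite prompted by finding a better way to shape context. Claude Code is likewise dropping planning steps as new models absorb the capability.

If the harness itself is built from the same primitives as the rest of the backend, then taking down scaffolding just means simplifying a function. You do not have to rearchitect an entire integration layer. You do not have to rebuild the interface between two systems. You only register fewer functions, or compose them differently.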

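Anything can be a worker

A worker can be anything that can open a WebSocket, register a function, and speak the primitives interface. It is not restricted by type of thing or by language.

iii provides SDKs for TypeScript, Python, and Rust. But those are not the boundary of the system. They are just three implementations of an open wire protocol, JSON over WebSocket. The engine does not know what language is on the other end of the connection. What it sees is functions, triggers, and a connection. If your team uses Go, Java, Swift, or Zig, write a small SDK that speaks the protocol and you become a first-class citizen of the system. The primitives interface is the contract. Everything else is just a design pattern.

This means the set of things that can be a worker is genuinely unbounded. A Node.js service. A Python ML pipeline. An agent. A queue. A sandbox running in a microVM. A browser. iii also provides a browser SDK, so a tab on someone's laptop can register functions, take part in live discovery, call backend functions, and be called by backend functions in return. A browser enters the system the same way a Kubernetes pod does.

A Raspberry Pi is a worker. An IoT sensor at the edge is a worker. A phone running a light client is a worker. A CI runner that starts up, registers a function, does its job, and disconnects is a worker too. The engine does not treat them differently. Every new language, device, or runtime that implements the primitives interface gets the whole system for free: live discovery, live extensibility, live observability, durable triggers, and invocation across everything. Not because we built a separate integration for each case, but because the primitives themselves allow this kind of composition.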

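The bet

The whole industry is arguing over how much scaffolding to wrap around the model. That debate matters, but it assumes the harness is a world of its own, separate from the backend and separate from the infrastructure that runs when a tool actually fires.

iii makes a different bet. It holds that if the primitives are small enough and general enough, namely worker, trigger, and function, then the answer to the question of what can take part in the system becomes: anything. A cloud service. An agent. A browser. A microcontroller. A sandbox an agent has just spun up. They all compose the same way. They can all discover one another. They all share the same trace.

When you stop viewing agent infrastructure and backend infrastructure separately, and stop treating any kind of participant as an architecturally distinct category, the system simplifies in a way that merely adding features never can. The boundaries between harness and backend, between cloud and edge, between infrastructure and application, and between human-written services and agent-created workers all dissolve into the same three primitives.

The harness is not a layer sitting on top of the backend. The harness is part of the backend. And the backend is anything connected to iii.

Once the primitives are chosen well, the categories collapse and the complexity is flattened substantially.

iii is open source. You can start with our quickstart.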

The most important architectural question in AI infrastructure right now isn’t which model to use. It’s how much infrastructure is required to build something useful with it.

Anthropic, OpenAI, CrewAI, LangChain all call that wrapping the agent harness. The harness includes the orchestration loop, tools (MCP, A2A), memory, context management, and error handling that make a model useful. They all agree the model isn’t the product. The infrastructure is. They disagree deeply on how much of it should exist.

Anthropic keeps their harness thin. It’s an elegant loop: assemble the prompt, call the model, execute tool calls, and repeat. The model decides everything. OpenAI adds more structure: instruction stacks, orchestration modes, and explicit handoff patterns. CrewAI takes a multi-pronged approach: deterministic Flows for routing and validation, autonomous agents for the rest. LangGraph has the biggest harness. Every decision is a node, every transition a defined edge, the entire workflow encoded in the harness.

The spectrum runs from strongly trusting the model and weakly encoding the logic to weakly trusting the model and strongly encoding the logic. And every team building with agents has to choose what size of harness they need.

But there’s an assumption buried in the debate that nobody is questioning: that the harness is extrinsic to the traditional backend.

The agent’s loop, its tools, and its memory live in one layer: the harness. Execution infrastructure such as queues, state, HTTP routing, server-side rendering, observability, and all other backend components lives in another: “the backend”.

I believe that this is temporary and it’s just a small step along the way to true adoption and acceptance of agentic infrastructure into “the backend”.

How Agents Work Today

Here’s how most agentic architectures work. The harness is a Python process (or TypeScript, or a managed framework) that wraps the model. When the agent decides to act, the harness translates a tool call into an HTTP request which in turn triggers something to happen on the backend like a queue publish or a database write. The backend is its own world that is kept separate from the agents.

The harness retries on its own schedule, the queue retries on its own conditions, and the HTTP layer manages its own timeouts. There is no trace directly connecting these disparate systems. When something breaks, debugging means correlating logs across systems and reconstructing observed behavior. This is a common process in backend engineering, but whereas prior systems were largely deterministic, agents are stochastic at best.

With every additional agent the space of possible behaviors widens; at the most basic level the number of paths to debug scales as agents^2 * services. Put another way, 1 agent and 5 backend systems is 5 stochastic paths to debug. 4 agents and 5 backend systems are 80 stochastic paths to debug.
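
The arithmetic behind that estimate can be checked with a few lines of TypeScript. This is only a sketch of the article's back-of-the-envelope formula; the function name stochasticDebugPaths is made up for illustration and does not come from any library.

```typescript
// Back-of-the-envelope count from the text: agents^2 * services.
// The function name is illustrative, not part of any library.
function stochasticDebugPaths(agents: number, services: number): number {
  if (agents < 0 || services < 0) {
    throw new RangeError("agents and services must be non-negative");
  }
  return agents * agents * services;
}

const oneAgent = stochasticDebugPaths(1, 5);   // 1 * 1 * 5 = 5
const fourAgents = stochasticDebugPaths(4, 5); // 4 * 4 * 5 = 80
console.log(oneAgent, fourAgents);
```

Because the agent count is squared, doubling the number of agents quadruples the estimate, while doubling the number of services only doubles it.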

There is no good way to make agents more deterministic; much of their basic functionality is intended to give varied answers for similar and even identical inputs. They’re not stochastic by chance, they’re stochastic by intention, because that is what makes computers useful in a brand new way. The billion dollar question is how to handle agents properly by creating the correct harnesses in the correct contexts.

Taking a Step Back

The fundamental promise of harnesses today is that they are trying to operate a new paradigm (stochastic LLMs) within an old one (deterministic backends). It’s not that construction of agent harnesses is inherently wrong, it’s that effective solutions must begin with the deconstruction of what a backend is.

Most of us have up until very recently taken the backend and how it works for granted; including myself. Without agents and the LLMs that power them I probably would have never thought about this problem before. So I embarked on a journey to figure out the fundamental building blocks of a backend.

At first I thought that backends are collections of services that exist in categories of products and are assembled with libraries, integrations, architecture diagrams, orchestration code, and the list kept growing. Eventually I realized that I was approaching this problem from the top down instead of from the bottom up. Once I realized that, the backend became very simple:

A backend is composed of three essential elements: workers that orchestrate work, triggers that invoke these services, and functions within the services that do the actual work.

Abstracting the Backend

Once I realized this, it became clear that I, and my very talented team, could build a backend using this abstraction. Far from being an academic exercise, this abstraction has very real utility both in the agentic world and more broadly, because it completely encapsulates the execution context of “the backend”. So we built iii to make that abstraction available to everyone.

iii works just like my description above:

It defines a Function as a unit of work with a stable identifier (e.g. orders::validate) that receives input and optionally returns output. It can live in any process and be written in any language.

A Trigger is what causes a function to run; it can be a direct call to a function, an HTTP endpoint, a cron schedule, a queue subscription, a state change, a stream event, or anything else. Triggers are declarative: the worker says “this function runs when this thing happens,” and iii handles routing, serialization, and delivery.

A Worker is any process that connects to the engine and registers functions and triggers.

A TypeScript API service is a worker. A Python ML pipeline is a worker. A Rust microservice is a worker. And an agent is a worker.

This is the idea that changes everything. An agent connects to the engine, registers functions and triggers, persists context through state::set, hands off work through queue-backed triggers, and broadcasts results via pub/sub. It doesn’t call “the backend” through a separate integration layer. It participates in the same system, with the same primitives, as everything else.

const iii = registerWorker('ws://localhost:49134', { workerName: 'agentic-backend' })

iii.registerFunction('agents::researcher', async (data) => { // the unit of work
  // Python Worker: requests + duckduckgo-search
  const sources = await iii.trigger({
    function_id: 'web::search',
    payload: { query: data.topic, limit: 10 }
  })
  // Rust Worker: scraper + tokio, fetched in parallel
  const pages = await iii.trigger({
    function_id: 'web::scrape',
    payload: { urls: sources.map(s => s.url) }
  })
  // TypeScript Worker: wraps the OpenAI SDK
  const findings = await iii.trigger({
    function_id: 'llm::summarize',
    payload: { topic: data.topic, documents: pages }
  })
  await iii.trigger({ // Rust Worker: persist to shared state
    function_id: 'state::set',
    payload: { scope: 'research-tasks', key: data.task_id, value: findings }
  })
  iii.trigger({ // TypeScript Worker: hand off to the critic
    function_id: 'agents::critic',
    payload: { task_id: data.task_id },
    action: TriggerAction.Enqueue({ queue: 'agent-tasks' }) // run in the queue
  })
  return findings
})

iii.registerTrigger({ // HTTP entrypoint
  type: 'http',
  function_id: 'agents::researcher',
  config: { api_path: '/agents/research', http_method: 'POST' }
})

iii.registerTrigger({ // also runs on a pending state row
  type: 'state',
  function_id: 'agents::researcher',
  config: { scope: 'research-tasks', condition: 'status == "pending"' }
})

Three calls. registerFunction defines the work. registerTrigger binds it to the world — in this case an HTTP endpoint and a state change reaction, for the same function. The researcher is now callable via a POST request and automatically fires whenever a research task enters a pending state. Add another trigger and it also runs on a cron schedule. The function doesn’t change. The triggers compose.

The agent stores state with the same trigger() call a payment service would use. It hands off to the critic through the same queue mechanism an order pipeline would use. The agent’s “tools” are functions. Its “memory” is state. Its “orchestration” is triggers and composition. There is no special agent infrastructure because there doesn’t need to be.
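
To make the "tools are functions, memory is state, orchestration is triggers" mapping concrete, here is a toy in-memory model of the idea. It is a sketch under stated assumptions and not the iii SDK: ToyEngine, its methods (registerFunction, trigger, enqueue, drainQueue), and the function IDs web::search, agents::critic, and payments::record as used here are hypothetical stand-ins chosen for illustration.

```typescript
// Toy in-memory model of the three primitives. NOT the iii SDK:
// ToyEngine and all of its method names are hypothetical.
type Payload = Record<string, unknown>;
type Handler = (payload: Payload) => unknown;

class ToyEngine {
  private readonly functions = new Map<string, Handler>();
  private readonly queue: Array<{ functionId: string; payload: Payload }> = [];
  readonly state = new Map<string, unknown>();

  constructor() {
    // State is exposed as an ordinary function, so every worker stores data the same way.
    this.registerFunction("state::set", (p) => {
      this.state.set(`${String(p.scope)}/${String(p.key)}`, p.value);
      return null;
    });
  }

  registerFunction(id: string, handler: Handler): void {
    this.functions.set(id, handler);
  }

  // Direct invocation by stable identifier.
  trigger(functionId: string, payload: Payload): unknown {
    const handler = this.functions.get(functionId);
    if (!handler) throw new Error(`unknown function: ${functionId}`);
    return handler(payload);
  }

  // Queue-backed handoff: record the call now, run it later.
  enqueue(functionId: string, payload: Payload): void {
    this.queue.push({ functionId, payload });
  }

  // Runs all queued jobs and returns how many were executed.
  drainQueue(): number {
    let ran = 0;
    while (this.queue.length > 0) {
      const job = this.queue.shift()!;
      this.trigger(job.functionId, job.payload);
      ran++;
    }
    return ran;
  }
}

const engine = new ToyEngine();

// A "tool" is just a function.
engine.registerFunction("web::search", (p) => [`result for ${String(p.query)}`]);

// The critic is another function, reached through the queue.
engine.registerFunction("agents::critic", (p) =>
  engine.trigger("state::set", { scope: "reviews", key: p.task_id, value: "reviewed" })
);

// The agent: calls a tool, persists its memory, hands off to the critic.
engine.registerFunction("agents::researcher", (p) => {
  const results = engine.trigger("web::search", { query: p.topic });
  engine.trigger("state::set", { scope: "research-tasks", key: p.task_id, value: results });
  engine.enqueue("agents::critic", { task_id: p.task_id });
  return results;
});

// A non-agent service stores state through the very same call.
engine.registerFunction("payments::record", (p) =>
  engine.trigger("state::set", { scope: "payments", key: p.id, value: p.amount })
);

const findings = engine.trigger("agents::researcher", { topic: "harnesses", task_id: "t1" });
engine.trigger("payments::record", { id: "p1", amount: 42 });
const handedOff = engine.drainQueue();
console.log(findings, handedOff);
```

In this sketch the researcher and the payment recorder are registered and invoked identically; nothing in the engine knows which one is an agent. A real engine would add the parts the toy leaves out, such as cross-process delivery, retries, and persistence.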

The harness is the backend.

Workers all the way down

This goes deeper than agents fitting into a backend. It’s about what iii considers a primitive and what happens when one primitive, in just a few lines of code, is the answer to every question.

In most platforms, every new capability is a new category. Need queues? Evaluate queue products. Need streaming? Different product. Sandboxing? Another. Each has its own internals, its own lifecycle, its own integration story. The platform is a catalog. Your job is to shop it and assemble it.

In iii, the answer to almost any question is the same: add a worker, which in turn registers triggers and functions.

I want sandboxing. Add a worker. I want an agent that researches topics. Add a worker. I want real-time streaming. Add a worker. I want go-to-market capabilities like lead scoring, email sequences, CRM sync. Add a worker. I want cron scheduling. It’s already a worker. I want observability. Already a worker.

The worker connects, registers what it can do, and the system absorbs it: live, discoverable, observable. The answer doesn’t change based on what kind of capability you’re adding. It doesn’t change based on language, or whether it’s infrastructure or business logic, or whether a human or an agent is creating it. Add a worker.

This is not just architectural uniformity. It’s a collapse of categories. In traditional systems, every capability lives in its own ontology. Queues have broker semantics, HTTP has routing semantics, cron has scheduling semantics, agents have orchestration semantics. In iii, they are all the same thing: a process that registers functions and triggers. The semantics live in the functions, not in the infrastructure.

Paradigm shifts in software don’t add features. They collapse categories. “Everything is a file” made Unix composable. Components as functions made React’s mental model stick. In iii, the answer is always “add a worker.” That’s the primitive. That’s the whole model.

A live system

Because everything is a worker, three properties emerge that traditional architectures cannot produce:

Live discovery. When a worker connects, it receives the full catalog of every function registered across every other worker. When new functions appear, every worker gets notified. When a worker disconnects, every worker is notified. The engine is the single source of truth.
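
The discovery behavior described here (snapshot on connect, notifications on register and disconnect) can be sketched as a small in-memory catalog. This is an illustration only: the Catalog class, its event shape, and its method names are assumptions made for this example, not iii's actual protocol or API.

```typescript
// Hypothetical sketch of live discovery. Catalog and CatalogEvent are
// illustrative names, not part of iii.
type CatalogEvent = {
  kind: "registered" | "unregistered";
  functionId: string;
  worker: string;
};
type Listener = (event: CatalogEvent) => void;

class Catalog {
  private readonly owners = new Map<string, string>();      // functionId -> worker
  private readonly listeners = new Map<string, Listener>(); // worker -> listener

  // Connecting subscribes the worker and returns the current catalog snapshot.
  connect(worker: string, listener: Listener): string[] {
    this.listeners.set(worker, listener);
    return this.list();
  }

  register(worker: string, functionId: string): void {
    this.owners.set(functionId, worker);
    this.notify({ kind: "registered", functionId, worker }, worker);
  }

  // Disconnecting removes the worker's functions and tells every other worker.
  disconnect(worker: string): void {
    this.listeners.delete(worker);
    const owned = Array.from(this.owners.entries()).filter(([, owner]) => owner === worker);
    owned.forEach(([functionId]) => {
      this.owners.delete(functionId);
      this.notify({ kind: "unregistered", functionId, worker }, worker);
    });
  }

  list(): string[] {
    return Array.from(this.owners.keys()).sort();
  }

  private notify(event: CatalogEvent, except: string): void {
    this.listeners.forEach((listener, worker) => {
      if (worker !== except) listener(event);
    });
  }
}

const catalog = new Catalog();
const seenByAgent: string[] = [];

catalog.connect("api", () => {});
catalog.register("api", "orders::validate");

// The agent connects later and still sees what already exists.
const snapshot = catalog.connect("agent", (e) => {
  seenByAgent.push(`${e.kind}:${e.functionId}`);
});

// A short-lived sandbox worker appears and then goes away.
catalog.connect("sandbox", () => {});
catalog.register("sandbox", "sandbox::run");
catalog.disconnect("sandbox");

console.log(snapshot, seenByAgent, catalog.list());
```

The same lifecycle covers the sandbox case discussed later in the article: a worker's functions exist in the catalog only while the worker is connected.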

For agents, this is also cognitive infrastructure. An agent can see exactly what the entire system can do right now. There is no risk of an agent receiving outdated context.

Live extensibility. Add new workers and capabilities to a running iii system without redeploying or redesigning the architecture. There are no config changes and no restarts, because the system extends at runtime.

This is how agentic systems actually want to operate. You don’t ever need to interrupt production to add a new capability. You connect a new worker, its functions distribute across the system, and any agent can use them at will, or even extend the system with its own workers.

Live observability. iii’s observability is built on OpenTelemetry. Every function invocation carries a trace ID. Every trigger() call propagates it across workers, across languages, across queue handoffs. Every log emitted through the iii Logger is automatically correlated to the active trace and span, emitted as structured OpenTelemetry LogRecords, and routed to whichever backend you use: the iii Console, Grafana, Jaeger, Datadog. This isn’t a separate component to install and integrate; it’s just another worker. Traces, metrics, and structured logs are produced by the engine itself, not by application-level middleware.

When an agent calls a tool that enqueues a message that triggers a downstream function that writes to state, the entire chain is one trace. Not three separate systems connected with timestamp correlation or manually tracked trace ids. One trace, across languages, across workers, across the agent-backend boundary. You go from a slow waterfall span directly to the correlated logs that explain what happened.
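
The chain described above (agent, tool, queue, downstream function, state write) can be modeled in a few lines to show what "one trace" means mechanically. This is a minimal sketch, assuming the engine stamps a trace ID on every call and stores it alongside queued jobs. TracedEngine and its members are hypothetical; this is neither iii's implementation nor the OpenTelemetry API.

```typescript
// Hypothetical sketch of trace propagation across direct calls and queue handoffs.
type Span = { traceId: string; functionId: string; via: "direct" | "queue" };
type Ctx = {
  traceId: string;
  trigger: (id: string, payload: unknown) => unknown;
  enqueue: (id: string, payload: unknown) => void;
};
type TracedHandler = (payload: unknown, ctx: Ctx) => unknown;

class TracedEngine {
  readonly spans: Span[] = [];
  private readonly functions = new Map<string, TracedHandler>();
  private readonly queue: Array<{ id: string; payload: unknown; traceId: string }> = [];
  private nextTrace = 1;

  registerFunction(id: string, handler: TracedHandler): void {
    this.functions.set(id, handler);
  }

  // External entry point: starts a new trace and returns its ID.
  invoke(id: string, payload: unknown): string {
    const traceId = `trace-${this.nextTrace++}`;
    this.run(id, payload, traceId, "direct");
    return traceId;
  }

  // Queued jobs keep the trace ID they were enqueued with.
  drainQueue(): void {
    while (this.queue.length > 0) {
      const job = this.queue.shift()!;
      this.run(job.id, job.payload, job.traceId, "queue");
    }
  }

  private run(id: string, payload: unknown, traceId: string, via: Span["via"]): unknown {
    const handler = this.functions.get(id);
    if (!handler) throw new Error(`unknown function: ${id}`);
    this.spans.push({ traceId, functionId: id, via });
    return handler(payload, {
      traceId,
      trigger: (nextId, nextPayload) => this.run(nextId, nextPayload, traceId, "direct"),
      enqueue: (nextId, nextPayload) => {
        this.queue.push({ id: nextId, payload: nextPayload, traceId });
      },
    });
  }
}

const traced = new TracedEngine();
traced.registerFunction("state::set", () => null);
traced.registerFunction("reports::build", (payload, ctx) => ctx.trigger("state::set", payload));
traced.registerFunction("tools::publish", (payload, ctx) => {
  ctx.enqueue("reports::build", payload); // handoff through the queue
  return null;
});
traced.registerFunction("agents::writer", (payload, ctx) => ctx.trigger("tools::publish", payload));

const traceId = traced.invoke("agents::writer", { topic: "harnesses" });
traced.drainQueue();

const chain = traced.spans.filter((s) => s.traceId === traceId).map((s) => s.functionId);
console.log(traceId, chain);
```

The design point is that handlers never create or forward trace IDs themselves; the engine injects the context, which is why the queue hop does not break the trace.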

Agents that create workers

Here is where the model gets truly recursive.

iii supports hardware-isolated microVM workers. The sandbox functionality itself is a worker with its own filesystem, network stack, and process tree. You create a worker with a single command: iii worker add ./my-worker. The sandbox worker connects to the engine, registers functions and triggers, and participates in the system exactly like every other worker.

Now consider what happens when an agent can do this.

An agent worker can also spin up a new sandbox worker at runtime. That sandbox gets its own isolated environment. It registers its own functions and triggers. Those functions immediately appear in the live catalog. Other agents and services can invoke them. When the sandbox is no longer needed, it disconnects and its functions unregister.

The sandbox is not a separate “sandbox product.” It is a worker, using the same primitives as everything else; it just happens to provide hardware isolation. An agent creating a sandbox worker is just one worker creating another.

This is what it looks like when infrastructure becomes a design pattern instead of a product category. Need isolated execution for untrusted code? That’s a sandbox worker. Need a temporary specialist agent? Spin up a worker, register functions, shut it down when finished. Need a fleet of parallel task executors? Have a worker spin up other workers. The primitive is the same. The pattern varies.

The distinction disappears

Go back to the harness debate. Anthropic says thin. LangGraph says thick. They’re arguing about how much cognitive structure to encode around the model. The thin-vs-thick debate matters, but it’s a question within a design space, not about the design space itself.

When agents are workers, thin versus thick is just a question of how many functions you register and how you compose them. A thin harness is an agent worker with a few functions that lets the model decide what to trigger() next. A thick harness is an agent worker with more functions, explicit approval gates, and conditional logic before enqueuing the next step. It’s the same primitives and system, but a different pattern.
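
The thin-versus-thick contrast can be sketched as one loop with an optional gate. This is an illustration under stated assumptions: the "model" is a scripted stub rather than an LLM call, and runAgent, registry, and the gate signature are hypothetical names invented for this example.

```typescript
// Hypothetical sketch: same loop and same functions, with or without an approval gate.
type Step = { functionId: string; input: string };
type Model = (history: string[]) => Step | null; // null means "finished"
type Gate = (step: Step) => boolean;

const registry: Record<string, (input: string) => string> = {
  "web::search": (query) => `sources for ${query}`,
  "llm::summarize": (text) => `summary of ${text}`,
  "email::send": (text) => `sent ${text}`,
};

// Thin harness: call with no gate, so the model's choice always runs.
// Thick harness: pass a gate that decides whether each step may run.
function runAgent(model: Model, gate: Gate = () => true, maxSteps = 10): string[] {
  const history: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(history);
    if (step === null) break;
    if (!gate(step)) {
      history.push(`blocked ${step.functionId}`);
      continue;
    }
    const fn = registry[step.functionId];
    if (fn === undefined) throw new Error(`unknown function: ${step.functionId}`);
    history.push(fn(step.input));
  }
  return history;
}

// Stub standing in for the model: it walks a fixed plan, one step per turn.
const plan: Step[] = [
  { functionId: "web::search", input: "agent harnesses" },
  { functionId: "llm::summarize", input: "sources" },
  { functionId: "email::send", input: "summary" },
];
const scriptedModel: Model = (history) => plan[history.length] ?? null;

const thin = runAgent(scriptedModel);
const thick = runAgent(scriptedModel, (step) => step.functionId !== "email::send");
console.log(thin, thick);
```

Going from thick to thin here means dropping one argument, which mirrors the article's point that removing scaffolding becomes a change in composition rather than a rewrite of an integration layer.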

The scaffolding metaphor shifts too. The industry talks about harness scaffolding as temporary. As models improve, you remove it. Manus has described rebuilding Claude’s agent framework four times, with each rewrite the result of discovering a better way to shape context. Claude Code strips planning steps as new models absorb the capability.

脚手架这个比喻也会跟着变化。行业里谈 harness 脚手架时,总把它当作临时的东西。模型变强了,就把它拆掉。Manus 说过,他们把 Claude 的代理框架重建了四次,每一次重写,都是因为找到了塑造上下文的更好方式。Claude Code 也在随着新模型吸收能力而删去规划步骤。

If the harness is built from the same primitives as the rest of the backend, then removing scaffolding just means simplifying a function. You don’t rearchitect an integration layer. You don’t rebuild the interface between two systems. You just register fewer functions, or compose them differently.

如果 harness 本身就是用和后端其余部分相同的原语搭起来的,那拆掉脚手架,其实就只是把某个 function 简化掉。你不需要重构一整层集成架构。不需要重建两个系统之间的接口。你只需要注册更少的 functions,或者换一种组合方式。

Anything is a worker

任何东西都可以是 worker

A worker is anything that can open a WebSocket, register a function, and speak the primitives interface. There is no constraint on what that thing is or what language it’s written in.

worker 可以是任何能打开 WebSocket、注册 function、并且会说这套原语接口的东西。它不受对象类型限制,也不受语言限制。

iii ships SDKs for TypeScript, Python, and Rust. But those aren’t the boundaries of the system. They’re three implementations of an open wire protocol: JSON over WebSocket. The engine doesn’t know what language is on the other end of the connection. It sees functions, triggers, and a connection. If your team writes Go, or Java, or Swift, or Zig, you write a small SDK that speaks the protocol and you’re a first-class participant. The primitives interface is the contract. Everything else is a design pattern.

iii 提供 TypeScript、Python 和 Rust 的 SDK。但那不是系统的边界。那只是一个开放线协议的三种实现,JSON over WebSocket。引擎并不知道连接另一端是什么语言。它看到的是 functions、triggers 和一条连接。如果你的团队用 Go、Java、Swift 或 Zig,那就写一个会说这套协议的小 SDK,你就能成为系统里的一等公民。原语接口才是契约。其他一切都只是设计模式。

This means the set of what can be a worker is genuinely unbounded. A Node.js service. A Python ML pipeline. An agent. A queue. A sandbox running inside a microVM. A browser. iii ships a browser SDK, so a tab on someone’s laptop can register functions, participate in live discovery, invoke backend functions, and be invoked by backend functions. The browser is in the system the same way a Kubernetes pod is.

这意味着,什么东西可以成为 worker,这个集合是真正没有上限的。一个 Node.js 服务。一个 Python ML 流水线。一个代理。一个队列。一个运行在 microVM 里的 sandbox。一个浏览器。iii 还提供了浏览器 SDK,所以某个人笔记本上的一个标签页,也能注册 functions、参与实时发现、调用后端 functions,也能被后端 functions 反过来 调用。浏览器进入系统的方式,和一个 Kubernetes pod 没有区别。

A Raspberry Pi is a worker. An IoT sensor at the edge is a worker. A phone running a thin client is a worker. A CI runner that spins up, registers a function, does work, and disconnects is a worker. The engine doesn’t distinguish between these. Every new language, every new device, every new runtime that implements the primitives interface gets the full system for free: live discovery, live extensibility, live observability, durable triggers, cross-everything invocation. Not because we built a special integration for each one, but because the primitive allows this composition.

一个 Raspberry Pi 是 worker。边缘侧的 IoT 传感器是 worker。运行轻客户端的手机是 worker。一个启动后注册 function、干完活再断开的 CI runner 也是 worker。引擎不会区别对待它们。每一种新语言、每一种新设备、每一种新运行时,只要实现了这套原语接口,就能免费获得整个系统,实时发现、实时扩展、实时可观测、持久化 trigger,以及跨一切的调用能力。不是因为我们为每一种情况都做了单独集成,而是因为这套原语本身允许这种组合。

The bet

这场下注

The industry is debating how much scaffolding to wrap around the model. That debate matters, but it takes for granted that the harness is its own world, separate from the backend and separate from the infrastructure that actually runs when a tool fires.

整个行业都在争论,到底要在模型外面包多少脚手架。这场争论确实重要,但它默认了 harness 是它自己的世界,和后端分离,也和工具真正触发时运行的那层基础设施分离。

iii makes a different bet: that the right primitives (worker, trigger, function) are small enough and universal enough that the question “what can participate in this system?” has the answer: anything. A cloud service. An agent. A browser. A microcontroller. A sandbox an agent just spun up. They all compose the same way. They all discover each other. They all trace the same.

iii 的下注不一样。它认为,只要原语足够小、足够通用,也就是 worker、trigger、function,那么什么能参与这个系统,这个问题的答案就会变成,任何东西。一个云服务。一个代理。一个浏览器。一个微控制器。一个刚刚被代理拉起来的 sandbox。它们都以同样的方式组合。彼此都能发现对方。它们都共享同一条追踪链路。

When you stop treating “agent infrastructure” as separate from “backend infrastructure,” and when you stop treating any category of participant as architecturally different from any other, the system simplifies in a way that adding features never achieves. The boundaries between harness and backend, between cloud and edge, between infrastructure and application, and between human-written services and agent-created workers all dissolve into the same three primitives.

当你不再把代理基础设施和后端基础设施分开看,也不再把 任何一种 参与者当成架构上有本质差异的类别时,系统会以一种单纯加功能永远做不到的方式被简化。harness 和后端之间,云和边缘之间,基础设施和应用之间,人写的服务和代理创建的 worker 之间,那些边界都会融化成同样的三个原语。

The harness isn’t on top of the backend. The harness is a part of the backend. And the backend is whatever connects to iii.

harness 不是压在后端上面的一层。harness 本来就是 后端的一部分。而后端,就是任何连接到 iii 的东西。

When you get the primitives right, the categories collapse and complexity is radically simplified.

只要原语选对了,类别就会坍塌,复杂度也会被大幅压平。

iii is open source. Get started with our quickstart.

iii 已经开源。可以从我们的 quickstart 开始。

The most important architectural question in AI infrastructure right now isn’t which model to use. It’s how much infrastructure is required to build something useful with it.

Anthropic, OpenAI, CrewAI, and LangChain all call that wrapping the agent harness. The harness includes the orchestration loop, tools (MCP, A2A), memory, context management, and error handling that make a model useful. They all agree the model isn’t the product. The infrastructure is. They disagree deeply on how much of it should exist.

Anthropic keeps their harness thin. It’s an elegant loop: assemble the prompt, call the model, execute tool calls, and repeat. The model decides everything. OpenAI adds more structure: instruction stacks, orchestration modes, and explicit handoff patterns. CrewAI takes a multi-pronged approach: deterministic Flows for routing and validation, autonomous agents for the rest. LangGraph has the biggest harness. Every decision is a node, every transition a defined edge, the entire workflow encoded in the harness.

The spectrum runs from strongly trusting the model and weakly encoding the logic to weakly trusting the model and strongly encoding the logic. Every team building with agents has to choose what size of harness they need.

But there’s an assumption buried in the debate that nobody is questioning: that the harness is extrinsic to the traditional backend.

The agent’s loop, its tools, and its memory live in one layer: the harness. Execution infrastructure such as queues, state, HTTP routing, server-side rendering, observability, and all other backend components lives in another: “the backend”.

I believe this separation is temporary: a small step on the way to agentic infrastructure being fully adopted and accepted into “the backend”.

How Agents Work Today

Here’s how most agentic architectures work. The harness is a Python process (or TypeScript, or a managed framework) that wraps the model. When the agent decides to act, the harness translates a tool call into an HTTP request, which in turn triggers something on the backend, such as a queue publish or a database write. The backend is its own world, kept separate from the agents.

The harness retries on its own schedule, the queue retries on its own conditions, and the HTTP layer manages its own timeouts. No single trace connects these disparate systems. When something breaks, debugging means correlating logs across systems and reconstructing the observed behavior. This is a common process in backend engineering, but whereas prior systems were largely deterministic, agents are stochastic at best.

With every additional agent the number of possible paths widens; as a rough rule of thumb it grows as agents^2 * services. Put another way, 1 agent and 5 backend systems give 5 stochastic paths to debug (1^2 * 5), while 4 agents and 5 backend systems give 80 (4^2 * 5).

There is no good way to make agents more deterministic: much of their basic functionality is intended to give varied answers for similar and even identical inputs. They’re not stochastic by chance; they’re stochastic by design, because that is what makes computers useful in a brand new way. The billion-dollar question is how to handle agents properly by building the right harness for each context.
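To make the arithmetic concrete, here is a minimal sketch of the agents^2 * services estimate. The function name stochasticPaths is hypothetical, and the formula is the article’s rough heuristic rather than a measured result.

```typescript
// Rough heuristic from the text: paths ≈ agents^2 * services.
// `stochasticPaths` is an illustrative name, not part of any library.
function stochasticPaths(agents: number, services: number): number {
  return agents * agents * services;
}

console.log(stochasticPaths(1, 5)); // 5
console.log(stochasticPaths(4, 5)); // 80
```

The quadratic term is what hurts: the same 5 backend systems cost 16 times more paths with 4 agents than with 1.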

Taking a Step Back

The fundamental premise of harnesses today is that they operate a new paradigm (stochastic LLMs) within an old one (deterministic backends). It’s not that constructing agent harnesses is inherently wrong; it’s that effective solutions must begin by deconstructing what a backend is.

Until very recently, most of us, myself included, took the backend and how it works for granted. Without agents and the LLMs that power them, I probably would never have thought about this problem. So I embarked on a journey to figure out the fundamental building blocks of a backend.

At first I thought that backends were collections of services that exist in categories of products and are assembled with libraries, integrations, architecture diagrams, orchestration code, and the list kept growing. Eventually I realized that I was approaching the problem from the top down instead of the bottom up. Once I realized that, the backend became very simple:

A backend is composed of three essential elements: workers (the services that orchestrate work), triggers that invoke those services, and functions within the services that do the actual work.

Abstracting the Backend

Once I realized this, it became clear that I, and my very talented team, could build a backend using this abstraction. Far from being an academic exercise, the abstraction has very real utility, both in the agentic world and more broadly, because it completely encapsulates the execution context of “the backend”. So we built iii to make that abstraction available to everyone.

iii works just like my description above:

A Function is a unit of work with a stable identifier (e.g. orders::validate) that receives input and optionally returns output. It can live in any process, in any language.

A Trigger is what causes a function to run; it can be a direct call to a function, an HTTP endpoint, a cron schedule, a queue subscription, a state change, a stream event, or anything else. Triggers are declarative: the worker says “this function runs when this thing happens,” and iii handles routing, serialization, and delivery.

A Worker is any process that connects to the engine and registers functions and triggers.

A TypeScript API service is a worker. A Python ML pipeline is a worker. A Rust microservice is a worker. And an agent is a worker.

This is the idea that changes everything. An agent connects to the engine, registers functions and triggers, persists context through state::set, hands off work through queue-backed triggers, and broadcasts results via pub/sub. It doesn’t call “the backend” through a separate integration layer. It participates in the same system, with the same primitives, as everything else.

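import { registerWorker, TriggerAction } from 'iii-sdk' // assumed package name; confirm against the iii quickstart

const iii = registerWorker('ws://localhost:49134', { workerName: 'agentic-backend' })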

iii.registerFunction('agents::researcher', async (data) => { // the unit of work
  // Python Worker: requests + duckduckgo-search
  const sources = await iii.trigger({
    function_id: 'web::search',
    payload: { query: data.topic, limit: 10 }
  })
  // Rust Worker: scraper + tokio, fetched in parallel
  const pages = await iii.trigger({
    function_id: 'web::scrape',
    payload: { urls: sources.map(s => s.url) }
  })
  // TypeScript Worker: wraps the OpenAI SDK
  const findings = await iii.trigger({
    function_id: 'llm::summarize',
    payload: { topic: data.topic, documents: pages }
  })
  await iii.trigger({ // Rust Worker: persist to shared state
    function_id: 'state::set',
    payload: { scope: 'research-tasks', key: data.task_id, value: findings }
  })
  await iii.trigger({ // TypeScript Worker: hand off to the critic (awaited so enqueue errors surface)
    function_id: 'agents::critic',
    payload: { task_id: data.task_id },
    action: TriggerAction.Enqueue({ queue: 'agent-tasks' }) // run in the queue
  })
  return findings
})

iii.registerTrigger({ // HTTP entrypoint
  type: 'http',
  function_id: 'agents::researcher',
  config: { api_path: '/agents/research', http_method: 'POST' }
})

iii.registerTrigger({ // also runs on a pending state row
  type: 'state',
  function_id: 'agents::researcher',
  config: { scope: 'research-tasks', condition: 'status == "pending"' }
})

Three calls. registerWorker connects the process to the engine. registerFunction defines the work. registerTrigger binds it to the world: in this case an HTTP endpoint and a state-change reaction, both for the same function. The researcher is now callable via a POST request and automatically fires whenever a research task enters a pending state. Add another trigger and it also runs on a cron schedule. The function doesn’t change. The triggers compose.
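The idea that triggers compose around an unchanged function can be sketched without the SDK. The following is a hypothetical in-memory model, not iii’s implementation: MiniWorker, MiniEngine, emit, and the 'hourly' cron config are all names invented for illustration.

```typescript
// Hypothetical in-memory model of function + trigger binding. Not the iii SDK.
type Input = Record<string, unknown>;
type Fn = (input: Input) => unknown;
type Config = Record<string, string>;
interface TriggerBinding { type: string; functionId: string; config: Config; }

class MiniWorker {
  functions: Record<string, Fn> = {};
  triggers: TriggerBinding[] = [];
  registerFunction(id: string, fn: Fn): void { this.functions[id] = fn; }
  registerTrigger(binding: TriggerBinding): void { this.triggers.push(binding); }
}

class MiniEngine {
  private workers: MiniWorker[] = [];
  connect(worker: MiniWorker): void { this.workers.push(worker); }

  // Deliver one event: run every function whose trigger has the same type
  // and whose config keys all match the event.
  emit(type: string, event: Config, input: Input): unknown[] {
    const results: unknown[] = [];
    for (const worker of this.workers) {
      for (const t of worker.triggers) {
        const matches = t.type === type &&
          Object.keys(t.config).every((k) => t.config[k] === event[k]);
        const fn = worker.functions[t.functionId];
        if (matches && fn) results.push(fn(input));
      }
    }
    return results;
  }
}

const researchWorker = new MiniWorker();
researchWorker.registerFunction('agents::researcher',
  (input) => 'researching ' + String(input.topic));
// One function, two triggers: the function body never changes.
researchWorker.registerTrigger({ type: 'http', functionId: 'agents::researcher',
  config: { api_path: '/agents/research', http_method: 'POST' } });
researchWorker.registerTrigger({ type: 'cron', functionId: 'agents::researcher',
  config: { schedule: 'hourly' } });

const miniEngine = new MiniEngine();
miniEngine.connect(researchWorker);

const viaHttp = miniEngine.emit('http',
  { api_path: '/agents/research', http_method: 'POST' }, { topic: 'harnesses' });
const viaCron = miniEngine.emit('cron', { schedule: 'hourly' }, { topic: 'backends' });
console.log(viaHttp, viaCron);
```

Adding a third entry point here means one more registerTrigger call; the handler is untouched, which is the property the paragraph above describes.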

The agent stores state with the same trigger() call a payment service would use. It hands off to the critic through the same queue mechanism an order pipeline would use. The agent’s “tools” are functions. Its “memory” is state. Its “orchestration” is triggers and composition. There is no special agent infrastructure because there doesn’t need to be.

The harness is the backend.

Workers all the way down

This goes deeper than agents fitting into a backend. It’s about what iii considers a primitive and what happens when one primitive, in just a few lines of code, is the answer to every question.

In most platforms, every new capability is a new category. Need queues? Evaluate queue products. Need streaming? Different product. Sandboxing? Another. Each has its own internals, its own lifecycle, its own integration story. The platform is a catalog. Your job is to shop it and assemble it.

In iii, the answer to almost any question is the same: add a worker, which in turn registers triggers and functions.

I want sandboxing. Add a worker. I want an agent that researches topics. Add a worker. I want real-time streaming. Add a worker. I want go-to-market capabilities like lead scoring, email sequences, CRM sync. Add a worker. I want cron scheduling. It’s already a worker. I want observability. Already a worker.

The worker connects, registers what it can do, and the system absorbs it: live, discoverable, observable. The answer doesn’t change based on what kind of capability you’re adding. It doesn’t change based on language, or whether it’s infrastructure or business logic, or whether a human or an agent is creating it. Add a worker.

This is not just architectural uniformity. It’s a collapse of categories. In traditional systems, every capability lives in its own ontology. Queues have broker semantics, HTTP has routing semantics, cron has scheduling semantics, agents have orchestration semantics. In iii, they are all the same thing: a process that registers functions and triggers. The semantics live in the functions, not in the infrastructure.
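As a rough illustration of semantics living in functions rather than in infrastructure, the sketch below registers key-value and FIFO behaviour as ordinary functions on one registry. It is a toy model under stated assumptions: registerFunction and call are local helpers, and the queue::push and queue::pop identifiers are hypothetical, chosen only to mirror the state::set naming used earlier.

```typescript
// Toy registry: every capability is just a function behind a stable id.
type Payload = Record<string, any>;
type Handler = (payload: Payload) => unknown;

const functionsById: Record<string, Handler> = {};

function registerFunction(id: string, handler: Handler): void {
  functionsById[id] = handler;
}

function call(id: string, payload: Payload): unknown {
  const handler = functionsById[id];
  if (!handler) throw new Error('unknown function: ' + id);
  return handler(payload);
}

// "State worker": key-value semantics live inside its functions.
const kv: Record<string, unknown> = {};
registerFunction('state::set', (p) => { kv[p.scope + '/' + p.key] = p.value; return true; });
registerFunction('state::get', (p) => kv[p.scope + '/' + p.key]);

// "Queue worker": FIFO semantics live inside its functions.
const fifo: Record<string, unknown[]> = {};
registerFunction('queue::push', (p) => {
  const items = fifo[p.queue] || (fifo[p.queue] = []);
  items.push(p.message);
  return items.length; // queue depth after the push
});
registerFunction('queue::pop', (p) => (fifo[p.queue] || []).shift());

// An agent and a payment service would both go through the same call path.
call('state::set', { scope: 'research-tasks', key: 't1', value: 'findings' });
call('queue::push', { queue: 'agent-tasks', message: { task_id: 't1' } });
console.log(call('state::get', { scope: 'research-tasks', key: 't1' })); // findings
console.log(call('queue::pop', { queue: 'agent-tasks' }));
```

The registry itself knows nothing about keys or ordering. A real broker still has delivery and durability guarantees that this toy ignores.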

Paradigm shifts in software don’t add features. They collapse categories. “Everything is a file” made Unix composable. Components as functions made React’s mental model stick. In iii, the answer is always “add a worker.” That’s the primitive. That’s the whole model.

A live system

Because everything is a worker, three properties emerge that traditional architectures cannot produce:

Live discovery. When a worker connects, it receives the full catalog of every function registered across every other worker. When new functions appear, every worker gets notified. When a worker disconnects, every worker is notified. The engine is the single source of truth.

For agents, this is also cognitive infrastructure. An agent can see exactly what the entire system can do right now. There is no risk of an agent receiving outdated context.
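Live discovery can be pictured as a shared catalog that pushes a fresh snapshot to every connected worker whenever membership changes. The sketch below is a simplified, hypothetical model; LiveCatalog and its methods are invented names, not the iii API.

```typescript
// Hypothetical sketch of live discovery; names are illustrative only.
type CatalogListener = (functionIds: string[]) => void;

class LiveCatalog {
  private byWorker: Record<string, string[]> = {};
  private listeners: Record<string, CatalogListener> = {};

  // Current view of every function registered by every connected worker.
  list(): string[] {
    const ids: string[] = [];
    for (const name of Object.keys(this.byWorker)) ids.push(...this.byWorker[name]);
    return ids.sort();
  }

  connect(name: string, functionIds: string[], onChange: CatalogListener): void {
    this.byWorker[name] = functionIds;
    this.listeners[name] = onChange;
    this.notify();
  }

  disconnect(name: string): void {
    delete this.byWorker[name];
    delete this.listeners[name];
    this.notify();
  }

  // Every remaining worker receives the same up-to-date snapshot.
  private notify(): void {
    const snapshot = this.list();
    for (const name of Object.keys(this.listeners)) this.listeners[name](snapshot);
  }
}

const liveCatalog = new LiveCatalog();
let agentView: string[] = [];
liveCatalog.connect('agent', ['agents::researcher'], (ids) => { agentView = ids; });
liveCatalog.connect('scraper', ['web::scrape'], () => {});
const afterConnect = agentView.slice();
liveCatalog.disconnect('scraper');
const afterDisconnect = agentView.slice();
console.log(afterConnect, afterDisconnect);
```

In this model the agent’s view changes twice without the agent asking: once when the scraper connects and once when it leaves.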

Live extensibility. Add new workers and capabilities to a running iii system without redeploying or redesigning the architecture. There are no config changes and no restarts, because the system extends at runtime.

This is how agentic systems actually want to operate. You don’t ever need to interrupt production to add a new capability. You connect a new worker, its functions distribute across the system, and any agent can use them at will, or even extend the system with its own workers.

Live observability. iii’s observability is built on OpenTelemetry. Every function invocation carries a trace ID. Every trigger() call propagates it across workers, across languages, across queue handoffs. Every log emitted through the iii Logger is automatically correlated to the active trace and span, emitted as structured OpenTelemetry LogRecords, and routed to whichever backend you use: the iii Console, Grafana, Jaeger, Datadog. This isn’t a separate component to install and integrate; it’s just another worker. Traces, metrics, and structured logs are produced by the engine itself, not by application-level middleware.

When an agent calls a tool that enqueues a message that triggers a downstream function that writes to state, the entire chain is one trace. Not three separate systems connected with timestamp correlation or manually tracked trace IDs. One trace, across languages, across workers, across the agent-backend boundary. You go from a slow waterfall span directly to the correlated logs that explain what happened.
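The “one trace” property comes from passing the trace context along with every invocation. The sketch below shows the idea in its simplest form, assuming an explicit context argument; tracedTrigger, the span shape, and the function ids are hypothetical, direct calls stand in for queue handoffs, and real OpenTelemetry context propagation is considerably richer.

```typescript
// Hypothetical sketch of trace propagation through nested calls.
interface Span { traceId: string; spanId: number; parentId: number | null; name: string; }
interface TraceCtx { traceId: string; spanId: number; }
type TracedFn = (payload: Record<string, unknown>, ctx: TraceCtx) => unknown;

const spans: Span[] = [];
const registry: Record<string, TracedFn> = {};
let nextSpanId = 1;

function tracedTrigger(
  functionId: string,
  payload: Record<string, unknown>,
  parent?: TraceCtx
): unknown {
  const fn = registry[functionId];
  if (!fn) throw new Error('unknown function: ' + functionId);
  const spanId = nextSpanId++;
  // A root call starts a new trace; a nested call inherits the parent's trace id.
  const traceId = parent ? parent.traceId : 'trace-' + spanId;
  spans.push({ traceId, spanId, parentId: parent ? parent.spanId : null, name: functionId });
  return fn(payload, { traceId, spanId });
}

registry['state::set'] = () => true;
registry['orders::fulfil'] = (p, ctx) =>
  tracedTrigger('state::set', { key: p.orderId, value: 'done' }, ctx);
registry['agents::support'] = (p, ctx) =>
  tracedTrigger('orders::fulfil', { orderId: p.orderId }, ctx);

tracedTrigger('agents::support', { orderId: 'o-1' });
console.log(spans.map((s) => s.traceId + ' ' + s.name));
```

All three spans share the same trace id and form a parent chain, so the agent call, the downstream function, and the state write read as one request.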

Agents that create workers

Here is where the model gets truly recursive.

iii supports microVM workers with hardware isolation. The sandbox functionality itself is a worker with its own filesystem, network stack, and process tree. You create a worker with a single command: iii worker add ./my-worker. The sandbox worker connects to the engine, registers functions and triggers, and participates in the system exactly like every other worker.

Now consider what happens when an agent can do this.

An agent worker can also spin up a new sandbox worker at runtime. That sandbox gets its own isolated environment. It registers its own functions and triggers. Those functions immediately appear in the live catalog. Other agents and services can invoke them. When the sandbox is no longer needed, it disconnects and its functions unregister.
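The lifecycle described above (create, register, invoke, disconnect, unregister) can be sketched with a toy engine. Everything here is hypothetical: TinyEngine and runInSandbox are invented names, and the “sandbox” is simulated with a plain function that provides no isolation at all.

```typescript
// Hypothetical sketch of a worker creating and tearing down another worker.
// Nothing here provides real isolation; the "sandbox" is simulated.
type SandboxFn = (source: string) => string;

class TinyEngine {
  private handlers: Record<string, SandboxFn> = {};
  private owners: Record<string, string> = {}; // function id -> owning worker

  register(workerName: string, functionId: string, fn: SandboxFn): void {
    this.handlers[functionId] = fn;
    this.owners[functionId] = workerName;
  }

  // Disconnecting a worker unregisters every function it owns.
  disconnect(workerName: string): void {
    for (const id of Object.keys(this.owners)) {
      if (this.owners[id] === workerName) {
        delete this.owners[id];
        delete this.handlers[id];
      }
    }
  }

  catalog(): string[] { return Object.keys(this.handlers).sort(); }

  trigger(functionId: string, source: string): string {
    const fn = this.handlers[functionId];
    if (!fn) throw new Error('unknown function: ' + functionId);
    return fn(source);
  }
}

let sandboxCount = 0;

// An "agent" function: spin up a sandbox worker, use it, then shut it down.
function runInSandbox(engine: TinyEngine, source: string): string {
  const name = 'sandbox-' + (++sandboxCount);
  engine.register(name, name + '::run', (src) => 'ran ' + src.length + ' chars in ' + name);
  try {
    return engine.trigger(name + '::run', source);
  } finally {
    engine.disconnect(name); // functions disappear from the catalog
  }
}

const tiny = new TinyEngine();
tiny.register('agent', 'agents::coder', (src) => runInSandbox(tiny, src));
console.log(tiny.trigger('agents::coder', 'print(1)')); // ran 8 chars in sandbox-1
console.log(tiny.catalog());
```

While the sandbox exists, its function is invocable by id like any other. Once it disconnects, only the agent’s own function remains in the catalog.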

The sandbox is not a separate “sandbox product.” It is a worker that uses the same primitives as everything else; it just happens to provide hardware isolation. An agent creating a sandbox worker is just one worker creating another.

This is what it looks like when infrastructure becomes a design pattern instead of a product category. Need isolated execution for untrusted code? That’s a sandbox worker. Need a temporary specialist agent? Spin up a worker, register functions, shut it down when finished. Need a fleet of parallel task executors? Have a worker spin up other workers. The primitive is the same. The pattern varies.

The distinction disappears

Go back to the harness debate. Anthropic says thin. LangGraph says thick. They’re arguing about how much cognitive structure to encode around the model. The thin-vs-thick debate matters, but it’s a question within a design space, not about the design space itself.

When agents are workers, thin versus thick is just a question of how many functions you register and how you compose them. A thin harness is an agent worker with a few functions that lets the model decide what to trigger() next. A thick harness is an agent worker with more functions, explicit approval gates, and conditional logic before enqueuing the next step. It’s the same primitives and system, but a different pattern.
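The thin and thick patterns differ only in how much logic sits between the model’s choice and the next call. The sketch below makes that visible with a scripted stand-in for the model; thinHarness, thickHarness, the tool ids, and the approval callback are all hypothetical names for illustration.

```typescript
// Hypothetical sketch: same tools, two harness patterns.
type Tool = (input: string) => string;
// A stand-in "model": given the history, returns the next function id or null when done.
type Model = (history: string[]) => string | null;

const tools: Record<string, Tool> = {
  'web::search': (q) => 'results for ' + q,
  'email::send': (body) => 'sent: ' + body,
};

function runTool(id: string, input: string): string {
  const tool = tools[id];
  if (!tool) throw new Error('unknown function: ' + id);
  return tool(input);
}

// Thin harness: the model decides every step.
function thinHarness(model: Model, input: string, maxSteps = 5): string[] {
  const history: string[] = [input];
  for (let i = 0; i < maxSteps; i++) {
    const next = model(history);
    if (next === null) break;
    history.push(runTool(next, history[history.length - 1]));
  }
  return history;
}

// Thick harness: same loop, plus an explicit approval gate before each step.
function thickHarness(
  model: Model,
  input: string,
  approve: (functionId: string) => boolean,
  maxSteps = 5
): string[] {
  const history: string[] = [input];
  for (let i = 0; i < maxSteps; i++) {
    const next = model(history);
    if (next === null) break;
    if (!approve(next)) { history.push('blocked: ' + next); break; }
    history.push(runTool(next, history[history.length - 1]));
  }
  return history;
}

// Scripted model: search first, then email, then stop.
const scriptedModel: Model = (h) =>
  h.length === 1 ? 'web::search' : h.length === 2 ? 'email::send' : null;

const thinRun = thinHarness(scriptedModel, 'harness design');
const thickRun = thickHarness(scriptedModel, 'harness design', (id) => id !== 'email::send');
console.log(thinRun);
console.log(thickRun);
```

Removing the gate turns the thick harness back into the thin one, which is the sense in which thickness is a pattern and not a separate system.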

The scaffolding metaphor shifts too. The industry talks about harness scaffolding as temporary. As models improve, you remove it. Manus has described rebuilding its agent framework four times, each rewrite the result of discovering a better way to shape context. Claude Code strips planning steps as new models absorb the capability.

If the harness is built from the same primitives as the rest of the backend, then removing scaffolding just means simplifying a function. You don’t rearchitect an integration layer. You don’t rebuild the interface between two systems. You just register fewer functions, or compose them differently.

Anything is a worker

A worker is anything that can open a WebSocket, register a function, and speak the primitives interface. There is no constraint on what that thing is or what language it’s written in.

iii ships SDKs for TypeScript, Python, and Rust. But those aren’t the boundaries of the system. They’re three implementations of an open wire protocol: JSON over WebSocket. The engine doesn’t know what language is on the other end of the connection. It sees functions, triggers, and a connection. If your team writes Go, or Java, or Swift, or Zig, you write a small SDK that speaks the protocol and you’re a first-class participant. The primitives interface is the contract. Everything else is a design pattern.
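To give a feel for what “a small SDK that speaks the protocol” involves, here is a minimal sketch of encoding and handling JSON frames. The message types and field names below are assumptions made up for illustration; the actual schema is whatever the iii protocol documentation defines, and the WebSocket transport is left out entirely.

```typescript
// Hypothetical message shapes; the real iii wire schema may differ.
type WireMessage =
  | { type: 'register_function'; function_id: string }
  | { type: 'invoke'; invocation_id: string; function_id: string; payload: unknown }
  | { type: 'result'; invocation_id: string; output: unknown };

function encode(msg: WireMessage): string {
  return JSON.stringify(msg);
}

function decode(frame: string): WireMessage {
  const msg = JSON.parse(frame);
  if (msg.type !== 'register_function' && msg.type !== 'invoke' && msg.type !== 'result') {
    throw new Error('unknown message type: ' + String(msg.type));
  }
  return msg as WireMessage;
}

// Core of a tiny worker: turn an inbound invoke frame into a result frame.
function handleFrame(
  frame: string,
  handlers: Record<string, (payload: unknown) => unknown>
): string | null {
  const msg = decode(frame);
  if (msg.type !== 'invoke') return null; // nothing to answer
  const handler = handlers[msg.function_id];
  if (!handler) throw new Error('unknown function: ' + msg.function_id);
  return encode({ type: 'result', invocation_id: msg.invocation_id, output: handler(msg.payload) });
}

const inbound = encode({
  type: 'invoke', invocation_id: 'inv-1', function_id: 'math::double', payload: 21,
});
const reply = handleFrame(inbound, { 'math::double': (p) => (p as number) * 2 });
console.log(reply); // {"type":"result","invocation_id":"inv-1","output":42}
```

Because the contract is plain JSON text, the same three functions could be written in Go, Java, Swift, or Zig without the engine noticing the difference.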

This means the set of what can be a worker is genuinely unbounded. A Node.js service. A Python ML pipeline. An agent. A queue. A sandbox running inside a microVM. A browser. iii ships a browser SDK, so a tab on someone’s laptop can register functions, participate in live discovery, invoke backend functions, and be invoked by backend functions. The browser is in the system the same way a Kubernetes pod is.

A Raspberry Pi is a worker. An IoT sensor at the edge is a worker. A phone running a thin client is a worker. A CI runner that spins up, registers a function, does work, and disconnects is a worker. The engine doesn’t distinguish between these. Every new language, every new device, every new runtime that implements the primitives interface gets the full system for free: live discovery, live extensibility, live observability, durable triggers, cross-everything invocation. Not because we built a special integration for each one, but because the primitive allows this composition.

The bet

The industry is debating how much scaffolding to wrap around the model. That debate matters, but it takes for granted that the harness is its own world, separate from the backend and separate from the infrastructure that actually runs when a tool fires.

iii makes a different bet: that the right primitives (worker, trigger, function) are small enough and universal enough that the question “what can participate in this system?” has the answer: anything. A cloud service. An agent. A browser. A microcontroller. A sandbox an agent just spun up. They all compose the same way. They all discover each other. They all trace the same.

When you stop treating “agent infrastructure” as separate from “backend infrastructure,” and when you stop treating any category of participant as architecturally different from any other, the system simplifies in a way that adding features never achieves. The boundaries between harness and backend, between cloud and edge, between infrastructure and application, and between human-written services and agent-created workers all dissolve into the same three primitives.

The harness isn’t on top of the backend. The harness is a part of the backend. And the backend is whatever connects to iii.

When you get the primitives right, the categories collapse and complexity is radically simplified.

iii is open source. Get started with our quickstart.
