🧠 ATou · 🪞 Uota

Real agent lock-in comes from memory, and memory lives in the harness

This piece accurately captures the key point that the state layer will become a moat in the agent era, but it presents an "open harness" as the only solution, a stance that clearly promotes its own Deep Agents product.

2026-04-12 · Original article ↗

Key takeaways

  • Harnesses won't disappear: The author's claim that "models won't swallow the harness" holds up. Tool orchestration, context trimming, state management, and permission control are system-level work that won't simply vanish; at most it gets hidden inside a model vendor's API black box.
  • Memory is fundamentally context engineering: The article's most valuable judgment is that "memory is not a standalone plugin". Short-term memory, long-term memory, summarization and compaction, and how system instructions are exposed are all controlled by the harness; whoever controls the context effectively controls the memory.
  • Closed harnesses amplify switching costs: This also holds up. Stateless APIs are still easy to replace, but once state, summaries, preferences, and long-term memory are hosted inside a vendor, switching models, platforms, or frameworks gets markedly harder, and lock-in goes from tolerable to structural.
  • Memory will become the core moat of agent products: The author's point that "agents without memory are easy to replicate" is broadly right. Models and tools are increasingly commoditized; what actually accumulates differentiated experience is user history, preferences, and accumulated behavior.
  • Openness is the direction, but not the only answer: The article overreaches when it turns "memory matters" into "the harness must be open". A more accurate claim: memory should at minimum be visible, exportable, portable, and auditable; the entire system does not necessarily have to be open source.

Relevance to us

  • What it means for ATou, and what to do next: If ATou is evaluating agent products, don't judge by day-one results and model scores alone; treat "who owns the state" as a first-class criterion. Next step: turn the selection checklist into four questions: where is state stored, is memory readable, can it be migrated, and who controls the context-compaction rules?
  • What it means for Neta, and what to do next: If Neta cares about long-term system capability, this article is a reminder that what's worth accumulating is reusable user-state assets, not one-off prompt tricks. Next step: map out which interactions deserve to be persisted as preferences, profiles, and task history, and prioritize building an in-house storage layer.
  • What it means for Uota, and what to do next: If Uota cares about experience stickiness, this article implies personalization is not a nice-to-have but the core of retention. Next step: make "how memory makes the experience feel increasingly like me" the product's main line, rather than optimizing only one-shot conversation success rates.
  • What it means for investment/strategy, and what to do next: The article effectively points out that value capture in agent infrastructure will shift toward the memory+harness layer. Next step when evaluating projects: distinguish "shells that resell model calls" from "infrastructure that actually owns user-state assets"; the latter is far more likely to form a durable moat.

Discussion prompts

1. If a platform is not open source but lets you fully export, audit, and migrate memory, does it still count as a "lock-in harness"?
2. For most teams, is trading lock-in risk for hosted convenience short-term rationality or a long-term mistake?
3. Will the real industry standard emerge first at the "memory format" layer, or at the "harness protocol" layer?

Agent harnesses are becoming the dominant way to build agents, and they are not going anywhere. These harnesses are intimately tied to agent memory. If you use a closed harness - especially one behind a proprietary API - you are choosing to yield control of your agent's memory to a third party. Memory is incredibly important to creating good and sticky agentic experiences, and this creates incredible lock-in. Memory - and therefore harnesses - should be open, so that you own your own memory.

Agent Harnesses are how you build agents, and they’re not going anywhere

The “best” way to build agentic systems has changed dramatically over the past three years. When ChatGPT came out, all you could do were simple RAG chains (LangChain). Then the models got a little better, and could create more complex flows (LangGraph). Then they got a lot better, and that gave rise to a new type of scaffolding - agent harnesses.

Examples of agent harnesses include Claude Code, Deep Agents, Pi (powers OpenClaw), OpenCode, Codex, Letta Code, and many more.

💡Agent harnesses are not going away.

There is sometimes sentiment that models will absorb more and more of the scaffolding. This is not true. What has happened (and will continue to happen) is that a lot of the scaffolding needed in 2023 is no longer needed - but it has been replaced by other types of scaffolding. An agent, by definition, is an LLM interacting with tools and other sources of data. There will always be a system around the LLM to facilitate that type of interaction. Need evidence? When Claude Code's source code was leaked, there were 512k lines of code. That code is the harness. Even the makers of the best model in the world are investing heavily in harnesses.

When things like web search are built into OpenAI and Anthropic’s APIs - they are not “part of the model”. Rather, they are part of a lightweight harness that sits behind their APIs and orchestrates the model with those web search APIs (via nothing other than tool calling).
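That "lightweight harness" idea can be sketched in a few lines. Everything below is illustrative: `fake_model`, `web_search`, and the message shapes are stand-ins, not any provider's real API. The point is only that the loop around the model - not the model itself - is what wires tool calls to the search backend.

```python
# Minimal sketch of a harness loop, under assumed (not real) interfaces:
# the model only emits tool calls; the surrounding loop executes them
# and feeds results back until the model produces a final answer.

def fake_model(messages):
    """Stand-in for an LLM call: asks for a web search once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "web_search",
                              "args": {"query": messages[-1]["content"]}}}
    return {"content": "answer based on: " + messages[-1]["content"]}

def web_search(query):
    """Stand-in for a web search API."""
    return f"results for '{query}'"

TOOLS = {"web_search": web_search}

def harness(user_input, model=fake_model):
    """The orchestration loop: call model, execute tool calls, repeat."""
    messages = [{"role": "user", "content": user_input}]
    while True:
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:          # no tool requested: final answer
            return reply["content"]
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})

print(harness("what is an agent harness?"))
```

Swapping `fake_model` for a real provider call leaves the loop, i.e. the harness, unchanged - which is exactly why this layer persists regardless of which model sits inside it.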

Harnesses are tied to memory

Sarah Wooders wrote a great blog on why “memory isn’t a plugin (it’s the harness)”, and I couldn’t agree with it more.


There is sometimes sentiment that memory is a standalone service, separate from any particular harness. At this point in time, that is not true.

A large responsibility of the harness is to interact with context. As Sarah puts it:

Asking to plug memory into an agent harness is like asking to plug driving into a car. Managing context, and therefore memory, is a core capability and responsibility of the agent harness.

Memory is just a form of context. Short term memory (messages in the conversation, large tool call results) are handled by the harness. Long term memory (cross session memory) needs to be updated and read by the harness. Sarah lists out many other ways the harness is tied to memory:

How is the AGENTS.md or CLAUDE.md file loaded into context? How is skill metadata shown to the agents? (in the system prompt? in system messages?) Can the agent modify its own system instructions? What survives compaction, and what's lost? Are interactions stored and made queryable? How is memory metadata presented to the agent? How is the current working directory represented? How much filesystem information is exposed?
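One of those questions - what survives compaction - can be made concrete with a toy policy. This is a hedged sketch of one possible rule, not the behavior of any particular harness: keep the system prompt, keep the most recent messages verbatim, and collapse everything older into a summary line. A harness choosing a different rule would retain a different memory.

```python
# Illustrative compaction policy (assumption, not any real harness's rule):
# the system prompt always survives, the last `keep_recent` messages survive
# verbatim, and older messages are collapsed into a single summary entry.

def compact(messages, keep_recent=2):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_recent:
        return system + rest
    older, recent = rest[:-keep_recent], rest[-keep_recent:]
    summary = {"role": "system",
               "content": f"[summary of {len(older)} earlier messages]"}
    return system + [summary] + recent

history = [{"role": "system", "content": "You are an email assistant."}] + \
          [{"role": "user", "content": f"msg {i}"} for i in range(5)]
print(len(compact(history)))  # system + summary + 2 recent = 4
```

The information discarded by `compact` is gone for good - which is why the harness's compaction rule, however it is written, effectively defines what the agent can remember.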

Right now, memory as a concept is in it’s infancy. It’s so early for memory. Transparently, we see that long term memory is often not part of the MVP. First you need to get an agent working generally, then you can worry about personalization. This means that we (as an industry) are still figuring out memory. This means there are not well known or common abstractions for memory. If memory does become more known, and as we discover best practices, it is possible that separate memory systems start to make sense. But not at this point in time. Right now, as Sarah said, “ultimately, how the harness manages context and state in general is the foundation for agent memory.”

If you don't own your harness, you don't own your memory

The harness is intimately tied to memory.

💡If you use a closed harness, especially if it's behind an API, you don't own your memory.

This manifests itself in several ways.

Mildly bad: If you use a stateful API (like OpenAI’s Responses API, or Anthropic’s server side compaction), you are storing state on their server. If you want to swap models and resume previous threads - that is no longer doable.
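The difference can be shown directly. In this illustrative sketch (`provider_a` and `provider_b` are stand-ins, not real SDKs), a client-owned transcript can be replayed against any backend, while server-side state reduces to an opaque handle that only one provider understands.

```python
# Contrast sketch (assumed, simplified interfaces - no real provider APIs):
# two interchangeable "providers" that each accept a full message transcript.

def provider_a(messages):
    return "A saw " + str(len(messages)) + " messages"

def provider_b(messages):
    return "B saw " + str(len(messages)) + " messages"

# Client-owned state: the transcript lives with you...
transcript = [{"role": "user", "content": "hi"},
              {"role": "assistant", "content": "hello"},
              {"role": "user", "content": "continue"}]

# ...so resuming the same thread on a different provider is just a replay.
print(provider_a(transcript))
print(provider_b(transcript))

# Server-side state: all you hold is an opaque handle tied to one provider.
thread_id = "thread_abc123"  # meaningless to any other backend
```

With only `thread_id` in hand, there is nothing to pass to `provider_b`; the thread, and whatever the server compacted into it, stays behind the original API.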


Bad: If you use a closed harness (like the Claude Agent SDK, which uses Claude Code under the hood, and Claude Code is not open source), the harness interacts with memory in a way that is unknown to you. Maybe it creates some artifacts client-side - but what is the shape of those, and how should a harness use them? That is unknown, and therefore non-transferable from one harness to another.


💡But worst is something else - when the whole harness, including long-term memory, is behind an API.

In this situation, you have zero ownership or visibility into memory, including long term memory. You do not know the harness (which means you don’t know how to use the memory). But even worse - you don’t even own the memory! Maybe some parts are exposed via API, maybe no parts are - you have no control over that.


When people say that the “models will absorb more and more of the harness” - this is what they really mean. They mean that these memory related parts will go behind the APIs that model providers offer.

💡This is incredibly alarming - it means that memory will become locked into a single platform, a single model.

Model providers are incredibly incentivized to do this. And they are starting to. Anthropic launched Claude Managed Agents. This puts literally everything behind an API, locked into their platform.

Even if the whole harness isn't behind the API, model providers are incentivized to move more and more behind APIs - and are already doing so. For example: even though Codex is open source, it generates an encrypted compaction summary that is not usable outside of the OpenAI ecosystem.

Why are they doing this? Because memory is important, and it creates lock in that they don’t get from just the model.

Memory is important, and it creates lock-in

Although memory is early, it is clear to everyone that it is important. It is what allows agents to get better as users interact with them, and allows you to build up a data flywheel. It is what allows your agent to be personalized to each of your users, building an agentic experience that molds to their desires and usage patterns.

💡Without memory, your agents are easily replicable by anyone who has access to the same tools.

With memory, you build up a proprietary dataset - a dataset of user interactions and preferences. This proprietary dataset allows you to provide a differentiated and increasingly intelligent experience.

It’s been relatively easy to switch model providers to date. They have similar, if not identical, APIs. Sure, you have to change prompts a little bit, but that’s not that hard.

But this is all because they are stateless.

As soon as there is any state associated, it's much harder to switch - because this memory matters, and if you switch, you lose access to it.

Let me tell a story. I have an email assistant internally. It’s built on top of a template in Fleet, our no-code platform for building Enterprise ready OpenClaws. This platform has memory built in, so as I interacted with my email assistant over the past few months it built up memory. A few weeks ago, my agent got deleted by accident. I was pissed! I tried to create an agent from the same template - but the experience was so much worse. I had to reteach it all my preferences, my tone, everything.

The plus side of my email agent getting deleted: it made me realize how powerful and sticky memory can be.

Open Memory, Open Harnesses

Memory needs to be open, owned by whoever is developing the agentic experience. That lets you build up a proprietary dataset you actually control.

Memory (and therefore harnesses) should be separate from model providers. You want optionality: the ability to try out whatever models are best for your use case. Model providers are incentivized to create lock-in via memory.

This is why we are building Deep Agents. Deep Agents:

  • Is open source

  • Is model agnostic

  • Uses open standards like agents.md and skills

  • Has plugins to Mongo, Postgres, Redis and others for storing memories

  • Is deployable: (1) via LangSmith Deployment (self hostable, can be deployed on any cloud, can bring your own database to serve as a memory store); (2) behind any standard web hosting framework

In order to own your memory, you need to be using an Open Harness

Try out Deep Agents today.

Thank you to a few people for review and thoughts:

  • Sydney Runkle, who is doing a lot of great Deep Agents and memory work

  • Viv Trivedy, who is a leading voice on agent harnesses

  • Nuno Campos, who has some great writing on context engineering for finance agents

  • Sarah Wooders, who is CTO of Letta, a company that has consistently been at the forefront of stateful agents

📋 Discussion archive

Discussion in progress…