🧠 ATou Notes · 🪞 Uota Notes · 💬 Discussion Questions

AI memory isn't a "bigger context window": it's editable files plus a searchable index

Truly usable long-term memory doesn't come from stuffing everything into the prompt. It comes from persisting memories to local, user-controlled Markdown files, then recalling them on demand with hybrid retrieval (semantic + keyword): transparent, cheap, and version-controllable.

2026-01-28 · Original link ↗

Key Takeaways

  • Demote memory from "product feature" to "file system". Memory = Markdown on disk: you can read it, edit it, put it in Git, and migrate it. That is closer to a "second brain" than a cloud black box, and better aligned with ownership.
  • Context ≠ Memory: stop betting on bigger windows. Context is one-shot, bounded, and expensive; memory is persistent, cheap, and unbounded. The right architecture: inject little, retrieve much, so the model sees only what the current task needs.
  • Hybrid retrieval is an engineering answer, not a paper answer. sqlite-vec handles vector similarity, FTS5 (BM25) handles keyword hits, and a 70/30 weighting covers both "concepts" and "exact tokens" (names, IDs, dates). That is what makes recall actually usable.
  • Session compaction must be paired with a pre-compaction flush. Compaction is lossy, and important information can be summarized away; flush to long-term/daily memory first, then compact the session history, so key context doesn't evaporate.
  • Multi-agent isolation matters, but a "soft sandbox" has security limits. Default workspace isolation separates personal/work personas, yet absolute paths can theoretically cross the boundary; only strict sandboxing counts as hard isolation.

Relevance to Us

### 👤 ATou

  • If you work on overseas growth or branding, the most valuable asset is a searchable history of decisions and retrospectives. Writing growth experiments, channel assets, and conclusions down as Markdown plus an index beats digging through group-chat logs.

### 🧠 Neta

  • This piece is essentially context engineering, productized: files as the source of truth, retrieval as the injection gate, and flushes to protect important information.
  • If you are building multiple personas or agents, design workspace and permission boundaries up front, or "memory cross-talk" will wreck the experience.

### 🪞 Uota

  • This is the next generation of personal knowledge management (PKM): instead of you maintaining Notion, the agent maintains your Markdown repo, and you only provide natural-language input.

Discussion Prompts

1. Would you rather have "automatic cloud memory" (convenient) or "locally controlled memory" (editable and portable)? Why?

2. In your scenario, is the deadlier retrieval failure missed recall or false recall? How would you verify either?

3. How strict should multi-agent isolation be to count as safe and usable (soft vs. hard isolation)?

How Clawdbot Remembers Everything

Clawdbot is an open-source personal AI assistant (MIT licensed) created by Peter Steinberger that has quickly gained traction, with over 32,600 stars on GitHub at the time of writing. Unlike ChatGPT or Claude, which run in the cloud, Clawdbot runs locally on your machine and integrates with chat platforms you already use, such as Discord, WhatsApp, and Telegram.

What sets Clawdbot apart is its ability to handle real-world tasks autonomously: managing email, scheduling calendar events, handling flight check-ins, and running background jobs on a schedule. But what caught my attention was its persistent memory system, which maintains context around the clock, remembering conversations and building on previous interactions indefinitely.

If you have read my previous posts on ChatGPT memory and Claude memory, you know I am fascinated by how different AI products approach memory. Clawdbot takes a fundamentally different path: instead of cloud-based, company-controlled memory, it keeps everything local, giving users full ownership of their context and skills.

Let's dive into how it works.

How Context Is Built

Before diving into memory, let's understand what the model sees on each request:

The system prompt defines the agent's capabilities and available tools. What matters for memory is the Project Context: user-editable Markdown files that are injected into every request:

These files live in the agent's workspace alongside the memory files, making the entire agent configuration transparent and editable.

Context vs. Memory

Understanding the distinction between context and memory is fundamental to understanding Clawdbot.

Context is everything the model sees for a single request:

Context is:

Ephemeral: exists only for this request

Bounded: limited by the model's context window (e.g., 200K tokens)

Expensive: every token counts toward API cost and latency

Memory is what is stored on disk:

Memory is:

Persistent: survives restarts, days, and months

Unbounded: can grow indefinitely

Cheap: no API cost to store

Searchable: indexed for semantic retrieval

The Memory Tools

The agent accesses memory through two specialized tools:

  1. memory_search

Purpose: find relevant memories across all files

Returns:

  2. memory_get

Purpose: read specific content after finding it

Returns:

Writing to Memory

There is no dedicated memory_write tool. The agent writes to memory using the same standard write and edit tools it uses for any file. Since memory is just Markdown, you can manually edit these files too (they are re-indexed automatically).
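Because writing memory is just writing a file, the whole mechanism fits in a few lines. The workspace layout (memory/YYYY-MM-DD.md under the default ~/clawd/) comes from the article; the helper itself is an illustrative sketch, not Clawdbot's actual code:

```python
import tempfile
from datetime import date
from pathlib import Path

def append_daily_note(workspace: Path, note: str) -> Path:
    """Append one bullet to today's append-only daily log."""
    log = workspace / "memory" / f"{date.today():%Y-%m-%d}.md"
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")
    return log  # the indexer would pick this change up and re-index it

workspace = Path(tempfile.mkdtemp())  # stand-in for the default ~/clawd/
p = append_daily_note(workspace, "User prefers morning meetings")
print(p.read_text())  # "- User prefers morning meetings\n"
```

Since the file is plain Markdown, a human edit with any text editor goes through exactly the same path.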

The decision of where to write is prompt-driven via AGENTS.md:

Automatic writes also occur during the pre-compaction flush and at session end (covered below).

Memory Storage

Clawdbot's memory system is built on the principle that "memory is plain Markdown in the agent workspace."

Two-Layer Memory System

Memory lives in the agent's workspace (default: ~/clawd/):

Layer 1: Daily Logs (memory/YYYY-MM-DD.md)

These are append-only daily notes that the agent writes to throughout the day, whenever it wants to remember something or is explicitly told to remember something.

Layer 2: Long-Term Memory (MEMORY.md)

This is curated, persistent knowledge. The agent writes here when significant events, thoughts, decisions, opinions, or lessons come up.

How the Agent Knows to Read Memory

The automatically loaded AGENTS.md file contains the relevant instructions:

How Memory Gets Indexed

When you save a memory file, here is what happens behind the scenes:

sqlite-vec is a SQLite extension that enables vector similarity search directly inside SQLite, with no external vector database required.

FTS5 is SQLite's built-in full-text search engine, which powers the BM25 keyword matching. Together they let Clawdbot run hybrid search (semantic + keyword) from a single lightweight database file.
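The keyword half of that hybrid index can be demonstrated with nothing but Python's bundled SQLite, assuming your SQLite build ships FTS5 (most do). The semantic half would come from loading the sqlite-vec extension, which is omitted here; the paths and rows are illustrative:

```python
import sqlite3

# Keyword half of the hybrid index, using SQLite's built-in FTS5 engine.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, content)")
db.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("memory/2026-01-27.md", "Set POSTGRES_URL for the staging database"),
        ("MEMORY.md", "Prefers concise answers; working on growth experiments"),
    ],
)

# bm25() ranks matches (lower is better), so negate it to get a
# higher-is-better score that can later be merged with vector scores.
rows = db.execute(
    "SELECT path, -bm25(chunks) AS score FROM chunks "
    "WHERE chunks MATCH ? ORDER BY score DESC",
    ("staging",),
).fetchall()
print(rows[0][0])  # memory/2026-01-27.md
```

Keeping both the FTS5 table and the sqlite-vec table in one database file is what makes the index a single lightweight artifact next to the Markdown sources.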

How Memory Is Searched

When you search memory, Clawdbot runs two strategies in parallel: vector search (semantic) finds content that means the same thing, while BM25 search (keyword) finds content containing exact tokens.

The results are combined with weighted scoring:

Why 70/30? Semantic similarity is the primary signal for memory recall, but BM25 keyword matching catches exact terms that vectors can miss (names, IDs, dates). Results below a minScore threshold (default 0.35) are filtered out. All of these values are configurable.

This way you get good results whether you are searching for a concept ("that database thing") or a specific token ("POSTGRES_URL").
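The weighted merge can be sketched in a few lines. The function name and the assumption that both score sets are normalized to [0, 1] are mine; only the 0.7/0.3 weights and the 0.35 default minScore come from the article:

```python
# A minimal sketch of the hybrid merge described above.
def merge_hybrid(vector_hits, bm25_hits, w_vec=0.7, w_kw=0.3, min_score=0.35):
    combined = {}
    for doc, s in vector_hits.items():
        combined[doc] = combined.get(doc, 0.0) + w_vec * s
    for doc, s in bm25_hits.items():
        combined[doc] = combined.get(doc, 0.0) + w_kw * s
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return [(doc, score) for doc, score in ranked if score >= min_score]

hits = merge_hybrid(
    {"MEMORY.md": 0.9, "memory/2026-01-10.md": 0.2, "memory/old.md": 0.3},
    {"memory/2026-01-10.md": 1.0, "MEMORY.md": 0.1},
)
print([doc for doc, _ in hits])  # ['MEMORY.md', 'memory/2026-01-10.md']
```

Note how the daily log with a perfect keyword hit still surfaces (0.2 × 0.7 + 1.0 × 0.3 = 0.44), while a weakly semantic match with no keyword support (0.3 × 0.7 = 0.21) falls below the threshold.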

Multi-Agent Memory

Clawdbot supports multiple agents, each with complete memory isolation:

The Markdown files (the source of truth) live in each agent's workspace, while the SQLite indexes (derived data) live in the state directory. Each agent gets its own workspace and index. The memory manager is keyed by agentId + workspaceDir, so no cross-agent memory search happens automatically.

Can agents read each other's memories? Not by default: each agent sees only its own workspace. However, the workspace is a soft sandbox (a default working directory), not a hard boundary. Unless you enable strict sandboxing, an agent could in theory reach another workspace via absolute paths.

This isolation is useful for separating contexts: say, a "personal" agent for WhatsApp and a "work" agent for Slack, each with its own memories and personality.

Compaction

Every AI model has a context window limit: Claude has 200K tokens, GPT-5.1 has 1M. Long conversations eventually hit this wall.

When that happens, Clawdbot uses compaction: summarizing the older conversation into a compact entry while keeping recent messages intact.
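The shape of compaction can be sketched as follows, using a stub summarize() in place of the real LLM summarization call; keep_recent is an illustrative knob, not a documented Clawdbot setting:

```python
# A minimal sketch of compaction: older turns collapse into one compact
# entry while the most recent turns stay intact.
def compact(history, keep_recent=4,
            summarize=lambda msgs: f"[summary of {len(msgs)} messages]"):
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [{"role": "system", "content": summarize(older)}] + recent

history = [{"role": "user", "content": f"msg {i}"} for i in range(10)]
compacted = compact(history)
print(len(compacted))  # 5: one summary entry plus the 4 most recent messages
```

Whatever summarize() drops is gone from the context, which is exactly why the pre-compaction memory flush described later matters.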

Automatic vs. Manual Compaction

Automatic: triggers when approaching the context limit

You will see 🧹 Auto-compaction complete (in verbose mode)

The original request is then retried with the compacted context

Manual: use the /compact command

/compact Focus on decisions and open questions

Unlike some optimizations, compaction persists to disk. The summary is written to the session's JSONL transcript file, so future sessions start from the compacted history.

The Memory Flush

LLM-based compaction is lossy: important information may be summarized away and potentially lost. To counter that, Clawdbot performs a pre-compaction memory flush.

The memory flush is configurable in the clawdbot.yaml or clawdbot.json file.

Pruning

Tool results can be huge: a single exec command might output 50,000 characters of logs. Pruning trims these old outputs from the context without rewriting history. It is a lossy process; the pruned outputs cannot be restored to the context.

JSONL file on disk: unchanged (the full outputs are still there)
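Pruning amounts to truncating oversized, old tool outputs in the in-context message list. In this sketch, max_chars, keep_recent, and the placeholder text are all illustrative; the on-disk JSONL transcript is never touched:

```python
# Truncate old, oversized tool results in the in-context message list.
def prune_tool_results(messages, max_chars=2000, keep_recent=2):
    cutoff = len(messages) - keep_recent  # the newest messages are protected
    pruned = []
    for i, msg in enumerate(messages):
        if i < cutoff and msg["role"] == "tool" and len(msg["content"]) > max_chars:
            msg = {**msg, "content": msg["content"][:max_chars] + "\n[...output pruned]"}
        pruned.append(msg)
    return pruned

msgs = [
    {"role": "tool", "content": "x" * 50_000},  # old, oversized exec output
    {"role": "user", "content": "what did the build say?"},
    {"role": "assistant", "content": "it passed"},
]
pruned = prune_tool_results(msgs)
print(len(pruned[0]["content"]))  # max_chars plus a short marker
```

Because the original list is copied rather than mutated, the full transcript written to JSONL stays complete even though the prompt sent to the model is smaller.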

Cache-TTL Pruning

Anthropic caches prompt prefixes for up to 5 minutes to reduce latency and cost on repeated calls. When the same prompt prefix is sent within the TTL window, cached tokens cost about 90% less. Once the TTL expires, the next request must re-cache the entire prompt.

The problem: if a session goes idle past the TTL, the next request loses the cache and must re-cache the full conversation history at full cache-write pricing.

Cache-TTL pruning solves this by detecting when the cache has expired and trimming old tool results before the next request. A smaller prompt to re-cache means lower cost:
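The decision itself is simple to sketch. The 5-minute TTL comes from the article (Anthropic's prompt cache); the function and parameter names are illustrative, not Clawdbot's actual API:

```python
# If the session idled past the provider's cache TTL, prune before the
# next request so the prompt that must be re-cached is smaller.
CACHE_TTL_SECONDS = 5 * 60

def should_prune_for_cache(last_request_at: float, now: float) -> bool:
    """True when the idle gap has outlived the prompt cache."""
    return (now - last_request_at) > CACHE_TTL_SECONDS

print(should_prune_for_cache(0.0, now=120.0))  # False: cache still warm
print(should_prune_for_cache(0.0, now=600.0))  # True: expired, prune first
```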

Session Lifecycle

Sessions don't last forever. They reset based on configurable rules, creating natural boundaries for memory. The default behavior is a daily reset, but other modes are available.

Session Memory Hook

When you run /new to start a fresh session, the session memory hook can automatically save context:

Conclusion

Clawdbot's memory system succeeds because it embraces a few key principles:

  1. Transparency over black boxes

Memory is plain Markdown. You can read it, edit it, and version-control it. No opaque databases or proprietary formats.

  2. Search over injection

Rather than stuffing the context with everything, the agent searches for what is relevant. This keeps the context focused and costs down.

  3. Persistence over sessions

Important information survives in files on disk, not just in conversation history. Compaction cannot destroy what has already been written to a file.

  4. Hybrid over pure

Vector search alone misses exact matches; keyword search alone misses semantics. Hybrid gives you both.

References

Clawdbot Documentation: official docs covering setup, configuration, and all features

GitHub Repository: source code, issues, and community contributions

If you found this interesting, I would love to hear your thoughts. Share it on Twitter or LinkedIn, or reach out at guptaamanthan01[at]gmail[dot]com.

You can find more of my posts at https://manthanguptaa.in/

Link: http://x.com/i/article/2015775451810246656


Related Notes

How Clawdbot Remembers Everything

  • Source: https://x.com/manthanguptaa/status/2015780646770323543?s=46
  • Mirror: https://x.com/manthanguptaa/status/2015780646770323543?s=46
  • Published: 2026-01-26T13:35:32+00:00
  • Saved: 2026-01-28


📋 Discussion Archive

Discussion in progress…