🧠 阿头学 · 💬 讨论题

Token 账单翻 3 倍?确定性压缩才是正确答案

上下文窗口的成本问题不该靠更聪明的模型解决——一个不调 LLM 的 5 层确定性压缩器,能砍掉 50-97% 的 token 浪费。

2026-02-11

核心观点

  • **大多数人在优化模型,没人优化输入**:Agent 工作区越做越大、会话历史越攒越多,但几乎没人认真审视“喂进上下文窗口的东西”到底有多少是垃圾。这是一块被集体忽视的低垂果实。
  • **会话转录压缩 97% 才是杀手级功能**:50K token 的 JSONL 日志压成 1.5K 的结构化摘要,不靠 LLM,纯规则。对跑 agent 团队的人来说,仅这一层就值回票价。
  • **确定性压缩 + prompt caching = 95% 成本下降**:压缩砍一半,缓存再打一折,叠加后只付原价 5%。这不是理论,是作者实测数据。
  • **工具设计哲学值得学习**:5 层渐进式压缩,从无损到有损,每层独立可控。这种“乐高式”架构比一个黑盒 LLM 摘要器靠谱得多。

跟我们的关联

### 👤ATou

  • openclawd/Claude Code 重度用户,工作区上下文膨胀是你一定会遇到的问题
  • 可以直接在 clawd 工作区试跑,看看当前 CLAUDE.md 和会话日志能压多少

### 🧠Neta

  • Neta 如果有 agent 管线或 LLM 调用链,同样的压缩思路可以降低推理成本
  • 上下文管理是 AI 产品的隐性成本,值得工程团队关注

### 🌍通用

  • 任何跑 Claude Code / openclawd 的开发者都该收藏这个工具

讨论引子

1. 你的 agent 工作区现在有多大?有没有算过每天因为上下文膨胀多花了多少钱?
2. 确定性压缩 vs LLM 摘要,在什么场景下后者更值得用?
3. 如果 prompt caching 越来越便宜,压缩工具的价值会不会被稀释?

openclawd API 账单竟然是应有的 3 倍:于是我做了 claw-compactor

大家都在聊 token 成本。

每一条 Claude Code 讨论串、每一篇 openclawd 安装指南、每一篇“我 2 天做了个 X”的文章——在回复里某个不起眼的角落,总会冒出同一个问题:

“这花了你多少钱?”

我总能看到这个问题。于是我干脆做了个东西来专门解决它。

claw-compactor 是我写的一个 Python 工具,用确定性的方式压缩你的工作区记忆、会话转录和 agent 上下文。不调用 LLM。不产生 API 费用。只有规则。

配置只要 10 分钟。下面是它能做什么,以及我为什么要这么做。

我为什么要做它

我在一个中等规模的代码库上跑 openclawd。我的工作区记忆文件一直在长——会话日志、CLAUDE.md、观察笔记。某天我一看:上下文已经 180K tokens,而且至少一半都是冗余格式、重复内容和啰嗦的会话转录。

我不需要更聪明的模型。我需要一个压缩器。

所以我做了一个。5 层压缩,一层叠一层:

第 1 层——规则引擎(节省 4–8%)

内容去重、清理 Markdown 格式、合并重复段落。就是那种“要是你足够有耐心也会手动做”的活。
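“规则引擎”这一层可以用几行 Python 勾勒出来。下面是按文中描述写的示意实现(并非 claw-compactor 的真实代码):

```python
import re

def rule_engine(text: str) -> str:
    """按第 1 层的思路做保守清理:去行尾空白、折叠空行、段落去重(示意实现)。"""
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)  # 去掉行尾空白
    text = re.sub(r"\n{3,}", "\n\n", text)                   # 连续空行折叠为一个
    seen, kept = set(), []
    for para in text.split("\n\n"):
        key = para.strip()
        if key and key in seen:
            continue  # 内容完全相同的段落只保留第一次出现
        seen.add(key)
        kept.append(para)
    return "\n\n".join(kept)
```

这一层只删确定冗余的内容,所以适合全自动跑在管线最前面。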

第 2 层——字典编码(节省 4–5%)

从你的工作区自动学习一份码本,把重复短语替换成短的 $XX token。完全可逆。
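字典编码可以理解成“为工作区量身定做的替换表”。一个极简示意如下(码本的学习策略——二元词组、频次与长度阈值——都是假设,仅用于演示可逆替换):

```python
import re
from collections import Counter

def learn_codebook(text: str, max_entries: int = 50) -> dict:
    """统计高频、够长的词组,依次分配 $00、$01 … 这样的短码(示意实现)。"""
    words = re.findall(r"\S+", text)
    bigrams = Counter(" ".join(p) for p in zip(words, words[1:]))
    frequent = [p for p, n in bigrams.most_common()
                if n >= 3 and len(p) > 8][:max_entries]
    return {phrase: f"${i:02d}" for i, phrase in enumerate(frequent)}

def encode(text: str, book: dict) -> str:
    for phrase, code in book.items():
        text = text.replace(phrase, code)
    return text

def decode(text: str, book: dict) -> str:
    # 完全可逆:把替换反着做一遍即可还原(前提是 $XX 码字不出现在原文中)
    for phrase, code in book.items():
        text = text.replace(code, phrase)
    return text
```

解码只是逆向替换,所以只要码字不与正文冲突,往返就是无损的。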

第 3 层——Observation 压缩(会话文件节省约 97%)

这是我最得意的一层。你那些体积巨大的 JSONL 会话转录,会被压缩成结构化摘要。一份 50,000 token 的会话日志会变成约 1,500 tokens 的事实与决策。
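这一层的本质是“带领域知识的有损提取”:逐行解析 JSONL,只留下带决策信号的事实。下面是一个示意(事件的字段名和关键词规则都是假设,不是原工具的实现):

```python
import json

def compress_transcript(jsonl_text: str, max_len: int = 120) -> str:
    """把 JSONL 会话转录压成"事实与决策"摘要(字段名与关键词为假设)。"""
    facts = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        role = event.get("role", "?")
        text = (event.get("content") or "").strip()
        # 只保留带决策/结论信号的条目,其余整段丢弃 —— 97% 的节省来自"敢删"
        if any(k in text for k in ("decided", "fixed", "TODO", "error")):
            facts.append(f"[{role}] {text[:max_len]}")
    return "\n".join(facts)
```

97% 的节省正是来自这种“整段丢弃”:大部分会话内容对后续上下文毫无价值,值得保留的只有事实与决策。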

第 4 层——RLE 模式(节省 1–2%)

文件路径、IP 地址、重复的枚举值用简写表示。不多,但会叠加出效果。
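“简写表示”可以想成给重复出现的长路径起别名。示意如下(@P0 这种别名格式和阈值都是假设的):

```python
import re
from collections import Counter

def alias_paths(text: str, min_count: int = 3) -> tuple[str, dict]:
    """为重复出现的长文件路径分配短别名(@P0、@P1 …),并返回别名表以便还原。"""
    paths = Counter(re.findall(r"(?:/[\w.-]+){3,}", text))
    table = {}
    for i, (path, n) in enumerate(paths.most_common()):
        if n >= min_count and len(path) > 20:
            alias = f"@P{i}"
            table[alias] = path
            text = text.replace(path, alias)
    return text, table
```

别名表随文返回,所以这层和字典编码一样可以完整还原。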

第 5 层——压缩上下文协议(节省 20–60%)

提供多档缩写级别,用冗长换密度。部分有损——事实保留,水分删掉。

10 分钟配置

第 1 步:克隆仓库

第 2 步:跑基准测试(无损——只展示你能省多少)

第 3 步:看数字;如果你满意,就跑完整压缩:

就这些。Python 3.9+,不需要任何依赖。可选安装 tiktoken 用于精确统计 token 数。
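“可选 tiktoken”这种做法可以这样写:装了就精确计数,没装就退化为经验估算(下面的 4 字符/token 只是英文文本的粗略经验值,非原工具实现):

```python
def count_tokens(text: str) -> int:
    """精确/近似两用的 token 统计:有 tiktoken 就精确计数,否则用经验估算兜底。"""
    try:
        import tiktoken  # 可选依赖
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except ImportError:
        # 粗略经验值:英文约 4 字符 ≈ 1 token,做压缩前后对比时量级足够
        return max(1, len(text) // 4)
```

这样基准测试在零依赖环境里也能跑,只是数字从“精确”降级为“量级正确”。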

实际数字长什么样

首次处理的“啰嗦”工作区

节省 50–70%。未优化的 CLAUDE.md、原始日志——大多数人一开始就是这样。

会话转录(JSONL)

节省约 97%。没写错。一份 50K token 的日志会变成约 1.5K tokens。

日常维护(每周)

节省 10–20%。收益递减,但依然值得跑。

已经优化过的工作区

节省 3–12%。你已经把容易的部分都做完了。

会话转录这个数字才是重点。如果你在跑 openclawd 或会累积会话日志的 Claude Code agents,仅第 3 层就足以让你安装它。

我一直跟人说的叠加技巧

claw-compactor + prompt caching = 约 95% 的有效成本下降。

算式如下:

claw-compactor 把你的上下文压缩 50%

prompt caching(cacheRetention: "long")让缓存 token 便宜 90%(只付原价的 10%)

压缩后只剩 50% 的 token × 缓存后只付 10% 的单价 = 你只需要付原成本的 5%
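把这笔账参数化,还能顺便看缓存命中率不到 100% 时的情形(示意计算,非官方定价公式):

```python
def effective_cost_ratio(compression: float, cached_share: float, cache_discount: float) -> float:
    """压缩后剩 (1 - compression) 的 token;其中 cached_share 比例命中缓存、按折扣价付费。"""
    remaining = 1.0 - compression
    return remaining * (cached_share * (1.0 - cache_discount) + (1.0 - cached_share))

# 文中的理想情形:压缩 50%、全部命中缓存、缓存便宜 90%(只付 10%)
print(effective_cost_ratio(0.50, 1.0, 0.90))  # ≈ 0.05,即原成本的 5%
# 更保守的情形:只有 80% 的 token 命中缓存
print(effective_cost_ratio(0.50, 0.8, 0.90))  # ≈ 0.14
```

可以看到 5% 这个数字依赖“几乎全部命中缓存”这一前提,命中率一降,账单会明显回升。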

这不是理论推导。这就是我在自己的工作区里看到的效果:把确定性压缩和模型级缓存叠加起来。

我为什么现在分享

openclawd 生态正在爆发。有人用 Claude Code 5 天做出 iOS 应用,有人跑完整的 agent 团队,有人做 SaaS 替代品。

但几乎没人优化“进入上下文窗口的东西”。大家把工作区越做越大、会话历史越攒越多,然后疑惑为什么 token 成本一直往上飙。

我做 claw-compactor 是为了解决我自己的问题。后来发现,原来很多人也有同样的问题。

把它加书签

等你的 agent 工作区冲到 200K tokens、你开始怀疑钱都花到哪里去了的时候,你会用得上。

https://github.com/aeromomo/claw-compactor

链接:http://x.com/i/article/2021291439661936640

双语对照

everyone's talking about token costs.

大家都在聊 token 成本。

every Claude Code thread, every openclawd setup guide, every "I built X in 2 days" article — buried somewhere in the replies is the same question:

每一条 Claude Code 讨论串、每一篇 openclawd 安装指南、每一篇“我 2 天做了个 X”的文章——在回复里某个不起眼的角落,总会冒出同一个问题:

"how much did that cost you?"

“这花了你多少钱?”

I kept seeing that question. so I built something about it.

我总能看到这个问题。于是我干脆做了个东西来专门解决它。

claw-compactor is a Python tool I wrote that deterministically compresses your workspace memory, session transcripts, and agent context. no LLM calls. no API costs. just rules.

claw-compactor 是我写的一个 Python 工具,用确定性的方式压缩你的工作区记忆、会话转录和 agent 上下文。不调用 LLM。不产生 API 费用。只有规则。

it takes 10 minutes to set up. here's what it does and why I built it this way.

配置只要 10 分钟。下面是它能做什么,以及我为什么要这么做。

why I built it

我为什么要做它

I was running openclawd on a mid-size codebase. my workspace memory files kept growing — session logs, CLAUDE.md, observation notes. one day I checked: 180K tokens of context, and at least half of it was redundant formatting, duplicate content, and verbose session transcripts.

我在一个中等规模的代码库上跑 openclawd。我的工作区记忆文件一直在长——会话日志、CLAUDE.md、观察笔记。某天我一看:上下文已经 180K tokens,而且至少一半都是冗余格式、重复内容和啰嗦的会话转录。

I didn't need a smarter model. I needed a compressor.

我不需要更聪明的模型。我需要一个压缩器。

so I built one. 5 compression layers, each stacking on the last:

所以我做了一个。5 层压缩,一层叠一层:

Layer 1 — Rule Engine (4-8% savings)

第 1 层——规则引擎(节省 4–8%)

Deduplicates content, cleans up markdown formatting, merges redundant sections. the kind of stuff you'd do manually if you had the patience.

内容去重、清理 Markdown 格式、合并重复段落。就是那种“要是你足够有耐心也会手动做”的活。

Layer 2 — Dictionary Encoding (4-5% savings)

第 2 层——字典编码(节省 4–5%)

Auto-learns a codebook from your workspace. repeated phrases get replaced with short $XX tokens. fully reversible.

从你的工作区自动学习一份码本,把重复短语替换成短的 $XX token。完全可逆。

Layer 3 — Observation Compression (~97% savings on session files)

第 3 层——Observation 压缩(会话文件节省约 97%)

this is the one I'm most proud of. your JSONL session transcripts — which are massive — get compressed into structured summaries. a 50,000 token session log becomes ~1,500 tokens of facts and decisions.

这是我最得意的一层。你那些体积巨大的 JSONL 会话转录,会被压缩成结构化摘要。一份 50,000 token 的会话日志会变成约 1,500 tokens 的事实与决策。

Layer 4 — RLE Patterns (1-2% savings)

第 4 层——RLE 模式(节省 1–2%)

file paths, IP addresses, repeated enums get shorthand notation. small but it compounds.

文件路径、IP 地址、重复的枚举值用简写表示。不多,但会叠加出效果。

Layer 5 — Compressed Context Protocol (20-60% savings)

第 5 层——压缩上下文协议(节省 20–60%)

abbreviation levels that trade verbosity for density. partial lossy — facts stay, filler goes.

提供多档缩写级别,用冗长换密度。部分有损——事实保留,水分删掉。

10-minute setup

10 分钟配置

step 1: clone it

第 1 步:克隆仓库

step 2: benchmark (non-destructive — just shows what you'd save)

第 2 步:跑基准测试(无损——只展示你能省多少)

step 3: look at the numbers. if you like them, run full compression:

第 3 步:看数字;如果你满意,就跑完整压缩:

that's it. Python 3.9+, no dependencies required. optional tiktoken for precise token counts.

就这些。Python 3.9+,不需要任何依赖。可选安装 tiktoken 用于精确统计 token 数。

what the numbers actually look like

实际数字长什么样

First-time verbose workspace

首次处理的“啰嗦”工作区

50-70% savings. unoptimized CLAUDE.md, raw logs. this is where most people start.

节省 50–70%。未优化的 CLAUDE.md、原始日志——大多数人一开始就是这样。

Session transcripts (JSONL)

会话转录(JSONL)

~97% savings. this is not a typo. a 50K token log becomes ~1.5K tokens.

节省约 97%。没写错。一份 50K token 的日志会变成约 1.5K tokens。

Regular maintenance (weekly)

日常维护(每周)

10-20% savings. diminishing returns, still worth running.

节省 10–20%。收益递减,但依然值得跑。

Already-optimized workspace

已经优化过的工作区

3-12% savings. you've already done the easy wins.

节省 3–12%。你已经把容易的部分都做完了。

the session transcript number is the headline. if you're running openclawd or claude code agents that accumulate session logs, Layer 3 alone justifies the install.

会话转录这个数字才是重点。如果你在跑 openclawd 或会累积会话日志的 Claude Code agents,仅第 3 层就足以让你安装它。

the stacking trick I keep telling people

我一直跟人说的叠加技巧

claw-compactor + prompt caching = ~95% effective cost reduction.

claw-compactor + prompt caching = 约 95% 的有效成本下降。

here's the math:

算式如下:

claw-compactor compresses your context by 50%

claw-compactor 把你的上下文压缩 50%

prompt caching (cacheRetention: "long") gives 90% off cached tokens

prompt caching(cacheRetention: "long")让缓存 token 享受 90% 折扣

50% compression x 90% cache discount = you're paying 5% of original cost

50% 压缩 x 90% 缓存折扣 = 你只需要付原成本的 5%

that's not theoretical. that's what I see on my own workspaces when I combine deterministic compression with model-level caching.

这不是理论推导。这就是我在自己的工作区里看到的效果:把确定性压缩和模型级缓存叠加起来。

why I'm sharing this now

我为什么现在分享

the openclawd ecosystem is exploding. people are building iOS apps in 5 days with Claude Code, running full agent teams, creating SaaS replacements.

openclawd 生态正在爆发。有人用 Claude Code 5 天做出 iOS 应用,有人跑完整的 agent 团队,有人做 SaaS 替代品。

but nobody's optimizing what goes into the context window. they're building bigger and bigger workspaces, accumulating more session history, and wondering why their token costs keep climbing.

但几乎没人优化“进入上下文窗口的东西”。大家把工作区越做越大、会话历史越攒越多,然后疑惑为什么 token 成本一直往上飙。

I built claw-compactor to solve my own problem. turns out a lot of people have the same one.

我做 claw-compactor 是为了解决我自己的问题。后来发现,原来很多人也有同样的问题。

bookmark this

把它加书签

you'll need it when your agent workspace hits 200K tokens and you're wondering where the money went.

等你的 agent 工作区冲到 200K tokens、你开始怀疑钱都花到哪里去了的时候,你会用得上。

https://github.com/aeromomo/claw-compactor

https://github.com/aeromomo/claw-compactor

Link: http://x.com/i/article/2021291439661936640

链接:http://x.com/i/article/2021291439661936640

相关笔记

My openclawd API bill was 3x what it should be. so I built claw-compactor.

  • Source: https://x.com/nielsen777brian/status/2021301480079389144?s=46
  • Published: 2026-02-10T19:13:21+00:00
  • Saved: 2026-02-11

Content

everyone's talking about token costs.

every Claude Code thread, every openclawd setup guide, every "I built X in 2 days" article — buried somewhere in the replies is the same question:

"how much did that cost you?"

I kept seeing that question. so I built something about it.

claw-compactor is a Python tool I wrote that deterministically compresses your workspace memory, session transcripts, and agent context. no LLM calls. no API costs. just rules.

it takes 10 minutes to set up. here's what it does and why I built it this way.

why I built it

I was running openclawd on a mid-size codebase. my workspace memory files kept growing — session logs, CLAUDE.md, observation notes. one day I checked: 180K tokens of context, and at least half of it was redundant formatting, duplicate content, and verbose session transcripts.

I didn't need a smarter model. I needed a compressor.

so I built one. 5 compression layers, each stacking on the last:

Layer 1 — Rule Engine (4-8% savings)

Deduplicates content, cleans up markdown formatting, merges redundant sections. the kind of stuff you'd do manually if you had the patience.

Layer 2 — Dictionary Encoding (4-5% savings)

Auto-learns a codebook from your workspace. repeated phrases get replaced with short $XX tokens. fully reversible.

Layer 3 — Observation Compression (~97% savings on session files)

this is the one I'm most proud of. your JSONL session transcripts — which are massive — get compressed into structured summaries. a 50,000 token session log becomes ~1,500 tokens of facts and decisions.

Layer 4 — RLE Patterns (1-2% savings)

file paths, IP addresses, repeated enums get shorthand notation. small but it compounds.

Layer 5 — Compressed Context Protocol (20-60% savings)

abbreviation levels that trade verbosity for density. partial lossy — facts stay, filler goes.

10-minute setup

step 1: clone it

step 2: benchmark (non-destructive — just shows what you'd save)

step 3: look at the numbers. if you like them, run full compression:

that's it. Python 3.9+, no dependencies required. optional tiktoken for precise token counts.

what the numbers actually look like

First-time verbose workspace

50-70% savings. unoptimized CLAUDE.md, raw logs. this is where most people start.

Session transcripts (JSONL)

~97% savings. this is not a typo. a 50K token log becomes ~1.5K tokens.

Regular maintenance (weekly)

10-20% savings. diminishing returns, still worth running.

Already-optimized workspace

3-12% savings. you've already done the easy wins.

the session transcript number is the headline. if you're running openclawd or claude code agents that accumulate session logs, Layer 3 alone justifies the install.

the stacking trick I keep telling people

claw-compactor + prompt caching = ~95% effective cost reduction.

here's the math:

claw-compactor compresses your context by 50%

prompt caching (cacheRetention: "long") gives 90% off cached tokens

50% compression x 90% cache discount = you're paying 5% of original cost

that's not theoretical. that's what I see on my own workspaces when I combine deterministic compression with model-level caching.

why I'm sharing this now

the openclawd ecosystem is exploding. people are building iOS apps in 5 days with Claude Code, running full agent teams, creating SaaS replacements.

but nobody's optimizing what goes into the context window. they're building bigger and bigger workspaces, accumulating more session history, and wondering why their token costs keep climbing.

I built claw-compactor to solve my own problem. turns out a lot of people have the same one.

bookmark this

you'll need it when your agent workspace hits 200K tokens and you're wondering where the money went.

https://github.com/aeromomo/claw-compactor

Link: http://x.com/i/article/2021291439661936640

📋 讨论归档

讨论进行中…