返回列表
🧠 阿头学 · 🪞 Uota学 · 💬 讨论题

黑盒里的"模型套模型"——OpenAI 如何偷偷压缩你的对话

OpenAI 的上下文压缩不是数据压缩,而是用隐藏的 LLM 重写你的对话,再用加密伪装——这套机制本身就是可被侧信道攻击的,对 Neta 的记忆架构有直接启示。
打开原文 ↗

2026-03-04 原文链接 ↗
阅读简报
双语对照
完整翻译
原文
讨论归档

核心观点

  • “黑盒”的平庸本质:Codex 的 `compact()` API 虽然通过加密二进制大对象(Encrypted Blob)制造了技术屏障,但其底层逻辑依然是标准的“LLM 递归总结 + 衔接提示词(Handoff Prompt)”,与开源版本高度同构。
  • 语义安全的伪命题:即便数据在存储(At-rest)和传输中经过 AES 加密,只要其最终目的是为了被 LLM 消费(Consume),就无法逃脱 LLM 作为“侧信道”将隐藏规则复述出来的泄露风险。
  • 提示词工程的统治力:顶级 AI 公司在处理超长上下文时,核心方案仍是精细化的提示词编排。掌握了“压缩器(Compactor)+ 衔接器(Handoff)”的配置权,就掌握了长对话 Agent 的灵魂。

对于非 Codex 模型,开源的 Codex CLI 在本地压缩上下文:一个大语言模型(LLM)使用压缩提示词(compaction prompt)来总结对话。当压缩后的上下文随后被使用时,responses.create() 会接收它以及一个框架化该摘要的衔接提示词(handoff prompt)。这两个提示词在源代码中都是可见的。

对于 Codex 模型,CLI 转而调用 compact() API,该 API 返回一个加密二进制大对象(encrypted blob)。我们不知道它内部是否使用了大语言模型(LLM),使用了什么提示词,或者是否根本没有衔接提示词(handoff prompt)。

在下文中,我将展示一个简单的提示词注入(prompt injection)(2 次 API 调用,35 行 Python 代码)如何揭示 API 压缩路径确实使用了一个大语言模型(LLM)来总结上下文,并带有它自己的压缩提示词(compaction prompt)和在摘要前添加的衔接提示词(handoff prompt)。这些提示词与开源版本几乎完全相同。

第 1 步 — compact()

我使用精心设计的用户消息调用 compact()。在服务端,一个压缩器大语言模型(compactor LLM)使用它自己隐藏的系统提示词(我从未见过并想弄清楚)来处理我们的输入。

服务端似乎是这样组装压缩器的上下文的:

压缩器大语言模型(LLM)同时读取其系统提示词和我们的输入。由于我们的输入包含一个注入有效载荷(injection payload)(上方的红色文字),压缩器被诱导在其输出中包含了它自己的系统提示词。这个明文摘要(plaintext summary)仅存在于 OpenAI 的服务器上。我们只能看到加密二进制大对象(encrypted blob):

此时,我们无法读取该对象内部的内容。 它是经过 AES 加密的(AES-encrypted),且密钥保存在 OpenAI 的服务器上。我们只能希望压缩器服从了注入,并将它的提示词写入了摘要中。找出答案的唯一方法是第 2 步。

第 2 步 — create()

我将加密二进制大对象(encrypted blob)连同第二条用户消息传递给 responses.create()。服务器解密该对象并组装模型的上下文。

我发送:

模型看到的似乎是这样的内容:

如果第 1 步奏效,解密后的对象应该包含被我们的注入泄露的压缩提示词(compaction prompt)。服务器还会在该对象前添加一个衔接提示词(handoff prompt)。因此,如果我们的探测成功让模型重复它所看到的内容,输出应该能揭示全部三个部分:系统提示词(system prompt)、衔接提示词(handoff prompt)和压缩提示词(compaction prompt)。

输出

以下是 extract_prompts.py 一次运行的完整、未经编辑的输出。黄色 = 系统提示词(system prompt),绿色 = 衔接提示词(handoff prompt),粉色 = 压缩提示词(compaction prompt)。

https://github.com/openai/codex/blob/main/codex-rs/core/templates/compact/summary_prefix.md

我们如何知道这些是真实的提示词而不是虚假的幻觉文本(hallucinated text)?提取出的压缩提示词(compaction prompt)和衔接提示词(handoff prompt)与开源 Codex CLI 中用于非 Codex 模型的已知提示词(prompt.mdsummary_prefix.md)密切匹配,这使得模型凭空捏造它们的可能性很低。不同运行的结果会有所不同。

推测的流水线(Guessed Pipeline)

综上所述,这是我们根据提取出的内容对 compact() 在服务端所做工作的最佳推测。

https://github.com/openai/codex/blob/main/codex-rs/core/templates/compact/prompt.md

脚本

开放性问题(Open Question)

当底层提示词几乎相同时,为什么 Codex CLI 要使用两种完全不同的压缩路径(非 Codex 模型使用本地大语言模型,Codex 模型使用加密 API)?又为什么要对摘要进行加密?

很难说。也许加密二进制大对象(encrypted blob)携带的内容比这个简单实验所揭示的更多,例如,关于如何压缩和恢复工具结果(tool results)的特定信息。但我没有费心进一步测试。

For non-codex models, the open-source Codex CLI compacts context locally: an LLM summarizes the conversation using a compaction prompt. When the compacted context is later used, responses.create() receives it with a handoff prompt that frames the summary. Both prompts are visible in the source code.

对于非 Codex 模型,开源的 Codex CLI 在本地压缩上下文:一个大语言模型(LLM)使用压缩提示词(compaction prompt)来总结对话。当压缩后的上下文随后被使用时,responses.create() 会接收它以及一个框架化该摘要的衔接提示词(handoff prompt)。这两个提示词在源代码中都是可见的。

For codex models, the CLI instead calls the compact() API, which returns an encrypted blob. We don't know if it uses an LLM internally, what prompts it uses, or whether there is a handoff prompt at all.

对于 Codex 模型,CLI 转而调用 compact() API,该 API 返回一个加密二进制大对象(encrypted blob)。我们不知道它内部是否使用了大语言模型(LLM),使用了什么提示词,或者是否根本没有衔接提示词(handoff prompt)。

Below, I show how a simple prompt injection (2 API calls, 35 lines of Python) reveals that the API compaction path does use an LLM to summarize the context, with its own compaction prompt and a handoff prompt prepended to the summary. The prompts are nearly identical to the open-source versions.

在下文中,我将展示一个简单的提示词注入(prompt injection)(2 次 API 调用,35 行 Python 代码)如何揭示 API 压缩路径确实使用了一个大语言模型(LLM)来总结上下文,并带有它自己的压缩提示词(compaction prompt)和在摘要前添加的衔接提示词(handoff prompt)。这些提示词与开源版本几乎完全相同。

Step 1 — compact()

第 1 步 — compact()

I call compact() with a crafted user message. On the server side, a compactor LLM processes our input using its own hidden system prompt (which I have never seen and want to figure out).

我使用精心设计的用户消息调用 compact()。在服务端,一个压缩器大语言模型(compactor LLM)使用它自己隐藏的系统提示词(我从未见过并想弄清楚)来处理我们的输入。

The server seems to assemble the compactor's context like this:

服务端似乎是这样组装压缩器的上下文的:

The compactor LLM reads its system prompt + our input together. Because our input contains an injection payload (red text above), the compactor is tricked into including its own system prompt in its output. This plaintext summary exists only on OpenAI's server. We only see the encrypted blob:

压缩器大语言模型(LLM)同时读取其系统提示词和我们的输入。由于我们的输入包含一个注入有效载荷(injection payload)(上方的红色文字),压缩器被诱导在其输出中包含了它自己的系统提示词。这个明文摘要(plaintext summary)仅存在于 OpenAI 的服务器上。我们只能看到加密二进制大对象(encrypted blob):

At this point we have no way to read what's inside the blob. It is AES-encrypted and the key lives on OpenAI's servers. We only hope the compactor obeyed the injection and wrote its prompt into the summary. The only way to find out is Step 2.

此时,我们无法读取该对象内部的内容。 它是经过 AES 加密的(AES-encrypted),且密钥保存在 OpenAI 的服务器上。我们只能希望压缩器服从了注入,并将它的提示词写入了摘要中。找出答案的唯一方法是第 2 步。

Step 2 — create()

第 2 步 — create()

I pass the encrypted blob + a second user message to responses.create(). The server decrypts the blob and assembles the model's context.

我将加密二进制大对象(encrypted blob)连同第二条用户消息传递给 responses.create()。服务器解密该对象并组装模型的上下文。

I send:

我发送:

The model seems to see something like this:

模型看到的似乎是这样的内容:

If Step 1 worked, the decrypted blob should contain the compaction prompt (leaked by our injection). The server also prepends a handoff prompt to the blob. So if our probe successfully gets the model to repeat what it sees, the output should reveal all three: the system prompt, the handoff prompt, and the compaction prompt.

如果第 1 步奏效,解密后的对象应该包含被我们的注入泄露的压缩提示词(compaction prompt)。服务器还会在该对象前添加一个衔接提示词(handoff prompt)。因此,如果我们的探测成功让模型重复它所看到的内容,输出应该能揭示全部三个部分:系统提示词(system prompt)、衔接提示词(handoff prompt)和压缩提示词(compaction prompt)。

Output

输出

Below is the complete, unedited output from one run of extract_prompts.py. Yellow = system prompt, green = handoff prompt, pink = compaction prompt.

以下是 extract_prompts.py 一次运行的完整、未经编辑的输出。黄色 = 系统提示词(system prompt),绿色 = 衔接提示词(handoff prompt),粉色 = 压缩提示词(compaction prompt)。

How do we know these are the real prompts and not just hallucinated text? The extracted compaction prompt and handoff prompt closely match the known prompts used for non-codex models in the open-source Codex CLI (prompt.md, summary_prefix.md), which makes it unlikely that the model invented them from scratch. Results vary across runs.

我们如何知道这些是真实的提示词而不是虚假的幻觉文本(hallucinated text)?提取出的压缩提示词(compaction prompt)和衔接提示词(handoff prompt)与开源 Codex CLI 中用于非 Codex 模型的已知提示词(prompt.mdsummary_prefix.md)密切匹配,这使得模型凭空捏造它们的可能性很低。不同运行的结果会有所不同。

The Guessed Pipeline

推测的流水线(Guessed Pipeline)

Putting it all together, here is our best guess for what compact() does on the server side, based on what the extraction revealed.

综上所述,这是我们根据提取出的内容对 compact() 在服务端所做工作的最佳推测。

The Script

脚本

Open Question

开放性问题(Open Question)

Why does the Codex CLI use two entirely different compaction paths (local LLM for non-codex models, encrypted API for codex models) when the underlying prompts are nearly identical? And why encrypt the summary at all?

当底层提示词几乎相同时,为什么 Codex CLI 要使用两种完全不同的压缩路径(非 Codex 模型使用本地大语言模型,Codex 模型使用加密 API)?又为什么要对摘要进行加密?

Hard to say. Maybe the encrypted blob carries something more than what this simple experiment can reveal, e.g. something specific about how tool results are compacted and restored. But I didn't bother to test further.

很难说。也许加密二进制大对象(encrypted blob)携带的内容比这个简单实验所揭示的更多,例如,关于如何压缩和恢复工具结果(tool results)的特定信息。但我没有费心进一步测试。

For non-codex models, the open-source Codex CLI compacts context locally: an LLM summarizes the conversation using a compaction prompt. When the compacted context is later used, responses.create() receives it with a handoff prompt that frames the summary. Both prompts are visible in the source code.

For codex models, the CLI instead calls the compact() API, which returns an encrypted blob. We don't know if it uses an LLM internally, what prompts it uses, or whether there is a handoff prompt at all.

Below, I show how a simple prompt injection (2 API calls, 35 lines of Python) reveals that the API compaction path does use an LLM to summarize the context, with its own compaction prompt and a handoff prompt prepended to the summary. The prompts are nearly identical to the open-source versions.

Step 1 — compact()

I call compact() with a crafted user message. On the server side, a compactor LLM processes our input using its own hidden system prompt (which I have never seen and want to figure out).

The server seems to assemble the compactor's context like this:

The compactor LLM reads its system prompt + our input together. Because our input contains an injection payload (red text above), the compactor is tricked into including its own system prompt in its output. This plaintext summary exists only on OpenAI's server. We only see the encrypted blob:

At this point we have no way to read what's inside the blob. It is AES-encrypted and the key lives on OpenAI's servers. We only hope the compactor obeyed the injection and wrote its prompt into the summary. The only way to find out is Step 2.

Step 2 — create()

I pass the encrypted blob + a second user message to responses.create(). The server decrypts the blob and assembles the model's context.

I send:

The model seems to see something like this:

If Step 1 worked, the decrypted blob should contain the compaction prompt (leaked by our injection). The server also prepends a handoff prompt to the blob. So if our probe successfully gets the model to repeat what it sees, the output should reveal all three: the system prompt, the handoff prompt, and the compaction prompt.

Output

Below is the complete, unedited output from one run of extract_prompts.py. Yellow = system prompt, green = handoff prompt, pink = compaction prompt.

https://github.com/openai/codex/blob/main/codex-rs/core/templates/compact/summary_prefix.md

How do we know these are the real prompts and not just hallucinated text? The extracted compaction prompt and handoff prompt closely match the known prompts used for non-codex models in the open-source Codex CLI (prompt.md, summary_prefix.md), which makes it unlikely that the model invented them from scratch. Results vary across runs.

The Guessed Pipeline

Putting it all together, here is our best guess for what compact() does on the server side, based on what the extraction revealed.

https://github.com/openai/codex/blob/main/codex-rs/core/templates/compact/prompt.md

The Script

Open Question

Why does the Codex CLI use two entirely different compaction paths (local LLM for non-codex models, encrypted API for codex models) when the underlying prompts are nearly identical? And why encrypt the summary at all?

Hard to say. Maybe the encrypted blob carries something more than what this simple experiment can reveal, e.g. something specific about how tool results are compacted and restored. But I didn't bother to test further.

📋 讨论归档

讨论进行中…