🧠 阿头学 · 💬 Discussion topic

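Google launches the Open Knowledge Format (OKF) in a bid to unify AI context standards

Google is using the OKF standard to contest the right to define the data catalog in the AI era, but its claim of "openness" clearly conflicts with how it ties the format to its own products. The commercial intent outweighs the technical neutrality.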

2026-06-14

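Key points

  • Format as protocol: OKF standardizes knowledge representation with Markdown + YAML. It is a tactical attempt to lower the cost for AI agents of assembling context, but it addresses only syntactic interoperability, not semantic interoperability.
  • Producer/consumer decoupling: The spec enforces a separation between knowledge producers and consumers. This is a sound architecture for avoiding toolchain lock-in, but outside Google's ecosystem it lacks real incentives for adoption.
  • Readable by humans and machines: Insisting on plain text keeps the files editable by humans and parseable by machines. This is currently the best way to lower the cognitive barrier to maintaining a knowledge base, and it helps early adoption.
  • Google's strategic intent: On the surface Google is promoting an open standard, but in practice it is building an ecosystem moat through the Google Cloud Knowledge Catalog integration. Shipping the product before the standard is established undermines its neutrality.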

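What this means for us

  • For Neta, context management needs to be standardized. Next step: assess whether the internal wiki can be migrated to a Markdown + YAML structure to lower the cost of agent integration.
  • For ATou, infrastructure competition has shifted to the protocol layer. Next step: stay alert to the long-term lock-in risk of a standard dominated by a single vendor.
  • For Uota, knowledge assets should be put under version control. Next step: try managing documents with a Git workflow and let agents, rather than people, maintain the index.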

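Discussion starters

  • Can a static file format really solve enterprise-level permission and semantic-alignment problems?
  • With protocols such as MCP already in existence, is OKF's differentiated value enough to make it a standard?
  • If Google stops maintaining it, will OKF become just another proprietary de facto standard?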

As foundation models continue to improve, the lack of relevant context often limits what they can do, especially as they are used to build agentic systems. While these models can help you write code, summarize documents, or analyze a dataset, they still need the right information to produce accurate and actionable results.

That’s why today, we’re introducing the Open Knowledge Format (OKF), an open specification that formalizes the LLM-wiki pattern into a portable, interoperable format. This is a vendor-neutral, agent- and human-friendly standard for representing the metadata, context, and curated knowledge that modern AI systems need.

As published, OKF v0.1 represents knowledge as a directory of markdown files with YAML frontmatter, with a small set of agreed-upon conventions that let wikis written by different producers be consumed by different agents without translation.

That's it. No complex compression scheme, no new runtime, no required SDK. A bundle of OKF documents is:

  • Just markdown — readable in any editor, renderable on GitHub, indexable by any search tool

  • Just files — shippable as a tarball, hostable in any git repo, mountable on any filesystem

  • Just YAML frontmatter — for the small set of structured fields that need to be queryable: type, title, description, resource, tags, and timestamp
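As a rough illustration of how little a producer needs, the sketch below renders one concept document using only the Python standard library. The six field names come from the list above; the helper name `render_concept`, the sample values, the timestamp format, and the choice to write values in JSON style (which YAML 1.2 accepts) are assumptions for illustration, not something the published spec is known to prescribe.

```python
import json


def render_concept(fields: dict, body: str) -> str:
    """Render a Markdown document with a YAML frontmatter block.

    Hypothetical helper: values are written in JSON style (quoted
    strings, inline [..] lists), which is also valid YAML 1.2.
    """
    lines = ["---"]
    for key, value in fields.items():
        # json.dumps gives double-quoted strings and inline arrays.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body.strip() + "\n"


# Sample values are made up for illustration.
doc = render_concept(
    {
        "type": "metric",
        "title": "Weekly active users",
        "description": "Distinct users with at least one event in a 7-day window.",
        "resource": "events.daily_activity",
        "tags": ["engagement", "product"],
        "timestamp": "2026-06-14T00:00:00Z",
    },
    "Computed from the event stream by counting distinct user IDs per week.",
)
print(doc)
```

The output is an ordinary text file, so it can be committed to git, packed into a tarball, or opened in any editor without further tooling.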

If you've used Obsidian, Notion, Hugo, or any of the LLM wiki patterns that have emerged over the past year, the shape will feel familiar. OKF formalizes the small set of conventions needed to make these patterns interoperable.

Let’s take a look at the problem that OKF can solve for your organization, how it works, how to get started with it, and what’s next.

A fragmented context landscape

In most organizations, the information that foundation models use is overwhelmingly internal knowledge: the schema of a table, your business's definition of a metric, the runbook for an incident, the join paths between two systems, the deprecation notice for an old API, etc.

Today, these atoms of knowledge live in a variety of highly fragmented systems:

  • Metadata catalogs with their own APIs

  • Wikis, third-party systems, or shared drives

  • Code comments, docstrings, or notebook cells

  • The heads of a few senior engineers

When an AI agent needs to answer "How do I compute weekly active users from our event stream?" it has to assemble the answer from these scattered, mutually incompatible surfaces. Every vendor offers its own catalog, its own SDK, its own knowledge-graph schema, and none of the knowledge is easily portable across products or organizations.

The result: Every agent builder is solving the same context-assembly problem from scratch, every catalog vendor is reinventing the same data models, and the knowledge itself is locked behind whichever surface created it.

Knowledge as a living wiki

Developer teams are changing how they build AI agents. Instead of using models to search the same documents for the same facts over and over, you can give your agents a shared markdown library that grows more useful over time. This lets your agents take on the drudgery of reading and updating their own files, while your team curates the content and manages it like code.

Andrej Karpathy, the prominent AI researcher and educator, articulates this idea most crisply in his LLM Wiki gist. "LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass," he writes. The bookkeeping that causes humans to abandon personal wikis is exactly what LLMs are good at.

A similar knowledge-as-wiki pattern keeps reappearing under different names: Obsidian vaults wired to coding agents, the AGENTS.md / CLAUDE.md family of convention files, repos full of index.md and log.md artifacts that agents consult before doing real work, and "metadata as code" repositories inside data teams.

The pattern is compelling and powerful, but each instance is bespoke. Karpathy's wiki and your team's wiki and a vendor's catalog export may all look alike (markdown, frontmatter, cross-links), but none of them are intentionally designed to cooperate. There is no agreed-upon answer to what fields every document should carry, or what filenames mean what. As a result, the knowledge encoded in wikis remains siloed within the original teams, leading to redundant effort whenever a new agent is built.

What's missing is a format, not another service

The answer to this problem isn’t another knowledge service. You need a format, a way to represent knowledge that:

  • Anyone can produce, without an SDK

  • Anyone can consume, without an integration

  • Survives moving between systems, organizations, and tools

  • Lives in version control alongside the code it describes

  • Is readable by humans and parseable by agents: the same file, no translation layer

By design, OKF is that format.

How OKF works: The design in one screen

An OKF bundle is a directory of markdown files representing concepts: anything you want to capture, including tables, datasets, metrics, playbooks, runbooks, and APIs. Each concept is one file. The file path is the concept's identity.
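To make the path-as-identity idea concrete, here is a minimal sketch that walks a bundle directory and lists each concept by its bundle-relative path. The function name `concept_ids`, the sample folder and file names, and the decision to keep the `.md` extension in the identifier are assumptions; the spec may define identity and reserved names more precisely.

```python
from pathlib import Path
import tempfile

# Assumption: index.md and log.md are the reserved names mentioned in the
# article; the full reserved list lives in the spec and may differ.
RESERVED = {"index.md", "log.md"}


def concept_ids(bundle_root: Path) -> list[str]:
    """Return the bundle-relative POSIX path of every concept file."""
    ids = []
    for path in sorted(bundle_root.rglob("*.md")):
        if path.name in RESERVED:
            continue
        ids.append(path.relative_to(bundle_root).as_posix())
    return ids


# Build a throwaway bundle with hypothetical file names to try it out.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tables").mkdir()
    (root / "metrics").mkdir()
    (root / "index.md").write_text("# Bundle index\n", encoding="utf-8")
    (root / "tables" / "events.md").write_text(
        "---\ntype: table\n---\n", encoding="utf-8"
    )
    (root / "metrics" / "weekly_active_users.md").write_text(
        "---\ntype: metric\n---\n", encoding="utf-8"
    )
    found = concept_ids(root)

print(found)
```

Because identity is just a relative path, moving the whole directory to another repository or filesystem leaves every identifier unchanged.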

Each concept document has a small block of YAML front matter for structured fields and a markdown body for everything else.
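A consumer can separate the two parts with a few lines of code. The sketch below is a deliberately simplified, standard-library-only reader: it understands flat `key: value` lines and inline `[a, b]` lists, which is an assumption about what typical front matter looks like. The function name `split_front_matter` and the sample document are hypothetical, and a production consumer would more likely pass the block to a full YAML parser.

```python
def split_front_matter(text: str) -> tuple[dict, str]:
    """Split a concept document into (fields, markdown_body).

    Handles only flat `key: value` lines and inline `[a, b]` lists.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1)
            if line.strip() == "---"
        )
    except StopIteration:
        return {}, text  # no closing delimiter, so treat it all as body

    fields = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, _, raw = line.partition(":")
        value = raw.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            fields[key.strip()] = [
                item.strip().strip("\"'") for item in items if item.strip()
            ]
        else:
            fields[key.strip()] = value.strip("\"'")
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


# Hypothetical concept document.
sample = """---
type: table
title: Events
tags: [raw, clickstream]
---

# Events

One row per user interaction.
"""

fields, body = split_front_matter(sample)
print(fields["type"], fields["tags"])
print(body.splitlines()[0])
```

Keeping the structured fields in a dictionary and the prose in a string mirrors the split the format makes: the former is what tools query, the latter is what humans and agents read.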

Concepts link to each other with normal markdown links, turning the directory into a graph of relationships that is richer than the parent/child links implied by the file system. Bundles can optionally include index.md files (for progressive disclosure as agents navigate the hierarchy) and log.md files (for chronological history of changes).
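One way to recover that graph is to scan each body for inline Markdown links and resolve them against the linking file's folder. The sketch below assumes relative links that end in `.md` point at other concepts and that heading anchors can be ignored; the actual cross-linking rules are defined in the spec and may differ. The names `outgoing_links` and `bundle` and the sample paths are hypothetical.

```python
import posixpath
import re

# Matches inline Markdown links such as [text](target); images and
# reference-style links are ignored in this simplified sketch.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")


def outgoing_links(concept_path: str, markdown: str) -> list[str]:
    """Resolve relative .md link targets against the linking file's folder."""
    base = posixpath.dirname(concept_path)
    targets = []
    for target in LINK_RE.findall(markdown):
        target = target.split("#", 1)[0]  # drop any heading anchor
        if "://" in target or not target.endswith(".md"):
            continue  # external URL or non-concept file
        targets.append(posixpath.normpath(posixpath.join(base, target)))
    return targets


# Hypothetical two-file bundle held in memory: path -> Markdown body.
bundle = {
    "metrics/weekly_active_users.md": (
        "Built from [events](../tables/events.md#schema) "
        "and the [docs](https://example.com)."
    ),
    "tables/events.md": "Feeds [WAU](../metrics/weekly_active_users.md).",
}

graph = {path: outgoing_links(path, text) for path, text in bundle.items()}
print(graph)
```

The resulting adjacency mapping can cross folder boundaries in either direction, which is what makes it richer than the directory tree alone.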

The full v0.1 specification (including conformance criteria, cross-linking rules, and the small number of reserved filenames) fits on a single page.

Three principles behind the design

1. Minimally opinionated. OKF requires exactly one thing of every concept: a type field. Everything else (e.g., what types exist, what other fields to include, what sections the body has) is left to the producer. The spec defines the interoperability surface, not the content model.

2. Producer/consumer independence. OKF cleanly separates who writes the knowledge from who consumes it. A bundle hand-authored by a human can be consumed by an AI agent. A bundle generated by a metadata export pipeline can be browsed in a visualizer. A bundle synthesized by one LLM can be queried by another. The format is the contract; the tooling at each end is independently swappable.

3. Format, not platform. OKF is not tied to any specific cloud, database, model provider, or agent framework. It will never require a proprietary account or SDK to read, write, or serve. We're publishing it as an open standard because the value of a knowledge format comes from how many parties speak it, not from who owns it.
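The first principle lends itself to a mechanical check. The sketch below flags documents whose front matter lacks a non-empty `type` field. It assumes front matter is delimited by `---` lines at the very top of the file, and it covers only this one requirement rather than the spec's full conformance criteria; the names `has_type_field` and `missing_type` and the sample bundle are hypothetical.

```python
import re

# A frontmatter block is assumed to be the text between an opening '---'
# on the first line and the next line consisting of '---'.
FRONT_MATTER_RE = re.compile(
    r"\A---[ \t]*\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)", re.DOTALL
)
TYPE_RE = re.compile(r"^type:[ \t]*\S", re.MULTILINE)


def has_type_field(text: str) -> bool:
    """True if the document's frontmatter has a non-empty top-level `type`."""
    match = FRONT_MATTER_RE.match(text)
    return bool(match and TYPE_RE.search(match.group(1)))


def missing_type(bundle: dict[str, str]) -> list[str]:
    """Return paths of documents that fail the single required-field check."""
    return sorted(
        path for path, text in bundle.items() if not has_type_field(text)
    )


# Hypothetical in-memory bundle: path -> file contents.
bundle = {
    "tables/events.md": "---\ntype: table\ntitle: Events\n---\nBody text.\n",
    "notes/draft.md": "---\ntitle: Untitled\n---\nNo type yet.\n",
    "notes/plain.md": "Just prose, no frontmatter.\n",
}
print(missing_type(bundle))
```

Because the check inspects nothing beyond `type`, producers remain free to choose their own type vocabulary, extra fields, and body sections.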

What we're shipping with the spec

To make the format concrete, we're publishing reference implementations at both the producer and consumer ends:

  • An enrichment agent that walks a BigQuery dataset, drafts an OKF concept document for every table and view, then runs a second LLM pass that crawls authoritative documentation and enriches each concept with citations, schemas, and join paths.

  • A static HTML visualizer that turns any OKF bundle into an interactive graph view in a single self-contained file; no backend, no install on the viewing side, no data leaves the page.

  • Three ready-to-browse sample bundles: GA4 e-commerce, Stack Overflow, and Bitcoin public datasets, produced by the reference agent and committed to the repo as living examples of conformant OKF.

These are proofs of concept, deliberately. The agent demonstrates one way to produce OKF; nothing about the format requires a specific agent framework or LLM. The visualizer demonstrates one way to consume it; nothing about the format requires HTML or a graph view. We expect (and want!) the ecosystem of producers and consumers to grow far beyond what we've shipped.

Where we go from here

OKF v0.1 is a starting point, not a finished standard. The format will evolve as more producers and consumers emerge and as we collectively learn what knowledge representations agents actually need in practice.

We're publishing in the open from day one because that's the only way a knowledge format earns its name, whether you're building a knowledge catalog, an enrichment pipeline, a wiki tailored to AI agents, or anything in the AI knowledge domain.

From here, we encourage you to:

  • Read the spec (it's short!)

  • Write a producer for your source system, your database, your documentation site

  • Write a consumer: a viewer, a search index, an agent that reasons over bundles

  • Try the reference implementation against your own data

  • File issues, send PRs, or propose extensions: The spec is versioned and explicitly designed for backward-compatible growth

The repo, the spec, and the sample bundles are available on GitHub. We have also updated Google Cloud’s Knowledge Catalog to be able to ingest Open Knowledge Format and serve it to our agents. You can find the relevant code and examples here.

The format itself is the contribution. The tools we've shipped exist to make it real, and to lower the cost of trying it out. Whatever shape your knowledge takes today, OKF is designed to be the lingua franca it can be exchanged for tomorrow.


Published by the Google Cloud Data Cloud team. Open Knowledge Format is an open specification; contributions, alternative implementations, and adoption beyond Google products are all explicitly welcomed.

In addition to the authors, this work came together thanks to key ideas from many others at Google, and we thank them for their contributions.
