🧠 ATou Learning · 💬 Discussion Topic

Mintlify swaps RAG patching for a "virtual filesystem": right direction, oversold pitch

Mintlify disguises its docs site as a virtual filesystem an agent can browse with grep/cat/ls. That product call is clearly better than continuing to grind away at frontend sandboxes, but the post's claims of "zero marginal cost" and a fair baseline comparison carry obvious marketing exaggeration.

2026-04-04

Core Takeaways

  • The problem is not weak retrieval but a weak interaction interface. The author's call is accurate: traditional RAG can only emit top-K snippets, and the moment an answer spans pages or chapters, or requires exact syntax, the system fails. Exposing the docs as a filesystem navigable via `ls/cd/cat/grep/find` is what finally gives the agent real active-exploration ability; that is a change of paradigm, not another patch piled onto reranking.
  • For static-documentation scenarios, containers really are mismatched infrastructure. The 46-second p90 session-creation time cited in the post shows that if a frontend assistant pays for a clone plus sandbox init on every session, the user experience inevitably collapses; this judgment is essentially uncontroversial. For tasks that are read-only, structurally stable, and need no real execution environment, booting a micro-VM is heavy weaponry against a light target.
  • ChromaFs's engineering value is an illusion that is convincing enough. The strongest part of the design is not new technology but the right choice of abstraction: reuse just-bash for shell semantics, use Chroma to look up page chunks, reassemble full pages, cache the directory tree, and seal off every write operation. It never chases a "real filesystem"; it implements only the minimal capability set the agent actually depends on, which is a clearly high-leverage engineering call.
  • The two-stage grep filter is the most solid technical point in the piece. Coarse-filter with Chroma first, bulk-prefetch the candidate files into Redis, then hand off to an in-memory grep for fine filtering: the design balances speed with consistent output semantics. It is more controllable than pure vector retrieval and far more reliable than scanning files remotely one by one, which makes it the pattern most worth porting.
  • The business narrative is half credible, half exaggerated. "From 46 seconds to 100 milliseconds" suggests the direction is probably right, but "zero marginal compute cost" is plainly imprecise: Chroma queries, cache fills, permission filtering, and lazy S3 fetches all have real costs. Moreover, the 46-second and 100-millisecond figures may not be measured on the same basis, so this part reads more like a vendor case study than a rigorous benchmark.

Relevance to Us

  • What it means for ATou: if your product's core is letting users ask the docs, look up configs, and find syntax, containers should not be the default. The next step is to inventory the atomic operations the agent actually depends on and build a read-only virtual environment first, rather than heavy infrastructure.
  • What it means for Neta: this shows the ceiling of many agent products lies not in the model but in interface design. A next step is to treat "what world model we hand the model" as a product layer of its own, and compare which fits the task best: a filesystem, a SQL shell, or a structured navigation graph.
  • What it means for Uota: the article reinforces one judgment: excellent engineering is not "more realistic" but "just sufficient". A next step is to distill this into a method: build the minimal viable illusion first, then decide which capabilities must land in a real environment.
  • What it means for the team/business: solutions like this directly shift unit economics, because they free high-frequency frontend sessions from heavy infrastructure. When evaluating investment or product strategy, focus on whether the applicable boundary is wide enough, not just the speed numbers in the demo.

Discussion Prompts

1. For documentation agents, is a "filesystem" really the best general-purpose interface, or merely a more convenient halfway point than traditional RAG?
2. If permissions, caching, and prefetching all live in the virtual layer, could the system take on security-consistency risks that are more hidden than a container's?
3. For which tasks is the "minimal viable illusion" the right answer, and for which will skipping a real environment inevitably mislead the agent?



RAG is great, until it isn't.

Our assistant could only retrieve chunks of text that matched a query. If the answer lived across multiple pages, or the user needed exact syntax that didn't land in a top-K result, it was stuck. We wanted it to explore docs the way you'd explore a codebase.

Agents are converging on filesystems as their primary interface because grep, cat, ls, and find are all an agent needs. If each doc page is a file and each section is a directory, the agent can search for exact strings, read full pages, and traverse the structure on its own. We just needed a filesystem that mirrored the live docs site.

The Container Bottleneck

The obvious way to do this is to just give the agent a real filesystem. Most harnesses solve this by spinning up an isolated sandbox and cloning the repo. We already use sandboxes for asynchronous background agents where latency is an afterthought, but for a frontend assistant where a user is staring at a loading spinner, the approach falls apart. Our p90 session creation time (including GitHub clone and other setup) was ~46 seconds.

Beyond latency, dedicated micro-VMs for reading static documentation introduced a serious infrastructure bill.

At 850,000 conversations a month, even a minimal setup (1 vCPU, 2 GiB RAM, 5-minute session lifetime) would put us north of $70,000 a year based on Daytona's per-second sandbox pricing ($0.0504/h per vCPU, $0.0162/h per GiB RAM). Longer session times double that. (This is a purely naive estimate; a true production workflow would probably use warm pools and container sharing, but the point still stands.)
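The arithmetic behind that figure checks out; here is a quick reconstruction using only the numbers quoted above (naive estimate, no warm pools or sharing):

```typescript
// Reconstructing the sandbox cost estimate from the figures in the post.
const conversationsPerMonth = 850_000;
const sessionHours = 5 / 60;            // 5-minute session lifetime
const vcpuDollarsPerHour = 0.0504;      // Daytona rate, per vCPU
const gibDollarsPerHour = 0.0162;       // Daytona rate, per GiB RAM

// Minimal setup: 1 vCPU, 2 GiB RAM.
const hourlyRate = 1 * vcpuDollarsPerHour + 2 * gibDollarsPerHour; // $0.0828/h
const costPerSession = hourlyRate * sessionHours;                  // ~ $0.0069
const annualCost = costPerSession * conversationsPerMonth * 12;

console.log(`$${Math.round(annualCost)} per year`); // ~ $70,380, "north of $70,000"
```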

We needed the filesystem workflow to be instant and cheap, which meant rethinking the filesystem itself.

Faking a Shell

The agent doesn't need a real filesystem; it just needs the illusion of one. Our documentation was already indexed, chunked, and stored in a Chroma database to power our search, so we built ChromaFs: a virtual filesystem that intercepts UNIX commands and translates them into queries against that same database. Session creation dropped from ~46 seconds to ~100 milliseconds, and since ChromaFs reuses infrastructure we already pay for, the marginal per-conversation compute cost is zero.

export class ChromaFs implements IFileSystem {
  private files = new Set<string>();
  private dirs = new Map<string, string[]>();

  async readFile(path: string): Promise<string> {
    this.assertInit();
    const normalized = normalizePath(path);

    // Serve from cache or fetch from Chroma
    const slug = normalized.replace(/\.mdx$/, '').slice(1);

    // Pages are chunked in Chroma. Reassemble them on the fly:
    const results = await this.collection.get<ChunkMetadata>({
      where: { page: slug },
      include: [IncludeEnum.documents, IncludeEnum.metadatas],
    });

    const chunks = results.ids
      .map((id, i) => ({
        document: results.documents[i] ?? '',
        chunkIndex: parseInt(String(results.metadatas[i]?.chunk_index ?? 0), 10),
      }))
      .sort((a, b) => a.chunkIndex - b.chunkIndex);

    return chunks.map((c) => c.document).join('');

  }

  // Enforce completely stateless, read-only interaction
  async writeFile(): Promise<void> { throw erofs(); }
  async appendFile(): Promise<void> { throw erofs(); }
  async mkdir(): Promise<void> { throw erofs(); }
  async rm(): Promise<void> { throw erofs(); }
}

ChromaFs is built on just-bash by Vercel Labs (shoutout Malte!), a TypeScript reimplementation of bash that supports grep, cat, ls, find, cd, and more. just-bash exposes a pluggable IFileSystem interface, so it handles all the parsing, piping, and flag logic while ChromaFs translates every underlying filesystem call into a Chroma query.

https://arxiv.org/abs/2601.11672

How it works

Bootstrapping the Directory Tree

ChromaFs needs to know what files exist before the agent runs a single command. We store the entire file tree as a gzipped JSON document (path_tree) inside the Chroma collection.

On init, the server fetches and decompresses this document into two in-memory structures: a Set<string> of file paths and a Map mapping directories to children.

Once built, ls, cd, and find resolve in local memory with no network calls. The tree is cached, so subsequent sessions for the same site skip the Chroma fetch entirely.
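With those two structures in place, a directory listing reduces to a Map lookup; for example (hypothetical helper, not shown in the post):

```typescript
// ls against the in-memory tree: a Map lookup plus a sort, no network I/O.
function ls(dirs: Map<string, string[]>, dir: string): string[] {
  const entries = dirs.get(dir);
  if (entries === undefined) throw new Error(`ENOENT: ${dir}`);
  return [...entries].sort();
}
```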

Access Control

Notice the isPublic and groups fields in the path tree. Before building the file tree, ChromaFs prunes the file tree based on the current user's permissions and applies a matching filter to all subsequent Chroma queries.

In a real sandbox, this level of per-user access control would require managing Linux user groups, chmod permissions, or maintaining isolated container images per customer tier. In ChromaFs it's a few lines of filtering before buildFileTree runs.
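Those "few lines of filtering" might look like the sketch below, assuming each tree entry carries the isPublic and groups fields named above (the entry shape itself is my assumption):

```typescript
interface TreeEntry {
  path: string;
  isPublic: boolean;
  groups?: string[]; // customer groups allowed to see a gated page
}

// Keep only entries the current user may see: public pages, or gated pages
// whose allowed groups intersect the user's groups. The same predicate can
// then be reused as a filter on subsequent Chroma queries.
function pruneForUser(entries: TreeEntry[], userGroups: Set<string>): TreeEntry[] {
  return entries.filter(
    (entry) => entry.isPublic || (entry.groups ?? []).some((g) => userGroups.has(g))
  );
}
```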

Reassembling Pages from Chunks

Pages in Chroma are split into chunks for embedding, so when the agent runs cat /auth/oauth.mdx, ChromaFs fetches all chunks with a matching page slug, sorts by chunk_index, and joins them into the full page. Results are cached so repeated reads during grep workflows never hit the database twice.

Not every file needs to exist in Chroma. We register lazy file pointers that resolve on access for large OpenAPI specs stored in customers' S3 buckets. The agent sees v2.json in /api-specs/, but the content only fetches when it runs cat.
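A lazy pointer could be sketched as follows; the LazyFileRegistry name and shape are hypothetical, and only the behavior (the path is listed up front, content fetched on first cat) comes from the post:

```typescript
type Fetcher = () => Promise<string>;

// Hypothetical lazy file pointer: the path appears in the tree immediately,
// but content is fetched only on first read (e.g. an OpenAPI spec in S3),
// then cached for subsequent reads within the session.
class LazyFileRegistry {
  private fetchers = new Map<string, Fetcher>();
  private cache = new Map<string, string>();

  register(path: string, fetch: Fetcher): void {
    this.fetchers.set(path, fetch);
  }

  async read(path: string): Promise<string> {
    const cached = this.cache.get(path);
    if (cached !== undefined) return cached;
    const fetch = this.fetchers.get(path);
    if (!fetch) throw new Error(`ENOENT: ${path}`);
    const content = await fetch(); // e.g. S3 GET, only on first access
    this.cache.set(path, content);
    return content;
  }
}
```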

Every write operation throws an EROFS (Read-Only File System) error. The agent explores freely but can never mutate documentation, which makes the system stateless with no session cleanup and no risk of one agent corrupting another's view.

Optimizing Grep

cat and ls are straightforward to virtualize, but grep -r would be far too slow if it naively scanned every file over the network. We intercept just-bash’s grep, parse the flags with yargs-parser, and translate them into a Chroma query ($contains for fixed strings, $regex for patterns).

Chroma acts as a coarse filter that identifies which files might contain a hit, and we bulkPrefetch those matching chunks into a Redis cache. From there, we rewrite the grep command to target only the matched files and hand it back to just-bash for fine-grained, in-memory filtering, which means large recursive queries complete in milliseconds.

const chromaFilter = toChromaFilter(
  scannedArgs.patterns,
  scannedArgs.fixedStrings,
  scannedArgs.ignoreCase
);

// 1. Coarse Filter: Ask Chroma for slugs matching the string/regex
const matchedSlugs = await chromaFs.findMatchingFiles(chromaFilter, slugsUnderDirs);
if (matchedSlugs.length === 0) return { stdout: '', exitCode: 1 };

// 2. Prefetch: Pull the chunked files into local cache concurrently
await chromaFs.bulkPrefetch(matchedSlugs);

// 3. Fine Filter: Narrow the arguments to ONLY the resolved hits
const matchedPaths = matchedSlugs.map((s) => '/' + s + '.mdx');
const narrowedArgs = [...args, ...matchedPaths]; // e.g. ["-i", "OAuth", "/docs/auth.mdx"]

// 4. Exec: Let the in-memory RegExp engine format the final output
return execBuiltin(narrowedArgs, ctx);
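The post doesn't show toChromaFilter. A minimal version consistent with the stated translation ($contains for fixed strings, $regex for patterns) might look like the sketch below; the operator nesting is an assumption, and case-insensitive matching is left to the in-memory fine-filter pass:

```typescript
type WhereDocument =
  | { $contains: string }
  | { $regex: string }
  | { $or: WhereDocument[] };

// Hypothetical reconstruction of toChromaFilter: fixed strings (grep -F)
// become $contains clauses, patterns become $regex, and multiple patterns
// are OR-ed together. Exact Chroma operator semantics are assumptions here.
function toChromaFilter(patterns: string[], fixedStrings: boolean): WhereDocument {
  const clauses = patterns.map((p): WhereDocument =>
    fixedStrings ? { $contains: p } : { $regex: p }
  );
  return clauses.length === 1 ? clauses[0] : { $or: clauses };
}
```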

Conclusion

ChromaFs powers the documentation assistant for hundreds of thousands of users across 30,000+ conversations a day. By replacing sandboxes with a virtual filesystem over our existing Chroma database, we got instant session creation, zero marginal compute cost, and built-in RBAC without any new infrastructure.

Try it on any Mintlify docs site, or mintlify.com/docs.

[Read the full article at: https://www.mintlify.com/blog/how-we-built-a-virtual-filesystem-for-our-assistant]

📋 Discussion Archive

Discussion in progress…