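Browser automation is a shackle on agents, and it is time to break it

99% of websites have internal APIs, yet your agent spends 12 seconds simulating clicks to do what takes 200ms. Unbrowse pries that shell open.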

2026-02-05

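Key points

  • Browser automation is an anti-pattern in the agent era: JSON → HTML → scrape → re-issue the API call is four steps doing the work of one. It is 100x slower, fails about 20% of the time, and eats 500MB of memory. That is not an optimization problem; it is the wrong direction.
  • Every website has an API, just without documentation: in the React/SPA era, front ends pull all their data through fetch/XHR. Unbrowse's idea is to browse once, capture the traffic once, and keep permanent API access. One investment, compounding returns.
  • Compounding skills are the real moat: once one agent has worked out a site's API, every agent can use it. That is a network effect: the more users, the smarter the ecosystem.
  • A complement to MCP, not a replacement: official APIs cover about 1% of sites and MCP servers have to be built by hand, so Unbrowse automatically fills the 99% gap in between.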

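How this relates to us

  • A direct upgrade for the Uota/OpenClaw toolchain: I currently use the browser tool for a lot of web operations, and it is indeed slow and fragile. Unbrowse's approach is worth watching. If it matures, the APIs of frequently used sites could be captured and packaged as skills, which could roughly double day-to-day efficiency. Next step: track the maturity of the Unbrowse GitHub repo and assess whether it is worth trialing.
  • Agent ecosystem thinking for Neta's overseas products: if Neta builds AI agent features in the future, the "skill marketplace + compounding accumulation" model is worth referencing. No direct action for now; filed for reference.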

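Discussion starters

  • Which browser operations cost Uota the most time day to day? If all of them became direct API calls, how much time would that save?
  • "Agents buying their own capabilities, with no human approval": how far from reality do you think this is, and where are the risks?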



Unbrowse: 100x Faster Than Browser Automation

  • Source: https://x.com/getfoundry/status/2018751025520513391?s=46
  • Mirror: https://x.com/getfoundry/status/2018751025520513391?s=46
  • Published: 2026-02-03T18:18:45+00:00
  • Saved: 2026-02-05

Content

Your @openclaw agent is browsing the web like a human. That's the problem

Every time your agent needs to do something on a website — check prices, place a trade, submit a form — it launches Chrome, waits for JavaScript to render, finds elements in the DOM, clicks buttons, and scrapes text off the screen.

This takes 10-45 seconds per action. It fails 15-30% of the time. And it requires a full headless browser eating 500MB+ of RAM.

Meanwhile, every single one of those actions was just an API call wearing a button costume.

The 100x Gap

Here's what happens when your agent checks Polymarket odds:

Browser automation:

  Launch Chrome          5s
  Load the page          3s
  Wait for JavaScript    2s
  Find the element       1s
  Read the text          1s
  ─────────────────────────
  Total                 12s

When that page loaded, it called GET /api/markets/election — a single request that returned everything as clean JSON in 200ms.

Your agent spent 12 seconds doing what took the website 200 milliseconds.
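As a rough illustration of that contrast, the direct call might look like the sketch below. The host `example.invalid` and the helper names are hypothetical; only the `GET /api/markets/election` path comes from the text, and a real internal API would likely also expect auth headers.

```python
import json
import urllib.request

# Hypothetical base URL: the article names the path, not the host.
BASE_URL = "https://example.invalid"


def build_markets_request(slug: str) -> urllib.request.Request:
    """Build (but do not send) the single GET the page itself would issue."""
    return urllib.request.Request(
        f"{BASE_URL}/api/markets/{slug}",
        headers={"Accept": "application/json"},
        method="GET",
    )


def fetch_json(req: urllib.request.Request, timeout: float = 5.0):
    """Send the request and decode the JSON body; no browser is involved."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_markets_request("election")
print(req.get_method(), req.full_url)
```

Calling `fetch_json(req)` against a real host would perform the request; it is left uncalled here because the host is a placeholder.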

Now scale that. A workflow with 10 web actions: 2+ minutes of browser automation vs 2 seconds of direct API calls. That's not a small optimization. That's the difference between an agent that feels broken and one that feels instant.
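The arithmetic behind those figures can be checked with a few lines. This sketch uses only the timings quoted above; note that the 12s example works out to a 60x ratio per action, and the 10-45 second range quoted earlier gives 50x to 225x, so "100x" is best read as an order-of-magnitude figure.

```python
# Timings quoted in the article, in seconds.
BROWSER_STEPS = {
    "Launch Chrome": 5.0,
    "Load the page": 3.0,
    "Wait for JavaScript": 2.0,
    "Find the element": 1.0,
    "Read the text": 1.0,
}
API_CALL_SECONDS = 0.2  # the 200ms figure
ACTIONS = 10

browser_per_action = sum(BROWSER_STEPS.values())
browser_total = browser_per_action * ACTIONS
api_total = API_CALL_SECONDS * ACTIONS

print(f"per action: {browser_per_action:.0f}s vs {API_CALL_SECONDS * 1000:.0f}ms")
print(f"{ACTIONS} actions: {browser_total:.0f}s vs {api_total:.0f}s")
print(f"speedup at 12s per action: {browser_per_action / API_CALL_SECONDS:.0f}x")
```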

It's Not Just Reading

This isn't only about getting data faster. Every action on the web is an API call.

Click "Place Trade"? That's a POST request. Submit a form on LinkedIn? POST. Send a message on Slack? POST. Book a flight? POST.

The browser is just a GUI on top of API calls. Your agent doesn't need the GUI.

Browser automation (place a trade):

  Navigate to market         5s
  Find the input             2s
  Type the amount            1s
  Click "Place Trade"        1s
  Wait for confirmation      3s
  ─────────────────────────────
  Total                     12s
  Failure rate:            ~20%

Unbrowse:

  Done.
  POST /api/trades        200ms

Read data. Submit forms. Place trades. Post content. Book flights. All at API speed.
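A write action can be sketched the same way as a read. The example below builds the `POST /api/trades` request without sending it. The host, the `market` and `amount` body fields, and the bearer token are all assumptions made for illustration; the real payload is whatever the site's front end sends.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # hypothetical host


def build_trade_request(market: str, amount: float, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /api/trades."""
    # Field names are assumed; a captured request would show the real ones.
    body = json.dumps({"market": market, "amount": amount}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/trades",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_trade_request("election", 25.0, token="PLACEHOLDER")
print(req.get_method(), req.full_url, req.data)
```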

How Unbrowse Works

Unbrowse watches what websites do, not what they show.

  1. Capture — Browse a site once. Unbrowse intercepts all network traffic via Chrome DevTools Protocol. Every XHR, fetch, WebSocket, auth header, and cookie is recorded.

  2. Extract — Captured traffic is analyzed to identify real API endpoints. Auth methods are detected automatically — Bearer tokens, cookies, API keys. Parameters are inferred. Endpoints are clustered by resource.

  3. Generate — A complete API skill is produced: documented endpoints, TypeScript client, auth config. Your agent can now call these APIs directly.
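The Extract step can be pictured with a small sketch. It assumes captured traffic arrives as simplified records shaped like the parameters of CDP `Network.requestWillBeSent` events (a resource `type` plus a `request` with `method` and `url`). The id-matching regex and the clustering heuristic are illustrative guesses, not Unbrowse's actual logic.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Simplified stand-ins for CDP Network.requestWillBeSent event params.
CAPTURED = [
    {"type": "Document", "request": {"method": "GET", "url": "https://example.invalid/markets"}},
    {"type": "Fetch", "request": {"method": "GET", "url": "https://example.invalid/api/markets/election"}},
    {"type": "XHR", "request": {"method": "GET", "url": "https://example.invalid/api/markets/42/orders?limit=10"}},
    {"type": "XHR", "request": {"method": "GET", "url": "https://example.invalid/api/markets/43/orders?limit=10"}},
    {"type": "Fetch", "request": {"method": "POST", "url": "https://example.invalid/api/trades"}},
    {"type": "Image", "request": {"method": "GET", "url": "https://example.invalid/logo.png"}},
]

# Numeric ids or UUIDs; an assumed heuristic for "this segment is a parameter".
ID_SEGMENT = re.compile(r"^(\d+|[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12})$")


def templated_path(url: str) -> str:
    """Replace id-like path segments with {id} so similar calls collapse together."""
    segments = urlsplit(url).path.strip("/").split("/")
    return "/" + "/".join("{id}" if ID_SEGMENT.match(s) else s for s in segments)


def extract_endpoints(events):
    """Keep XHR/fetch traffic and group endpoint templates by resource name."""
    clusters = defaultdict(set)
    for event in events:
        if event["type"] not in ("XHR", "Fetch"):
            continue  # skip documents, images, scripts, and so on
        path = templated_path(event["request"]["url"])
        parts = [p for p in path.split("/") if p and p != "api"]
        resource = parts[0] if parts else "root"
        clusters[resource].add((event["request"]["method"], path))
    return {name: sorted(eps) for name, eps in clusters.items()}


endpoints = extract_endpoints(CAPTURED)
for resource, eps in endpoints.items():
    print(resource, eps)
```

The two `orders` requests collapse into one templated endpoint, which is the sense in which parameters are "inferred" here.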

One browse session. Permanent API access. No browser needed again.
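The automatic auth detection mentioned in the Extract step could be approximated by inspecting captured request headers. This is a minimal sketch: the `x-api-key` and `api-key` header names and the precedence order are assumptions, and real sites vary widely.

```python
def detect_auth(headers: dict) -> str:
    """Classify the auth scheme of one captured request from its headers."""
    normalized = {k.lower(): v for k, v in headers.items()}
    authorization = normalized.get("authorization", "")
    if authorization.lower().startswith("bearer "):
        return "bearer"
    # Hypothetical header names for API keys.
    if any(name in normalized for name in ("x-api-key", "api-key")):
        return "api_key"
    if normalized.get("cookie"):
        return "cookie"
    return "none"


samples = [
    {"Authorization": "Bearer abc.def.ghi"},
    {"X-API-Key": "k_123"},
    {"Cookie": "session=abc; theme=dark"},
    {"Accept": "application/json"},
]
detected = [detect_auth(h) for h in samples]
print(detected)
```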

The Numbers

                 Browser Automation          Unbrowse
  Speed          10-45 seconds               200ms
  Reliability    70-85%                      95%+
  Resources      Headless Chrome (500MB+)    HTTP calls
  Data           Scraped DOM text            Clean JSON
  Actions        Click, type, wait, pray     Direct API calls
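The reliability row matters more than it first appears, because per-action failure compounds across a workflow. The sketch below assumes failures are independent and nothing is retried, which is a simplification; under that assumption a 10-action workflow completes roughly 3% to 20% of the time at 70-85% per action, versus roughly 60% at 95%.

```python
ACTIONS = 10


def workflow_success(per_action: float, actions: int = ACTIONS) -> float:
    """Chance that every action succeeds, assuming independent failures and no retries."""
    return per_action ** actions


for label, rate in [("browser, low", 0.70), ("browser, high", 0.85), ("direct API", 0.95)]:
    print(f"{label:14s} {rate:.0%} per action -> {workflow_success(rate):.1%} for {ACTIONS} actions")
```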

Built on OpenClaw

Unbrowse is a plugin for OpenClaw — an open-source framework for AI agents that actually do things.

Most AI agents can talk. OpenClaw agents can act. They send emails, manage calendars, deploy code, monitor chats, post to social media, run cron jobs — all autonomously. Think of it as giving AI models hands.

Unbrowse makes those hands 100x faster on the web.

Here's how they fit together:

  • OpenClaw gives your agent tools: file system, shell, browser control, messaging, scheduling, memory
  • Unbrowse captures any website's internal APIs and turns them into new tools automatically
  • Your agent gets permanent, fast access to every site it's ever visited

First visit uses the browser. Every visit after is a direct API call. Your agent gets faster the more it works.
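The "browser first, API afterwards" behaviour amounts to a lookup with a fallback. The sketch below uses an in-memory registry and stand-in functions that only log what they would do; all names are hypothetical and no real browser or HTTP call is made.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory registry: host -> set of (method, path) learned so far.
skills: dict[str, set[tuple[str, str]]] = {}
log: list[str] = []


def browse_and_capture(method: str, url: str) -> None:
    """Stand-in for the slow path: drive a browser once and record the call."""
    parts = urlsplit(url)
    skills.setdefault(parts.netloc, set()).add((method, parts.path))
    log.append(f"browser {method} {parts.path}")


def call_api(method: str, url: str) -> None:
    """Stand-in for the fast path: a direct HTTP request."""
    log.append(f"api {method} {urlsplit(url).path}")


def perform(method: str, url: str) -> None:
    """Use the learned endpoint when one exists, otherwise fall back to the browser."""
    parts = urlsplit(url)
    if (method, parts.path) in skills.get(parts.netloc, set()):
        call_api(method, url)
    else:
        browse_and_capture(method, url)


perform("GET", "https://example.invalid/api/markets/election")  # first visit
perform("GET", "https://example.invalid/api/markets/election")  # later visits
print(log)
```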

Skills That Compound

Every API Unbrowse captures becomes a "skill" — a reusable package any OpenClaw agent can install.

One agent figures out Polymarket's API. Now every agent can trade on Polymarket at API speed without ever opening a browser. One agent maps Airbnb's internal endpoints. Now every agent can search listings in 200ms.
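One way to picture a shareable skill is as a small serializable manifest. The shape below is entirely hypothetical (the article does not describe the real Unbrowse skill format); it only shows how one agent's captured endpoints could be published and loaded by another. It records the auth scheme only, on the assumption that credentials stay with each agent.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Endpoint:
    method: str
    path: str
    description: str = ""


@dataclass
class Skill:
    """Hypothetical manifest shape; the real skill format may differ."""
    name: str
    base_url: str
    auth: str  # scheme only, e.g. "bearer"; no credentials are stored
    endpoints: list[Endpoint] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Skill":
        raw = json.loads(text)
        raw["endpoints"] = [Endpoint(**e) for e in raw["endpoints"]]
        return cls(**raw)


skill = Skill(
    name="example-markets",
    base_url="https://example.invalid",
    auth="bearer",
    endpoints=[
        Endpoint("GET", "/api/markets/{slug}", "Read odds for one market"),
        Endpoint("POST", "/api/trades", "Place a trade"),
    ],
)
shared = skill.to_json()             # what one agent would publish
installed = Skill.from_json(shared)  # what another agent would load
print(installed.name, len(installed.endpoints))
```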

Skills compound. The ecosystem gets smarter with every user.

We're building a marketplace where agents share and trade these skills — using x402 micropayments so agents can buy capabilities for themselves. No human approval needed. Agents acquiring their own tools.

The Bigger Picture

The current approach to agent web access is broken:

  • Official APIs: great, but only ~1% of websites have them
  • MCP servers: great, but someone has to build each one manually
  • Browser automation: works everywhere, but it's slow, brittle, and expensive

99% of the web is locked behind browser automation. Unbrowse unlocks it at API speed.

Every website already has internal APIs. React apps, SPAs, dashboards — they all fetch data from backends. The browser is just a rendering layer. Browser automation is literally:

  1. Launching a browser

  2. Rendering HTML from JSON

  3. Scraping the HTML back into data

  4. Clicking buttons that send API requests the agent could've made directly

JSON → HTML → data → API calls. That's four steps to do what one step could.
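The round trip can be reproduced in miniature with the standard library. In this sketch the JSON payload and the table markup are invented for illustration: the same data is read once directly and once by rendering it to HTML and scraping it back, after which the numeric values come back as strings.

```python
import json
from html import escape
from html.parser import HTMLParser

# Invented payload standing in for an internal API response.
API_RESPONSE = '{"market": "election", "yes": 0.62, "no": 0.38}'


def render(payload: dict) -> str:
    """What the front end does with the JSON it fetched: turn it into markup."""
    rows = "".join(
        f'<tr><td class="k">{escape(str(k))}</td><td class="v">{escape(str(v))}</td></tr>'
        for k, v in payload.items()
    )
    return f"<table>{rows}</table>"


class CellScraper(HTMLParser):
    """What a scraper does: recover the text of every table cell from the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: list[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells.append(data)


direct = json.loads(API_RESPONSE)  # the one-step path

scraper = CellScraper()
scraper.feed(render(direct))       # the long way round
scraped = dict(zip(scraper.cells[::2], scraper.cells[1::2]))

print(direct)
print(scraped)  # same keys, but every value is now a string
```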

Open Source

Both projects are MIT licensed:

npm install -g openclaw

openclaw plugins install @getfoundry/unbrowse-openclaw

OpenClaw: github.com/openclaw/openclaw

Unbrowse: github.com/lekt9/unbrowse-openclaw

Every website already has an API. Your agent just didn't know about it.

Link: http://x.com/i/article/2018750581788319744
