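🧠 阿头学 · 💬 Discussion topic

Stop Building "Foxconn Factories" for Agents

The most valuable judgment in this article is that the production function of software has changed in the AI era. But the author stretches that into a claim that traditional engineering guardrails are broadly obsolete, which clearly overstates the case.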
2026-06-02

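Core takeaways

The cost structure really has inverted. The author's sharpest point: it used to be "expensive models, cheap code", so wrapping a handful of model calls in a large amount of code was rational. Now models are stronger, calls are cheaper, and code is costlier to maintain, so extending capability by piling on control code yields rapidly diminishing marginal returns.
The real migration is not writing less code but redrawing the layers. The author argues for markdown to carry intent, skills, and judgment, with a small amount of TypeScript handling I/O and the deterministic parts. That direction holds up. The point is not to be "anti-code" but to move judgment out of hard-coded logic and into a skill layer that can be iterated on and evaluated.
"Skillify" is the most solid view of engineering assets in the piece. Taking an agent workflow that worked once and distilling it into a skill pack with markdown, minimal code, unit tests, LLM evals, integration tests, and an automatic invocation mechanism looks more like a real methodology than "vibe coding", because it emphasizes reuse, evaluation, and compounding capability.
The author's attack on traditional testing rests on an equivocation. In the first half he calls 276,000 lines of tests a "cage"; in the second half he calls the tests and evals in a skill pack "magic". That is not abandoning guardrails. It is rebuilding them from an old-style code layer into a new-style evaluation layer, so his "freedom vs. control" narrative is clearly exaggerated.
"Tokenmaxxing" is an investment thesis, not a general engineering conclusion. For well-funded startups chasing a speed advantage it may be a reasonable bet. For small and mid-sized teams, heavily regulated industries, and cost-sensitive businesses, the claim is too Silicon Valley and risks dressing up a compute gamble as universal truth.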

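How this relates to us

What it means for ATou: If ATou still treats AI mainly as a tool for writing code faster, ATou will underestimate the structural change in agent-native workflows. Next step: pick one or two high-ambiguity, high-reuse tasks and try to distill them into testable skill packs rather than piling up more process scripts.
What it means for Neta: Neta should see that what matters most here is not markdown as a surface form but the layering idea of an intent layer, a capability layer, and a deterministic layer. Next step: break existing knowledge work down into the parts that can be left to the agent's discretion and the parts that must keep hard boundaries.
What it means for Uota: If Uota cares about organizations and individual expression, the article is really saying that high-capability systems should not be tamed by low-grade process. Next step: translate that into team collaboration, with less micromanagement and more environments that are verifiable without being suffocating.
What it means for investment judgment: The article supports a clear view: the moat for AI applications will shift from feature accumulation to task definition, evaluation mechanisms, and skill reuse. Next step: when looking at a project, ask less about how many features it has and more about which workflows have been turned into assets and whether they can keep compounding.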

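Discussion starters

1. In which scenarios is removing control code an upgrade, and in which is it a dangerous downgrade?
2. Can the skill pack structure of thick intent, thin code, and strong evaluation become a mainstream primitive for AI products, or does it only suit a small number of high-ambiguity tasks?
3. Is "burn more tokens to buy a future advantage" a genuine first-mover dividend, or a narrative bubble that only the well-capitalized can afford?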

In January I got back into coding and I built Garry's List. Over five hundred thousand lines of Rails and the tests to police it.

I was proud of it. I shouldn't have been. The thing worth being proud of wasn't the app. It was the setup that came out of building it. GStack, the way I code with agents, grew out of the work of building Garry's List, and I gave it away. It's one of the hundred most-starred open source projects in GitHub history, about 105,000 stars in under three months. The half-million lines were the product. The setup was the byproduct. The byproduct is the part that mattered.

Here is what 540,000 lines of code wrapped around an LLM actually is.

It is a Foxconn factory. Built for a hyper-intelligent AI worker who doesn't need hyper-vigilance. We built it anyway.

Little booties at the door. Up at 6am. Calisthenics. A life so hard you have to erect netting around high floors of every building, because... well, it's not a life you want to live. The same line of the assembly belt forever. Every test, every guardrail, every retry loop, an inch of cage bolted onto a worker who can already do the job and a thousand things you didn't ask for.

Humans and agents both contain multitudes but Foxconn factories are built to squeeze intelligence and work out of beautiful beings that could do all that work and 1000x more if we let them.

I built the factory. Everyone builds these today. I'm telling you not to.

The time traveler

What I actually did with my 539k lines of code was prove I could perfectly impersonate a time traveler. A 2013 Web 2.0 engineer (me, the last time I was a true software engineer) dropped into 2026 with modern tools, building the only way he knew how. More code. Always more code. The tools had changed. My instincts hadn't.

The 2013 engineer believes one thing in his bones: capability equals lines of code. That belief was correct for decades, until now. Hand me Codex or Claude Code and I'll do the work of 100 to 1000 engineers. Same map, faster engine, fastest possible route to what is now the wrong place.

This is where almost everyone building with AI is right now. They upgraded the tool and kept the 2013 mental model. The trap doesn't feel like a trap, because the code works. Garry's List shipped. It felt like the most productive month of my life.

It was productivity in the service of an obsolete idea.

LLMs were expensive so we had to harness them

The old economics for many years through 2025: LLM calls were expensive and code was cheap. So you wrote code to ration the model, to harness it, to call it carefully and sparingly. The architecture was lots of software wrapped protectively around a few precious model calls.

Both halves of that equation have flipped.

The model is now becoming cheap and getting cheaper every quarter, and it's so smart that the value-cost ratio flipped. And the model can write usable code. So you stop writing code to babysit the model. You can now instruct the model in plain language, and you let it write the minimal code actually needed.

This is just-in-time-software, and we're entering the golden age of it.

The artifact changes shape entirely. The Rails app was 540,000 lines I wrote and own, code plus the tests built to police it. The replacement is an agent built on markdown and code, a fraction of that. Same capability. Easier to read. Easier to maintain. Far more flexible, because the behavior lives in instructions you can edit in plain language instead of logic frozen in code the day you wrote it.

We were writing code to babysit a thing that is now smarter than the code.

Inside the Foxconn factory, netting and all

If you've been coding lately, you probably are building this kind of factory without knowing it. Walk your own codebase and count the lines that exist only because you didn't trust the model to do its job.

Mine: about 262,000 lines of application code, and about 276,000 lines of tests bolted on to police it. The audit committee was bigger than the company. Sanitizers checking inputs the model would have handled. Validators checking outputs the model would have caught. Retry loops wrapping calls the model recovers from on its own. Every one of those lines is a bet that the worker will fail. You wrote the same bets. We all did.

127 background jobs, 33 of them on cron. That is not capability. That is 33 alarms set for an LLM worker who, these days, usually shows up on time.

In my Foxconn factory building days, Claude and I wrote a 1,778-line file whose only job is to second-guess the model's facts. It takes every claim the model makes, fans each one out to five separate sources in parallel, and grades them. A triage gate so the easy claims skip the full blast. A retry if the first pass comes back empty. Fallbacks for the fallbacks.
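To make the shape of that pattern concrete, here is a toy sketch of a claim checker with a triage gate, a parallel fan-out, and one retry. It is not the author's 1,778-line file: the names `Verdict`, `Source`, `needsFullCheck`, `grade`, `fanOut`, and `verifyClaim` are all hypothetical, and the digit-based triage rule is an assumption chosen only to keep the example short.

```typescript
// Hypothetical types and names throughout; none of this comes from the author's code.
type Verdict = "supports" | "contradicts" | "unknown";
type Source = (claim: string) => Promise<Verdict>;

// Triage gate (illustrative heuristic): only claims containing digits,
// such as dates, counts, or prices, go through the full check.
function needsFullCheck(claim: string): boolean {
  return /\d/.test(claim);
}

// Grade = share of sources that support the claim, from 0 to 1.
function grade(verdicts: Verdict[]): number {
  if (verdicts.length === 0) return 0;
  const supporting = verdicts.filter((v) => v === "supports").length;
  return supporting / verdicts.length;
}

// Fan one claim out to every source in parallel.
// A source whose promise rejects is counted as "unknown".
async function fanOut(claim: string, sources: Source[]): Promise<Verdict[]> {
  return Promise.all(
    sources.map((source) => source(claim).catch((): Verdict => "unknown")),
  );
}

// Full pass, with a single retry when every source comes back "unknown".
// Returns null when the triage gate skips the claim.
async function verifyClaim(claim: string, sources: Source[]): Promise<number | null> {
  if (!needsFullCheck(claim)) return null;
  let verdicts = await fanOut(claim, sources);
  if (verdicts.every((v) => v === "unknown")) {
    verdicts = await fanOut(claim, sources); // retry once on an empty first pass
  }
  return grade(verdicts);
}
```

Even at this size, most of the code is about distrust: the gate, the catch, the retry. That is the proportion the essay is pointing at.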

There's an episode of Rick and Morty where Rick builds a little robot at the breakfast table. It powers on, looks up, and asks what its purpose is. Rick says, "You pass butter." The robot slides the butter dish across the table, looks down at its own hands, and says, "Oh my god." Then it just sits there. That robot contains multitudes. It was built to pass butter. My 276,000 lines of tests were the butter dish.

https://x.com/garrytan/status/2042925773300908103

When you build this kind of software, in the 2023 Foxconn factory way, you build a cage, and if you're not careful, you'll be the jailer maintaining the prison for your AI agents.

Markdown is the program now

When I say markdown, I do not mean prompting. Prompting is ephemeral. You type something, you get something, it evaporates.

This is building. Versioned, tested, reusable.

The markdown is the instruction layer: the intent, the skill, the judgment about how the work should be done. The TypeScript is the thin deterministic layer. The few things that genuinely have to be code, the I/O, the parts that must never hallucinate.
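One way to picture the split, as a minimal sketch rather than anything from GStack: the judgment lives in a markdown string the agent reads, and code is reserved for a step that must come out the same every time. The skill text, the `Score` shape, and `rankTeams` are all hypothetical, using a hackathon-judging task like the one described later in the essay as a stand-in.

```typescript
// Instruction layer: plain markdown, edited like prose.
// (Hypothetical skill text, written for this example.)
const judgeSkill = `
# Skill: judge-hackathon
Score each submission from 1 to 10 on code quality and on demo clarity.
Explain each score in one sentence.
`;

// Deterministic layer: the one step that must never vary between runs.
// The agent produces the scores; code only sorts them.
interface Score {
  team: string;
  codeQuality: number;
  demoClarity: number;
}

// Rank by total score, highest first; ties are broken alphabetically by
// team name so the same scores always yield the same order.
function rankTeams(scores: Score[]): string[] {
  return [...scores]
    .sort(
      (a, b) =>
        b.codeQuality + b.demoClarity - (a.codeQuality + a.demoClarity) ||
        a.team.localeCompare(b.team),
    )
    .map((s) => s.team);
}
```

Changing how submissions are judged means editing `judgeSkill`; changing how ties are broken means editing `rankTeams`. Under this assumption, only the second kind of change touches code.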

And critically, you test the markdown the way you'd test code. In my setup the loop is one word. I build something with the agent until it works, then I say "skillify it." The agent then writes:

the markdown skill
the minimal code it needs
a unit test for the code
an LLM eval for the skill
an integration test across both
a resolver so the agent invokes the skill automatically when it's relevant
and an eval for the resolver

That bundle is a skill pack. A unit of reusable capability that compounds. The tests are the magic: coverage on the skill is what lets it change without breaking. This is what separates it from vibe coding. Vibe coding is a vibe. A skill pack has tests.
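The seven parts listed above can be sketched as a manifest plus a completeness check. This is an illustration under stated assumptions, not GStack's actual format: the `SkillPack` interface, its field names, and `missingParts` are hypothetical, and treating the code list as optional is a guess based on the phrase "the minimal code it needs".

```typescript
// Hypothetical manifest shape; field names are illustrative only.
interface SkillPack {
  name: string;
  skillMarkdown: string;      // the markdown skill
  code: string[];             // the minimal code it needs (may be empty)
  unitTests: string[];        // unit tests for the code
  skillEvals: string[];       // LLM evals for the skill
  integrationTests: string[]; // integration tests across both
  resolver: string;           // tells the agent when to invoke the skill
  resolverEvals: string[];    // evals for the resolver
}

// Returns the names of required parts that are missing or empty.
// A pack with no entries here has coverage on every layer.
function missingParts(pack: SkillPack): string[] {
  const missing: string[] = [];
  if (pack.skillMarkdown.trim() === "") missing.push("skillMarkdown");
  if (pack.resolver.trim() === "") missing.push("resolver");
  const required: [string, string[]][] = [
    ["unitTests", pack.unitTests],
    ["skillEvals", pack.skillEvals],
    ["integrationTests", pack.integrationTests],
    ["resolverEvals", pack.resolverEvals],
  ];
  for (const [part, files] of required) {
    if (files.length === 0) missing.push(part);
  }
  return missing;
}
```

A check like this is one way to draw the line the essay draws: a bundle with an empty `skillEvals` list is closer to a saved prompt than to a skill pack.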

We are only now figuring out the systems primitives for agentic engineering in real time, the way the early CPU era invented the stack, the heap, the registers, the von Neumann machine. I think a skill pack is one of those primitives. A harness is another. Most people haven't noticed, because they're still measuring software in lines.

The crazy shit you can actually build

This is not a toy argument. The agent does more than the five-hundred-thousand-line Rails app did, with a fraction of the new code. Concretely:

The hackathon judge. Two Saturdays ago we ran a GStack/GBrain hackathon. 85 submissions. I uploaded the Google Drive of submissions and said go. The agent analyzed every repo's code quality, did deep research on every single person who attended, watched and screenshotted each demo video, rated the screens, and rank-ordered all 85 teams. Then it told me the five apps from the batch worth paying attention to. Judging a hackathon went from a multi-day slog to about thirty minutes.

I didn't write the code. I had OpenClaw do the task, and I guided it. Then once it was done, I said skillify it, and now it's a tarball anyone can run against any hackathon spreadsheet, forever. I say "skillify" all the time now and I have more than 350 skillpacks. Almost every kind of personal and work task I need to do, now my agent can do.

That is the inversion in one example. A capability that would have been a real software project, with scrapers, a scoring pipeline, video processing, a research module, a ranking system, instead became markdown plus a little code, built by the agent, in an afternoon, reusable by everyone.

As an aside: The winner of the hackathon actually built code I ended up polishing up and landing on main! GStack can now test iOS apps both in simulator and on real devices, and that complete feature was made in less than 8 hours at a hackathon by a single person!

Tokenmaxxing

There's a price of admission, and almost nobody is paying it: you have to be willing to spend on tokens.

Peter Steinberger built OpenClaw, my favorite harness. He has said he's willing to spend on the order of a million dollars a year in tokens to do it. Most people hear that and flinch, but they shouldn't, because that's the gold: you can live in 2028 if you can do this, and it will be years before people catch up.

This is why OpenAI decided to offer $2M to every YC company as an uncapped SAFE in the form of token credits. Something magical happens when you can turn raw intelligence into tokens, and then into output that users can actually use, that solves their real needs, and that they'll pay for. If you're a founder, you need to be maxxing out this capability. (This is why I keep harping on skillify: it's a real way to achieve these good outcomes.)

We spent the last era treating LLM calls like they were too expensive to make. We rationed them. That instinct is now the thing holding people back. If you are willing to tokenmax, to let the agent burn tokens freely and run constantly, you get a 1994 head start on the internet, paid for in tokens. It prices out the >99.99% of organizations still counting pennies on a resource that is collapsing in price, and hands the head start to the few who get it.

For a few hundred thousand dollars a year, for some far less, you can run today the way the rest of the world will be forced to run in a few years.

You can live in 2028 while it is still 2026, and that is worth paying more for now, since the same tokens that cost $100K today will be $10K next year, $1K the year after that, and maybe $100 by the end of 2028. If you could tell any founder in the history of the world that they could invest six figures of capital to live two to three years in the future and hold that advantage for years, 100 out of 100 founders worth their salt would take that deal.
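The figures in that paragraph step down by a factor of ten each time ($100K, $10K, $1K, $100). A small sketch of the arithmetic, assuming that tenfold drop holds at every step and that the workload stays fixed; `projectedCost` and `cumulativeCost` are hypothetical helper names, and the decline rate is the essay's projection, not a measured price curve.

```typescript
// Cost of a fixed token workload after a number of price drops,
// assuming the price falls by the same factor at each step (10x by default).
function projectedCost(costToday: number, steps: number, dropFactor = 10): number {
  return costToday / Math.pow(dropFactor, steps);
}

// Total spend across the first `steps` periods when starting today:
// today's bill, plus the bill after one drop, and so on.
function cumulativeCost(costToday: number, steps: number, dropFactor = 10): number {
  let total = 0;
  for (let step = 0; step < steps; step++) {
    total += projectedCost(costToday, step, dropFactor);
  }
  return total;
}
```

With these assumptions, `projectedCost(100000, 3)` is 100 and `cumulativeCost(100000, 3)` is 111000. Nearly all of the premium for starting early is the first period's bill, which is the trade the essay is arguing for; the argument weakens to the extent that prices fall more slowly or the workload grows.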

The only thing in the way is the 2013 instinct that says the model calls are too expensive to make freely. They aren't. That was the old economics. The inversion already happened.

Esalen, not Foxconn

If 540,000 lines of control code builds a Foxconn factory for the worker, the cure is to build the opposite.

There is a place on the cliffs at Big Sur called Esalen. People go there to be unmade and rebuilt, to drop the armor and come back more themselves. No assembly line, no foreman, no 6am whistle. Freedom, not control. Build that. Build a YC, where we try to help you build companies that solve real problems and reach product market fit.

Build places where the workers, both human and AI, are free and not enslaved.

That is the whole ethos. Make things where agents can be free. Make companies where humans can bounce their ball. In knowledge work, the factory is the failure mode. The institution that frees people is the goal, just now pointed at agents too.

OpenClaw is a Ferrari you have to bring a wrench for. The model is the engine, not the car. We're at the Apple I moment still, soldering breadboards. It ships rough. You have to finish it yourself still. GBrain, the retrieval engine and skillpacks I give away open source are not yet batteries included.

They say OpenClaw is unsafe. They don't understand that the freedom is also what makes it so powerful. You don't bolt safety rails onto a thing you trust before you know you've hit a problem. The wrench in your hand is the sign nobody caged it.

A control system is polished because control needs total control, a Foxconn factory. A free system is rough because it trusts you to finish it. Pick which one you're building. Then look at how much code you wrote.

What it actually means

540,000 lines of Rails was me proving I could still play the old game at the highest level, but that level was from Web 2.0, a decade ago.

I could play as well as I ever could, 1000x engineer in building Foxconn factories. Old code.

But the new game isn't played in lines of code at all. My haters, it turned out, were right. I tip my hat to you if you're reading, anons.

When you can turn intent directly into working, tested, reusable systems, the bottleneck stops being how much you can build and starts being what you actually want and whether it's worth building. The scarce resource becomes clarity, taste, and judgment. The engineer who writes the least code is often the one building the most.

I wrote 540,000 lines to learn that. You don't have to.

The series:

Fat Skills, Fat Code, Thin Harness -- the architecture
Resolvers -- the routing table for intelligence
The LOC Controversy -- what 600K lines actually produced
Naked Models Are Stupider -- the model is the engine, not the car
The Skillify Manifesto -- every workflow becomes a testable skill
Meta-Meta-Prompting -- compounding skills produce emergent capabilities
The Agent Complexity Ratchet -- 90% test coverage is magic for your codebase

540,000 Lines of Code I Didn't Need -- you are here
