🧠 阿头学 · 💬 讨论题

代理写代码把工程师逼成“规格制定者”(但你仍得当鹰眼)

LLM 代理带来的不是“更快写代码”,而是把你的工作从敲键盘升级成定义成功标准;但现在的模型仍会带着错误假设一路狂奔,你必须用 IDE + 审稿心态兜底。

2026-01-27 原文链接 ↗

核心观点

  • **从写代码转向写“英语规格”**:80% agent coding 的本质是“人写意图/验收条件,模型写实现”。谁先学会把需求写成可验收的 success criteria,谁先吃到生产力复利。
  • **错误形态升级:不是语法错,而是“初级工程师式的概念错”**:模型最爱替你脑补前提、不澄清、不暴露取舍、该 push back 时也不 push back;所以“脱离 IDE、纯 swarm”现阶段是自杀式浪漫。
  • **最大杠杆来自“让它自己循环直到达标”**:与其命令式地一步步教,不如声明式给目标:先写测试再跑通;先写朴素正确版本,再要求在保持正确性前提下优化;把 agent 放进工具闭环(browser/MCP)让它自我迭代。
  • **耐力是新瓶颈:LLM 把“坚持 30 分钟”变成默认值**:人会累会泄气,agent 不会——这类“永动机式试错”会显著改变你能解决的问题的范围。
  • **代价:写的能力会萎缩,读/审的能力更重要**:生成与判别是两套能力。未来工程师的核心竞争力更像“代码编辑+审稿+系统设计+验收”,而不是手速。

跟我们的关联

### 👤ATou

  • 你做海外增长/品牌时会遇到同样的范式:不要让 agent“替你发内容”,而是给它明确的成功标准(受众、语气、转化路径、禁区),再让它循环产出-自评-修订。
  • 组织层面要把“审稿/验收”当成正式工序:谁负责最终判断、用什么 checklist、什么算 done。

### 🧠Neta

  • Context engineering 的下一步是“验收工程”:把需求写成可测试的条件(tests/metrics/checklists),让模型自己跑循环。
  • 需要轻量 plan mode / inline plan 的提示结构,否则模型会默默做错假设。
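
“验收工程”可以用一个极简 Python 草图来示意:把需求写成可执行的断言,而不是口头描述。其中 `slugify` 函数及其规则纯属假设示例,并非原文内容;实际使用时,这组断言就是交给代理去反复满足的成功标准。

```python
import re

def slugify(title: str) -> str:
    """朴素实现(假设示例):小写化,连续非字母数字折叠为连字符。"""
    s = title.strip().lower()
    s = re.sub(r"[^a-z0-9]+", "-", s)
    return s.strip("-")

# 验收条件:用 (输入, 期望输出) 对把“需求”写死,
# 代理的任务是让所有用例通过,而不是揣测意图
ACCEPTANCE = [
    ("Hello, World!", "hello-world"),
    ("A--B", "a-b"),
    ("  trailing  ", "trailing"),
]

def run_acceptance(fn) -> list:
    """返回失败用例列表 (输入, 期望, 实际);空列表即验收通过。"""
    return [(i, want, fn(i)) for i, want in ACCEPTANCE if fn(i) != want]
```

要点不在函数本身,而在于 `ACCEPTANCE` 扮演了“可验收的 success criteria”:人负责定义它,模型负责循环到 `run_acceptance` 返回空列表为止。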

讨论引子

1. 你觉得未来“10X 工程师”的差距会扩大还是缩小?差异会从哪里来(想法/审美/验收/工具链)?
2. 你现在最常被 agent 坑的错误类型是什么:错误假设、过度抽象、边界条件,还是瞎改无关代码?
3. 如果把写代码比作音乐:你更像作曲(spec)还是演奏(implementation)?你愿意把哪部分交给模型?


双语对照

A few random notes from claude coding quite a bit last few weeks.

最近几周用 Claude Coding 写了不少代码,随手记几条。

Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. i.e. I really am mostly programming in English now, a bit sheepishly telling the LLM what code to write... in words. It hurts the ego a bit but the power to operate over software in large "code actions" is just too net useful, especially once you adapt to it, configure it, learn to use it, and wrap your head around what it can and cannot do. This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I'd expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.

编码工作流。随着 LLM 编码能力最近的一次跃升,和许多人一样,我在 11 月还大约是 80% 手写+自动补全、20% 代理;到了 12 月迅速变成 80% 代理写代码、20% 我来修改+收尾。也就是说,我现在真的主要用英文在“编程”了——有点不好意思地用文字告诉 LLM 该写什么代码……这对自尊有点伤害,但这种能对软件进行大块“代码动作”的能力实在是净增益太大了,尤其是当你适应它、配置它、学会用它,并真正理解它能做什么、不能做什么之后。这很可能是我近二十年编程生涯里对基本编码工作流最大的变化,而且只用了短短几周就发生了。我预计外面已经有相当多(两位数百分比以上)的工程师在经历类似变化;而普通大众对此的认知感,似乎还停留在个位数低位的百分比。

IDEs/agent swarms/fallibility. Both the "no need for IDE anymore" hype and the "agent swarm" hype are imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might make. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE.md. Despite all these issues, it is still a net huge improvement and it's very difficult to imagine going back to manual coding. TLDR everyone has their developing flow, my current is a small few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits.

IDE/agent swarms/fallibility。无论是“以后不需要 IDE 了”的热炒,还是“代理蜂群”的热炒,我觉得就现阶段而言都有点过头。模型确实还会犯错;如果是你真正关心的代码,我建议像鹰一样盯着它们——最好旁边开一个舒服的大 IDE。错误的形态已经变了很多:不再是简单的语法错误,而是那种稍微马虎、赶进度的初级开发者会犯的微妙概念性错误。最常见的一类是:模型替你做了错误假设,然后不做核验就一路跑下去。它们也不太会管理自己的困惑:不主动澄清、不把不一致之处挑出来、不展示取舍、该反对的时候不反对,而且还有点太爱讨好人。进到 plan mode 会好一些,但依然需要一种轻量的内联计划模式。它们还特别喜欢把代码和 API 搞复杂:抽象膨胀、不清理自己制造的死代码等等。它们能写出一个低效、臃肿、脆弱的 1000 行构造,直到你提醒一句“呃……你不能直接这么做吗?”,它们就会说“当然可以!”然后立刻缩成 100 行。它们有时还会顺手改掉/删掉一些它们不喜欢或不太理解的注释和代码,即便这和当前任务并不相干。即使我在 CLAUDE.md 里用了一些简单指令尝试修正,上述情况仍会发生。尽管问题不少,它依然是净提升巨大,而且很难想象再回到纯手写编码。TLDR 每个人都有自己的开发流;我现在的做法是在左边用 ghostty 的窗口/标签开少量几段 CC 会话,右边开 IDE 用来查看代码+手工修改。
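
作者提到自己在 CLAUDE.md 里写过“一些简单指令”但没有给出原文。下面是一份假设性的片段,仅示意这类护栏指令大概长什么样(内容是推测,并非作者实际配置):

```markdown
# CLAUDE.md(假设性示例片段)

- 动手前先列出你做了哪些假设;有不确定之处先提问,不要替我脑补前提。
- 不要修改或删除与当前任务无关的代码和注释。
- 优先给出最小、直接的实现;新增抽象前先说明为什么需要它。
- 完成后清理自己引入的死代码,并简要说明关键取舍。
```

作者也明确说了:即便有这类指令,上述问题仍会发生,所以它只能缓解,不是保险。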

Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later. You realize that stamina is a core bottleneck to work and that with LLMs in hand it has been dramatically increased.

韧性。看着一个 agent 毫不松懈地死磕一件事,真的很有意思。它们不会累、不会泄气,只会不断尝试;很多时候,人早就会选择先放弃、改天再战。看它长时间跟某个难题角力,最终在 30 分钟后打赢一局,那种“摸到 AGI 气息”的感觉很强烈。你会意识到:耐力本身就是工作的核心瓶颈之一,而手里有了 LLM,这个瓶颈被显著抬高了。

Speedups. It's not clear how to measure the "speedup" of LLM assistance. Certainly I feel net way faster at what I was going to do, but the main effect is that I do a lot more than I was going to do because 1) I can code up all kinds of things that just wouldn't have been worth coding before and 2) I can approach code that I couldn't work on before because of knowledge/skill issue. So certainly it's speedup, but it's possibly a lot more an expansion.

加速。LLM 辅助到底带来多少“加速”,其实不太好量化。当然我确实觉得自己做原本要做的事更快了,但更主要的影响是:我会做比原计划多得多的事情,因为 1) 我可以把很多以前“不值得写”的小东西都写出来,2) 我也能接近以前因为知识/技能短板而不敢碰的代码。所以这当然是加速,但可能更像是一种“扩容”。

Leverage. LLMs are exceptionally good at looping until they meet specific goals and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do, give it success criteria and watch it go. Get it to write tests first and then pass them. Put it in the loop with a browser MCP. Write the naive algorithm that is very likely correct first, then ask it to optimize it while preserving correctness. Change your approach from imperative to declarative to get the agents looping longer and gain leverage.

杠杆。LLM 特别擅长反复循环,直到达成明确目标;大多数“摸到 AGI 气息”的魔法就藏在这里。别告诉它怎么做,给它成功标准,然后看它自己跑起来。先让它写测试,再让它把测试跑通。把它放进带浏览器 MCP 的闭环里。先写一个很可能正确的朴素算法,然后让它在保持正确性的前提下做优化。把你的工作方式从命令式改成声明式,让代理能循环更久,从而获得更大的杠杆。
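
“先朴素正确、再保正确性优化”的声明式闭环,可以落成下面这样的草图(纯属示意:`fib` 只是占位的假设例子;真实场景里“优化版”由代理生成,人只提供等价性判定作为成功标准):

```python
import random

def fib_naive(n: int) -> int:
    """朴素递归版:慢,但几乎不可能写错,充当正确性基准(oracle)。"""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

def fib_fast(n: int) -> int:
    """假设这是代理交回的优化版:迭代,O(n)。"""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def meets_success_criteria(fast, oracle, trials: int = 50, max_n: int = 20) -> bool:
    """声明式的成功标准:随机抽样下与基准完全等价。
    代理可以在“生成 -> 跑这个判定 -> 修改”的闭环里循环到通过为止。"""
    rng = random.Random(0)  # 固定种子,便于复现
    samples = [rng.randrange(max_n) for _ in range(trials)]
    return all(fast(n) == oracle(n) for n in samples)
```

人在这里只写了 `fib_naive` 和 `meets_success_criteria`;“怎么优化”完全交给代理去循环,这正是命令式到声明式的差别。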

Fun. I didn't anticipate that with agents programming feels more fun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part. I also feel less blocked/stuck (which is not fun) and I experience a lot more courage because there's almost always a way to work hand in hand with it to make some positive progress. I have seen the opposite sentiment from other people too; LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

乐趣。我没预料到:有了代理之后,编程反而变得好玩,因为很多填空式的苦差被移走了,留下来的更多是创造性的部分。我也更不容易被卡住/陷住(这并不好玩),并且会更有勇气,因为几乎总能找到一种和它并肩合作、推进一点点进展的方式。我也见过别人表达相反的感受;LLM 编码会把工程师分成两类:一类主要喜欢写代码本身,一类主要喜欢把东西做出来。

Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually. Generation (writing code) and discrimination (reading code) are different capabilities in the brain. Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

退化。我已经注意到:我手写代码的能力正在缓慢退化。生成(写代码)和判别(读代码)是大脑里两种不同的能力。很大程度上由于编程涉及大量细碎、偏语法层面的细节,即便你写起来变吃力,你仍然可以把代码审得很好。

Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media. We're also going to see a lot more AI hype productivity theater (is that even possible?), on the side of actual, real improvements.

Slopacolypse。我已经在为 2026 做心理准备:那会是 github、substack、arxiv、X/instagram 乃至所有数字媒体的“Slopacolypse(垃圾内容末日)”。我们也会看到更多 AI 造势式的“效率表演”(这真的还能更夸张吗?),与此同时,真实而具体的改进也会继续发生。

Questions. A few of the questions on my mind:

- What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows a lot.
- Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).
- What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?
- How much of society is bottlenecked by digital knowledge work?

问题。我脑子里有几个问题:

- “10X 工程师”会怎样——平均工程师与最强工程师之间的生产力比值?很可能会大幅增长。
- 手握 LLM 后,通才会不会越来越压过专才?LLM 更擅长补全细节(micro),而不是宏观战略(macro)。
- 未来的 LLM 编码会是什么感觉?像玩 StarCraft?玩 Factorio?还是像演奏音乐?
- 社会中有多少问题被“数字化知识工作”这个瓶颈卡住了?

TLDR Where does this leave us? LLM agent capabilities (Claude & Codex especially) have crossed some kind of threshold of coherence around December 2025 and caused a phase shift in software engineering and closely related. The intelligence part suddenly feels quite a bit ahead of all the rest of it - integrations (tools, knowledge), the necessity for new organizational workflows, processes, diffusion more generally. 2026 is going to be a high energy year as the industry metabolizes the new capability.

TLDR 这会把我们带到哪里?LLM 代理能力(尤其是 Claude & Codex)大约在 2025 年 12 月跨过了某种“连贯性阈值”,引发了软件工程及相关领域的一次相变。智能本身突然显得比其他一切都领先不少——集成(工具、知识)、新的组织工作流与流程的必要性、以及更广义的扩散与普及。2026 将会是一个高能量的年份,整个行业会在其中消化这种新能力。

相关笔记

X Post: A few random notes from claude coding quite a bit last few weeks. Coding workflo

  • Source: https://x.com/karpathy/status/2015883857489522876?s=20
  • Published: 2026-01-26T20:25:39+00:00
  • Saved: 2026-01-27


📋 讨论归档

讨论进行中…