Code Factory: How to set up your repo so your agent can auto write and review 100% of your code

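Getting an agent to write 100% of the code is not the hard part; the hard part is making every merge carry an evidence chain that is reproducible, gate-checked, and accountable.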

2026-02-24

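Key takeaways

A single machine-readable contract is your repo's constitution. Consolidate risk tiers, check lists, docs drift rules, and UI evidence requirements into one file; otherwise conflicting "shadow rules" slowly grow across scripts, workflows, and policy docs.
Putting the preflight gate ahead of expensive CI saves money and, more importantly, attention. Run risk-policy-gate first (deterministic policy + review-agent state) and only then fan out to test/build/security, so that neither CI minutes nor human effort are spent on a head that should not be merged in the first place.
Current-head SHA discipline is not fussiness; it is the baseline that keeps you from being fooled by bots. Trust only review state bound to the current headSha; treat "clean" evidence from an older SHA as void; make every push/synchronize trigger a rerun. Skip this and you will merge on stale evidence, and that kind of incident is the hardest to post-mortem because everything looks green.
Rerun requests need a single canonical writer, or you end up in race-condition hell. When several workflows can post rerun comments, you get duplicate comments, mutual overwrites, and state drift. Dedupe by marker + sha and make the rerun entry point a single writer.
The remediation agent is high leverage, but it must be tied down by deterministic guardrails. It should read the findings, patch, validate locally, push to the same PR, and trigger the normal rerun, with model and effort pinned, stale comments skipped, and gates never bypassed. Otherwise what you get is automated inexplicability.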

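How this relates to us

🪞 Uota (human-agent collaboration / Skill system)

The key point of this piece is not which review tool to use but the control-plane semantics: contract (policy) → gate → evidence (artifacts) → rerun authority (rerun writer) → remediation loop → incident memory (harness). This is the skeleton to reuse in any agentic workflow.
What you are missing right now is not a stronger LLM but an acceptance framework that treats LLM output as untrusted input: strong constraints + reproducible evidence + SHA freshness.

🧠 Neta (engineering / growth / delivery)

If agents are going to be deeply involved in code (especially growth experiments, frontend UI, and critical paths), the absence of browser evidence (manifest + assertion) turns regression risk into the norm.
Current-head discipline plus an evidence chain significantly reduces "everything looked right, but production broke" incidents during rollout; this matters most for small teams, which have no spare people for a second round of verification.

👤 ATou (individual execution)

This approach is really answering one question: when you outsource your capacity to agents, how do you keep decision rights and safety boundaries? The answer: no matter how smart the agent is, make the merge rules machine-executable.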

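Discussion starters

If we allow agents to open and fix PRs, what is the minimum standard we want for the merge evidence chain? (Test coverage? Browser evidence? Security scans?) Which checks must be hard gates, and which can be soft gates?
Current-head SHA discipline significantly increases the number of reruns, but it also significantly reduces incidents. What is the upper limit on CI cost we are willing to pay? (Is trading money for certainty worth it?)
Where are the remediation agent's boundaries: which kinds of problems may it fix automatically (lint / types / simple logic / docs drift), and which must force human involvement (security, permissions, billing, core algorithms)?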

Code Factory: How to set up your repo so your agent can auto write and review 100% of your code

Source: https://x.com/ryancarson/status/2023452909883609111?s=46
Published: 2026-02-16T17:42:22+00:00
Saved: 2026-02-24

Content

The goal

You want one loop:

The coding agent writes code

The repo enforces risk-aware checks before merge

A code review agent validates the PR

Evidence (tests + browser + review) is machine-verifiable

Findings turn into repeatable harness cases

The specific review agent can be @greptile, @coderabbitai, CodeQL + policy logic, custom LLM review, or another service. The control-plane pattern stays the same.

I took inspiration from this helpful blog post by @_lopopolo

The high-level flow

1) Keep one machine-readable contract

Your contract should define:

risk tiers by path

required checks by tier

docs drift rules for control-plane changes

evidence requirements for UI/critical flows

Why it matters: it removes ambiguity and prevents silent drift between scripts, workflow files, and policy docs.
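
To make the idea concrete, here is a minimal sketch of what such a contract and its lookup logic could look like. The field names (`riskTierRules`, `requiredChecks`), the path patterns, and the check names other than risk-policy-gate are assumptions made for illustration; the article does not show its actual contract file.

```python
from fnmatch import fnmatchcase

# Hypothetical contract. In a real repo this would live in one JSON/YAML file
# that every script and workflow reads, instead of being hard-coded here.
CONTRACT = {
    "riskTierRules": [  # first matching rule wins; listed from most to least risky
        {"tier": "high", "paths": ["db/migrations/*", ".github/workflows/*"]},
        {"tier": "medium", "paths": ["app/*"]},
        {"tier": "low", "paths": ["*"]},
    ],
    "requiredChecks": {
        "high": ["risk-policy-gate", "code-review-agent", "tests", "security"],
        "medium": ["risk-policy-gate", "code-review-agent", "tests"],
        "low": ["risk-policy-gate"],
    },
}

TIER_ORDER = ["low", "medium", "high"]


def tier_for_path(path, contract=CONTRACT):
    """Return the risk tier of one changed path."""
    for rule in contract["riskTierRules"]:
        if any(fnmatchcase(path, pattern) for pattern in rule["paths"]):
            return rule["tier"]
    return "high"  # fail closed if no rule matches


def required_checks(changed_paths, contract=CONTRACT):
    """Return (highest tier touched, checks required for that tier)."""
    tiers = [tier_for_path(p, contract) for p in changed_paths]
    highest = max(tiers, key=TIER_ORDER.index, default="low")
    return highest, contract["requiredChecks"][highest]
```

Because both functions read only from `CONTRACT`, changing a tier or a required check means editing one place, which is the drift-prevention property the contract is meant to provide.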

2) Gate preflight before expensive CI

A reliable pattern is:

run risk-policy-gate first

verify deterministic policy + review-agent state

only then start test/build/security fanout jobs

This avoids wasting CI minutes on PR heads that are already blocked by policy or unresolved review findings.
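
The ordering above can be sketched as a small decision function: the gate result decides whether any fanout job is started at all. The function name, the return shape, and the state strings are hypothetical; in a CI system such as GitHub Actions the same ordering would typically be expressed by making the fanout jobs depend on the gate job.

```python
FANOUT_JOBS = ["test", "build", "security"]


def plan_ci(policy_ok, review_state):
    """Decide which jobs to start once the preflight gate has run.

    policy_ok:    result of the deterministic policy checks (bool)
    review_state: review-agent state for the current head, e.g. "success",
                  "failure", "pending" (hypothetical state names)
    """
    if not policy_ok:
        return {"gate": "failure", "reason": "blocked by policy", "jobs": []}
    if review_state != "success":
        reason = f"review agent state is {review_state}"
        return {"gate": "failure", "reason": reason, "jobs": []}
    # Only a fully green gate is allowed to spend CI minutes.
    return {"gate": "success", "reason": "", "jobs": list(FANOUT_JOBS)}
```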

3) Enforce current-head SHA discipline

This was the biggest practical lesson from real PR loops.

Treat review state as valid only when it matches the current PR head commit:

wait for the review check run on headSha

ignore stale summary comments tied to older SHAs

fail if the latest review run is non-success or times out

require reruns after each synchronize/push

clear stale gate failures by rerunning policy gate on the same head

If you skip this, you can merge a PR using stale “clean” evidence.
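
A minimal sketch of the check, assuming the gate has already fetched the review agent's check runs as plain dictionaries. The keys `head_sha`, `status`, `conclusion`, and `started_at` are loosely modeled on GitHub check runs, but the function itself and its return values are illustrative, not the author's implementation.

```python
def review_state_for_head(head_sha, check_runs, timed_out=False):
    """Evaluate review state using only runs bound to the current head SHA.

    check_runs: list of dicts with "head_sha", "status", "conclusion" and
                "started_at" (ISO 8601 UTC strings in one format, so they
                sort correctly as plain strings).
    timed_out:  True once the caller has waited as long as it is willing to.
    """
    # Runs for older SHAs are stale evidence and are ignored entirely.
    current = [r for r in check_runs if r["head_sha"] == head_sha]
    if not current:
        return "timed_out" if timed_out else "pending"

    latest = max(current, key=lambda r: r["started_at"])
    if latest["status"] != "completed":
        return "timed_out" if timed_out else "pending"

    # Anything other than an explicit success fails the gate.
    return "success" if latest["conclusion"] == "success" else "failure"
```

A gate built on this would pass only on `"success"`, keep waiting on `"pending"`, and fail on `"failure"` or `"timed_out"`.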

4) Use a single rerun-comment writer with SHA dedupe

When multiple workflows can request reruns, duplicate bot comments and race conditions appear.

Use exactly one workflow as canonical rerun requester and dedupe by marker + sha:<head>.
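
One way to sketch the dedupe rule: the canonical workflow embeds a marker and a `sha:<head>` token in every rerun comment and skips posting when a comment with both already exists. The marker string, the comment wording, and the `@review-agent` mention are placeholders, not the format used in the article's workflow.

```python
RERUN_MARKER = "<!-- review-rerun-request -->"  # hypothetical hidden marker


def rerun_comment_body(head_sha):
    """Build the single canonical rerun comment for a head SHA."""
    return f"{RERUN_MARKER}\n@review-agent please re-review\nsha:{head_sha}"


def should_request_rerun(head_sha, existing_comment_bodies):
    """Return False if a rerun was already requested for this exact head."""
    token = f"sha:{head_sha}"
    for body in existing_comment_bodies:
        # Compare whole whitespace-separated tokens so that a short SHA
        # never matches as a prefix of a longer one.
        if RERUN_MARKER in body and token in body.split():
            return False
    return True
```

Keying on both the marker and the SHA means a new push (new head) still gets exactly one fresh request, while retries on the same head stay silent.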

5) Add an automated remediation loop (optional, high leverage)

If review findings are actionable, trigger a coding agent to:

read review context

patch code

run focused local validation

push fix commit to the same PR branch

Then let PR synchronize trigger the normal rerun path. Keep this deterministic:

pin model + effort for reproducibility

skip stale comments not matching current head

never bypass policy gates
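
The three guardrails can be sketched as the input-preparation step that runs before the coding agent is invoked. The model identifier, the `effort` values, and the finding fields (`sha`, `actionable`) are hypothetical; the sketch only shows the filtering and pinning, not the agent call itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemediationConfig:
    # Pinned values (placeholders) so two runs on the same input are comparable.
    model: str = "example-model-2026-01"
    effort: str = "medium"


def findings_to_remediate(head_sha, findings):
    """Keep only actionable findings that were raised against the current head."""
    return [f for f in findings if f["sha"] == head_sha and f["actionable"]]


def remediation_plan(head_sha, findings, config=RemediationConfig()):
    """Return the job description for the coding agent, or None if nothing to do."""
    todo = findings_to_remediate(head_sha, findings)
    if not todo:
        return None
    return {
        "model": config.model,
        "effort": config.effort,
        "base_sha": head_sha,
        "findings": todo,
        # The fix is pushed to the PR branch like any other commit, so the
        # normal gate and rerun path still apply to it.
        "bypass_gates": False,
    }
```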

6) Auto-resolve bot-only threads only after clean rerun

A useful quality-of-life step:

after a clean current-head rerun

auto-resolve unresolved threads where all comments are from the review bot

never auto-resolve human-participated threads

Then rerun policy gate so required-conversation-resolution reflects the new state.
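
The selection rule is small enough to sketch directly. The thread shape (`id`, `is_resolved`, `comment_authors`) and the bot login are assumptions; fetching threads and resolving them through the hosting platform's API is left out.

```python
def threads_to_auto_resolve(threads, review_bot_login, clean_rerun_on_head):
    """Return ids of unresolved threads whose comments all come from the review bot.

    Nothing is resolved unless the latest rerun on the current head was clean.
    """
    if not clean_rerun_on_head:
        return []
    return [
        t["id"]
        for t in threads
        if not t["is_resolved"]
        and t["comment_authors"]  # skip empty threads defensively
        and all(author == review_bot_login for author in t["comment_authors"])
    ]
```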

7) Keep browser evidence as first-class proof

For UI or user-flow changes, require evidence manifests and assertions in CI (not just screenshots in PR text):

required flows exist

expected entrypoint was used

expected account identity is present for logged-in flows

artifacts are fresh and valid
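
A sketch of how the four assertions could be checked in CI against an evidence manifest. The manifest layout, the interpretation of "fresh" as "captured for the current head SHA and within a maximum age", and the 24-hour default are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def validate_evidence(manifest, required_flows, head_sha, now,
                      max_age=timedelta(hours=24)):
    """Return a list of problems; an empty list means the evidence passes.

    manifest:       {"head_sha": str, "flows": [{"name", "entrypoint",
                     "account", "captured_at"}]} (hypothetical layout)
    required_flows: {flow name: {"entrypoint": str, "account": str or None}}
    now:            timezone-aware datetime
    """
    errors = []
    if manifest.get("head_sha") != head_sha:
        errors.append("manifest was captured for a different commit")

    flows = {f["name"]: f for f in manifest.get("flows", [])}
    for name, req in required_flows.items():
        flow = flows.get(name)
        if flow is None:
            errors.append(f"missing flow: {name}")
            continue
        if flow.get("entrypoint") != req["entrypoint"]:
            errors.append(f"{name}: unexpected entrypoint")
        if req.get("account") and flow.get("account") != req["account"]:
            errors.append(f"{name}: expected account identity not present")
        # captured_at must carry an explicit UTC offset, e.g. "+00:00".
        captured = datetime.fromisoformat(flow["captured_at"])
        if now - captured > max_age:
            errors.append(f"{name}: artifact is stale")
    return errors
```

Usage: a CI step would load the manifest produced by the browser run, call `validate_evidence`, and fail the job if the returned list is non-empty.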

8) Preserve incident memory with a harness-gap loop

Turning each incident into a harness case keeps fixes from becoming one-off patches and grows long-term coverage.
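
As a rough illustration of tracking that loop, the sketch below lists incidents that still have no harness case and flags the ones that have been open longer than a target window. The incident fields and the 7-day window are hypothetical; the article does not specify how the loop is tracked.

```python
from datetime import date, timedelta


def open_harness_gaps(incidents, today, slo=timedelta(days=7)):
    """Return (incident id, overdue?) for every incident without a harness case.

    incidents: list of {"id": str, "opened": date, "harness_case": str or None}
    """
    gaps = []
    for incident in incidents:
        if incident.get("harness_case") is None:
            overdue = today - incident["opened"] > slo
            gaps.append((incident["id"], overdue))
    return gaps
```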

9) What we learned running this in PRs

The most important lessons were:

Deterministic ordering matters: preflight gate must complete before CI fanout.

Current-head SHA matching is non-negotiable.

Review rerun requests need one canonical writer.

Review summary parsing should treat vulnerability language and weak-confidence summaries as actionable.

Auto-resolving bot-only threads reduces friction, but only after clean current-head evidence.

A remediation agent can shorten loop time significantly if guardrails stay strict.
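
The lesson about summary parsing can be sketched as a fail-closed classifier: a summary counts as actionable if it contains vulnerability language or if its confidence is weak or unknown. The keyword list, the numeric confidence scale, and the threshold are assumptions; a real reviewer's summary format would need its own parser.

```python
import re

# Illustrative keyword list, not an exhaustive vulnerability vocabulary.
VULNERABILITY_PATTERN = re.compile(
    r"\b(vulnerab\w*|injection|exploit\w*|xss|csrf)\b", re.IGNORECASE
)


def is_actionable(summary_text, confidence=None, min_confidence=4):
    """Decide whether a review summary should block the gate.

    confidence: numeric score parsed from the summary, or None if absent
                (the scale and threshold here are hypothetical).
    """
    if VULNERABILITY_PATTERN.search(summary_text):
        return True
    if confidence is None:
        return True  # unknown confidence is treated as weak: fail closed
    return confidence < min_confidence
```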

10) General pattern vs. one implementation

General pattern terms:

code review agent

remediation agent

risk policy gate

One concrete implementation (ours):

code review agent: Greptile

remediation agent: Codex Action

canonical rerun workflow: greptile-rerun.yml

stale-thread cleanup workflow: greptile-auto-resolve-threads.yml

preflight policy workflow: risk-policy-gate.yml

If you use a different reviewer, keep the same control-plane semantics and swap integration points.

Useful command set

Final pattern to copy

Put risk + merge policy into one contract.

Enforce preflight gate before expensive CI.

Require clean code-review-agent state for current head SHA.

If findings exist, remediate in-branch and rerun deterministically.

Auto-resolve only bot-only stale threads after clean rerun.

Require browser evidence for UI/flow changes.

Convert incidents into harness cases and track loop SLOs.

That gives you a repo where agents can implement, validate, and be reviewed with deterministic, auditable standards.

Link: http://x.com/i/article/2023001790258573312
