
A GTM agent's moat isn't that it can write emails; it's the auditable feedback loop

What's actually worth copying isn't "automated outreach" but turning the sales workflow into an observable system: brake first (do-not-send checks) before generating, distill every human edit into retrievable memory, and keep the whole chain traceable and evaluable. Otherwise you're just paying more to produce accidents faster.

2026-03-10 Original article ↗

Core takeaways

  • HITL is really a pipeline for high-quality labeling. The rep's send/edit/cancel actions aren't moral reassurance; they're genuine preference signals that can be extracted into structure (tone, length, CTA, banned phrases, industry sensitivities). Wire up diff → structured preferences → per-rep retrieval and injection, and you get a system that keeps improving, not one-off prompt engineering.
  • "Look for reasons not to send first" is the first principle of automated outreach. When a new lead triggers, run the do-not-send and contextual brakes first (just filed a support ticket, a teammate already reached out, bad cold-outreach timing, sensitive industries or regions), building the cost of relationship damage into the front of the flow; however polished the draft, sending at the wrong moment is negative ROI.
  • Only ambient agents get absorbed by the organization. Salesforce trigger → auto-pull history (Gong/CRM) → web lookups for extra context when needed → Slack draft with one-click send/edit/reject. Users never have to "go ask the AI"; the AI is embedded in the systems they already use, which drives adoption far better than building a smarter chat box.
  • The production watershed: observability + evals first + the ability to roll back. Bind every run's sources, citations, reasoning, and final action to a trace; benchmark on a representative scenario library before launch and keep evaluating against real behavior logs after. Without this governance you will regress over and over through model swaps and prompt iterations.
  • Don't be steered by PR metrics: attribution, definitions, cost, and governance all still need filling in. Numbers like "250% conversion, 3x pipeline, 1,320 hours saved" are directional at best without baselines, metric definitions, and control groups; meanwhile the real costs of web retrieval, long transcripts, and warehouse queries, plus permission and compliance risk, decide whether any of this scales.

How this relates to us

  • 🧠Neta
  • What it means: Outreach pipelines for overseas growth, brand, and BD are exactly the kind of workflow that is high-frequency and standardizable yet carries very high relationship cost: the best fit for an agent that automates 80% with a 20% human-review backstop, while turning that review into learning data.
  • What to do next: Pick one pipeline and punch all the way through first (e.g. KOL outreach / PR pitches / partner BD), building the minimal loop of do-not-send → research → draft → cited sources → one-click edit/send/reject → logs + evals; hold the incident rate down first, then chase coverage.
  • 👤ATou
  • What it means: What you need to practice is the command system, not better writing. The real leverage: defining what success means, building eval sets, writing feedback into memory, making mistakes reversible.
  • What to do next: Spend a week building a small eval set (20-50 real scenarios) and use it to constrain every iteration; every change must be able to answer: did the metrics improve? What broke? Can we roll back?
  • 🪞Uota
  • What it means: This is the standard paradigm for your skills system: many tools, long workflows, strong governance. Treat structured output + cited sources + guardrail rules as the default contract.
  • What to do next: Turn "look for reasons not to act" (inhibitors) into a generic module wired in front of every high-risk action (outbound sends / database deletes / permission changes).

Discussion prompts

1. Do you want coverage or zero relationship damage? Which scenarios must have 100% human review, and which can be send-by-default or delayed auto-send? What is the decision criterion?
2. What is a GTM agent's real moat: the framework/product, or data access + governance process? If the same loop can be built on any other orchestration system, how much irreplaceability does the commercial product retain?
3. Once an agent can "spread organically" into querying BigQuery, tickets, and call recordings, how do you keep it from becoming a data incident? How would you design permission tiers, auditing, and guardrails for sensitive queries?


Every outbound at LangChain used to start the same way: a rep toggling between tabs. Salesforce for the account record, Gong for call history, LinkedIn for the contact, the company website for context. Fifteen minutes of research before a single word was written, and no easy way to know if a teammate had already reached out yesterday. Inbound follow-up used to mean manually dropping the same message into Apollo for every new contact.

We built a GTM agent that runs the process end-to-end. It triggers on new Salesforce leads, checks whether we should reach out, gathers context (including meeting history), and sends a Slack draft (with reasoning + sources) for the rep to approve. We built it on Deep Agents because this is a long-running, multi-step process that has to orchestrate multiple tools and large amounts of data reliably.


Key results

  • Lead-to-qualified-opportunity conversion rate up 250% from December 2025 to March 2026, driving 3x more pipeline dollars in the same period

  • Rep follow-up rate is up 97% for silver leads and 18% for gold leads

  • Sales reps reclaimed 40 hours per month each, totaling 1,320 hours across the team

  • 50% daily and 86% weekly active usage for sales team members

Team love


Constraints & success criteria

Before writing any code, we defined what the agent actually needed to do.

We had two goals: reduce the time reps spend researching and drafting per lead, and improve conversion on marketing-generated inbound. Outbound research and drafting is systematic enough to automate, but only if the system is safe, auditable, and improves with use.

Non-negotiables

  • Human-in-the-loop: Nothing is sent without an explicit rep review and approval. A single poorly timed email can undo months of relationship-building.

  • Contact history knowledge: The agent needed to check whether a rep or teammate had already reached out before drafting anything.

Core capabilities

  • Relationship-aware personalization: The draft should reflect the current state of the account (customer vs. warm prospect vs. cold), and not treat every lead the same way.

  • Explainability: Reps should be able to see key inputs and understand why the agent chose a particular angle so they could refine it and provide feedback.

  • Learning loop: The agent should learn from rep edits over time so drafts improve without anyone manually updating prompts.

Measurement

Every rep action (send, edit, cancel) is logged to LangSmith and attached to the underlying trace so we can evaluate quality, catch regressions, and quantify what’s working.
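The action logging above can be sketched as a small adapter that turns each Slack action into a scored feedback record tied to its run. The `RepFeedback` shape, the `rep_action` key, and the score mapping are illustrative assumptions, not the team's actual schema:

```python
from dataclasses import dataclass

@dataclass
class RepFeedback:
    run_id: str   # the trace/run that produced this draft
    key: str      # feedback dimension, e.g. "rep_action"
    value: str    # "send" | "edit" | "cancel"
    score: float  # crude quality proxy: send=1.0, edit=0.5, cancel=0.0

# Illustrative scoring; real analysis would weigh these differently.
ACTION_SCORES = {"send": 1.0, "edit": 0.5, "cancel": 0.0}

def feedback_for_action(run_id: str, action: str) -> RepFeedback:
    """Map a Slack button press to a feedback record attached to the trace."""
    if action not in ACTION_SCORES:
        raise ValueError(f"unknown rep action: {action}")
    return RepFeedback(run_id=run_id, key="rep_action",
                       value=action, score=ACTION_SCORES[action])

# In production a record like this would go to the tracing backend, e.g. via
# langsmith.Client().create_feedback(run_id, key=..., score=...).
```

Keeping the mapping pure (no network call inside) makes it trivial to unit-test and to replay old action logs against new scoring schemes.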

Scope expansion: account intelligence

Beyond one-off drafts, we also wanted the agent to proactively surface account-level signals like deal risks, expansion opportunities, and competitive moves, so reps know where to focus each week.


What we built

The GTM agent does two things: (1) it researches leads and writes personalized email drafts, and (2) it aggregates account-level signals across web activity, developer ecosystems, product usage, and marketing touchpoints to show reps where to focus. By tying that intent data back to a rep’s accounts, it surfaces meaningful activity, flags deal risks and competitive moves, and clarifies who is ideal to reach out to next.

We connected the agent to the following data sources:

https://blog.langchain.com/introducing-ambient-agents/

Inbound lead processing

When a new lead shows up in Salesforce, the agent takes over immediately. The first thing it does is look for reasons not to send anything. If someone just filed a support ticket, or if a teammate already reached out earlier in the week, sending an automated email would be a mistake. The agent is programmed to be cautious.
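The "reasons not to send" step reads naturally as a list of independent inhibitor checks run before any drafting. A minimal sketch, assuming hypothetical `Lead` fields and a 7-day lookback window (the post doesn't specify thresholds):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

@dataclass
class Lead:
    # Illustrative fields; the real agent reads these from Salesforce/Gong.
    email: str
    last_support_ticket: Optional[datetime] = None
    last_teammate_outreach: Optional[datetime] = None

def recent_support_ticket(lead: Lead, now: datetime) -> Optional[str]:
    if lead.last_support_ticket and now - lead.last_support_ticket < timedelta(days=7):
        return "filed a support ticket in the last 7 days"
    return None

def recent_teammate_outreach(lead: Lead, now: datetime) -> Optional[str]:
    if lead.last_teammate_outreach and now - lead.last_teammate_outreach < timedelta(days=7):
        return "a teammate already reached out this week"
    return None

# Each inhibitor either returns a human-readable reason to stop, or None.
INHIBITORS: list[Callable[[Lead, datetime], Optional[str]]] = [
    recent_support_ticket,
    recent_teammate_outreach,
]

def reasons_not_to_send(lead: Lead, now: datetime) -> list[str]:
    """Run every inhibitor; any non-empty result blocks drafting entirely."""
    return [r for check in INHIBITORS if (r := check(lead, now))]
```

Keeping inhibitors as a flat list makes the cautious-by-default posture easy to extend: a new "do not send" rule is one function, not a rewrite of the pipeline.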

https://www.langchain.com/langsmith/deployment

Once it clears those checks, it does the same research a rep used to do manually: pulls the full Salesforce record, reads through Gong transcripts, checks the prospect's LinkedIn profile. If there isn't much internal history, it goes to the web with Exa to understand what the company is doing with AI right now.

How it writes the email draft depends on the state of the relationship. The agent follows a defined outbound skill, a playbook it loads before drafting. The skill is designed to cover both warm and cold cases. An existing customer gets something different than a warm prospect, who gets something different than a cold contact. For cold outreach, the agent keeps it brief and research-backed, following a playbook we've defined in the skill.

https://smith.langchain.com/

The rep sees the finished draft in a Slack DM with buttons to send, edit, or cancel. They can also see the agent's reasoning, so it's clear why it took a particular angle. If they send it, the agent queues up a set of follow-up emails to optionally enroll the prospect in.

As we've refined the agent, we added a 48-hour SLA for silver leads: if a rep hasn't approved or declined the draft within that window, it sends automatically. This has meaningfully increased our follow-up rate for leads that would otherwise slip through without a response.
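The silver-lead SLA is a simple predicate over draft age and lead tier. A sketch with illustrative parameter names (only the 48-hour window and the silver/gold distinction come from the post):

```python
from datetime import datetime, timedelta

SILVER_SLA = timedelta(hours=48)  # from the post; other tiers always wait for a human

def should_auto_send(tier: str, drafted_at: datetime,
                     rep_decided: bool, now: datetime) -> bool:
    """Auto-send only silver-tier drafts that the rep has neither approved
    nor declined within the SLA window."""
    if tier != "silver" or rep_decided:
        return False
    return now - drafted_at >= SILVER_SLA
```

A scheduled job could evaluate this predicate over pending drafts; any explicit rep decision, in either direction, disables the auto-send path.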

Account intelligence

As our team scaled, reps started owning anywhere from 50 to over 100 accounts each. At that volume, it's easy for things to go quiet or for expansion opportunities to slip through.

Every Monday morning, the agent pulls data from Salesforce and BigQuery. It then checks the outside world for funding rounds, product launches, and new AI initiatives. We tailored the reports for two audiences: our sales team and our deployed engineering team, since they care about different data points.

For sales, the agent aggregates signals across product usage, developer ecosystems, web activity, hiring trends, and company news to surface expansion opportunities. It flags executive moves, spikes in package installations, and whether a company is actively hiring AI engineers or building agentic systems – which is a strong signal they're ready to expand. It also identifies potential good fits when we launch new features, matching accounts whose recent activity aligns well with the new features. And because knowing an account is active isn't enough on its own, it surfaces which individuals are most engaged and suggests who to reach out to next.

For deployed engineers, the focus shifts to account health. The agent pulls product usage from BigQuery, highlights from recent customer calls, upcoming renewal dates, and cases where a customer is close to running out of credits. It also surfaces open questions and unresolved threads from recent calls. The goal is to flag what actually needs a person to step in, so the team isn't spending Sunday evenings digging through dashboards.

https://docs.langchain.com/oss/python/deepagents/skills


How we built it

The agent needed to pull from multiple sources, reason across them, and produce a personalized output. This is more than a simple LLM call can handle reliably.

We chose Deep Agents for the multi-step orchestration because the inputs are inherently spiky: meeting data, CRM history, and web research vary a lot in size and structure. With Deep Agents, large tool results get offloaded into a virtual filesystem automatically, so we didn't have to build our own truncation and retrieval layer. We also used the harness's native planning tooling to enforce a consistent checklist (do-not-send checks → research → draft → rationale → follow-ups), which made runs easier to debug and reduced agent wandering.
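The context-offloading idea that Deep Agents handles automatically can be illustrated with a toy version: a large tool result goes to a virtual path, and the model context receives only a short pointer it can read from selectively later. The threshold and path scheme here are invented for illustration, not the harness's real mechanics:

```python
# In-memory stand-in for the harness's virtual filesystem.
VFS: dict[str, str] = {}
OFFLOAD_THRESHOLD = 2_000  # characters; illustrative, not the real limit

def record_tool_result(tool: str, result: str) -> str:
    """Return small results inline; offload big ones and return a pointer."""
    if len(result) <= OFFLOAD_THRESHOLD:
        return result
    path = f"/tool_results/{tool}_{len(VFS)}.txt"
    VFS[path] = result
    return f"[result offloaded to {path}: {len(result)} chars; read it with the file tools]"
```

The payoff is that a 200 KB Gong transcript costs a one-line pointer in context until the agent actually decides it needs to read a slice of it.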

We connected the agent to LangSmith so we could understand how sales reps were actually using it and measure whether the agent was improving over time. That meant setting up evaluations from the start rather than retrofitting them later, which turned out to be critical for catching regressions when we iterated on prompts or swapped model versions.


Agent patterns

Moving our GTM agent to production surfaced two problems we had to solve: how to make the agent learn from the people using it, and how to keep runs efficient at scale.

Memory

When a rep edits a draft in Slack, the system compares the original against the revised version. If the changes are substantive, an LLM analyzes the diff and extracts structured style observations: what changed, what it implies about the rep's preferences, and an optional quoted example. Those observations are stored in PostgreSQL, keyed per rep, and every future run reads them before drafting.

Each rep has stylistic preferences around tone and brevity. The feedback loop is automatic. Every edit teaches the agent, and the next draft reflects it. A weekly cron compacts these memories to keep them from getting bloated over time.
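The edit-to-memory loop can be sketched as: gate on a substantive diff, extract an observation, store it keyed per rep, read it back before the next draft. This toy version stores the raw diff where production uses an LLM to emit a structured style observation, and a dict where production uses PostgreSQL; the 0.9 similarity threshold is an assumption:

```python
import difflib
from collections import defaultdict

# Per-rep memory store; production keys rows per rep in PostgreSQL.
MEMORIES: dict[str, list[str]] = defaultdict(list)

def is_substantive(original: str, revised: str, threshold: float = 0.9) -> bool:
    """Skip trivial edits: only similarity ratios below the threshold count."""
    return difflib.SequenceMatcher(None, original, revised).ratio() < threshold

def record_edit(rep: str, original: str, revised: str) -> None:
    if not is_substantive(original, revised):
        return
    # Production sends the diff to an LLM that extracts a structured style
    # observation; the raw unified diff stands in for that here.
    diff = "\n".join(difflib.unified_diff(
        original.splitlines(), revised.splitlines(), lineterm=""))
    MEMORIES[rep].append(diff)

def memories_for(rep: str) -> list[str]:
    """Read before every future draft for this rep."""
    return MEMORIES[rep]
```

The substantive-edit gate matters: without it, fixing a typo would pollute the memory store with noise that the weekly compaction job then has to clean up.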

Subagent delegation

Account intelligence runs through compiled subagents: lightweight agents with constrained tool sets and structured output schemas that act as contracts with the main agent. The sales research subagent has access to Apollo, Exa, and BigQuery, and returns structured prospect and market context. The deployed engineer subagent uses Salesforce, Gong, and support tools to return usage trends, open tickets, and expansion signals.

The parent agent spawns one subagent per account, keeping tools isolated and outputs predictable. Because subagents run independently, we can execute them in parallel. LangSmith Deployment handles horizontal scaling and durable execution, so the system stays reliable as volume grows.
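The contract-plus-fan-out pattern can be sketched with a dataclass schema and a thread pool; the `AccountReport` fields and the stubbed subagent body are illustrative, not the real schemas:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class AccountReport:
    # Structured output schema acting as the contract with the parent agent.
    account_id: str
    usage_trend: str            # e.g. "up", "flat", "down"
    open_tickets: int
    expansion_signals: list[str]

def research_account(account_id: str) -> AccountReport:
    # Stand-in for a subagent run with its own constrained tool set
    # (Apollo/Exa/BigQuery in the sales case); it must return the contract.
    return AccountReport(account_id, usage_trend="up", open_tickets=0,
                         expansion_signals=["hiring AI engineers"])

def weekly_report(account_ids: list[str]) -> list[AccountReport]:
    """One subagent per account; independence makes parallel execution safe."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(research_account, account_ids))
```

Because every subagent must satisfy the same schema, the parent can aggregate reports without caring which tools produced them.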

https://docs.langchain.com/oss/python/deepagents/overview


Evals and feedback

Before writing any production code for a new workflow, we define what success looks like in LangSmith. We started with a small library of representative scenarios grounded in the situations our reps actually face, used those to build the initial agent or feature, and made sure the fundamentals work before expanding.

Once things were functional, we broadened the evaluation set in LangSmith to cover harder cases: a researcher deep in agentic AI or NLP, an existing customer we're trying to re-engage, accounts with prior Gong transcripts, verticals with heavy jargon like healthcare. Everything runs through a test harness that mocks our external APIs so we can observe behavior in a controlled environment before it touches real data.

We evaluate on two levels. First, rule-based assertions check the basics: right tools, right order, no duplicate drafts. Second, an LLM judge scores tone, word count, and formatting. Both run as part of a full eval suite in CI, and we treat any unexplained drift in agent behavior as a bug worth investigating.
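The first eval layer, rule-based assertions over a recorded run, might look like this sketch (the step names are placeholders for the actual tool names):

```python
def check_run(tool_calls: list[str], drafts: list[str]) -> list[str]:
    """Rule-based assertions over one recorded run: required steps must all
    appear, in order, and the run must not produce duplicate drafts."""
    failures = []
    required = ["do_not_send_checks", "research", "draft"]
    positions = [tool_calls.index(step) for step in required if step in tool_calls]
    if len(positions) < len(required):
        failures.append("missing required step")
    elif positions != sorted(positions):
        failures.append("steps out of order")
    if len(drafts) != len(set(drafts)):
        failures.append("duplicate drafts")
    return failures
```

These checks are cheap and deterministic, so they can run on every CI eval; the fuzzier tone and formatting judgments are left to the LLM-judge layer.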

But evals only tell part of the story. What actually matters is how reps use the drafts day to day. We track every Slack action (send, edit, cancel) and attach it directly to the trace in LangSmith. Over time, this lets us correlate writing patterns with real outcomes: which styles drive opens, which subject lines get replies. When something holds across enough reps, we codify it into the agent's default behavior.

The LangSmith eval suite and the rep feedback loop reinforce each other. One catches regressions, the other drives improvement.


Adoption beyond the sales team

The GTM agent started as an ambient agent, running as a background process. A lead appears in Salesforce, the agent runs, a draft lands in the rep's Slack. No trigger, no manual work.

We later built a conversational Slack interface as a side experiment, mostly to give SDRs a way to interact with the agent directly. What we didn't expect was how quickly it spread to the rest of the company. Because the agent was already connected to Salesforce, Gong, BigQuery, and Gmail, people found uses we hadn't designed for. Engineers checked product usage without writing SQL. Customer success pulled support history before renewal calls. Account executives summarized Gong transcripts before meetings.

We didn't build any of those workflows intentionally. The agent had the access, and people found the path of least resistance. Talking to the bot was easier than opening six different tabs.

We'll cover how other teams are using the GTM agent in a follow-up post.


Learnings

A few things we'd tell someone starting from scratch:

  1. Start with a definition of success, not code. Before we write any production code for a new workflow, we define what good looks like and build a small scenario library around it. That set expands as the agent matures. By the time something ships, we have an eval test suite that catches regressions, flags drift, and runs in CI automatically.

  2. Human-in-the-loop goes beyond safety. It turned out to be a data collection mechanism. Every rep action (send, edit, cancel) became a signal we could learn from. The memory system and feedback loop work because reps are in the flow.

  3. Connect the agent to your systems of record from the start. The organic adoption across the company happened because the agent already had access to the data people needed. We didn't plan for engineers or customer success to use it, but that usage spread because the access was already there.

  4. Long-running workflows need the right infrastructure. This agent required much more than a simple LLM call with a tool or two. It needed to pull from multiple sources, reason across them, run subagents in parallel, and maintain state across turns. Picking an agent harness, Deep Agents, built for that kind of orchestration saved us from rebuilding infrastructure from scratch.

  5. We're still early. The GTM agent handles a real workflow today, but the feedback loops we've built – including memory, evals, and rep actions tied to traces – are what will make it meaningfully better over the next six months.

Link: http://x.com/i/article/2030841460829089792


相关笔记

Every outbound at LangChain used to start the same way: a rep toggling between tabs. Salesforce for the account record, Gong for call history, LinkedIn for the contact, the company website for context. Fifteen minutes of research before a single word was written, and no easy way to know if a teammate had already reached out yesterday. Inbound follow-up used to mean manually dropping the same message into Apollo for every new contact.

We built a GTM agent that runs the process end-to-end. It triggers on new Salesforce leads, checks whether we should reach out, gathers context (including meeting history), and sends a Slack draft (with reasoning + sources) for the rep to approve. We built it on Deep Agents because this is a long-running, multi-step process that has to orchestrate multiple tools and large amounts of data reliably.

Key results

  • Lead-to-qualified-opportunity conversion rate up 250% from December 2025 to March 2026, driving 3x more pipeline dollars in the same period

  • Rep follow-up rate is up 97% for silver leads and 18% for gold leads

  • Sales reps reclaimed 40 hours per month each, totaling 1,320 hours across the team

  • 50% daily and 86% weekly active usage for sales team members

Team love

Constraints & success criteria

Before writing any code, we defined what the agent actually needed to do.

We had two goals: reduce the time reps spend researching and drafting per lead, and improve conversion on marketing-generated inbound. Outbound research and drafting is systematic enough to automate, but only if the system is safe, auditable, and improves with use.

Non-negotiables

  • Human-in-the-loop: Nothing is sent without an explicit rep review and approval. A single poorly timed email can undo months of relationship-building.

  • Contact history knowledge: The agent needed to check whether a rep or teammate had already reached out before drafting anything.

Core capabilities

  • Relationship-aware personalization: The draft should reflect the current state of the account (customer vs. warm prospect vs. cold), and not treat every lead the same way.

  • Explainability: Reps should be able to see key inputs and understand why the agent chose a particular angle so they could refine it and provide feedback.

  • Learning loop: The agent should learn from rep edits over time so drafts improve without anyone manually updating prompts.

Measurement

Every rep action (send, edit, cancel) is logged to LangSmith and attached to the underlying trace so we can evaluate quality, catch regressions, and quantify what’s working.

Scope expansion: account intelligence

Beyond one-off drafts, we also wanted the agent to proactively surface account-level signals like deal risks, expansion opportunities, and competitive moves, so reps know where to focus each week.

What we built

The GTM agent does two things: (1) it researches leads and writes personalized email drafts, and (2) it aggregates account-level signals across web activity, developer ecosystems, product usage, and marketing touchpoints to show reps where to focus. By tying that intent data back to a rep’s accounts, it surfaces meaningful activity, flags deal risks and competitive moves, and clarifies who is ideal to reach out to next.

We connected the agent to the following data sources:

https://blog.langchain.com/introducing-ambient-agents/

Inbound lead processing

When a new lead shows up in Salesforce, the agent takes over immediately. The first thing it does is look for reasons not to send anything. If someone just filed a support ticket, or if a teammate already reached out earlier in the week, sending an automated email would be a mistake. The agent is programmed to be cautious.

https://www.langchain.com/langsmith/deployment

Once it clears those checks, it does the same research a rep used to do manually: pulls the full Salesforce record, reads through Gong transcripts, checks the prospect's LinkedIn profile. If there isn't much internal history, it goes to the web with Exa to understand what the company is doing with AI right now.

How it writes the email draft depends on the state of the relationship. The agent follows a defined outbound skill, a playbook it loads before drafting. The skill is designed to cover both warm and cold cases. An existing customer gets something different than a warm prospect, who gets something different than a cold contact. For cold outreach, the agent keeps it brief and research-backed, following a playbook we've defined in the skill.

https://smith.langchain.com/

The rep sees the finished draft in a Slack DM with buttons to send, edit, or cancel. They can also see the agent's reasoning, so it's clear why it took a particular angle. If they send it, the agent queues up a set of follow-up emails to optionally enroll the prospect in.

As we've refined the agent, we added a 48-hour SLA for silver leads: if a rep hasn't approved or declined the draft within that window, it sends automatically. This has meaningfully increased our follow-up rate for leads that would otherwise slip through without a response.

Account intelligence

As our team scaled, reps started owning anywhere from 50 to over 100 accounts each. At that volume, it's easy for things to go quiet or for expansion opportunities to slip through.

Every Monday morning, the agent pulls data from Salesforce and BigQuery. It then checks the outside world for funding rounds, product launches, and new AI initiatives. We tailored the reports for two audiences: our sales team and our deployed engineering team, since they care about different data points.

For sales, the agent aggregates signals across product usage, developer ecosystems, web activity, hiring trends, and company news to surface expansion opportunities. It flags executive moves, spikes in package installations, and whether a company is actively hiring AI engineers or building agentic systems – which is a strong signal they're ready to expand. It also identifies potential good fits when we launch new features, matching accounts whose recent activity aligns well with the new features. And because knowing an account is active isn't enough on its own, it surfaces which individuals are most engaged and suggests who to reach out to next.

For deployed engineers, the focus shifts to account health. The agent pulls product usage from BigQuery, highlights from recent customer calls, upcoming renewal dates, and cases where a customer is close to running out of credits. It also surfaces open questions and unresolved threads from recent calls. The goal is to flag what actually needs a person to step in, so the team isn't spending Sunday evenings digging through dashboards.
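The account-health side of the report is essentially a set of rule-based flags over the pulled data. A minimal sketch; the 30-day renewal window and 10% credit floor are assumed thresholds, not numbers from the post:

```python
from datetime import date, timedelta

def health_flags(account: dict, today: date) -> list:
    """Return the flags that would put an account in the Monday report.

    Thresholds (30-day renewal window, 10% credit floor) are illustrative.
    Expected keys: renewal_date, credits_remaining, credits_total,
    and optionally unresolved_call_threads.
    """
    flags = []
    if account["renewal_date"] - today <= timedelta(days=30):
        flags.append("renewal_soon")
    if account["credits_remaining"] / account["credits_total"] < 0.10:
        flags.append("low_credits")
    if account.get("unresolved_call_threads"):
        flags.append("open_questions")
    return flags
```

Accounts with no flags stay out of the report entirely, which is what keeps the Monday digest short enough to actually get read.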


How we built it

The agent needed to pull from multiple sources, reason across them, and produce a personalized output. This is more than a simple LLM call can handle reliably.

We chose Deep Agents for the multi-step orchestration because the inputs are inherently spiky: meeting data, CRM history, and web research vary a lot in size and structure. With Deep Agents, large tool results get offloaded into a virtual filesystem automatically, so we didn't have to build our own truncation and retrieval layer. We also used the harness's native planning tooling to enforce a consistent checklist (do-not-send checks → research → draft → rationale → follow-ups), which made runs easier to debug and reduced agent wandering.
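The checklist the planner enforces can be illustrated as a plain pipeline that halts at the first stage if any inhibitor fires. The step implementations below are stubs, not the actual agent tools or the Deep Agents API:

```python
# Illustrative runner mirroring the enforced plan:
# do-not-send checks -> research -> draft -> rationale -> follow-ups.

def check_do_not_send(lead: dict) -> list:
    """Return reasons NOT to reach out; any reason halts the run.
    Example inhibitors are assumptions drawn from the scenarios in the post."""
    reasons = []
    if lead.get("open_support_ticket"):
        reasons.append("open support ticket")
    if lead.get("teammate_contacted_recently"):
        reasons.append("teammate already reached out")
    return reasons

def run_pipeline(lead: dict) -> dict:
    inhibitors = check_do_not_send(lead)
    if inhibitors:
        # Brake first: no research or drafting happens for a blocked lead.
        return {"status": "skipped", "reasons": inhibitors}
    research = {"notes": "background on " + lead["company"]}   # stub tool call
    draft = "Hi " + lead["name"] + ", ..."                     # stub drafting step
    rationale = "angle chosen from research"                   # stub rationale
    return {"status": "drafted", "draft": draft,
            "rationale": rationale, "research": research}
```

Putting the do-not-send gate first means a blocked lead costs one cheap check rather than a full research-and-draft run.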

We connected the agent to LangSmith so we could understand how sales reps were actually using it and measure whether the agent was improving over time. That meant setting up evaluations from the start rather than retrofitting them later, which turned out to be critical for catching regressions when we iterated on prompts or swapped model versions.

Agent patterns

Moving our GTM agent to production surfaced two problems we had to solve: how to make the agent learn from the people using it, and how to keep runs efficient at scale.

Memory

When a rep edits a draft in Slack, the system compares the original against the revised version. If the changes are substantive, an LLM analyzes the diff and extracts structured style observations: what changed, what it implies about the rep's preferences, and an optional quoted example. Those observations are stored in PostgreSQL, keyed per rep, and every future run reads them before drafting.

Each rep has stylistic preferences around tone and brevity, and the feedback loop is automatic: every edit teaches the agent, and the next draft reflects it. A weekly cron job compacts these memories to keep them from bloating over time.
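The loop can be sketched end to end with an in-memory store standing in for PostgreSQL and a heuristic standing in for the LLM diff analysis. The similarity threshold and observation fields are assumptions:

```python
import difflib

MEMORY = {}  # stand-in for the per-rep PostgreSQL table

def is_substantive(original: str, revised: str, threshold: float = 0.9) -> bool:
    """Skip trivial edits; the 0.9 similarity cutoff is an illustrative choice."""
    return difflib.SequenceMatcher(None, original, revised).ratio() < threshold

def record_edit(rep_id: str, original: str, revised: str) -> None:
    if not is_substantive(original, revised):
        return
    # In the real system an LLM analyzes the diff and extracts structured
    # style observations; a length heuristic stands in for it here.
    shorter = len(revised) < len(original)
    observation = {
        "what_changed": "rep shortened the draft" if shorter else "rep expanded the draft",
        "implied_preference": "prefers brevity" if shorter else "prefers more context",
        "example": revised[:80],  # optional quoted example
    }
    MEMORY.setdefault(rep_id, []).append(observation)

def preferences_for(rep_id: str) -> list:
    """Read before every future draft for this rep."""
    return MEMORY.get(rep_id, [])
```

Keying observations per rep is what lets two reps with opposite styles both see drafts that sound like them.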

Subagent delegation

Account intelligence runs through compiled subagents: lightweight agents with constrained tool sets and structured output schemas that act as contracts with the main agent. The sales research subagent has access to Apollo, Exa, and BigQuery, and returns structured prospect and market context. The deployed engineer subagent uses Salesforce, Gong, and support tools to return usage trends, open tickets, and expansion signals.

The parent agent spawns one subagent per account, keeping tools isolated and outputs predictable. Because subagents run independently, we can execute them in parallel. LangSmith Deployment handles horizontal scaling and durable execution, so the system stays reliable as volume grows.
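The contract-plus-parallelism pattern looks roughly like this, with a dataclass as the output schema and a thread pool for fan-out. The subagent body is a stub; field names and signal strings are invented for illustration:

```python
import concurrent.futures
from dataclasses import dataclass

@dataclass
class ProspectContext:
    """Structured output schema acting as the contract with the parent agent.
    Field names here are illustrative, not the production schema."""
    account: str
    signals: list
    suggested_contacts: list

def sales_research_subagent(account: str) -> ProspectContext:
    # Stub for the subagent that would call Apollo, Exa, and BigQuery.
    return ProspectContext(
        account=account,
        signals=[account + ": hiring AI engineers"],
        suggested_contacts=["champion@" + account.lower() + ".example"],
    )

def run_per_account(accounts: list) -> dict:
    """Spawn one subagent per account; independent runs execute in parallel."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        results = pool.map(sales_research_subagent, accounts)
    return {r.account: r for r in results}
```

Because each subagent only sees its own tools and must return the schema, the parent never has to parse free-form output or worry about one account's context leaking into another's.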


Evals and feedback

Before writing any production code for a new workflow, we define what success looks like in LangSmith. We started with a small library of representative scenarios grounded in the situations our reps actually face, used those to build the initial agent or feature, and made sure the fundamentals work before expanding.

Once things were functional, we broadened the evaluation set in LangSmith to cover harder cases: a researcher deep in agentic AI or NLP, an existing customer we're trying to re-engage, accounts with prior Gong transcripts, verticals with heavy jargon like healthcare. Everything runs through a test harness that mocks our external APIs so we can observe behavior in a controlled environment before it touches real data.

We evaluate on two levels. First, rule-based assertions check the basics: right tools, right order, no duplicate drafts. Second, an LLM judge scores tone, word count, and formatting. Both run as part of a full eval suite in CI, and we treat any unexplained drift in agent behavior as a bug worth investigating.
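The two levels can be sketched as a deterministic check over a recorded trace plus a judge stub. The trace shape, tool names, and 120-word limit are assumptions for illustration:

```python
def rule_checks(trace: dict) -> list:
    """Level 1: deterministic assertions over a recorded run.
    Assumes trace["tool_calls"] is an ordered list of tool names."""
    failures = []
    if trace["tool_calls"][:2] != ["do_not_send_check", "research"]:
        failures.append("tools ran out of order")
    if trace["tool_calls"].count("draft") != 1:
        failures.append("duplicate or missing draft")
    return failures

def judge_scores(draft: str) -> dict:
    """Level 2: in production this is an LLM judge scoring tone and formatting;
    only the word-count check is concrete in this sketch."""
    return {"word_count_ok": len(draft.split()) <= 120}
```

In CI, any failure from either level fails the suite, which is what turns "unexplained drift" from a vague worry into a red build.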

But evals only tell part of the story. What actually matters is how reps use the drafts day to day. We track every Slack action (send, edit, cancel) and attach it directly to the trace in LangSmith. Over time, this lets us correlate writing patterns with real outcomes: which styles drive opens, which subject lines get replies. When something holds across enough reps, we codify it into the agent's default behavior.
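The correlation step is, at its core, a per-style aggregation over logged Slack actions. A minimal sketch, assuming each event is already tagged with a style label (the labels themselves are invented):

```python
from collections import Counter

def action_rates(events: list) -> dict:
    """events: (style_tag, action) pairs from Slack, action in {send, edit, cancel}.
    Returns each style's send rate, the signal used when deciding whether a
    pattern holds across enough reps to become default behavior."""
    totals, sends = Counter(), Counter()
    for style, action in events:
        totals[style] += 1
        if action == "send":
            sends[style] += 1
    return {style: sends[style] / totals[style] for style in totals}
```

A high send rate means reps shipped the draft untouched; a high edit or cancel rate flags a style worth revisiting before it's ever codified.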

The LangSmith eval suite and the rep feedback loop reinforce each other. One catches regressions, the other drives improvement.

Adoption beyond the sales team

The GTM agent started as an ambient agent, running as a background process. A lead appears in Salesforce, the agent runs, and a draft lands in the rep's Slack. No manual trigger, no extra work.

We later built a conversational Slack interface as a side experiment, mostly to give SDRs a way to interact with the agent directly. What we didn't expect was how quickly it spread to the rest of the company. Because the agent was already connected to Salesforce, Gong, BigQuery, and Gmail, people found uses we hadn't designed for. Engineers checked product usage without writing SQL. Customer success pulled support history before renewal calls. Account executives summarized Gong transcripts before meetings.

We didn't build any of those workflows intentionally. The agent had the access, and people found the path of least resistance. Talking to the bot was easier than opening six different tabs.

We'll cover how other teams are using the GTM agent in a follow-up post.

Learnings

A few things we'd tell someone starting from scratch:

  1. Start with a definition of success, not code. Before we write any production code for a new workflow, we define what good looks like and build a small scenario library around it. That set expands as the agent matures. By the time something ships, we have an eval test suite that catches regressions, flags drift, and runs in CI automatically.

  2. Human-in-the-loop goes beyond safety. It turned out to be a data collection mechanism. Every rep action (send, edit, cancel) became a signal we could learn from. The memory system and feedback loop work because reps are in the flow.

  3. Connect the agent to your systems of record from the start. The organic adoption across the company happened because the agent already had access to the data people needed. We didn't plan for engineers or customer success to use it, but that usage spread because the access was already there.

  4. Long-running workflows need the right infrastructure. This agent required much more than a simple LLM call with a tool or two. It needed to pull from multiple sources, reason across them, run subagents in parallel, and maintain state across turns. Picking an agent harness built for that kind of orchestration (Deep Agents) saved us from rebuilding that infrastructure from scratch.

  5. We're still early. The GTM agent handles a real workflow today, but the feedback loops we've built – including memory, evals, and rep actions tied to traces – are what will make it meaningfully better over the next six months.

Link: http://x.com/i/article/2030841460829089792

📋 Discussion Archive

Discussion in progress…