The Heterogeneous AI Factory Era After NVIDIA GTC 2026

By integrating heterogeneous silicon (GPU + LPU + CPU) under a unified software stack, NVIDIA is shifting AI infrastructure competition from point performance to system-level TCO, and its moat is moving from the chips themselves to ecosystem lock-in; energy constraints and the complexity of enterprise adoption remain open risks.

Patrick Moorhead, CEO and Chief Analyst, Moor Insights & Strategy · 2026-03-18

Key Takeaways

  • The heterogeneous token factory is the endgame architecture. NVIDIA is no longer selling "a stronger GPU"; it is selling a factory-level economic model: GPUs handle attention math, the acquired Groq LPUs specialize in decode acceleration (about 25% of the deployment mix), and standalone Vera CPUs handle agent tool calls. That division of labor yields a claimed 35x inference throughput per megawatt and raises the competitive bar from chip design to full-stack co-design and software orchestration.
  • The software wall is now a moat competitors cannot replicate. Twenty years of CUDA developer ecosystem, six-year-old Ampere GPUs whose cloud pricing is still rising, and the unified Dynamo/OpenShell/NemoClaw software stack cannot be copied within two years. NVIDIA is effectively running the old Intel playbook, only faster: anchor the GPU, expand up and down the stack, and end up owning the architecture conversation.
  • The demand pipeline doubled, but the number conflates categories. Jensen claims "at least $1 trillion" of demand through 2027 (double last year's $500 billion), yet the figure blurs the four big clouds' total AI CapEx with direct revenue from NVIDIA chips and racks. Land, power, liquid cooling, and non-NVIDIA networking gear are all counted in, so the demand NVIDIA can actually convert may be overstated by 30-50%.
  • The execution risk of the Groq integration is underestimated. Samsung manufacturing the LP30, second-half 2026 shipping, a 25/75 deployment ratio: these are aggressive commitments. Getting two chips with fundamentally different execution models (deterministic dataflow vs. SIMT) to cooperate in hyperscale clusters without long-tail latency is a software scheduling problem whose difficulty is badly underestimated; it is not solved by "adding a switch."
  • The custom-silicon threat is waved away. The author concedes that Google TPU and Amazon Trainium are alternatives, then downplays them because "no competitor offers NVIDIA's breadth." The missing perspective: hyperscalers do not need NVIDIA's full bundle, because they already control their entire cloud infrastructure. If they shift internal inference workloads to in-house, low-cost silicon at scale, NVIDIA's high-margin systems face serious price-war pressure.

What This Means for Us

  • What it means for Neta: competition in AI infrastructure has moved from building a stronger chip to defining the workload split plus software orchestration. Technical judgment must therefore shift from point performance to system-level ROI; the benchmark is no longer FLOPS but tokens per watt, token latency, and intelligence produced per unit of capital. Next step: re-evaluate every AI infra investment decision against a token factory economics matrix (a minimal sketch follows this list).
  • What it means for ATou: if you are building AI agent products, the real bottleneck is not the model but orchestration. Agents call tools, run SQL, compile code, and validate results, all of which demand strong single-thread CPU performance, and the risk of GPUs sitting idle is higher than you think. Next step: revisit the CPU compute ratio in your product architecture and the design of task decomposition, tool calling, and state management.
  • What it means for Uota: "every SaaS becomes GaaS (agents-as-a-service)" is directionally right, but the timeline will be longer than Jensen implies; enterprise IT stacks do not get rebuilt in two years. Expect a 2026-2027 mismatch between surplus compute capacity and a lagging application layer; whoever offers an out-of-the-box agent reference architecture in that window captures the market. Next step: don't wait for NVIDIA's full-stack offering to be perfect; design the simplified enterprise path ahead of it.
  • What it means in general: the keynote is essentially declaring that the AI factory will be the defining infrastructure category of this decade. Whatever your industry, assume a wave of compute-infrastructure upgrade investment over the next three years. Next step: assess whether your business gets pulled along by this cycle (cloud providers, chip companies, and system integrators benefit most directly) and where you fit within it.
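
Below is a minimal, illustrative sketch of the kind of "token factory economics matrix" the first point refers to. Every candidate option, number, and weighting is a made-up placeholder, not data from the article; the point is only to show ranking on system-level metrics rather than peak FLOPS.

```python
# Sketch of a token-factory economics matrix (all values are placeholders).
candidates = {
    # name: (tokens/s per watt, p50 token latency in ms, capex $M per delivered GW)
    "GPU-only rack":         (1.0, 40, 1_000),
    "GPU + decode-LPU mix":  (3.0, 10, 1_300),
    "Custom inference ASIC": (2.0, 25, 900),
}

def score(tokens_per_watt: float, latency_ms: float, capex_musd: float) -> float:
    """Higher is better: throughput per watt, discounted by latency and capex.
    The weighting is arbitrary; the point is to rank on system economics."""
    return tokens_per_watt / (latency_ms * capex_musd)

for name, metrics in sorted(candidates.items(), key=lambda kv: score(*kv[1]), reverse=True):
    print(f"{name:>24}: score {score(*metrics):.2e}")
```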

Discussion Starters

1. If the hyperscalers (Meta, Amazon, Google) shift internal inference workloads to in-house ASICs at scale, how far does NVIDIA's full-bundle moat erode? The author acknowledges custom silicon as a threat but never quantifies its timeline or scale, and the answer bears directly on NVIDIA's long-term gross margins.

2. Can the heterogeneous Groq plus Vera Rubin integration really run reliably in hyperscale clusters? At what scale and on what workload was the "35x throughput per megawatt" figure measured, and when will third-party validation arrive? That determines whether token factory economics actually pay out.

3. Will enterprise AI factory purchases get stuck on energy constraints? The author notes that NVIDIA's DSX Max-Q is a software optimization tool, not an energy source, and treats energy as the hard ceiling on 2028. Is the 2026-2027 demand pipeline already discounted by the energy bottleneck?

By Patrick Moorhead, CEO and Chief Analyst, Moor Insights & Strategy

I Wrote the Playbook Before the Keynote. Here’s How It Played Out.

Last week, I published my GTC 2026 preview with a specific thesis: NVIDIA must prove it can unify training GPUs, pre-fill accelerators, Groq decode processors, and standalone CPUs under a single software layer. I laid out what I expected Jensen Huang to announce, what the risks were, and what I’d advise the company to do. Then I flew to San Jose and watched the keynote from the SAP Center.

I’ve attended every GTC since 2011. This was the most architecturally complete keynote I’ve seen Jensen deliver. Seven new chips in full production. Five rack-scale systems. A unified software stack spanning training, inference, agentic orchestration, and storage. A physical AI ecosystem broader than anything I expected. And a Disney robot named Olaf walking across the stage, trained entirely in NVIDIA’s Isaac simulation environment. Jensen opened by celebrating CUDA’s 20th anniversary and closed by declaring that “every SaaS company will become a GaaS company,” an agents-as-a-service company. In between, he laid out the economics of token factories in a way that should get every infrastructure CEO’s attention.

The short version: NVIDIA delivered on the heterogeneous platform thesis. The Groq LPU integration landed exactly as I predicted. The Vera CPU moved from sleeper to center stage. The software wall got taller. What surprised me was the speed and the scale: a $1 trillion demand pipeline through 2027, the LPX rack shipping in the second half of 2026, Samsung already manufacturing the Groq LP30 chip, and Satya Nadella confirming Vera Rubin is already running at Microsoft Azure. What wasn’t fully addressed: enterprise simplification and the energy constraint I flagged for 2027.

Seven Chips, Five Racks, One AI Factory: The Vera Rubin Platform

Huang unveiled the NVIDIA Vera Rubin platform on March 16: seven new chips, all in full production, shipping as five rack-scale systems. The components include the Rubin GPU, Vera CPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the newly integrated Groq 3 LPU. The racks: Vera Rubin NVL72 for GPU compute, Vera CPU for agentic orchestration, Groq 3 LPX for ultra-low-latency decode, BlueField-4 STX for context memory storage, and Spectrum-6 SPX for Ethernet spine networking.

As my colleague Matt Kimball wrote in his CES 2026 research note, NVIDIA positioned Vera Rubin as a new platform, not a new chip generation. GTC 2026 validated that framing. The NVL72 integrates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6. NVIDIA claims 10x inference throughput per watt and one-tenth the cost per token versus Blackwell, and says NVL72 handles large mixture-of-experts models with one-quarter the GPU count of the prior generation. If those efficiency claims hold at production scale, they change AI factory economics for every buyer in the stack.
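
To make the compounding effect of those ratios concrete, here is a minimal back-of-the-envelope sketch. The site size, normalized baseline, and fleet count are assumptions chosen for illustration; only the 10x-per-watt and one-quarter-GPU-count ratios come from NVIDIA's claims above.

```python
# Back-of-the-envelope sketch; baseline values are assumed, not NVIDIA figures.
blackwell_tokens_per_joule = 1.0                           # normalized baseline (assumed)
rubin_tokens_per_joule = 10 * blackwell_tokens_per_joule   # the "10x per watt" claim

SITE_POWER_MW = 100        # a hypothetical 100 MW AI factory
SECONDS_PER_DAY = 86_400

def tokens_per_day(tokens_per_joule: float, power_mw: float) -> float:
    """Tokens/day = (tokens per joule) x (joules per day at this power level)."""
    return tokens_per_joule * power_mw * 1e6 * SECONDS_PER_DAY

base = tokens_per_day(blackwell_tokens_per_joule, SITE_POWER_MW)
rubin = tokens_per_day(rubin_tokens_per_joule, SITE_POWER_MW)
print(f"Blackwell-era factory: {base:.2e} tokens/day (normalized units)")
print(f"Vera Rubin factory:    {rubin:.2e} tokens/day ({rubin / base:.0f}x)")

# "One-quarter the GPU count" for the same large MoE model:
blackwell_gpus = 288       # assumed fleet size for illustration
print(f"GPUs for the same MoE model: {blackwell_gpus} -> {blackwell_gpus // 4}")
```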

On stage, Jensen showed the hardware: 100 percent liquid cooled, cable-free compute trays that reduce installation from two days to two hours, and the sixth-generation NVLink switching system. He also confirmed that Satya Nadella had already reported Vera Rubin up and running at Microsoft Azure, and that NVIDIA’s supply chain can now manufacture “thousands per week” of these racks, “potentially multi-gigawatts of AI factories per month.” As Anshel Sag wrote at GTC 2025, the base-model Rubin was slated for early 2026 with HBM4 memory. NVIDIA delivered on that milestone. But the real story isn’t the GPU itself. It’s the architecture around it. No other semiconductor company has shipped this many purpose-built, co-designed components simultaneously. That said, shipping components and proving they work together at hyperscale are two different things.

From $500 Billion to $1 Trillion: The Demand Pipeline Doubled in 12 Months

The demand story Jensen told on stage is staggering. At last year’s GTC, he saw $500 billion of high-confidence demand for Blackwell and Rubin through 2026. This year, standing on the same stage, he said he now sees “at least $1 trillion” through 2027. He added: “I am certain computing demand will be much higher than that.”

The external data backs it up. Microsoft, Alphabet, Amazon, and Meta are on track to spend upward of $650 billion on AI investments this year, nearly tripling 2023 levels. As I told Yahoo Finance in February, AI infrastructure is essentially sold out through the end of 2027. NVIDIA posted fourth-quarter revenue of $68.1 billion, beating estimates by more than $8 billion, with datacenter revenue of $62.3 billion. Vera Rubin’s efficiency gains arrive precisely when customers need to extract more intelligence from every watt and every dollar of infrastructure spend.

The Groq Integration: My Prediction Landed, and Jensen Showed the Economics

In my pre-GTC analysis, I made a specific architectural prediction: the more likely near-term path for Groq integration was a disaggregated configuration, with LPU racks sitting alongside GPU racks, connected by NVLink, managed by NVIDIA’s software layer. That’s exactly what NVIDIA announced.

But Jensen went further than the press release by showing the token factory economics. He walked through a 2D framework: throughput (tokens per watt) on the Y axis, token speed (latency/intelligence) on the X axis, with tiers from free to ultra-premium at $150 per million tokens. Vera Rubin alone shifts the entire frontier up, enabling 5x more revenue generation per gigawatt of data center versus Blackwell. The problem: NVLink 72 runs out of steam beyond about 400 tokens per second. It simply doesn’t have enough bandwidth for the ultra-premium tier. That’s where Groq comes in.
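
Before turning to Groq's role, it helps to see the shape of that revenue math. The sketch below is my own simplification: every absolute number is a placeholder, and only the structure (revenue per gigawatt equals tokens served per joule times the price of the highest tier the platform's token speed can reach) reflects the framework Jensen described.

```python
# Hedged sketch of token-factory revenue math; every absolute number is a placeholder.
SECONDS_PER_YEAR = 365 * 24 * 3600

def revenue_per_gw_year(tokens_per_joule: float, usd_per_million_tokens: float) -> float:
    """Annual revenue for 1 GW of IT power at a given efficiency and price tier."""
    tokens_per_year = tokens_per_joule * 1e9 * SECONDS_PER_YEAR   # 1 GW = 1e9 J/s
    return tokens_per_year / 1e6 * usd_per_million_tokens

# (tokens per joule, highest price tier the platform's token speed can serve, $/M tokens)
platforms = {
    "Blackwell baseline":     (0.02, 2.0),
    "Vera Rubin (assumed)":   (0.10, 2.0),   # ~5x the frontier at the same tier
    "Rubin + Groq (assumed)": (0.10, 20.0),  # faster decode unlocks premium tiers
}

base = revenue_per_gw_year(*platforms["Blackwell baseline"])
for name, args in platforms.items():
    rev = revenue_per_gw_year(*args)
    print(f"{name:>24}: ${rev / 1e9:,.1f}B per GW-year ({rev / base:.0f}x baseline)")
```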

The Groq 3 LPX rack packs 256 LPU processors with 128 gigabytes of on-chip SRAM and 640 terabytes per second of scale-up bandwidth. GPUs handle attention math; LPUs accelerate decode operations at every layer for every output token, connected to Vera Rubin via a custom Spectrum-X interconnect. Jensen was specific about the deployment mix: “I would add Groq to maybe 25 percent of my total data center. The rest is all 100 percent Vera Rubin.” Combined, NVIDIA claims 35x more inference throughput per megawatt. He thanked Samsung for manufacturing the LP30 chip and confirmed it ships in the second half of 2026.
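
NVIDIA did not break the 35x figure into per-rack components, but a crude, power-weighted decomposition shows how a 25/75 mix could average out to a number like that. The per-rack throughputs below are assumptions for illustration, and the linear blend ignores the fact that prefill and decode are serialized stages of the same request.

```python
# Hedged decomposition of a blended tokens-per-megawatt figure. Per-rack
# numbers are assumed; only the 25/75 power split comes from the keynote.

def blended_tokens_per_mw(mix):
    """mix: {rack type: (fraction of datacenter power, normalized tokens/s per MW)}."""
    assert abs(sum(share for share, _ in mix.values()) - 1.0) < 1e-9
    return sum(share * throughput for share, throughput in mix.values())

blackwell_baseline = {"Blackwell GPU": (1.00, 1.0)}   # normalized to 1.0x
heterogeneous_mix = {
    "Vera Rubin GPU": (0.75, 10.0),    # attention / prefill (assumed uplift)
    "Groq 3 LPX":     (0.25, 110.0),   # decode-specialized racks (assumed uplift)
}

for name, mix in [("Blackwell baseline", blackwell_baseline),
                  ("75/25 Rubin + Groq", heterogeneous_mix)]:
    print(f"{name:>20}: {blended_tokens_per_mw(mix):.1f}x tokens per MW")
# 0.75 * 10 + 0.25 * 110 = 35.0x -- one of many splits consistent with the claim.
```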

Jensen also explained why Groq was attractive to him: it’s a deterministic data flow processor, statically compiled, compiler-scheduled, with massive on-chip SRAM designed for one workload: inference. That single-workload focus limited Groq’s standalone reach, but paired with Vera Rubin and Dynamo, NVIDIA gets the best of both architectures. I’ve been consistent on the heterogeneous thesis. The AI pipeline is splitting into three distinct workloads, and NVIDIA had to fill the gaps. Now it has. If execution holds, it’s the strongest total cost of ownership story in the market.

The Vera CPU: Jensen Called It a Multi-Billion Dollar Business

In the pre-GTC piece, I called the CPU resurgence “one of the sleeper storylines.” Jensen put that to rest. He said on stage: “We never thought we would be selling CPU standalone. We are selling a lot of CPU standalone. This is already, for sure, going to be a multi-billion dollar business.”

NVIDIA launched the Vera CPU as a dedicated rack-scale product: 256 liquid-cooled processors, 400 terabytes of memory, 300 terabytes per second of memory bandwidth. The chip uses 88 Arm Olympus cores with 3x more memory bandwidth per core than x86, twice the energy efficiency, and 1.5x better single-thread performance versus today’s x86 server CPUs. Jensen framed the need simply: AI agents call tools, run SQL, compile code, and validate results on CPUs. If the CPUs are slow, the GPUs sit idle. He called Vera “the only data center CPU in the world that uses LPDDR5,” emphasizing extreme single-thread performance and performance per watt.
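
A minimal model of that idle-GPU argument, under my own assumed step times (the 1.5x factor is the single-thread uplift NVIDIA quotes; everything else is illustrative):

```python
# Simplified model, not from the keynote: a serialized agent loop in which the
# GPU generates a tool call, then waits while the CPU executes it.

def gpu_utilization(gpu_secs_per_step: float, cpu_secs_per_step: float) -> float:
    """Fraction of wall-clock time the GPU is busy in a serialized agent loop."""
    return gpu_secs_per_step / (gpu_secs_per_step + cpu_secs_per_step)

GPU_TIME = 0.8   # assumed seconds of GPU decode per agent step
for label, cpu_time in [("slow CPU tool call", 1.6),
                        ("1.5x faster single-thread CPU", 1.6 / 1.5)]:
    print(f"{label:>30}: GPU busy {gpu_utilization(GPU_TIME, cpu_time):.0%} of wall-clock time")
```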

I posted on X before GTC that NVIDIA is executing the old Intel server playbook but faster: anchor the GPU, then expand up and down the stack until you own the architecture conversation. The Vera CPU rack is that strategy made concrete. As Matt Kimball put it in his CES 2026 analysis, CPUs aren’t becoming less relevant in AI systems; they’re becoming more specialized. Alibaba, ByteDance, Meta, and Oracle Cloud Infrastructure are collaborating on deployment, alongside Dell Technologies, HPE, Lenovo, and Supermicro on manufacturing. Whether enterprises outside the hyperscaler tier adopt Vera at volume will depend on pricing and how quickly agentic workloads become standard.

The Software Wall Keeps Rising: Dynamo, OpenShell, and “Every SaaS Becomes GaaS”

I predicted NemoClaw would be the software headline at GTC. NVIDIA went further than I expected.

Jensen framed the three inflections that got us here: ChatGPT started the generative era, o1 started the reasoning era, and Claude Code started the agentic era. He said “100 percent of NVIDIA is using a combination of Claude Code, Codex, and Cursor. There’s not one software engineer today who is not assisted by one or many AI agents.” That’s the demand driver behind the software stack NVIDIA is building.

Dynamo 1.0 is now in production as the open-source inference operating system for AI factories, boosting Blackwell inference by up to 7x and adopted across AWS, Azure, Google Cloud, Oracle Cloud, and enterprise customers including PayPal, Pinterest, and ByteDance. The Agent Toolkit with OpenShell provides enterprise security guardrails for autonomous agents. The NemoClaw stack installs Nemotron models and OpenShell in a single command. Jensen compared OpenClaw to Windows and Mac, calling it “the operating system for personal AI” and declaring it “as big a deal as HTML, as big as Linux.” Adobe, Atlassian, SAP, Salesforce, ServiceNow, CrowdStrike, and Siemens are adopting it.

The Nemotron Coalition brings Cursor, LangChain, Mistral AI, Perplexity, and others together to build open frontier models on NVIDIA DGX Cloud. NVIDIA also expanded its open model families across Nemotron 3 for agentic AI, Isaac GR00T N1.7, Cosmos 3, and Alpamayo 1.5. Jensen’s provocation: “Every SaaS company will become a GaaS company”: agents-as-a-service. I think that’s directionally right, though the timeline will be longer than Jensen implies. The enterprise IT stack doesn’t get rebuilt in two years.

I wrote at GTC 2024 that NIM was “bigger than Blackwell” for enterprises, calling it the ultimate embrace-and-extend play. Jensen reinforced this with the CUDA flywheel: 20 years, hundreds of millions of installed GPUs, and Ampere GPUs (shipped six years ago) with pricing going UP in the cloud because the useful life of CUDA-compatible hardware is so long. The lock-in is architecturally embedded, and it’s the hardest thing for any competitor to replicate on a two-year timeline.

Physical AI Ecosystem Breadth Exceeded My Expectations

In the pre-GTC piece, I wrote that physical AI was “not meaningful 2026 revenue, but it is the 2028 to 2030 setup.” I stand by the revenue call. What I underestimated was the pace of ecosystem adoption.

ABB Robotics, FANUC, KUKA, and YASKAWA are all adopting NVIDIA Omniverse and Isaac simulation frameworks. NVIDIA says these four represent a combined global installed base exceeding two million industrial robots. Figure, Agility, and AGIBOT are building humanoid robots on Isaac GR00T models and Jetson Thor. On autonomous vehicles, BYD, Geely, Isuzu, and Nissan are adopting NVIDIA DRIVE Hyperion for level 4 vehicles, with Uber planning a robotaxi network starting in 2027 and scaling to 28 cities by 2028. In healthcare, Roche deployed more than 3,500 Blackwell GPUs for drug discovery. And Disney brought a walking Olaf robot on stage, trained in Isaac simulation using a physics solver co-developed with DeepMind. That last one was pure theater, but the underlying tech (NVIDIA Warp, Newton physics engine, Cosmos world models) is the same stack powering the industrial applications.

I’ve been tracking NVIDIA’s robotics push since the company demonstrated BMW factory applications at GTC 2020, and I’ve spoken with robotics CEOs who are building entire development stacks on NVIDIA’s three-computer architecture. The ecosystem lock-in forming in physical AI mirrors what CUDA created in the datacenter. Whether anyone can offer a credible alternative at this scale is the right question. Right now, the answer is no. But physical AI revenue remains pre-commercial for most of these partners, and the path from simulation to deployed production robots is long.

What NVIDIA Didn’t Fully Address: Complexity, Energy, and Enterprise

Three risks from my pre-GTC analysis remain partially unresolved.

Complexity. Five rack types, seven chips, and multiple interconnects is a lot for anyone who isn’t a hyperscaler. The MGX modular architecture and the token factory economics framework Jensen presented help, but enterprise CIOs still need a reference architecture they can deploy without a team of NVIDIA engineers. DGX Spark and DGX Station paired with NemoClaw are a start, but the gap between “desktop AI” and “full AI factory” remains wide.

Energy. NVIDIA announced DSX Max-Q and DSX Flex for dynamic power provisioning and grid flexibility. Those are software optimization tools, not energy sources. As I wrote before the keynote, energy is the most underappreciated constraint on the 2028 outlook. I’m confident about 2026 and 2027. The year after that requires solutions the industry hasn’t fully delivered.

Groq integration execution. Samsung is manufacturing the LP30, and NVIDIA says second-half 2026 availability. That’s more aggressive than I expected, which is positive. But the 35x throughput per megawatt claim and the token factory revenue projections need third-party validation at customer scale. If those numbers hold, the Groq deal will look prescient. If they don’t, it’s a $20 billion bet that takes longer to pay off than the market is pricing in.

Questions I Have

In the pre-GTC piece, I pushed on four advisory points.

  • Simplify the heterogeneous compute message: partially addressed, B+. Jensen's token factory framework helps, but enterprise buyers need a simpler on-ramp.
  • Ship an air-cooled enterprise inference solution: not completely addressed at GTC on Vera Rubin, grade incomplete.
  • Show concrete Groq integration timelines: addressed with second-half 2026 availability, Samsung manufacturing, and a specific 25/75 deployment ratio, grade A-minus pending validation.
  • Own the co-packaged optics narrative: addressed with Spectrum-6 SPX in production plus both copper and CPO scale-up confirmed for Feynman, grade B.

One new advisory: get customers on the record validating Vera Rubin performance at production scale. Jensen showed that Satya confirmed Azure deployment. Now get Anthropic, Meta, or OpenAI on stage at the next earnings call or Computex to confirm what they're seeing in their token factories. NVIDIA's own benchmarks are a starting point, not a finish line. The SemiAnalysis sweep was a good step. Now show it at customer scale.

GTC 2026 Validates the Platform Thesis. Now Execute.

GTC 2026 confirmed what I wrote before the keynote: NVIDIA is now a heterogeneous AI infrastructure platform company. The Vera Rubin platform is the most architecturally complete AI infrastructure announcement any semiconductor company has made. The software wall got taller. The physical AI ecosystem is broader than I anticipated. And Jensen’s $1 trillion demand pipeline through 2027 is a number that would have been unthinkable two years ago.

As I wrote at GTC 2025, that show was a demonstration of NVIDIA’s confidence in its own vision. GTC 2026 goes further. It’s a demonstration that the AI factory is the defining infrastructure category of this decade. Near-term demand through 2027 is as strong as any point in this cycle. The real test comes when energy constraints, market share compression toward 70 percent, and maturing custom silicon pressure the economics. As I told Marketplace in May 2025, AMD and Intel are one to two years behind in raw training performance, and Google’s TPU and Amazon’s Trainium are real alternatives. Custom silicon isn’t going away. But no competitor offers NVIDIA’s breadth: GPUs, LPUs, CPUs, storage, networking, and the software stack tying it together.

I believe NVIDIA’s position is structural, not cyclical. Chips can be replicated. CUDA, NIMs, NeMo, Dynamo, OpenShell, Omniverse, and the developer ecosystem can’t be replicated in two years. Jensen reminded us that CUDA is 20 years old and Ampere GPUs are still appreciating in cloud pricing. That’s the bet. GTC 2026 is the strongest evidence yet that it’s the right one.

Sources

  • Patrick Moorhead, “NVIDIA GTC 2026: Heterogeneous Compute, Groq, and the Next Phase of the AI Build-Out,” Moor Insights & Strategy (pre-GTC analysis)

  • Patrick Moorhead, “NVIDIA’s AI Omniverse Expands at GTC 2025,” Moor Insights & Strategy, May 6, 2025

  • Matt Kimball, “NVIDIA at CES 2026: Vera Rubin and the Changing Shape of AI Infrastructure,” Moor Insights & Strategy, January 12, 2026

  • Broadcast Analysis: Patrick Moorhead on NVIDIA Earnings, Yahoo Finance, February 25, 2026

  • Broadcast Analysis: Patrick Moorhead on NVIDIA Competitive Position, Marketplace, May 28, 2025

  • Patrick Moorhead, LinkedIn post on NVIDIA NIM at GTC 2024, March 18, 2024

  • NVIDIA Vera Rubin Platform Press Release, March 16, 2026

  • NVIDIA Vera CPU Press Release, March 16, 2026

  • NVIDIA Dynamo 1.0 Press Release, March 16, 2026

  • NVIDIA Agent Toolkit Press Release, March 16, 2026

  • NVIDIA Nemotron Coalition Press Release, March 16, 2026

  • NVIDIA Open Models Press Release, March 16, 2026

  • NVIDIA Robot Ecosystem Press Release, March 16, 2026

  • NVIDIA DRIVE Hyperion L4 Press Release, March 16, 2026

  • NVIDIA Vera Rubin DSX Reference Design Press Release, March 16, 2026

  • Roche Scales NVIDIA AI Factories Globally, NVIDIA Blog, March 16, 2026

  • “Big Tech set to spend $650 billion in 2026 as AI investments soar,” Yahoo Finance, February 6, 2026

  • NVIDIA GTC 2026 Keynote by Jensen Huang, March 16, 2026 (live attendance and transcript)
