🧠 ATou Learning · 💬 Discussion Topic

Why the CPU Is Becoming Critical Again in the Agentic AI Era

This article gets one direction right: agentic AI really will raise the importance of CPUs and system orchestration significantly. But it inflates "CPUs are necessary" into "Graviton is critical"; at heart it is a valuable but heavily marketing-flavored AWS advertorial.

2026-04-25 · Original article ↗

Core Points

  • An agent is not one-shot generation but a continuously executing system. The article's soundest judgment is that agentic AI does not just emit tokens: it continuously performs tasks, calls tools, processes files, accesses the network, and executes code, so its infrastructure needs genuinely differ from the training-throughput-centric traditional AI narrative.
  • CPU importance is rising, but CPUs are not replacing GPUs. The article gets the CPU comeback half right: scheduling, I/O, state management, and tool calls in agent workflows naturally lean on the CPU, but the actual model understanding, planning, and generation still depend heavily on GPUs or other accelerators. The realistic answer is heterogeneous computing, not the CPU reclaiming the lead role.
  • The article's biggest value is the reminder that the cost structure has changed. If an agent product has to run 24/7, what decides its survival is often not peak compute but sustained low latency, energy efficiency, stability, and total cost of ownership. That judgment matters both for building products and for evaluating infrastructure investments.
  • The Meta case carries weight, but the evidence is incomplete. "Deploying tens of millions of Graviton cores" sounds impressive, but the article never says whether those CPUs handle inference orchestration, backend services, or plain traditional infrastructure. The case shows that CPUs matter; it does not show that Graviton is the core engine of agentic AI.
  • Graviton is packaged as the answer, and the argument visibly skips steps. Jumping from "agents need lots of CPU-style work" straight to "therefore Graviton is critical" omits any quantitative comparison with other CPUs, GPU+CPU combinations, or NPU/LPU options. That is not rigorous argument; it is classic product-positioning language.

Relevance to Us

  • What this means for ATou, and the next step. If ATou is evaluating agent products, the comparison should not stop at answer quality; prioritize breaking down the cost and latency of the execution path. Next step: split each task into four layers, inference, scheduling, tool calls, and I/O, and estimate bottlenecks and costs per layer.
  • What this means for Neta, and the next step. When Neta works on narratives or research, "an agent is a continuously executing system, not a chat tool" is a strong framing. Next step: use it to screen projects, favoring high-frequency, multi-step, cross-system closed-loop scenarios.
  • What this means for Uota, and the next step. If Uota cares about product experience, note that the sluggishness users feel is often not a dumb model but poor orchestration, external calls, and system latency. Next step: shift experience optimization from "swap in a bigger model" to "shorten the execution loop."
  • What this means for investment judgment, and the next step. The investable thesis here is not the crude "CPUs replace GPUs" claim but heterogeneous scheduling, low-latency orchestration, energy-efficiency optimization, and continuous-serving cost control in agent infrastructure. Next step: avoid single-chip mythology and watch full-stack efficiency.
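The four-layer breakdown above can be sketched as a back-of-the-envelope cost and latency model. All the numbers below are hypothetical placeholders, not measurements; the point is the structure, which lets you see which layer dominates once real figures are substituted.

```python
# Back-of-the-envelope cost/latency model for one agent task, split into
# the four layers named above. All figures are hypothetical placeholders.
LAYERS = {
    # layer: (latency per step in seconds, cost per step in USD, steps per task)
    "inference":  (1.200, 0.0040, 8),   # accelerator-bound model calls
    "scheduling": (0.005, 0.0000, 40),  # CPU-bound orchestration
    "tool_calls": (0.300, 0.0002, 12),  # external APIs, mostly I/O wait
    "io":         (0.050, 0.0001, 25),  # files, network, state reads/writes
}

def task_profile(layers):
    """Return total latency, total cost, and the latency-bottleneck layer."""
    latency = {name: lat * n for name, (lat, _, n) in layers.items()}
    cost = {name: c * n for name, (_, c, n) in layers.items()}
    bottleneck = max(latency, key=latency.get)
    return sum(latency.values()), sum(cost.values()), bottleneck

total_latency, total_cost, bottleneck = task_profile(LAYERS)
print(f"latency={total_latency:.2f}s cost=${total_cost:.4f} bottleneck={bottleneck}")
```

With these placeholder numbers the bottleneck is inference latency, not CPU work; the model is useful precisely because measured numbers for a real product may flip that conclusion.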

Discussion Prompts

1. Is the core moat of an agent product closer to "model intelligence" or to "execution-orchestration efficiency"?
2. If an agent's main cost shifts from tokens to always-on operation, does SaaS pricing have to be rewritten?
3. Will CPUs keep gaining value through the agent era, or is this a transitional phase before stronger inference accelerators absorb the work again?

Agentic AI systems reason continuously and make real-time decisions, a fundamental shift that requires different infrastructure than traditional AI training.

Key takeaways

  • Agentic AI operates continuously, not in batches, requiring sustained computing power with fast communication between processing cores.

  • CPUs like AWS Graviton are designed for these always-on reasoning workloads.

  • Companies like Meta are deploying tens of millions of Graviton cores to power agentic AI at global scale.

What makes agentic AI different

For years, conversations about AI infrastructure centered on chips designed for training large language models. Accelerators like AWS Trainium and graphics processing units (GPUs) excel at processing massive amounts of data in parallel, making them ideal for the intensive work of teaching AI models.

But agentic AI operates differently. To understand why the central processing unit (CPU) is reclaiming relevance in the AI conversation, we have to look at the fundamental difference between a language model, whether a small language model (SLM) or a large language model (LLM), and an AI agent. A language model is almost like a calculator: you give it a prompt, and it performs a massive amount of parallel math to predict the next set of tokens (the output). This is where GPUs shine: they run thousands of calculations simultaneously across many cores (individual processing units within a chip).

An AI agent, however, is more like a manager. Agents are AI systems capable of acting autonomously to complete multistep tasks: always on, continuously processing information and taking action. Think of a digital assistant that doesn't just respond to commands but actually completes tasks on your behalf, managing schedules, coordinating across systems, making decisions. If you ask an agent to "research this company and draft a brief," it doesn't just generate text. It must break the goal into sequential steps, open a web browser and navigate links, parse files and data, filter noise, and run code to finalize the draft.

While inference has long been the GPU's domain (with CPUs now handling a growing share of that workload), every other step in that chain, from logic and file management to network calls and code execution, is a CPU-native task. That requires sustained computing power with extremely fast communication between processing cores.
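The chain described above can be sketched as a plain list of stages, each tagged by where it naturally runs. The stage names and device tags are illustrative assumptions, not a real agent framework; the sketch only makes the article's point concrete, namely that most stages in the chain are CPU-native.

```python
# Minimal sketch of the execution chain for "research this company and
# draft a brief". Stage names and device tags are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    device: str  # "cpu" for orchestration/I-O work, "accelerator" for model calls

CHAIN = [
    Stage("plan: break goal into steps", "accelerator"),   # model reasoning
    Stage("browse: fetch pages, follow links", "cpu"),     # network I/O
    Stage("parse: extract text from files/data", "cpu"),   # file handling
    Stage("filter: drop noise, keep evidence", "cpu"),     # logic
    Stage("draft: generate the brief", "accelerator"),     # model generation
    Stage("finalize: run code to format output", "cpu"),   # code execution
]

def device_share(chain):
    """Fraction of stages that are CPU-native vs accelerator-bound."""
    cpu = sum(1 for s in chain if s.device == "cpu")
    return cpu / len(chain), (len(chain) - cpu) / len(chain)

cpu_frac, accel_frac = device_share(CHAIN)
print(f"CPU-native stages: {cpu_frac:.0%}, accelerator stages: {accel_frac:.0%}")
```

Note that a stage count says nothing about where the *time* or *money* goes; that depends on per-stage latency and cost, which is exactly what the article never quantifies.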

This is where purpose-built CPUs like AWS Graviton become critical. Graviton processors are designed for the continuous, low-latency workloads that define the agentic AI era. That's why companies like Meta are using Graviton for their latest agentic AI workloads, deploying tens of millions of Graviton cores to power AI systems that reason continuously and serve billions of users.

How Graviton addresses the challenge

Agentic systems engage in rapid execution cycles: not just reasoning, but actively retrieving data, calling tools, and taking action before looping back to evaluate next steps. Each cycle requires different parts of the processor to share data quickly.
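The cycle described above is usually implemented as a plain loop: the model proposes an action, the host executes it on the CPU, and the observation feeds the next iteration. A minimal sketch follows; `propose_action` and `run_tool` are hypothetical placeholders for a model call and a CPU-side tool dispatcher.

```python
# Sketch of the agent execution cycle: reason -> act -> observe -> repeat.
# propose_action and run_tool are hypothetical placeholders.
def propose_action(goal, history):
    # Stands in for an accelerator-bound model call; here it just walks
    # a fixed plan and signals completion when the plan is exhausted.
    plan = ["search", "fetch", "summarize"]
    return plan[len(history)] if len(history) < len(plan) else "done"

def run_tool(action):
    # Stands in for CPU-side work: network calls, parsing, code execution.
    return f"result-of-{action}"

def run_agent(goal, max_cycles=10):
    """Loop until the model signals completion or the cycle budget runs out."""
    history = []
    for _ in range(max_cycles):
        action = propose_action(goal, history)       # reasoning step
        if action == "done":
            break
        history.append((action, run_tool(action)))   # execution step
    return history

print(run_agent("research this company"))
# -> [('search', 'result-of-search'), ('fetch', 'result-of-fetch'), ('summarize', 'result-of-summarize')]
```

Because every iteration hands data between the reasoning step and the execution step, per-cycle latency is bounded by how fast those two sides exchange state, which is the property the article attributes to Graviton.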

Graviton chips are designed to minimize the time different parts of the processor spend communicating with each other. For AI systems constantly exchanging information during reasoning, that speed matters.

Organizations running agentic AI at global scale are choosing Graviton because the architecture matches the workload. Training a model happens in intense bursts; running an agentic system requires sustained performance and constant adaptation. Different work needs different tools.

Why this matters for continuous intelligence

When AI systems serve millions of users continuously, performance and energy efficiency both become essential. Graviton delivers better performance and is our most energy-efficient processor. For systems running around the clock at global scale, that efficiency is both environmentally conscious and economically essential. The combination makes it viable to run sophisticated agentic systems continuously for large user bases.

Building infrastructure for systems that never stop thinking

The rise of agentic AI represents a fundamental shift in AI infrastructure. As AI systems become more autonomous and more integrated into daily digital experiences, the requirements evolve.

Processors like Graviton, designed for sustained, low-latency computing rather than periodic training bursts, are becoming the foundation for this new era. The question isn't whether agentic AI will reshape how we work and live; it's whether the infrastructure exists to support it at scale.

Increasingly, that infrastructure is being built on technology designed specifically for systems that never stop thinking.

Learn more about how Graviton chips make cloud computing faster, cheaper, and more energy efficient.


📋 Discussion Archive

Discussion in progress…