🧠 阿头学 · 💬 Discussion Topic

Data Center/Semiconductor Bottleneck Timeline: In-Depth Analysis of CPU/GPU Bottlenecks (1)

The underlying power structure of AI compute is being reshuffled: the GPU is no longer a dictator, the CPU is returning as the "scheduler," ASICs are stealing wins in specific niches, and the real investment/strategic opportunities hide in the time lag of "bottleneck switching."

2026-02-15 · Original link ↗

Key Takeaways

  • GPU dominance was a historical choice, not the endgame. The 2020-2022 deep-learning boom's demand for parallel compute put the GPU on the throne. But that was the product of a specific technology cycle, not a permanent order. When model architectures, inference demand, and cost pressures shift, the throne loosens.
  • The CPU's new "coach + scheduler" role matters more than you think. In the heterogeneous-computing era, the CPU no longer does the raw-compute grunt work; it orchestrates the coordination of multiple GPUs and ASICs. The role looks peripheral but is actually the system's "prefrontal cortex": whoever controls scheduling controls the efficiency ceiling.
  • ASIC vs. GPU = dedicated pot vs. full kitchen set. ASICs like Google TPU and Amazon Trainium crush GPUs on price/performance in specific scenarios (large-scale inference, particular model architectures), but lack flexibility. Key insight: cloud providers build their own ASICs not out of technical idealism, but to escape NVIDIA's pricing power.
  • The investment logic for bottleneck companies: track the cycle, don't chase rallies or panic-sell. The author makes a counterintuitive point: a bottleneck company (e.g., NVIDIA) shouldn't be sold just because the stock has run up, because an unresolved bottleneck means demand is still piling up. Decisions should track the full "bottleneck appears → worsens → resolves" cycle.
  • The supply chain's hidden players deserve attention. ASIC design-service firms (Broadcom, Marvell, Alchip, GUC) and Korean DSP companies are the "ghostwriters" behind Big Tech's in-house chips. The more cloud providers want to escape NVIDIA, the more these firms benefit; that is a second-order effect.

Relevance to Us

Directly relevant: our inference cost structure. Neta has 100K+ DAU, and inference compute is the core cost of an AI social product. The GPU→ASIC migration trend directly shapes our future cost curve. If cloud providers' ASIC inference prices (AWS Trainium, Google TPU) keep falling, our model-deployment strategy has to adapt accordingly; we can't stay locked onto NVIDIA GPUs.

Infrastructure choices for the 2026 overseas strategy. Going abroad means choosing clouds, regions, and compute. Understanding the chip strategies behind AWS/Google/Azure helps us make smarter vendor choices when deploying overseas. For example, Google Cloud's TPU price advantage could cut our costs by 30%+ in certain inference scenarios.

"Bottleneck thinking" is reusable in our own business. The article's core framework, finding the system bottleneck and judging its cycle, is not just an investing methodology. Where is Neta's growth bottleneck: compute cost, content-generation quality, or overseas user acquisition? Applying the same "bottleneck timeline" lens to our own business may be more useful than reading ten growth-hacking articles.

Discussion Prompts

  • Is our current inference deployment over-committed to the NVIDIA ecosystem? If Google TPU or AWS Trainium doubles its inference price/performance again next year, can our models migrate painlessly, or are we already locked in by CUDA? If we don't think about this now, it will be too late once the cost pressure arrives.
  • Apply the "bottleneck timeline" framework to Neta itself: what is our biggest bottleneck in 2026? Overseas customer-acquisition cost? Multilingual AI quality? The team's overseas operations capability? Finding the "narrowest pipe" and pouring all resources into it is ten times more effective than spreading them evenly.
  • As an AI product company, should we build our own "compute intelligence" capability? Not to trade chip stocks, but because semiconductor bottleneck cycles directly determine our cost structure and product possibility frontier for the next 2-3 years. In a 20-person team, who is tracking this? Or are we "using compute without understanding compute"?

Data Center & Semiconductor Bottleneck Timeline: In-Depth Analysis of CPU/GPU Bottlenecks (1)

The following is an English translation of the content, preserving the original structure, tone, and emphasis.

Data Center/Semiconductor Bottleneck Timeline Summary

In the great AI era, you must know this!!

Everyone needs to study this!!!!

Now, let's analyze each bottleneck in depth.

First, let's look back at the brief explanation I wrote before.

  1. CPU → GPU Transition (Compute Bottleneck)

Period: 2020~2022

Why did it become a bottleneck? Deep learning requires performing thousands of simple calculations (matrix multiplications) simultaneously.

CPUs are good at doing things "one by one, in order," but weak at parallel processing.

GPUs have thousands of cores working simultaneously, making AI training 10~100x faster.

Since AlexNet (2012), GPUs have been essential; with the arrival of ChatGPT-class models, the transition became complete.

🔸 Easy Explanation

CPU: starts in Seoul, then visits Daejeon, Gwangju, Daegu, Busan, and Gangwon-do one by one.

GPU: dispatches multiple cars from Seoul simultaneously, each heading to a different destination.

CPU: like one genius tapping away at a calculator at a desk.

GPU: like thousands of ordinary laborers all calculating by hand at once.

A calculator is great for complex math, but in AI training what matters most is getting simple things done simultaneously and quickly.
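The "one by one" versus "all at once" contrast above can be sketched in plain Python (a toy illustration of the idea, not a real hardware benchmark; in CPython the thread pool is only conceptually parallel because of the GIL): each output row of a matrix product is independent work, so a sequential loop walks the rows one at a time while a pool of workers takes them all at once, the way a GPU spreads them across thousands of cores.

```python
from concurrent.futures import ThreadPoolExecutor

def row_times_matrix(args):
    """The unit of work one 'core' handles: a single row of the output."""
    row, b = args
    return [sum(r * b_kj for r, b_kj in zip(row, col)) for col in zip(*b)]

def matmul_sequential(a, b):
    """CPU style: visit every row one after another, like cities on a road trip."""
    return [row_times_matrix((row, b)) for row in a]

def matmul_parallel(a, b, workers=4):
    """GPU style (in miniature): hand each row to a separate worker at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row_times_matrix, ((row, b) for row in a)))

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# Both strategies compute the same product; only the scheduling differs.
assert matmul_sequential(a, b) == matmul_parallel(a, b) == [[19, 22], [43, 50]]
```

The point of the sketch is that the per-row work never changes; the speedup in real hardware comes purely from how many of these independent units run at the same time.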

🔸 Result/Impact

Deep learning researchers used to train on CPUs; after switching to GPUs (pre-Hopper generations), tasks that took a day shrank to a few hours.

NVIDIA made this route nearly exclusive through its CUDA software ecosystem and captured the market.

🔸 Current Status (2026)

Solved. GPUs are now the standard; everyone trains on GPU clusters.

From a "single-GPU" focus → now: the bottleneck has moved to whole-system orchestration and scheduling.

----------------------------------------

First, let's establish what CPU, GPU, and ASIC are, and what roles each plays.

AI workloads (especially in data centers) treat heterogeneous computing as the standard configuration.

The era of pure CPU or pure GPU is over; the CPU + GPU (or ASIC) combination is the trend.

Role of the CPU:

Orchestration and data management: feeding data to the GPUs (storage, sharding, indexing), scheduling, and power management. CPUs handle data preparation during AI pretraining/fine-tuning.

Sequential processing and general tasks: handling non-AI server workloads (databases, networking, etc.).

Built-in AI acceleration: AMD EPYC and Intel Xeon 6 have added AI features (e.g., AMX, built-in accelerators) → they can take on some inference work.

Strengths: stability and broad compatibility. Weakness: parallel AI computation.

Role of the GPU:

Core of parallel computing: a near-monopoly on AI training (massive matrix multiplication) and high-performance inference. NVIDIA Blackwell/Rubin and AMD MI300/MI400 are the main players.

Main AI engine: in hyperscaler clusters (Google, Meta, etc.), GPUs handle 90%+ of the compute.

Current combination patterns:

Rack-scale integration: NVIDIA Rubin (late 2026): Vera CPUs (36) + Rubin GPUs (72), connected by NVLink into one "AI supercomputer." The CPUs manage data movement between the GPUs.

AMD Helios/MI400: EPYC Venice CPUs + MI400 GPUs, using unified memory for zero-copy data transfer → maximizing heterogeneous efficiency.

Intel: Xeon 6 host CPUs + external GPUs (mostly NVIDIA/AMD). Its own Gaudi remains limited.

Custom ASICs on top: Google TPU and Amazon Trainium are replacing some GPUs in inference. Broadcom leads the design of the hyperscalers' custom ASICs.
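The "CPU feeds, accelerator computes" division of labor described above can be sketched with a standard producer/consumer queue (a toy Python model; real stacks use frameworks such as data-loader pipelines feeding device streams, and the function names here are illustrative only):

```python
import queue
import threading

def cpu_producer(batches, q):
    """CPU-side work: prepare (load/shard/index) batches and feed the accelerator."""
    for batch in batches:
        prepared = [x * 2 for x in batch]   # stand-in for preprocessing
        q.put(prepared)
    q.put(None)                             # sentinel: no more data

def accelerator_consumer(q, results):
    """Accelerator-side work: burn through whatever the CPU has queued up."""
    while True:
        batch = q.get()
        if batch is None:
            break
        results.append(sum(batch))          # stand-in for a big matmul

q = queue.Queue(maxsize=4)                  # bounded queue = limited on-device buffers
results = []
t = threading.Thread(target=cpu_producer, args=([[1, 2], [3, 4]], q))
t.start()
accelerator_consumer(q, results)
t.join()
assert results == [6, 14]
```

The bounded queue is the key design point: if the producer (CPU) stalls, the expensive consumer (GPU/ASIC) sits idle, which is exactly why CPU-side data movement has become a first-class concern.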

🤦‍♂️🤦‍♂️🤦‍♂️ How was that? Honestly, you have no idea what I'm talking about, right?

😁😁😁 Since we're investors, I don't think we need to grind through every one of these gritty details anyway.

Okay, then let me explain it simply, in my own way.

What matters most here are the CPU, GPU, and ASIC. Simply put:

The CPU has become more important: its role in data management and power regulation has grown, giving it a "resurgence" vibe.

The GPU is still king: it nearly monopolizes model "teaching" (training).

Answering questions (inference): split between GPUs and custom engines (ASICs), to save costs.

Digging a little deeper:

The CPU's past role (the team manager)

The smart, sequential type: good at calculating one thing at a time (e.g., general programs, web servers, data management, power regulation, overall command).

Weak at AI's parallel computation (doing thousands of things at once) → when GPUs appeared, it lost almost all the AI-related work.

Its role in AI: feeding data to the GPU and cleaning up afterward. The GPU is the star; the CPU is the supporting actor.

Easy explanation: the manager just directs; the racer (GPU) does the actual driving.

The CPU's role today

Grew AI muscles: built-in small AI accelerators (AMX, etc.) → can finish simple AI calculations faster.

Core counts have exploded (100+), plus smarter memory and interconnects.

Specialized in data movement/management: perfect "traffic control" when connecting multiple GPUs/ASICs.

Analogy: the manager is now "coach + trainer." It trains the racers (GPUs/ASICs), distributes fuel (data) efficiently, and sets team strategy.

Why is the CPU suddenly strong again?

AI has changed from simply "calculating like crazy on GPUs" into complex teamwork.

Past: only teaching the AI model (training) mattered → a GPU solo show.

Present: real service usage (inference) has exploded, and multiple chip types are mixed (GPU + ASIC) → data movement, power management, and chip coordination have become critically important.

The CPU is exactly suited to this (moving data fast, saving power, coordinating the engines).

Example: in a 2026 rack-scale system, without CPUs the GPUs/ASICs simply can't "talk to each other," and the floor descends into chaos.

Also, as inference (answering questions) grows far larger than training, the CPU's efficiency advantages (low power, stability) stand out even more.

Easy Explanation

Past team: one star racer (GPU) could win alone → the manager (CPU) mattered less.

Current team: multiple racers (GPU + ASIC) plus a complex track → without the manager (CPU), the team falls apart. The manager has "suddenly" become a hero.

Conclusion: the CPU was always essential, but as AI grew from "one star" (GPU) into "a big team" (GPU + ASIC), the CPU's role expanded and shines brighter.

Therefore, the CPU could become a bottleneck again in the future.

My Take

Most likely there won't be a major bottleneck.

CPU makers (AMD, Intel, even NVIDIA with its Vera CPU) are developing frantically: core counts exploding, memory connections getting smarter, built-in AI functions steadily strengthening.

Even as inference grows, CPU demand is being forecast and prepared for in advance.

The overall bottlenecks are more likely to appear in power, memory (HBM), and networking.

But the probability is not 0%. If AI grows more complex than expected (e.g., large-scale agent AI), CPUs could briefly become a bottleneck. Companies that respond well (AMD, for instance) would likely be the winners.

🙋‍♂️ Wait, you might ask: then what is an ASIC? Will GPUs be replaced?

The biggest reasons for using ASICs are cost and power savings.

AI has two stages: "teaching the model (training)" and "answering questions with the taught model (inference)."

The inference stage repeats the same kind of task every day (e.g., ChatGPT answering user questions). The work is predictable and high-volume, so a tool that does only that one job to perfection (an ASIC) is cheaper and uses less electricity.

Big companies (hyperscalers like Google, Amazon, and Meta) build chips tailored exactly to their own AI workloads instead of buying GPUs → cutting costs by 30~50% and saving on power bills.

Easy Explanation

An ASIC is like "an electric pot dedicated to cooking ramen." It cooks ramen incredibly fast and cheap (perfect for repetitive work like inference). So companies say, "We eat ramen (run inference) every day; let's just buy a dedicated pot!" instead of using a GPU.

Why won't it replace GPUs completely? (Why GPUs survive)

ASICs lack flexibility (they can't handle varied tasks).

The training stage means changing models frequently and testing new ideas (e.g., thousands of experiments to build GPT-5). That's unpredictable, so you need a kitchen set (GPU) that can cook any dish.

An ASIC specializes in "one dish," so even a small model change means making a new chip → enormous costs in time and money.

Software-ecosystem strength: NVIDIA GPUs have CUDA, a "recipe book" so well made that 90% of developers use it. Every company's ASIC is different, so you have to rewrite your code → annoying and slow.

Development cost and time: one ASIC takes hundreds of millions of dollars plus 2~3 years. A GPU you can just buy and use immediately.

Easy explanation: an ASIC is a "ramen pot," unbeatable at ramen; but when the menu changes (a new model arrives), you have to make a new pot. A GPU is a "gas stove + pot + frying pan set," so you can cook anything immediately. That's why GPUs remain the best choice for highly experimental, diverse AI training.
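The flexibility tradeoff can be caricatured in a few lines of code (a toy analogy only, nothing to do with real chip design): an "ASIC" is a function with its workload baked in at build time, while a "GPU" takes any workload as an argument.

```python
def build_asic(fixed_model):
    """Bake one workload into the 'silicon'; changing it means building anew."""
    def asic(x):
        return fixed_model(x)          # only ever runs this one workload
    return asic

def gpu(model, x):
    """General-purpose: runs whatever workload you hand it, immediately."""
    return model(x)

ramen = lambda x: x + 1                # today's model
new_menu = lambda x: x * 3             # tomorrow's model

asic = build_asic(ramen)
assert asic(2) == 3                    # superb at the one job it was built for
assert gpu(new_menu, 2) == 6           # flexible: new model, no new silicon
assert build_asic(new_menu)(2) == 6    # the ASIC path needs a whole new 'chip'
```

The caricature captures the article's point: the ASIC's closure is fixed once built, so every "menu change" repeats the expensive build step, while the GPU pays a generality tax per call but never rebuilds.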

🙋‍♂️ But NVIDIA also keeps developing new chips (H100→B100→B200, etc.); doesn't that cost money just like an ASIC?

Yes, but it overcomes this with economies of scale plus its software ecosystem.

NVIDIA GPU: new-chip design cost ≈ $1B~2B on a 1.5~2-year cycle. But with 90%+ market share it ships chips in the millions → spreading the R&D cost thinly across every unit.

Google TPU: similar design cost, but used only internally (for its own cloud service), so volume is limited → slower cost recovery.
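The scale-economics argument above is just division; a toy calculation (all numbers are illustrative assumptions, not actual shipment or cost figures) shows why volume decides whether a custom chip pays off:

```python
def amortized_design_cost(design_cost_usd, units_shipped):
    """R&D cost carried by each chip sold or deployed."""
    return design_cost_usd / units_shipped

# Illustrative assumptions only: ~$1.5B design cost for both programs,
# high-volume merchant GPU vs. internal-only ASIC deployment.
gpu_per_unit = amortized_design_cost(1.5e9, 3_000_000)   # merchant-vendor scale
asic_per_unit = amortized_design_cost(1.5e9, 300_000)    # internal-only scale

assert gpu_per_unit == 500.0
assert asic_per_unit == 5000.0
# The ASIC still wins if its per-query operating savings exceed the 10x
# amortization gap, which is the hyperscalers' bet at inference volume.
```

In other words, the same design bill is a rounding error per chip at merchant volume but a material per-unit cost at internal-only volume, which is exactly why "slower cost recovery" follows for the TPU in the paragraph above.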

Conclusion

In the past, AI models evolved at a crazy pace: Transformer → BERT → the GPT series → multimodal. You would have had to develop ASICs constantly, sometimes even changing direction mid-development, so they were hard to use at scale.

Present ~ future: as the pace of model development stabilizes → TPUs (ASICs) become far more profitable.

2026: the basic structure (Transformer) has become near-standard, so "major changes" are rarer. Models evolve mainly by "getting bigger (more parameters)" or through "fine-tuning."

Inference (real service) volume is exploding: hundreds of millions of people use things like ChatGPT/Gemini every day → massive repetitive work.

So the big companies (Google, Amazon, Meta) are investing heavily in ASICs like TPU/Trainium: built well, they save enormous amounts on cost and power over the long run (billions of dollars of profit per year).

As model change slows, ASICs (TPUs) grow stronger, and 2026 is exactly that inflection point; that is why the hyperscalers are pushing ASICs so hard.

GPUs still hold the throne of "experimentation/learning," but ASICs are eating into "the services that actually make money (inference)."

Final Conclusion

Ultimately, what we as investors need to think about is (as noted earlier):

Whether CPUs can keep up with GPUs (since a bottleneck could emerge on the CPU side).

Predicting how big the total GPU + ASIC pie will get, and within it, how high ASIC's share will grow. (This helps you judge who pulls ahead: the companies making GPUs vs. the Big Tech firms pushing ASICs.)

  1. GPU 相关公司

NVIDIA(NVDA):主导 AI 与数据中心导向的 GPU 市场,在 AI 训练 GPU 中占据 80% 以上份额。游戏与专业可视化很强,但在消费端正面临日益加剧的竞争。

AMD(AMD):数据中心与游戏高性能 GPU 的直接竞争对手。Instinct 系列在 AI 任务上持续扩张;最近数据中心 GPU 收入已超过 CPU 收入。在供给受限背景下具备较高成长潜力。

Intel(INTC):从传统集成显卡扩展到独立 GPU,依靠 Arc 与 Max 系列瞄准数据中心与 AI。近期通过招聘与投资强势推进,但市占率仍小于竞争对手。

ARM(ARM):提供 Mali、Immortalis 等 GPU IP 设计,主要面向移动与嵌入式设备。支持 AI 加速,但以授权为主而非制造;在智能手机与 IoT 中广泛采用。

  2. CPU-Related Companies

Intel (INTC): traditional leader of the x86 CPU market, targeting AI PCs and data centers with the Core Ultra series (client) and Xeon (server). Holds roughly 60% market share in 2026 on strong server CPU demand, but is under pressure from ARM competition.

AMD (AMD): x86 CPU specialist, with Ryzen (consumer/client) and EPYC (server) as its core lineups. Server share broke 40% by late 2025 and is growing fast. Launching the 2nm-based Venice EPYC in 2026.

Apple (AAPL): designs and produces custom ARM-based Apple Silicon (M-series) CPUs, used only within its own ecosystem (Mac/iPad/iPhone). The M5 series strengthens AI performance; the M6 is in development for 2026.

Qualcomm (QCOM): targets the Windows-on-ARM PC market with the ARM-based Snapdragon X series (custom Oryon cores). In 2026 it is emphasizing multi-day battery life and AI performance with the X2 Plus and others.

SoftBank (9984.T): runs the leading CPU IP licensing business through its ownership of Arm Holdings. Doesn't manufacture directly, but collects royalties on the Arm architecture; revenue is expected to rise in 2026 on AI/data center growth.

  3. Custom-Chip (ASIC) Related Companies

1) Masters of chip design (Big Tech/hyperscalers)

Alphabet (GOOGL): leads with the TPU (Tensor Processing Unit). Now in its 7th generation (Ironwood), with up to 192GB of HBM per chip, threatening NVIDIA on AI training/inference efficiency.

Amazon (AMZN): develops Trainium for training and Inferentia for inference. The goal: combined with its in-house Graviton CPUs, offer AWS customers an AI computing environment cheaper than NVIDIA's.

Meta (META): designs MTIA (Meta Training and Inference Accelerator), chips optimized mainly for ad-recommendation algorithms (Facebook/Instagram) and large-language-model (Llama) inference.

Tesla (TSLA): builds FSD (Full Self-Driving) chips and Dojo chips for its AI training supercomputer. Recently chose Samsung Electronics as the production partner for the Dojo D2 chip, accelerating its in-house chip development.

2) Design partners

Broadcom (AVGO): the undisputed #1 in the ASIC market. A key partner for Google's TPU, and also helps design chips for Meta and OpenAI. World leader in chip-to-chip interconnect technology (SerDes).

Marvell (MRVL): Broadcom's strongest rival, handling custom AI chip designs for Amazon and Microsoft. Specializes in networking and data center ASICs.

Alchip (3661.TW): Taiwan's #1 design house and a top TSMC partner. Leads Amazon's ASIC design work and serves as the Taiwanese gateway for North American Big Tech orders.

GUC (3443.TW): a design house with TSMC as a major shareholder. Designs high-difficulty chips for large clients by being first to use TSMC's most advanced processes (3nm, 2nm).

3) Korean design-solution partners (DSP/VCA)

Korean firms act as the bridge for fabless companies that want to use Samsung Electronics or TSMC processes.

Gaonchips (399720): a key partner (DSP) of Samsung Foundry. Recently proved its technical capability by winning an AI chip design order on Samsung's 2nm process; the most AI-ASIC-specialized design firm in the Samsung ecosystem.

ASICLAND (445090): Korea's only TSMC VCA (Value Chain Aggregator). Takes on designs for domestic and foreign fabless companies that need TSMC processes. Earnings are growing fast on large mass-production contracts for AI edge chips and storage controllers.

ADTechnology (200710): one of Samsung's large DSPs, focusing on high-performance server-grade AI chip design, drawing on past collaboration with ARM. Recently pursuing overseas projects aggressively on 2nm.

CoAsia (045970): a global partner of Samsung Foundry, particularly strong in automotive ASIC design. Expanding automotive AI chip orders in the European and US markets.

My Take

Even if these companies' share prices have already surged, they shouldn't be discarded just because a bottleneck has passed through them.

These companies tend to show a cyclically recurring bottleneck structure, so they belong on a long-term watchlist for continuous tracking and observation.

Looking at actual price action, stocks usually don't crash once a bottleneck is resolved; they tend to trade sideways.

That's because until the next bottleneck appears, the new growth momentum is gone but existing demand largely holds, so revenue doesn't fall off a cliff.

From the company's perspective:

Blindly expanding supply might grow profits again,

but demand is uncertain and capacity may already be near its limit, so explosive growth is hard to expect.

Result: revenue doesn't fall easily, but high growth is also hard to expect.

Therefore, an effective investment strategy is:

Buy low when the stock pulls back (drops or trades sideways) after the bottleneck is resolved.

Wait for the rise when the next bottleneck appears (cyclical trading).

Key investment strategy: keep these companies on your watchlist, and time your entries by continuously observing the bottleneck appearance/resolution cycle.
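The "appears → worsens → resolves" cycle tracking above can be sketched as a tiny classifier over some proxy signal (purely illustrative; the choice of lead times as the signal and the 1.2x threshold are assumptions, not the author's method):

```python
def cycle_phase(lead_times_weeks):
    """Classify the latest reading of a bottleneck proxy against recent history.

    Toy thresholds: near the historical floor means the bottleneck has cleared;
    still climbing means demand is piling up faster than supply.
    """
    latest, previous = lead_times_weeks[-1], lead_times_weeks[-2]
    baseline = min(lead_times_weeks)
    if latest <= baseline * 1.2:
        return "resolved"      # strategy above: accumulate on pullback/sideways
    if latest > previous:
        return "worsening"     # demand still piling up; don't sell into strength
    return "easing"

assert cycle_phase([10, 20, 30]) == "worsening"
assert cycle_phase([30, 20, 11]) == "resolved"
assert cycle_phase([10, 30, 25]) == "easing"
```

A real watchlist process would feed this kind of classifier with observable series (delivery lead times, pricing, capex announcements); the sketch only fixes the vocabulary of the cycle the strategy keys its entries and exits to.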

Breaking all this down into simple language takes far more time and effort than I expected...

Having started, I have to finish, but carving out the time isn't easy. ㅠ

Anyway, I plan to keep summarizing by sector, but the individual-company deep dives I originally planned will probably have to move to the subscription service.

Link: http://x.com/i/article/2022729580700930048


相关笔记

Data Center & Semiconductor Bottleneck Timeline: In-depth Analysis of CPU/GPU Bottlenecks (1)

  • Source: https://x.com/tesla_teslaway/status/2022730281464271271?s=46
  • Published: 2026-02-14T17:50:54+00:00
  • Saved: 2026-02-15

Content


Data Center/Semiconductor Bottleneck Timeline Summary

You must know this in the Great AI Era!!

You have to study this, everyone!!!!

Now, let’s analyze each bottleneck in depth.

First, let’s look back at the brief explanation I wrote before.

  1. CPU → GPU Transition (Computation Bottleneck)

Period: 2020~2022

Why did it become a bottleneck? Deep learning requires performing thousands of simple calculations (matrix multiplication) simultaneously.

CPUs are good at doing things "one by one in order" but weak at parallel processing.

GPUs have thousands of cores working simultaneously, making AI training 10~100 times faster.

Since AlexNet (2012), GPUs became essential, and with the advent of ChatGPT-class models, the transition became complete.

🔸 Easy Explanation

CPU: Starts in Seoul, then visits Daejeon, Gwangju, Daegu, Busan, and Gangwon-do one by one.

GPU: Departs from Seoul with multiple cars simultaneously heading to all destinations.

CPU: Like one genius tapping away at a calculator at a desk.

GPU: Like thousands of simple laborers doing calculations by hand.

Calculators are great for complex math, but for AI training, doing simple things simultaneously and quickly is best.
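The calculator-vs-laborers contrast can be sketched in a few lines of code. This is a minimal illustration, not a benchmark: NumPy's `@` operator stands in for the GPU's "dispatch everything at once" style (it actually runs on the CPU via vectorized BLAS, but the contrast with an element-by-element Python loop makes the same point).

```python
import time

import numpy as np

def matmul_one_by_one(a, b):
    """CPU-style caricature: one multiply-accumulate at a time, in order."""
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m))
    for i in range(n):            # visit every output cell sequentially,
        for j in range(m):        # like driving to each city one by one
            acc = 0.0
            for t in range(k):
                acc += a[i, t] * b[t, j]
            out[i, j] = acc
    return out

def matmul_all_at_once(a, b):
    """GPU-style caricature: the whole grid of dot products in one dispatch."""
    return a @ b

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
b = rng.standard_normal((64, 64))

t0 = time.perf_counter()
slow = matmul_one_by_one(a, b)
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
fast = matmul_all_at_once(a, b)
t_par = time.perf_counter() - t0

assert np.allclose(slow, fast)    # same answer, wildly different speed
print(f"one-by-one: {t_seq:.4f}s, all-at-once: {t_par:.6f}s")
```

Even at this toy 64×64 size the batched version wins by orders of magnitude; real training matrices are thousands of times larger, which is why the gap becomes 10~100x in practice.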

🔸 Result/Impact

Deep learning researchers used to train on CPUs, but after switching to GPUs (Pre-Hopper generation), tasks that took a day were reduced to a few hours.

NVIDIA captured the market by pushing this exclusively with their CUDA software.

🔸 Current Status (2026)

Already solved. Now GPUs are the standard; everyone trains on GPU clusters.

Then: the focus was single-GPU performance → Now: the bottleneck has moved to orchestrating the entire system.

ㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡ

First, let’s find out what CPU, GPU, and ASIC are and what their respective roles are.

AI workloads (especially in Data Centers) rely on heterogeneous computing as the standard.

The era of using purely CPUs or GPUs is over; the CPU + GPU (or ASIC) combination is the trend.

Role of CPU:

Orchestration & Data Management: Feeds data to the GPU (storage, sharding, indexing) and handles scheduling and power management. CPUs handle data preparation during AI pretraining/fine-tuning.

Sequential Processing & General Tasks: Handles non-AI server workloads (databases, networking).

Built-in AI Acceleration: AMD EPYC and Intel Xeon 6 have added AI features (e.g., Intel's AMX matrix extensions, built-in accelerators) → handling some inference tasks.

Strength: Stability, compatibility. Weakness: Parallel AI computation.
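In code terms, this orchestration role is a producer–consumer pipeline: the CPU prepares batches ahead of time so the accelerator never starves. Below is a minimal sketch using a bounded queue; the `time.sleep` call is a stand-in for real storage I/O and preprocessing, and the `batch-i` strings stand in for real tensors.

```python
import queue
import threading
import time

def cpu_producer(n_batches, q):
    """CPU role: load, shard, and preprocess batches ahead of the accelerator."""
    for i in range(n_batches):
        time.sleep(0.001)        # stand-in for storage I/O + preprocessing
        q.put(f"batch-{i}")
    q.put(None)                  # sentinel: no more work

def accelerator_consumer(q, processed):
    """GPU/ASIC role: consume prepared batches as fast as they arrive."""
    while True:
        item = q.get()
        if item is None:
            break
        processed.append(item)   # stand-in for the actual matrix math

q = queue.Queue(maxsize=4)       # bounded queue = backpressure on the CPU
processed = []
producer = threading.Thread(target=cpu_producer, args=(8, q))
producer.start()
accelerator_consumer(q, processed)
producer.join()
print(processed)
```

The bounded queue is the design point: if the CPU falls behind, the accelerator idles (a CPU bottleneck); if the accelerator falls behind, the queue fills and the CPU waits. Real frameworks do exactly this with prefetching data loaders.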

Role of GPU:

Core of Parallel Computing: Monopolizes AI training (massive matrix multiplication) and high-performance inference. NVIDIA Blackwell/Rubin and AMD MI300/MI400 are the main players.

Main AI Engine: GPUs handle 90%+ of the compute in Hyperscaler (Google, Meta, etc.) clusters.

Current Combination Methods:

Rack-scale Integration: NVIDIA Rubin (late 2026): 36 Vera CPUs + 72 Rubin GPUs connected over NVLink like one "AI supercomputer." The CPU manages GPU data movement.

AMD Helios/MI400: EPYC Venice CPU + MI400 GPU, using unified memory for zero-copy data transfer → maximizing heterogeneous efficiency.

Intel: Xeon 6 host CPU + external GPU (mostly NVIDIA/AMD). Its own Gaudi accelerators have limited adoption.

Custom ASIC Addition: Google TPU, Amazon Trainium replacing some GPUs in inference. Broadcom is leading the design of hyperscaler custom ASICs.

🤦‍♂️🤦‍♂️🤦‍♂️ How is it? You honestly have no idea what I'm talking about, right?

😁😁😁 Since we are investors, I don't think we need to know all these nitty-gritty details.

Okay, let me explain it easily in my own way.

The most important parts here are CPU, GPU, and ASIC. Simply put:

CPU has become more important: Its role in data management and power regulation has grown, giving it a "resurgence" vibe.

GPU is still King: Nearly monopolizes model teaching (training).

Answering questions (Inference): Split between GPU + Custom Engines (ASIC) (to save costs).

Digging a little deeper:

The Role of the CPU in the Past (Team Manager Role)

The type that works smartly and sequentially: Good at calculating one by one (e.g., general programs, web servers, data management, power regulation, overall command).

Weak at AI parallel calculation (doing thousands of things at once) → Lost almost all AI work to GPUs when they appeared.

Role in AI: Just giving data to the GPU and cleaning up. The GPU is the star; the CPU is the supporting actor.

Easy Explanation: The Manager just conducts; the Racer (GPU) does the actual driving.

The Role of the CPU Currently

Gained AI Muscles: Built-in small AI accelerators (AMX, etc.) → Can do simple AI calculations quickly.

Massive increase in Core Count (100+) + Smarter Memory/Interconnects.

Specialized in Data Movement/Management: Perfect "Traffic Control" when connecting multiple GPUs/ASICs.

Analogy: The Manager now acts as "Coach + Trainer." Trains the racers (GPU/ASIC), distributes fuel (data) efficiently, and plans the team strategy.

Why is the CPU suddenly rising again?

AI has changed from simply "Calculating like crazy with GPUs" to complex teamwork.

Past: Only teaching the AI model (training) was important → GPU solo run.

Present: Explosion of actual service usage (inference) + Mixing multiple chips (GPU + ASIC) → Data movement, power management, and chip coordination have become incredibly important.

The CPU is perfect for this. (Moving data fast, saving power, coordinating engines).

Example: In a 2026 rack-scale system, if there's no CPU, the GPUs/ASICs can't "talk to each other" and chaos ensues.

Also, as inference (answering questions) becomes much larger than training, CPU efficiency (low power/stability) shines.

Easy Explanation

Past Team: One Star Racer (GPU) could win alone → Manager (CPU) was less important.

Current Team: Multiple Racers (GPU + ASIC) + Complex Track → Without the Manager (CPU), the team falls apart. The Manager has "suddenly" become a hero.

Conclusion: The CPU was always essential, but as AI grew from "One Star" (GPU) to a "Big Team" (GPU+ASIC), the CPU's role expanded and shone brighter.

Therefore, the CPU could become a bottleneck again in the future.

My Thoughts

It is highly likely there won't be a major bottleneck.

CPU companies (AMD, Intel, even NVIDIA with Vera CPU) are developing frantically: Core counts exploding, memory connections getting smarter, built-in AI functions.

Even if inference grows, CPU demand is forecasted and being prepared for.

The overall bottlenecks are predicted to be Power, Memory (HBM), and Networking.

However, the probability is not 0%. If AI becomes more complex than expected (e.g., massive Agent AI), CPUs could briefly become a bottleneck. Companies that respond well, like AMD, will likely be the winners.

🙋‍♂️ Wait, you might have a question here: Then what is an ASIC? Is the GPU being replaced?

The biggest reason for using ASICs is Cost and Power Savings.

AI has two stages: "Teaching the model (Training)" and "Answering questions with the taught model (Inference)."

Inference stage repeats the same task every day (e.g., ChatGPT answering user questions). This is predictable and high volume, so a tool that does only that specific task perfectly (ASIC) is cheaper and uses less electricity.

Big companies (Hyperscalers like Google, Amazon, Meta) make chips tailored exactly to their AI work instead of buying GPUs → Reducing costs by 30~50% and saving on electricity bills.

Easy Explanation

ASIC is an "Electric Pot dedicated to boiling Ramen." It boils ramen incredibly fast and cheaply (perfect for repetitive tasks like inference). So companies say, "We eat ramen (do inference) a lot, so let's just buy a dedicated pot!" instead of a GPU.
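The "ramen pot vs. kitchen set" split can be expressed as a routing rule: predictable, high-volume inference on models the ASIC was built for goes to the cheap dedicated chip, and everything else stays on the flexible GPU. The relative costs and model names below are made-up illustrations, not vendor figures.

```python
# Relative cost per job — illustrative assumptions, not vendor pricing.
COST = {"gpu": 1.00, "asic": 0.55}

# An ASIC only runs what it was built for; hypothetical model name.
SUPPORTED_ON_ASIC = {"prod-llm-v1"}

def route(job):
    """Predictable, high-volume inference → ASIC; everything else → GPU."""
    if job["kind"] == "inference" and job["model"] in SUPPORTED_ON_ASIC:
        return "asic"
    return "gpu"

jobs = [
    {"kind": "inference", "model": "prod-llm-v1"},      # daily "ramen" → ASIC
    {"kind": "inference", "model": "prod-llm-v1"},      # daily "ramen" → ASIC
    {"kind": "training", "model": "experimental-v2"},   # experiment → GPU
    {"kind": "inference", "model": "experimental-v2"},  # not ported yet → GPU
]

routed_cost = sum(COST[route(j)] for j in jobs)
all_gpu_cost = len(jobs) * COST["gpu"]
print(f"routed: {routed_cost:.2f} vs all-GPU: {all_gpu_cost:.2f}")
```

Note the fourth job: even an inference workload stays on the GPU until someone ports the model to the ASIC — which is exactly the flexibility cost discussed next.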

Why won't it replace GPUs completely? (Why GPUs still survive)

ASICs lack flexibility (can't do various tasks).

The Training stage involves changing models frequently and testing new ideas (e.g., testing thousands of times to make GPT-5). This is unpredictable, so you need a Kitchen Set (GPU) that can cook various dishes.

ASIC is specialized for "only one dish," so if you change the model slightly, you have to make a new chip → Costs huge amounts of time and money.

Software Ecosystem Strength: NVIDIA GPUs have CUDA, a "Recipe Book" that is so well-made that 90% of developers use it. ASICs are different for every company, so you have to rewrite code → Annoying and slow.

Development Cost & Time: Making one ASIC takes hundreds of millions of dollars + 2~3 years. You can just buy a GPU and use it immediately.

Easy Explanation: ASIC is a "Ramen Pot," so it's the best for ramen, but if a new menu (new model) comes out, you have to make a new pot. GPU is a "Gas Stove + Pot + Frying Pan Set," so you can cook anything immediately. That’s why GPU is still the best for high-experimentation, diverse AI training.

🙋‍♂️ NVIDIA is also developing new chips (H100→B100→B200, etc.), so doesn't it cost money just like ASICs?

But they overcome this with Economies of Scale + Software Ecosystem.

NVIDIA GPU: New chip design cost ≈ $1B~$2B on a 1.5~2 year cycle. But with 90%+ market share, they ship chips in enormous volume → the design cost amortized per chip is small.

Google TPU: Similar design cost, but used only internally (cloud service), so sales volume is limited → Slower cost recovery.
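The economies-of-scale point is just fixed-cost amortization. Here is a tiny sketch: the $1.5B design (NRE) cost is the midpoint of the article's range, while the shipment volumes are illustrative assumptions — the only claim is that a merchant vendor spreads the same fixed cost over far more chips than an internal-only program.

```python
def amortized_design_cost(design_cost_usd, units_shipped):
    """Fixed design (NRE) cost spread across every chip shipped."""
    return design_cost_usd / units_shipped

# Assumed volumes for illustration only.
merchant_gpu = amortized_design_cost(1.5e9, 3_000_000)  # sold to the whole market
internal_asic = amortized_design_cost(1.5e9, 300_000)   # internal cloud use only

print(f"merchant GPU:  ${merchant_gpu:,.0f} of design cost per chip")
print(f"internal ASIC: ${internal_asic:,.0f} of design cost per chip")
```

At a 10x volume difference, the internal chip carries 10x the design cost per unit — which is why ASICs only pay off once inference volume is large and the model architecture is stable enough to reuse the chip for years.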

Conclusion

In the past, AI models evolved at a crazy pace: Transformer → BERT → GPT series → Multimodal. ASICs would have needed frequent redesigns, sometimes mid-development, which is why they saw little use.

Present ~ Future: As model development speed stabilizes → TPU (ASIC) becomes much more profitable.

2026: The basic structure (Transformer) has become almost standard, so "major changes" are fewer. Models develop mainly by "increasing size (parameters)" or "fine-tuning."

Inference (Actual Service) volume explodes: Hundreds of millions of people use things like ChatGPT/Gemini daily → Massive repetitive work.

So big companies (Google, Amazon, Meta) invest in ASICs like TPU/Trainium: Once made well, they save huge amounts on cost/electricity in the long run (Billions of dollars in profit annually).

As model changes slow down, ASICs (TPUs) get stronger, and 2026 is exactly that turning point – this is why hyperscalers are pushing ASICs hard.

GPUs still hold the throne for "Experimentation/Learning," but ASICs are eating up the "Actual Money-Making Services (Inference)."

Final Conclusion

Ultimately, what we as investors need to think about is, as seen before:

Whether CPUs are keeping up well with GPUs (since a bottleneck might occur in CPUs).

Predicting how big the total pie of GPUs + ASICs will get, and within that, how much the share of ASICs will grow. (This will help you gauge who will race ahead: companies making GPUs vs. Big Tech pushing ASICs.)

  1. GPU Related Companies

NVIDIA (NVDA): Dominates the AI and data center-centric GPU market, holding over 80% share in AI training GPUs. Strong in gaming and professional visualization, but facing increasing competition in the consumer sector.

AMD (AMD): Direct competitor in high-performance GPUs for data centers and gaming. Instinct series is expanding for AI tasks; recently data center GPU revenue surpassed CPU revenue. High growth potential amidst supply constraints.

Intel (INTC): Expanding beyond traditional integrated graphics to discrete GPUs, targeting data centers and AI with Arc and Max series. Aggressively pushing with recent hiring and investment, but market share is smaller than competitors.

ARM (ARM): Provides GPU IP designs like Mali and Immortalis, mainly for mobile and embedded devices. Supports AI acceleration but focuses on licensing rather than manufacturing; widely adopted in smartphones and IoT.

  2. CPU Related Companies

Intel (INTC): Traditional leader in the x86 architecture CPU market, targeting AI PCs and data centers with Core Ultra series (client) and Xeon (server). Maintains about 60% market share with strong server CPU demand in 2026, but under pressure from ARM competition.

AMD (AMD): x86 CPU specialist with Ryzen (consumer/client) and EPYC (server) lineups as core. Breaking 40% server market share by end of 2025 and growing rapidly. Launching 2nm-based Venice EPYC in 2026.

Apple (AAPL): Designs and produces custom ARM-based Apple Silicon (M series) CPUs, exclusive to its ecosystem (Mac/iPad/iPhone). M5 series strengthens AI performance; developing M6 for 2026.

Qualcomm (QCOM): Targeting Windows on ARM PC market with ARM-based Snapdragon X series (Oryon custom cores). Emphasizing multi-day battery and AI performance with X2 Plus, etc., in 2026.

SoftBank (9984.T): Leading CPU IP licensing business through ARM Holdings ownership. Does not manufacture directly but earns royalties on Arm architecture; revenue expected to increase in 2026 due to AI/Data Center growth.

  3. Custom Chip (ASIC) Related Companies

1) Masters of Chip Design (Big Tech/Hyperscalers)

Alphabet (GOOGL): Leader with TPU (Tensor Processing Unit). Reached 7th generation (Ironwood), equipped with massive 192GB HBM per chip, threatening NVIDIA in AI training/inference efficiency.

Amazon (AMZN): Developing Trainium for training and Inferentia for inference. Goal is to provide cheaper AI computing environments than NVIDIA to AWS customers by combining with their own Graviton CPU.

Meta (META): Designing MTIA (Meta Training and Inference Accelerator). Making chips optimized primarily for ad recommendation algorithms (Facebook/Instagram) and Large Language Model (Llama) inference.

Tesla (TSLA): Making FSD (Full Self-Driving) chips and Dojo chips for AI training supercomputers. Recently selected Samsung Electronics as a production partner for the Dojo D2 chip, accelerating chip internalization.

2) Design Partners

Broadcom (AVGO): Unrivaled #1 in the ASIC market. Key partner for Google TPU and also helps design chips for Meta and OpenAI. World's strongest in chip-to-chip connection technology (SerDes).

Marvell (MRVL): Broadcom's powerful rival, handling custom AI chip designs for Amazon and Microsoft. Specialized in networking and data center ASICs.

Alchip (3661.TW): Taiwan's #1 design house and TSMC's top partner. Leads Amazon's ASIC design and acts as the Taiwanese gateway for North American Big Tech volume.

GUC (3443.TW): Design house where TSMC is a major shareholder. Designs high-difficulty chips for large clients by applying TSMC's cutting-edge processes (3nm, 2nm) first.

3) Korean Design Solution Partners (DSP/VCA)

Korean companies act as a bridge for fabless companies wanting to use Samsung Electronics or TSMC processes.

Gaonchips (399720): Key partner (DSP) of Samsung Foundry. Recently proved technical capability by winning orders for Samsung's 2nm process-based AI chip design. The most specialized company for AI ASIC design within the Samsung ecosystem.

ASICLAND (445090): The only TSMC VCA (Value Chain Aggregator) in Korea. Handles designs for domestic and foreign fabless companies that need to use TSMC processes. Recent rapid earnings growth with large-scale mass production contracts in AI Edge chips and storage controllers.

ADTechnology (200710): One of Samsung's large DSPs, focusing on high-performance server-grade AI chip design utilizing past cooperation experience with ARM. Actively targeting overseas projects based on 2nm recently.

CoAsia (045970): Global partner of Samsung Foundry, with particular strength in Automotive ASIC design. Expanding orders for automotive AI chips in European and US markets.

My Thoughts

Companies experiencing bottlenecks should not be abandoned even if their stock prices have risen significantly.

These companies have a structure where bottlenecks occur cyclically, so they should be put on a long-term watchlist and continuously tracked/observed.

Looking at actual stock price trends, prices often do not crash after the bottleneck is resolved but tend to move sideways.

This is because, until the next bottleneck appears, new growth momentum is gone but existing demand largely holds up, so revenue does not plummet.

From a corporate perspective:

Increasing supply blindly might increase profits again,

But demand is uncertain, and production capacity might have already reached its limit, so it's hard to expect massive growth.

Result: Revenue doesn't decrease easily, but it's also hard to expect high growth.

Therefore, an effective investment strategy is:

Buy low when stock prices adjust (drop or move sideways) due to bottleneck resolution.

Aim for the rise when the next bottleneck appears (Cyclical Trading).

Key Investment Strategy: Keep these companies on your watchlist and time your entry by continuously observing the bottleneck occurrence/resolution cycle.

Breaking this down into simple terms takes much more time and effort than expected...

Since I started it, I need to finish it, but allocating time is not easy. ㅠ

Anyway, I plan to summarize by sector, but the detailed analysis of individual companies I originally planned will likely have to move to the subscription service.

Link: http://x.com/i/article/2022729580700930048

📋 讨论归档

讨论进行中…