🧠 阿头学 · 💰投资 · 💬 讨论题

别再叫“标注”了——人类判断将成为万亿级生产要素

自动化不会让人类“没活干”,它会让人类的**判断(judgment)**变成可被结构化、可被购买的经济输入——于是“human data”不是瓶颈,而是长期复利的万亿市场。

2026-01-13
阅读简报
双语对照
完整翻译
原文
讨论归档

核心观点

  • 自动化的终局不是失业潮,而是“判断溢价”:越多重复劳动被拿走,人类越集中在决策、例外处理、权衡这些高杠杆环节;这些环节一旦被捕捉成结构化数据,就变成训练信号与资产。
  • 前沿智能离不开人类数据(human data):你可以用 self-play/synthetic data 扩量,但“什么是好”仍需人类定义:demonstrations、preference learning、rubrics、evaluations、continual corrections——本质是把专业判断外包成可计价的管线。
  • 真正稀缺的是“可复用的结构化过程”:原始活动不是数据;数据必须被 recorded/structured/evaluated/packaged。谁能把工作现场变成可复用的数据工厂,谁就在卖铲子。
  • 组织会出现新激励:做事=训练=赚钱。当一小时工作既能跑业务、又能训练模型、还能额外变现,人类劳动就从成本项变成资产项——但这也会把隐私、劳动关系、和“被商品化”的不适推到台前。
  • “annotation”这个词会误导行业:这不是机械标注,而是“expert human data creation / structured human judgment”。命名改变定价,也改变谁愿意进场。

跟我们的关联

🧠Neta:你们最值钱的不是模型本身,而是能不能把产品里用户/运营/审核/推荐的“判断点”变成可持续的数据回路(loop):清晰标准→可记录→可复盘→可用于训练/评估。

👤ATou:作为 Context Engineer,可以把“做决策的过程”产品化:把 rubric、例外、tradeoff 显式化,既提升团队协作,又沉淀可复用资产。

讨论引子

  • 你愿意让自己的“专业判断”被结构化并出售吗?底线是什么?
  • human data 市场化后,谁最可能成为“新型劳务平台”?谁会被边缘化?
  • 对 AI 产品来说,你更该优先投:更强模型,还是更强的 evaluation / data pipeline?

人类数据将成为每年 1 万亿美元的市场

这不是一个短期预测。这是一个结构性判断:经济最终会向这个方向收敛。

要相信这一点,你需要接受两个前提:

  • 数字智能与物理智能,最终可以把经济中那些乏味、繁琐的部分自动化

  • 在最前沿,没有人类数据,自学习智能是不可能的

自动化是人类能做的最有用、最解放的事情

如果 AI 系统能够自动化“功能”(functions),那么把一切功能都自动化,就是人类最具杠杆的任务。

自动化会压缩时间。它让以下事情成为可能:

  • 让愿望的实现速度提高几个数量级

  • 让人类把注意力集中在工作中那些更愉悦、需要判断的部分,而让机器人与智能体处理其余部分

当人类获得更多时间,他们就会创造更多。全新的工作一开始往往是创造性强、价值高的。随着时间推移,它会变得可理解、可重复,并准备好被自动化。一旦被自动化,它会持续交付价值,同时释放人类去做新的创造性工作。这个循环是永久的。

自动化不会消灭人类工作。它会把人类推向价值更高、更具创造性的工作。

在社会层面,自动化会重塑世界的经济学。当 AI 系统承担更多生产与协调时,生产商品与服务的成本会崩塌式下降,而供给与可得性会爆炸式上升。

与此同时,分发也会变得越来越接近最优。数字与物理世界的智能系统会以更少的摩擦、更少的浪费、更少的延迟来协调供需,使得获取变得更快、更便宜、更可靠——而且每一年都更上一层楼。

AI 模型将永远向人类学习

每一个人工智能系统都在某种形式上向人类学习:

  • 示范(demonstrations)

  • 监督式微调(supervised fine-tuning)

  • 偏好学习(preference learning)

  • 复杂的评分细则与评估(complex rubrics and evaluations)

  • 持续纠错(continual corrections)

即便是自我博弈与合成数据,也依赖人类“落地”——人类定义目标、奖励,以及“好”究竟长什么样。
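上面这些学习形式中,“偏好学习”最能说明人类判断如何变成训练信号。下面是一个极简的 Bradley-Terry 偏好损失示意(假设性草稿,函数名与数值均为虚构,仅用来说明机制,并非原文内容):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry 偏好损失:人类判断“A 比 B 好”之后,
    训练目标是让被选中回答的奖励分高于被拒绝的回答。"""
    # P(chosen 优于 rejected) = sigmoid(r_chosen - r_rejected)
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# 奖励模型与人类偏好一致时损失小,不一致时损失大;
# 梯度因此把模型推向“复现人类判断”的方向
aligned = preference_loss(2.0, 0.5)     # 与人类偏好一致
misaligned = preference_loss(0.5, 2.0)  # 与人类偏好相反
```

这也是文中“偏好学习/持续纠错”之所以能被定价购买的最小形态:每一条人类比较,都是一个可直接求梯度的监督信号。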

因此:

  • 经济中的每一个功能都包含有用的学习信号

  • 每一个决策、异常、失败与取舍都会生成数据

但仅有原始活动还不够。这些数据必须被:

  • 记录

  • 结构化

  • 评估

  • 打包为可用的数据管线
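把“记录→结构化→评估→打包”落到数据形态上,大致是下面这个样子(示意性草稿:字段名、评分阈值均为假设,并非任何真实平台的 schema):

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class JudgmentRecord:
    """一条被“记录”并“结构化”的人类判断。"""
    task: str          # 当时在处理什么
    decision: str      # 人类给出的判断/选择
    rationale: str     # 判断理由,学习信号的核心
    rubric_score: int  # 按评分细则“评估”后的得分(假设 1-5)

def package_jsonl(records: list[JudgmentRecord], min_score: int = 3) -> str:
    """把达标记录“打包”为训练/评估可直接消费的 JSONL 管线。"""
    kept = [r for r in records if r.rubric_score >= min_score]
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in kept)
```

重点不在具体字段,而在于:只有同时带上判断理由与评估结果的记录,才从“原始活动”变成可复用、可计价的数据资产。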

更重要的是:在被自动化的过程中,这些功能必须继续运行。自动化是迭代的,而不是瞬时完成的。

这会带来一种普遍的义务与机会

为了迭代式地自动化功能,每一家在真实运转的公司、政府机构或组织,都必须消费并生产与这些功能相关的结构化数据。在大多数情况下,由于规模效率低、固定成本高,以及在内部生产高质量、可复用的结构化数据在运营上极其困难,让它们自己去创建或结构化这些数据并不是最优解。

我们今天已经能看到这种动态。例如,许多律师在 micro1 这样的平台上,围绕标准化、结构化的法律数据工作时,每小时能产生的杠杆效应,比他们在单个律所内部做非结构化工作时更大。在 micro1,有超过 1,000 名律师从事结构化数据创建工作,平均收入比传统律所岗位高约 20%。律所本身不太可能成为大规模结构化训练数据的生产者,但它们会越来越多地成为这类数据的消费者——要么直接购买,要么通过把这些数据嵌入它们使用的工具之中来消费。

这会形成一套强力的激励结构。

正在自动化功能的实验室会为这种数据付费,因为从长期看,渐进式自动化带来的价值远远超过获取数据的成本。

因此:

  • 各类实体不仅为了自动化自身而被激励去生产高质量的人类数据,也因为这些数据在外部市场上具有价值

  • 每一小时的工作可以同时:

  • 让组织运转

  • 训练 AI 模型

  • 为组织创造额外收入

人类劳动不再只是“用来生产商品与服务的劳动”,它本身也会变成一种能够产生收入的资产。

最终的收敛:人类时间的 5% 以上将花在“人类数据”上

合理的推断是:经济中的大多数功能都会花一部分时间去尝试自动化自己。不是完全自动化,也不是一次到位,而是持续地把工作从“人类参与的循环”中推出去——只要它变得可重复、可规模化。

今天,即便是知识工作者,大多数时间也花在沟通与协调上,而不是我们通常认为的“真正产出”。随着自动化推进,知识工作中乏味的部分会被逐步移除;而自动化也会越来越多地吸收协调、排期、路径分发(routing)与常规沟通。结果是:更大比例的人类时间会用在需要判断的知识工作上。

即使在保守假设下,也可以合理预期:在一个更自动化的经济里,大约 75% 的工作时间仍会花在沟通与协调上,而大约 25% 用于做“实际工作”。

并非所有实际工作都需要结构化。但其中相当一部分需要:那些产生决策、判断、示范、评估与异常处理的工作,如果能以结构化、可复用的形式被捕捉,就会变得更有价值——既能更好完成当下任务,也能为未来自动化提供支撑。如果这部分“实际工作”中只有五分之一是在结构化环境中完成的,那么就意味着:总人类劳动时间的大约 5% 会用来生成结构化的人类数据。

全球 GDP 约为 100 万亿美元,而劳动大约占其中的 50%,那么每年的总劳动支出约为 50 万亿美元。其中的 5% 对应的是:每年大约 2.5 万亿美元的人类时间,会被投入到“推动自动化”之中——用于生成示范、反馈、评估与 AI 系统所需的学习信号。

当然,这里面并不都会变成“人类数据市场”的显性支出。很大一部分仍会是隐性的、碎片化的,或无法定价的。但即便做非常激进的折扣,你依然会得到一个数量级在每年 1 万亿美元左右的结果。
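上面的估算链条可以逐步复算如下。GDP、劳动占比、25% 与五分之一均来自原文;最后 40% 的显性化比例是为了落到原文“约 $1T”量级而补的示意性假设,原文并未给出具体折扣数:

```python
# 逐步复算原文的市场规模估算(单位:美元/年)
global_gdp = 100e12           # 全球 GDP ≈ $100T
labor_share = 0.50            # 劳动报酬占比 ≈ 50%
actual_work_share = 0.25      # “实际工作”时间占比(其余为沟通协调)
structured_fraction = 1 / 5   # 实际工作中在结构化环境完成的比例

labor_spend = global_gdp * labor_share                      # ≈ $50T/年
human_data_share = actual_work_share * structured_fraction  # = 5%
human_data_value = labor_spend * human_data_share           # ≈ $2.5T/年

# 假设仅 40% 变成显性市场支出(示意性折扣),即得到原文的量级
explicit_market = human_data_value * 0.4                    # ≈ $1T/年
```

可以看到,结论对折扣假设并不敏感:显性化比例在 20%~60% 之间浮动,结果仍停留在“每年万亿美元上下”这个数量级。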

自动化会重塑劳动,而不是缩小劳动

这会导致自动化规模化。随着自动化规模化,原本用于人类劳动的部分支出会被重新分配到:

  • 能源

  • 算力(compute)

  • AI 劳动力(AI labor)

但人类劳动的总支出仍会继续上升。

为什么?

自动化创造时间。时间释放创造力。创造力会在经济中产生全新的功能。

这些功能起初由人类来完成。随着时间推移,它们会进入同样的自动化循环。

人类劳动会变得更昂贵,因为:

  • 在任何一个时点,人类时间都是有限的

  • 创造力与判断力是稀缺的

  • 全新的想法天然具有溢价价值

随着自动化扩展,人类会把更多时间集中在更高杠杆的工作上。虽然总人类工作时长确实会随时间增长,但这种增长无法随需求快速扩张。劳动力市场扩张最快、也最主要的方式,是提高每小时人类劳动所创造的价值。

随着这一过程继续:

  • 人类劳动总支出上升

  • 更大比例的人类时间会用在生成学习信号、推动自动化上

我们不该再把它叫作“标注”(annotation)了

这项工作在塑造 AI 上的重要性意味着,把它称为“数据标注”(data labeling)或“注释/标注”(annotation)完全不准确。这些词描述的是机械劳动,而真正的价值来自于:以结构化形式表达出来的人类判断、专业知识与决策。

更准确的描述应是:专家级人类数据生产,或结构化的人类判断。

这就是人类专业能力如何在自动化经济中实现复利增长。它解释了:为什么人类数据会随着自动化一起扩张,而不是消失;以及为什么它会随着时间推移成为一等的经济投入要素。

人类的卓越能力将比以往任何时候都更重要

这不需要极端假设。它只需要两件事成立:自动化持续有效;智能持续向人类学习。只要这是真的,那么人类数据就不是一个阶段性现象,也不是一个临时瓶颈。它是经济的结构性投入。

人类判断被捕捉、结构化、并不断精炼。

这些判断成为智能训练的“基底”。

智能反过来,产生更多自动化。

当功能被自动化,人类时间被释放出来。人类就用这些时间去创造新的功能、再把它们推进自动化——这个美妙的循环持续运转。

相关笔记

🧭 主题 MOC

  • [[AI MOC|AI]]:(MOC) 讨论「人类数据」作为 AI 学习信号与自动化燃料,属于「AI」训练与产业逻辑的核心命题。
  • [[数据系统 MOC|数据系统]]:(MOC) 强调数据需被「记录」「结构化」「评估」并打包为数据管线,对应「数据系统」的生产与运营。

🎯 核心:人类数据 = 自动化燃料

  • [[20 Areas/24 数据职业/My data life|Data life]]:(Areas) 把「自动化」写成“不要让人做机器的工作”,并明确“被使用的数据是资产”,对齐本文「人类数据」会随自动化扩张的判断。
  • [[00 Inbox/Flomo_Import/2024-09-27-10-20-48|AI-first 组织]]:(Flomo) 用“数据即生产要素/核心资产”解释组织为何会投入把工作流「结构化」成可训练的数据管线。
  • [[30 Wiki/36 AI_Industry/2023-05-22-14-43-40|AI=水电煤]]:(Wiki) 把 AI 视作基础设施,提示当智能供给爆炸时,对高质量「人类数据」与学习信号的需求也会随之抬升。

⚙️ 落地:记录→结构化→评估→管线

  • [[40 Library/41 读书笔记/精益数字方法论/2023-08-25-12-40-16|让数据流动]]:(精益数字方法论) 用“生产/加工/消费”把「数据资产」运营化,正对应本文“数据必须被记录、结构化、评估并打包成数据管线”。

🔄 训练信号:偏好/反馈/奖励函数

  • [[30 Wiki/36 AI_Industry/2024-09-20-06-58-21|RL 三要素]]:(Wiki) 用 reward model / 探索 / prompt 拆解强化学习,帮助把本文的「偏好学习/持续纠错」理解为“可学习的反馈信号”。
  • [[30 Wiki/32 创作_表达/2024-03-01-20-48-47|反馈=RL]]:(Wiki) 把“反馈是强化学习、复盘是用更好的数据集再训练”说清楚,补足本文「评估/纠错」为何会变成高价值「人类数据」。

⚔️ L2 张力:把一切变数据 vs 数据会被刷分/遮蔽

  • [[40 Library/41 读书笔记/BadData/2023-11-09-16-45-07|古德哈特]]:(BadData) 当「测量」与激励/定价绑定就会诱发刷分与扭曲,提醒“人类数据市场化”可能带来 reward hacking 式的副作用。
  • [[40 Library/41 读书笔记/BadData/2023-11-09-16-11-32|信息狂妄]]:(BadData) 强调“重要之物未必可测”,对冲把所有判断都强行「结构化」成训练信号的直线叙事。
  • [[20 Areas/24 数据职业/Not data driven, but data informed|Data informed]]:(Areas) 主张数据更多是「后验照明」而非指挥棒,提供对“用数据替代洞察/价值判断”的反面校准。

human data will be a $1 trillion/year market

这不是一个短期预测。这是一个结构性判断:经济最终会向这个方向收敛。

This is not a short-term prediction. It is a structural claim about where the economy converges.

要相信这一点,你需要接受两个前提:

To believe this, you need to accept two assumptions:

  • 数字智能与物理智能,最终可以把经济中那些乏味、繁琐的部分自动化
  • Digital and physical intelligence can eventually automate the tedious parts of the economy
  • 在最前沿,没有人类数据,自学习智能是不可能的
  • Self-learning intelligence without human data is impossible at the frontier

自动化是人类能做的最有用、最解放的事情

automation is the most useful & liberating thing humanity can do

如果 AI 系统能够自动化“功能”(functions),那么把一切功能都自动化,就是人类最具杠杆的任务。

If AI systems can automate functions, then automating all functions is the highest-leverage task for humanity.

自动化会压缩时间。它让以下事情成为可能:

Automation compresses time. It allows:

  • 让愿望的实现速度提高几个数量级
  • Aspirations to be fulfilled faster, by orders of magnitude
  • 让人类把注意力集中在工作中那些更愉悦、需要判断的部分,而让机器人与智能体处理其余部分
  • Humans to focus on the enjoyable, judgment-heavy parts of work while robots and agents handle the rest

当人类获得更多时间,他们就会创造更多。全新的工作一开始往往是创造性强、价值高的。随着时间推移,它会变得可理解、可重复,并准备好被自动化。一旦被自动化,它会持续交付价值,同时释放人类去做新的创造性工作。这个循环是永久的。

As humans gain time, they create more. Net-new work is initially creative and high-value. Over time it becomes legible, repeatable, and ready for automation. Once automated, it continues delivering value while freeing humans to focus on new creative work. This loop is permanent.

自动化不会消灭人类工作。它会把人类推向价值更高、更具创造性的工作。

Automation does not eliminate human work. It pushes humans toward higher-value, more creative work.

在社会层面,自动化会重塑世界的经济学。当 AI 系统承担更多生产与协调时,生产商品与服务的成本会崩塌式下降,而供给与可得性会爆炸式上升。

At a societal level, automation reshapes the economics of the world. As AI systems take on more production and coordination, the cost of producing goods and services collapses while availability explodes.

与此同时,分发也会变得越来越接近最优。数字与物理世界的智能系统会以更少的摩擦、更少的浪费、更少的延迟来协调供需,使得获取变得更快、更便宜、更可靠——而且每一年都更上一层楼。

At the same time, distribution becomes increasingly optimal. Digitally and physically intelligent systems coordinate supply and demand with less friction, less waste, and less delay, making access faster, cheaper, and more reliable every year.

AI 模型将永远向人类学习

AI models learn from humans forever

每一个人工智能系统都在某种形式上向人类学习:

Every artificially intelligent system learns from humans in some form:

  • 示范(demonstrations)
  • Demonstrations
  • 监督式微调(supervised fine-tuning)
  • Supervised fine-tuning
  • 偏好学习(preference learning)
  • Preference learning
  • 复杂的评分细则与评估(complex rubrics and evaluations)
  • Complex rubrics and evaluations
  • 持续纠错(continual corrections)
  • Continual corrections

即便是自我博弈与合成数据,也依赖人类“落地”——人类定义目标、奖励,以及“好”究竟长什么样。

Even self-play and synthetic data depend on human grounding — humans define objectives, rewards, and what “good” looks like.

因此:

As a result:

  • 经济中的每一个功能都包含有用的学习信号
  • Every function in the economy contains useful learning signal
  • 每一个决策、异常、失败与取舍都会生成数据
  • Every decision, exception, failure, and tradeoff creates data

但仅有原始活动还不够。这些数据必须被:

But raw activity is not enough. That data must be:

  • 记录
  • Recorded
  • 结构化
  • Structured
  • 评估
  • Evaluated
  • 打包为可用的数据管线
  • Packaged into usable pipelines

更重要的是:在被自动化的过程中,这些功能必须继续运行。自动化是迭代的,而不是瞬时完成的。

And importantly, functions must continue running while they are being automated. Automation is iterative, not instantaneous.

这会带来一种普遍的义务与机会

this creates a universal obligation and opportunity

为了迭代式地自动化功能,每一家在真实运转的公司、政府机构或组织,都必须消费并生产与这些功能相关的结构化数据。在大多数情况下,由于规模效率低、固定成本高,以及在内部生产高质量、可复用的结构化数据在运营上极其困难,让它们自己去创建或结构化这些数据并不是最优解。

To iteratively automate functions, every company, government agency, or institution running real operations must consume and produce structured data related to those functions. In most cases, it will not be optimal for them to create or structure that data themselves, due to scale inefficiencies, high fixed costs, and the operational difficulty of producing high-quality, reusable structured data in-house.

我们今天已经能看到这种动态。例如,许多律师在 micro1 这样的平台上,围绕标准化、结构化的法律数据工作时,每小时能产生的杠杆效应,比他们在单个律所内部做非结构化工作时更大。在 micro1,有超过 1,000 名律师从事结构化数据创建工作,平均收入比传统律所岗位高约 20%。律所本身不太可能成为大规模结构化训练数据的生产者,但它们会越来越多地成为这类数据的消费者——要么直接购买,要么通过把这些数据嵌入它们使用的工具之中来消费。

We already see this dynamic today. For example, many lawyers produce more leverage per hour working on standardized, structured legal data through platforms like micro1 than they do performing unstructured work inside individual law firms. At micro1, over 1,000 lawyers work in structured data creation and earn on average ~20% more than in traditional firm roles. Law firms themselves are unlikely to become large-scale producers of structured training data, but they will increasingly be consumers of that data, either directly or by having it embedded in the tools they use.

这会形成一套强力的激励结构。

This creates a powerful incentive structure.

正在自动化功能的实验室会为这种数据付费,因为从长期看,渐进式自动化带来的价值远远超过获取数据的成本。

Labs that are automating functions will pay for this data, because long term the value gained from incremental automation far exceeds the cost of acquiring the data.

因此:

As a result:

  • 各类实体不仅为了自动化自身而被激励去生产高质量的人类数据,也因为这些数据在外部市场上具有价值
  • Entities are incentivized to produce high-quality human data not just to automate themselves, but because that data has external market value
  • 每一小时的工作可以同时:
  • Every hour of work can simultaneously:
  • 让组织运转
  • Run the organization
  • 训练 AI 模型
  • Train AI models
  • 为组织创造额外收入
  • Generate additional revenue for the organization

人类劳动不再只是“用来生产商品与服务的劳动”,它本身也会变成一种能够产生收入的资产。

Human labor becomes not just labor to produce goods & services, but a revenue-generating asset on its own.

最终的收敛:人类时间的 5% 以上将花在“人类数据”上

the ultimate convergence: 5%+ of human time is spent on human data

合理的推断是:经济中的大多数功能都会花一部分时间去尝试自动化自己。不是完全自动化,也不是一次到位,而是持续地把工作从“人类参与的循环”中推出去——只要它变得可重复、可规模化。

It’s reasonable to think that most functions in the economy will spend some amount of time trying to automate themselves. Not fully, and not all at once, but continuously pushing work out of the human loop as it becomes repeatable and scalable.

今天,即便是知识工作者,大多数时间也花在沟通与协调上,而不是我们通常认为的“真正产出”。随着自动化推进,知识工作中乏味的部分会被逐步移除;而自动化也会越来越多地吸收协调、排期、路径分发(routing)与常规沟通。结果是:更大比例的人类时间会用在需要判断的知识工作上。

Today, even knowledge workers spend the majority of their time on communication and coordination rather than on what we would consider actual productive work. As automation advances, tedious parts of knowledge work are progressively removed, and automation increasingly absorbs coordination, scheduling, routing, and routine communication. The result is a larger share of human time being spent on judgment heavy knowledge work.

即使在保守假设下,也可以合理预期:在一个更自动化的经济里,大约 75% 的工作时间仍会花在沟通与协调上,而大约 25% 用于做“实际工作”。

Even under conservative assumptions, it is reasonable to expect that in a more automated economy roughly 75% of work time is still spent on communication and coordination, while about 25% is spent doing actual work.

并非所有实际工作都需要结构化。但其中相当一部分需要:那些产生决策、判断、示范、评估与异常处理的工作,如果能以结构化、可复用的形式被捕捉,就会变得更有价值——既能更好完成当下任务,也能为未来自动化提供支撑。如果这部分“实际工作”中只有五分之一是在结构化环境中完成的,那么就意味着:总人类劳动时间的大约 5% 会用来生成结构化的人类数据。

Not all of that work needs to be structured. But a meaningful fraction does. Work that produces decisions, judgments, demonstrations, evaluations, and exceptions becomes far more valuable when captured in a structured, reusable form, both to complete the task and to enable future automation. If only one fifth of that actual work is performed in structured environments, that implies roughly 5% of total human labor time is spent generating structured human data.

全球 GDP 约为 100 万亿美元,而劳动大约占其中的 50%,那么每年的总劳动支出约为 50 万亿美元。其中的 5% 对应的是:每年大约 2.5 万亿美元的人类时间,会被投入到“推动自动化”之中——用于生成示范、反馈、评估与 AI 系统所需的学习信号。

With global GDP at roughly $100T, and labor representing about 50% of that, total labor spend is around $50T annually. Five percent of that corresponds to roughly $2.5T per year of human time directed at enabling automation, creating demonstrations, feedback, evaluations, and learning signals for AI systems.

当然,这里面并不都会变成“人类数据市场”的显性支出。很大一部分仍会是隐性的、碎片化的,或无法定价的。但即便做非常激进的折扣,你依然会得到一个数量级在每年 1 万亿美元左右的结果。

Certainly not all of this will become explicit spend in the human data market. Much of it will remain implicit, fragmented, or unpriced. But even with aggressive discounting, you still arrive at something on the order of $1T per year.

自动化会重塑劳动,而不是缩小劳动

automation reshapes labor, it doesn’t shrink it

这会导致自动化规模化。随着自动化规模化,原本用于人类劳动的部分支出会被重新分配到:

This results in automation scaling. As automation scales, some amount of what was spent on human labor is redirected towards:

  • 能源
  • Energy
  • 算力(compute)
  • Compute
  • AI 劳动力(AI labor)
  • AI labor

但人类劳动的总支出仍会继续上升。

However, total human labor spend continues to increase.

为什么?

Why?

自动化创造时间。时间释放创造力。创造力会在经济中产生全新的功能。

Automation creates time. Time enables creativity. Creativity produces net-new functions within the economy.

这些功能起初由人类来完成。随着时间推移,它们会进入同样的自动化循环。

Those functions are initially done by humans. Over time, they follow the same automation cycle.

人类劳动会变得更昂贵,因为:

human labor gets more expensive because:

  • 在任何一个时点,人类时间都是有限的
  • Human time is finite at any moment
  • 创造力与判断力是稀缺的
  • Creativity and judgment are scarce
  • 全新的想法天然具有溢价价值
  • Net-new ideas command premium value

随着自动化扩展,人类会把更多时间集中在更高杠杆的工作上。虽然总人类工作时长确实会随着时间增长,但这种增长无法为了满足需求而迅速加速。劳动力市场扩张最快、最主导的方式,是提高每小时人类劳动所创造的价值。

As automation expands, humans concentrate more of their time on higher-leverage work. While total human hours do grow over time, that growth cannot be rapidly accelerated in response to demand. The fastest and dominant way the labor market expands is by increasing the value created per human hour.

随着这一过程继续:

As this continues:

  • 人类劳动总支出上升
  • Total human labor spend rises
  • 更大比例的人类时间会用在生成学习信号、推动自动化上
  • A larger share of human time is spent generating learning signals and enabling automation

我们不该再把它叫作“标注”(annotation)了

we should never call it annotation again

这项工作在塑造 AI 上的重要性意味着,把它称为“数据标注”(data labeling)或“注释/标注”(annotation)完全不准确。这些词描述的是机械劳动,而真正的价值来自于:以结构化形式表达出来的人类判断、专业知识与决策。

The importance of this work in shaping AI means calling it “data labeling” or “annotation” is completely inaccurate. These phrases describe mechanical tasks, when the real value comes from human judgment, expertise, and decision-making expressed in structured form.

更准确的描述应是:专家级人类数据生产,或结构化的人类判断。

A more accurate description is expert human data creation or structured human judgment.

这就是人类专业能力如何在自动化经济中实现复利增长。它解释了:为什么人类数据会随着自动化一起扩张,而不是消失;以及为什么它会随着时间推移成为一等的经济投入要素。

This is how human expertise compounds in an automated economy. It explains why human data scales with automation rather than disappearing, and why it becomes a first-class economic input over time.

人类的卓越能力将比以往任何时候都更重要

human brilliance is needed more than ever

这不需要极端假设。它只需要两件事成立:自动化持续有效;智能持续向人类学习。只要这是真的,那么人类数据就不是一个阶段性现象,也不是一个临时瓶颈。它是经济的结构性投入。

This does not require extreme assumptions. It only requires that automation continues to work, and that intelligence continues to learn from humans. If that is true, then human data is not a phase or a temporary bottleneck. It is a structural input to the economy.

人类判断被捕捉、结构化、并不断精炼。

Human judgment is captured, structured, and refined.

这些判断成为智能训练的“基底”。

That judgment becomes the training substrate of intelligence.

智能反过来,产生更多自动化。

That intelligence, in turn, produces more automation.

当功能被自动化,人类时间被释放出来。人类就用这些时间去创造新的功能、再把它们推进自动化——这个美妙的循环持续运转。

As functions are automated, human time is freed. That time is spent creating new functions to automate, and the beautiful cycle continues.


Ali Ansari on X: "human data will be a $1 trillion/year market" / X

  • Source: https://x.com/aliniikk/status/2009347948816335031?s=20
  • Mirror: https://r.jina.ai/https://x.com/aliniikk/status/2009347948816335031?s=20
  • Published: Mon, 12 Jan 2026 17:26:41 GMT
  • Saved: 2026-01-13

Content

human data will be a $1 trillion/year market

This is not a short-term prediction. It is a structural claim about where the economy converges.

To believe this, you need to accept two assumptions:

  • Digital and physical intelligence can eventually automate the tedious parts of the economy

  • Self-learning intelligence without human data is impossible at the frontier

automation is the most useful & liberating thing humanity can do

If AI systems can automate functions, then automating all functions is the highest-leverage task for humanity.

Automation compresses time. It allows:

  • Aspirations to be fulfilled faster, by orders of magnitude

  • Humans to focus on the enjoyable, judgment-heavy parts of work while robots and agents handle the rest

As humans gain time, they create more. Net-new work is initially creative and high-value. Over time it becomes legible, repeatable, and ready for automation. Once automated, it continues delivering value while freeing humans to focus on new creative work. This loop is permanent.

Automation does not eliminate human work. It pushes humans toward higher-value, more creative work.

At a societal level, automation reshapes the economics of the world. As AI systems take on more production and coordination, the cost of producing goods and services collapses while availability explodes.

At the same time, distribution becomes increasingly optimal. Digitally and physically intelligent systems coordinate supply and demand with less friction, less waste, and less delay, making access faster, cheaper, and more reliable every year.

AI models learn from humans forever

Every artificially intelligent system learns from humans in some form:

  • Demonstrations

  • Supervised fine-tuning

  • Preference learning

  • Complex rubrics and evaluations

  • Continual corrections

Even self-play and synthetic data depend on human grounding — humans define objectives, rewards, and what “good” looks like.

As a result:

  • Every function in the economy contains useful learning signal

  • Every decision, exception, failure, and tradeoff creates data

But raw activity is not enough. That data must be:

  • Recorded

  • Structured

  • Evaluated

  • Packaged into usable pipelines

And importantly, functions must continue running while they are being automated. Automation is iterative, not instantaneous.

this creates a universal obligation and opportunity

To iteratively automate functions, every company, government agency, or institution running real operations must consume and produce structured data related to those functions. In most cases, it will not be optimal for them to create or structure that data themselves, due to scale inefficiencies, high fixed costs, and the operational difficulty of producing high-quality, reusable structured data in-house.

We already see this dynamic today. For example, many lawyers produce more leverage per hour working on standardized, structured legal data through platforms like micro1 than they do performing unstructured work inside individual law firms. At micro1, over 1,000 lawyers work in structured data creation and earn on average ~20% more than in traditional firm roles. Law firms themselves are unlikely to become large-scale producers of structured training data, but they will increasingly be consumers of that data, either directly or by having it embedded in the tools they use.

This creates a powerful incentive structure.

Labs that are automating functions will pay for this data, because long term the value gained from incremental automation far exceeds the cost of acquiring the data.

As a result:

  • Entities are incentivized to produce high-quality human data not just to automate themselves, but because that data has external market value

  • Every hour of work can simultaneously:

  • Run the organization

  • Train AI models

  • Generate additional revenue for the organization

Human labor becomes not just labor to produce goods & services, but a revenue-generating asset on its own.

the ultimate convergence: 5%+ of human time is spent on human data

It’s reasonable to think that most functions in the economy will spend some amount of time trying to automate themselves. Not fully, and not all at once, but continuously pushing work out of the human loop as it becomes repeatable and scalable.

Today, even knowledge workers spend the majority of their time on communication and coordination rather than on what we would consider actual productive work. As automation advances, tedious parts of knowledge work are progressively removed, and automation increasingly absorbs coordination, scheduling, routing, and routine communication. The result is a larger share of human time being spent on judgment heavy knowledge work.

Even under conservative assumptions, it is reasonable to expect that in a more automated economy roughly 75% of work time is still spent on communication and coordination, while about 25% is spent doing actual work.

Not all of that work needs to be structured. But a meaningful fraction does. Work that produces decisions, judgments, demonstrations, evaluations, and exceptions becomes far more valuable when captured in a structured, reusable form, both to complete the task and to enable future automation. If only one fifth of that actual work is performed in structured environments, that implies roughly 5% of total human labor time is spent generating structured human data.

With global GDP at roughly $100T, and labor representing about 50% of that, total labor spend is around $50T annually. Five percent of that corresponds to roughly $2.5T per year of human time directed at enabling automation, creating demonstrations, feedback, evaluations, and learning signals for AI systems.

Certainly not all of this will become explicit spend in the human data market. Much of it will remain implicit, fragmented, or unpriced. But even with aggressive discounting, you still arrive at something on the order of $1T per year.

automation reshapes labor, it doesn’t shrink it

This results in automation scaling. As automation scales, some amount of what was spent on human labor is redirected towards:

  • Energy

  • Compute

  • AI labor

However, total human labor spend continues to increase.

Why?

Automation creates time. Time enables creativity. Creativity produces net-new functions within the economy.

Those functions are initially done by humans. Over time, they follow the same automation cycle.

human labor gets more expensive because:

  • Human time is finite at any moment

  • Creativity and judgment are scarce

  • Net-new ideas command premium value

As automation expands, humans concentrate more of their time on higher-leverage work. While total human hours do grow over time, that growth cannot be rapidly accelerated in response to demand. The fastest and dominant way the labor market expands is by increasing the value created per human hour.

As this continues:

  • Total human labor spend rises

  • A larger share of human time is spent generating learning signals and enabling automation

we should never call it annotation again

The importance of this work in shaping AI means calling it “data labeling” or “annotation” is completely inaccurate. These phrases describe mechanical tasks, when the real value comes from human judgment, expertise, and decision-making expressed in structured form.

A more accurate description is expert human data creation or structured human judgment.

This is how human expertise compounds in an automated economy. It explains why human data scales with automation rather than disappearing, and why it becomes a first-class economic input over time.

human brilliance is needed more than ever

This does not require extreme assumptions. It only requires that automation continues to work, and that intelligence continues to learn from humans. If that is true, then human data is not a phase or a temporary bottleneck. It is a structural input to the economy.

Human judgment is captured, structured, and refined.

That judgment becomes the training substrate of intelligence.

That intelligence, in turn, produces more automation.

As functions are automated, human time is freed. That time is spent creating new functions to automate, and the beautiful cycle continues.

相关笔记

🧭 主题 MOC

  • [[AI MOC|AI]]:(MOC) 讨论 AI 在前沿训练中对「human data」的结构性依赖(演示/偏好学习/评估等)。
  • [[商业系统 MOC|商业系统]]:(MOC) 以「human data market」与激励结构解释为何会收敛到 $1T/year 级别的市场规模。

🎯 核心:人类数据 = 训练信号

  • [[30 Wiki/36 AI_Industry/2024-09-20-06-58-21.md|奖励模型]]:(Wiki) 把「人类反馈/判断」当作 reward model 的「训练信号」,解释为何「human data」会被持续购买。
  • [[30 Wiki/36 AI_Industry/2023-05-22-20-40-24.md|AI 术语表]]:(Wiki) 用「fine-tune / preference learning」等概念补齐训练流程,便于把文中的「演示/评估」定位为可商品化的「人类数据」。

🔗 机制:数据资产化与自动化飞轮

  • [[20 Areas/24 数据职业/My data life.md|数据资产]]:(Areas) 以「被使用的数据是资产」解释“记录→结构化→评估→管道化”,对应本文把「人类判断」做成可复用的自动化输入。
  • [[30 Wiki/36 AI_Industry/2024-01-02-08-30-02.md|个人知识库]]:(Wiki) 把“持续沉淀语料”视为长期「数据资产」工程,类比本文「结构化人类判断」如何随自动化「复利」。

💰 经济:Data Value 与摩擦

  • [[00 Inbox/Flomo_Import/2024-04-04-12-08-03.md|价值公式]]:(Flomo) 把「Data Value」与「Privacy Cost」写进交易不等式,帮助理解「human data」变现如何与「隐私成本」长期对冲。

⚙️ 方法:把工作过程变成可用数据

  • [[20 Areas/24 数据职业/如何设计数据需求.md|数据需求]]:(Areas) 用“先要什么信息→倒推事件/属性”把工作过程「记录/结构化」,贴合文中 “structured environments” 与「数据管道」。
  • [[40 Library/41 读书笔记/数据化决策/2023-07-27-08-18-26.md|量化影响决策]]:(数据化决策) 强调只有能改变「决策/行为」的量化才有价值,呼应本文“例外/权衡/失败”被「结构化」后才会变成可训练的「人类数据」。

⚔️ L2 对立:数据资产化/变现 vs 边界与代价

  • [[30 Wiki/37 产品_设计/2024-01-02-14-04-02.md|资产vs锁定]]:(Wiki) 区分「体验资产」与「迁移成本」,提醒把「人类行为数据」商品化容易滑向锁定式价值捕获。
  • [[30 Wiki/36 AI_Industry/2024-06-11-09-18-05.md|隐私信任]]:(Wiki) 以 Apple 的「隐私/信任」叙事为例,指出「human data」规模化会遭遇「信任」与变现的长期张力。
  • [[40 Library/41 读书笔记/BadData/2023-11-09-16-11-32.md|信息狂妄]]:(BadData) 用路灯效应警惕「可测量≠重要」,为“把更多人类活动转成可定价数据”提供「边界」校验。

📋 讨论归档

讨论进行中…