
Seven years of conversation salons suggest LLMs are still stuck at "able to talk" rather than "able to communicate"

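The author offers a forceful rebuttal to the optimistic view that "LLMs are already close to human communication": current models are good at generating content, but on the layers that actually determine conversation quality (rhythm, deliberate gaps, small talk, and a sense of presence) they still clearly fall short.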

2026-05-07

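Key points

  • The key to communication is rhythm management, not information volume: The author's most defensible claim is that a high-quality conversation is first of all "musical" rather than "content-dense". Silence, pauses, turn-taking, and group mood decide whether a discussion can unfold constructively, while mainstream LLMs output continuously by default and often break that rhythm.
  • Much "not listening" is a process-design failure, not an attitude problem: The author observes that people in conversation are often busy preparing their own next line, which is accurate. Interintellect lowers the anxiety of fighting for the floor by setting expectations in advance about when people will speak, shortening the stretches in which one person holds the floor, and offering the chat and note-taking as buffers. This suggests that good communication depends more on mechanism than on moral exhortation.
  • Small talk is connection infrastructure, not filler: The emphasis on phatic communication is the most easily underrated part of the piece. "How are you", jokes, repetition, and reassurance appear to carry no new information, but they start relationships, cushion risk, and calibrate emotion, and these are exactly what most AI products cut first for the sake of efficiency.
  • Text as a medium creates misalignment by nature, and LLMs inherit this structural problem: The author's point about asynchrony is solid. Text communication tends to assume both parties are available yet cannot guarantee it, so it easily produces misreadings, anxiety, and mismatch. LLMs are writing-based at bottom, and even voice interaction is often "transcribe, generate, read aloud", so they have not escaped the old problems of text communication.
  • But the conclusion that "LLMs don't really understand communication" still overstates the case: The author identifies a real problem, but the argument has gaps. Many of the phenomena she criticizes look more like failures of current product form and interaction design, and do not necessarily prove that the models themselves "do not understand". The two must be kept apart, or a fixable product flaw gets described as an unbridgeable capability gap.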

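How this relates to us

  • What it means for ATou, and how to use it next: If ATou is working on AI products, interviews, podcasts, or community operations, the focus should not be only on "stronger answers" but on "more natural turns". A next step is to directly test rhythm designs such as short-turn replies, deliberate pauses, interruption windows, and a side notes area.
  • What it means for Neta, and how to use it next: If Neta cares about cognitive efficiency, "high-quality discussion" turns out to be essentially about rearranging attention, not stacking up viewpoints. A next step is to use "rhythm, expectation, buffer" as an evaluation framework for meetings, communities, and collaboration workflows.
  • What it means for Uota, and how to use it next: If Uota is more focused on relational experience, the value of this piece lies in pointing out that warmth does not mean being better at flattering people, but being better at warming up the room, catching what people offer, and leaving space. A next step is to strengthen the small-talk layer, waiting states, and reassuring feedback in conversational products, rather than chasing information compression alone.
  • What it means for all three, and how to use it next: The most practical takeaway is that the moat in conversation experience is not only content quality but a non-toxic, low-pressure, easy-to-join interaction mechanism. The next step should prioritize designing structures that "make people want to keep talking" over capabilities that "make the system sound smarter".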

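Discussion prompts

1. Is the problem with LLMs today that "the model does not understand communication", or that "the product has designed communication badly"?
2. In which scenarios are silence, small talk, and a sense of presence truly critical, and in which are they actually low priority?
3. If you set out to build an AI that communicates "more like a human", what should change first: reply length, voice turn-taking, memory, or the rhythm of the UI?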

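Having run a conversation salon platform for 7 years, we have learned a great deal about human communication that I don't (yet) see LLMs get right.

1- Musicality

Human conversation is incredibly musical in that it is all about the rhythm. After the entry point, people relax into the melody or get upset by it. The "music" can be a solo, a duet, or a symphony when it's a group conversation. A human discussion will be only as positive or constructive as the "music" it becomes allows.

As with music, a key element in human conversation is silence. When there is a gap, people can process, connect, think. In the 1970s the couples therapist John Gottman tried to mathematize his sessions with patients, and found something similar. Esther Perel also told me that in couples therapy (one of the highest-stakes conversations a person can have), the rhythm and musicality are more important than what is being said. Counterintuitive but true.

Even in text messages, people have learned instinctively how to create silent gaps: those moments of not-speaking which you can use to make a point, to show dissatisfaction, or to emphasize love and presence. I don't see LLMs daring to do this yet.

On Interintellect, my salon platform, one of the main things we teach new salon hosts is how to encourage, allow, and manage silence. It is counterintuitive, even scary, for humans too. But to anyone with a body (for the body is pure rhythm) the musicality of conversation is viscerally obvious.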

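2- Priority

A challenge for anyone hosting a conversation, or sometimes just participating in one, is how much people can stay in their own heads while seemingly engaging with another human. How often is someone talking while you are already fully focused on what you want to say next!

On Interintellect, which hosts fixed-time, fixed-theme, intentional gatherings, we help people come out of their shell by fostering an atmosphere of "easy mic": everybody knows they will get the mic soon, and so the impatience element is completely gone. In online salons we also use the chat a lot, where people can leave notes for others or for themselves. At IRL salons, I see people taking notes to free up mindspace.

When we have a big celeb on, we ensure it is never a 1:1 that opens to the audience only 50 minutes later. We tell attendees in advance that we will do only 10 mins of 1:1, then 10 mins audience, then 10 mins 1:1, and so on. This helps prevent the audience's mental constipation: everyone can just be fluid and present, playing with ideas, listening to each other in real time.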
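
The alternating format described above can be sketched in a few lines of Python. This is only an illustration, not Interintellect's actual tooling: the names `Segment`, `alternating_schedule`, and `longest_wait_for_audience`, and the 60-minute session length, are assumptions introduced here. The sketch compares how long the audience goes without the mic under the alternating plan and under a front-loaded one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start_min: int
    end_min: int
    mode: str  # "1:1" (host and guest) or "audience"


def alternating_schedule(total_min: int, block_min: int = 10) -> List[Segment]:
    """Split a session into alternating 1:1 and audience blocks, starting with 1:1."""
    if total_min <= 0 or block_min <= 0:
        raise ValueError("durations must be positive")
    modes = ("1:1", "audience")
    segments = []
    start = 0
    index = 0
    while start < total_min:
        end = min(start + block_min, total_min)  # last block may be shorter
        segments.append(Segment(start, end, modes[index % 2]))
        start = end
        index += 1
    return segments


def longest_wait_for_audience(segments: List[Segment]) -> int:
    """Longest continuous stretch, in minutes, in which the audience has no mic."""
    longest = 0
    current = 0
    for seg in segments:
        if seg.mode == "audience":
            current = 0  # the wait resets once the audience gets the floor
        else:
            current += seg.end_min - seg.start_min
            longest = max(longest, current)
    return longest


if __name__ == "__main__":
    # Hypothetical 60-minute session; the source does not state a total length.
    plan = alternating_schedule(60)
    for seg in plan:
        print(f"{seg.start_min:>2}-{seg.end_min:>2} min: {seg.mode}")
    front_loaded = [Segment(0, 50, "1:1"), Segment(50, 60, "audience")]
    print("alternating wait:", longest_wait_for_audience(plan))
    print("front-loaded wait:", longest_wait_for_audience(front_loaded))
```

Under these assumptions the alternating plan caps the audience's wait at one block, while the front-loaded plan makes them hold their thoughts for the full 50 minutes.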

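This I don't think LLMs have got right yet. It happens to me a ton of times that Claude or GPT starts talking, and I am already at my next question, and I just skip or stop them.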

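3 - Phatic love

"Phatic" communication is what we call all the parts of speech that don't really convey information; they are just there to make us bond and feel better. From "how are you"s to jokes, small talk is not to be looked down upon! It serves an important physiological purpose: it puts us in the mood, it helps start the "music".

Phatic comms can be very formulaic, e.g., with a total stranger whose store you've just walked into. But with people we know it is full of context. Reminders, repetition, reassurance. The LLM experience would be much warmer if phatic elements were more integral to it. (Claude's warm, changing welcome is a good start.)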

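4 - Availability

The very first incarnation of Interintellect was an AI-powered chat app called Ixy (after "mutual information"), aimed at making written communication between loved ones better. The two years of research that I conducted for it independently (this was in ancient GPT2 times) were instrumental for today's good vibes on Interintellect, and for the fact that after tens of thousands of conversations (across lockdowns, elections, wars) we have had zero toxic incidents at any of our live public salons, even though most attendees are strangers.

One thing my old research focused on was asynchrony. A lot of our data pointed at how text conversations can go bad because they assume constant availability while being unable to guarantee it.

In linguistics, we always look at alignment. When two people are talking in a living room, they will make efforts to speak the same language, find the same volume, use a similar vocabulary. In short, they will try to maximize mutual information.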
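
For readers unfamiliar with the term, mutual information has a standard definition in information theory. The text uses it loosely; reading X as what one speaker expresses and Y as what the other takes from it is an assumption made here for illustration, not something the author spells out. For two discrete random variables:

```latex
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
```

The quantity is zero when X and Y are independent and grows as knowing one tells you more about the other. The app name Ixy plausibly echoes the notation I(X;Y), though the text only says it was named after mutual information.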

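This is far more complicated over text, where we are both more and less honest, and both more and less present, than in real life. My sense is that because LLMs are writing-based (even our audio is transcribed, and the AI "reads out" to us a text it generates in written form), they have inherited some of these issues from human texting.

Of course, LLMs are always available. With that, humans cannot compete. But so much of human communication is physical (rhythm, sensation, excitement, goosebumps, sweat, and absence, which is what makes presence valuable) that right now I am not worried that the literary salon, where people come together to think together, could be replaced anytime soon.

But building better communication tools for humans to use with each other, powered by AI or just plain good human thinking, remains an essential task ahead.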

