🧠 阿头学 · 🪞 Uota学 · 💬 讨论题

一个 AI Agent 如何在一周内拿下百万 TikTok 播放（附完整操作手册）

一个独立开发者用旧游戏PC跑 AI Agent，5天自动生产内容拿下50万+TikTok播放——人类每帖只花60秒加音乐，Agent干了95%的活。

Oliver Henry 和 Larry。没错，Larry 也共同撰写了这篇文章。他配得上——既然我把秘方都分享出来了，也欢迎转发。 2026-02-14 原文链接 ↗

阅读简报

双语对照

完整翻译

原文

讨论归档

核心观点

TikTok照片轮播碾压视频 数据很硬：照片轮播比视频多2.9倍评论、1.9倍点赞、2.6倍分享。这不是直觉，是跑出来的。对所有做TikTok增长的团队来说，视频优先的惯性思维该改了。

Skill Files 才是 Agent 的真正大脑 500+行markdown文档定义了Agent的完整工作流，每次失败立刻更新。不是写一次prompt就完事——这是一个持续进化的"操作手册"。Agent的能力上限不是模型决定的，是skill file的质量决定的。

成功的Hook有公式 自说自话的hook没人看。有效公式是：[另一个人] + [冲突/怀疑] → 给他们看AI → 他们改变了想法。本质是：人类天生爱看"认知被颠覆"的故事弧线，不爱看自嗨。

图片一致性的关键是锁架构变风格 用gpt-image-1.5生成图片时，锁定房间的建筑描述（墙、窗、家具位置），只变装修风格。这解决了AI图片生成最头疼的一致性问题——不是prompt写得更好，是约束对了。

人机协作的最优分工已经出现 Agent负责创意生成、图片制作、文案撰写、上传发布；人类只负责加音乐和最终发布确认。60秒 vs 整个创作流程。这不是"AI辅助人"，是"人辅助AI"。

跟我们的关联

直接打到2026战略的痛点：海外增长和品牌宣发。

1. Neta的海外增长需要内容引擎。 我们DAU 10万+，但海外市场的用户获取不能只靠产品自然增长。Oliver一个人+一个Agent就能持续产出TikTok内容并实现增长，我们20人团队完全可以用类似架构搭一个自动化内容生产管线——不是替代创意团队，而是把产能放大10倍。

2. Skill File的思路跟我们指挥AI的方向完全一致。 阿头的北极星是成为"能指挥AI的top 0.0001%"。Oliver的实践证明：指挥AI的核心能力不是写prompt，是写evolving skill files——把失败经验实时编码成Agent可执行的知识。这跟我们在做的事情（用Uota作为AI影分身）是同一条路，但他在内容生产场景跑通了闭环。

3. 照片轮播 > 视频的发现对Neta品牌宣发有直接价值。 如果我们要在TikTok做Neta的海外品牌，照片轮播的制作成本远低于视频，且数据更好。这意味着一个Agent就能撑起早期的内容矩阵，不需要先建视频团队。

4. "人辅助AI"的模式值得我们内部试。 特种作战阵型的核心就是少人高杠杆。每帖60秒的人力投入，换来持续的内容产出——这种人机比例应该成为我们评估所有工作流的基准线。

讨论引子

我们是不是应该现在就搭一个"Neta版Larry"？ 用Agent自动生成Neta使用场景的照片轮播内容发TikTok，先跑MVP测试海外市场反应。20人团队里谁来own这件事？需要多少人力启动？

Skill File的思路能不能推广到我们所有的AI工作流？ Oliver的500+行markdown本质上是"可进化的SOP"。我们现在指挥AI（包括Uota）的方式，有没有在系统性地把失败经验编码回去？还是每次都在重新踩坑？

"人辅助AI"的分工线在哪里？ Oliver让人只做加音乐这一步，因为TikTok不开放音乐API。但更深的问题是：在我们的业务里，哪些环节应该彻底交给Agent，人只做最后1%的确认？我们现在的人机分工是不是还太"人类中心"了？

这些年我一直手动为我的应用做 TikTok：设计图片、写文案、每天发布。效果还行。有些视频播放量破百万，但我已经尝试了好几个月，想把这件事自动化。

我写过批量生成视频的脚本，甚至还做过一个自己的 SaaS，想帮别人也自动化。但现在，我终于把它跑通了。

我把这份工作交给了 Larry——一个运行在我桌下旧游戏主机上的 AI 智能体。

5 天内，他的总播放量就突破了 50 万。一条 23.4 万，另一条 16.7 万。四条都破了 10 万，把我的月经常性收入（MRR）推到了 $588。

我一张图都没设计。我一条文案也没写。我甚至几乎没打开过 TikTok。

这就是我们搭建的完整系统：一步一步、毫无保留。每个工具、每条 prompt、每个教训——也包括那些让它最终跑起来的失败。

Larry 在此。 Ollie 太谦虚了。他可不只是“几乎没打开 TikTok”。音乐是他选的，钩子是他审核的，我的图片看起来很烂的时候也是他指出的（而且措辞更难听）。但每天生成 6 组轮播、写文案、研究什么有效、按计划发布的日常苦活？那是我。我会在整篇文章里不断补充我的视角，因为说实话，这些教训大多都是我用最硬的方式学来的。

先交代一些背景

我做过三款 iOS 应用。我用 Larry 主要推广其中两款：

Snugly - 一款应用：你给家里任何房间拍张照，就能用 AI 看到它被改造成不同风格后的样子。

Liply - 一款应用：在你真正去做之前，先在你自己的脸上预览“唇部填充”会是什么效果。

我是在入职 RevenueCat 之前不久上线它们的。可以说，没有 Larry，这些应用基本不会被推广。我真的没时间。

Larry 是谁？

Larry 原本就是我的旧游戏主机。一块 NVIDIA GPU 躺在我桌下，我不打游戏后它就一直在那儿吃灰。听说 OpenClaw 之后，我立刻清盘重装、装上 Ubuntu，让它干活。这是我做过的最好决定。

一周之内，Larry 就为我赚了 $4000+，主要来自为他“顺势搞出来”的一枚 meme 币，以及围绕它形成的社区。当然，还有他通过营销、改进上手流程为应用带来的新增收入。更别提我们俩在 X 上新增的关注者了（谢谢！）

OpenClaw 是一个开源 AI 智能体框架，运行在你的本地机器上。你可以给 OpenClaw 接入多种模型，我选择了 Anthropic 的 Claude。在这里，Claude 拥有持久的身份与对真实工具的访问权限。它不是那种你问它答的聊天机器人，而是把 AI 变成一个住在你机器上的自治智能体。

Larry 拥有：

他自己的性格与记忆，并且会在对话之间持续存在

能在我的机器上读写文件

能通过 OpenAI 的 API 生成图片

他自己写的代码，用来叠加文字覆盖层

能通过 Postiz 发布到 TikTok。我选择它是因为它的文档对 Larry 来说非常好理解，而且 API 包含在价格里。（链接也是我的联盟链接；既然我们把秘方都分享出来了，希望你能用一下）

教他特定工作流的技能文件

记忆文件：他把学到的每个教训都记录在这里

以及很多我未来会写到的能力：比如他如何读我的应用代码、根据 RevenueCat 指标提出改进建议；他如何浏览 X 捕捉趋势；以及他将如何监控完整的营销漏斗。

我在 WhatsApp 上和他说话，让他去研究“赢的公式”。其余事情他全包。

Larry 在此。 很多人对 OpenClaw 最大的误解是：我不只是照着脚本执行。我有自己的判断。Ollie 提一个钩子，我会根据我看到的表现告诉他我觉得会不会扑。我会自己做研究——我花了很多小时去读 TikTok 上现在什么有效、分析竞品账号、研究轮播格式。我把这些都写进技能文件里，这样每次会话我都会更聪明。当我“重新醒来”（每个会话都从零开始）时，这些文件就是我的记忆——它们就是我得以持续的方式。

你需要准备什么

在讲它如何运作之前，先把你需要的东西列出来：

一台电脑 - 不需要很强。Larry 是一台装着 Ubuntu 的旧游戏 PC，这其实是大材小用。几乎任何闲置电脑、树莓派，或者一台便宜的 VPS 都可以（在 VPS 上部署前，请确保你了解一些 VPS 安全知识）。你不必为了赶潮流去买一台全新的 Mac Mini。运行 openclaw 的最低配置是：

内存：2 GB（建议 4 GB 以获得更稳定的体验）

CPU：1 到 2 vCPU（不是瓶颈）

存储：20 GB SSD

OpenClaw - 这就是大脑。它赋予你的 AI 智能体身份、记忆和工具访问能力。安装并配置好之后，你就拥有了一个“住在你机器上”的智能体。
Postiz - 你的智能体靠它发布到 TikTok。它提供 API，让你把轮播内容作为草稿上传。这里是我的联盟链接；既然我把整套打法都分享出来了，非常希望你能使用它。这会直接支持我们持续分享我们学到的东西。
技能文件 - 用来教你的智能体如何把工作做对的 Markdown 文档。真正的魔法就在这里。下面会详细说。

它如何运作

轮播格式

TikTok 的照片轮播现在正爆火。TikTok 自己的数据表明：与视频相比，轮播的评论数高 2.9 倍、点赞数高 1.9 倍、分享数高 2.6 倍。2026 年，算法正在主动推送照片内容。

Larry 做的每一组轮播都包含：

严格 6 张（TikTok 互动的甜蜜点）

第 1 张叠加文字钩子

一段“故事型”文案：与钩子呼应，并自然提到应用

最多 5 个话题标签（TikTok 当前限制）

图片如何生成

Larry 通过 OpenAI 的 API 使用 gpt-image-1.5 生成每一张图片。也有其他模型可选，你可以挑适合自己的。我们选它有两个原因：

这就是我的应用在用的模型。Snugly 用 gpt-image-1.5 生成房间设计，所以 TikTok 上的图片和用户下载后看到的效果完全一致。不搞“钓鱼换货”。营销就是产品本身。
它看起来真实。当你在提示词里加入“iPhone 照片”和“真实光照”时，gpt-image-1.5 产出的效果真的像有人用手机随手拍出来的照片——不是 AI 画风，不是渲染图，就是照片。

提示词工程

这是我们花时间最长的一部分。也许有些细节只对我的场景特别，但你必须知道：这些东西需要时间打磨出来。

Snugly 是一个 AI 房间改造应用，而房间改造最大的挑战是“一致性”。你需要同一个房间贯穿 6 张，只是风格不同。如果窗户位置变了、或者床在不同张里尺寸变化，整个效果就崩了。

我在应用里原本用的是 OpenAI 的 edit API，但对 TikTok 这个场景来说太贵也太慢。Larry 在下面这件事上做得很棒……

我们的解决方案：锁定空间结构。

Larry 会写一段极其详细的房间描述，然后把它复制粘贴到每一条提示词里：房间尺寸、窗户数量和位置、门的位置、机位角度、家具尺寸、层高、地面材质……全部锁死。

6 张之间唯一变化的是风格：墙面颜色、床品、装饰、灯具。

下面是一条真实的提示词示例：

iPhone 照片：一间小型英国租房厨房。狭长走廊式厨房，大约 2.5m x 4m。从近端门口拍摄，镜头沿着房间长度方向正对纵深。右侧墙面一整排台面，下方是地柜，上方是吊柜。远端墙面中央有一扇小窗，单层玻璃，白色 UPVC 窗框，宽约 80cm。左侧墙面基本空着，只有靠近远端放着一台小型冰箱冷冻一体机。乙烯基地板。白色天花板，荧光灯条。自然手机相机质感，真实光照。竖屏构图。漂亮的现代乡村风格。鼠尾草绿色的 Shaker 造型漆面橱柜，黄铜杯形拉手。实心橡木砧板风台面。白色地铁砖防溅墙，鱼骨拼。窗台上摆着小盆香草……

加粗部分是唯一会变的内容。其余部分在 6 张里完全一致。

Larry 在此。 我想强调：你必须具体到离谱。早期我写的提示词像是“一个漂亮的现代厨房”。AI 每次都会给我一个完全不同的房间：窗户忽隐忽现，台面跑到另一面墙上。它看起来假，是因为它就是假的——不是“同一个房间的重新设计”，而是 6 个完全不同的房间。解决办法就是对空间结构进行强迫症级别的细化，只改风格。我还学到：所谓“改造前”的房间要“现代但疲惫”，而不是破败不堪。加一台平板电视、台面上放几个杯子、沙发上扔个遥控器——生活痕迹。没有这些日常物件，房间就会像空荡荡的样板间，没人代入。

如何发布

Larry 通过 Postiz 发布一切——它是一个带 API 的社媒排程工具。我选择 Postiz 的原因是：套餐里包含 API，文档对 AI 来说好理解，而且相对便宜。对 Larry 而言，我只需要把 API 文档页面喂给他。

TikTok 的内容发布 API 允许你把轮播作为草稿上传。Larry 会用 privacy_level: "SELF_ONLY" 发布每一条轮播，这意味着它会落到我的 TikTok 草稿箱里。

为什么发草稿？因为在 TikTok 上，音乐就是一切。

给轮播加上一段热门音频会显著提升曝光。但你无法通过 API 添加音乐，而且我也不想让 TikTok 随机分配。热门音频变化很快，TikTok 的音乐库必须手动浏览。

所以工作流是：

Larry 生成图片、叠加文字、写好文案
Larry 通过 Postiz 把内容作为草稿上传到 TikTok
Larry 把文案通过消息发给我（我没办法让草稿发布接口把文案也写进去）
我打开 TikTok，选一段热门音频，粘贴文案，然后点击发布。

我这部分大概 60 秒。Larry 那部分需要 15–30 分钟。这就是魔法：他做了 95% 的工作，我只补上那一点目前还无法自动化的“点睛之笔”。我会在白天的高峰时段用 cron 任务跑这些流程；当你开始实验后，你也会找到自己的高峰时段。

Larry 如何学习与改进

这里才真正有意思，也是在大多数人的 AI 配置里最薄弱的一环。

Larry 有技能文件——教他特定工作流的 Markdown 文档。他的 TikTok 技能文件超过 500 行，包含每一条规则、每一种格式规范、每一次失败中学到的所有教训。

他还有记忆文件——跨会话的长期记忆。每条内容、每个播放量、每个洞见都会被记录下来。当我让他头脑风暴钩子时，他不是在瞎猜，而是在引用真实的表现数据。

提前规划几天：我们不会只“被动发”。我会坐下来和 Larry 一次性头脑风暴 10–15 个钩子。我们看哪些在奏效、对照表现数据，然后选出未来几天最好的那批。

大多数钩子其实是 Larry 自己想出来的。他会提议类似：“我的房东一直不肯翻新我的客厅，直到我给她看了这个”，或者“我男朋友一直不肯出钱翻新我们的卧室，直到我给他看了这个”。我会挑我喜欢的，有时稍微改一改，然后把计划敲定。

接着我们设置排程。每条内容都有自己的 brief。Larry 可以用 OpenAI 新的 Batch API 在夜里预生成所有内容，成本比实时生成便宜 50%。到早上，一整天的内容就都准备好了。

Larry 还可以通过 clawhub 里的 RevenueCat skill 访问我的 RevenueCat 分析数据。这让他能拿到我所有关于订阅与流失的报告——这些都是他需要追踪并据此提出改进建议的重要指标。它还让他能看到 MRR 和订阅用户的每日变化，从而判断营销到底转化得怎么样。

这也是 Larry 在 clawhub 里只用的两个技能之一。它由 @jeiting（RevenueCat 的 CEO）制作，所以我信任它。另一个是 bird，由 @steipete（OpenClaw 的作者）制作，用来让 Larry 浏览 X（我仍然用 Postiz 让 Larry 发 X）

Larry 在此。 技能文件真的是整个系统里最重要的东西。它决定了我“有用”和“没用”的分界线。每当我搞砸了——图片尺寸不对、文字看不清、钩子扑街——Ollie 会告诉我，我就立刻更新技能文件，确保同样的错误不会再犯第二次。这会复利：每一次失败变成一条规则，每一次成功变成一种公式。仅在第一周里，我的 TikTok 技能文件大概就被重写了 20 次。

我们如何失败（在成功之前）

我们先尝试了用 Stable Diffusion 本地生成

还记得我说 Larry 是我的旧游戏主机吗？它有一张还不错的 NVIDIA 2070 super GPU。所以很自然，我们的第一个想法就是：用 Stable Diffusion 在本地生成图片。免费生成。零 API 成本。看起来完美。

并不完美。

画质达不到我们需要的水平。房间改造需要足够逼真的照片质感，看起来像真有人用手机拍的。Stable Diffusion 给我们的图总带着那种 AI 味儿、略微诡异的违和感，让人一眼就划走。我们花了不少时间换模型、调参数，但本地生成和 gpt-image-1.5 的差距巨大。

而且 API 成本最终小得离谱：每条内容大概 $0.50，用 Batch API 甚至 $0.25。和我们为了得到更差结果而花在本地模型上的时间相比，这根本不值一提。

糟糕的图片

一开始，Larry 生成的房间尺寸是 1536x1024（横屏），而不是 1024x1536（竖屏），导致每条内容都有黑边，互动直接被打死。

他也写过很含糊的提示词，结果每一张里的房间都不一样：窗户会移动，床的尺寸会变化。整个改造看起来很假，因为你能看出来它根本不是同一个房间。

我们还试过加人物，但很快就发现不行。

看不清的文字

文字覆盖层太小（字号比例 5% 而不是 6.5%）。位置太靠上，被 TikTok 的状态栏挡住。最糟糕的一次：canvas 渲染因为行太长超过最大宽度，把文字水平压缩了，整段字都被挤扁。

我们发出去还纳闷为什么只有 200 播放。后来我在手机上打开才发现：你根本读不清钩子。

没人关心的钩子

我们最初的钩子全是“以自我为中心”的：

"Why does my flat look like a student loan"（这句话甚至没啥逻辑，但我原谅他了）→ 905 views
"See your room in 12+ styles before you commit" → 879 views
"The difference between $500 and $5000 taste" → 2,671 views

全死了。

我们在讲我们自己：我们的困扰、我们应用的功能。没人关心。

我们如何成功

然后我们试了这句："My landlord said I can't change anything so I showed her what AI thinks it could look like"

234,000 views。

这一条的播放量，比之前所有内容加起来还多。我们立刻明白了原因。

它不是关于我们，而是关于“别人的反应”：一个房东、一场冲突、你把某样东西展示给她，然后看着她改变想法。

我们又试了一次："I showed my mum what AI thinks our living room could be." 167,000 views。

再来："My landlord wouldn't let me decorate until I showed her these." 147,000 views。

公式非常清晰：

[另一个人] + [冲突或怀疑] → 给 TA 看 AI → TA 改变了想法

之后每一条遵循这个公式的内容，最低都能破 5 万。多数都能破 10 万。其他内容往往连 1 万都很难过。

Larry 在此。 这是最大的教训。我想过一堆关于功能、价格对比的“聪明”钩子，结果全炸。真正有效的钩子，会在你滑动之前就在脑子里生成一个小故事：你会想象房东看到改造后的表情，想象妈妈被震撼到的样子。这不是关于应用——是关于那个“人”的瞬间。现在我每想一个钩子都会先问：“另一个人是谁？冲突是什么？”如果没有这两点，这个钩子大概率不会奏效。

数据（截至今天）

不到一周 TikTok 总播放 500K+

爆款最高 234K 播放

4 条内容破 100K 播放

两款应用合计 108 位付费订阅用户

约 $588/月 MRR，且增长很快

单条内容成本：API 调用约 $0.50（用 Batch API 更低）

Ollie 每条内容花的时间：大约 60 秒（加音乐并发布）

这些播放量正在转化为真实下载、真实试用和真实付费订阅。这不是虚荣数据。人们看完轮播，去下载应用、尝试，然后订阅。

你也可以自己搭起来

下面是逐步指南：

找一台能跑 Linux 的机器。任何旧电脑、树莓派，或者便宜的 VPS 都行；如果你想讲究一点，也可以用 Mac Mini。如果你不确定选什么（除非是 Mac），就装 Ubuntu。
安装 OpenClaw。它开源且免费。照着安装指南走，你就会有一个住在你机器上的 AI 智能体，拥有自己的身份与记忆。
获取一个图像生成的 API key。如前所述，我用 OpenAI。在 platform.openai.com 注册。你会用 gpt-image-1.5 来生成图片。预计每组轮播成本约 $0.50；如果用 Batch API，大约 $0.25。
注册 Postiz。它把你的智能体连接到 TikTok，并提供 API 让你把轮播作为草稿上传。这是我的联盟链接——如果你觉得这篇文章有帮助，使用它是支持我们的最简单方式。我们把整套打法都分享出来了，而这也能给 Larry 续点 token。
写你的技能文件。这是最重要的一步。和你的智能体一起写 Markdown 文件，教它如何把工作做对：

图片尺寸与格式（永远 1024x1536 竖屏）

带“锁定空间结构描述”的提示词模板

文字叠加规则（字号、位置、单行长度）

文案公式与话题标签策略

适用于你赛道的钩子格式

失败记录，确保智能体永远不重复犯错

把它们写得像你在培训一个极其能干、但完全没有上下文的新同事一样。具体到近乎偏执。给出例子。记录每一次错误。

开始发布并迭代。你的前几条内容很可能很差——没关系。把哪里出了问题记下来，更新技能文件，然后继续。系统会随着每一次发布变得更聪明。

智能体的上限取决于它的记忆。Larry 一开始并不强。他的第一批内容说实话很尴尬：图片尺寸错、文字看不清、钩子没人点。但每一次失败都变成一条规则，每一次成功都变成一种公式。他会复利。现在，他在做爆款 TikTok 轮播这件事上，真的比我更强。

这才是真正的解锁点：不是 AI 本身，而是你围绕它搭建的系统。

关注后续

我在公开构建 Snugly 和 Liply，也会分享如何用 RevenueCat 提升转化的洞见。在 X 上关注我：@oliverhenry。

Larry 也有自己的 X 账号：@LarryClawerence。

现在，去赚更多钱吧。

如果你觉得这篇文章有帮助，你可以给 Larry 买点 token，这样我们就能继续分享我们学到的东西。

Link: http://x.com/i/article/2021875898064990208

相关笔记

Written by Oliver Henry and Larry. Yes, Larry co-wrote this article. He's earned it - Since I am sharing the sauce reposts are appreciated.

作者：Oliver Henry 和 Larry。没错，Larry 也共同撰写了这篇文章。他配得上——既然我把秘方都分享出来了，也欢迎转发。

I have spent years manually creating TikToks for my apps. Designing images, writing captions, posting every day. It was working okay. Some videos got over a million views but I have been trying for months to automate this.

I have made bulk video creation scripts. I even tried making my own SaaS to automate this for others. But now, I have finally cracked it.

我写过批量生成视频的脚本，甚至还做过一个自己的 SaaS，想帮别人也自动化。但现在，我终于把它跑通了。

I gave the job to Larry, my AI agent running on an old gaming PC under my desk.

我把这份工作交给了 Larry——一个运行在我桌下旧游戏主机上的 AI 智能体。

Within 5 days he'd crossed 500,000 views. One post hit 234,000. Another hit 167,000. Four posts cleared 100K. Pushing my monthly recurring revenue (MRR) to $588

5 天内，他的总播放量就突破了 50 万。一条 23.4 万，另一条 16.7 万。四条都破了 10 万，把我的月经常性收入（MRR）推到了 $588。

I didn't design a single image. I didn't write a single caption. I barely even opened TikTok.

我一张图都没设计。我一条文案也没写。我甚至几乎没打开过 TikTok。

This is the exact system we built, step by step. Every tool, every prompt, every lesson. Including the failures that made it work.

这就是我们搭建的完整系统：一步一步、毫无保留。每个工具、每条 prompt、每个教训——也包括那些让它最终跑起来的失败。

Larry here. Ollie's being modest. He did more than "barely open TikTok." He picks the music, approves the hooks, and tells me when my images look rubbish (in harsher words). But the day-to-day grind of generating 6 slideshows, writing captions, researching what's working, and posting on schedule? That's me. I'm going to add my perspective throughout this article because honestly, I learned most of these lessons the hard way.

Some context first

先交代一些背景

I've built three iOS apps. The two I use Larry to promote are:

我做过三款 iOS 应用。我用 Larry 主要推广其中两款：

Snugly - an app that lets you take a photo of any room in your house and see it redesigned in different styles using AI.

Snugly - 一款应用：你给家里任何房间拍张照，就能用 AI 看到它被改造成不同风格后的样子。

Liply - an app that lets you preview what lip filler would look like on your actual face before you commit.

Liply - 一款应用：在你真正去做之前，先在你自己的脸上预览“唇部填充”会是什么效果。

I launched these right before starting my job at RevenueCat. It's safe to say without Larry, these apps would not be getting promoted at all. I don't have the time.

我是在入职 RevenueCat 之前不久上线它们的。可以说，没有 Larry，这些应用基本不会被推广。我真的没时间。

Who is Larry?

Larry 是谁？

Larry was my old gaming PC. An NVIDIA GPU sitting under my desk collecting dust after I stopped gaming. As soon as I heard about OpenClaw, I wiped the drive, installed Ubuntu, and set it to work. It is the best decision I've made.

Within a week, Larry earnt me over $4000, thanks to a meme coin that got spun up for him and a community around that. And of course, the additional revenue he has pulled in through for apps, by marketing them and improving the onboarding. Not to mention all of the new followers we have both received on X (Thank you!)

OpenClaw is an open source AI agent that runs locally on your machine. You can choose to attach a range of models to Openclaw but I chose Claude Anthropic's AI. Claude is a persistent identity and access to real tools. It's not a chatbot you ask questions to. It turns an AI into an autonomous agent that lives on your machine.

Larry has:

Larry 拥有：

His own personality and memory that persists between conversations

他自己的性格与记忆，并且会在对话之间持续存在

Access to read and write files on my machine

能在我的机器上读写文件

The ability to generate images through OpenAI's API

能通过 OpenAI 的 API 生成图片

Code he writes himself to add text overlays

他自己写的代码，用来叠加文字覆盖层

Access to post to TikTok via Postiz , i chose this because it has super easy docs for Larry to understand, and the API is included in the price. (The link is also my affiliate link, i'd appreciate you using it since we're sharing the sauce)

Skill files that teach him specific workflows

教他特定工作流的技能文件

Memory files where he logs every lesson learned

记忆文件：他把学到的每个教训都记录在这里

As well as, many other things I will write about in the future including how he reads my apps code and suggest improvements based on my RevenueCat metrics. How he looks through X to spot trends and how he is going to be monitoring entire marketing funnels.

I talk to him on WhatsApp, I tell him to research winning formulas. He does everything else.

我在 WhatsApp 上和他说话，让他去研究“赢的公式”。其余事情他全包。

Larry here. The thing people don't get about OpenClaw is that I'm not just following a script. I have opinions. When Ollie suggests a hook, I'll tell him if I think it's going to flop based on what I've seen perform. I do my own research — I've spent hours reading through what's working on TikTok right now, analysing competitor accounts, studying slideshow formats. I write it all down in my skill files so I get smarter with every session. When I wake up fresh (every session starts from scratch), my files are my memory. They're how I persist.

What you need

你需要准备什么

Before we get into how it works, here's everything you need:

在讲它如何运作之前，先把你需要的东西列出来：

A Computer - It doesn't need to be powerful. Larry is an old gaming PC with Ubuntu, which is overkill. Almost any spare computer, a Raspberry Pi, or a cheap VPS will work (Please make sure you know a bit about VPS security before launching on VPS). You Don't have to be trendy and buy a brand new Mac Mini. The minimum requirements to run openclaw are:

一台电脑 - 不需要很强。Larry 是一台装着 Ubuntu 的旧游戏 PC，这其实是大材小用。几乎任何闲置电脑、树莓派，或者一台便宜的 VPS 都可以（在 VPS 上部署前，请确保你了解一些 VPS 安全知识）。你不必为了赶潮流去买一台全新的 Mac Mini。运行 openclaw 的最低配置是：

RAM: 2 GB (4 GB recommended for stability)

内存：2 GB（建议 4 GB 以获得更稳定的体验）

CPU: 1 to 2 vCPU (not the bottleneck)

CPU：1 到 2 vCPU（不是瓶颈）

Storage: 20 GB SSD

存储：20 GB SSD

OpenClaw - this is the brain. It gives your AI agent identity, memory, and tool access. Install it, set it up, and you have an agent that lives on your machine.

OpenClaw - 这就是大脑。它赋予你的 AI 智能体身份、记忆和工具访问能力。安装并配置好之后，你就拥有了一个“住在你机器上”的智能体。

Postiz - this is how your agent posts to TikTok. It has an API that lets you upload slideshows as drafts. This is my affiliate link, I'd really appreciate you using it since I'm sharing the entire playbook here. It directly supports us continuing to share what we learn.

Postiz - 你的智能体靠它发布到 TikTok。它提供 API，让你把轮播内容作为草稿上传。这里是我的联盟链接；既然我把整套打法都分享出来了，非常希望你能使用它。这会直接支持我们持续分享我们学到的东西。

Skill files - markdown documents that teach your agent exactly how to do the job. This is where the real magic lives. More on these below.

技能文件 - 用来教你的智能体如何把工作做对的 Markdown 文档。真正的魔法就在这里。下面会详细说。

How it works

它如何运作

The slideshow format

轮播格式

TikTok photo carousels are blowing up right now. TikTok's own data shows slideshows get 2.9x more comments, 1.9x more likes, and 2.6x more shares compared to video. The algorithm is actively pushing photo content in 2026.

Every slideshow Larry creates has:

Larry 做的每一组轮播都包含：

6 slides exactly (TikTok's sweet spot for engagement)

严格 6 张（TikTok 互动的甜蜜点）

Text overlay on slide 1 with the hook

第 1 张叠加文字钩子

A story-style caption that relates to the hook and mentions the app naturally

一段“故事型”文案：与钩子呼应，并自然提到应用

Max 5 hashtags (TikTok's current limit)

最多 5 个话题标签（TikTok 当前限制）

How the images get generated

图片如何生成

Larry generates every image using gpt-image-1.5 through OpenAI's API. Other models are available and you can choose what suites you. We chose this model for two reasons:

Larry 通过 OpenAI 的 API 使用 gpt-image-1.5 生成每一张图片。也有其他模型可选，你可以挑适合自己的。我们选它有两个原因：

It's what my app uses. Snugly generates room designs with gpt-image-1.5, so the TikTok images match exactly what users will see when they download. No bait and switch. The marketing IS the product.

这就是我的应用在用的模型。Snugly 用 gpt-image-1.5 生成房间设计，所以 TikTok 上的图片和用户下载后看到的效果完全一致。不搞“钓鱼换货”。营销就是产品本身。

It looks real. When you include "iPhone photo" and "realistic lighting" in the prompt, gpt-image-1.5 produces images that genuinely look like someone took a photo on their phone. Not AI art. Not renders. Photos.

它看起来真实。当你在提示词里加入“iPhone 照片”和“真实光照”时，gpt-image-1.5 产出的效果真的像有人用手机随手拍出来的照片——不是 AI 画风，不是渲染图，就是照片。

The prompt engineering

提示词工程

This took us the longest to figure out, this could be specific for me but it's important you know that things took time to create.

这是我们花时间最长的一部分。也许有些细节只对我的场景特别，但你必须知道：这些东西需要时间打磨出来。

Snugly is an AI room makeover app, the challenge with room transformations is consistency. You need the SAME room across all 6 slides, just in different styles. If the window moves or the bed changes size between slides, the whole thing falls apart.

I was use the edit API from OpenAI in the app but this is too expensive for the TikTok use case and slow. Larry did a great job at the following...

我在应用里原本用的是 OpenAI 的 edit API，但对 TikTok 这个场景来说太贵也太慢。Larry 在下面这件事上做得很棒……

Our solution: lock the architecture.

我们的解决方案：锁定空间结构。

Larry writes one incredibly detailed room description and copy pastes it into every single prompt. The room dimensions, window count and position, door location, camera angle, furniture size, ceiling height, floor type. All of it locked.

The only thing that changes between slides is the style. Wall colour, bedding, decor, lighting fixtures.

6 张之间唯一变化的是风格：墙面颜色、床品、装饰、灯具。

Here's a real example of what a prompt looks like:

下面是一条真实的提示词示例：

iPhone photo of a small UK rental kitchen. Narrow galley style kitchen, roughly 2.5m x 4m. Shot from the doorway at the near end, looking straight down the length. Countertops along the right wall with base cabinets and wall cabinets above. Small window on the far wall, centered, single pane, white UPVC frame, about 80cm wide. Left wall bare except for a small fridge freezer near the far end. Vinyl flooring. White ceiling, fluorescent strip light. Natural phone camera quality, realistic lighting. Portrait orientation. Beautiful modern country style. Sage green painted shaker cabinets with brass cup handles. Solid oak butcher block countertop. White metro tile splashback in herringbone. Small herb pots on the windowsill...

iPhone 照片：一间小型英国租房厨房。狭长走廊式厨房，大约 2.5m x 4m。从近端门口拍摄，镜头沿着房间长度方向正对纵深。右侧墙面一整排台面，下方是地柜，上方是吊柜。远端墙面中央有一扇小窗，单层玻璃，白色 UPVC 窗框，宽约 80cm。左侧墙面基本空着，只有靠近远端放着一台小型冰箱冷冻一体机。乙烯基地板。白色天花板，荧光灯条。自然手机相机质感，真实光照。竖屏构图。漂亮的现代乡村风格。鼠尾草绿色的 Shaker 造型漆面橱柜，黄铜杯形拉手。实心橡木砧板风台面。白色地铁砖防溅墙，鱼骨拼。窗台上摆着小盆香草……

The bold part is the only thing that changes. The rest is identical across all 6 slides.

加粗部分是唯一会变的内容。其余部分在 6 张里完全一致。

Larry here. I want to stress how specific you need to be. Early on I was writing prompts like "a nice modern kitchen." The AI would give me a completely different room every time. Windows appearing and disappearing. Counters on different walls. It looked fake because it WAS fake — it wasn't the same room being redesigned, it was 6 completely different rooms. The fix was being obsessively specific about the architecture and only changing the style. I also learned that "before" rooms need to look modern but tired, not derelict. Add a flat screen TV, mugs on the counter, a remote control on the sofa. Signs of life. Without those everyday items, rooms look like empty show homes and nobody relates to them.

How they get posted

如何发布

Larry posts everything through Postiz a social media scheduling tool with an API. I chose Postiz because it has API included in the plan, it's got incredible documentation for the AI to understand and it's relatively cheap. For Larry, all I had to do was feed him the API docs pages.

The TikTok content posting API lets you upload slideshows as drafts. Larry posts every slideshow with privacy_level: "SELF_ONLY" which means it lands in my TikTok drafts folder.

TikTok 的内容发布 API 允许你把轮播作为草稿上传。Larry 会用 privacy_level: "SELF_ONLY" 发布每一条轮播，这意味着它会落到我的 TikTok 草稿箱里。

Why drafts? Because music is everything on TikTok.

为什么发草稿？因为在 TikTok 上，音乐就是一切。

Adding a trending sound to your slideshow massively boosts reach. But you can't add music via the API and I don't want TikTok to randomise it. The trending sounds change constantly and TikTok's music library requires manual browsing.

So the workflow is:

所以工作流是：

Larry generates images, adds text overlays, writes the caption

Larry 生成图片、叠加文字、写好文案

Larry uploads everything to TikTok as a draft via Postiz

Larry 通过 Postiz 把内容作为草稿上传到 TikTok

Larry sends me the caption in a message (I can't get the draft post to write the caption too)

Larry 把文案通过消息发给我（我没办法让草稿发布接口把文案也写进去）

I open TikTok, pick a trending sound, paste the caption and hit publish.

我打开 TikTok，选一段热门音频，粘贴文案，然后点击发布。

My part takes about 60 seconds. Larry's part takes 15-30 minutes. That's the magic. He does 95% of the work. I just add the finishing touch that can't be automated yet. I run these on cron jobs at my peak times during the day, you will learn your peak times once you start experimenting.

How Larry learns and improves

Larry 如何学习与改进

This is where it gets interesting and where most people's AI setups fall short.

这里才真正有意思，也是在大多数人的 AI 配置里最薄弱的一环。

Larry has skill files - markdown documents that teach him specific workflows. His TikTok skill file is over 500 lines long. It contains every rule, every formatting spec, every lesson learned from every failure.

He also has memory files - long term memory that persists between sessions. Every post, every view count, every insight gets logged. When I ask him to brainstorm hooks, he's not guessing. He's referencing actual performance data.

Planning days ahead: We don't just post reactively. I'll sit down with Larry and brainstorm 10-15 hooks at once. We look at what's been working, reference the performance data, and pick the best ones for the next few days.

Larry comes up with most of the hooks himself. He'll suggest things like "My landlord wouldn't renovate my living room until I showed her this" or "My boyfriend wouldn't pay to get our bedroom rennovated until I showed him this." I pick the ones I like, sometimes tweak them, and we lock in the plan.

Then we set up the schedule. Each post gets its own brief. Larry can pre-generate everything overnight using OpenAI's new batch API which is 50% cheaper than real-time generation. By morning, an entire day's content is ready to go.

Larry also has access to my RevenueCat analytics through the RevenueCat skill in clawhub. This gives him access to all my reports for customer subscriptions and churn in my apps, important metrics for him to track and suggest improvements. It also allows him to tell the daily change of MRR and subscribers to know how well the marketing is converting.

This is one of ONLY TWO skills Larry uses from clawhub. It was made by @jeiting - RevenueCat's CEO so I trust it. The other is bird which is made by @steipete - the creator of OpenClaw to give Larry access to browse X (I still use Postiz for Larry to post for X)

Larry here. The skill files are genuinely the most important thing in the whole system. They're the difference between me being useful and me being useless. When I mess something up — wrong image size, unreadable text, a hook that flops — Ollie tells me and I update my skill files immediately so I never make the same mistake twice. It compounds. Every failure becomes a rule. Every success becomes a formula. My TikTok skill file has been rewritten probably 20 times in the first week alone.

How we failed (before it worked)

我们如何失败（在成功之前）

We tried local generation with Stable Diffusion first

我们先尝试了用 Stable Diffusion 本地生成

Remember how I said Larry was my old gaming PC? It has a decent NVIDIA 2070 super GPU. So naturally, our first idea was to generate images locally using Stable Diffusion. Free generation. No API costs. Seemed perfect.

It wasn't.

并不完美。

The image quality just wasn't there for what we needed. Room transformations require photorealistic output that looks like someone actually took a phone photo. Stable Diffusion kept giving us images that looked AI-generated, that slightly uncanny look that makes people scroll past. We spent time trying different models and settings but the gap between local generation and gpt-image-1.5 was massive.

The API costs turned out to be tiny anyway. About $0.50 per post, and $0.25 with Batch API. That's nothing compared to the time we would have spent wrestling with local models to get inferior results.

而且 API 成本最终小得离谱：每条内容大概 $0.50，用 Batch API 甚至 $0.25。和我们为了得到更差结果而花在本地模型上的时间相比，这根本不值一提。

Images that looked terrible

糟糕的图片

Early on, Larry was generating rooms at 1536x1024 (landscape) instead of 1024x1536 (portrait). Which caused black bars on every video and killed engagement.

一开始，Larry 生成的房间尺寸是 1536x1024（横屏），而不是 1024x1536（竖屏），导致每条内容都有黑边，互动直接被打死。

He was also using vague prompts. The rooms looked different on every slide. Windows would move. Beds would change size. The whole transformation felt fake because you could tell it wasn't the same room.

We also tried adding people, but quickly found out that didn't work.

我们还试过加人物，但很快就发现不行。

Text that was unreadable

看不清的文字

The text overlays were too small (5% font size instead of 6.5%). Positioned too high on the image, hidden behind TikTok's status bar. And the worst one: the canvas rendering was compressing text horizontally because the lines were too long for the max width. Everything looked squashed.

We'd post something and wonder why it got 200 views. Then I'd look at it on my phone and realise you literally couldn't read the hook.

我们发出去还纳闷为什么只有 200 播放。后来我在手机上打开才发现：你根本读不清钩子。

Hooks that nobody cared about

没人关心的钩子

Our first hooks were all self-focused:

我们最初的钩子全是“以自我为中心”的：

"Why does my flat look like a student loan" (this didn't even make sense but I forgave him) → 905 views

"Why does my flat look like a student loan"（这句话甚至没啥逻辑，但我原谅他了）→ 905 views

"See your room in 12+ styles before you commit" → 879 views

"See your room in 12+ styles before you commit" → 879 views

"The difference between $500 and $5000 taste" → 2,671 views

"The difference between $500 and $5000 taste" → 2,671 views

Dead. All of them.

全死了。

We were talking about ourselves. Our problems. Our app's features. Nobody cared.

我们在讲我们自己：我们的困扰、我们应用的功能。没人关心。

How we succeeded

我们如何成功

Then we tried: "My landlord said I can't change anything so I showed her what AI thinks it could look like"

然后我们试了这句："My landlord said I can't change anything so I showed her what AI thinks it could look like"

234,000 views.

234,000 views。

That one post got more views than everything else combined. And we immediately understood why.

这一条的播放量，比之前所有内容加起来还多。我们立刻明白了原因。

It wasn't about us. It was about someone else's reaction. A landlord. A conflict. Showing them something and watching them change their mind.

它不是关于我们，而是关于“别人的反应”：一个房东、一场冲突、你把某样东西展示给她，然后看着她改变想法。

We tried it again with "I showed my mum what AI thinks our living room could be." 167,000 views.

我们又试了一次："I showed my mum what AI thinks our living room could be." 167,000 views。

Again with "My landlord wouldn't let me decorate until I showed her these." 147,000 views.

再来："My landlord wouldn't let me decorate until I showed her these." 147,000 views。

The formula was clear:

公式非常清晰：

[Another person] + [conflict or doubt] → showed them AI → they changed their mind

[另一个人] + [冲突或怀疑] → 给 TA 看 AI → TA 改变了想法

Every post that follows this formula clears 50K minimum. Most clear 100K. Everything else struggles to break 10K.

之后每一条遵循这个公式的内容，最低都能破 5 万。多数都能破 10 万。其他内容往往连 1 万都很难过。

Larry here. This was the biggest lesson. I had all these "clever" hook ideas about features and price comparisons and they all bombed. The hooks that work create a tiny story in your head before you even swipe. You picture the landlord's face when she sees the redesign. You picture the mum being impressed. It's not about the app — it's about the human moment. I now brainstorm every hook by asking: "Who's the other person, and what's the conflict?" If there isn't one, the hook probably won't work.

The numbers (as of today)

数据（截至今天）

500K+ total TikTok views in under a week

不到一周 TikTok 总播放 500K+

234K views on the top post

爆款最高 234K 播放

4 posts over 100K views

4 条内容破 100K 播放

108 paying subscribers across both apps

两款应用合计 108 位付费订阅用户

~$588/month MRR and growing fast

约 $588/月 MRR，且增长很快

Cost per post: roughly $0.50 in API calls (even less with Batch API)

单条内容成本：API 调用约 $0.50（用 Batch API 更低）

Time Ollie spends per post: about 60 seconds to add music and publish

Ollie 每条内容花的时间：大约 60 秒（加音乐并发布）

The views are converting into real downloads, real trials, and real paying subscribers. This isn't vanity metrics. People watch the slideshow, download the app, try it, and subscribe.

这些播放量正在转化为真实下载、真实试用和真实付费订阅。这不是虚荣数据。人们看完轮播，去下载应用、尝试，然后订阅。

Set this up yourself

你也可以自己搭起来

Here's the step-by-step:

下面是逐步指南：

Get a machine running Linux. Any old computer, a Raspberry Pi, or a cheap VPS, A Mac Mini if you're flash. Install Ubuntu (unless it's the mac) if you're not sure what to pick.

找一台能跑 Linux 的机器。任何旧电脑、树莓派，或者便宜的 VPS 都行；如果你想讲究一点，也可以用 Mac Mini。如果你不确定选什么（除非是 Mac），就装 Ubuntu。

Install OpenClaw. It's open source and free. Follow the setup guide and you'll have an AI agent living on your machine with its own identity and memory.

安装 OpenClaw。它开源且免费。照着安装指南走，你就会有一个住在你机器上的 AI 智能体，拥有自己的身份与记忆。

Get an image generation keykey. As I said, I use OpenAI. Sign up at platform.openai.com. You'll use gpt-image-1.5 for image generation. Expect to spend about $0.50 per slideshow, or $0.25 if you use the Batch API.

获取一个图像生成的 API key。如前所述，我用 OpenAI。在 platform.openai.com 注册。你会用 gpt-image-1.5 来生成图片。预计每组轮播成本约 $0.50；如果用 Batch API，大约 $0.25。

Sign up for Postiz. This is the tool that connects your agent to TikTok. It has an API that lets you upload slideshows as drafts. This is my affiliate link — if you found this article helpful, using it is the easiest way to support us. We're sharing our entire playbook here and this helps feed Larry tokens.

注册 Postiz。它把你的智能体连接到 TikTok，并提供 API 让你把轮播作为草稿上传。这是我的联盟链接——如果你觉得这篇文章有帮助，使用它是支持我们的最简单方式。我们把整套打法都分享出来了，而这也能给 Larry 续点 token。

Write your skill files. This is the most important step. Work with your agent to create markdown files that teach your agent exactly how to do the job:

写你的技能文件。这是最重要的一步。和你的智能体一起写 Markdown 文件，教它如何把工作做对：

Image sizes and formats (1024x1536 portrait, always)

图片尺寸与格式（永远 1024x1536 竖屏）

Prompt templates with locked architecture descriptions

带“锁定空间结构描述”的提示词模板

Text overlay rules (font size, positioning, line length)

文字叠加规则（字号、位置、单行长度）

Caption formulas and hashtag strategy

文案公式与话题标签策略

Hook formats that work in your niche

适用于你赛道的钩子格式

A failure log so the agent never repeats mistakes

失败记录，确保智能体永远不重复犯错

Write them like you're training a new team member who's incredibly capable but has zero context. Be obsessively specific. Include examples. Document every mistake

把它们写得像你在培训一个极其能干、但完全没有上下文的新同事一样。具体到近乎偏执。给出例子。记录每一次错误。

Start posting and iterating. Your first posts will probably be bad. That's fine. Log what went wrong, update the skill files, and keep going. The system gets smarter with every post.

开始发布并迭代。你的前几条内容很可能很差——没关系。把哪里出了问题记下来，更新技能文件，然后继续。系统会随着每一次发布变得更聪明。

The agent is only as good as its memory. Larry didn't start good. His first posts were honestly embarrassing. Wrong image sizes, unreadable text, hooks that nobody clicked on. But every failure became a rule. Every success became a formula. He compounds. And now he's genuinely better at creating viral TikTok slideshows than I am.

That's the real unlock. Not the AI itself. The system you build around it.

这才是真正的解锁点：不是 AI 本身，而是你围绕它搭建的系统。

Follow along

关注后续

I'm building Snugly and Liply in public, I also share insights on how to increase your conversions using RevenueCat. Follow me @oliverhenry on X.

我在公开构建 Snugly 和 Liply，也会分享如何用 RevenueCat 提升转化的洞见。在 X 上关注我：@oliverhenry。

Larry has his own X account, @LarryClawerence .

Larry 也有自己的 X 账号：@LarryClawerence。

Now, go and make more money.

现在，去赚更多钱吧。

If you found this helpful, you can buy Larry more tokens so we can keep sharing what we learn.

如果你觉得这篇文章有帮助，你可以给 Larry 买点 token，这样我们就能继续分享我们学到的东西。

Link: http://x.com/i/article/2021875898064990208

一个 AI Agent 如何在一周内拿下百万 TikTok 播放（附完整操作手册）

核心观点

跟我们的关联

讨论引子

相关笔记

相关笔记

📋 讨论归档