🧠 ATou Study · 💬 Discussion Topics

An Engineering Workflow for Generating Game Sprite Animations with GPT 5.4 + Image 1.5

The key to AI-generated game assets is not how powerful the model is, but the engineering constraints of "shipped-frame anchoring + one-shot full-strip generation + script normalization," which turn the model from a creative generator into a controllable asset-variant machine.

2026-03-12 · Original link ↗

Key Takeaways

  • Anchoring beats prompting: Using an already-shipped sprite frame as the seed forces the AI to work under a "complete this" rather than "create this" constraint, which locks in proportions, palette, and silhouette consistency better than any elaborate prompt.
  • Spatial beats sequential: Don't call the model frame by frame; lay the entire animation out on a single 1024×1024 canvas so the AI handles all frames in one inference pass, using the Transformer's self-attention to keep visual features coherent.
  • Global scale + local padding: When poses differ in height (a raised-sword pose is taller than standing), frames must not be scaled independently or the character will grow and shrink between frames; apply one shared scale factor and let pose differences show up as in-frame position changes and padding.
  • Deterministic scripts take over the fuzzy parts: A Python post-processing script handles detection, scaling, alignment, and anchor locking — the strictly logical operations — rather than relying on the AI's judgment; this is the golden ratio of 80% AI generation + 20% deterministic code.
  • The engineering essence of the workflow: The whole methodology is a four-step loop of "anchor asset → structured canvas → one-shot global generation → programmatic normalization," transferable to any scenario where AI output must plug directly into production.

What It Means for Us

  • For ATou: This workflow cuts game-asset production costs by 1–2 orders of magnitude, shifting a solo studio's competitiveness from "who has the art resources" to gameplay creativity and narrative depth; a next step is applying the "anchor + structured canvas + script normalization" pattern to other asset types such as UI state graphics and icon sets.
  • For Neta: The lesson for AI-agent design is: don't let the model repeatedly hallucinate over local patches; instead use a toolchain plus structural constraints to "sandwich" the model, so it answers fill-in-the-blank questions rather than open-ended essays; a next step is thinking about how to apply this "anchor + constrain" pattern systematically across multi-step workflows.
  • For Uota: This is a model of "lock the production standard first, then talk about AI creativity" product thinking; the next step is to explicitly define what "directly usable" means (size, style, quality) and design the prompts and toolchain backward from that, rather than generating first and adapting later.
  • For developers in general: "Baseline assets + reusable scripts" let non-specialists safely produce usable assets, lowering team coordination costs; a next step is writing this method into design specs or the rendering pipeline as a standardized asset-production process.

Discussion Starters

  • Is this workflow still reliable for a brand-new project with no shipped seed frame? If generating the first frame is itself highly random, is the whole "production line" just an extension of luck?
  • The author only demonstrates a simple 4-frame animation (a hurt reaction); for complex 8–12 frame loops (running, attack combos), does the AI's ability to maintain spatial and perspective consistency on a single 1024 canvas still hold?
  • Are the Python script's core algorithms ("auto-detect the character outline, compute a shared scale") still reliable for non-engineers, or for complex backgrounds and multi-character scenes? How hard is it to integrate this toolchain with existing engines (Unity, Godot)?



  • Source: https://x.com/chongdashu/status/2031743032266043687?s=46
  • Published: 2026-03-11T14:44:21+00:00
  • Saved: 2026-03-12


I recently posted this tweet that showed how I created an animated pixel art pirate game character.

Since that tweet, I've gone ahead and improved the image generation.

A lot of you showed interest (thanks!), and even more had questions about the workflow I used. Rather than using threads, I thought I'd give X articles a try to showcase what I'm doing to get these results.

Disclaimer: This workflow is still experimental, i.e. I am still figuring out the best way to get good results. So treat it as an educational resource; it's not meant to be a definitive guide in any way.

Let's go!

Workflow Overview

The core idea is simple:

  1. start from one approved in-game frame

  2. ask GPT Image to create a whole animation strip around that frame

  3. normalize the strip into fixed-size game frames

  4. rebuild the asset index and preview it in-engine

1. Start From A Shipped Seed Frame

We learned that consistency gets much better when the model is anchored to the actual production sprite, not a loose concept.

For the hurt animation, the seed image was the following generated frame (I'll write a separate tutorial on how I got this first image, but this tweet does cover the high-level approach).

That matters because it locks in:

  • the face

  • the body proportions

  • the palette

  • the silhouette

2. Build A Reference Canvas For The Edit API

We do not send the raw 64x64 sprite directly. We upscale it with nearest-neighbor and place it into a larger transparent edit canvas with reserved frame slots.

For the hurt run, taking the above idle/frame-01.png and putting it into a 1024x1024 canvas, we get the following:

That canvas is created with a Python script.

3. Ask For The Full Strip, Not Tiny Frame Edits

One thing to note is that trying to generate animations frame-by-frame did not work well for character consistency. Instead, the better method is to ask for one whole strip at once.

So for the hurt animation, I used the prompt:

Intended use: candidate production spritesheet for a 2D side-view pirate platformer hurt animation review. Edit the provided transparent reference-canvas image into a single horizontal four-frame hurt spritesheet. The existing sprite in the leftmost slot is the exact shipped idle-v2 starting frame and must remain the starting frame for this sequence: same compact pirate hero, same right-facing side view, same red bandana, same blue tunic, same brown boots, same tan skin, same readable face, same proportions, same pixel-art silhouette family. Composition: keep the image transparent, keep exactly one row of four equal 256x256 frame slots laid out left to right across the 1024x1024 canvas, centered vertically, no overlap between frame slots, no extra characters, no labels, no UI. Action: frame 1 stays as the calm idle starting pose, frames 2 through 4 show a short hurt reaction from a hit, with the same pirate recoiling backward, torso pulled back, head jolted, one brief pain expression, then slight recovery. Keep body size, head size, and outfit proportions consistent across all four frames. Style: authentic 16-bit pixel art, crisp pixel clusters, stepped shading, restrained palette, production game asset, not concept art. Constraints: no sword, no weapon, no scenery, no floor, no glow, no atmospheric haze, no impact effects, no shadows outside the sprite contours, no collage, no poster layout, no blurry details. Keep wide transparent empty space outside the four frame slots.

This gave a consistent sequence as you can see below:

4. Normalize The Strip Into Real Game Frames

The raw GPT strip is not yet game-ready. We import it into a standardised format with 64x64 frames using another Python script. Basically, what it does is:

  • detecting the sprite components in the raw strip

  • using the player 'anchor' image (idle/frame-01.png)

  • computing one shared scale for the whole animation

  • padding each frame into a 64x64 transparent canvas

  • optionally locking frame 01 to the exact shipped idle frame

That last part is important for hurt. We explicitly replace exported 01.png with the real idle frame so the animation starts from the exact sprite already used in-game.

This results in 'normalised' spritesheets with a standard frame size for each panel.

This works surprisingly well even for more complex animations. For example, see the attack animation below:

https://x.com/chongdashu/status/2031474716704436484


5. Lessons

5.1 Handling Complex Poses

Problems can arise when one pose is taller than another. For example:

  • a sword-up attack frame is naturally taller than a neutral frame

  • if you scale that pose down on its own, the whole character looks smaller

The fix was:

  • use one global scale for the whole strip

  • let pose differences show up as extra height inside the frame

  • use padding and a shared anchor instead of per-frame rescaling
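A toy calculation makes the fix concrete (the sprite heights below are made up for illustration):

```python
def shared_scale(bbox_heights, target=64):
    """One global scale for the whole strip: fit the tallest pose, so
    shorter poses keep the same character size and just get more padding."""
    return target / max(bbox_heights)

# The sword-up frame (180 px tall in the raw strip) sets the scale for all frames.
heights = [150, 150, 180, 160]                 # illustrative raw bounding-box heights
s = shared_scale(heights)
scaled = [round(h * s) for h in heights]       # [53, 53, 64, 57]
```

Per-frame rescaling would instead map every pose to 64 px tall, making the character in the neutral frames visibly larger than in the attack frame; the shared scale keeps body size constant and lets the height difference show up inside the frame.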

In Summary

If we want consistent AI-generated sprite animations, the workflow should be:

  1. pick the exact shipped sprite frame to anchor from

  2. generate a full strip in one request

  3. normalize with one shared scale

  4. lock frame 01 back to the shipped sprite when the animation should start from idle

  5. verify in the preview scene before treating it as production-ready

That is the part that has made the results noticeably more stable.

Link: http://x.com/i/article/2031720286639333376

📋 Discussion Archive

Discussion in progress…