
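AI Pixel Art Workflow Experiment: The Usability Limits of Gemini in Game Art

Gemini has practical value for generating simple pixel-art motions, but complex continuous actions still require manual decomposition and frame-by-frame trial and error. The overall efficiency gain is heavily overstated, and the article offers no comparison against alternatives.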

2026-03-15

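Key Points

  • Context locking is the core value, not single-shot image quality. Keeping the character design inside one conversation and supporting local edit instructions turns the AI from a "blind-box gacha pull" into a "fine-tunable production module." This does help long-term iteration on a single character, but the article inflates the advantage into "significantly speeding up prototyping."
  • Frame-by-frame generation of complex actions is, in essence, manual compensation. The author breaks a combo into per-frame instructions ("raise the sword" → "swing down") and then rerolls until satisfied. This is not a breakthrough in AI capability; it is manual process design making up for the model's fundamental weakness in continuity, and its cost is badly underestimated.
  • The makeshift fix for transparent backgrounds exposes the engineering cost. A pure black background plus manual cutout in Photoshop is workable for a handful of frames, but at a production scale of hundreds or thousands of frames this step becomes a pipeline bottleneck. The article plays down this hidden cost as "just a bit of extra effort."
  • The sample is extremely narrow and the extrapolation excessive. The test covers one character, one viewing angle, and one theme, yet it arrives at a general-purpose "practical workflow." Real project needs such as multiple characters working together, different resolutions, and varied enemy types are entirely unverified.
  • No comparison with alternatives. There is no time or cost comparison against Stable Diffusion, Leonardo.ai, off-the-shelf asset packs, or a traditional pixel artist, so the real value of the "more practical than expected" conclusion cannot be verified.

How This Relates to Us

  • What it means for indie developers (ATou). If you are building a single-character proof of concept or a quick demo, Gemini's "infinite gacha" plus local edits does lower the barrier. But do not expect it to replace the whole art pipeline: complex actions still need heavy manual trial and error, and transparency processing is tedious. Next step: use the author's workflow to produce a complete 8-frame attack animation, time how long it actually takes, and compare that with the cost of outsourcing or an asset pack.
  • What it means for managing a team's art. If you want to use AI as a "junior art assistant," be explicit about its granularity limits: it suits basic idle and walk cycles, not complex continuous actions. Design the process at the granularity of "AI generates single frames, humans arrange them" rather than expecting end-to-end automation. Next step: establish an SOP for generating an "atomic action library" that spells out which action types go to AI and which to traditional methods.
  • What it means for judging AI products. When evaluating AI capability, do not ask "can it do the whole feature"; ask "at which granularity (frame, block, field) is it stable." Gemini is stable at "single-frame generation plus local edits" but breaks down on "multi-frame continuity." Product design should build the process around the stable granularity instead of trying to push past the model's capability boundary. Next step: do a "granularity breakdown" of your own AI tools and pin down the success rate and cost at each granularity.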
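One way to make the "granularity breakdown" concrete is to log attempts and keepers per granularity and derive a success rate from them. The sketch below is illustrative only: `TrialLog`, its fields, and any numbers you feed it are hypothetical, and the expected-attempts figure assumes each generation is an independent trial with the same chance of being accepted.

```python
from dataclasses import dataclass


@dataclass
class TrialLog:
    """Hypothetical record of generation attempts at one granularity."""
    granularity: str  # e.g. "single frame" or "multi-frame combo"
    attempts: int     # generations requested
    accepted: int     # generations that were kept


def success_rate(log: TrialLog) -> float:
    """Fraction of attempts that produced a usable result."""
    if log.attempts <= 0:
        raise ValueError("at least one attempt is required")
    return log.accepted / log.attempts


def expected_attempts_per_keeper(log: TrialLog) -> float:
    """Average rerolls needed per usable result (assumes independent trials)."""
    rate = success_rate(log)
    if rate == 0:
        return float("inf")  # nothing accepted so far
    return 1 / rate
```

Comparing the expected attempts for "single frame" against "multi-frame combo" logs gives a rough, measured version of the stability claim instead of an impression.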

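Discussion Starters

  • If the cost of "frame-by-frame rerolling plus manual arrangement" is quantified, does this workflow really save time compared with buying a cheap asset pack or outsourcing? Or does it only feel fast?
  • Will Gemini's limit of 100 rerolls per day become a bottleneck in real production? When a team shares one account, how should this quota be divided?
  • The article never mentions the common problems of AI-generated pixel art (jagged edges, color bleed, inconsistent pixel grids). How serious will these be once the assets land in Unity?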

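Practical Workflow for AI Pixel Art: Gemini 2D Animation Experiment

#gamedev #unity3d #tooling #ai For this experiment, I put together a workflow that uses Gemini's image generation model, "Nano Banana 2," to generate sprite sheets for a 2D action game and import them into Unity, along with some tips for stabilizing the output. To jump straight to the conclusion: the results were far more practical than expected.

【The Prompt Used】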

//{Prompt}
//2D side-scrolling pixel art style (dot-picture).
//Only requires the side view; no other directions.
//Create a female sword-wielding protagonist with long red hair.
//The style should be RPG-like, a fusion of medieval and modern fashion, making it cute.
//The character naturally has red hair, which changes to platinum blonde when holding a sword. 
//Include a visual sequence for the transition from red to platinum blonde while drawing the sword.
//Must include the following actions:
//1. Idle Action
//2. Attack Actions (Combo 1,2,3)
//3. Walking/Running Action
//4. Death Action
//5. Action of eating a steamed bun
//Pure black background for all of the above actions.
//Each action must have its corresponding animation frames.

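【Basic Action Generation and the Strength of the "Infinite Gacha"】

First, basic motions such as idling and walking. Here Gemini handles the task very well.

The biggest advantage of using Gemini is that you can "pull the gacha" (regenerate) up to 100 times a day until you are satisfied. Character design consistency is also very high: as long as you stay in the same chat, you can keep deriving different action modules from the same character.

Furthermore, you can take a generated image and give instructions like, "Keep this part unchanged, but fix only this specific area." This makes it possible to make repeated minor adjustments while preserving the character's overall design and vibe. When you are building a full set of motions for the same character over a long period, this feature is close to indispensable.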
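The 100-per-day regeneration cap mentioned above translates directly into a frame budget. The following is a rough planning sketch, not a measurement: the cap is the figure quoted in this article and may differ by plan, while the function names and the average-attempts input are assumptions you would replace with your own numbers.

```python
import math

# Figure quoted in the article; treat it as an assumption for your own account.
DAILY_REGENERATION_CAP = 100


def frames_per_day(avg_attempts_per_frame: float,
                   cap: int = DAILY_REGENERATION_CAP) -> int:
    """Frames you can expect to finish in one day under the cap."""
    if avg_attempts_per_frame < 1:
        raise ValueError("each frame needs at least one attempt")
    return math.floor(cap / avg_attempts_per_frame)


def days_needed(total_frames: int, avg_attempts_per_frame: float,
                cap: int = DAILY_REGENERATION_CAP) -> int:
    """Days required to finish a set of frames at the given reroll rate."""
    per_day = frames_per_day(avg_attempts_per_frame, cap)
    if per_day == 0:
        raise ValueError("average attempts per frame exceed the daily cap")
    return math.ceil(total_frames / per_day)
```

For example, at an assumed average of 5 attempts per frame, `frames_per_day(5)` gives 20 frames, so a 40-frame set would take `days_needed(40, 5)` = 2 days.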

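【Challenges and Limitations of Complex Attack Combos】

Simple movements come out very well, but once you specify "complex and continuous actions" such as attack combos, keeping the movement consistent suddenly becomes difficult.

Countermeasures and Compromises:

Instead of trying to generate a complex action in a single shot, break down the flow and give instructions frame by frame.
Split the prompt into steps like "raise the sword" and "swing down," then keep regenerating (loop) like a gacha game until you pull the output closest to your ideal.

There is still significant room for improvement in the overall fluidity of the animation, but viewed frame by frame, the pixel art is of more than sufficient quality. Sometimes it even renders effects that can be used directly as VFX.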
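The frame-by-frame decomposition described above can be prepared as a list of per-step prompts before you start rerolling. This is a minimal sketch under stated assumptions: `BASE_STYLE` and the sentence template are hypothetical wording, not the prompts actually used in this experiment, and the code only builds strings (it does not call Gemini).

```python
# Hypothetical shared preamble; adapt it to the character set up in your chat.
BASE_STYLE = (
    "2D side-scrolling pixel art style. Side view only. "
    "Same character as earlier in this chat. Pure black background."
)


def build_frame_prompts(action: str, steps: list[str]) -> list[str]:
    """Return one prompt per step so each frame can be rerolled on its own."""
    total = len(steps)
    prompts = []
    for index, step in enumerate(steps, start=1):
        prompts.append(
            f"{BASE_STYLE} {action}, frame {index} of {total}: {step}. "
            "Output this single frame only."
        )
    return prompts


combo_prompts = build_frame_prompts(
    "Attack combo 1", ["raise the sword", "swing down"]
)
```

Keeping the steps in a list makes it easy to reroll only the frame that failed instead of regenerating the whole combo.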

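Background Transparency Workflow (Black Background + Photoshop)

One clear issue at this point is that the AI alone cannot directly output a "PNG image with a transparent background."

Countermeasure:
In the prompt, force the instruction "the background must be pure black (or another solid color such as green)." Then bring the generated image into a tool like Photoshop, cut out the black background, and make it transparent. It takes a bit of extra manual work, but for now this is the most reliable method.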
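The manual cutout can in principle be replaced by a simple color key. The sketch below is not the Photoshop workflow used here; it is a swapped-in illustration that works on plain nested lists of RGB tuples, so loading and saving the actual image file is left out. `key_out_background` and its `tolerance` parameter are hypothetical names.

```python
Pixel = tuple[int, int, int]
PixelRGBA = tuple[int, int, int, int]


def key_out_background(pixels: list[list[Pixel]],
                       key: Pixel = (0, 0, 0),
                       tolerance: int = 0) -> list[list[PixelRGBA]]:
    """Make every pixel within `tolerance` of `key` fully transparent."""

    def is_background(pixel: Pixel) -> bool:
        # Compare each channel against the key color.
        return all(abs(c - k) <= tolerance for c, k in zip(pixel, key))

    return [
        [(r, g, b, 0) if is_background((r, g, b)) else (r, g, b, 255)
         for (r, g, b) in row]
        for row in pixels
    ]
```

Note that a global key like this also erases black pixels inside the character, such as dark outlines, which is one reason to consider a green background instead when the sprite itself uses black.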

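Summary

Pros:

Simple movements (idle, walk, etc.) can be generated well while staying consistent.

By continuing in the same chat, you can mass-produce derivative motions of the same character.
Partial modification instructions are possible while preserving the base image.

Cons (Challenges):

Continuity is hard to maintain for complex actions (combos, etc.), which require frame-by-frame instructions and repeated trial and error (gacha).
Because transparent backgrounds cannot be output directly, you must specify a solid-color background and cut it out manually with an external tool.

While still in the trial-and-error stage, it already shows enough potential as a tool to "dramatically speed up prototype production." I plan to keep verifying and testing it!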


  • X (Twitter): @kenjiDev9662 (I post dev logs every day!)

