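# *My Daughter Is a Detective* (《我的女儿是侦探》): Complete AI Video Generation Tutorial (Three Approaches)

# From Zero to a Finished Suspense Short-Drama Video

# Contents

Overview of the Approaches
Approach A: LoRA Training + Standard ComfyUI Workflow
Approach B: Fixed Seed + ControlNet Workflow
Approach C: Reusing a Character Reference Sheet
Audio Generation
Final Assembly
Complete Workflow JSON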

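# 1. Overview of the Approaches

# Comparing the Three Approaches

| Approach | Consistency | Difficulty | Time investment | Best for |
| --- | --- | --- | --- | --- |
| A: LoRA training | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | High up front, low afterwards | Long-term projects, multi-episode production |
| B: Fixed seed + ControlNet | ⭐⭐⭐⭐ | ⭐⭐ | Medium | Short- to medium-term projects |
| C: Reusing a reference sheet | ⭐⭐⭐ | ⭐ | Low | Quick tests, single-episode production |

# Which One to Choose

If you plan to produce the full 10 seasons: choose Approach A (LoRA training).
If you are only test-producing Episode 1: choose Approach C (reference-sheet reuse).
If you already have ComfyUI experience: choose Approach B (fixed seed + ControlNet).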

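# 2. Approach A: LoRA Training + Standard ComfyUI Workflow

# Stage 1: Train a Character LoRA

# Step 1.1: Prepare the Training Images

Training image requirements for Li Kewen (李柯文):

| Requirement | Details |
| --- | --- |
| Quantity | 15-20 images |
| Content | Different angles, expressions, and outfits |
| Quality | High resolution, clean background |
| Format | PNG or JPG |

Generating the images from prompts: use Stable Diffusion or Midjourney with prompts like the ones below. Keep only images that clearly show the same face. Separate generations of "35 year old Asian man" usually produce different people, and a LoRA trained on a mix of them learns a blurred average instead of one character.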

```
# Front view
35 year old Asian man, programmer, wearing glasses, black rectangular frames, short black hair, neutral expression, studio lighting, white background

# Side view
35 year old Asian man, side profile, wearing glasses, programmer

# Smiling
35 year old Asian man, gentle smile, wearing glasses

# Thinking
35 year old Asian man, deep in thought, wearing glasses, serious expression

# Tired
35 year old Asian man, tired expression, stubble, wearing glasses
```

File naming convention:

likewen_01_front.png
likewen_02_side.png
likewen_03_smile.png
likewen_04_thinking.png
likewen_05_tired.png
...

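# Step 1.2: Train the LoRA

Method 1: Kohya_ss (recommended)

Kohya_ss is one of the most widely used LoRA training tools. It is a GUI built on top of the kohya-ss/sd-scripts training scripts.

Installation:

```bash
# Clone the GUI repository (kohya_gui.py lives in bmaltais/kohya_ss;
# kohya-ss/sd-scripts is the command-line backend it wraps)
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss

# Install dependencies with the bundled setup script (setup.bat on Windows)
./setup.sh

# Launch the GUI (gui.bat on Windows); gui.sh activates the venv and runs kohya_gui.py
./gui.sh
```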

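Training parameters (training a Flux.1 LoRA requires a kohya_ss release with Flux support):

| Parameter | Recommended value | Notes |
| --- | --- | --- |
| Base Model | flux1-dev.safetensors | Flux base model |
| Resolution | 512,768 | Training resolution |
| Batch Size | 1 | Images per batch |
| Epochs | 10-20 | Number of passes over the dataset |
| Learning Rate | 0.0001 | Learning rate |
| Network Rank | 16 | LoRA rank |
| Network Alpha | 16 | LoRA alpha |

Training workflow:

1. Prepare the dataset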

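Training folder structure:

```
train_data/
└── likewen/                 # image folder selected in the GUI
    ├── 10_likewen/          # <repeats>_<name>: each image is repeated 10 times per epoch
    │   ├── image1.png
    │   ├── image1.txt       # caption file
    │   ├── image2.png
    │   └── image2.txt
```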
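The `10` in the `10_likewen` folder name and the epoch count together decide how long training runs. The sketch below gives a rough estimate. It assumes kohya's usual bookkeeping: steps per epoch = images × repeats ÷ batch size, rounded up, with no regularization images and no gradient accumulation. The numbers are illustrative, not measured.

```python
import math


def total_training_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Estimate total optimizer steps for a kohya-style LoRA run.

    Assumes steps per epoch = ceil(num_images * repeats / batch_size);
    regularization images and gradient accumulation are ignored.
    """
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# 20 images in the 10_likewen folder (10 repeats), 10 epochs, batch size 1
print(total_training_steps(20, 10, 10))  # 2000
```

With 15-20 images at 10 repeats and 10-20 epochs, this puts a run at roughly 1,500-4,000 steps.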

2. Create a caption file for each image (one .txt with the same base name as the image)

```
# Contents of image1.txt:
10_likewen, 35 year old Asian man, programmer, wearing glasses, short black hair, neutral expression, studio lighting, white background
```
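Writing caption files by hand gets tedious with 15-20 images. The sketch below writes a `<name>.txt` next to each image. It assumes the folder layout above and the `10_likewen` trigger token used in this tutorial. The `DESCRIPTIONS` entries are hypothetical placeholders; replace them with your own.

```python
from pathlib import Path

TRIGGER = "10_likewen"  # trigger token used in this tutorial's captions

# Hypothetical descriptions keyed by image file name (without extension)
DESCRIPTIONS = {
    "likewen_01_front": "35 year old Asian man, programmer, wearing glasses, short black hair, "
                        "neutral expression, studio lighting, white background",
    "likewen_03_smile": "35 year old Asian man, gentle smile, wearing glasses",
}

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def write_captions(image_dir: str, trigger: str, descriptions: dict) -> list:
    """Write <stem>.txt next to each image; every caption starts with the trigger token.

    Images without a description get the trigger token alone, so none is left uncaptioned.
    """
    written = []
    # sorted() builds the full list first, so newly written .txt files are not revisited
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        description = descriptions.get(image.stem)
        caption = f"{trigger}, {description}" if description else trigger
        txt_path = image.with_suffix(".txt")
        txt_path.write_text(caption, encoding="utf-8")
        written.append(txt_path)
    return written

# Usage: write_captions("train_data/likewen/10_likewen", TRIGGER, DESCRIPTIONS)
```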

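3. Run the training
- Open the Kohya_ss GUI
- Select the "LoRA" tab
- Set the parameters
- Click "Start Training"
4. Wait for it to finish
- Training time: roughly 30-60 minutes (depends on the GPU)
- Output: likewen_lora.safetensors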

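Method 2: Online training platforms

| Platform | Website | Strength |
| --- | --- | --- |
| Civitai | civitai.com | Free to start (training is paid with Buzz, Civitai's site credits) |
| SeaArt | seaart.ai | Easy to use |
| Tensor.Art | tensor.art | Fast |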

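# Step 1.3: Test the LoRA

Test it in ComfyUI.

Workflow:

```
[Flux model] → [Load LoRA] → [CLIP Text Encode] → [Flux Sampler] → [VAE Decode] → [Save Image]
```

Prompt:

```
10_likewen, 35 year old Asian man, programmer, wearing glasses, sitting at desk
```

LoRA settings:
- LoRA file: likewen_lora.safetensors
- Strength: 0.8-1.0

What to check:
- Is the face consistent across generations?
- Are the character's defining features (glasses, short black hair) preserved?
- Can the LoRA produce different poses?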

# Stage 2: Generate Empty Scene Shots

# Step 2.1: Prepare the Scene Prompts

The 8 main scenes:

```
# Scene 1: Office late at night
empty corporate office at night, single desk lamp illuminating workspace, computer screen glowing blue, scattered takeout boxes and coffee cups, empty chair, city lights through window, cold blue tone, cinematic lighting, 8k

# Scene 2: Living room at home
modern Chinese family living room, warm lighting, comfortable sofa, family photos on wall, evening atmosphere, cozy and welcoming, 8k

# Scene 3: Mengmeng's bedroom
little girl's bedroom, pastel colors, small bed with pink blanket, stuffed toys, night light, warm soft lighting, cute and cozy, 8k

# Scene 4: Company conference room
corporate conference room, cold fluorescent lighting, long table, documents on table, minimal decoration, sterile atmosphere, 8k

# Scene 5: Hospital corridor
hospital corridor, white walls, windows with afternoon sunlight, clean and bright, medical atmosphere, 8k

# Scene 6: Family dining room
home dining room, dinner table set for three, Chinese dishes, warm overhead lighting, evening, 8k

# Scene 7: Study
home study room, desk with computer, bookshelves, desk lamp on, late night atmosphere, quiet, 8k

# Scene 8: Master bedroom
master bedroom at night, moonlight through curtains, bed, nightstand, peaceful atmosphere, 8k
```

# Step 2.2: Generate the Scene Images

ComfyUI workflow:

```
[Flux model] → [CLIP Text Encode] → [Flux Sampler] → [VAE Decode] → [Save Image]
```

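Parameter settings:
- Steps: 30
- CFG: 1.0 on the sampler, with guidance of about 3.5 set through a FluxGuidance node. Flux.1 Dev is guidance-distilled, so a conventional CFG of 7 tends to produce burnt, oversaturated images and doubles sampling time.
- Seed: fixed (e.g. 123456789). This reduces style drift between scenes but does not guarantee a matching look, since each scene uses a different prompt.
- Size: 832x480 (the video aspect ratio)

Save as: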

scene_01_office_night.png
scene_02_living_room.png
scene_03_mengmeng_room.png
scene_04_conference_room.png
scene_05_hospital_corridor.png
scene_06_dining_room.png
scene_07_study_room.png
scene_08_bedroom.png
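Before compositing, you can confirm that all eight plates exist and came out at 832x480 by reading each PNG's size from its IHDR header. This needs only the standard library. The sketch below is a minimal version: the folder path is up to you, and it only handles PNG files.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
SCENE_FILES = [
    "scene_01_office_night.png",
    "scene_02_living_room.png",
    "scene_03_mengmeng_room.png",
    "scene_04_conference_room.png",
    "scene_05_hospital_corridor.png",
    "scene_06_dining_room.png",
    "scene_07_study_room.png",
    "scene_08_bedroom.png",
]


def png_size(path: Path) -> tuple:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)  # signature (8) + chunk length (4) + "IHDR" (4) + width, height (8)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    return struct.unpack(">II", header[16:24])


def check_scenes(scene_dir: str, expected: tuple = (832, 480)) -> list:
    """Return a list of problems: missing scene files or images of the wrong size."""
    problems = []
    for name in SCENE_FILES:
        path = Path(scene_dir) / name
        if not path.exists():
            problems.append(f"missing: {name}")
            continue
        size = png_size(path)
        if size != expected:
            problems.append(f"{name}: {size[0]}x{size[1]}, expected {expected[0]}x{expected[1]}")
    return problems

# Usage: print(check_scenes("scenes") or "all 8 scenes OK")
```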

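# Stage 3: Generate Character Images (Transparent Background)

# Step 3.1: Use LayerDiffusion

Install the LayerDiffusion plugin:

```bash
cd ComfyUI/custom_nodes/
git clone https://github.com/huchenlei/ComfyUI-LayerDiffusion.git
```

Restart ComfyUI. Note that ComfyUI-LayerDiffusion works with SD 1.5 and SDXL models, not Flux, so a Flux-trained LoRA such as likewen_lora cannot be used in this pipeline. If you stay on Flux, generate the character on a plain white background and remove the background afterwards (for example with rembg, as in Approach C).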

# Step 3.2: Generate the Character on a Transparent Background

ComfyUI workflow:

```
[Flux model] → [Load LoRA: likewen_lora] → [LayerDiffusion: Apply] ─┐
[CLIP Text Encode: positive prompt] ────────────────────────────────┼→ [Flux Sampler] → [LayerDiffusion: Decode] → [Save Image: PNG with alpha channel]
[CLIP Text Encode: negative prompt] ────────────────────────────────┘
```

Prompts:

```
# Positive
10_likewen, 35 year old Asian man, programmer, wearing glasses, gray hoodie, sitting pose, transparent background, PNG format

# Negative
background, scenery, multiple people, blurry
```

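# Step 3.3: Generate Different Poses

Generate a matching character pose for each scene:

| Scene | Character pose | Prompt addition |
| --- | --- | --- |
| Office | Sitting, typing | sitting at desk, typing on keyboard |
| Home | Standing, changing shoes | standing, taking off shoes |
| Conference room | Sitting, reading documents | sitting at table, looking at documents |
| Hospital | Standing, talking | standing, talking, side view |
| Dining room | Sitting, eating | sitting at dining table |
| Study | Sitting, thinking | sitting at desk, thoughtful |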
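Only the pose phrase changes from scene to scene, so the prompts can be built from one fixed character string. The sketch below assumes the following:
- The dictionary keys are illustrative.
- The base prompt is the Step 3.2 positive prompt with `sitting pose` and `PNG format` dropped. This is an editorial assumption, made so that the base prompt does not contradict the standing poses.

```python
# Character part of the Step 3.2 positive prompt, minus "sitting pose" and "PNG format"
BASE_PROMPT = ("10_likewen, 35 year old Asian man, programmer, wearing glasses, "
               "gray hoodie, transparent background")

# Pose additions from the table above (keys are illustrative scene IDs)
POSE_ADDITIONS = {
    "office": "sitting at desk, typing on keyboard",
    "home": "standing, taking off shoes",
    "conference_room": "sitting at table, looking at documents",
    "hospital": "standing, talking, side view",
    "dining_room": "sitting at dining table",
    "study": "sitting at desk, thoughtful",
}


def pose_prompt(scene: str) -> str:
    """Return the full positive prompt for one scene: fixed character text + pose phrase."""
    return f"{BASE_PROMPT}, {POSE_ADDITIONS[scene]}"


for scene in POSE_ADDITIONS:
    print(f"{scene}: {pose_prompt(scene)}")
```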

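# Stage 4: Composite the First Frame

# Step 4.1: Use Image Composite Masked

ComfyUI workflow:

```
[Load Image: scene background] ──────── destination ──┐
                                                       ├──→ [Image Composite Masked] ──→ [Save Image]
[Load Image: character PNG] ──────────── source ──────┤
        │ MASK output                                  │
        └──→ [Invert Mask] ─────────────── mask ────────┘
```

Load Image builds its MASK output from the PNG's alpha channel but inverted (transparent pixels become 1). Pass it through Invert Mask before using it as the composite mask; otherwise the area around the character is pasted instead of the character itself.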

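Parameter settings:

| Parameter | Description | Example |
| --- | --- | --- |
| x | Horizontal offset of the character's top-left corner, in pixels | 400 |
| y | Vertical offset of the character's top-left corner, in pixels | 200 |
| resize_source | Whether to resize the character image to the background size | false |

Positioning tips:
- Test the position on a low-resolution version first
- Record the x, y values that look right
- Apply them to the full-resolution image, scaled by the same factor as the image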
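When you move from the low-resolution test to the final image, the offsets scale with the image. The sketch below assumes the background and the character PNG are both resized by the same factor. The function name is illustrative.

```python
def scale_position(x: int, y: int, test_size: tuple, final_size: tuple) -> tuple:
    """Scale an (x, y) paste offset from a test-size image to the final image size.

    Assumes the background and the character layer are resized by the same
    factor, so the offset scales independently on each axis.
    """
    scale_x = final_size[0] / test_size[0]
    scale_y = final_size[1] / test_size[1]
    return round(x * scale_x), round(y * scale_y)


# Position found on a 416x240 preview, reused on the 832x480 frame
print(scale_position(200, 100, (416, 240), (832, 480)))  # (400, 200)
```

Remember to resize the character PNG by the same factor. Otherwise the character will sit in the right place but at the wrong size.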

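# Step 4.2: Batch Compositing Script

```python
# composite_scenes.py
# Pastes transparent character PNGs onto scene backgrounds at fixed (x, y) offsets.
import os
from PIL import Image

SCENE_DIR = "scenes"
CHAR_DIR = "characters"
OUT_DIR = "composites"

# Configuration: background file and a list of (character file, x, y) per scene
SCENES = {
    "office": {"bg": "scene_01_office_night.png", "chars": [("likewen_sitting.png", 400, 200)]},
    "living_room": {"bg": "scene_02_living_room.png", "chars": [("likewen_standing.png", 300, 250)]},
    "dining_room": {"bg": "scene_06_dining_room.png", "chars": [
        ("likewen_sitting.png", 300, 280),
        ("xiaoyun_sitting.png", 500, 280),
        ("mengmeng_sitting.png", 400, 350),
    ]},
}

os.makedirs(OUT_DIR, exist_ok=True)  # create the output folder if it does not exist

# Batch compositing
for scene_name, config in SCENES.items():
    bg = Image.open(os.path.join(SCENE_DIR, config["bg"])).convert("RGBA")

    # Characters are pasted in list order, so later entries appear in front
    for char_file, x, y in config["chars"]:
        char = Image.open(os.path.join(CHAR_DIR, char_file)).convert("RGBA")
        bg.paste(char, (x, y), char)  # the RGBA image doubles as its own alpha mask

    out_path = os.path.join(OUT_DIR, f"{scene_name}_composite.png")
    bg.save(out_path)
    print(f"Saved: {out_path}")
```

Run it:

```bash
python composite_scenes.py
```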

# Stage 5: Generate the Video

# Step 5.1: Wan I2V Generation

ComfyUI workflow:


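```
[Load Image: composited first frame] ──→ [Wan I2V Sampler] ──→ [VHS Video Combine] (writes the video file)
                                                ↑
[Prompt] ───────────────────────────────────────┘
```

Parameter settings:

| Parameter | Recommended value | Notes |
| --- | --- | --- |
| frames | 81 | 81 frames ≈ 5 s at Wan2.1's 16 fps output |
| steps | 25 | Sampling steps |
| guidance_scale | 8 | Guidance strength |
| width | 832 | Video width |
| height | 480 | Video height |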
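Wan expects frame counts of the form 4n + 1 (81 = 4 × 20 + 1), which is why the table uses 81 rather than 80. The helper below converts a target duration into a valid frame count. It assumes the 16 fps output implied by the "81 frames ≈ 5 s" figure; the function names are illustrative.

```python
WAN_FPS = 16  # assumed output frame rate (81 frames ≈ 5 s)


def wan_frames(seconds: float, fps: int = WAN_FPS) -> int:
    """Nearest valid Wan frame count (4n + 1) for a target duration in seconds."""
    n = max(0, round((seconds * fps - 1) / 4))
    return 4 * n + 1


def wan_seconds(frames: int, fps: int = WAN_FPS) -> float:
    """Clip duration in seconds for a given frame count."""
    return frames / fps


print(wan_frames(5))    # 81
print(wan_frames(2.5))  # 41
print(wan_seconds(81))  # 5.0625
```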

Motion prompt examples:

```
# Scene 1: Office
The man continues typing on the keyboard, occasionally looking at the screen with a tired expression, slight head movement

# Scene 6: Family dinner
The family sits at the dining table, the man looks thoughtful, the woman serves food, the baby plays with her spoon

# Scene 7: Study
The man sits at the desk deep in thought, looking at the computer screen, occasionally rubbing his eyes
```

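# 3. Approach B: Fixed Seed + ControlNet Workflow

# Core Idea

Generate the character with a fixed seed and control the pose with ControlNet to keep the character consistent. A fixed seed reproduces the same image only when the prompt and every other setting stay the same. Once the prompt or pose changes, the face can still drift, which is why this approach rates below LoRA training for consistency.

# Step-by-Step

# Step 3.1: Generate the Character with a Fixed Seed

Finding a good seed:
- Generate the same character several times
- Record the seed of each result you are happy with
- Reuse that seed from then on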

ComfyUI setup:


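```
[Flux model] → [CLIP Text Encode] → [KSampler]
                                      seed: fixed value
                                      control_after_generate: fixed
```

The seed is a widget on the sampler node itself, not a separate node. Setting control_after_generate to fixed stops ComfyUI from changing the seed after every run.

Example:

```
Seed: 123456789 (fixed). Use the same seed every time you generate this character.
```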
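If you drive ComfyUI from a script instead of the browser, you can pin the seed in code. The sketch below does two things:
- It sets the seed on every sampler-type node.
- It builds the `POST /prompt` request but does not send it.

It assumes a workflow exported in ComfyUI's API format (a JSON object that maps node IDs to `class_type` and `inputs`) and a local server on the default port 8188. The file name in the usage comment is hypothetical.

```python
import json
import urllib.request

# Which input holds the seed on common ComfyUI sampler-related nodes
SEED_INPUTS = {
    "KSampler": "seed",
    "KSamplerAdvanced": "noise_seed",
    "RandomNoise": "noise_seed",  # used with SamplerCustomAdvanced in many Flux workflows
}


def fix_seed(workflow: dict, seed: int) -> dict:
    """Set the same seed on every sampler-type node of an API-format workflow (in place)."""
    for node in workflow.values():
        key = SEED_INPUTS.get(node.get("class_type"))
        if key is not None:
            node["inputs"][key] = seed
    return workflow


def build_prompt_request(workflow: dict, server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Build (but do not send) the POST /prompt request that queues the workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Usage (hypothetical file name; requires a running ComfyUI server):
# with open("likewen_api.json", encoding="utf-8") as f:
#     workflow = fix_seed(json.load(f), 123456789)
# urllib.request.urlopen(build_prompt_request(workflow))
```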

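# Step 3.2: Control the Pose with ControlNet

Install the ControlNet preprocessor nodes:
- comfyui_controlnet_aux supplies preprocessors such as OpenPose and Depth.
- The Apply ControlNet node itself is built into ComfyUI.
- You also need a ControlNet model trained for Flux; SD 1.5 and SDXL ControlNet models do not work with Flux.

```bash
cd ComfyUI/custom_nodes/
git clone https://github.com/Fannovel16/comfyui_controlnet_aux.git
```

Workflow: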


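```
[Reference pose image] → [Preprocessor: OpenPose / Depth] → [Apply ControlNet] → [Flux Sampler] → [Generated image]
                                                                  ↑
[Character prompt] ───────────────────────────────────────────────┘
```

ControlNet parameters:

| Parameter | Recommended value |
| --- | --- |
| ControlNet type | OpenPose or Depth |
| Strength | 0.7-1.0 |
| Preprocessor | default |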

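# 4. Approach C: Reusing a Character Reference Sheet

# Core Idea

Generate the character reference sheet only once, then animate every shot from it with image-to-video.

# Step-by-Step

# Step 4.1: Generate the Character Reference Sheet

Generate one high-quality reference image.

Prompt:

```
35 year old Asian man, programmer, wearing glasses, gray hoodie, full body, studio lighting, white background, high quality, 8k
```

Parameters:
- Steps: 50
- CFG: 8 (typical for SD 1.5/SDXL checkpoints; with Flux.1 Dev use CFG 1.0 and set guidance of about 3.5 with a FluxGuidance node)
- Size: 512x768
- Seed: any (but write it down)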

# Step 4.2: Remove the Background

Method 1: Rembg

```bash
pip install "rembg[cli]"
rembg i character.png character_transparent.png
```

Method 2: Online tool

https://www.remove.bg/
Upload the image and download the transparent PNG.
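Before compositing, check that the cut-out really has an alpha channel. A JPG or an RGB PNG will paste as a solid rectangle. The sketch below reads the colour type from the PNG header using only the standard library. One simplification: transparency stored in a tRNS chunk is not detected.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha(path: str) -> bool:
    """Return True if the PNG's IHDR colour type carries an alpha channel.

    Colour types 4 (grayscale + alpha) and 6 (RGBA) do. Simplification:
    images whose transparency comes from a tRNS chunk (palette or RGB PNGs)
    are reported as False.
    """
    with open(path, "rb") as f:
        # signature (8) + chunk length (4) + "IHDR" (4) + width, height (8) + bit depth + colour type
        header = f.read(26)
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    return header[25] in (4, 6)

# Usage: png_has_alpha("character_transparent.png") should be True after background removal.
```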

# Step 4.3: Composite into the Scene

Same as Stage 4 of Approach A.

# Step 4.4: Animate

Use the composited image directly as the input for image-to-video.

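# 5. Audio Generation

# 5.1 Voice-Over (CosyVoice)

# Install CosyVoice

```bash
# Clone the repository (--recursive also fetches the third_party/Matcha-TTS submodule)
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
cd CosyVoice

# Install dependencies
pip install -r requirements.txt

# Download a model from ModelScope, e.g. the SFT model with built-in speakers
python -c "from modelscope import snapshot_download; snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')"
```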

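# Generate the Voice-Over

```python
import sys
sys.path.append('third_party/Matcha-TTS')  # run from the CosyVoice repo root

import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice

# Initialize with the SFT model, which ships fixed speaker IDs
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
print(cosyvoice.list_available_spks())  # e.g. '中文男', '中文女', ...

# Synthesize the line ("I... got laid off today.") with the built-in Chinese male voice
text = "我...今天被裁员了。"
for i, result in enumerate(cosyvoice.inference_sft(text, '中文男', stream=False)):
    # Each result holds the waveform tensor under 'tts_speech'; save at the model's own sample rate
    torchaudio.save(f'output_{i}.wav', result['tts_speech'], cosyvoice.sample_rate)
```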

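# Character Voices

| Character | Voice | CosyVoice setting |
| --- | --- | --- |
| Li Kewen (李柯文) | Male, 35, weary | Chinese male voice, low |
| Zhang Xiaoyun (张晓云) | Female, 33, gentle | Chinese female voice, soft |
| Mengmeng (萌萌) | Girl, 1.5 years old, child's voice | Chinese child's voice |
| Chen Zhiyuan (陈志远) | Male, 45, kindly | Chinese male voice, middle-aged |

The SFT model's built-in speakers are limited (for example 中文男 and 中文女). Qualities such as "low", "middle-aged", or a child's voice are not selectable presets. Producing them requires zero-shot cloning from a short reference clip (`inference_zero_shot`) or the CosyVoice-300M-Instruct model.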
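With several characters, it helps to turn the script into one synthesis job per line before calling CosyVoice. The sketch below assumes dialogue written as `Speaker：line`, with either a full-width or an ASCII colon. The voice assignments are hypothetical. `None` marks characters with no fitting built-in speaker, who need zero-shot cloning.

```python
import re

# Hypothetical voice assignments. '中文男' / '中文女' are built-in CosyVoice SFT speaker IDs;
# None marks a character with no fitting built-in voice (needs zero-shot cloning).
VOICES = {
    "李柯文": "中文男",
    "张晓云": "中文女",
    "萌萌": None,
    "陈志远": "中文男",  # shares the only built-in male voice; cloning would separate the two
}

# "Speaker：line" with a full-width or ASCII colon
LINE_RE = re.compile(r"^\s*(?P<speaker>[^:：]+)[:：]\s*(?P<text>.+?)\s*$")


def plan_tts_jobs(script: str) -> list:
    """Turn dialogue lines into numbered TTS jobs; lines without a colon are skipped."""
    jobs = []
    for raw in script.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # blank lines and stage directions without a speaker
        speaker = match["speaker"].strip()
        jobs.append({
            "file": f"line_{len(jobs) + 1:03d}_{speaker}.wav",
            "speaker": speaker,
            "voice": VOICES.get(speaker),
            "text": match["text"],
        })
    return jobs
```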

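# 5.2 Music (Suno)

# Generating Suspense Music with Suno

1. Go to https://suno.com
2. Choose "Create"
3. Enter a description:

```
Style: Suspense thriller background music, dark ambient, tension building
Mood: Mysterious, tense, emotional
Instruments: Piano, strings, subtle electronic
Duration: 3-5 minutes
```

4. Generate and download

# Music Style by Scene Type

| Scene type | Music style |
| --- | --- |
| Tense investigation | Tense suspense, low-frequency synths |
| Warm family moments | Soft piano, warm strings |
| Revealing the truth | Dramatic hits, accelerating rhythm |
| Late-night reflection | Lonely atmosphere, single-note piano |

# 5.3 Sound Effects

# Using the Freesound Library

Go to https://freesound.org and search for:

| Sound effect | Search keywords |
| --- | --- |
| Keyboard typing | keyboard typing |
| Footsteps | footsteps |
| Phone ringing | phone ringing |
| Door opening/closing | door open close |
| Car | car driving |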

# 6. Final Assembly

# 6.1 Assembling with FFmpeg

# Install FFmpeg

```bash
# Windows
winget install ffmpeg

# macOS
brew install ffmpeg

# Linux (Debian/Ubuntu)
sudo apt install ffmpeg
```

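# Assembly Commands

```bash
# Join the video clips (with -c copy, all clips must share codec, resolution, and frame rate)
ffmpeg -f concat -i filelist.txt -c copy output.mp4

# filelist.txt contents:
# file 'scene01.mp4'
# file 'scene02.mp4'
# file 'scene03.mp4'

# Add the audio track (video stream copied, audio re-encoded to AAC)
ffmpeg -i video.mp4 -i audio.mp3 -c:v copy -c:a aac -map 0:v:0 -map 1:a:0 output_with_audio.mp4

# Add subtitles as a soft subtitle track (mov_text is the subtitle codec MP4 supports)
ffmpeg -i video.mp4 -i subtitle.srt -c copy -c:s mov_text output_with_subtitle.mp4
```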
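The commands above expect a `filelist.txt` and a `subtitle.srt`. The sketch below produces both using only the standard library. It assumes the clips are listed in playback order and that cue times are in seconds. The function names are illustrative and not part of FFmpeg.

```python
from pathlib import Path


def write_concat_list(clips: list, list_path: str = "filelist.txt") -> str:
    """Write an ffmpeg concat-demuxer list: one file '<name>' line per clip, in order.

    A single quote inside a name is handled by closing the quote, adding an
    escaped quote, and reopening it, following ffmpeg's quoting rules.
    """
    lines = ["file '" + clip.replace("'", "'\\''") + "'" for clip in clips]
    text = "\n".join(lines) + "\n"
    Path(list_path).write_text(text, encoding="utf-8")
    return text


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def write_srt(cues: list, srt_path: str = "subtitle.srt") -> str:
    """Write numbered SRT cues from (start_seconds, end_seconds, text), blank line between cues."""
    blocks = [
        f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        for index, (start, end, text) in enumerate(cues, start=1)
    ]
    content = "\n".join(blocks)
    Path(srt_path).write_text(content, encoding="utf-8")
    return content

# Usage:
# write_concat_list(["scene01.mp4", "scene02.mp4", "scene03.mp4"])
# write_srt([(0.0, 2.5, "我...今天被裁员了。")])
```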

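# 6.2 Assembling in CapCut (剪映 / Jianying)

# Steps

1. Import the clips
- Create a new project
- Import all video clips
2. Join the clips
- Drag them onto the timeline in order
- Adjust durations and pacing
3. Add audio
- Import the voice-over
- Import the music
- Balance the volume levels
4. Add subtitles
- Use automatic captions or add them manually
- Adjust style and position
5. Export
- Choose the resolution (1080p recommended)
- Choose the format (MP4)
- Export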

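# 7. Checklist

# Stage 1: Preparation

Flux model downloaded
Wan model downloaded
ComfyUI running
Required plugins installed

# Stage 2: Character Generation

LoRA trained (Approach A)
or: character reference sheet generated (Approach C)
Transparent background prepared
Character consistency checked

# Stage 3: Scene Generation

All 8 empty scene shots generated
Scene styles consistent
Resolution correct

# Stage 4: First-Frame Compositing

Character positions set
Composites checked
Lighting matches naturally

# Stage 5: Video Generation

Motion prompts ready
Parameters set
Video quality checked

# Stage 6: Audio Generation

Voice-over generated
Music selected
Sound effects ready

# Stage 7: Final Assembly

Clips joined
Audio added
Subtitles added
Final review done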

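# 8. FAQ

# Q1: What if the trained LoRA gives poor results?

Solutions:

Add more training images (20-30)
Make sure the images are high quality with clean backgrounds
Adjust the training parameters (lower the learning rate)
Increase the number of epochs

# Q2: The character looks different every time?

Solutions:

Use a fixed seed
Train a LoRA
Control the pose with ControlNet (OpenPose)

# Q3: White fringes around the character after compositing?

Solutions:

Use LayerDiffusion instead of manual compositing
Add a Feather Mask node to soften the edges
Refine the edges with Flux inpainting

# Q4: Video generation is slow?

Solutions:

Lower frames (41 frames ≈ 2.5 s)
Lower steps
Use a smaller model (Wan2.1-T2V-1.3B). Note that it is text-to-video only, so it cannot start from the composited first frame.

# Q5: The voice-over sounds unnatural?

Solutions:

Use CosyVoice 2.0 (more natural)
Adjust the speed and emotion parameters
Fine-tune afterwards in Audacity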

# 9. Recommended Resources

# Model Downloads

| Model | Download |
| --- | --- |
| Flux.1 Dev | huggingface.co/black-forest-labs/FLUX.1-dev |
| Wan2.1-I2V | huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P (a 720P variant also exists) |
| Wan2.1-T2V | huggingface.co/Wan-AI/Wan2.1-T2V-14B |

# Plugins

| Plugin | GitHub |
| --- | --- |
| LayerDiffusion | github.com/huchenlei/ComfyUI-LayerDiffusion |
| ControlNet preprocessors | github.com/Fannovel16/comfyui_controlnet_aux |
| VideoHelperSuite | github.com/Kosinkadink/ComfyUI-VideoHelperSuite |