Commit ebb79f2 by kundaja-green: "Completely fresh repository upload"

This view is limited to 50 files because it contains too many changes. See the raw diff for the full change set.

Files changed:
- .gitattributes +8 -0
- .gitignore +11 -0
- .python-version +1 -0
- Dockerfile +49 -0
- README.ja.md +426 -0
- README.md +64 -0
- Start_Wan_GUI.bat +54 -0
- cache_latents.py +281 -0
- cache_text_encoder_outputs.py +214 -0
- convert_lora.py +131 -0
- dataset/__init__.py +0 -0
- dataset/config_utils.py +372 -0
- dataset/dataset_config.md +378 -0
- dataset/dataset_example.toml +44 -0
- dataset/ebPhotos-001/20190915_193922.jpg +3 -0
- dataset/ebPhotos-001/20190915_193922.txt +1 -0
- dataset/ebPhotos-001/20190921_182515.jpg +3 -0
- dataset/ebPhotos-001/20190921_182515.txt +1 -0
- dataset/ebPhotos-001/20190921_182517.jpg +3 -0
- dataset/ebPhotos-001/20190921_182517.txt +1 -0
- dataset/ebPhotos-001/20220521_222809.jpg +3 -0
- dataset/ebPhotos-001/20220521_222809.txt +1 -0
- dataset/ebPhotos-001/20230427_082757.jpg +3 -0
- dataset/ebPhotos-001/20230427_082757.txt +1 -0
- dataset/ebPhotos-001/20230427_082800.jpg +3 -0
- dataset/ebPhotos-001/20230427_082800.txt +1 -0
- dataset/ebPhotos-001/20230427_082805.jpg +3 -0
- dataset/ebPhotos-001/20230427_082805.txt +1 -0
- dataset/ebPhotos-001/20230502_185323.jpg +3 -0
- dataset/ebPhotos-001/20230502_185323.txt +1 -0
- dataset/ebPhotos-001/20230504_193610.jpg +3 -0
- dataset/ebPhotos-001/20230504_193610.txt +1 -0
- dataset/ebPhotos-001/20230504_193624.jpg +3 -0
- dataset/ebPhotos-001/20230504_193624.txt +1 -0
- dataset/ebPhotos-001/20230504_193657.jpg +3 -0
- dataset/ebPhotos-001/20230504_193657.txt +1 -0
- dataset/ebPhotos-001/20230504_193734.jpg +3 -0
- dataset/ebPhotos-001/20230504_193734.txt +1 -0
- dataset/ebPhotos-001/20230504_193750.jpg +3 -0
- dataset/ebPhotos-001/20230504_193750.txt +1 -0
- dataset/ebPhotos-001/20230504_193805.jpg +3 -0
- dataset/ebPhotos-001/20230504_193805.txt +1 -0
- dataset/ebPhotos-001/20230505_194441.jpg +3 -0
- dataset/ebPhotos-001/20230505_194441.txt +1 -0
- dataset/ebPhotos-001/20230505_194607.jpg +3 -0
- dataset/ebPhotos-001/20230505_194607.txt +1 -0
- dataset/ebPhotos-001/20230505_194707.jpg +3 -0
- dataset/ebPhotos-001/20230505_194707.txt +1 -0
- dataset/ebPhotos-001/20230505_194729.jpg +3 -0
- dataset/ebPhotos-001/20230505_194729.txt +1 -0
.gitattributes
ADDED
@@ -0,0 +1,8 @@
*.pth filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.webp filter=lfs diff=lfs merge=lfs -text
PXL_20240227_181242253jpg filter=lfs diff=lfs merge=lfs -text
PXL_20240227_181242253jpg* filter=lfs diff=lfs merge=lfs -text
*PXL_20240227_181242253jpg filter=lfs diff=lfs merge=lfs -text
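These attribute rules route model weights and images through Git LFS. For reference, entries like the ones above are normally generated with the standard git-lfs CLI; a minimal sketch, assuming git-lfs is installed locally:

```bash
# Creates/updates .gitattributes entries equivalent to the rules above
git lfs install
git lfs track "*.pth" "*.safetensors" "*.jpg" "*.png" "*.webp"
```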
.gitignore
ADDED
@@ -0,0 +1,11 @@
__pycache__/
.venv
venv/
logs/
uv.lock
main.exp
main.lib
main.obj
dataset/Wan
Models/
Output_LoRAs/
.python-version
ADDED
@@ -0,0 +1 @@
3.10
Dockerfile
ADDED
@@ -0,0 +1,49 @@
# Use a standard Python 3.12 base image
FROM python:3.12-slim

# Set the working directory inside the container
WORKDIR /code

# Install git and aria2 for faster downloads
RUN apt-get update && apt-get install -y git aria2

# Copy the requirements file first to leverage Docker cache
COPY requirements.txt .

# Install the correct CUDA-enabled PyTorch version and other requirements
RUN pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
RUN pip install --no-cache-dir -r requirements.txt

# --- NEW SECTION: DOWNLOAD MODELS ---
# Download the official Wan2.1 models from their Hugging Face repository
# This downloads them into a "Models/Wan" folder inside the container
RUN huggingface-cli download wan-video/wan2.1 \
    --repo-type model \
    --include "*.pth" "*.json" "*.safetensors" \
    --local-dir Models/Wan --local-dir-use-symlinks False

# Copy all your project files (code, dataset configs, etc.) into the container
COPY . .

# This is the command that will run when the Space starts.
# It uses the models we just downloaded.
CMD ["accelerate", "launch", "wan_train_network.py", \
     "--task", "i2v-14B", \
     "--dit", "Models/Wan/wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors", \
     "--vae", "Models/Wan/Wan2.1_VAE.pth", \
     "--clip", "Models/Wan/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth", \
     "--t5", "Models/Wan/models_t5_umt5-xxl-enc-bf16.pth", \
     "--dataset_config", "dataset/testtoml.toml", \
     "--output_dir", "/data/output", \
     "--output_name", "My_HF_Lora_v1", \
     "--save_every_n_epochs", "10", \
     "--max_train_epochs", "70", \
     "--network_module", "networks.lora_wan", \
     "--network_dim", "32", \
     "--network_alpha", "4", \
     "--learning_rate", "2e-5", \
     "--optimizer_type", "adamw", \
     "--mixed_precision", "bf16", \
     "--gradient_checkpointing", \
     "--sdpa" \
    ]
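For reference, a local build-and-run sketch for this image; the image tag and host output directory are arbitrary choices, and on a Hugging Face Space the Dockerfile is presumably built and started automatically, with `/data/output` expected to be persistent storage:

```bash
docker build -t wan-lora-trainer .
docker run --gpus all -v "$PWD/output:/data/output" wan-lora-trainer
```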
README.ja.md
ADDED
@@ -0,0 +1,426 @@
# How to use the GUI

- To open the GUI, run the following command: `Start_Wan_GUI.bat`
- All settings can be saved and loaded using the "**Load Settings**" and "**Save Setting**" buttons.
- For details on the settings, see the [Wan2.1 documentation](./docs/wan.md), [Advanced Configuration](./docs/advanced_config.md#fp8-quantization), and the [Dataset configuration guide](./dataset/dataset_config.md).





# Musubi Tuner

[English](./README.md) | [日本語](./README.ja.md)

## Table of Contents

- [Introduction](#introduction)
- [Recent Updates](#recent-updates)
- [About Releases](#about-releases)
- [Overview](#overview)
- [Hardware Requirements](#hardware-requirements)
- [Features](#features)
- [Installation](#installation)
- [Downloading the Models](#downloading-the-models)
- [Using the official HunyuanVideo models](#using-the-official-hunyuanvideo-models)
- [Using ComfyUI-provided models for the Text Encoder](#using-comfyui-provided-models-for-the-text-encoder)
- [Usage](#usage)
- [Dataset Configuration](#dataset-configuration)
- [Pre-caching Latents](#pre-caching-latents)
- [Pre-caching Text Encoder Outputs](#pre-caching-text-encoder-outputs)
- [Training](#training)
- [Merging LoRA Weights](#merging-lora-weights)
- [Inference](#inference)
- [Inference with SkyReels V1](#inference-with-skyreels-v1)
- [Converting LoRA Formats](#converting-lora-formats)
- [Miscellaneous](#miscellaneous)
- [How to Install SageAttention](#how-to-install-sageattention)
- [Disclaimer](#disclaimer)
- [Contributing](#contributing)
- [License](#license)

## Introduction

This repository provides command-line tools for LoRA training of HunyuanVideo and Wan2.1. It is unofficial and is not affiliated with the official HunyuanVideo or Wan2.1 repositories.

For Wan2.1, also see the [Wan2.1 documentation](./docs/wan.md).

*This repository is under development.*

### Recent Updates

- 2025/03/16
    - Fixed a bug in Wan2.1 training where the weights were cast to bf16 even when fp16 weights were used. [PR #160](https://github.com/kohya-ss/musubi-tuner/pull/160)
    - Also fixed a bug where black images were generated during sample image generation when fp16 weights were used.
    - If training in fp16 causes problems, please use bf16.
    - Refactored the Wan2.1 inference script. The `--fp8_fast` and `--compile` options were added. See [here](./docs/wan.md#inference--推論) for details. PR [#153](https://github.com/kohya-ss/musubi-tuner/pull/153)
    - These are large changes, so please report any issues you find.
    - The recently added `--fp8_scaled` option appears to improve accuracy for fp8 training and inference. If you train with `--fp8_base` or run inference with `--fp8`, consider adding `--fp8_scaled`. Please let us know if you encounter problems.

- 2025/03/13
    - Added a speed-up option for RTX 40x0 GPUs, `--fp8_fast`, and an option to use `torch.compile`, `--compile`, to the HunyuanVideo inference script. [PR #137](https://github.com/kohya-ss/musubi-tuner/pull/137) Thanks to Sarania.
    - See [Inference](#inference) for details.
    - Added the `--fp8_scaled` option for fp8 quantization in Wan2.1 training and inference. [PR #141](https://github.com/kohya-ss/musubi-tuner/pull/141)
    - Instead of a simple cast to FP8, scaling is applied, which reduces VRAM usage while preserving accuracy.
    - See [Advanced Configuration](./docs/advanced_config.md#fp8-quantization) for details.
    - `fp16` models are now also supported for Wan2.1 training and inference.

- 2025/03/07
    - Fixed an issue in Wan 2.1 training where the `--t5` option was required even when sample image generation was not used.

- 2025/03/07
    - Added support for Wan 2.1 LoRA training. Use `wan_train_network.py`. See [here](./docs/wan.md) for details.

- 2025/03/04
    - Added support for Wan 2.1 inference. Use `wan_generate_video.py`. See [here](./docs/wan.md) for details.
    - `requirements.txt` has been updated. Please run `pip install -r requirements.txt`.

### About Releases

We are grateful to everyone writing articles about Musubi Tuner and developing related tools. Because this project is under active development, incompatible changes and new features may be introduced. To avoid unexpected compatibility problems, please use the [releases](https://github.com/kohya-ss/musubi-tuner/releases) as reference points.

The latest release and the version history can be found on the [releases page](https://github.com/kohya-ss/musubi-tuner/releases).

## Overview

### Hardware Requirements

- VRAM: 12 GB or more is recommended for training on still images, 24 GB or more for training on video.
    - *The requirement depends on resolution and other training settings.* With 12 GB, keep the resolution at 960x544 or below and use memory-saving options such as `--blocks_to_swap` and `--fp8_llm`.
- Main memory: 64 GB or more is recommended; 32 GB plus swap may work, but this is untested.

### Features

- Focused on low memory usage
- Windows support (there are also reports of it working on Linux)
- Multi-GPU training is not supported

## Installation

### Installing with pip

Use Python 3.10 or later (tested with 3.10).

Create a virtual environment and install PyTorch and torchvision matching your CUDA version.

Use PyTorch 2.5.1 or later (see the [note](#pytorch-version) below).

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
```

Install the required dependencies with the following command:

```bash
pip install -r requirements.txt
```

Optionally, you can use FlashAttention and SageAttention (inference only; see [here](#how-to-install-sageattention) for installation instructions).

Also install `ascii-magic` (used for dataset verification), `matplotlib` (used to visualize timesteps), and `tensorboard` (used to record training logs) as needed.

```bash
pip install ascii-magic matplotlib tensorboard
```
### Installing with uv

You can also install with uv, but uv-based installation is experimental. Feedback is welcome.

#### Linux/MacOS

```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Follow the displayed instructions to set up your PATH.

#### Windows

```powershell
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Follow the displayed instructions to set up your PATH, or restart your system at this point.

## Downloading the Models

Download the models using one of the following methods.

### Using the official HunyuanVideo models

Download the models following the [official README](https://github.com/Tencent/HunyuanVideo/blob/main/ckpts/README.md) and place them in any directory, laid out as follows:

```
ckpts
├──hunyuan-video-t2v-720p
│  ├──transformers
│  ├──vae
├──text_encoder
├──text_encoder_2
├──...
```

### Using ComfyUI-provided models for the Text Encoder

This method is simpler. The DiT and VAE models are the HunyuanVideo ones.

From https://huggingface.co/tencent/HunyuanVideo/tree/main/hunyuan-video-t2v-720p/transformers, download [mp_rank_00_model_states.pt](https://huggingface.co/tencent/HunyuanVideo/resolve/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt) and place it in any directory.

(The same page also has an fp8 model, but it is untested.)

If you train with `--fp8_base`, you can use `mp_rank_00_model_states_fp8.safetensors` from [here](https://huggingface.co/kohya-ss/HunyuanVideo-fp8_e4m3fn-unofficial) instead of `mp_rank_00_model_states.pt`. (This file is unofficial; the weights were simply converted to float8_e4m3fn.)

Also download [pytorch_model.pt](https://huggingface.co/tencent/HunyuanVideo/resolve/main/hunyuan-video-t2v-720p/vae/pytorch_model.pt) from https://huggingface.co/tencent/HunyuanVideo/tree/main/hunyuan-video-t2v-720p/vae and place it in any directory.

For the Text Encoders, the models provided by ComfyUI are used. Following the [ComfyUI page](https://comfyanonymous.github.io/ComfyUI_examples/hunyuan_video/), download llava_llama3_fp16.safetensors (Text Encoder 1, LLM) and clip_l.safetensors (Text Encoder 2, CLIP) from https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/tree/main/split_files/text_encoders and place them in any directory.

(The same page also has an fp8 LLM model, but it is untested.)

## Usage

### Dataset Configuration

See [here](./dataset/dataset_config.md).
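For orientation, a minimal configuration sketch is shown below. The key names are assumptions modeled on typical Musubi Tuner dataset configs (compare `dataset/dataset_example.toml` in this repository); `dataset/ebPhotos-001` is the image folder included in this upload and the cache path is a placeholder, so check the dataset configuration guide for the authoritative schema.

```bash
cat > dataset/my_dataset.toml <<'EOF'
[general]
resolution = [960, 544]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true

[[datasets]]
image_directory = "dataset/ebPhotos-001"
cache_directory = "dataset/cache"
num_repeats = 1
EOF
```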
### Pre-caching Latents

Pre-caching latents is required. Create the cache with the following command (for a pip-based installation):

```bash
python cache_latents.py --dataset_config path/to/toml --vae path/to/ckpts/hunyuan-video-t2v-720p/vae/pytorch_model.pt --vae_chunk_size 32 --vae_tiling
```

If you installed with uv, prefix the command with `uv run`, as in `uv run python cache_latents.py ...`. The same applies to the commands below.

Other options can be listed with `python cache_latents.py --help`.

If you run out of VRAM, reduce `--vae_spatial_tile_sample_min_size` to around 128 and lower `--batch_size`.

With `--debug_mode image`, the dataset images and captions are shown in a new window; with `--debug_mode console` they are shown in the console (requires `ascii-magic`).

By default, cache files that are no longer part of the dataset are deleted automatically. Specify `--keep_cache` to keep them.

### Pre-caching Text Encoder Outputs

Pre-caching the Text Encoder outputs is required. Create the cache with the following command:

```bash
python cache_text_encoder_outputs.py --dataset_config path/to/toml --text_encoder1 path/to/ckpts/text_encoder --text_encoder2 path/to/ckpts/text_encoder_2 --batch_size 16
```

Other options can be listed with `python cache_text_encoder_outputs.py --help`.

Adjust `--batch_size` to fit your VRAM.

If you do not have enough VRAM (roughly less than 16 GB), specify `--fp8_llm` to run the LLM in fp8.

By default, cache files that are no longer part of the dataset are deleted automatically. Specify `--keep_cache` to keep them.

### Training

Start training with the following command (enter it as a single line):

```bash
accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 hv_train_network.py
    --dit path/to/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt
    --dataset_config path/to/toml --sdpa --mixed_precision bf16 --fp8_base
    --optimizer_type adamw8bit --learning_rate 2e-4 --gradient_checkpointing
    --max_data_loader_n_workers 2 --persistent_data_loader_workers
    --network_module networks.lora --network_dim 32
    --timestep_sampling shift --discrete_flow_shift 7.0
    --max_train_epochs 16 --save_every_n_epochs 1 --seed 42
    --output_dir path/to/output_dir --output_name name-of-lora
```

__Update__: The sample learning rate was changed from 1e-3 to 2e-4, `--timestep_sampling` from `sigmoid` to `shift`, and `--discrete_flow_shift` from 1.0 to 7.0. Faster training is expected. If details come out soft, try lowering the discrete flow shift to around 3.0.

However, the appropriate learning rate, number of training steps, timestep distribution, loss weighting, and other parameters are still largely unknown. Any information is welcome.

Other options can be listed with `python hv_train_network.py --help` (note that many options are untested).

When `--fp8_base` is specified, the DiT is trained in fp8. Otherwise the mixed-precision data type is used. fp8 greatly reduces memory consumption but may lower quality. Without `--fp8_base`, 24 GB or more of VRAM is recommended. Use `--blocks_to_swap` as needed.

If you run out of VRAM, specify `--blocks_to_swap` to offload some blocks to the CPU. The maximum value is 36.

(The block swap idea is based on 2kpr's implementation. Thanks again to 2kpr.)

`--sdpa` uses PyTorch's scaled dot product attention. `--flash_attn` uses [FlashAttention](https://github.com/Dao-AILab/flash-attention). `--xformers` enables xformers, but when using xformers also specify `--split_attn`. `--sage_attn` selects SageAttention, but SageAttention does not yet support training and will not work correctly.

Specifying `--split_attn` processes attention in chunks. Speed drops slightly, but VRAM usage is reduced a little.

The trained LoRA format is the same as `sd-scripts`.

With `--show_timesteps` set to `image` (requires `matplotlib`) or `console`, you can inspect the timestep distribution and the per-timestep loss weighting used during training.

Training logs can be recorded. See [Saving and viewing logs in TensorBoard format](./docs/advanced_config.md#save-and-view-logs-in-tensorboard-format--tensorboard形式のログの保存と参照).

For sample image generation during training, see [this document](./docs/sampling_during_training.md). For other advanced settings, see [this document](./docs/advanced_config.md).
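As a concrete low-VRAM variant, here is the same invocation with the memory-saving flags from this section added; `--blocks_to_swap 20` is an arbitrary example value within the documented maximum of 36, and all paths remain placeholders:

```bash
accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 hv_train_network.py \
    --dit path/to/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
    --dataset_config path/to/toml --sdpa --mixed_precision bf16 \
    --fp8_base --blocks_to_swap 20 --split_attn \
    --optimizer_type adamw8bit --learning_rate 2e-4 --gradient_checkpointing \
    --max_data_loader_n_workers 2 --persistent_data_loader_workers \
    --network_module networks.lora --network_dim 32 \
    --timestep_sampling shift --discrete_flow_shift 7.0 \
    --max_train_epochs 16 --save_every_n_epochs 1 --seed 42 \
    --output_dir path/to/output_dir --output_name name-of-lora
```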
### Merging LoRA Weights

Note: Wan 2.1 is not supported.

```bash
python merge_lora.py \
    --dit path/to/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
    --lora_weight path/to/lora.safetensors \
    --save_merged_model path/to/merged_model.safetensors \
    --device cpu \
    --lora_multiplier 1.0
```

Specify the device used for the computation (`cpu`, `cuda`, etc.) with `--device`. Using `cuda` speeds up the computation.

Specify the LoRA weights to merge with `--lora_weight` and their multipliers with `--lora_multiplier`. Multiple values can be given, and the number of weights and multipliers must match.

### Inference

Generate a video with the following command:

```bash
python hv_generate_video.py --fp8 --video_size 544 960 --video_length 5 --infer_steps 30
    --prompt "A cat walks on the grass, realistic style." --save_path path/to/save/dir --output_type both
    --dit path/to/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt --attn_mode sdpa --split_attn
    --vae path/to/ckpts/hunyuan-video-t2v-720p/vae/pytorch_model.pt
    --vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128
    --text_encoder1 path/to/ckpts/text_encoder
    --text_encoder2 path/to/ckpts/text_encoder_2
    --seed 1234 --lora_multiplier 1.0 --lora_weight path/to/lora.safetensors
```

Other options can be listed with `python hv_generate_video.py --help`.

When `--fp8` is specified, the DiT runs inference in fp8. fp8 greatly reduces memory consumption but may lower quality.

If you are using an RTX 40x0-series GPU, the `--fp8_fast` option enables faster inference. When using this option, also specify `--fp8`.

If you run out of VRAM, specify `--blocks_to_swap` to offload some blocks to the CPU. The maximum value is 38.

For `--attn_mode`, specify one of `flash`, `torch`, `sageattn`, `xformers`, or `sdpa` (same as `torch`). These correspond to FlashAttention, scaled dot product attention, SageAttention, and xformers. The default is `torch`. SageAttention is effective at reducing VRAM usage.

Specifying `--split_attn` processes attention in chunks. With SageAttention, a speed-up of around 10% can be expected.

For `--output_type`, specify one of `both`, `latent`, `video`, or `images`. `both` outputs both the latent and the video. Specifying `both` is recommended in case the VAE runs out of memory. You can run only the VAE decoding by passing the saved latent to `--latent_path` and running the script with `--output_type video` (or `images`).

`--seed` is optional. If not specified, a random seed is used.

For `--video_length`, specify a value of the form "a multiple of 4 plus 1".

`--flow_shift` sets the timestep shift value (discrete flow shift). The default is 7.0, which is the recommended value for 50 inference steps. The HunyuanVideo paper recommends 7.0 for 50 steps and 17.0 for fewer than 20 steps (for example 10).

Specifying a video with `--video_path` enables video2video inference. Pass a video file or a directory containing multiple image files (the image files are sorted by file name and used as frames). An error occurs if the video is shorter than `--video_length`. The strength can be set with `--strength` in the range 0 to 1.0; larger values change more relative to the original video.

Note that video2video inference is experimental.
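For example, a video2video run is the same inference command with the two flags described above added; the source path and `--strength 0.7` are placeholder values, and `--video_length 25` satisfies the "multiple of 4 plus 1" rule:

```bash
python hv_generate_video.py --fp8 --video_size 544 960 --video_length 25 --infer_steps 30 \
    --prompt "A cat walks on the grass, realistic style." --save_path path/to/save/dir --output_type both \
    --dit path/to/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt --attn_mode sdpa --split_attn \
    --vae path/to/ckpts/hunyuan-video-t2v-720p/vae/pytorch_model.pt \
    --vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 \
    --text_encoder1 path/to/ckpts/text_encoder \
    --text_encoder2 path/to/ckpts/text_encoder_2 \
    --video_path path/to/source_video.mp4 --strength 0.7
```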
### Inference with SkyReels V1

The SkyReels V1 T2V and I2V models are supported (inference only).

The models can be downloaded from [here](https://huggingface.co/Kijai/SkyReels-V1-Hunyuan_comfy). Thanks to Kijai for providing them. `skyreels_hunyuan_i2v_bf16.safetensors` is the I2V model and `skyreels_hunyuan_t2v_bf16.safetensors` is the T2V model. Formats other than `bf16` are untested (`fp8_e4m3fn` may work).

For T2V inference, add the following options to the inference command:

```bash
--guidance_scale 6.0 --embedded_cfg_scale 1.0 --negative_prompt "Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion" --split_uncond
```

SkyReels V1 appears to require classifier-free guidance (a negative prompt). `--guidance_scale` is the guidance scale for the negative prompt. The official repository recommends 6.0. The default is 1.0, in which case classifier-free guidance is not used (the negative prompt is ignored).

`--embedded_cfg_scale` is the embedded guidance scale. The official repository recommends 1.0 (which presumably means no embedded guidance).

`--negative_prompt` is the usual negative prompt. The sample above is taken from the official repository. If `--guidance_scale` is specified but `--negative_prompt` is not, an empty string is used.

Specifying `--split_uncond` splits the model call into uncond and cond (negative prompt and prompt). VRAM usage decreases, but inference may be slower. `--split_uncond` is enabled automatically when `--split_attn` is specified.

### Converting LoRA Formats

You can convert a LoRA to a format usable in ComfyUI (presumably Diffusion-pipe) with the following command:

```bash
python convert_lora.py --input path/to/musubi_lora.safetensors --output path/to/another_format.safetensors --target other
```

Specify the input and output file paths with `--input` and `--output`.

Specify `other` for `--target`. Specifying `default` converts from other formats back to this repository's format.

Wan2.1 is also supported.

## Miscellaneous

### How to Install SageAttention

sdbds provides Windows-compatible SageAttention wheels at https://github.com/sdbds/SageAttention-for-windows. After installing triton, if your Python, PyTorch, and CUDA versions match, you can download and install a pre-built wheel from the [Releases](https://github.com/sdbds/SageAttention-for-windows/releases) page. Thanks to sdbds.

For reference, below are brief steps for building and installing SageAttention yourself. You may need to update the Microsoft Visual C++ Redistributable to the latest version.

1. Download and install the triton 3.1.0 wheel matching your Python version from [here](https://github.com/woct0rdho/triton-windows/releases/tag/v3.1.0-windows.post5).

2. Install Microsoft Visual Studio 2022 or Build Tools for Visual Studio 2022, configured for C++ builds (see the Reddit post referenced above).

3. Clone the SageAttention repository into any folder:
```shell
git clone https://github.com/thu-ml/SageAttention.git
```

If you instead use the sdbds repository mentioned above, `git clone https://github.com/sdbds/SageAttention-for-windows.git`, you can skip step 4.

4. Open `math.cuh` in the `SageAttention/csrc` folder, change `ushort` to `unsigned short` on lines 71 and 146, and save the file.

5. From the Start menu, open the `x64 Native Tools Command Prompt for VS 2022` under Visual Studio 2022.

6. Activate your venv, move to the SageAttention folder, and run the following command. If you get an error such as "DISTUTILS is not configured", run `set DISTUTILS_USE_SDK=1` and try again.
```shell
python setup.py install
```

This completes the SageAttention installation.

### PyTorch Version

If you specify `torch` for `--attn_mode`, use PyTorch 2.5.1 or later (earlier versions appear to produce black videos).

If you use an older version, use xformers or SageAttention instead.

## Disclaimer

This repository is unofficial and is not affiliated with the official HunyuanVideo repository. It is also under development and experimental. Testing and feedback are welcome, but please note the following:

- It is not intended for production use
- Features and APIs may change without notice
- Several features are untested
- Video training is still under development

For problems and bugs, please open an Issue with the following information:

- A detailed description of the problem
- Steps to reproduce
- Environment details (OS, GPU, VRAM, Python version, etc.)
- Relevant error messages and logs

## Contributing

Contributions are welcome. Please note the following:

- Maintainer resources are limited, so reviewing and merging PRs may take time
- Before starting work on a large change, please open an Issue for discussion
- Regarding PRs:
    - Keep changes focused and reasonably sized
    - Provide a clear description
    - Follow the existing code style
    - Make sure the documentation is updated

## License

The code under the `hunyuan_model` directory is based on modified code from [HunyuanVideo](https://github.com/Tencent/HunyuanVideo) and follows its license.

The code under the `wan` directory is based on modified code from [Wan2.1](https://github.com/Wan-Video/Wan2.1). The license is Apache License 2.0.

Other code is under the Apache License 2.0. Some code is copied and modified from Diffusers.
README.md
ADDED
@@ -0,0 +1,64 @@
# Simple GUI for [Musubi Tuner](https://github.com/kohya-ss/musubi-tuner) (Wan 2.1 models only)


# How to use GUI

- Download the repository by running the following in the command line:
  `git clone https://github.com/Kvento/musubi-tuner-wan-gui`

- To open the GUI, just run `Start_Wan_GUI.bat`.
- All settings can be saved and loaded using the "**Load Settings**" and "**Save Setting**" buttons.
- For more information about the settings, see the [Wan2.1 documentation](./docs/wan.md), [Advanced Configuration](./docs/advanced_config.md#fp8-quantization), and the [Dataset configuration guide](./dataset/dataset_config.md).
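Putting the steps above together, a quick-start sketch from a Windows terminal (assuming Git and Python are already installed; the folder name simply follows the clone URL):

```bash
git clone https://github.com/Kvento/musubi-tuner-wan-gui
cd musubi-tuner-wan-gui
Start_Wan_GUI.bat
```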




# Miscellaneous


## SageAttention Installation

sdbds has provided a Windows-compatible SageAttention implementation and pre-built wheels here: https://github.com/sdbds/SageAttention-for-windows. After installing triton, if your Python, PyTorch, and CUDA versions match, you can download and install the pre-built wheel from the [Releases](https://github.com/sdbds/SageAttention-for-windows/releases) page. Thanks to sdbds for this contribution.

For reference, the build and installation instructions are as follows. You may need to update Microsoft Visual C++ Redistributable to the latest version.

1. Download and install the triton 3.1.0 wheel matching your Python version from [here](https://github.com/woct0rdho/triton-windows/releases/tag/v3.1.0-windows.post5).

2. Install Microsoft Visual Studio 2022 or Build Tools for Visual Studio 2022, configured for C++ builds.

3. Clone the SageAttention repository in your preferred directory:
```shell
git clone https://github.com/thu-ml/SageAttention.git
```

You can skip step 4 by using the sdbds repository mentioned above: `git clone https://github.com/sdbds/SageAttention-for-windows.git`.

4. Open `math.cuh` in the `SageAttention/csrc` folder and change `ushort` to `unsigned short` on lines 71 and 146, then save.

5. Open `x64 Native Tools Command Prompt for VS 2022` from the Start menu under Visual Studio 2022.

6. Activate your venv, navigate to the SageAttention folder, and run the following command. If you get a DISTUTILS not configured error, set `set DISTUTILS_USE_SDK=1` and try again:
```shell
python setup.py install
```

This completes the SageAttention installation.

### PyTorch version

If you specify `torch` for `--attn_mode`, use PyTorch 2.5.1 or later (earlier versions may result in black videos).

If you use an earlier version, use xformers or SageAttention.


# License

Code under the `hunyuan_model` directory is modified from [HunyuanVideo](https://github.com/Tencent/HunyuanVideo) and follows their license.

Code under the `wan` directory is modified from [Wan2.1](https://github.com/Wan-Video/Wan2.1). The license is under the Apache License 2.0.

Other code is under the Apache License 2.0. Some code is copied and modified from Diffusers.
Start_Wan_GUI.bat
ADDED
@@ -0,0 +1,54 @@
@echo off
setlocal

:: Specify the path to your Python script
set SCRIPT_PATH=wan_lora_trainer_gui.py

:: Check if Python is installed
echo Checking for Python...
python --version >nul 2>&1
if %errorlevel% neq 0 (
    echo Python not found. Automatic installation is not possible via bat file.
    echo Please install Python manually from the official website: https://www.python.org/
    pause
    exit /b 1
)

:: Check for pip (tool for installing Python packages)
echo Checking for pip...
python -m ensurepip >nul 2>&1
python -m pip --version >nul 2>&1
if %errorlevel% neq 0 (
    echo pip not found. Installing pip...
    python -m ensurepip --upgrade
    python -m pip install --upgrade pip
    if %errorlevel% neq 0 (
        echo Failed to install pip. Please check your Python installation.
        pause
        exit /b 1
    )
)

:: Check for tkinter
echo Checking for tkinter...
python -c "import tkinter" >nul 2>&1
if %errorlevel% neq 0 (
    echo tkinter module not found. Attempting to install...
    python -m pip install tk
    if %errorlevel% neq 0 (
        echo Failed to install tkinter. There might be an issue with permissions.
        pause
        exit /b 1
    )
)

:: Run the script
echo All dependencies are installed. Running the script...
start /min python %SCRIPT_PATH%
if %errorlevel% neq 0 (
    echo An error occurred while running the script.
    pause
    exit /b 1
)

echo Script executed successfully.
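The batch file ultimately just launches the script set in `SCRIPT_PATH`, so once the dependency checks above have passed, the GUI can also be started directly (a sketch; behavior outside Windows is untested here):

```bash
python wan_lora_trainer_gui.py
```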
cache_latents.py
ADDED
@@ -0,0 +1,281 @@
import argparse
import os
import glob
from typing import Optional, Union

import numpy as np
import torch
from tqdm import tqdm

from dataset import config_utils
from dataset.config_utils import BlueprintGenerator, ConfigSanitizer
from PIL import Image

import logging

from dataset.image_video_dataset import BaseDataset, ItemInfo, save_latent_cache, ARCHITECTURE_HUNYUAN_VIDEO
from hunyuan_model.vae import load_vae
from hunyuan_model.autoencoder_kl_causal_3d import AutoencoderKLCausal3D
from utils.model_utils import str_to_dtype

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def show_image(image: Union[list[Union[Image.Image, np.ndarray]], Image.Image, np.ndarray]) -> int:
    import cv2

    imgs = (
        [image]
        if (isinstance(image, np.ndarray) and len(image.shape) == 3) or isinstance(image, Image.Image)
        else [image[0], image[-1]]
    )
    if len(imgs) > 1:
        print(f"Number of images: {len(image)}")
    for i, img in enumerate(imgs):
        if len(imgs) > 1:
            print(f"{'First' if i == 0 else 'Last'} image: {img.shape}")
        else:
            print(f"Image: {img.shape}")
        cv2_img = np.array(img) if isinstance(img, Image.Image) else img
        cv2_img = cv2.cvtColor(cv2_img, cv2.COLOR_RGB2BGR)
        cv2.imshow("image", cv2_img)
        k = cv2.waitKey(0)
        cv2.destroyAllWindows()
        if k == ord("q") or k == ord("d"):
            return k
    return k


def show_console(
    image: Union[list[Union[Image.Image, np.ndarray]], Image.Image, np.ndarray],
    width: int,
    back: str,
    interactive: bool = False,
) -> int:
    from ascii_magic import from_pillow_image, Back

    if back is not None:
        back = getattr(Back, back.upper())

    k = None
    imgs = (
        [image]
        if (isinstance(image, np.ndarray) and len(image.shape) == 3) or isinstance(image, Image.Image)
        else [image[0], image[-1]]
    )
    if len(imgs) > 1:
        print(f"Number of images: {len(image)}")
    for i, img in enumerate(imgs):
        if len(imgs) > 1:
            print(f"{'First' if i == 0 else 'Last'} image: {img.shape}")
        else:
            print(f"Image: {img.shape}")
        pil_img = img if isinstance(img, Image.Image) else Image.fromarray(img)
        ascii_img = from_pillow_image(pil_img)
        ascii_img.to_terminal(columns=width, back=back)

        if interactive:
            k = input("Press q to quit, d to next dataset, other key to next: ")
            if k == "q" or k == "d":
                return ord(k)

    if not interactive:
        return ord(" ")
    return ord(k) if k else ord(" ")


def show_datasets(
    datasets: list[BaseDataset], debug_mode: str, console_width: int, console_back: str, console_num_images: Optional[int]
):
    print(f"d: next dataset, q: quit")

    num_workers = max(1, os.cpu_count() - 1)
    for i, dataset in enumerate(datasets):
        print(f"Dataset [{i}]")
        batch_index = 0
        num_images_to_show = console_num_images
        k = None
        for key, batch in dataset.retrieve_latent_cache_batches(num_workers):
            print(f"bucket resolution: {key}, count: {len(batch)}")
            for j, item_info in enumerate(batch):
                item_info: ItemInfo
                print(f"{batch_index}-{j}: {item_info}")
                if debug_mode == "image":
                    k = show_image(item_info.content)
                elif debug_mode == "console":
                    k = show_console(item_info.content, console_width, console_back, console_num_images is None)
                if num_images_to_show is not None:
                    num_images_to_show -= 1
                    if num_images_to_show == 0:
                        k = ord("d")  # next dataset

                if k == ord("q"):
                    return
                elif k == ord("d"):
                    break
            if k == ord("d"):
                break
            batch_index += 1


def encode_and_save_batch(vae: AutoencoderKLCausal3D, batch: list[ItemInfo]):
    contents = torch.stack([torch.from_numpy(item.content) for item in batch])
    if len(contents.shape) == 4:
        contents = contents.unsqueeze(1)  # B, H, W, C -> B, F, H, W, C

    contents = contents.permute(0, 4, 1, 2, 3).contiguous()  # B, C, F, H, W
    contents = contents.to(vae.device, dtype=vae.dtype)
    contents = contents / 127.5 - 1.0  # normalize to [-1, 1]

    h, w = contents.shape[3], contents.shape[4]
    if h < 8 or w < 8:
        item = batch[0]  # other items should have the same size
        raise ValueError(f"Image or video size too small: {item.item_key} and {len(batch) - 1} more, size: {item.original_size}")

    # print(f"encode batch: {contents.shape}")
    with torch.no_grad():
        latent = vae.encode(contents).latent_dist.sample()
        # latent = latent * vae.config.scaling_factor

    # # debug: decode and save
    # with torch.no_grad():
    #     latent_to_decode = latent / vae.config.scaling_factor
    #     images = vae.decode(latent_to_decode, return_dict=False)[0]
    #     images = (images / 2 + 0.5).clamp(0, 1)
    #     images = images.cpu().float().numpy()
    #     images = (images * 255).astype(np.uint8)
    #     images = images.transpose(0, 2, 3, 4, 1)  # B, C, F, H, W -> B, F, H, W, C
    #     for b in range(images.shape[0]):
    #         for f in range(images.shape[1]):
    #             fln = os.path.splitext(os.path.basename(batch[b].item_key))[0]
    #             img = Image.fromarray(images[b, f])
    #             img.save(f"./logs/decode_{fln}_{b}_{f:03d}.jpg")

    for item, l in zip(batch, latent):
        # print(f"save latent cache: {item.latent_cache_path}, latent shape: {l.shape}")
        save_latent_cache(item, l)


def encode_datasets(datasets: list[BaseDataset], encode: callable, args: argparse.Namespace):
    num_workers = args.num_workers if args.num_workers is not None else max(1, os.cpu_count() - 1)
    for i, dataset in enumerate(datasets):
        logger.info(f"Encoding dataset [{i}]")
        all_latent_cache_paths = []
        for _, batch in tqdm(dataset.retrieve_latent_cache_batches(num_workers)):
            all_latent_cache_paths.extend([item.latent_cache_path for item in batch])

            if args.skip_existing:
                filtered_batch = [item for item in batch if not os.path.exists(item.latent_cache_path)]
                if len(filtered_batch) == 0:
                    continue
                batch = filtered_batch

            bs = args.batch_size if args.batch_size is not None else len(batch)
            for i in range(0, len(batch), bs):
                encode(batch[i : i + bs])

        # normalize paths
        all_latent_cache_paths = [os.path.normpath(p) for p in all_latent_cache_paths]
        all_latent_cache_paths = set(all_latent_cache_paths)

        # remove old cache files not in the dataset
        all_cache_files = dataset.get_all_latent_cache_files()
        for cache_file in all_cache_files:
            if os.path.normpath(cache_file) not in all_latent_cache_paths:
                if args.keep_cache:
                    logger.info(f"Keep cache file not in the dataset: {cache_file}")
                else:
                    os.remove(cache_file)
                    logger.info(f"Removed old cache file: {cache_file}")


def main(args):
    device = args.device if args.device is not None else "cuda" if torch.cuda.is_available() else "cpu"
    device = torch.device(device)

    # Load dataset config
    blueprint_generator = BlueprintGenerator(ConfigSanitizer())
    logger.info(f"Load dataset config from {args.dataset_config}")
    user_config = config_utils.load_user_config(args.dataset_config)
    blueprint = blueprint_generator.generate(user_config, args, architecture=ARCHITECTURE_HUNYUAN_VIDEO)
    train_dataset_group = config_utils.generate_dataset_group_by_blueprint(blueprint.dataset_group)

    datasets = train_dataset_group.datasets

    if args.debug_mode is not None:
        show_datasets(datasets, args.debug_mode, args.console_width, args.console_back, args.console_num_images)
        return

    assert args.vae is not None, "vae checkpoint is required"

    # Load VAE model: HunyuanVideo VAE model is float16
    vae_dtype = torch.float16 if args.vae_dtype is None else str_to_dtype(args.vae_dtype)
    vae, _, s_ratio, t_ratio = load_vae(vae_dtype=vae_dtype, device=device, vae_path=args.vae)
    vae.eval()
    logger.info(f"Loaded VAE: {vae.config}, dtype: {vae.dtype}")

    if args.vae_chunk_size is not None:
        vae.set_chunk_size_for_causal_conv_3d(args.vae_chunk_size)
        logger.info(f"Set chunk_size to {args.vae_chunk_size} for CausalConv3d in VAE")
    if args.vae_spatial_tile_sample_min_size is not None:
        vae.enable_spatial_tiling(True)
        vae.tile_sample_min_size = args.vae_spatial_tile_sample_min_size
        vae.tile_latent_min_size = args.vae_spatial_tile_sample_min_size // 8
    elif args.vae_tiling:
        vae.enable_spatial_tiling(True)

    # Encode images
    def encode(one_batch: list[ItemInfo]):
        encode_and_save_batch(vae, one_batch)

    encode_datasets(datasets, encode, args)


def setup_parser_common() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()

    parser.add_argument("--dataset_config", type=str, required=True, help="path to dataset config .toml file")
    parser.add_argument("--vae", type=str, required=False, default=None, help="path to vae checkpoint")
    parser.add_argument("--vae_dtype", type=str, default=None, help="data type for VAE, default is float16")
    parser.add_argument("--device", type=str, default=None, help="device to use, default is cuda if available")
    parser.add_argument(
        "--batch_size", type=int, default=None, help="batch size, override dataset config if dataset batch size > this"
    )
    parser.add_argument("--num_workers", type=int, default=None, help="number of workers for dataset. default is cpu count-1")
    parser.add_argument("--skip_existing", action="store_true", help="skip existing cache files")
    parser.add_argument("--keep_cache", action="store_true", help="keep cache files not in dataset")
    parser.add_argument("--debug_mode", type=str, default=None, choices=["image", "console"], help="debug mode")
    parser.add_argument("--console_width", type=int, default=80, help="debug mode: console width")
    parser.add_argument(
        "--console_back", type=str, default=None, help="debug mode: console background color, one of ascii_magic.Back"
    )
    parser.add_argument(
        "--console_num_images",
        type=int,
        default=None,
        help="debug mode: not interactive, number of images to show for each dataset",
    )
    return parser


def hv_setup_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    parser.add_argument(
        "--vae_tiling",
        action="store_true",
        help="enable spatial tiling for VAE, default is False. If vae_spatial_tile_sample_min_size is set, this is automatically enabled",
    )
    parser.add_argument("--vae_chunk_size", type=int, default=None, help="chunk size for CausalConv3d in VAE")
    parser.add_argument(
        "--vae_spatial_tile_sample_min_size", type=int, default=None, help="spatial tile sample min size for VAE, default 256"
    )
    return parser


if __name__ == "__main__":
    parser = setup_parser_common()
    parser = hv_setup_parser(parser)

    args = parser.parse_args()
    main(args)
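A usage sketch for this script, mirroring the latent pre-caching step in the README; the dataset config and VAE paths are placeholders:

```bash
python cache_latents.py --dataset_config dataset/dataset_example.toml \
    --vae path/to/ckpts/hunyuan-video-t2v-720p/vae/pytorch_model.pt \
    --vae_chunk_size 32 --vae_tiling --skip_existing
```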
cache_text_encoder_outputs.py
ADDED
@@ -0,0 +1,214 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import argparse
|
2 |
+
import os
|
3 |
+
from typing import Optional, Union
|
4 |
+
|
5 |
+
import numpy as np
|
6 |
+
import torch
|
7 |
+
from tqdm import tqdm
|
8 |
+
|
9 |
+
from dataset import config_utils
|
10 |
+
from dataset.config_utils import BlueprintGenerator, ConfigSanitizer
|
11 |
+
import accelerate
|
12 |
+
|
13 |
+
from dataset.image_video_dataset import ARCHITECTURE_HUNYUAN_VIDEO, BaseDataset, ItemInfo, save_text_encoder_output_cache
|
14 |
+
from hunyuan_model import text_encoder as text_encoder_module
|
15 |
+
from hunyuan_model.text_encoder import TextEncoder
|
16 |
+
|
17 |
+
import logging
|
18 |
+
|
19 |
+
from utils.model_utils import str_to_dtype
|
20 |
+
|
21 |
+
logger = logging.getLogger(__name__)
|
22 |
+
logging.basicConfig(level=logging.INFO)
|
23 |
+
|
24 |
+
|
25 |
+
def encode_prompt(text_encoder: TextEncoder, prompt: Union[str, list[str]]):
|
26 |
+
data_type = "video" # video only, image is not supported
|
27 |
+
text_inputs = text_encoder.text2tokens(prompt, data_type=data_type)
|
28 |
+
|
29 |
+
with torch.no_grad():
|
30 |
+
prompt_outputs = text_encoder.encode(text_inputs, data_type=data_type)
|
31 |
+
|
32 |
+
return prompt_outputs.hidden_state, prompt_outputs.attention_mask
|
33 |
+
|
34 |
+
|
35 |
+
def encode_and_save_batch(
|
36 |
+
text_encoder: TextEncoder, batch: list[ItemInfo], is_llm: bool, accelerator: Optional[accelerate.Accelerator]
|
37 |
+
):
|
38 |
+
prompts = [item.caption for item in batch]
|
39 |
+
# print(prompts)
|
40 |
+
|
41 |
+
# encode prompt
|
42 |
+
if accelerator is not None:
|
43 |
+
with accelerator.autocast():
|
44 |
+
prompt_embeds, prompt_mask = encode_prompt(text_encoder, prompts)
|
45 |
+
else:
|
46 |
+
prompt_embeds, prompt_mask = encode_prompt(text_encoder, prompts)
|
47 |
+
|
48 |
+
# # convert to fp16 if needed
|
49 |
+
# if prompt_embeds.dtype == torch.float32 and text_encoder.dtype != torch.float32:
|
50 |
+
# prompt_embeds = prompt_embeds.to(text_encoder.dtype)
|
51 |
+
|
52 |
+
# save prompt cache
|
53 |
+
for item, embed, mask in zip(batch, prompt_embeds, prompt_mask):
|
54 |
+
save_text_encoder_output_cache(item, embed, mask, is_llm)
|
55 |
+
|
56 |
+
|
57 |
+
def prepare_cache_files_and_paths(datasets: list[BaseDataset]):
|
58 |
+
all_cache_files_for_dataset = [] # exisiting cache files
|
59 |
+
all_cache_paths_for_dataset = [] # all cache paths in the dataset
|
60 |
+
for dataset in datasets:
|
61 |
+
all_cache_files = [os.path.normpath(file) for file in dataset.get_all_text_encoder_output_cache_files()]
|
62 |
+
all_cache_files = set(all_cache_files)
|
63 |
+
all_cache_files_for_dataset.append(all_cache_files)
|
64 |
+
|
65 |
+
all_cache_paths_for_dataset.append(set())
|
66 |
+
return all_cache_files_for_dataset, all_cache_paths_for_dataset
|
67 |
+
|
68 |
+
|
69 |
+
def process_text_encoder_batches(
|
70 |
+
num_workers: Optional[int],
|
71 |
+
skip_existing: bool,
|
72 |
+
batch_size: int,
|
73 |
+
datasets: list[BaseDataset],
|
74 |
+
all_cache_files_for_dataset: list[set],
|
75 |
+
all_cache_paths_for_dataset: list[set],
|
76 |
+
encode: callable,
|
77 |
+
):
|
78 |
+
num_workers = num_workers if num_workers is not None else max(1, os.cpu_count() - 1)
|
79 |
+
for i, dataset in enumerate(datasets):
|
80 |
+
logger.info(f"Encoding dataset [{i}]")
|
81 |
+
all_cache_files = all_cache_files_for_dataset[i]
|
82 |
+
all_cache_paths = all_cache_paths_for_dataset[i]
|
83 |
+
for batch in tqdm(dataset.retrieve_text_encoder_output_cache_batches(num_workers)):
|
84 |
+
# update cache files (it's ok if we update it multiple times)
|
85 |
+
all_cache_paths.update([os.path.normpath(item.text_encoder_output_cache_path) for item in batch])
|
86 |
+
|
87 |
+
# skip existing cache files
|
88 |
+
if skip_existing:
|
89 |
+
filtered_batch = [
|
90 |
+
item for item in batch if not os.path.normpath(item.text_encoder_output_cache_path) in all_cache_files
|
91 |
+
]
|
92 |
+
# print(f"Filtered {len(batch) - len(filtered_batch)} existing cache files")
|
93 |
+
if len(filtered_batch) == 0:
|
94 |
+
continue
|
95 |
+
batch = filtered_batch
|
96 |
+
|
97 |
+
bs = batch_size if batch_size is not None else len(batch)
|
98 |
+
for i in range(0, len(batch), bs):
|
99 |
+
encode(batch[i : i + bs])
|
100 |
+
|
101 |
+
|
102 |
+
def post_process_cache_files(
|
103 |
+
datasets: list[BaseDataset], all_cache_files_for_dataset: list[set], all_cache_paths_for_dataset: list[set]
|
104 |
+
):
|
105 |
+
for i, dataset in enumerate(datasets):
|
106 |
+
all_cache_files = all_cache_files_for_dataset[i]
|
107 |
+
all_cache_paths = all_cache_paths_for_dataset[i]
|
108 |
+
for cache_file in all_cache_files:
|
109 |
+
if cache_file not in all_cache_paths:
|
110 |
+
if args.keep_cache:
|
111 |
+
logger.info(f"Keep cache file not in the dataset: {cache_file}")
|
112 |
+
else:
|
113 |
+
os.remove(cache_file)
|
114 |
+
logger.info(f"Removed old cache file: {cache_file}")
|
115 |
+
|
116 |
+
|
117 |
+
def main(args):
|
118 |
+
device = args.device if args.device is not None else "cuda" if torch.cuda.is_available() else "cpu"
|
119 |
+
device = torch.device(device)
|
120 |
+
|
121 |
+
# Load dataset config
|
122 |
+
blueprint_generator = BlueprintGenerator(ConfigSanitizer())
|
123 |
+
logger.info(f"Load dataset config from {args.dataset_config}")
|
124 |
+
user_config = config_utils.load_user_config(args.dataset_config)
|
125 |
+
blueprint = blueprint_generator.generate(user_config, args, architecture=ARCHITECTURE_HUNYUAN_VIDEO)
|
126 |
+
train_dataset_group = config_utils.generate_dataset_group_by_blueprint(blueprint.dataset_group)
|
127 |
+
|
128 |
+
datasets = train_dataset_group.datasets
|
129 |
+
|
130 |
+
# define accelerator for fp8 inference
|
131 |
+
accelerator = None
|
132 |
+
if args.fp8_llm:
|
133 |
+
accelerator = accelerate.Accelerator(mixed_precision="fp16")
|
134 |
+
|
135 |
+
    # prepare cache files and paths: all_cache_files_for_dataset = existing cache files, all_cache_paths_for_dataset = all cache paths in the dataset
    all_cache_files_for_dataset, all_cache_paths_for_dataset = prepare_cache_files_and_paths(datasets)

    # Load Text Encoder 1
    text_encoder_dtype = torch.float16 if args.text_encoder_dtype is None else str_to_dtype(args.text_encoder_dtype)
    logger.info(f"loading text encoder 1: {args.text_encoder1}")
    text_encoder_1 = text_encoder_module.load_text_encoder_1(args.text_encoder1, device, args.fp8_llm, text_encoder_dtype)
    text_encoder_1.to(device=device)

    # Encode with Text Encoder 1 (LLM)
    logger.info("Encoding with Text Encoder 1")

    def encode_for_text_encoder_1(batch: list[ItemInfo]):
        encode_and_save_batch(text_encoder_1, batch, is_llm=True, accelerator=accelerator)

    process_text_encoder_batches(
        args.num_workers,
        args.skip_existing,
        args.batch_size,
        datasets,
        all_cache_files_for_dataset,
        all_cache_paths_for_dataset,
        encode_for_text_encoder_1,
    )
    del text_encoder_1

    # Load Text Encoder 2
    logger.info(f"loading text encoder 2: {args.text_encoder2}")
    text_encoder_2 = text_encoder_module.load_text_encoder_2(args.text_encoder2, device, text_encoder_dtype)
    text_encoder_2.to(device=device)

    # Encode with Text Encoder 2
    logger.info("Encoding with Text Encoder 2")

    def encode_for_text_encoder_2(batch: list[ItemInfo]):
        encode_and_save_batch(text_encoder_2, batch, is_llm=False, accelerator=None)

    process_text_encoder_batches(
        args.num_workers,
        args.skip_existing,
        args.batch_size,
        datasets,
        all_cache_files_for_dataset,
        all_cache_paths_for_dataset,
        encode_for_text_encoder_2,
    )
    del text_encoder_2

    # remove cache files not in dataset
    post_process_cache_files(datasets, all_cache_files_for_dataset, all_cache_paths_for_dataset)


def setup_parser_common():
    parser = argparse.ArgumentParser()

    parser.add_argument("--dataset_config", type=str, required=True, help="path to dataset config .toml file")
    parser.add_argument("--device", type=str, default=None, help="device to use, default is cuda if available")
    parser.add_argument(
        "--batch_size", type=int, default=None, help="batch size, override dataset config if dataset batch size > this"
    )
    parser.add_argument("--num_workers", type=int, default=None, help="number of workers for dataset. default is cpu count-1")
    parser.add_argument("--skip_existing", action="store_true", help="skip existing cache files")
    parser.add_argument("--keep_cache", action="store_true", help="keep cache files not in dataset")
    return parser


def hv_setup_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    parser.add_argument("--text_encoder1", type=str, required=True, help="Text Encoder 1 directory")
    parser.add_argument("--text_encoder2", type=str, required=True, help="Text Encoder 2 directory")
    parser.add_argument("--text_encoder_dtype", type=str, default=None, help="data type for Text Encoder, default is float16")
    parser.add_argument("--fp8_llm", action="store_true", help="use fp8 for Text Encoder 1 (LLM)")
    return parser


if __name__ == "__main__":
    parser = setup_parser_common()
    parser = hv_setup_parser(parser)

    args = parser.parse_args()
    main(args)
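A minimal invocation sketch for this caching script, using only the flags defined by `setup_parser_common` and `hv_setup_parser` above; the dataset config and the two text-encoder directories are placeholders, not files that ship with this repository:

```python
# Hypothetical invocation of cache_text_encoder_outputs.py; all paths are placeholders.
import subprocess

subprocess.run(
    [
        "python", "cache_text_encoder_outputs.py",
        "--dataset_config", "dataset/dataset_example.toml",
        "--text_encoder1", "Models/text_encoder_1",  # assumed local Text Encoder 1 directory
        "--text_encoder2", "Models/text_encoder_2",  # assumed local Text Encoder 2 directory
        "--batch_size", "16",
        "--skip_existing",
    ],
    check=True,  # raise if the script exits with a non-zero status
)
```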
convert_lora.py
ADDED
@@ -0,0 +1,131 @@
import argparse
import torch
from safetensors.torch import load_file, save_file
from safetensors import safe_open
from utils import model_utils
import logging

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def convert_from_diffusers(prefix, weights_sd):
    # convert from diffusers(?) to default LoRA
    # Diffusers format: {"diffusion_model.module.name.lora_A.weight": weight, "diffusion_model.module.name.lora_B.weight": weight, ...}
    # default LoRA format: {"prefix_module_name.lora_down.weight": weight, "prefix_module_name.lora_up.weight": weight, ...}

    # note: Diffusers has no alpha, so alpha is set to rank
    new_weights_sd = {}
    lora_dims = {}
    for key, weight in weights_sd.items():
        diffusers_prefix, key_body = key.split(".", 1)
        if diffusers_prefix != "diffusion_model" and diffusers_prefix != "transformer":
            logger.warning(f"unexpected key: {key} in diffusers format")
            continue

        new_key = f"{prefix}{key_body}".replace(".", "_").replace("_lora_A_", ".lora_down.").replace("_lora_B_", ".lora_up.")
        new_weights_sd[new_key] = weight

        lora_name = new_key.split(".")[0]  # before first dot
        if lora_name not in lora_dims and "lora_down" in new_key:
            lora_dims[lora_name] = weight.shape[0]

    # add alpha with rank
    for lora_name, dim in lora_dims.items():
        new_weights_sd[f"{lora_name}.alpha"] = torch.tensor(dim)
    return new_weights_sd


def convert_to_diffusers(prefix, weights_sd):
    # convert from default LoRA to diffusers

    # get alphas
    lora_alphas = {}
    for key, weight in weights_sd.items():
        if key.startswith(prefix):
            lora_name = key.split(".", 1)[0]  # before first dot
            if lora_name not in lora_alphas and "alpha" in key:
                lora_alphas[lora_name] = weight

    new_weights_sd = {}
    for key, weight in weights_sd.items():
        if key.startswith(prefix):
            if "alpha" in key:
                continue

            lora_name = key.split(".", 1)[0]  # before first dot

            module_name = lora_name[len(prefix) :]  # remove prefix
            module_name = module_name.replace("_", ".")  # replace "_" with "."
            if ".cross.attn." in module_name or ".self.attn." in module_name:
                # Wan2.1 lora name to module name: ugly but works
                module_name = module_name.replace("cross.attn", "cross_attn")
                module_name = module_name.replace("self.attn", "self_attn")
                module_name = module_name.replace("k.img", "k_img")
                module_name = module_name.replace("v.img", "v_img")
            else:
                # HunyuanVideo lora name to module name: ugly but works
                module_name = module_name.replace("double.blocks.", "double_blocks.")
                module_name = module_name.replace("single.blocks.", "single_blocks.")
                module_name = module_name.replace("img.", "img_")
                module_name = module_name.replace("txt.", "txt_")
                module_name = module_name.replace("attn.", "attn_")
            diffusers_prefix = "diffusion_model"
            if "lora_down" in key:
                new_key = f"{diffusers_prefix}.{module_name}.lora_A.weight"
                dim = weight.shape[0]
            elif "lora_up" in key:
                new_key = f"{diffusers_prefix}.{module_name}.lora_B.weight"
                dim = weight.shape[1]
            else:
                logger.warning(f"unexpected key: {key} in default LoRA format")
                continue

            # scale weight by alpha using float16
            if lora_name in lora_alphas:
                scale = lora_alphas[lora_name].half() / dim
                scale = scale.sqrt()
                weight = weight.half() * scale
            else:
                logger.warning(f"missing alpha for {lora_name}")

            new_weights_sd[new_key] = weight

    return new_weights_sd


def convert(input_file, output_file, target_format):
    logger.info(f"loading {input_file}")
    weights_sd = load_file(input_file)
    with safe_open(input_file, framework="pt") as f:
        metadata = f.metadata()

    logger.info(f"converting to {target_format}")
    prefix = "lora_unet_"
    if target_format == "default":
        new_weights_sd = convert_from_diffusers(prefix, weights_sd)
        metadata = metadata or {}
        model_utils.precalculate_safetensors_hashes(new_weights_sd, metadata)
    elif target_format == "other":
        new_weights_sd = convert_to_diffusers(prefix, weights_sd)
    else:
        raise ValueError(f"unknown target format: {target_format}")

    logger.info(f"saving to {output_file}")
    save_file(new_weights_sd, output_file, metadata=metadata)

    logger.info("done")


def parse_args():
    parser = argparse.ArgumentParser(description="Convert LoRA weights between default and other formats")
    parser.add_argument("--input", type=str, required=True, help="input model file")
    parser.add_argument("--output", type=str, required=True, help="output model file")
    parser.add_argument("--target", type=str, required=True, choices=["other", "default"], help="target format")
    args = parser.parse_args()
    return args


if __name__ == "__main__":
    args = parse_args()
    convert(args.input, args.output, args.target)
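As a usage sketch, the `convert` helper above can also be called directly from Python instead of through the CLI; the file names below are placeholders, not outputs that exist in this repository:

```python
# Minimal sketch: convert a LoRA saved in the default (lora_unet_*.lora_down/up) layout
# to the Diffusers-style "other" layout. File names are placeholders.
from convert_lora import convert

convert(
    input_file="Output_LoRAs/my_wan_lora.safetensors",             # assumed input path
    output_file="Output_LoRAs/my_wan_lora_diffusers.safetensors",  # assumed output path
    target_format="other",  # "other" -> diffusion_model.*.lora_A/B keys, "default" -> lora_unet_*.lora_down/up keys
)
```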
dataset/__init__.py
ADDED
File without changes
dataset/config_utils.py
ADDED
@@ -0,0 +1,372 @@
import argparse
from dataclasses import (
    asdict,
    dataclass,
)
import functools
import random
from textwrap import dedent, indent
import json
from pathlib import Path

# from toolz import curry
from typing import Dict, List, Optional, Sequence, Tuple, Union

import toml
import voluptuous
from voluptuous import Any, ExactSequence, MultipleInvalid, Object, Schema

from .image_video_dataset import DatasetGroup, ImageDataset, VideoDataset

import logging

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


@dataclass
class BaseDatasetParams:
    resolution: Tuple[int, int] = (960, 544)
    enable_bucket: bool = False
    bucket_no_upscale: bool = False
    caption_extension: Optional[str] = None
    batch_size: int = 1
    num_repeats: int = 1
    cache_directory: Optional[str] = None
    debug_dataset: bool = False
    architecture: str = "no_default"  # short style like "hv" or "wan"


@dataclass
class ImageDatasetParams(BaseDatasetParams):
    image_directory: Optional[str] = None
    image_jsonl_file: Optional[str] = None


@dataclass
class VideoDatasetParams(BaseDatasetParams):
    video_directory: Optional[str] = None
    video_jsonl_file: Optional[str] = None
    target_frames: Sequence[int] = (1,)
    frame_extraction: Optional[str] = "head"
    frame_stride: Optional[int] = 1
    frame_sample: Optional[int] = 1


@dataclass
class DatasetBlueprint:
    is_image_dataset: bool
    params: Union[ImageDatasetParams, VideoDatasetParams]


@dataclass
class DatasetGroupBlueprint:
    datasets: Sequence[DatasetBlueprint]


@dataclass
class Blueprint:
    dataset_group: DatasetGroupBlueprint


class ConfigSanitizer:
    # @curry
    @staticmethod
    def __validate_and_convert_twodim(klass, value: Sequence) -> Tuple:
        Schema(ExactSequence([klass, klass]))(value)
        return tuple(value)

    # @curry
    @staticmethod
    def __validate_and_convert_scalar_or_twodim(klass, value: Union[float, Sequence]) -> Tuple:
        Schema(Any(klass, ExactSequence([klass, klass])))(value)
        try:
            Schema(klass)(value)
            return (value, value)
        except:
            return ConfigSanitizer.__validate_and_convert_twodim(klass, value)

    # datasets schema
    DATASET_ASCENDABLE_SCHEMA = {
        "caption_extension": str,
        "batch_size": int,
        "num_repeats": int,
        "resolution": functools.partial(__validate_and_convert_scalar_or_twodim.__func__, int),
        "enable_bucket": bool,
        "bucket_no_upscale": bool,
    }
    IMAGE_DATASET_DISTINCT_SCHEMA = {
        "image_directory": str,
        "image_jsonl_file": str,
        "cache_directory": str,
    }
    VIDEO_DATASET_DISTINCT_SCHEMA = {
        "video_directory": str,
        "video_jsonl_file": str,
        "target_frames": [int],
        "frame_extraction": str,
        "frame_stride": int,
        "frame_sample": int,
        "cache_directory": str,
    }

    # options handled by argparse but not handled by user config
    ARGPARSE_SPECIFIC_SCHEMA = {
        "debug_dataset": bool,
    }

    def __init__(self) -> None:
        self.image_dataset_schema = self.__merge_dict(
            self.DATASET_ASCENDABLE_SCHEMA,
            self.IMAGE_DATASET_DISTINCT_SCHEMA,
        )
        self.video_dataset_schema = self.__merge_dict(
            self.DATASET_ASCENDABLE_SCHEMA,
            self.VIDEO_DATASET_DISTINCT_SCHEMA,
        )

        def validate_flex_dataset(dataset_config: dict):
            if "target_frames" in dataset_config:
                return Schema(self.video_dataset_schema)(dataset_config)
            else:
                return Schema(self.image_dataset_schema)(dataset_config)

        self.dataset_schema = validate_flex_dataset

        self.general_schema = self.__merge_dict(
            self.DATASET_ASCENDABLE_SCHEMA,
        )
        self.user_config_validator = Schema(
            {
                "general": self.general_schema,
                "datasets": [self.dataset_schema],
            }
        )
        self.argparse_schema = self.__merge_dict(
            self.ARGPARSE_SPECIFIC_SCHEMA,
        )
        self.argparse_config_validator = Schema(Object(self.argparse_schema), extra=voluptuous.ALLOW_EXTRA)

    def sanitize_user_config(self, user_config: dict) -> dict:
        try:
            return self.user_config_validator(user_config)
        except MultipleInvalid:
            # TODO: clarify the error message
            logger.error("Invalid user config / ユーザ設定の形式が正しくないようです")
            raise

    # NOTE: In nature, argument parser result is not needed to be sanitize
    # However this will help us to detect program bug
    def sanitize_argparse_namespace(self, argparse_namespace: argparse.Namespace) -> argparse.Namespace:
        try:
            return self.argparse_config_validator(argparse_namespace)
        except MultipleInvalid:
            # XXX: this should be a bug
            logger.error(
                "Invalid cmdline parsed arguments. This should be a bug. / コマンドラインのパース結果が正しくないようです。プログラムのバグの可能性が高いです。"
            )
            raise

    # NOTE: value would be overwritten by latter dict if there is already the same key
    @staticmethod
    def __merge_dict(*dict_list: dict) -> dict:
        merged = {}
        for schema in dict_list:
            # merged |= schema
            for k, v in schema.items():
                merged[k] = v
        return merged


class BlueprintGenerator:
    BLUEPRINT_PARAM_NAME_TO_CONFIG_OPTNAME = {}

    def __init__(self, sanitizer: ConfigSanitizer):
        self.sanitizer = sanitizer

    # runtime_params is for parameters which is only configurable on runtime, such as tokenizer
    def generate(self, user_config: dict, argparse_namespace: argparse.Namespace, **runtime_params) -> Blueprint:
        sanitized_user_config = self.sanitizer.sanitize_user_config(user_config)
        sanitized_argparse_namespace = self.sanitizer.sanitize_argparse_namespace(argparse_namespace)

        argparse_config = {k: v for k, v in vars(sanitized_argparse_namespace).items() if v is not None}
        general_config = sanitized_user_config.get("general", {})

        dataset_blueprints = []
        for dataset_config in sanitized_user_config.get("datasets", []):
            is_image_dataset = "target_frames" not in dataset_config
            if is_image_dataset:
                dataset_params_klass = ImageDatasetParams
            else:
                dataset_params_klass = VideoDatasetParams

            params = self.generate_params_by_fallbacks(
                dataset_params_klass, [dataset_config, general_config, argparse_config, runtime_params]
            )
            dataset_blueprints.append(DatasetBlueprint(is_image_dataset, params))

        dataset_group_blueprint = DatasetGroupBlueprint(dataset_blueprints)

        return Blueprint(dataset_group_blueprint)

    @staticmethod
    def generate_params_by_fallbacks(param_klass, fallbacks: Sequence[dict]):
        name_map = BlueprintGenerator.BLUEPRINT_PARAM_NAME_TO_CONFIG_OPTNAME
        search_value = BlueprintGenerator.search_value
        default_params = asdict(param_klass())
        param_names = default_params.keys()

        params = {name: search_value(name_map.get(name, name), fallbacks, default_params.get(name)) for name in param_names}

        return param_klass(**params)

    @staticmethod
    def search_value(key: str, fallbacks: Sequence[dict], default_value=None):
        for cand in fallbacks:
            value = cand.get(key)
            if value is not None:
                return value

        return default_value


# if training is True, it will return a dataset group for training, otherwise for caching
def generate_dataset_group_by_blueprint(dataset_group_blueprint: DatasetGroupBlueprint, training: bool = False) -> DatasetGroup:
    datasets: List[Union[ImageDataset, VideoDataset]] = []

    for dataset_blueprint in dataset_group_blueprint.datasets:
        if dataset_blueprint.is_image_dataset:
            dataset_klass = ImageDataset
        else:
            dataset_klass = VideoDataset

        dataset = dataset_klass(**asdict(dataset_blueprint.params))
        datasets.append(dataset)

    # assertion
    cache_directories = [dataset.cache_directory for dataset in datasets]
    num_of_unique_cache_directories = len(set(cache_directories))
    if num_of_unique_cache_directories != len(cache_directories):
        raise ValueError(
            "cache directory should be unique for each dataset (note that cache directory is image/video directory if not specified)"
            + " / cache directory は各データセットごとに異なる必要があります(指定されていない場合はimage/video directoryが使われるので注意)"
        )

    # print info
    info = ""
    for i, dataset in enumerate(datasets):
        is_image_dataset = isinstance(dataset, ImageDataset)
        info += dedent(
            f"""\
            [Dataset {i}]
                is_image_dataset: {is_image_dataset}
                resolution: {dataset.resolution}
                batch_size: {dataset.batch_size}
                num_repeats: {dataset.num_repeats}
                caption_extension: "{dataset.caption_extension}"
                enable_bucket: {dataset.enable_bucket}
                bucket_no_upscale: {dataset.bucket_no_upscale}
                cache_directory: "{dataset.cache_directory}"
                debug_dataset: {dataset.debug_dataset}
            """
        )

        if is_image_dataset:
            info += indent(
                dedent(
                    f"""\
                    image_directory: "{dataset.image_directory}"
                    image_jsonl_file: "{dataset.image_jsonl_file}"
                    \n"""
                ),
                "    ",
            )
        else:
            info += indent(
                dedent(
                    f"""\
                    video_directory: "{dataset.video_directory}"
                    video_jsonl_file: "{dataset.video_jsonl_file}"
                    target_frames: {dataset.target_frames}
                    frame_extraction: {dataset.frame_extraction}
                    frame_stride: {dataset.frame_stride}
                    frame_sample: {dataset.frame_sample}
                    \n"""
                ),
                "    ",
            )
    logger.info(f"{info}")

    # make buckets first because it determines the length of dataset
    # and set the same seed for all datasets
    seed = random.randint(0, 2**31)  # actual seed is seed + epoch_no
    for i, dataset in enumerate(datasets):
        # logger.info(f"[Dataset {i}]")
        dataset.set_seed(seed)
        if training:
            dataset.prepare_for_training()

    return DatasetGroup(datasets)


def load_user_config(file: str) -> dict:
    file: Path = Path(file)
    if not file.is_file():
        raise ValueError(f"file not found / ファイルが見つかりません: {file}")

    if file.name.lower().endswith(".json"):
        try:
            with open(file, "r", encoding="utf-8") as f:
                config = json.load(f)
        except Exception:
            logger.error(
                f"Error on parsing JSON config file. Please check the format. / JSON 形式の設定ファイルの読み込みに失敗しました。文法が正しいか確認してください。: {file}"
            )
            raise
    elif file.name.lower().endswith(".toml"):
        try:
            config = toml.load(file)
        except Exception:
            logger.error(
                f"Error on parsing TOML config file. Please check the format. / TOML 形式の設定ファイルの読み込みに失敗しました。文法が正しいか確認してください。: {file}"
            )
            raise
    else:
        raise ValueError(f"not supported config file format / 対応していない設定ファイルの形式です: {file}")

    return config


# for config test
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("dataset_config")
    config_args, remain = parser.parse_known_args()

    parser = argparse.ArgumentParser()
    parser.add_argument("--debug_dataset", action="store_true")
    argparse_namespace = parser.parse_args(remain)

    logger.info("[argparse_namespace]")
    logger.info(f"{vars(argparse_namespace)}")

    user_config = load_user_config(config_args.dataset_config)

    logger.info("")
    logger.info("[user_config]")
    logger.info(f"{user_config}")

    sanitizer = ConfigSanitizer()
    sanitized_user_config = sanitizer.sanitize_user_config(user_config)

    logger.info("")
    logger.info("[sanitized_user_config]")
    logger.info(f"{sanitized_user_config}")

    blueprint = BlueprintGenerator(sanitizer).generate(user_config, argparse_namespace)

    logger.info("")
    logger.info("[blueprint]")
    logger.info(f"{blueprint}")

    dataset_group = generate_dataset_group_by_blueprint(blueprint.dataset_group)
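A minimal sketch of how the pieces above are wired together outside of the `__main__` test block; the TOML path is a placeholder, and passing `architecture="wan"` as a runtime parameter is an assumption based on the `BaseDatasetParams` comment, not something this file enforces:

```python
# Hedged example: build a DatasetGroup from a TOML config, mirroring the __main__ test block above.
import argparse
from dataset.config_utils import (
    BlueprintGenerator,
    ConfigSanitizer,
    generate_dataset_group_by_blueprint,
    load_user_config,
)

args = argparse.Namespace(debug_dataset=False)  # only argparse-specific options are validated here
user_config = load_user_config("dataset/dataset_example.toml")  # placeholder path
blueprint = BlueprintGenerator(ConfigSanitizer()).generate(user_config, args, architecture="wan")
dataset_group = generate_dataset_group_by_blueprint(blueprint.dataset_group, training=False)  # training=False -> caching mode
```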
dataset/dataset_config.md
ADDED
@@ -0,0 +1,378 @@
> 📝 Click on the language section to expand / 言語をクリックして展開

## Dataset Configuration

Please create a TOML file for dataset configuration.

Image and video datasets are supported. The configuration file can include multiple datasets, either image or video datasets, with caption text files or metadata JSONL files.

The cache directory must be different for each dataset.

<details>
<summary>日本語</summary>

データセットの設定を行うためのTOMLファイルを作成してください。

画像データセットと動画データセットがサポートされています。設定ファイルには、画像または動画データセットを複数含めることができます。キャプションテキストファイルまたはメタデータJSONLファイルを使用できます。

キャッシュディレクトリは、各データセットごとに異なるディレクトリである必要があります。
</details>

### Sample for Image Dataset with Caption Text Files

```toml
# resolution, caption_extension, batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
# otherwise, the default values will be used for each item

# general configurations
[general]
resolution = [960, 544]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false

[[datasets]]
image_directory = "/path/to/image_dir"
cache_directory = "/path/to/cache_directory"
num_repeats = 1 # optional, default is 1. Number of times to repeat the dataset. Useful to balance the multiple datasets with different sizes.

# other datasets can be added here. each dataset can have different configurations
```

`cache_directory` is optional; the default is None, which uses the same directory as the image directory. However, we recommend setting an explicit cache directory to avoid accidentally sharing cache files between different datasets.

`num_repeats` is also available. It is optional, default is 1 (no repeat). It repeats the images (or videos) that many times to expand the dataset. For example, if `num_repeats = 2` and there are 20 images in the dataset, each image is duplicated twice (with the same caption) for a total of 40 images. It is useful for balancing multiple datasets of different sizes.

<details>
<summary>日本語</summary>

`cache_directory` はオプションです。デフォルトは画像ディレクトリと同じディレクトリに設定されます。ただし、異なるデータセット間でキャッシュファイルが共有されるのを防ぐために、明示的に別のキャッシュディレクトリを設定することをお勧めします。

`num_repeats` はオプションで、デフォルトは 1 です(繰り返しなし)。画像(や動画)を、その回数だけ単純に繰り返してデータセットを拡張します。たとえば`num_repeats = 2`としたとき、画像20枚のデータセットなら、各画像が2枚ずつ(同一のキャプションで)計40枚存在した場合と同じになります。異なるデータ数のデータセット間でバランスを取るために使用可能です。

resolution, caption_extension, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。省略時は各項目のデフォルト値が使用されます。

`[[datasets]]`以下を追加することで、他のデータセットを追加できます。各データセットには異なる設定を持てます。
</details>

### Sample for Image Dataset with Metadata JSONL File

```toml
# resolution, batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
# caption_extension is not required for metadata jsonl file
# cache_directory is required for each dataset with metadata jsonl file

# general configurations
[general]
resolution = [960, 544]
batch_size = 1
enable_bucket = true
bucket_no_upscale = false

[[datasets]]
image_jsonl_file = "/path/to/metadata.jsonl"
cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
num_repeats = 1 # optional, default is 1. Same as above.

# other datasets can be added here. each dataset can have different configurations
```

JSONL file format for metadata:

```json
{"image_path": "/path/to/image1.jpg", "caption": "A caption for image1"}
{"image_path": "/path/to/image2.jpg", "caption": "A caption for image2"}
```

<details>
<summary>日本語</summary>

resolution, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。省略時は各項目のデフォルト値が使用されます。

metadata jsonl ファイルを使用する場合、caption_extension は必要ありません。また、cache_directory は必須です。

キャプションによるデータセットと同様に、複数のデータセットを追加できます。各データセットには異なる設定を持てます。
</details>

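If you prefer the JSONL route, a small helper like the following (a sketch, not part of this repository) can build `metadata.jsonl` from a folder of images with matching caption `.txt` files, in the format shown above:

```python
# Hypothetical helper: write a metadata JSONL file ({"image_path": ..., "caption": ...} per line)
# from a directory of .jpg images with same-named .txt caption files.
import json
from pathlib import Path

def build_image_metadata_jsonl(image_dir: str, output_file: str, caption_extension: str = ".txt") -> None:
    with open(output_file, "w", encoding="utf-8") as f:
        for image_path in sorted(Path(image_dir).glob("*.jpg")):
            caption_path = image_path.with_suffix(caption_extension)
            if not caption_path.exists():
                continue  # skip images that have no caption file
            caption = caption_path.read_text(encoding="utf-8").strip()
            f.write(json.dumps({"image_path": str(image_path), "caption": caption}, ensure_ascii=False) + "\n")

build_image_metadata_jsonl("dataset/ebPhotos-001", "dataset/metadata.jsonl")  # paths are examples only
```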
### Sample for Video Dataset with Caption Text Files

```toml
# resolution, caption_extension, target_frames, frame_extraction, frame_stride, frame_sample,
# batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
# num_repeats is also available for video dataset, example is not shown here

# general configurations
[general]
resolution = [960, 544]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false

[[datasets]]
video_directory = "/path/to/video_dir"
cache_directory = "/path/to/cache_directory" # recommended to set cache directory
target_frames = [1, 25, 45]
frame_extraction = "head"

# other datasets can be added here. each dataset can have different configurations
```

__In HunyuanVideo and Wan2.1, every value in `target_frames` must be of the form "N*4+1" (N=0,1,2,...).__

<details>
<summary>日本語</summary>

resolution, caption_extension, target_frames, frame_extraction, frame_stride, frame_sample, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。

__HunyuanVideoおよびWan2.1では、target_framesの数値は「N*4+1」である必要があります。__

他の注意事項は画像データセットと同様です。
</details>

### Sample for Video Dataset with Metadata JSONL File

```toml
# resolution, target_frames, frame_extraction, frame_stride, frame_sample,
# batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
# caption_extension is not required for metadata jsonl file
# cache_directory is required for each dataset with metadata jsonl file

# general configurations
[general]
resolution = [960, 544]
batch_size = 1
enable_bucket = true
bucket_no_upscale = false

[[datasets]]
video_jsonl_file = "/path/to/metadata.jsonl"
target_frames = [1, 25, 45]
frame_extraction = "head"
cache_directory = "/path/to/cache_directory_head"

# same metadata jsonl file can be used for multiple datasets
[[datasets]]
video_jsonl_file = "/path/to/metadata.jsonl"
target_frames = [1]
frame_stride = 10
cache_directory = "/path/to/cache_directory_stride"

# other datasets can be added here. each dataset can have different configurations
```

JSONL file format for metadata:

```json
{"video_path": "/path/to/video1.mp4", "caption": "A caption for video1"}
{"video_path": "/path/to/video2.mp4", "caption": "A caption for video2"}
```

<details>
<summary>日本語</summary>

resolution, target_frames, frame_extraction, frame_stride, frame_sample, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。

metadata jsonl ファイルを使用する場合、caption_extension は必要ありません。また、cache_directory は必須です。

他の注意事項は今までのデータセットと同様です。
</details>

### frame_extraction Options

- `head`: Extract the first N frames from the video.
- `chunk`: Extract frames by splitting the video into chunks of N frames.
- `slide`: Extract frames from the video with a stride of `frame_stride`.
- `uniform`: Extract `frame_sample` samples uniformly from the video.

For example, consider a video with 40 frames. The following diagrams illustrate each extraction:

<details>
<summary>日本語</summary>

- `head`: 動画から最初のNフレームを抽出します。
- `chunk`: 動画をNフレームずつに分割してフレームを抽出します。
- `slide`: `frame_stride`に指定したフレームごとに動画からNフレームを抽出します。
- `uniform`: 動画から一定間隔で、`frame_sample`個のNフレームを抽出します。

例えば、40フレームの動画を例とした抽出について、以下の図で説明します。
</details>

```
Original Video, 40 frames: x = frame, o = no frame
oooooooooooooooooooooooooooooooooooooooo

head, target_frames = [1, 13, 25] -> extract head frames:
xooooooooooooooooooooooooooooooooooooooo
xxxxxxxxxxxxxooooooooooooooooooooooooooo
xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo

chunk, target_frames = [13, 25] -> extract frames by splitting into chunks, into 13 and 25 frames:
xxxxxxxxxxxxxooooooooooooooooooooooooooo
oooooooooooooxxxxxxxxxxxxxoooooooooooooo
ooooooooooooooooooooooooooxxxxxxxxxxxxxo
xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo

NOTE: Please do not include 1 in target_frames if you are using the frame_extraction "chunk". It would cause all frames to be extracted.
注: frame_extraction "chunk" を使用する場合、target_frames に 1 を含めないでください。全てのフレームが抽出されてしまいます。

slide, target_frames = [1, 13, 25], frame_stride = 10 -> extract N frames with a stride of 10:
xooooooooooooooooooooooooooooooooooooooo
ooooooooooxooooooooooooooooooooooooooooo
ooooooooooooooooooooxooooooooooooooooooo
ooooooooooooooooooooooooooooooxooooooooo
xxxxxxxxxxxxxooooooooooooooooooooooooooo
ooooooooooxxxxxxxxxxxxxooooooooooooooooo
ooooooooooooooooooooxxxxxxxxxxxxxooooooo
xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
ooooooooooxxxxxxxxxxxxxxxxxxxxxxxxxooooo

uniform, target_frames = [1, 13, 25], frame_sample = 4 -> extract `frame_sample` samples uniformly, N frames each:
xooooooooooooooooooooooooooooooooooooooo
oooooooooooooxoooooooooooooooooooooooooo
oooooooooooooooooooooooooxoooooooooooooo
ooooooooooooooooooooooooooooooooooooooox
xxxxxxxxxxxxxooooooooooooooooooooooooooo
oooooooooxxxxxxxxxxxxxoooooooooooooooooo
ooooooooooooooooooxxxxxxxxxxxxxooooooooo
oooooooooooooooooooooooooooxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
oooooxxxxxxxxxxxxxxxxxxxxxxxxxoooooooooo
ooooooooooxxxxxxxxxxxxxxxxxxxxxxxxxooooo
oooooooooooooooxxxxxxxxxxxxxxxxxxxxxxxxx
```

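The diagrams above can also be reproduced programmatically. The sketch below is illustrative only (it is not the trainer's actual implementation); it shows which start frames each mode would pick for a 40-frame clip and one target length N:

```python
# Illustrative sketch of the four frame_extraction modes for one target length n.
def extraction_start_frames(total_frames: int, n: int, mode: str, frame_stride: int = 10, frame_sample: int = 4):
    if mode == "head":
        return [0]
    if mode == "chunk":
        return list(range(0, total_frames - n + 1, n))
    if mode == "slide":
        return list(range(0, total_frames - n + 1, frame_stride))
    if mode == "uniform":
        if frame_sample == 1:
            return [0]
        step = (total_frames - n) / (frame_sample - 1)
        return [round(i * step) for i in range(frame_sample)]
    raise ValueError(f"unknown frame_extraction mode: {mode}")

print(extraction_start_frames(40, 13, "chunk"))    # [0, 13, 26] -> matches the chunk diagram
print(extraction_start_frames(40, 13, "uniform"))  # [0, 9, 18, 27] -> matches the uniform diagram
```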
## Specifications

```toml
# general configurations
[general]
resolution = [960, 544] # optional, [W, H], default is None. This is the default resolution for all datasets
caption_extension = ".txt" # optional, default is None. This is the default caption extension for all datasets
batch_size = 1 # optional, default is 1. This is the default batch size for all datasets
num_repeats = 1 # optional, default is 1. Number of times to repeat the dataset. Useful to balance the multiple datasets with different sizes.
enable_bucket = true # optional, default is false. Enable bucketing for datasets
bucket_no_upscale = false # optional, default is false. Disable upscaling for bucketing. Ignored if enable_bucket is false

### Image Dataset

# sample image dataset with caption text files
[[datasets]]
image_directory = "/path/to/image_dir"
caption_extension = ".txt" # required for caption text files, if general caption extension is not set
resolution = [960, 544] # required if general resolution is not set
batch_size = 4 # optional, overwrite the default batch size
num_repeats = 1 # optional, overwrite the default num_repeats
enable_bucket = false # optional, overwrite the default bucketing setting
bucket_no_upscale = true # optional, overwrite the default bucketing setting
cache_directory = "/path/to/cache_directory" # optional, default is None to use the same directory as the image directory. NOTE: caching is always enabled

# sample image dataset with metadata **jsonl** file
[[datasets]]
image_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of image files and captions
resolution = [960, 544] # required if general resolution is not set
cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
# caption_extension is not required for metadata jsonl file
# batch_size, num_repeats, enable_bucket, bucket_no_upscale are also available for metadata jsonl file

### Video Dataset

# sample video dataset with caption text files
[[datasets]]
video_directory = "/path/to/video_dir"
caption_extension = ".txt" # required for caption text files, if general caption extension is not set
resolution = [960, 544] # required if general resolution is not set

target_frames = [1, 25, 79] # required for video dataset. list of video lengths to extract frames. each element must be N*4+1 (N=0,1,2,...)

# NOTE: Please do not include 1 in target_frames if you are using the frame_extraction "chunk". It would cause all frames to be extracted.

frame_extraction = "head" # optional, "head" or "chunk", "slide", "uniform". Default is "head"
frame_stride = 1 # optional, default is 1, available for "slide" frame extraction
frame_sample = 4 # optional, default is 1 (same as "head"), available for "uniform" frame extraction
# batch_size, num_repeats, enable_bucket, bucket_no_upscale, cache_directory are also available for video dataset

# sample video dataset with metadata jsonl file
[[datasets]]
video_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of video files and captions

target_frames = [1, 79]

cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
# frame_extraction, frame_stride, frame_sample are also available for metadata jsonl file
```

<!--
# sample image dataset with lance
[[datasets]]
image_lance_dataset = "/path/to/lance_dataset"
resolution = [960, 544] # required if general resolution is not set
# batch_size, enable_bucket, bucket_no_upscale, cache_directory are also available for lance dataset
-->

Metadata in .json format will be supported in the near future.
dataset/dataset_example.toml
ADDED
@@ -0,0 +1,44 @@
# resolution, caption_extension, target_frames, frame_extraction, frame_stride, frame_sample,
# batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets

# general configurations
[general]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false

# dataset configurations
[[datasets]]
resolution = [160, 160]
video_directory = "D:/musubi-tuner-wan-gui/dataset/My_Best_Lora_dataset/video" # path to your video dataset
cache_directory = "D:/musubi-tuner-wan-gui/dataset/My_Best_Lora_dataset/cache/video" # recommended to set cache directory
target_frames = [17, 33, 65]
frame_extraction = "chunk"
num_repeats = 1

# head: Extract the first N frames from the video.
# chunk: Extract frames by splitting the video into chunks of N frames.
# slide: Extract frames from the video with a stride of frame_stride.
# uniform: Extract frame_sample samples uniformly from the video.
# NOTE: Please do not include 1 in target_frames if you are using the frame_extraction "chunk". It would cause all frames to be extracted.

# More info here: https://github.com/Kvento/musubi-tuner-wan-gui/blob/main/dataset/dataset_config.md

# other datasets can be added here. each dataset can have different configurations

# If you don't need image training, remove this section:
# dataset configurations
[[datasets]]
resolution = [256, 256]
image_directory = "D:/musubi-tuner-wan-gui/dataset/My_Best_Lora_dataset/images" # path to your image dataset
cache_directory = "D:/musubi-tuner-wan-gui/dataset/My_Best_Lora_dataset/cache/images" # recommended to set cache directory
num_repeats = 1
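Because every `target_frames` value for HunyuanVideo/Wan2.1 has to be of the form N*4+1, a quick sanity check like this (illustrative only, not part of the repository) can catch typos before caching:

```python
# Check the target_frames values used in the example config above.
for frames in [17, 33, 65]:
    assert (frames - 1) % 4 == 0, f"{frames} is not of the form N*4+1"
print("all target_frames values are valid")
```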
dataset/ebPhotos-001/20190915_193922.jpg
ADDED
dataset/ebPhotos-001/20190915_193922.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with curly hair smiling wearing a black top standing on a boat. In the background a group of people including children sit on a wooden bench wearing casual clothes. A green suspension bridge spans a river with a cloudy sky above. The woman is in the foreground with the group and bridge in the mid-ground. The boat has a green floor and a cylindrical black structure on the left.
dataset/ebPhotos-001/20190921_182515.jpg
ADDED
dataset/ebPhotos-001/20190921_182515.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with dark skin smiling standing in a hotel room. She has her hair in a neat bun wearing a lace pink sleeveless dress that accentuates her medium-sized breasts and a large green bow at the chest. She accessorizes with a silver necklace bracelet and watch. The room has a patterned carpet wooden door green chair and metal trash can. She stands confidently one hand on her shoulder.
dataset/ebPhotos-001/20190921_182517.jpg
ADDED
dataset/ebPhotos-001/20190921_182517.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with dark skin standing in a hotel room. She has a slim curvy figure wearing a pink lace dress that accentuates her medium-sized breasts. She's smiling with her right hand touching her shoulder and her left hand resting on her hip. She wears a white beaded necklace matching bracelet and a blue watch. Her hair is styled in a neat bun. The background includes a green chair a trash can and two wooden doors with gold handles. The carpet has a
dataset/ebPhotos-001/20220521_222809.jpg
ADDED
dataset/ebPhotos-001/20220521_222809.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with medium brown skin dark brown eyes and red lipstick wearing a pink checkered shirt her hair in a high bun standing in a cluttered office. background includes a TV showing a collage of images snacks on a shelf a red box papers and a desk with a white plastic bag. she wears a gold pendant necklace. the office has beige walls and wooden furniture.
dataset/ebPhotos-001/20230427_082757.jpg
ADDED
dataset/ebPhotos-001/20230427_082757.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with medium brown skin shoulder-length wavy black hair and brown eyes. She wears a deep purple sleeveless top looking directly at the camera with a neutral expression. The background is a dimly lit indoor space with beige walls and a dark curtain. The image is a close-up focusing on her face and upper torso.
dataset/ebPhotos-001/20230427_082800.jpg
ADDED
dataset/ebPhotos-001/20230427_082800.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with medium brown skin shoulder-length wavy black hair and brown eyes. She wears a deep purple halter top looking directly at the camera with a slight confident smile. The background is a dimly lit indoor space with a dark curtain on the right and a beige wall on the left. The lighting highlights her natural skin texture and subtle makeup.
dataset/ebPhotos-001/20230427_082805.jpg
ADDED
dataset/ebPhotos-001/20230427_082805.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with medium brown skin shoulder-length wavy black hair and brown eyes. She wears a low-cut sleeveless purple top revealing a hint of cleavage. Her expression is neutral with slightly pursed lips. The background is a dimly lit indoor room with a dark curtain and a partially visible doorway. The lighting highlights her natural skin texture and subtle makeup.
dataset/ebPhotos-001/20230502_185323.jpg
ADDED
dataset/ebPhotos-001/20230502_185323.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with medium brown skin wearing a form-fitting white satin dress with thin straps standing in a purple-walled dressing room. She has shoulder-length black hair a necklace and is smiling while dancing. The room has a gray carpet a standing mirror and a pink and purple garment hanging on the left. An "EXIT" sign is visible on the ceiling.
dataset/ebPhotos-001/20230504_193610.jpg
ADDED
dataset/ebPhotos-001/20230504_193610.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with curly hair wearing a gray and red track jacket gray leggings and red and black sneakers kneeling on a patterned carpet in a hallway her right hand on her chest left hand on the floor beige walls wooden floor and a door in the background.
dataset/ebPhotos-001/20230504_193624.jpg
ADDED
dataset/ebPhotos-001/20230504_193624.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with curly hair wearing a gray Fila jacket with red trim black top gray leggings and red and white Nike sneakers. She's posing in a hallway one leg raised hand on jacket. wooden floor patterned rug beige walls and white door in background. confident stylish athletic.
dataset/ebPhotos-001/20230504_193657.jpg
ADDED
dataset/ebPhotos-001/20230504_193657.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with wavy hair wearing a gray and red track jacket black top and black leggings kneeling on one leg in a hallway with wooden floors and a patterned rug. She's wearing red white and gray sneakers. The hallway has white doors and beige walls. She has a confident expression and her right hand is in her jacket pocket. The lighting is warm and soft.
dataset/ebPhotos-001/20230504_193734.jpg
ADDED
dataset/ebPhotos-001/20230504_193734.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with long wavy hair kneeling on a wooden floor in a hallway. she wears a gray and red track jacket black leggings and red and white sneakers. her right hand is in her jacket pocket. the hallway has beige walls white doors and a patterned gray rug. the lighting is warm and she looks down at the camera with a slight smile.
dataset/ebPhotos-001/20230504_193750.jpg
ADDED
dataset/ebPhotos-001/20230504_193750.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with medium brown skin long wavy black hair and a slender build kneeling in a hallway. She wears a gray track jacket with red and white accents black leggings and red and black sneakers. She has a necklace with a circular pendant. The hallway has wooden floors a patterned gray rug and white walls with a door and window blinds in the background. She smiles slightly looking at the camera.
dataset/ebPhotos-001/20230504_193805.jpg
ADDED
dataset/ebPhotos-001/20230504_193805.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with wavy hair wearing a gray and red jacket black top gray leggings and red and white sneakers kneeling in a hallway with wooden floors and beige walls holding her hair with a necklace visible smiling at the camera.
dataset/ebPhotos-001/20230505_194441.jpg
ADDED
dataset/ebPhotos-001/20230505_194441.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with medium brown skin wearing a form-fitting long-sleeve red dress with a keyhole neckline standing in a narrow hallway. She has shoulder-length wavy black hair and is posing with one hand on her head and the other on the wall. She wears black high heels and has a tattoo on her left thigh. The hallway has beige walls white doors and a wooden step. She looks confident and alluring.
dataset/ebPhotos-001/20230505_194607.jpg
ADDED
dataset/ebPhotos-001/20230505_194607.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with dark skin standing in a narrow hallway. She has shoulder-length curly black hair wearing a tight long-sleeve red mini dress with a keyhole neckline revealing moderate cleavage. She is standing with arms outstretched touching the walls wearing black high-heeled shoes. The hallway has beige walls white doors and wooden floors with a patterned rug at the bottom. Recessed ceiling lights illuminate the scene.
dataset/ebPhotos-001/20230505_194707.jpg
ADDED
dataset/ebPhotos-001/20230505_194707.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with medium brown skin standing in a narrow hallway. she has wavy black hair wears a tight red long-sleeve mini dress with a low neckline black fishnet stockings and black high-heeled sandals. she stands confidently one hand on the wall the other on her hip. the hallway has beige walls white trim and a patterned doormat. a ceiling light illuminates her from above.
dataset/ebPhotos-001/20230505_194729.jpg
ADDED
dataset/ebPhotos-001/20230505_194729.txt
ADDED
@@ -0,0 +1 @@
photo of a beautiful black woman with curly hair wearing a tight red long-sleeve mini dress black high heels and a gold bracelet. She stands in a narrow hallway leaning against a white door showcasing a large tattoo on her right thigh. The hallway has beige walls wooden floor and patterned rug. The lighting is warm and the angle is low emphasizing her confident pose and curvy figure.