kundaja-green committed
Commit ebb79f2 · 0 parent(s)

Completely fresh repository upload

This view is limited to 50 files because it contains too many changes.

Files changed (50)
  1. .gitattributes +8 -0
  2. .gitignore +11 -0
  3. .python-version +1 -0
  4. Dockerfile +49 -0
  5. README.ja.md +426 -0
  6. README.md +64 -0
  7. Start_Wan_GUI.bat +54 -0
  8. cache_latents.py +281 -0
  9. cache_text_encoder_outputs.py +214 -0
  10. convert_lora.py +131 -0
  11. dataset/__init__.py +0 -0
  12. dataset/config_utils.py +372 -0
  13. dataset/dataset_config.md +378 -0
  14. dataset/dataset_example.toml +44 -0
  15. dataset/ebPhotos-001/20190915_193922.jpg +3 -0
  16. dataset/ebPhotos-001/20190915_193922.txt +1 -0
  17. dataset/ebPhotos-001/20190921_182515.jpg +3 -0
  18. dataset/ebPhotos-001/20190921_182515.txt +1 -0
  19. dataset/ebPhotos-001/20190921_182517.jpg +3 -0
  20. dataset/ebPhotos-001/20190921_182517.txt +1 -0
  21. dataset/ebPhotos-001/20220521_222809.jpg +3 -0
  22. dataset/ebPhotos-001/20220521_222809.txt +1 -0
  23. dataset/ebPhotos-001/20230427_082757.jpg +3 -0
  24. dataset/ebPhotos-001/20230427_082757.txt +1 -0
  25. dataset/ebPhotos-001/20230427_082800.jpg +3 -0
  26. dataset/ebPhotos-001/20230427_082800.txt +1 -0
  27. dataset/ebPhotos-001/20230427_082805.jpg +3 -0
  28. dataset/ebPhotos-001/20230427_082805.txt +1 -0
  29. dataset/ebPhotos-001/20230502_185323.jpg +3 -0
  30. dataset/ebPhotos-001/20230502_185323.txt +1 -0
  31. dataset/ebPhotos-001/20230504_193610.jpg +3 -0
  32. dataset/ebPhotos-001/20230504_193610.txt +1 -0
  33. dataset/ebPhotos-001/20230504_193624.jpg +3 -0
  34. dataset/ebPhotos-001/20230504_193624.txt +1 -0
  35. dataset/ebPhotos-001/20230504_193657.jpg +3 -0
  36. dataset/ebPhotos-001/20230504_193657.txt +1 -0
  37. dataset/ebPhotos-001/20230504_193734.jpg +3 -0
  38. dataset/ebPhotos-001/20230504_193734.txt +1 -0
  39. dataset/ebPhotos-001/20230504_193750.jpg +3 -0
  40. dataset/ebPhotos-001/20230504_193750.txt +1 -0
  41. dataset/ebPhotos-001/20230504_193805.jpg +3 -0
  42. dataset/ebPhotos-001/20230504_193805.txt +1 -0
  43. dataset/ebPhotos-001/20230505_194441.jpg +3 -0
  44. dataset/ebPhotos-001/20230505_194441.txt +1 -0
  45. dataset/ebPhotos-001/20230505_194607.jpg +3 -0
  46. dataset/ebPhotos-001/20230505_194607.txt +1 -0
  47. dataset/ebPhotos-001/20230505_194707.jpg +3 -0
  48. dataset/ebPhotos-001/20230505_194707.txt +1 -0
  49. dataset/ebPhotos-001/20230505_194729.jpg +3 -0
  50. dataset/ebPhotos-001/20230505_194729.txt +1 -0
.gitattributes ADDED
@@ -0,0 +1,8 @@
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
+ *.webp filter=lfs diff=lfs merge=lfs -text
+ PXL_20240227_181242253jpg filter=lfs diff=lfs merge=lfs -text
+ PXL_20240227_181242253jpg* filter=lfs diff=lfs merge=lfs -text
+ *PXL_20240227_181242253jpg filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,11 @@
+ __pycache__/
+ .venv
+ venv/
+ logs/
+ uv.lock
+ main.exp
+ main.lib
+ main.obj
+ dataset/Wan
+ Models/
+ Output_LoRAs/
.python-version ADDED
@@ -0,0 +1 @@
+ 3.10
Dockerfile ADDED
@@ -0,0 +1,49 @@
+ # Use a standard Python 3.12 base image
+ FROM python:3.12-slim
+
+ # Set the working directory inside the container
+ WORKDIR /code
+
+ # Install git and aria2 for faster downloads
+ RUN apt-get update && apt-get install -y git aria2
+
+ # Copy the requirements file first to leverage Docker cache
+ COPY requirements.txt .
+
+ # Install the correct CUDA-enabled PyTorch version and other requirements
+ RUN pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # --- NEW SECTION: DOWNLOAD MODELS ---
+ # Download the official Wan2.1 models from their Hugging Face repository
+ # This downloads them into a "Models/Wan" folder inside the container
+ RUN huggingface-cli download wan-video/wan2.1 \
+ --repo-type model \
+ --include "*.pth" "*.json" "*.safetensors" \
+ --local-dir Models/Wan --local-dir-use-symlinks False
+
+ # Copy all your project files (code, dataset configs, etc.) into the container
+ COPY . .
+
+ # This is the command that will run when the Space starts.
+ # It uses the models we just downloaded.
+ CMD ["accelerate", "launch", "wan_train_network.py", \
+ "--task", "i2v-14B", \
+ "--dit", "Models/Wan/wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors", \
+ "--vae", "Models/Wan/Wan2.1_VAE.pth", \
+ "--clip", "Models/Wan/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth", \
+ "--t5", "Models/Wan/models_t5_umt5-xxl-enc-bf16.pth", \
+ "--dataset_config", "dataset/testtoml.toml", \
+ "--output_dir", "/data/output", \
+ "--output_name", "My_HF_Lora_v1", \
+ "--save_every_n_epochs", "10", \
+ "--max_train_epochs", "70", \
+ "--network_module", "networks.lora_wan", \
+ "--network_dim", "32", \
+ "--network_alpha", "4", \
+ "--learning_rate", "2e-5", \
+ "--optimizer_type", "adamw", \
+ "--mixed_precision", "bf16", \
+ "--gradient_checkpointing", \
+ "--sdpa" \
+ ]
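
The model-download step above can also be reproduced outside the container with the `huggingface_hub` Python API. A minimal sketch, assuming `huggingface_hub` is installed; the repository id, file patterns, and target folder simply mirror the `RUN huggingface-cli download` line above:

```python
# Minimal sketch: same download as the Dockerfile's huggingface-cli step, done from Python.
# Assumes the huggingface_hub package is installed.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="wan-video/wan2.1",          # repository id taken from the Dockerfile above
    repo_type="model",
    allow_patterns=["*.pth", "*.json", "*.safetensors"],
    local_dir="Models/Wan",
)
```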
README.ja.md ADDED
@@ -0,0 +1,426 @@
1
+ # GUIの使い方
2
+
3
+ - GUIを開くには、次のコマンドを実行します - `Start_Wan_GUI.bat`
4
+ - すべての設定は、 "**Load Settings**" および "**Save Setting**" ボタンを使用して保存および読み込むことができます。
5
+ - 設定の詳細については以下を参照してください。 [Wan2.1 documentation](./docs/wan.md), [Advanced Configuration](./docs/advanced_config.md#fp8-quantization), [Dataset configuration guide](./dataset/dataset_config.md).
6
+
7
+
8
+ ![Preview](docs/Preview.png)
9
+
10
+
11
+
12
+ # Musubi Tuner
13
+
14
+ [English](./README.md) | [日本語](./README.ja.md)
15
+
16
+ ## 目次
17
+
18
+ - [はじめに](#はじめに)
19
+ - [最近の更新](#最近の更新)
20
+ - [リリースについて](#リリースについて)
21
+ - [概要](#概要)
22
+ - [ハードウェア要件](#ハードウェア要件)
23
+ - [特徴](#特徴)
24
+ - [インストール](#インストール)
25
+ - [モデルのダウンロード](#モデルのダウンロード)
26
+ - [HunyuanVideoの公式モデルを使う](#HunyuanVideoの公式モデルを使う)
27
+ - [Text EncoderにComfyUI提供のモデルを使う](#Text-EncoderにComfyUI提供のモデルを使う)
28
+ - [使い方](#使い方)
29
+ - [データセット設定](#データセット設定)
30
+ - [latentの事前キャッシュ](#latentの事前キャッシュ)
31
+ - [Text Encoder出力の事前キャッシュ](#Text-Encoder出力の事前キャッシュ)
32
+ - [学習](#学習)
33
+ - [LoRAの重みのマージ](#LoRAの重みのマージ)
34
+ - [推論](#推論)
35
+ - [SkyReels V1での推論](#SkyReels-V1での推論)
36
+ - [LoRAの形式の変換](#LoRAの形式の変換)
37
+ - [その他](#その他)
38
+ - [SageAttentionのインストール方法](#SageAttentionのインストール方法)
39
+ - [免責事項](#免責事項)
40
+ - [コントリビューションについて](#コントリビューションについて)
41
+ - [ライセンス](#ライセンス)
42
+
43
+ ## はじめに
44
+
45
+ このリポジトリは、HunyuanVideoおよびWan2.1のLoRA学習用のコマンドラインツールです。このリポジトリは非公式であり、公式のHunyuanVideoやWan2.1のリポジトリとは関係ありません。
46
+
47
+ Wan2.1については、[Wan2.1のドキュメント](./docs/wan.md)も参照してください。
48
+
49
+ *リポジトリは開発中です。*
50
+
51
+ ### 最近の更新
52
+
53
+ - 2025/03/16
54
+ - Wan2.1の学習で、fp16の重みを使用した場合でも重みがbf16にcastされていた不具合を修正しました。[PR #160](https://github.com/kohya-ss/musubi-tuner/pull/160)
55
+ - あわせてfp16の重みを使用するとサンプル画像生成で黒画像が生成される不具合を修正しました。
56
+ - fp16の学習で不具合が起きる場合にはbf16をお使いください。
57
+ - Wan2.1の推論スクリプトをリファクタリングしました。`--fp8_fast`と`--compile`オプションが追加されました。詳しくは[こちら](./docs/wan.md#inference--推論)を参照してください。PR [#153](https://github.com/kohya-ss/musubi-tuner/pull/153)
58
+ - 大幅に変更を行ったため、不具合があればお知らせください。
59
+ - 先日追加された`--fp8_scaled`オプションは、fp8での学習および推論の精度向上に効果があるようです。`--fp8_base`で学習している場合や、`--fp8`で推論している場合は、`--fp8_scaled`の追加をご検討ください。問題があればご連絡ください。
60
+
61
+ - 2025/03/13
62
+ - HunyuanVideoの推論スクリプトで、RTX 40x0向けの高速化オプション`--fp8_fast`と、`torch.compile`を使用するオプション`--compile`が追加されました。[PR #137](https://github.com/kohya-ss/musubi-tuner/pull/137) Sarania 氏に感謝いたします。
63
+ - 詳細は[推論](#推論)を参照してください。
64
+ - Wan2.1の学習、推論で、fp8量子化を行うオプションを`--fp8_scaled`を追加しました。[PR #141](https://github.com/kohya-ss/musubi-tuner/pull/141)
65
+ - 単純なFP8へのキャストではなく、スケーリングを行うことで、VRAM使用量の削減と精度の維持を両立します。
66
+ - 詳細は[高度な設定](./docs/advanced_config.md#fp8-quantization)を参照してください。
67
+ - また`fp16`のモデルをWan2.1の学習と推論でサポートしました。
68
+
69
+ - 2025/03/07
70
+ - Wan 2.1の学習で、サンプル画像生成を行わない場合でも`--t5`オプションが必須になっていたのを修正しました。
71
+
72
+ - 2025/03/07
73
+ - Wan 2.1のLoRA学習をサポートしました。`wan_train_network.py`を使用してください。詳細は[こちら](./docs/wan.md)を参照してください。
74
+
75
+ - 2025/03/04
76
+ - Wan 2.1の推論をサポートしました。`wan_generate_video.py`を使用してください。詳細は[こちら](./docs/wan.md)を参照してください。
77
+ - `requirements.txt`が更新されました。`pip install -r requirements.txt`を実行してください。
78
+
79
+ ### リリースについて
80
+
81
+ Musubi Tunerの解説記事執筆や、関連ツールの開発に取り組んでくださる方々に感謝いたします。このプロジェクトは開発中のため、互換性のない変更や機能追加が起きる可能性があります。想定外の互換性問題を避けるため、参照用として[リリース](https://github.com/kohya-ss/musubi-tuner/releases)をお使いください。
82
+
83
+ 最新のリリースとバージョン履歴は[リリースページ](https://github.com/kohya-ss/musubi-tuner/releases)で確認できます。
84
+
85
+ ## 概要
86
+
87
+ ### ハードウェア要件
88
+
89
+ - VRAM: 静止画での学習は12GB以上推奨、動画での学習は24GB以上推奨。
90
+ - *解像度等の学習設定により異なります。*12GBでは解像度 960x544 以下とし、`--blocks_to_swap`、`--fp8_llm`等の省メモリオプションを使用してください。
91
+ - メインメモリ: 64GB以上を推奨、32GB+スワップで動作するかもしれませんが、未検証です。
92
+
93
+ ### 特徴
94
+
95
+ - 省メモリに特化
96
+ - Windows対応(Linuxでの動作報告もあります)
97
+ - マルチGPUには対応していません
98
+
99
+ ## インストール
100
+
101
+ ### pipによるインストール
102
+
103
+ Python 3.10以上を使用してください(3.10で動作確認済み)。
104
+
105
+ 適当な仮想環境を作成し、ご利用のCUDAバージョンに合わせたPyTorchとtorchvisionをインストールしてください。
106
+
107
+ PyTorchはバージョン2.5.1以上を使用してください([補足](#PyTorchのバージョンについて))。
108
+
109
+ ```bash
110
+ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
111
+ ```
112
+
113
+ 以下のコマンドを使用して、必要な依存関係をインストールします。
114
+
115
+ ```bash
116
+ pip install -r requirements.txt
117
+ ```
118
+
119
+ オプションとして、FlashAttention、SageAttention(推論にのみ使用、インストール方法は[こちら](#SageAttentionのインストール方法)を参照)を使用できます。
120
+
121
+ また、`ascii-magic`(データセットの確認に使用)、`matplotlib`(timestepsの可視化に使用)、`tensorboard`(学習ログの記録に使用)を必要に応じてインストールしてください。
122
+
123
+ ```bash
124
+ pip install ascii-magic matplotlib tensorboard
125
+ ```
126
+ ### uvによるインストール
127
+
128
+ uvを使用してインストールすることもできますが、uvによるインストールは試験的なものです。フィードバックを歓迎します。
129
+
130
+ #### Linux/MacOS
131
+
132
+ ```sh
133
+ curl -LsSf https://astral.sh/uv/install.sh | sh
134
+ ```
135
+
136
+ 表示される指示に従い、pathを設定してください。
137
+
138
+ #### Windows
139
+
140
+ ```powershell
141
+ powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
142
+ ```
143
+
144
+ 表示される指示に従い、PATHを設定するか、この時点でシステムを再起動してください。
145
+
146
+ ## モデルのダウンロード
147
+
148
+ 以下のいずれかの方法で、モデルをダウンロードしてください。
149
+
150
+ ### HunyuanVideoの公式モデルを使う
151
+
152
+ [公式のREADME](https://github.com/Tencent/HunyuanVideo/blob/main/ckpts/README.md)を参考にダウンロードし、任意のディレクトリに以下のように配置します。
153
+
154
+ ```
155
+ ckpts
156
+ ├──hunyuan-video-t2v-720p
157
+ │ ├──transformers
158
+ │ ├──vae
159
+ ├──text_encoder
160
+ ├──text_encoder_2
161
+ ├──...
162
+ ```
163
+
164
+ ### Text EncoderにComfyUI提供のモデルを使う
165
+
166
+ こちらの方法の方がより簡単です。DiTとVAEのモデルはHunyuanVideoのものを使用します。
167
+
168
+ https://huggingface.co/tencent/HunyuanVideo/tree/main/hunyuan-video-t2v-720p/transformers から、[mp_rank_00_model_states.pt](https://huggingface.co/tencent/HunyuanVideo/resolve/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt) をダウンロードし、任意のディレクトリに配置します。
169
+
170
+ (同じページにfp8のモデルもありますが、未検証です。)
171
+
172
+ `--fp8_base`を指定して学習する場合は、`mp_rank_00_model_states.pt`の代わりに、[こちら](https://huggingface.co/kohya-ss/HunyuanVideo-fp8_e4m3fn-unofficial)の`mp_rank_00_model_states_fp8.safetensors`を使用可能です。(このファイルは非公式のもので、重みを単純にfloat8_e4m3fnに変換したものです。)
173
+
174
+ また、https://huggingface.co/tencent/HunyuanVideo/tree/main/hunyuan-video-t2v-720p/vae から、[pytorch_model.pt](https://huggingface.co/tencent/HunyuanVideo/resolve/main/hunyuan-video-t2v-720p/vae/pytorch_model.pt) をダウンロードし、任意のディレクトリに配置します。
175
+
176
+ Text EncoderにはComfyUI提供のモデルを使用させていただきます。[ComfyUIのページ](https://comfyanonymous.github.io/ComfyUI_examples/hunyuan_video/)を参考に、https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/tree/main/split_files/text_encoders から、llava_llama3_fp16.safetensors (Text Encoder 1、LLM)と、clip_l.safetensors (Text Encoder 2、CLIP)をダウンロードし、任意のディレクトリに配置します。
177
+
178
+ (同じページにfp8のLLMモデルもありますが、動作未検証です。)
179
+
180
+ ## 使い方
181
+
182
+ ### データセット設定
183
+
184
+ [こちら](./dataset/dataset_config.md)を参照してください。
185
+
186
+ ### latentの事前キャッシュ
187
+
188
+ latentの事前キャッシュは必須です。以下のコマンドを使用して、事前キャッシュを作成してください。(pipによるインストールの場合)
189
+
190
+ ```bash
191
+ python cache_latents.py --dataset_config path/to/toml --vae path/to/ckpts/hunyuan-video-t2v-720p/vae/pytorch_model.pt --vae_chunk_size 32 --vae_tiling
192
+ ```
193
+
194
+ uvでインストールした場合は、`uv run python cache_latents.py ...`のように、`uv run`を先頭につけてください。以下のコマンドも同様です。
195
+
196
+ その他のオプションは`python cache_latents.py --help`で確認できます。
197
+
198
+ VRAMが足りない場合は、`--vae_spatial_tile_sample_min_size`を128程度に減らし、`--batch_size`を小さくしてください。
199
+
200
+ `--debug_mode image` を指定するとデータセットの画像とキャプションが新規ウィンドウに表示されます。`--debug_mode console`でコンソールに表示されます(`ascii-magic`が必要)。
201
+
202
+ デフォルトではデータセットに含まれないキャッシュファイルは自動的に削除されます。`--keep_cache`を指定すると、キャッシュファイルを残すことができます。
203
+
204
+ ### Text Encoder出力の事前キャッシュ
205
+
206
+ Text Encoder出力の事前キャッシュは必須です。以下のコマンドを使用して、事前キャッシュを作成してください。
207
+
208
+ ```bash
209
+ python cache_text_encoder_outputs.py --dataset_config path/to/toml --text_encoder1 path/to/ckpts/text_encoder --text_encoder2 path/to/ckpts/text_encoder_2 --batch_size 16
210
+ ```
211
+
212
+ その他のオプションは`python cache_text_encoder_outputs.py --help`で確認できます。
213
+
214
+ `--batch_size`はVRAMに合わせて調整してください。
215
+
216
+ VRAMが足りない場合(16GB程度未満の場合)は、`--fp8_llm`を指定して、fp8でLLMを実行してください。
217
+
218
+ デフォルトではデータセットに含まれないキャッシュファイルは自動的に削除されます。`--keep_cache`を指定すると、キャッシュファイルを残すことができます。
219
+
220
+ ### 学習
221
+
222
+ 以下のコマンドを使用して、学習を開始します(実際には一行で入力してください)。
223
+
224
+ ```bash
225
+ accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 hv_train_network.py
226
+ --dit path/to/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt
227
+ --dataset_config path/to/toml --sdpa --mixed_precision bf16 --fp8_base
228
+ --optimizer_type adamw8bit --learning_rate 2e-4 --gradient_checkpointing
229
+ --max_data_loader_n_workers 2 --persistent_data_loader_workers
230
+ --network_module networks.lora --network_dim 32
231
+ --timestep_sampling shift --discrete_flow_shift 7.0
232
+ --max_train_epochs 16 --save_every_n_epochs 1 --seed 42
233
+ --output_dir path/to/output_dir --output_name name-of-lora
234
+ ```
235
+
236
+ __更新__:サンプルの学習率を1e-3から2e-4に、`--timestep_sampling`を`sigmoid`から`shift`に、`--discrete_flow_shift`を1.0から7.0に変更しました。より高速な学習が期待されます。ディテールが甘くなる場合は、discrete flow shiftを3.0程度に下げてみてください。
237
+
238
+ ただ、適切な学習率、学習ステップ数、timestepsの分布、loss weightingなどのパラメータは、依然として不明な点が数多くあります。情報提供をお待ちしています。
239
+
240
+ その他のオプションは`python hv_train_network.py --help`で確認できます(ただし多くのオプションは動作未確認です)。
241
+
242
+ `--fp8_base`を指定すると、DiTがfp8で学習されます。未指定時はmixed precisionのデータ型が使用されます。fp8は大きく消費メモリを削減できますが、品質は低下する可能性があります。`--fp8_base`を指定しない場合はVRAM 24GB以上を推奨します。また必要に応じて`--blocks_to_swap`を使用してください。
243
+
244
+ VRAMが足りない場合は、`--blocks_to_swap`を指定して、一部のブロックをCPUにオフロードしてください。最大36が指定できます。
245
+
246
+ (block swapのアイデアは2kpr氏の実装に基づくものです。2kpr氏にあらためて感謝します。)
247
+
248
+ `--sdpa`でPyTorchのscaled dot product attentionを使用します。`--flash_attn`で[FlashAttention](https://github.com/Dao-AILab/flash-attention)を使用します。`--xformers`でxformersの利用も可能ですが、xformersを使う場合は`--split_attn`を指定してください。`--sage_attn`でSageAttentionを使用しますが、SageAttentionは現時点では学習に未対応のため、正しく動作しません。
249
+
250
+ `--split_attn`を指定すると、attentionを分割して処理します。速度が多少低下しますが、VRAM使用量はわずかに減ります。
251
+
252
+ 学習されるLoRAの形式は、`sd-scripts`と同じです。
253
+
254
+ `--show_timesteps`に`image`(`matplotlib`が必要)または`console`を指定すると、学習時のtimestepsの分布とtimestepsごとのloss weightingが確認できます。
255
+
256
+ 学習時のログの記録が可能です。[TensorBoard形式のログの保存と参照](./docs/advanced_config.md#save-and-view-logs-in-tensorboard-format--tensorboard形式のログの保存と参照)を参照してください。
257
+
258
+ 学習中のサンプル画像生成については、[こちらのドキュメント](./docs/sampling_during_training.md)を参照してください。その他の高度な設定については[こちらのドキュメント](./docs/advanced_config.md)を参照してください。
259
+
260
+ ### LoRAの重みのマージ
261
+
262
+ 注:Wan 2.1には対応していません。
263
+
264
+ ```bash
265
+ python merge_lora.py \
266
+ --dit path/to/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
267
+ --lora_weight path/to/lora.safetensors \
268
+ --save_merged_model path/to/merged_model.safetensors \
269
+ --device cpu \
270
+ --lora_multiplier 1.0
271
+ ```
272
+
273
+ `--device`には計算を行うデバイス(`cpu`または`cuda`等)を指定してください。`cuda`を指定すると計算が高速化されます。
274
+
275
+ `--lora_weight`にはマージするLoRAの重みを、`--lora_multiplier`にはLoRAの重みの係数を、それぞれ指定してください。複数個が指定可能で、両者の数は一致させてください。
276
+
277
+ ### 推論
278
+
279
+ 以下のコマンドを使用して動画を生成します。
280
+
281
+ ```bash
282
+ python hv_generate_video.py --fp8 --video_size 544 960 --video_length 5 --infer_steps 30
283
+ --prompt "A cat walks on the grass, realistic style." --save_path path/to/save/dir --output_type both
284
+ --dit path/to/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt --attn_mode sdpa --split_attn
285
+ --vae path/to/ckpts/hunyuan-video-t2v-720p/vae/pytorch_model.pt
286
+ --vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128
287
+ --text_encoder1 path/to/ckpts/text_encoder
288
+ --text_encoder2 path/to/ckpts/text_encoder_2
289
+ --seed 1234 --lora_multiplier 1.0 --lora_weight path/to/lora.safetensors
290
+ ```
291
+
292
+ その他のオプションは`python hv_generate_video.py --help`で確認できます。
293
+
294
+ `--fp8`を指定すると、DiTがfp8で推論されます。fp8は大きく消費メモリを削減できますが、品質は低下する可能性があります。
295
+
296
+ RTX 40x0シリーズのGPUを使用している場合は、`--fp8_fast`オプションを指定することで、高速推論が可能です。このオプションを指定する場合は、`--fp8`も指定してください。
297
+
298
+ VRAMが足りない場合は、`--blocks_to_swap`を指定して、一部のブロックをCPUにオフロードしてください。最大38が指定できます。
299
+
300
+ `--attn_mode`には`flash`、`torch`、`sageattn`、`xformers`または`sdpa`(`torch`指定時と同じ)のいずれかを指定してください。それぞれFlashAttention、scaled dot product attention、SageAttention、xformersに対応します。デフォルトは`torch`です。SageAttentionはVRAMの削減に有効です。
301
+
302
+ `--split_attn`を指定すると、attentionを分割して処理します。SageAttention利用時で10%程度の高速化が見込まれます。
303
+
304
+ `--output_type`には`both`、`latent`、`video`、`images`のいずれかを指定してください。`both`はlatentと動画の両方を出力します。VAEでOut of Memoryエラーが発生する場合に備えて、`both`を指定することをお勧めします。`--latent_path`に保存されたlatentを指定し、`--output_type video` (または`images`)としてスクリプトを実行すると、VAEのdecodeのみを行えます。
305
+
306
+ `--seed`は省略可能です。指定しない場合はランダムなシードが使用されます。
307
+
308
+ `--video_length`は「4の倍数+1」を指定してください。
309
+
310
+ `--flow_shift`にタイムステップのシフト値(discrete flow shift)を指定可能です。省略時のデフォルト値は7.0で、これは推論ステップ数が50の時の推奨値です。HunyuanVideoの論文では、ステップ数50の場合は7.0、ステップ数20未満(10など)で17.0が推奨されています。
311
+
312
+ `--video_path`に読み込む動画を指定すると、video2videoの推論が可能です。動画ファイルを指定するか、複数の画像ファイルが入ったディレクトリを指定してください(画像ファイルはファイル名でソートされ、各フレームとして用いられます)。`--video_length`よりも短い動画を指定するとエラーになります。`--strength`で強度を指定できます。0~1.0で指定でき、大きいほど元の動画からの変化が大きくなります。
313
+
314
+ なおvideo2video推論の処理は実験的なものです。
315
+
316
+ `--compile`オプションでPyTorchのコンパイル機能を有効にします(実験的機能)。tritonのインストールが必要です。また、WindowsではVisual C++ build toolsが必要で、かつPyTorch>=2.6.0でのみ動作します。`--compile_args`でコンパイル時の引数を渡すことができます。
317
+
318
+ `--compile`は初回実行時にかなりの時間がかかりますが、2回目以降は高速化されます。
319
+
320
+ `--save_merged_model`オプションで、LoRAマージ後のDiTモデルを保存できます。`--save_merged_model path/to/merged_model.safetensors`のように指定してください。なおこのオプションを指定すると推論は行われません。
321
+
322
+ ### SkyReels V1での推論
323
+
324
+ SkyReels V1のT2VとI2Vモデルがサポートされています(推論のみ)。
325
+
326
+ モデルは[こちら](https://huggingface.co/Kijai/SkyReels-V1-Hunyuan_comfy)からダウンロードできます。モデルを提供してくださったKijai氏に感謝します。`skyreels_hunyuan_i2v_bf16.safetensors`がI2Vモデル、`skyreels_hunyuan_t2v_bf16.safetensors`がT2Vモデルです。`bf16`以外の形式は未検証です(`fp8_e4m3fn`は動作するかもしれません)。
327
+
328
+ T2V推論を行う場合、以下のオプションを推論コマンドに追加してください:
329
+
330
+ ```bash
331
+ --guidance_scale 6.0 --embedded_cfg_scale 1.0 --negative_prompt "Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion" --split_uncond
332
+ ```
333
+
334
+ SkyReels V1はclassifier free guidance(ネガティブプロンプト)を必要とするようです。`--guidance_scale`はネガティブプロンプトのガイダンススケールです。公式リポジトリの推奨値は6.0です。デフォルトは1.0で、この場合はclassifier free guidanceは使用されません(ネガティブプロンプトは無視されます)。
335
+
336
+ `--embedded_cfg_scale`は埋め込みガイダンスのスケールです。公式リポジトリの推奨値は1.0です(埋め込みガイダンスなしを意味すると思われます)。
337
+
338
+ `--negative_prompt`はいわゆるネガティブプロンプトです。上記のサンプルは公式リポジトリのものです。`--guidance_scale`を指定し、`--negative_prompt`を指定しなかった場合は、空文字列が使用されます。
339
+
340
+ `--split_uncond`を指定すると、モデル呼び出しをuncondとcond(ネガティブプロンプトとプロンプト)に分割します。VRAM使用量が減りますが、推論速度は低下する可能性があります。`--split_attn`が指定されている場合、`--split_uncond`は自動的に有効になります。
341
+
342
+ ### LoRAの形式の変換
343
+
344
+ ComfyUIで使用可能な形式(Diffusion-pipeと思われる)への変換は以下のコマンドで行えます。
345
+
346
+ ```bash
347
+ python convert_lora.py --input path/to/musubi_lora.safetensors --output path/to/another_format.safetensors --target other
348
+ ```
349
+
350
+ `--input`と`--output`はそれぞれ入力と出力のファイルパスを指定してください。
351
+
352
+ `--target`には`other`を指定してください。`default`を指定すると、他の形式から当リポジトリの形式に変換できます。
353
+
354
+ Wan2.1も対応済みです。
355
+
356
+ ## その他
357
+
358
+ ### SageAttentionのインストール方法
359
+
360
+ sdbds氏によるWindows対応のSageAttentionのwheelが https://github.com/sdbds/SageAttention-for-windows で公開されています。triton をインストールし、Python、PyTorch、CUDAのバージョンが一致する場合は、[Releases](https://github.com/sdbds/SageAttention-for-windows/releases)からビルド済みwheelをダウンロードしてインストールすることが可能です。sdbds氏に感謝します。
361
+
362
+ 参考までに、以下は、SageAttentionをビルドしインストールするための簡単な手順です。Microsoft Visual C++ 再頒布可能パッケージを最新にする必要があるかもしれません。
363
+
364
+ 1. Pythonのバージョンに応じたtriton 3.1.0のwheelを[こちら](https://github.com/woct0rdho/triton-windows/releases/tag/v3.1.0-windows.post5)からダウンロードしてインストールします。
365
+
366
+ 2. Microsoft Visual Studio 2022かBuild Tools for Visual Studio 2022を、C++のビルドができるよう設定し、インストールします。(上のRedditの投稿を参照してください)。
367
+
368
+ 3. 任意のフォルダにSageAttentionのリポジトリをクローンします。
369
+ ```shell
370
+ git clone https://github.com/thu-ml/SageAttention.git
371
+ ```
372
+
373
+ なお `git clone https://github.com/sdbds/SageAttention-for-windows.git` で、前述のsdbds氏のリポジトリを使用することで、手順4.を省略できます。
374
+
375
+ 4. `SageAttention/csrc`フォルダ内の`math.cuh`を開き、71行目と146行目の `ushort` を `unsigned short` に変更して保存します。
376
+
377
+ 5. スタートメニューから Visual Studio 2022 内の `x64 Native Tools Command Prompt for VS 2022` を選択してコマンドプロンプトを開きます。
378
+
379
+ 6. venvを有効にし、SageAttentionのフォルダに移動して以下のコマンドを実行します。DISTUTILSが設定されていない、のようなエラーが出た場合は `set DISTUTILS_USE_SDK=1`としてから再度実行してください。
380
+ ```shell
381
+ python setup.py install
382
+ ```
383
+
384
+ 以上でSageAttentionのインストールが完了です。
385
+
386
+ ### PyTorchのバージョンについて
387
+
388
+ `--attn_mode`に`torch`を指定する場合、2.5.1以降のPyTorchを使用してください(それより前のバージョンでは生成される動画が真っ黒になるようです)。
389
+
390
+ 古いバージョンを使う場合、xformersやSageAttentionを使用してください。
391
+
392
+ ## 免責事項
393
+
394
+ このリポジトリは非公式であり、公式のHunyuanVideoリポジトリとは関係ありません。また、このリポジトリは開発中で、実験的なものです。テストおよびフィードバックを歓迎しますが、以下の点にご注意ください:
395
+
396
+ - 実際の稼働環境での動作を意図したものではありません
397
+ - 機能やAPIは予告なく変更されることがあります
398
+ - いくつもの機能が未検証です
399
+ - 動画学習機能はまだ開発中です
400
+
401
+ 問題やバグについては、以下の情報とともにIssueを作成してください:
402
+
403
+ - 問題の詳細な説明
404
+ - 再現手順
405
+ - 環境の詳細(OS、GPU、VRAM、Pythonバージョンなど)
406
+ - 関連するエラーメッセージやログ
407
+
408
+ ## コントリビューションについて
409
+
410
+ コントリビューションを歓迎します。ただし、以下にご注意ください:
411
+
412
+ - メンテナーのリソースが限られているため、PRのレビューやマージには時間がかかる場合があります
413
+ - 大きな変更に取り組む前には、議論のためのIssueを作成してください
414
+ - PRに関して:
415
+ - 変更は焦点を絞り、適度なサイズにしてください
416
+ - 明確な説明をお願いします
417
+ - 既存のコードスタイルに従ってください
418
+ - ドキュメントが更新されていることを確認してください
419
+
420
+ ## ライセンス
421
+
422
+ `hunyuan_model`ディレクトリ以下のコードは、[HunyuanVideo](https://github.com/Tencent/HunyuanVideo)のコードを一部改変して使用しているため、そちらのライセンスに従います。
423
+
424
+ `wan`ディレクトリ以下のコードは、[Wan2.1](https://github.com/Wan-Video/Wan2.1)のコードを一部改変して使用しています。ライセンスはApache License 2.0です。
425
+
426
+ 他のコードはApache License 2.0に従います。一部Diffusersのコードをコピー、改変して使用しています。
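
The README above describes `--fp8_scaled` as applying scaling rather than a plain cast when quantizing weights to fp8. As a rough illustration of that idea only (not this repository's actual implementation; `torch.float8_e4m3fn` requires a recent PyTorch build):

```python
# Illustrative sketch of "scaled" fp8 quantization: per-tensor scale, then cast.
# This is not the repository's implementation, just the general idea.
import torch

def quantize_fp8_scaled(w: torch.Tensor):
    finfo = torch.finfo(torch.float8_e4m3fn)
    scale = w.abs().max().clamp(min=1e-12) / finfo.max       # per-tensor scale factor
    w_fp8 = (w / scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)
    return w_fp8, scale                                       # dequantize: w_fp8.float() * scale
```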
README.md ADDED
@@ -0,0 +1,64 @@
+ # Simple GUI for [Musubi Tuner](https://github.com/kohya-ss/musubi-tuner) (Wan 2.1 models only)
+
+
+ # How to use GUI
+
+ - Download the repository by running the following in the command line:
+ `git clone https://github.com/Kvento/musubi-tuner-wan-gui`
+
+ - To open the GUI, just run `Start_Wan_GUI.bat`.
+ - All settings can be saved and loaded using the "**Load Settings**" and "**Save Setting**" buttons.
+ - For more information about the settings, see the [Wan2.1 documentation](./docs/wan.md), [Advanced Configuration](./docs/advanced_config.md#fp8-quantization), and the [Dataset configuration guide](./dataset/dataset_config.md).
+
+
+ ![Preview](docs/Preview.png)
+
+
+
+
+
+ # Miscellaneous
+
+
+ ## SageAttention Installation
+
+ sdbds has provided a Windows-compatible SageAttention implementation and pre-built wheels here: https://github.com/sdbds/SageAttention-for-windows. After installing triton, if your Python, PyTorch, and CUDA versions match, you can download and install the pre-built wheel from the [Releases](https://github.com/sdbds/SageAttention-for-windows/releases) page. Thanks to sdbds for this contribution.
+
+ For reference, the build and installation instructions are as follows. You may need to update Microsoft Visual C++ Redistributable to the latest version.
+
+ 1. Download and install the triton 3.1.0 wheel matching your Python version from [here](https://github.com/woct0rdho/triton-windows/releases/tag/v3.1.0-windows.post5).
+
+ 2. Install Microsoft Visual Studio 2022 or Build Tools for Visual Studio 2022, configured for C++ builds.
+
+ 3. Clone the SageAttention repository in your preferred directory:
+ ```shell
+ git clone https://github.com/thu-ml/SageAttention.git
+ ```
+
+ You can skip step 4 by using the sdbds repository mentioned above: `git clone https://github.com/sdbds/SageAttention-for-windows.git`.
+
+ 4. Open `math.cuh` in the `SageAttention/csrc` folder and change `ushort` to `unsigned short` on lines 71 and 146, then save.
+
+ 5. Open `x64 Native Tools Command Prompt for VS 2022` from the Start menu under Visual Studio 2022.
+
+ 6. Activate your venv, navigate to the SageAttention folder, and run the following command. If you get a DISTUTILS not configured error, run `set DISTUTILS_USE_SDK=1` and try again:
+ ```shell
+ python setup.py install
+ ```
+
+ This completes the SageAttention installation.
+
+ ### PyTorch version
+
+ If you specify `torch` for `--attn_mode`, use PyTorch 2.5.1 or later (earlier versions may result in black videos).
+
+ If you use an earlier version, use xformers or SageAttention.
+
+
+ # License
+
+ Code under the `hunyuan_model` directory is modified from [HunyuanVideo](https://github.com/Tencent/HunyuanVideo) and follows their license.
+
+ Code under the `wan` directory is modified from [Wan2.1](https://github.com/Wan-Video/Wan2.1) and is licensed under the Apache License 2.0.
+
+ Other code is under the Apache License 2.0. Some code is copied and modified from Diffusers.
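
A quick way to act on the PyTorch-version note above is to check the installed version before training or inference; a small sketch, assuming the third-party `packaging` package is available:

```python
# Sketch: warn when PyTorch is older than the 2.5.1 recommended above for --attn_mode torch.
from packaging.version import Version
import torch

if Version(torch.__version__.split("+")[0]) < Version("2.5.1"):
    print("PyTorch < 2.5.1 detected: prefer xformers or SageAttention over --attn_mode torch.")
```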
Start_Wan_GUI.bat ADDED
@@ -0,0 +1,54 @@
+ @echo off
+ setlocal
+
+ :: Specify the path to your Python script
+ set SCRIPT_PATH=wan_lora_trainer_gui.py
+
+ :: Check if Python is installed
+ echo Checking for Python...
+ python --version >nul 2>&1
+ if %errorlevel% neq 0 (
+ echo Python not found. Automatic installation is not possible via bat file.
+ echo Please install Python manually from the official website: https://www.python.org/
+ pause
+ exit /b 1
+ )
+
+ :: Check for pip (tool for installing Python packages)
+ echo Checking for pip...
+ python -m ensurepip >nul 2>&1
+ python -m pip --version >nul 2>&1
+ if %errorlevel% neq 0 (
+ echo pip not found. Installing pip...
+ python -m ensurepip --upgrade
+ python -m pip install --upgrade pip
+ if %errorlevel% neq 0 (
+ echo Failed to install pip. Please check your Python installation.
+ pause
+ exit /b 1
+ )
+ )
+
+ :: Check for tkinter
+ echo Checking for tkinter...
+ python -c "import tkinter" >nul 2>&1
+ if %errorlevel% neq 0 (
+ echo tkinter module not found. Attempting to install...
+ python -m pip install tk
+ if %errorlevel% neq 0 (
+ echo Failed to install tkinter. There might be an issue with permissions.
+ pause
+ exit /b 1
+ )
+ )
+
+ :: Run the script
+ echo All dependencies are installed. Running the script...
+ start /min python %SCRIPT_PATH%
+ if %errorlevel% neq 0 (
+ echo An error occurred while running the script.
+ pause
+ exit /b 1
+ )
+
+ echo Script executed successfully.
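
On systems without the batch file (or to see its checks in one place), the same steps can be approximated from Python; a minimal sketch, where the GUI script name is taken from `SCRIPT_PATH` above:

```python
# Sketch: rough Python equivalent of the batch file's tkinter check and launch step.
import importlib.util
import subprocess
import sys

if importlib.util.find_spec("tkinter") is None:
    sys.exit("tkinter is not available; install it before launching the GUI.")

subprocess.run([sys.executable, "wan_lora_trainer_gui.py"], check=True)  # SCRIPT_PATH above
```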
cache_latents.py ADDED
@@ -0,0 +1,281 @@
1
+ import argparse
2
+ import os
3
+ import glob
4
+ from typing import Optional, Union
5
+
6
+ import numpy as np
7
+ import torch
8
+ from tqdm import tqdm
9
+
10
+ from dataset import config_utils
11
+ from dataset.config_utils import BlueprintGenerator, ConfigSanitizer
12
+ from PIL import Image
13
+
14
+ import logging
15
+
16
+ from dataset.image_video_dataset import BaseDataset, ItemInfo, save_latent_cache, ARCHITECTURE_HUNYUAN_VIDEO
17
+ from hunyuan_model.vae import load_vae
18
+ from hunyuan_model.autoencoder_kl_causal_3d import AutoencoderKLCausal3D
19
+ from utils.model_utils import str_to_dtype
20
+
21
+ logger = logging.getLogger(__name__)
22
+ logging.basicConfig(level=logging.INFO)
23
+
24
+
25
+ def show_image(image: Union[list[Union[Image.Image, np.ndarray], Union[Image.Image, np.ndarray]]]) -> int:
26
+ import cv2
27
+
28
+ imgs = (
29
+ [image]
30
+ if (isinstance(image, np.ndarray) and len(image.shape) == 3) or isinstance(image, Image.Image)
31
+ else [image[0], image[-1]]
32
+ )
33
+ if len(imgs) > 1:
34
+ print(f"Number of images: {len(image)}")
35
+ for i, img in enumerate(imgs):
36
+ if len(imgs) > 1:
37
+ print(f"{'First' if i == 0 else 'Last'} image: {img.shape}")
38
+ else:
39
+ print(f"Image: {img.shape}")
40
+ cv2_img = np.array(img) if isinstance(img, Image.Image) else img
41
+ cv2_img = cv2.cvtColor(cv2_img, cv2.COLOR_RGB2BGR)
42
+ cv2.imshow("image", cv2_img)
43
+ k = cv2.waitKey(0)
44
+ cv2.destroyAllWindows()
45
+ if k == ord("q") or k == ord("d"):
46
+ return k
47
+ return k
48
+
49
+
50
+ def show_console(
51
+ image: Union[list[Union[Image.Image, np.ndarray], Union[Image.Image, np.ndarray]]],
52
+ width: int,
53
+ back: str,
54
+ interactive: bool = False,
55
+ ) -> int:
56
+ from ascii_magic import from_pillow_image, Back
57
+
58
+ back = None
59
+ if back is not None:
60
+ back = getattr(Back, back.upper())
61
+
62
+ k = None
63
+ imgs = (
64
+ [image]
65
+ if (isinstance(image, np.ndarray) and len(image.shape) == 3) or isinstance(image, Image.Image)
66
+ else [image[0], image[-1]]
67
+ )
68
+ if len(imgs) > 1:
69
+ print(f"Number of images: {len(image)}")
70
+ for i, img in enumerate(imgs):
71
+ if len(imgs) > 1:
72
+ print(f"{'First' if i == 0 else 'Last'} image: {img.shape}")
73
+ else:
74
+ print(f"Image: {img.shape}")
75
+ pil_img = img if isinstance(img, Image.Image) else Image.fromarray(img)
76
+ ascii_img = from_pillow_image(pil_img)
77
+ ascii_img.to_terminal(columns=width, back=back)
78
+
79
+ if interactive:
80
+ k = input("Press q to quit, d to next dataset, other key to next: ")
81
+ if k == "q" or k == "d":
82
+ return ord(k)
83
+
84
+ if not interactive:
85
+ return ord(" ")
86
+ return ord(k) if k else ord(" ")
87
+
88
+
89
+ def show_datasets(
90
+ datasets: list[BaseDataset], debug_mode: str, console_width: int, console_back: str, console_num_images: Optional[int]
91
+ ):
92
+ print(f"d: next dataset, q: quit")
93
+
94
+ num_workers = max(1, os.cpu_count() - 1)
95
+ for i, dataset in enumerate(datasets):
96
+ print(f"Dataset [{i}]")
97
+ batch_index = 0
98
+ num_images_to_show = console_num_images
99
+ k = None
100
+ for key, batch in dataset.retrieve_latent_cache_batches(num_workers):
101
+ print(f"bucket resolution: {key}, count: {len(batch)}")
102
+ for j, item_info in enumerate(batch):
103
+ item_info: ItemInfo
104
+ print(f"{batch_index}-{j}: {item_info}")
105
+ if debug_mode == "image":
106
+ k = show_image(item_info.content)
107
+ elif debug_mode == "console":
108
+ k = show_console(item_info.content, console_width, console_back, console_num_images is None)
109
+ if num_images_to_show is not None:
110
+ num_images_to_show -= 1
111
+ if num_images_to_show == 0:
112
+ k = ord("d") # next dataset
113
+
114
+ if k == ord("q"):
115
+ return
116
+ elif k == ord("d"):
117
+ break
118
+ if k == ord("d"):
119
+ break
120
+ batch_index += 1
121
+
122
+
123
+ def encode_and_save_batch(vae: AutoencoderKLCausal3D, batch: list[ItemInfo]):
124
+ contents = torch.stack([torch.from_numpy(item.content) for item in batch])
125
+ if len(contents.shape) == 4:
126
+ contents = contents.unsqueeze(1) # B, H, W, C -> B, F, H, W, C
127
+
128
+ contents = contents.permute(0, 4, 1, 2, 3).contiguous() # B, C, F, H, W
129
+ contents = contents.to(vae.device, dtype=vae.dtype)
130
+ contents = contents / 127.5 - 1.0 # normalize to [-1, 1]
131
+
132
+ h, w = contents.shape[3], contents.shape[4]
133
+ if h < 8 or w < 8:
134
+ item = batch[0] # other items should have the same size
135
+ raise ValueError(f"Image or video size too small: {item.item_key} and {len(batch) - 1} more, size: {item.original_size}")
136
+
137
+ # print(f"encode batch: {contents.shape}")
138
+ with torch.no_grad():
139
+ latent = vae.encode(contents).latent_dist.sample()
140
+ # latent = latent * vae.config.scaling_factor
141
+
142
+ # # debug: decode and save
143
+ # with torch.no_grad():
144
+ # latent_to_decode = latent / vae.config.scaling_factor
145
+ # images = vae.decode(latent_to_decode, return_dict=False)[0]
146
+ # images = (images / 2 + 0.5).clamp(0, 1)
147
+ # images = images.cpu().float().numpy()
148
+ # images = (images * 255).astype(np.uint8)
149
+ # images = images.transpose(0, 2, 3, 4, 1) # B, C, F, H, W -> B, F, H, W, C
150
+ # for b in range(images.shape[0]):
151
+ # for f in range(images.shape[1]):
152
+ # fln = os.path.splitext(os.path.basename(batch[b].item_key))[0]
153
+ # img = Image.fromarray(images[b, f])
154
+ # img.save(f"./logs/decode_{fln}_{b}_{f:03d}.jpg")
155
+
156
+ for item, l in zip(batch, latent):
157
+ # print(f"save latent cache: {item.latent_cache_path}, latent shape: {l.shape}")
158
+ save_latent_cache(item, l)
159
+
160
+
161
+ def encode_datasets(datasets: list[BaseDataset], encode: callable, args: argparse.Namespace):
162
+ num_workers = args.num_workers if args.num_workers is not None else max(1, os.cpu_count() - 1)
163
+ for i, dataset in enumerate(datasets):
164
+ logger.info(f"Encoding dataset [{i}]")
165
+ all_latent_cache_paths = []
166
+ for _, batch in tqdm(dataset.retrieve_latent_cache_batches(num_workers)):
167
+ all_latent_cache_paths.extend([item.latent_cache_path for item in batch])
168
+
169
+ if args.skip_existing:
170
+ filtered_batch = [item for item in batch if not os.path.exists(item.latent_cache_path)]
171
+ if len(filtered_batch) == 0:
172
+ continue
173
+ batch = filtered_batch
174
+
175
+ bs = args.batch_size if args.batch_size is not None else len(batch)
176
+ for i in range(0, len(batch), bs):
177
+ encode(batch[i : i + bs])
178
+
179
+ # normalize paths
180
+ all_latent_cache_paths = [os.path.normpath(p) for p in all_latent_cache_paths]
181
+ all_latent_cache_paths = set(all_latent_cache_paths)
182
+
183
+ # remove old cache files not in the dataset
184
+ all_cache_files = dataset.get_all_latent_cache_files()
185
+ for cache_file in all_cache_files:
186
+ if os.path.normpath(cache_file) not in all_latent_cache_paths:
187
+ if args.keep_cache:
188
+ logger.info(f"Keep cache file not in the dataset: {cache_file}")
189
+ else:
190
+ os.remove(cache_file)
191
+ logger.info(f"Removed old cache file: {cache_file}")
192
+
193
+
194
+ def main(args):
195
+ device = args.device if args.device is not None else "cuda" if torch.cuda.is_available() else "cpu"
196
+ device = torch.device(device)
197
+
198
+ # Load dataset config
199
+ blueprint_generator = BlueprintGenerator(ConfigSanitizer())
200
+ logger.info(f"Load dataset config from {args.dataset_config}")
201
+ user_config = config_utils.load_user_config(args.dataset_config)
202
+ blueprint = blueprint_generator.generate(user_config, args, architecture=ARCHITECTURE_HUNYUAN_VIDEO)
203
+ train_dataset_group = config_utils.generate_dataset_group_by_blueprint(blueprint.dataset_group)
204
+
205
+ datasets = train_dataset_group.datasets
206
+
207
+ if args.debug_mode is not None:
208
+ show_datasets(datasets, args.debug_mode, args.console_width, args.console_back, args.console_num_images)
209
+ return
210
+
211
+ assert args.vae is not None, "vae checkpoint is required"
212
+
213
+ # Load VAE model: HunyuanVideo VAE model is float16
214
+ vae_dtype = torch.float16 if args.vae_dtype is None else str_to_dtype(args.vae_dtype)
215
+ vae, _, s_ratio, t_ratio = load_vae(vae_dtype=vae_dtype, device=device, vae_path=args.vae)
216
+ vae.eval()
217
+ logger.info(f"Loaded VAE: {vae.config}, dtype: {vae.dtype}")
218
+
219
+ if args.vae_chunk_size is not None:
220
+ vae.set_chunk_size_for_causal_conv_3d(args.vae_chunk_size)
221
+ logger.info(f"Set chunk_size to {args.vae_chunk_size} for CausalConv3d in VAE")
222
+ if args.vae_spatial_tile_sample_min_size is not None:
223
+ vae.enable_spatial_tiling(True)
224
+ vae.tile_sample_min_size = args.vae_spatial_tile_sample_min_size
225
+ vae.tile_latent_min_size = args.vae_spatial_tile_sample_min_size // 8
226
+ elif args.vae_tiling:
227
+ vae.enable_spatial_tiling(True)
228
+
229
+ # Encode images
230
+ def encode(one_batch: list[ItemInfo]):
231
+ encode_and_save_batch(vae, one_batch)
232
+
233
+ encode_datasets(datasets, encode, args)
234
+
235
+
236
+ def setup_parser_common() -> argparse.ArgumentParser:
237
+ parser = argparse.ArgumentParser()
238
+
239
+ parser.add_argument("--dataset_config", type=str, required=True, help="path to dataset config .toml file")
240
+ parser.add_argument("--vae", type=str, required=False, default=None, help="path to vae checkpoint")
241
+ parser.add_argument("--vae_dtype", type=str, default=None, help="data type for VAE, default is float16")
242
+ parser.add_argument("--device", type=str, default=None, help="device to use, default is cuda if available")
243
+ parser.add_argument(
244
+ "--batch_size", type=int, default=None, help="batch size, override dataset config if dataset batch size > this"
245
+ )
246
+ parser.add_argument("--num_workers", type=int, default=None, help="number of workers for dataset. default is cpu count-1")
247
+ parser.add_argument("--skip_existing", action="store_true", help="skip existing cache files")
248
+ parser.add_argument("--keep_cache", action="store_true", help="keep cache files not in dataset")
249
+ parser.add_argument("--debug_mode", type=str, default=None, choices=["image", "console"], help="debug mode")
250
+ parser.add_argument("--console_width", type=int, default=80, help="debug mode: console width")
251
+ parser.add_argument(
252
+ "--console_back", type=str, default=None, help="debug mode: console background color, one of ascii_magic.Back"
253
+ )
254
+ parser.add_argument(
255
+ "--console_num_images",
256
+ type=int,
257
+ default=None,
258
+ help="debug mode: not interactive, number of images to show for each dataset",
259
+ )
260
+ return parser
261
+
262
+
263
+ def hv_setup_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
264
+ parser.add_argument(
265
+ "--vae_tiling",
266
+ action="store_true",
267
+ help="enable spatial tiling for VAE, default is False. If vae_spatial_tile_sample_min_size is set, this is automatically enabled",
268
+ )
269
+ parser.add_argument("--vae_chunk_size", type=int, default=None, help="chunk size for CausalConv3d in VAE")
270
+ parser.add_argument(
271
+ "--vae_spatial_tile_sample_min_size", type=int, default=None, help="spatial tile sample min size for VAE, default 256"
272
+ )
273
+ return parser
274
+
275
+
276
+ if __name__ == "__main__":
277
+ parser = setup_parser_common()
278
+ parser = hv_setup_parser(parser)
279
+
280
+ args = parser.parse_args()
281
+ main(args)
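
For reference, the tensor preprocessing that `encode_and_save_batch` above applies before VAE encoding, traced for a single image (shapes are illustrative; the steps mirror the code):

```python
# Sketch: the shape/normalization pipeline from encode_and_save_batch for one uint8 image.
import numpy as np
import torch

img = np.zeros((544, 960, 3), dtype=np.uint8)       # H, W, C
x = torch.from_numpy(img)[None]                      # B, H, W, C (batch of 1)
x = x.unsqueeze(1)                                   # B, F, H, W, C (single frame)
x = x.permute(0, 4, 1, 2, 3).contiguous().float()    # B, C, F, H, W
x = x / 127.5 - 1.0                                  # normalize to [-1, 1]
print(x.shape)                                       # torch.Size([1, 3, 1, 544, 960])
```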
cache_text_encoder_outputs.py ADDED
@@ -0,0 +1,214 @@
1
+ import argparse
2
+ import os
3
+ from typing import Optional, Union
4
+
5
+ import numpy as np
6
+ import torch
7
+ from tqdm import tqdm
8
+
9
+ from dataset import config_utils
10
+ from dataset.config_utils import BlueprintGenerator, ConfigSanitizer
11
+ import accelerate
12
+
13
+ from dataset.image_video_dataset import ARCHITECTURE_HUNYUAN_VIDEO, BaseDataset, ItemInfo, save_text_encoder_output_cache
14
+ from hunyuan_model import text_encoder as text_encoder_module
15
+ from hunyuan_model.text_encoder import TextEncoder
16
+
17
+ import logging
18
+
19
+ from utils.model_utils import str_to_dtype
20
+
21
+ logger = logging.getLogger(__name__)
22
+ logging.basicConfig(level=logging.INFO)
23
+
24
+
25
+ def encode_prompt(text_encoder: TextEncoder, prompt: Union[str, list[str]]):
26
+ data_type = "video" # video only, image is not supported
27
+ text_inputs = text_encoder.text2tokens(prompt, data_type=data_type)
28
+
29
+ with torch.no_grad():
30
+ prompt_outputs = text_encoder.encode(text_inputs, data_type=data_type)
31
+
32
+ return prompt_outputs.hidden_state, prompt_outputs.attention_mask
33
+
34
+
35
+ def encode_and_save_batch(
36
+ text_encoder: TextEncoder, batch: list[ItemInfo], is_llm: bool, accelerator: Optional[accelerate.Accelerator]
37
+ ):
38
+ prompts = [item.caption for item in batch]
39
+ # print(prompts)
40
+
41
+ # encode prompt
42
+ if accelerator is not None:
43
+ with accelerator.autocast():
44
+ prompt_embeds, prompt_mask = encode_prompt(text_encoder, prompts)
45
+ else:
46
+ prompt_embeds, prompt_mask = encode_prompt(text_encoder, prompts)
47
+
48
+ # # convert to fp16 if needed
49
+ # if prompt_embeds.dtype == torch.float32 and text_encoder.dtype != torch.float32:
50
+ # prompt_embeds = prompt_embeds.to(text_encoder.dtype)
51
+
52
+ # save prompt cache
53
+ for item, embed, mask in zip(batch, prompt_embeds, prompt_mask):
54
+ save_text_encoder_output_cache(item, embed, mask, is_llm)
55
+
56
+
57
+ def prepare_cache_files_and_paths(datasets: list[BaseDataset]):
58
+ all_cache_files_for_dataset = [] # existing cache files
59
+ all_cache_paths_for_dataset = [] # all cache paths in the dataset
60
+ for dataset in datasets:
61
+ all_cache_files = [os.path.normpath(file) for file in dataset.get_all_text_encoder_output_cache_files()]
62
+ all_cache_files = set(all_cache_files)
63
+ all_cache_files_for_dataset.append(all_cache_files)
64
+
65
+ all_cache_paths_for_dataset.append(set())
66
+ return all_cache_files_for_dataset, all_cache_paths_for_dataset
67
+
68
+
69
+ def process_text_encoder_batches(
70
+ num_workers: Optional[int],
71
+ skip_existing: bool,
72
+ batch_size: int,
73
+ datasets: list[BaseDataset],
74
+ all_cache_files_for_dataset: list[set],
75
+ all_cache_paths_for_dataset: list[set],
76
+ encode: callable,
77
+ ):
78
+ num_workers = num_workers if num_workers is not None else max(1, os.cpu_count() - 1)
79
+ for i, dataset in enumerate(datasets):
80
+ logger.info(f"Encoding dataset [{i}]")
81
+ all_cache_files = all_cache_files_for_dataset[i]
82
+ all_cache_paths = all_cache_paths_for_dataset[i]
83
+ for batch in tqdm(dataset.retrieve_text_encoder_output_cache_batches(num_workers)):
84
+ # update cache files (it's ok if we update it multiple times)
85
+ all_cache_paths.update([os.path.normpath(item.text_encoder_output_cache_path) for item in batch])
86
+
87
+ # skip existing cache files
88
+ if skip_existing:
89
+ filtered_batch = [
90
+ item for item in batch if not os.path.normpath(item.text_encoder_output_cache_path) in all_cache_files
91
+ ]
92
+ # print(f"Filtered {len(batch) - len(filtered_batch)} existing cache files")
93
+ if len(filtered_batch) == 0:
94
+ continue
95
+ batch = filtered_batch
96
+
97
+ bs = batch_size if batch_size is not None else len(batch)
98
+ for i in range(0, len(batch), bs):
99
+ encode(batch[i : i + bs])
100
+
101
+
102
+ def post_process_cache_files(
103
+ datasets: list[BaseDataset], all_cache_files_for_dataset: list[set], all_cache_paths_for_dataset: list[set]
104
+ ):
105
+ for i, dataset in enumerate(datasets):
106
+ all_cache_files = all_cache_files_for_dataset[i]
107
+ all_cache_paths = all_cache_paths_for_dataset[i]
108
+ for cache_file in all_cache_files:
109
+ if cache_file not in all_cache_paths:
110
+ if args.keep_cache:
111
+ logger.info(f"Keep cache file not in the dataset: {cache_file}")
112
+ else:
113
+ os.remove(cache_file)
114
+ logger.info(f"Removed old cache file: {cache_file}")
115
+
116
+
117
+ def main(args):
118
+ device = args.device if args.device is not None else "cuda" if torch.cuda.is_available() else "cpu"
119
+ device = torch.device(device)
120
+
121
+ # Load dataset config
122
+ blueprint_generator = BlueprintGenerator(ConfigSanitizer())
123
+ logger.info(f"Load dataset config from {args.dataset_config}")
124
+ user_config = config_utils.load_user_config(args.dataset_config)
125
+ blueprint = blueprint_generator.generate(user_config, args, architecture=ARCHITECTURE_HUNYUAN_VIDEO)
126
+ train_dataset_group = config_utils.generate_dataset_group_by_blueprint(blueprint.dataset_group)
127
+
128
+ datasets = train_dataset_group.datasets
129
+
130
+ # define accelerator for fp8 inference
131
+ accelerator = None
132
+ if args.fp8_llm:
133
+ accelerator = accelerate.Accelerator(mixed_precision="fp16")
134
+
135
+ # prepare cache files and paths: all_cache_files_for_dataset = existing cache files, all_cache_paths_for_dataset = all cache paths in the dataset
136
+ all_cache_files_for_dataset, all_cache_paths_for_dataset = prepare_cache_files_and_paths(datasets)
137
+
138
+ # Load Text Encoder 1
139
+ text_encoder_dtype = torch.float16 if args.text_encoder_dtype is None else str_to_dtype(args.text_encoder_dtype)
140
+ logger.info(f"loading text encoder 1: {args.text_encoder1}")
141
+ text_encoder_1 = text_encoder_module.load_text_encoder_1(args.text_encoder1, device, args.fp8_llm, text_encoder_dtype)
142
+ text_encoder_1.to(device=device)
143
+
144
+ # Encode with Text Encoder 1 (LLM)
145
+ logger.info("Encoding with Text Encoder 1")
146
+
147
+ def encode_for_text_encoder_1(batch: list[ItemInfo]):
148
+ encode_and_save_batch(text_encoder_1, batch, is_llm=True, accelerator=accelerator)
149
+
150
+ process_text_encoder_batches(
151
+ args.num_workers,
152
+ args.skip_existing,
153
+ args.batch_size,
154
+ datasets,
155
+ all_cache_files_for_dataset,
156
+ all_cache_paths_for_dataset,
157
+ encode_for_text_encoder_1,
158
+ )
159
+ del text_encoder_1
160
+
161
+ # Load Text Encoder 2
162
+ logger.info(f"loading text encoder 2: {args.text_encoder2}")
163
+ text_encoder_2 = text_encoder_module.load_text_encoder_2(args.text_encoder2, device, text_encoder_dtype)
164
+ text_encoder_2.to(device=device)
165
+
166
+ # Encode with Text Encoder 2
167
+ logger.info("Encoding with Text Encoder 2")
168
+
169
+ def encode_for_text_encoder_2(batch: list[ItemInfo]):
170
+ encode_and_save_batch(text_encoder_2, batch, is_llm=False, accelerator=None)
171
+
172
+ process_text_encoder_batches(
173
+ args.num_workers,
174
+ args.skip_existing,
175
+ args.batch_size,
176
+ datasets,
177
+ all_cache_files_for_dataset,
178
+ all_cache_paths_for_dataset,
179
+ encode_for_text_encoder_2,
180
+ )
181
+ del text_encoder_2
182
+
183
+ # remove cache files not in dataset
184
+ post_process_cache_files(datasets, all_cache_files_for_dataset, all_cache_paths_for_dataset)
185
+
186
+
187
+ def setup_parser_common():
188
+ parser = argparse.ArgumentParser()
189
+
190
+ parser.add_argument("--dataset_config", type=str, required=True, help="path to dataset config .toml file")
191
+ parser.add_argument("--device", type=str, default=None, help="device to use, default is cuda if available")
192
+ parser.add_argument(
193
+ "--batch_size", type=int, default=None, help="batch size, override dataset config if dataset batch size > this"
194
+ )
195
+ parser.add_argument("--num_workers", type=int, default=None, help="number of workers for dataset. default is cpu count-1")
196
+ parser.add_argument("--skip_existing", action="store_true", help="skip existing cache files")
197
+ parser.add_argument("--keep_cache", action="store_true", help="keep cache files not in dataset")
198
+ return parser
199
+
200
+
201
+ def hv_setup_parser(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
202
+ parser.add_argument("--text_encoder1", type=str, required=True, help="Text Encoder 1 directory")
203
+ parser.add_argument("--text_encoder2", type=str, required=True, help="Text Encoder 2 directory")
204
+ parser.add_argument("--text_encoder_dtype", type=str, default=None, help="data type for Text Encoder, default is float16")
205
+ parser.add_argument("--fp8_llm", action="store_true", help="use fp8 for Text Encoder 1 (LLM)")
206
+ return parser
207
+
208
+
209
+ if __name__ == "__main__":
210
+ parser = setup_parser_common()
211
+ parser = hv_setup_parser(parser)
212
+
213
+ args = parser.parse_args()
214
+ main(args)
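
The cleanup performed by `post_process_cache_files` above amounts to a set difference between cache files found on disk and the cache paths produced for the current dataset; a tiny sketch with hypothetical paths:

```python
# Sketch of the cleanup rule in post_process_cache_files (paths are hypothetical).
on_disk = {"cache/a_te.safetensors", "cache/b_te.safetensors", "cache/stale_te.safetensors"}
referenced = {"cache/a_te.safetensors", "cache/b_te.safetensors"}
keep_cache = False  # corresponds to the --keep_cache flag

for path in sorted(on_disk - referenced):
    if keep_cache:
        print(f"Keep cache file not in the dataset: {path}")
    else:
        print(f"Would remove old cache file: {path}")  # the script calls os.remove(path) here
```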
convert_lora.py ADDED
@@ -0,0 +1,131 @@
1
+ import argparse
2
+ import torch
3
+ from safetensors.torch import load_file, save_file
4
+ from safetensors import safe_open
5
+ from utils import model_utils
6
+ import logging
7
+
8
+ logger = logging.getLogger(__name__)
9
+ logging.basicConfig(level=logging.INFO)
10
+
11
+
12
+ def convert_from_diffusers(prefix, weights_sd):
13
+ # convert from diffusers(?) to default LoRA
14
+ # Diffusers format: {"diffusion_model.module.name.lora_A.weight": weight, "diffusion_model.module.name.lora_B.weight": weight, ...}
15
+ # default LoRA format: {"prefix_module_name.lora_down.weight": weight, "prefix_module_name.lora_up.weight": weight, ...}
16
+
17
+ # note: Diffusers has no alpha, so alpha is set to rank
18
+ new_weights_sd = {}
19
+ lora_dims = {}
20
+ for key, weight in weights_sd.items():
21
+ diffusers_prefix, key_body = key.split(".", 1)
22
+ if diffusers_prefix != "diffusion_model" and diffusers_prefix != "transformer":
23
+ logger.warning(f"unexpected key: {key} in diffusers format")
24
+ continue
25
+
26
+ new_key = f"{prefix}{key_body}".replace(".", "_").replace("_lora_A_", ".lora_down.").replace("_lora_B_", ".lora_up.")
27
+ new_weights_sd[new_key] = weight
28
+
29
+ lora_name = new_key.split(".")[0] # before first dot
30
+ if lora_name not in lora_dims and "lora_down" in new_key:
31
+ lora_dims[lora_name] = weight.shape[0]
32
+
33
+ # add alpha with rank
34
+ for lora_name, dim in lora_dims.items():
35
+ new_weights_sd[f"{lora_name}.alpha"] = torch.tensor(dim)
36
+ return new_weights_sd
37
+
38
+
39
+ def convert_to_diffusers(prefix, weights_sd):
40
+ # convert from default LoRA to diffusers
41
+
42
+ # get alphas
43
+ lora_alphas = {}
44
+ for key, weight in weights_sd.items():
45
+ if key.startswith(prefix):
46
+ lora_name = key.split(".", 1)[0] # before first dot
47
+ if lora_name not in lora_alphas and "alpha" in key:
48
+ lora_alphas[lora_name] = weight
49
+
50
+ new_weights_sd = {}
51
+ for key, weight in weights_sd.items():
52
+ if key.startswith(prefix):
53
+ if "alpha" in key:
54
+ continue
55
+
56
+ lora_name = key.split(".", 1)[0] # before first dot
57
+
58
+ module_name = lora_name[len(prefix) :] # remove prefix
59
+ module_name = module_name.replace("_", ".") # replace "_" with "."
60
+ if ".cross.attn." in module_name or ".self.attn." in module_name:
61
+ # Wan2.1 lora name to module name: ugly but works
62
+ module_name = module_name.replace("cross.attn", "cross_attn")
63
+ module_name = module_name.replace("self.attn", "self_attn")
64
+ module_name = module_name.replace("k.img", "k_img")
65
+ module_name = module_name.replace("v.img", "v_img")
66
+ else:
67
+ # HunyuanVideo lora name to module name: ugly but works
68
+ module_name = module_name.replace("double.blocks.", "double_blocks.")
69
+ module_name = module_name.replace("single.blocks.", "single_blocks.")
70
+ module_name = module_name.replace("img.", "img_")
71
+ module_name = module_name.replace("txt.", "txt_")
72
+ module_name = module_name.replace("attn.", "attn_")
73
+ diffusers_prefix = "diffusion_model"
74
+ if "lora_down" in key:
75
+ new_key = f"{diffusers_prefix}.{module_name}.lora_A.weight"
76
+ dim = weight.shape[0]
77
+ elif "lora_up" in key:
78
+ new_key = f"{diffusers_prefix}.{module_name}.lora_B.weight"
79
+ dim = weight.shape[1]
80
+ else:
81
+ logger.warning(f"unexpected key: {key} in default LoRA format")
82
+ continue
83
+
84
+ # scale weight by alpha using float16
85
+ if lora_name in lora_alphas:
86
+ scale = lora_alphas[lora_name].half() / dim
87
+ scale = scale.sqrt()
88
+ weight = weight.half() * scale
89
+ else:
90
+ logger.warning(f"missing alpha for {lora_name}")
91
+
92
+ new_weights_sd[new_key] = weight
93
+
94
+ return new_weights_sd
95
+
96
+
97
+ def convert(input_file, output_file, target_format):
98
+ logger.info(f"loading {input_file}")
99
+ weights_sd = load_file(input_file)
100
+ with safe_open(input_file, framework="pt") as f:
101
+ metadata = f.metadata()
102
+
103
+ logger.info(f"converting to {target_format}")
104
+ prefix = "lora_unet_"
105
+ if target_format == "default":
106
+ new_weights_sd = convert_from_diffusers(prefix, weights_sd)
107
+ metadata = metadata or {}
108
+ model_utils.precalculate_safetensors_hashes(new_weights_sd, metadata)
109
+ elif target_format == "other":
110
+ new_weights_sd = convert_to_diffusers(prefix, weights_sd)
111
+ else:
112
+ raise ValueError(f"unknown target format: {target_format}")
113
+
114
+ logger.info(f"saving to {output_file}")
115
+ save_file(new_weights_sd, output_file, metadata=metadata)
116
+
117
+ logger.info("done")
118
+
119
+
120
+ def parse_args():
121
+ parser = argparse.ArgumentParser(description="Convert LoRA weights between default and other formats")
122
+ parser.add_argument("--input", type=str, required=True, help="input model file")
123
+ parser.add_argument("--output", type=str, required=True, help="output model file")
124
+ parser.add_argument("--target", type=str, required=True, choices=["other", "default"], help="target format")
125
+ args = parser.parse_args()
126
+ return args
127
+
128
+
129
+ if __name__ == "__main__":
130
+ args = parse_args()
131
+ convert(args.input, args.output, args.target)
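
To make the key renaming in `convert_from_diffusers` above concrete, here is the transformation traced on a single hypothetical Diffusers-style key (the module name is made up for illustration):

```python
# Sketch: trace convert_from_diffusers' renaming for one hypothetical key.
prefix = "lora_unet_"  # same prefix convert() passes in
key = "diffusion_model.single_blocks.0.linear1.lora_A.weight"  # hypothetical input key
_, key_body = key.split(".", 1)
new_key = (
    f"{prefix}{key_body}"
    .replace(".", "_")
    .replace("_lora_A_", ".lora_down.")
    .replace("_lora_B_", ".lora_up.")
)
print(new_key)  # lora_unet_single_blocks_0_linear1.lora_down.weight
```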
dataset/__init__.py ADDED
File without changes
dataset/config_utils.py ADDED
@@ -0,0 +1,372 @@
1
+ import argparse
2
+ from dataclasses import (
3
+ asdict,
4
+ dataclass,
5
+ )
6
+ import functools
7
+ import random
8
+ from textwrap import dedent, indent
9
+ import json
10
+ from pathlib import Path
11
+
12
+ # from toolz import curry
13
+ from typing import Dict, List, Optional, Sequence, Tuple, Union
14
+
15
+ import toml
16
+ import voluptuous
17
+ from voluptuous import Any, ExactSequence, MultipleInvalid, Object, Schema
18
+
19
+ from .image_video_dataset import DatasetGroup, ImageDataset, VideoDataset
20
+
21
+ import logging
22
+
23
+ logger = logging.getLogger(__name__)
24
+ logging.basicConfig(level=logging.INFO)
25
+
26
+
27
+ @dataclass
28
+ class BaseDatasetParams:
29
+ resolution: Tuple[int, int] = (960, 544)
30
+ enable_bucket: bool = False
31
+ bucket_no_upscale: bool = False
32
+ caption_extension: Optional[str] = None
33
+ batch_size: int = 1
34
+ num_repeats: int = 1
35
+ cache_directory: Optional[str] = None
36
+ debug_dataset: bool = False
37
+ architecture: str = "no_default" # short style like "hv" or "wan"
38
+
39
+
40
+ @dataclass
41
+ class ImageDatasetParams(BaseDatasetParams):
42
+ image_directory: Optional[str] = None
43
+ image_jsonl_file: Optional[str] = None
44
+
45
+
46
+ @dataclass
47
+ class VideoDatasetParams(BaseDatasetParams):
48
+ video_directory: Optional[str] = None
49
+ video_jsonl_file: Optional[str] = None
50
+ target_frames: Sequence[int] = (1,)
51
+ frame_extraction: Optional[str] = "head"
52
+ frame_stride: Optional[int] = 1
53
+ frame_sample: Optional[int] = 1
54
+
55
+
56
+ @dataclass
57
+ class DatasetBlueprint:
58
+ is_image_dataset: bool
59
+ params: Union[ImageDatasetParams, VideoDatasetParams]
60
+
61
+
62
+ @dataclass
63
+ class DatasetGroupBlueprint:
64
+ datasets: Sequence[DatasetBlueprint]
65
+
66
+
67
+ @dataclass
68
+ class Blueprint:
69
+ dataset_group: DatasetGroupBlueprint
70
+
71
+
72
+ class ConfigSanitizer:
73
+ # @curry
74
+ @staticmethod
75
+ def __validate_and_convert_twodim(klass, value: Sequence) -> Tuple:
76
+ Schema(ExactSequence([klass, klass]))(value)
77
+ return tuple(value)
78
+
79
+ # @curry
80
+ @staticmethod
81
+ def __validate_and_convert_scalar_or_twodim(klass, value: Union[float, Sequence]) -> Tuple:
82
+ Schema(Any(klass, ExactSequence([klass, klass])))(value)
83
+ try:
84
+ Schema(klass)(value)
85
+ return (value, value)
86
+ except Exception:
87
+ return ConfigSanitizer.__validate_and_convert_twodim(klass, value)
88
+
89
+ # datasets schema
90
+ DATASET_ASCENDABLE_SCHEMA = {
91
+ "caption_extension": str,
92
+ "batch_size": int,
93
+ "num_repeats": int,
94
+ "resolution": functools.partial(__validate_and_convert_scalar_or_twodim.__func__, int),
95
+ "enable_bucket": bool,
96
+ "bucket_no_upscale": bool,
97
+ }
98
+ IMAGE_DATASET_DISTINCT_SCHEMA = {
99
+ "image_directory": str,
100
+ "image_jsonl_file": str,
101
+ "cache_directory": str,
102
+ }
103
+ VIDEO_DATASET_DISTINCT_SCHEMA = {
104
+ "video_directory": str,
105
+ "video_jsonl_file": str,
106
+ "target_frames": [int],
107
+ "frame_extraction": str,
108
+ "frame_stride": int,
109
+ "frame_sample": int,
110
+ "cache_directory": str,
111
+ }
112
+
113
+ # options handled by argparse but not handled by user config
114
+ ARGPARSE_SPECIFIC_SCHEMA = {
115
+ "debug_dataset": bool,
116
+ }
117
+
118
+ def __init__(self) -> None:
119
+ self.image_dataset_schema = self.__merge_dict(
120
+ self.DATASET_ASCENDABLE_SCHEMA,
121
+ self.IMAGE_DATASET_DISTINCT_SCHEMA,
122
+ )
123
+ self.video_dataset_schema = self.__merge_dict(
124
+ self.DATASET_ASCENDABLE_SCHEMA,
125
+ self.VIDEO_DATASET_DISTINCT_SCHEMA,
126
+ )
127
+
128
+ def validate_flex_dataset(dataset_config: dict):
129
+ if "target_frames" in dataset_config:
130
+ return Schema(self.video_dataset_schema)(dataset_config)
131
+ else:
132
+ return Schema(self.image_dataset_schema)(dataset_config)
133
+
134
+ self.dataset_schema = validate_flex_dataset
135
+
136
+ self.general_schema = self.__merge_dict(
137
+ self.DATASET_ASCENDABLE_SCHEMA,
138
+ )
139
+ self.user_config_validator = Schema(
140
+ {
141
+ "general": self.general_schema,
142
+ "datasets": [self.dataset_schema],
143
+ }
144
+ )
145
+ self.argparse_schema = self.__merge_dict(
146
+ self.ARGPARSE_SPECIFIC_SCHEMA,
147
+ )
148
+ self.argparse_config_validator = Schema(Object(self.argparse_schema), extra=voluptuous.ALLOW_EXTRA)
149
+
150
+ def sanitize_user_config(self, user_config: dict) -> dict:
151
+ try:
152
+ return self.user_config_validator(user_config)
153
+ except MultipleInvalid:
154
+ # TODO: clarify the error message
155
+ logger.error("Invalid user config / ユーザ設定の形式が正しくないようです")
156
+ raise
157
+
158
+ # NOTE: In principle, the argparse result does not need to be sanitized,
159
+ # but sanitizing it here helps us detect program bugs
160
+ def sanitize_argparse_namespace(self, argparse_namespace: argparse.Namespace) -> argparse.Namespace:
161
+ try:
162
+ return self.argparse_config_validator(argparse_namespace)
163
+ except MultipleInvalid:
164
+ # XXX: this should be a bug
165
+ logger.error(
166
+ "Invalid cmdline parsed arguments. This should be a bug. / コマンドラインのパース結果が正しくないようです。プログラムのバグの可能性が高いです。"
167
+ )
168
+ raise
169
+
170
+ # NOTE: when the same key appears in multiple dicts, the value from the later dict wins
171
+ @staticmethod
172
+ def __merge_dict(*dict_list: dict) -> dict:
173
+ merged = {}
174
+ for schema in dict_list:
175
+ # merged |= schema
176
+ for k, v in schema.items():
177
+ merged[k] = v
178
+ return merged
179
+
180
+
181
+ class BlueprintGenerator:
182
+ BLUEPRINT_PARAM_NAME_TO_CONFIG_OPTNAME = {}
183
+
184
+ def __init__(self, sanitizer: ConfigSanitizer):
185
+ self.sanitizer = sanitizer
186
+
187
+ # runtime_params is for parameters that are only configurable at runtime, such as the tokenizer
188
+ def generate(self, user_config: dict, argparse_namespace: argparse.Namespace, **runtime_params) -> Blueprint:
189
+ sanitized_user_config = self.sanitizer.sanitize_user_config(user_config)
190
+ sanitized_argparse_namespace = self.sanitizer.sanitize_argparse_namespace(argparse_namespace)
191
+
192
+ argparse_config = {k: v for k, v in vars(sanitized_argparse_namespace).items() if v is not None}
193
+ general_config = sanitized_user_config.get("general", {})
194
+
195
+ dataset_blueprints = []
196
+ for dataset_config in sanitized_user_config.get("datasets", []):
197
+ is_image_dataset = "target_frames" not in dataset_config
198
+ if is_image_dataset:
199
+ dataset_params_klass = ImageDatasetParams
200
+ else:
201
+ dataset_params_klass = VideoDatasetParams
202
+
203
+ params = self.generate_params_by_fallbacks(
204
+ dataset_params_klass, [dataset_config, general_config, argparse_config, runtime_params]
205
+ )
206
+ dataset_blueprints.append(DatasetBlueprint(is_image_dataset, params))
207
+
208
+ dataset_group_blueprint = DatasetGroupBlueprint(dataset_blueprints)
209
+
210
+ return Blueprint(dataset_group_blueprint)
211
+
212
+ @staticmethod
213
+ def generate_params_by_fallbacks(param_klass, fallbacks: Sequence[dict]):
214
+ name_map = BlueprintGenerator.BLUEPRINT_PARAM_NAME_TO_CONFIG_OPTNAME
215
+ search_value = BlueprintGenerator.search_value
216
+ default_params = asdict(param_klass())
217
+ param_names = default_params.keys()
218
+
219
+ params = {name: search_value(name_map.get(name, name), fallbacks, default_params.get(name)) for name in param_names}
220
+
221
+ return param_klass(**params)
222
+
223
+ @staticmethod
224
+ def search_value(key: str, fallbacks: Sequence[dict], default_value=None):
225
+ for cand in fallbacks:
226
+ value = cand.get(key)
227
+ if value is not None:
228
+ return value
229
+
230
+ return default_value
231
+
232
+
233
+ # if training is True, it will return a dataset group for training, otherwise for caching
234
+ def generate_dataset_group_by_blueprint(dataset_group_blueprint: DatasetGroupBlueprint, training: bool = False) -> DatasetGroup:
235
+ datasets: List[Union[ImageDataset, VideoDataset]] = []
236
+
237
+ for dataset_blueprint in dataset_group_blueprint.datasets:
238
+ if dataset_blueprint.is_image_dataset:
239
+ dataset_klass = ImageDataset
240
+ else:
241
+ dataset_klass = VideoDataset
242
+
243
+ dataset = dataset_klass(**asdict(dataset_blueprint.params))
244
+ datasets.append(dataset)
245
+
246
+ # assertion
247
+ cache_directories = [dataset.cache_directory for dataset in datasets]
248
+ num_of_unique_cache_directories = len(set(cache_directories))
249
+ if num_of_unique_cache_directories != len(cache_directories):
250
+ raise ValueError(
251
+ "cache directory should be unique for each dataset (note that cache directory is image/video directory if not specified)"
252
+ + " / cache directory は各データセットごとに異なる必要があります(指定されていない場合はimage/video directoryが使われるので注意)"
253
+ )
254
+
255
+ # print info
256
+ info = ""
257
+ for i, dataset in enumerate(datasets):
258
+ is_image_dataset = isinstance(dataset, ImageDataset)
259
+ info += dedent(
260
+ f"""\
261
+ [Dataset {i}]
262
+ is_image_dataset: {is_image_dataset}
263
+ resolution: {dataset.resolution}
264
+ batch_size: {dataset.batch_size}
265
+ num_repeats: {dataset.num_repeats}
266
+ caption_extension: "{dataset.caption_extension}"
267
+ enable_bucket: {dataset.enable_bucket}
268
+ bucket_no_upscale: {dataset.bucket_no_upscale}
269
+ cache_directory: "{dataset.cache_directory}"
270
+ debug_dataset: {dataset.debug_dataset}
271
+ """
272
+ )
273
+
274
+ if is_image_dataset:
275
+ info += indent(
276
+ dedent(
277
+ f"""\
278
+ image_directory: "{dataset.image_directory}"
279
+ image_jsonl_file: "{dataset.image_jsonl_file}"
280
+ \n"""
281
+ ),
282
+ " ",
283
+ )
284
+ else:
285
+ info += indent(
286
+ dedent(
287
+ f"""\
288
+ video_directory: "{dataset.video_directory}"
289
+ video_jsonl_file: "{dataset.video_jsonl_file}"
290
+ target_frames: {dataset.target_frames}
291
+ frame_extraction: {dataset.frame_extraction}
292
+ frame_stride: {dataset.frame_stride}
293
+ frame_sample: {dataset.frame_sample}
294
+ \n"""
295
+ ),
296
+ " ",
297
+ )
298
+ logger.info(f"{info}")
299
+
300
+ # make buckets first because it determines the length of the dataset
301
+ # and set the same seed for all datasets
302
+ seed = random.randint(0, 2**31) # actual seed is seed + epoch_no
303
+ for i, dataset in enumerate(datasets):
304
+ # logger.info(f"[Dataset {i}]")
305
+ dataset.set_seed(seed)
306
+ if training:
307
+ dataset.prepare_for_training()
308
+
309
+ return DatasetGroup(datasets)
310
+
311
+
312
+ def load_user_config(file: str) -> dict:
313
+ file: Path = Path(file)
314
+ if not file.is_file():
315
+ raise ValueError(f"file not found / ファイルが見つかりません: {file}")
316
+
317
+ if file.name.lower().endswith(".json"):
318
+ try:
319
+ with open(file, "r", encoding="utf-8") as f:
320
+ config = json.load(f)
321
+ except Exception:
322
+ logger.error(
323
+ f"Error on parsing JSON config file. Please check the format. / JSON 形式の設定ファイルの読み込みに失敗しました。文法が正しいか確認してください。: {file}"
324
+ )
325
+ raise
326
+ elif file.name.lower().endswith(".toml"):
327
+ try:
328
+ config = toml.load(file)
329
+ except Exception:
330
+ logger.error(
331
+ f"Error on parsing TOML config file. Please check the format. / TOML 形式の設定ファイルの読み込みに失敗しました。文法が正しいか確認してください。: {file}"
332
+ )
333
+ raise
334
+ else:
335
+ raise ValueError(f"not supported config file format / 対応していない設定ファイルの形式です: {file}")
336
+
337
+ return config
338
+
339
+
340
+ # for config test
341
+ if __name__ == "__main__":
342
+ parser = argparse.ArgumentParser()
343
+ parser.add_argument("dataset_config")
344
+ config_args, remain = parser.parse_known_args()
345
+
346
+ parser = argparse.ArgumentParser()
347
+ parser.add_argument("--debug_dataset", action="store_true")
348
+ argparse_namespace = parser.parse_args(remain)
349
+
350
+ logger.info("[argparse_namespace]")
351
+ logger.info(f"{vars(argparse_namespace)}")
352
+
353
+ user_config = load_user_config(config_args.dataset_config)
354
+
355
+ logger.info("")
356
+ logger.info("[user_config]")
357
+ logger.info(f"{user_config}")
358
+
359
+ sanitizer = ConfigSanitizer()
360
+ sanitized_user_config = sanitizer.sanitize_user_config(user_config)
361
+
362
+ logger.info("")
363
+ logger.info("[sanitized_user_config]")
364
+ logger.info(f"{sanitized_user_config}")
365
+
366
+ blueprint = BlueprintGenerator(sanitizer).generate(user_config, argparse_namespace)
367
+
368
+ logger.info("")
369
+ logger.info("[blueprint]")
370
+ logger.info(f"{blueprint}")
371
+
372
+ dataset_group = generate_dataset_group_by_blueprint(blueprint.dataset_group)
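+
+
+ # Precedence sketch for generate_params_by_fallbacks (illustrative values only):
+ # per-dataset settings win over the [general] section, which wins over argparse
+ # values and runtime_params; the dataclass default is used when no source
+ # provides the key.
+ #
+ #   params = BlueprintGenerator.generate_params_by_fallbacks(
+ #       ImageDatasetParams,
+ #       [{"batch_size": 4}, {"batch_size": 1, "resolution": (512, 512)}, {}, {}],
+ #   )
+ #   params.batch_size    # -> 4 (per-dataset value)
+ #   params.resolution    # -> (512, 512) (falls back to the general section)
+ #   params.architecture  # -> "no_default" (dataclass default)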
dataset/dataset_config.md ADDED
@@ -0,0 +1,378 @@
1
+ > 📝 Click on the language section to expand / 言語をクリックして展開
2
+
3
+ ## Dataset Configuration
4
+
5
+ Please create a TOML file for dataset configuration.
6
+
7
+ Image and video datasets are supported. The configuration file can include multiple datasets, either image or video datasets, with caption text files or metadata JSONL files.
8
+
9
+ The cache directory must be different for each dataset.
10
+
11
+ <details>
12
+ <summary>日本語</summary>
13
+
14
+ データセットの設定を行うためのTOMLファイルを作成してください。
15
+
16
+ 画像データセットと動画データセットがサポートされています。設定ファイルには、画像または動画データセットを複数含めることができます。キャプションテキストファイルまたはメタデータJSONLファイルを使用できます。
17
+
18
+ キャッシュディレクトリは、各データセットごとに異なるディレクトリである必要があります。
19
+ </details>
20
+
21
+ ### Sample for Image Dataset with Caption Text Files
22
+
23
+ ```toml
24
+ # resolution, caption_extension, batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
25
+ # otherwise, the default values will be used for each item
26
+
27
+ # general configurations
28
+ [general]
29
+ resolution = [960, 544]
30
+ caption_extension = ".txt"
31
+ batch_size = 1
32
+ enable_bucket = true
33
+ bucket_no_upscale = false
34
+
35
+ [[datasets]]
36
+ image_directory = "/path/to/image_dir"
37
+ cache_directory = "/path/to/cache_directory"
38
+ num_repeats = 1 # optional, default is 1. Number of times to repeat the dataset. Useful to balance the multiple datasets with different sizes.
39
+
40
+ # other datasets can be added here. each dataset can have different configurations
41
+ ```
42
+
43
+ `cache_directory` is optional; the default is None, which uses the same directory as the image directory. However, we recommend setting a cache directory explicitly to avoid accidentally sharing cache files between different datasets.
44
+
45
+ `num_repeats` is also available. It is optional, default is 1 (no repeat). It repeats the images (or videos) that many times to expand the dataset. For example, with `num_repeats = 2` and 20 images in the dataset, each image appears twice (with the same caption), for a total of 40 images. This is useful for balancing multiple datasets of different sizes.
46
+
47
+ <details>
48
+ <summary>日本語</summary>
49
+
50
+ `cache_directory` はオプションです。デフォルトは画像ディレクトリと同じディレクトリに設定されます。ただし、異なるデータセット間でキャッシュファイルが共有されるのを防ぐために、明示的に別のキャッシュディレクトリを設定することをお勧めします。
51
+
52
+ `num_repeats` はオプションで、デフォルトは 1 です(繰り返しなし)。画像(や動画)を、その回数だけ単純に繰り返してデータセットを拡張します。たとえば`num_repeats = 2`としたとき、画像20枚のデータセットなら、各画像が2枚ずつ(同一のキャプションで)計40枚存在した場合と同じになります。異なるデータ数のデータセット間でバランスを取るために使用可能です。
53
+
54
+ resolution, caption_extension, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。省略時は各項目のデフォルト値が使用されます。
55
+
56
+ `[[datasets]]`以下を追加することで、他のデータセットを追加できます。各データセットには異なる設定を持てます。
57
+ </details>
58
+
59
+ ### Sample for Image Dataset with Metadata JSONL File
60
+
61
+ ```toml
62
+ # resolution, batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
63
+ # caption_extension is not required for metadata jsonl file
64
+ # cache_directory is required for each dataset with metadata jsonl file
65
+
66
+ # general configurations
67
+ [general]
68
+ resolution = [960, 544]
69
+ batch_size = 1
70
+ enable_bucket = true
71
+ bucket_no_upscale = false
72
+
73
+ [[datasets]]
74
+ image_jsonl_file = "/path/to/metadata.jsonl"
75
+ cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
76
+ num_repeats = 1 # optional, default is 1. Same as above.
77
+
78
+ # other datasets can be added here. each dataset can have different configurations
79
+ ```
80
+
81
+ JSONL file format for metadata:
82
+
83
+ ```json
84
+ {"image_path": "/path/to/image1.jpg", "caption": "A caption for image1"}
85
+ {"image_path": "/path/to/image2.jpg", "caption": "A caption for image2"}
86
+ ```
87
+
88
+ <details>
89
+ <summary>日本語</summary>
90
+
91
+ resolution, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。省略時は各項目のデフォルト値が使用されます。
92
+
93
+ metadata jsonl ファイルを使用する場合、caption_extension は必要ありません。また、cache_directory は必須です。
94
+
95
+ キャプションによるデータセットと同様に、複数のデータセットを追加できます。各データセットには異なる設定を持てます。
96
+ </details>
97
+
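+ If you already have per-image `.txt` captions, a small script along these lines can generate the metadata JSONL (a sketch; the directory path and the `*.jpg` pattern are placeholders):
+
+ ```python
+ # Build a metadata JSONL from an image folder with side-by-side .txt captions.
+ import json
+ from pathlib import Path
+
+ image_dir = Path("/path/to/image_dir")
+ with open("metadata.jsonl", "w", encoding="utf-8") as out:
+     for image_path in sorted(image_dir.glob("*.jpg")):
+         caption_file = image_path.with_suffix(".txt")
+         if not caption_file.exists():
+             continue  # skip images without a caption
+         caption = caption_file.read_text(encoding="utf-8").strip()
+         out.write(json.dumps({"image_path": str(image_path), "caption": caption}) + "\n")
+ ```
+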
98
+
99
+ ### Sample for Video Dataset with Caption Text Files
100
+
101
+ ```toml
102
+ # resolution, caption_extension, target_frames, frame_extraction, frame_stride, frame_sample,
103
+ # batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
104
+ # num_repeats is also available for video dataset, example is not shown here
105
+
106
+ # general configurations
107
+ [general]
108
+ resolution = [960, 544]
109
+ caption_extension = ".txt"
110
+ batch_size = 1
111
+ enable_bucket = true
112
+ bucket_no_upscale = false
113
+
114
+ [[datasets]]
115
+ video_directory = "/path/to/video_dir"
116
+ cache_directory = "/path/to/cache_directory" # recommended to set cache directory
117
+ target_frames = [1, 25, 45]
118
+ frame_extraction = "head"
119
+
120
+ # other datasets can be added here. each dataset can have different configurations
121
+ ```
122
+
123
+ __In HunyuanVideo and Wan2.1, each value in `target_frames` must be "N*4+1" (N=0,1,2,...).__
124
+
125
+ <details>
126
+ <summary>日本語</summary>
127
+
128
+ resolution, caption_extension, target_frames, frame_extraction, frame_stride, frame_sample, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。
129
+
130
+ __HunyuanVideoおよびWan2.1では、target_framesの数値は「N*4+1」である必要があります。__
131
+
132
+ 他の注意事項は画像データセットと同様です。
133
+ </details>
134
+
135
+ ### Sample for Video Dataset with Metadata JSONL File
136
+
137
+ ```toml
138
+ # resolution, target_frames, frame_extraction, frame_stride, frame_sample,
139
+ # batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
140
+ # caption_extension is not required for metadata jsonl file
141
+ # cache_directory is required for each dataset with metadata jsonl file
142
+
143
+ # general configurations
144
+ [general]
145
+ resolution = [960, 544]
146
+ batch_size = 1
147
+ enable_bucket = true
148
+ bucket_no_upscale = false
149
+
150
+ [[datasets]]
151
+ video_jsonl_file = "/path/to/metadata.jsonl"
152
+ target_frames = [1, 25, 45]
153
+ frame_extraction = "head"
154
+ cache_directory = "/path/to/cache_directory_head"
155
+
156
+ # same metadata jsonl file can be used for multiple datasets
157
+ [[datasets]]
158
+ video_jsonl_file = "/path/to/metadata.jsonl"
159
+ target_frames = [1]
160
+ frame_stride = 10
161
+ cache_directory = "/path/to/cache_directory_stride"
162
+
163
+ # other datasets can be added here. each dataset can have different configurations
164
+ ```
165
+
166
+ JSONL file format for metadata:
167
+
168
+ ```json
169
+ {"video_path": "/path/to/video1.mp4", "caption": "A caption for video1"}
170
+ {"video_path": "/path/to/video2.mp4", "caption": "A caption for video2"}
171
+ ```
172
+
173
+ <details>
174
+ <summary>日本語</summary>
175
+
176
+ resolution, target_frames, frame_extraction, frame_stride, frame_sample, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。
177
+
178
+ metadata jsonl ファイルを使用する場合、caption_extension は必要ありません。また、cache_directory は必須です。
179
+
180
+ 他の注意事項は今までのデータセットと同様です。
181
+ </details>
182
+
183
+ ### frame_extraction Options
184
+
185
+ - `head`: Extract the first N frames from the video.
186
+ - `chunk`: Extract frames by splitting the video into chunks of N frames.
187
+ - `slide`: Extract frames from the video with a stride of `frame_stride`.
188
+ - `uniform`: Extract `frame_sample` samples uniformly from the video.
189
+
190
+ For example, consider a video with 40 frames. The following diagrams illustrate each extraction:
191
+
192
+ <details>
193
+ <summary>日本語</summary>
194
+
195
+ - `head`: 動画から最初のNフレームを抽出します。
196
+ - `chunk`: 動画をNフレームずつに分割してフレームを抽出します。
197
+ - `slide`: `frame_stride`に指定したフレームごとに動画からNフレームを抽出します。
198
+ - `uniform`: 動画から一定間隔で、`frame_sample`個のNフレームを抽出します。
199
+
200
+ 例えば、40フレームの動画を例とした抽出について、以下の図で説明します。
201
+ </details>
202
+
203
+ ```
204
+ Original Video, 40 frames: x = extracted frame, o = not extracted
205
+ oooooooooooooooooooooooooooooooooooooooo
206
+
207
+ head, target_frames = [1, 13, 25] -> extract head frames:
208
+ xooooooooooooooooooooooooooooooooooooooo
209
+ xxxxxxxxxxxxxooooooooooooooooooooooooooo
210
+ xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
211
+
212
+ chunk, target_frames = [13, 25] -> extract frames by splitting the video into chunks of 13 and of 25 frames:
213
+ xxxxxxxxxxxxxooooooooooooooooooooooooooo
214
+ oooooooooooooxxxxxxxxxxxxxoooooooooooooo
215
+ ooooooooooooooooooooooooooxxxxxxxxxxxxxo
216
+ xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
217
+
218
+ NOTE: Please do not include 1 in target_frames if you are using the frame_extraction "chunk". Otherwise, every frame will be extracted as its own 1-frame chunk.
219
+ 注: frame_extraction "chunk" を使用する場合、target_frames に 1 を含めないでください。全てのフレームが抽出されてしまいます。
220
+
221
+ slide, target_frames = [1, 13, 25], frame_stride = 10 -> extract N frames with a stride of 10:
222
+ xooooooooooooooooooooooooooooooooooooooo
223
+ ooooooooooxooooooooooooooooooooooooooooo
224
+ ooooooooooooooooooooxooooooooooooooooooo
225
+ ooooooooooooooooooooooooooooooxooooooooo
226
+ xxxxxxxxxxxxxooooooooooooooooooooooooooo
227
+ ooooooooooxxxxxxxxxxxxxooooooooooooooooo
228
+ ooooooooooooooooooooxxxxxxxxxxxxxooooooo
229
+ xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
230
+ ooooooooooxxxxxxxxxxxxxxxxxxxxxxxxxooooo
231
+
232
+ uniform, target_frames = [1, 13, 25], frame_sample = 4 -> extract `frame_sample` samples uniformly, N frames each:
233
+ xooooooooooooooooooooooooooooooooooooooo
234
+ oooooooooooooxoooooooooooooooooooooooooo
235
+ oooooooooooooooooooooooooxoooooooooooooo
236
+ ooooooooooooooooooooooooooooooooooooooox
237
+ xxxxxxxxxxxxxooooooooooooooooooooooooooo
238
+ oooooooooxxxxxxxxxxxxxoooooooooooooooooo
239
+ ooooooooooooooooooxxxxxxxxxxxxxooooooooo
240
+ oooooooooooooooooooooooooooxxxxxxxxxxxxx
241
+ xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
242
+ oooooxxxxxxxxxxxxxxxxxxxxxxxxxoooooooooo
243
+ ooooooooooxxxxxxxxxxxxxxxxxxxxxxxxxooooo
244
+ oooooooooooooooxxxxxxxxxxxxxxxxxxxxxxxxx
245
+ ```
246
+
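+ The sketch below is not the trainer's actual implementation (the helper name and the exact index rounding for `uniform` are illustrative), but it shows how the four modes translate into frame windows for the 40-frame example above:
+
+ ```python
+ def enumerate_frame_windows(total_frames, target_frames, mode="head",
+                             frame_stride=1, frame_sample=1):
+     """Return (start, end) frame windows, end exclusive."""
+     windows = []
+     for n in target_frames:
+         if n > total_frames:
+             continue  # window longer than the video; skipped
+         if mode == "head":
+             windows.append((0, n))  # first N frames only
+         elif mode == "chunk":
+             for start in range(0, total_frames - n + 1, n):
+                 windows.append((start, start + n))  # consecutive N-frame chunks
+         elif mode == "slide":
+             for start in range(0, total_frames - n + 1, frame_stride):
+                 windows.append((start, start + n))  # N-frame window every frame_stride frames
+         elif mode == "uniform":
+             last_start = total_frames - n
+             for i in range(frame_sample):  # frame_sample evenly spaced windows
+                 start = round(i * last_start / max(frame_sample - 1, 1))
+                 windows.append((start, start + n))
+     return windows
+
+ print(enumerate_frame_windows(40, [13, 25], mode="chunk"))
+ print(enumerate_frame_windows(40, [1, 13], mode="uniform", frame_sample=4))
+ ```
+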
247
+ ## Specifications
248
+
249
+ ```toml
250
+ # general configurations
251
+ [general]
252
+ resolution = [960, 544] # optional, [W, H], default is None. This is the default resolution for all datasets
253
+ caption_extension = ".txt" # optional, default is None. This is the default caption extension for all datasets
254
+ batch_size = 1 # optional, default is 1. This is the default batch size for all datasets
255
+ num_repeats = 1 # optional, default is 1. Number of times to repeat the dataset. Useful to balance the multiple datasets with different sizes.
256
+ enable_bucket = true # optional, default is false. Enable bucketing for datasets
257
+ bucket_no_upscale = false # optional, default is false. Disable upscaling for bucketing. Ignored if enable_bucket is false
258
+
259
+ ### Image Dataset
260
+
261
+ # sample image dataset with caption text files
262
+ [[datasets]]
263
+ image_directory = "/path/to/image_dir"
264
+ caption_extension = ".txt" # required for caption text files, if general caption extension is not set
265
+ resolution = [960, 544] # required if general resolution is not set
266
+ batch_size = 4 # optional, overwrite the default batch size
267
+ num_repeats = 1 # optional, overwrite the default num_repeats
268
+ enable_bucket = false # optional, overwrite the default bucketing setting
269
+ bucket_no_upscale = true # optional, overwrite the default bucketing setting
270
+ cache_directory = "/path/to/cache_directory" # optional, default is None to use the same directory as the image directory. NOTE: caching is always enabled
271
+
272
+ # sample image dataset with metadata **jsonl** file
273
+ [[datasets]]
274
+ image_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of image files and captions
275
+ resolution = [960, 544] # required if general resolution is not set
276
+ cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
277
+ # caption_extension is not required for metadata jsonl file
278
+ # batch_size, num_repeats, enable_bucket, bucket_no_upscale are also available for metadata jsonl file
279
+
280
+ ### Video Dataset
281
+
282
+ # sample video dataset with caption text files
283
+ [[datasets]]
284
+ video_directory = "/path/to/video_dir"
285
+ caption_extension = ".txt" # required for caption text files, if general caption extension is not set
286
+ resolution = [960, 544] # required if general resolution is not set
287
+
288
+ target_frames = [1, 25, 79] # required for video dataset. list of video lengths to extract frames. each element must be N*4+1 (N=0,1,2,...)
289
+
290
+ # NOTE: Please do not include 1 in target_frames if you are using the frame_extraction "chunk". Otherwise, every frame will be extracted as its own 1-frame chunk.
291
+
292
+ frame_extraction = "head" # optional, "head" or "chunk", "slide", "uniform". Default is "head"
293
+ frame_stride = 1 # optional, default is 1, available for "slide" frame extraction
294
+ frame_sample = 4 # optional, default is 1 (same as "head"), available for "uniform" frame extraction
295
+ # batch_size, num_repeats, enable_bucket, bucket_no_upscale, cache_directory are also available for video dataset
296
+
297
+ # sample video dataset with metadata jsonl file
298
+ [[datasets]]
299
+ video_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of video files and captions
300
+
301
+ target_frames = [1, 79]
302
+
303
+ cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
304
+ # frame_extraction, frame_stride, frame_sample are also available for metadata jsonl file
305
+ ```
306
+
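+ To sanity-check a configuration file against these rules without starting a training run, the helpers in this repository's `dataset/config_utils.py` can be reused (a sketch, assuming it is executed from the repository root with the training dependencies installed):
+
+ ```python
+ # Validate a dataset TOML and inspect the resulting dataset blueprints.
+ import argparse
+ from dataset.config_utils import BlueprintGenerator, ConfigSanitizer, load_user_config
+
+ user_config = load_user_config("dataset/dataset_example.toml")  # any config path
+ sanitizer = ConfigSanitizer()
+ blueprint = BlueprintGenerator(sanitizer).generate(
+     user_config, argparse.Namespace(debug_dataset=False)
+ )
+ for ds in blueprint.dataset_group.datasets:
+     kind = "image" if ds.is_image_dataset else "video"
+     print(kind, ds.params.resolution, ds.params.cache_directory)
+ ```
+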
307
+ <!--
308
+ # sample image dataset with lance
309
+ [[datasets]]
310
+ image_lance_dataset = "/path/to/lance_dataset"
311
+ resolution = [960, 544] # required if general resolution is not set
312
+ # batch_size, enable_bucket, bucket_no_upscale, cache_directory are also available for lance dataset
313
+ -->
314
+
315
+ Metadata in .json format will be supported in the near future.
316
+
317
+
318
+
319
+ <!--
320
+
321
+ ```toml
322
+ # general configurations
323
+ [general]
324
+ resolution = [960, 544] # optional, [W, H], default is None. This is the default resolution for all datasets
325
+ caption_extension = ".txt" # optional, default is None. This is the default caption extension for all datasets
326
+ batch_size = 1 # optional, default is 1. This is the default batch size for all datasets
327
+ enable_bucket = true # optional, default is false. Enable bucketing for datasets
328
+ bucket_no_upscale = false # optional, default is false. Disable upscaling for bucketing. Ignored if enable_bucket is false
329
+
330
+ # sample image dataset with caption text files
331
+ [[datasets]]
332
+ image_directory = "/path/to/image_dir"
333
+ caption_extension = ".txt" # required for caption text files, if general caption extension is not set
334
+ resolution = [960, 544] # required if general resolution is not set
335
+ batch_size = 4 # optional, overwrite the default batch size
336
+ enable_bucket = false # optional, overwrite the default bucketing setting
337
+ bucket_no_upscale = true # optional, overwrite the default bucketing setting
338
+ cache_directory = "/path/to/cache_directory" # optional, default is None to use the same directory as the image directory. NOTE: caching is always enabled
339
+
340
+ # sample image dataset with metadata **jsonl** file
341
+ [[datasets]]
342
+ image_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of image files and captions
343
+ resolution = [960, 544] # required if general resolution is not set
344
+ cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
345
+ # caption_extension is not required for metadata jsonl file
346
+ # batch_size, enable_bucket, bucket_no_upscale are also available for metadata jsonl file
347
+
348
+ # sample video dataset with caption text files
349
+ [[datasets]]
350
+ video_directory = "/path/to/video_dir"
351
+ caption_extension = ".txt" # required for caption text files, if general caption extension is not set
352
+ resolution = [960, 544] # required if general resolution is not set
353
+ target_frames = [1, 25, 79] # required for video dataset. list of video lengths to extract frames. each element must be N*4+1 (N=0,1,2,...)
354
+ frame_extraction = "head" # optional, "head" or "chunk", "slide", "uniform". Default is "head"
355
+ frame_stride = 1 # optional, default is 1, available for "slide" frame extraction
356
+ frame_sample = 4 # optional, default is 1 (same as "head"), available for "uniform" frame extraction
357
+ # batch_size, enable_bucket, bucket_no_upscale, cache_directory are also available for video dataset
358
+
359
+ # sample video dataset with metadata jsonl file
360
+ [[datasets]]
361
+ video_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of video files and captions
362
+ target_frames = [1, 79]
363
+ cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
364
+ # frame_extraction, frame_stride, frame_sample are also available for metadata jsonl file
365
+ ```
366
+
367
+ # sample image dataset with lance
368
+ [[datasets]]
369
+ image_lance_dataset = "/path/to/lance_dataset"
370
+ resolution = [960, 544] # required if general resolution is not set
371
+ # batch_size, enable_bucket, bucket_no_upscale, cache_directory are also available for lance dataset
372
+
373
+ The metadata with .json file will be supported in the near future.
374
+
375
+
376
+
377
+
378
+ -->
dataset/dataset_example.toml ADDED
@@ -0,0 +1,44 @@
1
+ # resolution, caption_extension, target_frames, frame_extraction, frame_stride, frame_sample,
2
+ # batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
3
+
4
+
5
+ # general configurations
6
+ [general]
7
+ caption_extension = ".txt"
8
+ batch_size = 1
9
+ enable_bucket = true
10
+ bucket_no_upscale = false
11
+
12
+
13
+ # dataset configurations
14
+ [[datasets]]
15
+ resolution = [160, 160]
16
+ video_directory = "D:/musubi-tuner-wan-gui/dataset/My_Best_Lora_dataset/video" # path to your video dataset
17
+ cache_directory = "D:/musubi-tuner-wan-gui/dataset/My_Best_Lora_dataset/cache/video" # recommended to set cache directory
18
+ target_frames = [17, 33, 65]
19
+ frame_extraction = "chunk"
20
+ num_repeats = 1
21
+
22
+ # head: Extract the first N frames from the video.
23
+ # chunk: Extract frames by splitting the video into chunks of N frames.
24
+ # slide: Extract frames from the video with a stride of frame_stride.
25
+ # uniform: Extract frame_sample samples uniformly from the video.
26
+ # NOTE: Please do not include 1 in target_frames if you are using the frame_extraction "chunk". Otherwise, every frame will be extracted as its own 1-frame chunk.
27
+
28
+ # More info here: https://github.com/Kvento/musubi-tuner-wan-gui/blob/main/dataset/dataset_config.md
29
+
30
+
31
+
32
+
33
+
34
+
35
+
36
+ # other datasets can be added here. each dataset can have different configurations
37
+
38
+ # If you don't need image training, remove the following section:
39
+ # dataset configurations
40
+ [[datasets]]
41
+ resolution = [256, 256]
42
+ image_directory = "D:/musubi-tuner-wan-gui/dataset/My_Best_Lora_dataset/images" # path to your image dataset
43
+ cache_directory = "D:/musubi-tuner-wan-gui/dataset/My_Best_Lora_dataset/cache/images" # recommended to set cache directory
44
+ num_repeats = 1
dataset/ebPhotos-001/20190915_193922.jpg ADDED

Git LFS Details

  • SHA256: 11fc9ec911df776045021f6c8660a8d02e8a4bd7028df5221e31665f1f41068c
  • Pointer size: 131 Bytes
  • Size of remote file: 997 kB
dataset/ebPhotos-001/20190915_193922.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with curly hair smiling wearing a black top standing on a boat. In the background a group of people including children sit on a wooden bench wearing casual clothes. A green suspension bridge spans a river with a cloudy sky above. The woman is in the foreground with the group and bridge in the mid-ground. The boat has a green floor and a cylindrical black structure on the left.
dataset/ebPhotos-001/20190921_182515.jpg ADDED

Git LFS Details

  • SHA256: 62b1f3f3e0b60c133885d3222661241dc7e671695478debcb1ea4d3d7a3d3dbc
  • Pointer size: 132 Bytes
  • Size of remote file: 2.67 MB
dataset/ebPhotos-001/20190921_182515.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with dark skin smiling standing in a hotel room. She has her hair in a neat bun wearing a lace pink sleeveless dress that accentuates her medium-sized breasts and a large green bow at the chest. She accessorizes with a silver necklace bracelet and watch. The room has a patterned carpet wooden door green chair and metal trash can. She stands confidently one hand on her shoulder.
dataset/ebPhotos-001/20190921_182517.jpg ADDED

Git LFS Details

  • SHA256: 58aedcb9de3acb1f13501321f7f6d10171dbcfbf9310a4bb3712116f5c263aea
  • Pointer size: 132 Bytes
  • Size of remote file: 2.82 MB
dataset/ebPhotos-001/20190921_182517.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with dark skin standing in a hotel room. She has a slim curvy figure wearing a pink lace dress that accentuates her medium-sized breasts. She's smiling with her right hand touching her shoulder and her left hand resting on her hip. She wears a white beaded necklace matching bracelet and a blue watch. Her hair is styled in a neat bun. The background includes a green chair a trash can and two wooden doors with gold handles. The carpet has a
dataset/ebPhotos-001/20220521_222809.jpg ADDED

Git LFS Details

  • SHA256: 8a19a8d2c096250e6ddab028ace80c3fc3c6e69d7f9d4ee1123a781edadf840c
  • Pointer size: 132 Bytes
  • Size of remote file: 4.39 MB
dataset/ebPhotos-001/20220521_222809.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with medium brown skin dark brown eyes and red lipstick wearing a pink checkered shirt her hair in a high bun standing in a cluttered office. background includes a TV showing a collage of images snacks on a shelf a red box papers and a desk with a white plastic bag. she wears a gold pendant necklace. the office has beige walls and wooden furniture.
dataset/ebPhotos-001/20230427_082757.jpg ADDED

Git LFS Details

  • SHA256: 42a5dc8533130c279192f88200861d5aa3c247e83bf35fff17aab3ea982fbeaf
  • Pointer size: 132 Bytes
  • Size of remote file: 1.67 MB
dataset/ebPhotos-001/20230427_082757.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with medium brown skin shoulder-length wavy black hair and brown eyes. She wears a deep purple sleeveless top looking directly at the camera with a neutral expression. The background is a dimly lit indoor space with beige walls and a dark curtain. The image is a close-up focusing on her face and upper torso.
dataset/ebPhotos-001/20230427_082800.jpg ADDED

Git LFS Details

  • SHA256: 213e80f9fddc50f0a5733435b4f60b4426f19b87531ba6fdaff649934bcc09ec
  • Pointer size: 132 Bytes
  • Size of remote file: 1.72 MB
dataset/ebPhotos-001/20230427_082800.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with medium brown skin shoulder-length wavy black hair and brown eyes. She wears a deep purple halter top looking directly at the camera with a slight confident smile. The background is a dimly lit indoor space with a dark curtain on the right and a beige wall on the left. The lighting highlights her natural skin texture and subtle makeup.
dataset/ebPhotos-001/20230427_082805.jpg ADDED

Git LFS Details

  • SHA256: 826b87123a8e51d755c4c781552b2606f600766d463d81822d2513277a3ef354
  • Pointer size: 132 Bytes
  • Size of remote file: 1.78 MB
dataset/ebPhotos-001/20230427_082805.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with medium brown skin shoulder-length wavy black hair and brown eyes. She wears a low-cut sleeveless purple top revealing a hint of cleavage. Her expression is neutral with slightly pursed lips. The background is a dimly lit indoor room with a dark curtain and a partially visible doorway. The lighting highlights her natural skin texture and subtle makeup.
dataset/ebPhotos-001/20230502_185323.jpg ADDED

Git LFS Details

  • SHA256: f8e0d22c9d3a3b325a1bededcd9c6e56223bdce77e38160ed3753e3252a660ec
  • Pointer size: 132 Bytes
  • Size of remote file: 2.68 MB
dataset/ebPhotos-001/20230502_185323.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with medium brown skin wearing a form-fitting white satin dress with thin straps standing in a purple-walled dressing room. She has shoulder-length black hair a necklace and is smiling while dancing. The room has a gray carpet a standing mirror and a pink and purple garment hanging on the left. An "EXIT" sign is visible on the ceiling.
dataset/ebPhotos-001/20230504_193610.jpg ADDED

Git LFS Details

  • SHA256: 7402ec5e5994031d6343fcc0b7636c6469b54e6e5d01ea6747a82817b4b8405b
  • Pointer size: 132 Bytes
  • Size of remote file: 1.8 MB
dataset/ebPhotos-001/20230504_193610.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with curly hair wearing a gray and red track jacket gray leggings and red and black sneakers kneeling on a patterned carpet in a hallway her right hand on her chest left hand on the floor beige walls wooden floor and a door in the background.
dataset/ebPhotos-001/20230504_193624.jpg ADDED

Git LFS Details

  • SHA256: a4ed95ed45601cec31e164da28dd1de6840615945b59370c748a4039ad0b1696
  • Pointer size: 132 Bytes
  • Size of remote file: 2.01 MB
dataset/ebPhotos-001/20230504_193624.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with curly hair wearing a gray Fila jacket with red trim black top gray leggings and red and white Nike sneakers. She's posing in a hallway one leg raised hand on jacket. wooden floor patterned rug beige walls and white door in background. confident stylish athletic.
dataset/ebPhotos-001/20230504_193657.jpg ADDED

Git LFS Details

  • SHA256: 632a50223f8591339d03c63de042f8733973f7bc706426dc34dc1825e1473e8b
  • Pointer size: 132 Bytes
  • Size of remote file: 1.95 MB
dataset/ebPhotos-001/20230504_193657.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with wavy hair wearing a gray and red track jacket black top and black leggings kneeling on one leg in a hallway with wooden floors and a patterned rug. She's wearing red white and gray sneakers. The hallway has white doors and beige walls. She has a confident expression and her right hand is in her jacket pocket. The lighting is warm and soft.
dataset/ebPhotos-001/20230504_193734.jpg ADDED

Git LFS Details

  • SHA256: d7e64f7ea78ef867478bc40c02e41a20386888f19a90b934e350fc328f011624
  • Pointer size: 132 Bytes
  • Size of remote file: 1.96 MB
dataset/ebPhotos-001/20230504_193734.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with long wavy hair kneeling on a wooden floor in a hallway. she wears a gray and red track jacket black leggings and red and white sneakers. her right hand is in her jacket pocket. the hallway has beige walls white doors and a patterned gray rug. the lighting is warm and she looks down at the camera with a slight smile.
dataset/ebPhotos-001/20230504_193750.jpg ADDED

Git LFS Details

  • SHA256: 2b903442db5c5f852b337d82e98977555a2430d72b3112a84978a79c583d2b4b
  • Pointer size: 132 Bytes
  • Size of remote file: 2.04 MB
dataset/ebPhotos-001/20230504_193750.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with medium brown skin long wavy black hair and a slender build kneeling in a hallway. She wears a gray track jacket with red and white accents black leggings and red and black sneakers. She has a necklace with a circular pendant. The hallway has wooden floors a patterned gray rug and white walls with a door and window blinds in the background. She smiles slightly looking at the camera.
dataset/ebPhotos-001/20230504_193805.jpg ADDED

Git LFS Details

  • SHA256: b86fad2aeff5931ca2ff92073e5b583e7f85a41faf6ac6cddd8662f6f65c0bfe
  • Pointer size: 132 Bytes
  • Size of remote file: 1.83 MB
dataset/ebPhotos-001/20230504_193805.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with wavy hair wearing a gray and red jacket black top gray leggings and red and white sneakers kneeling in a hallway with wooden floors and beige walls holding her hair with a necklace visible smiling at the camera.
dataset/ebPhotos-001/20230505_194441.jpg ADDED

Git LFS Details

  • SHA256: 0696b16e283480c7b069770ae1065beb6076a90df6b43dc2abf8ed932599150b
  • Pointer size: 132 Bytes
  • Size of remote file: 1.95 MB
dataset/ebPhotos-001/20230505_194441.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with medium brown skin wearing a form-fitting long-sleeve red dress with a keyhole neckline standing in a narrow hallway. She has shoulder-length wavy black hair and is posing with one hand on her head and the other on the wall. She wears black high heels and has a tattoo on her left thigh. The hallway has beige walls white doors and a wooden step. She looks confident and alluring.
dataset/ebPhotos-001/20230505_194607.jpg ADDED

Git LFS Details

  • SHA256: 215bc59c6e1a69954f0963f72be24a399d485c71b2ca233bdfbe36c10698403c
  • Pointer size: 132 Bytes
  • Size of remote file: 2.08 MB
dataset/ebPhotos-001/20230505_194607.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with dark skin standing in a narrow hallway. She has shoulder-length curly black hair wearing a tight long-sleeve red mini dress with a keyhole neckline revealing moderate cleavage. She is standing with arms outstretched touching the walls wearing black high-heeled shoes. The hallway has beige walls white doors and wooden floors with a patterned rug at the bottom. Recessed ceiling lights illuminate the scene.
dataset/ebPhotos-001/20230505_194707.jpg ADDED

Git LFS Details

  • SHA256: 6ebb33e8ba8924c5bb81606e6a627a4c3991cdba55abd767c04107369ea1330d
  • Pointer size: 132 Bytes
  • Size of remote file: 2.01 MB
dataset/ebPhotos-001/20230505_194707.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with medium brown skin standing in a narrow hallway. she has wavy black hair wears a tight red long-sleeve mini dress with a low neckline black fishnet stockings and black high-heeled sandals. she stands confidently one hand on the wall the other on her hip. the hallway has beige walls white trim and a patterned doormat. a ceiling light illuminates her from above.
dataset/ebPhotos-001/20230505_194729.jpg ADDED

Git LFS Details

  • SHA256: 94cc36ceeb33fa697653233c86183d3e6a24080106b42cfef0bdc76304cc0651
  • Pointer size: 132 Bytes
  • Size of remote file: 2.05 MB
dataset/ebPhotos-001/20230505_194729.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ photo of a beautiful black woman with curly hair wearing a tight red long-sleeve mini dress black high heels and a gold bracelet. She stands in a narrow hallway leaning against a white door showcasing a large tattoo on her right thigh. The hallway has beige walls wooden floor and patterned rug. The lighting is warm and the angle is low emphasizing her confident pose and curvy figure.