# Models Overview
WanGP supports multiple video generation models, each optimized for different use cases and hardware configurations.
## Wan 2.1 Text2Video Models
Note that the term *Text2Video* refers to the underlying Wan architecture. Because that architecture has been greatly improved over time, many derived Text2Video models can now also generate videos from images.
#### Wan 2.1 Text2Video 1.3B
- **Size**: 1.3 billion parameters
- **VRAM**: 6GB minimum
- **Speed**: Fast generation
- **Quality**: Good quality for the size
- **Best for**: Quick iterations, lower-end hardware
- **Command**: `python wgp.py --t2v-1-3B`
#### Wan 2.1 Text2Video 14B
- **Size**: 14 billion parameters
- **VRAM**: 12GB+ recommended
- **Speed**: Slower but higher quality
- **Quality**: Excellent detail and coherence
- **Best for**: Final production videos
- **Command**: `python wgp.py --t2v-14B`
#### Wan Vace 1.3B
- **Type**: ControlNet for advanced video control
- **VRAM**: 6GB minimum
- **Features**: Motion transfer, object injection, inpainting
- **Best for**: Advanced video manipulation
- **Command**: `python wgp.py --vace-1.3B`
#### Wan Vace 14B
- **Type**: Large ControlNet model
- **VRAM**: 12GB+ recommended
- **Features**: All Vace features with higher quality
- **Best for**: Professional video editing workflows
#### MoviiGen (Experimental)
- **Resolution**: Claims 1080p capability
- **VRAM**: 20GB+ required
- **Speed**: Very slow generation
- **Features**: Intended to generate cinema-like video; specialized for a 2.1:1 aspect ratio
- **Status**: Experimental, feedback welcome
<BR>
## Wan 2.1 Image-to-Video Models
#### Wan 2.1 Image2Video 14B
- **Size**: 14 billion parameters
- **VRAM**: 12GB+ recommended
- **Speed**: Slower but higher quality
- **Quality**: Excellent detail and coherence
- **Best for**: Lora-based workflows; most available Loras work with this model
- **Command**: `python wgp.py --i2v-14B`
#### FLF2V
- **Type**: Start/end frame specialist
- **Resolution**: Optimized for 720p
- **Official**: Wan team supported
- **Use case**: Image-to-video with specific endpoints
<BR>
## Wan 2.1 Specialized Models
#### FantasySpeaking
- **Type**: Talking head animation
- **Input**: Voice track + image
- **Works on**: People and objects
- **Use case**: Lip-sync and voice-driven animation
#### Phantom
- **Type**: Person/object transfer
- **Resolution**: Works well at 720p
- **Requirements**: 30+ steps for good results
- **Best for**: Transferring subjects between videos
#### Recam Master
- **Type**: Viewpoint change
- **Requirements**: 81+ frame input videos, 15+ denoising steps
- **Use case**: View same scene from different angles
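
The Recam Master requirements above can be checked before starting a long generation. The following is a minimal sketch, not part of WanGP: the function name `check_recam_inputs` and both constants are hypothetical, and only the two thresholds (81 frames, 15 denoising steps) come from the list above.

```python
# Thresholds taken from the Recam Master requirements listed above.
RECAM_MIN_FRAMES = 81
RECAM_MIN_STEPS = 15


def check_recam_inputs(num_frames, denoising_steps):
    """Return a list of human-readable problems; an empty list means the inputs pass."""
    problems = []
    if num_frames < RECAM_MIN_FRAMES:
        problems.append(
            f"input video has {num_frames} frames; at least {RECAM_MIN_FRAMES} are needed"
        )
    if denoising_steps < RECAM_MIN_STEPS:
        problems.append(
            f"{denoising_steps} denoising steps requested; at least {RECAM_MIN_STEPS} are needed"
        )
    return problems


# Hypothetical usage: a 60-frame clip with 10 steps fails both checks.
for problem in check_recam_inputs(60, 10):
    print(problem)
```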
#### Sky Reels v2
- **Type**: Diffusion Forcing model
- **Specialty**: "Infinite length" videos
- **Features**: High quality continuous generation
<BR>
## Wan Fun InP Models
#### Wan Fun InP 1.3B
- **Size**: 1.3 billion parameters
- **VRAM**: 6GB minimum
- **Quality**: Good for the size, accessible on lower-end hardware
- **Best for**: Entry-level image animation
- **Command**: `python wgp.py --i2v-1-3B`
#### Wan Fun InP 14B
- **Size**: 14 billion parameters
- **VRAM**: 12GB+ recommended
- **Quality**: Better end image support
- **Limitation**: Existing Loras don't work as well
<BR>
## Wan Special Loras
### Self-Forcing lightx2v Lora
- **Type**: Distilled model (Lora implementation)
- **Speed**: 4-8 step generation, about 2x faster per step (no classifier-free guidance)
- **Compatible**: Works with t2v and i2v Wan 14B models
- **Setup**: Requires the Self-Forcing lightx2v Lora (see [LORAS.md](LORAS.md))
### CausVid Lora
- **Type**: Distilled model (Lora implementation)
- **Speed**: 4-12 step generation, about 2x faster per step (no classifier-free guidance)
- **Compatible**: Works with Wan 14B models
- **Setup**: Requires CausVid Lora (see [LORAS.md](LORAS.md))
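
The speed-up from these Loras has two sources: fewer denoising steps, and no classifier-free guidance. With guidance, each step typically evaluates the model twice (one conditional pass and one unconditional pass); without it, one pass per step is enough. The sketch below counts model evaluations under that assumption. The 30-step baseline is an illustrative figure, not a WanGP default, and `forward_passes` is a hypothetical helper.

```python
def forward_passes(steps, use_cfg):
    """Count model evaluations, assuming CFG costs two passes per denoising step."""
    passes_per_step = 2 if use_cfg else 1
    return steps * passes_per_step


# Illustrative baseline: 30 steps with classifier-free guidance (assumed figure).
baseline = forward_passes(steps=30, use_cfg=True)

# Distilled Lora at the upper end of the 4-8 step range, without guidance.
distilled = forward_passes(steps=8, use_cfg=False)

print(baseline, distilled, baseline / distilled)
```

Actual wall-clock gains depend on the model, resolution, and hardware, so treat the ratio as a rough estimate only.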
<BR>
## Hunyuan Video Models
#### Hunyuan Video Text2Video
- **Quality**: Among the best open source t2v models
- **VRAM**: 12GB+ recommended
- **Speed**: Slower generation but excellent results
- **Features**: Superior text adherence and video quality, up to 10s of video
- **Best for**: High-quality text-to-video generation
#### Hunyuan Video Custom
- **Specialty**: Identity preservation
- **Use case**: Injecting specific people into videos
- **Quality**: Excellent for character consistency
- **Best for**: Character-focused video generation
#### Hunyuan Video Avatar
- **Specialty**: Generates up to 15s of high-quality speech- or song-driven video
- **Use case**: Injecting specific people into videos
- **Quality**: Excellent for character consistency
- **Best for**: Character-focused video generation, video synchronized with voice
<BR>
## LTX Video Models
#### LTX Video 13B
- **Specialty**: Long video generation
- **Resolution**: Fast 720p generation
- **VRAM**: Optimized by WanGP (4x reduction in requirements)
- **Best for**: Longer duration videos
#### LTX Video 13B Distilled
- **Speed**: Generate in less than one minute
- **Quality**: Very high quality despite speed
- **Best for**: Rapid prototyping and quick results
<BR>
## Model Selection Guide
### By Hardware (VRAM)
#### 6-8GB VRAM
- Wan 2.1 T2V 1.3B
- Wan Fun InP 1.3B
- Wan Vace 1.3B
#### 10-12GB VRAM
- Wan 2.1 T2V 14B
- Wan Fun InP 14B
- Hunyuan Video (with optimizations)
- LTX Video 13B
#### 16GB+ VRAM
- All models supported
- Longer videos possible
- Higher resolutions
- Multiple simultaneous Loras
#### 20GB+ VRAM
- MoviiGen (experimental 1080p)
- Very long videos
- Maximum quality settings
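
The tiers above can be expressed as a simple lookup. This is a minimal sketch that only transcribes the lists in this section; `VRAM_TIERS` and `models_for_vram` are hypothetical names, not WanGP APIs. The 16GB tier adds no named models (it lifts limits on length, resolution, and Loras), so it does not appear in the table.

```python
# (minimum VRAM in GB, models that become usable at that tier),
# transcribed from the "By Hardware (VRAM)" lists above.
VRAM_TIERS = [
    (6, ["Wan 2.1 T2V 1.3B", "Wan Fun InP 1.3B", "Wan Vace 1.3B"]),
    (10, [
        "Wan 2.1 T2V 14B",
        "Wan Fun InP 14B",
        "Hunyuan Video (with optimizations)",
        "LTX Video 13B",
    ]),
    (20, ["MoviiGen (experimental 1080p)"]),
]


def models_for_vram(vram_gb):
    """Return every listed model whose tier threshold is at or below vram_gb."""
    models = []
    for min_gb, names in VRAM_TIERS:
        if vram_gb >= min_gb:
            models.extend(names)
    return models


# Hypothetical usage: list the candidates for an 8GB card.
print(models_for_vram(8))
```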
### By Use Case
#### Quick Prototyping
1. **LTX Video 13B Distilled** - Fastest, high quality
2. **Wan 2.1 T2V 1.3B** - Fast, good quality
3. **CausVid Lora** - 4-12 steps, very fast
#### Best Quality
1. **Hunyuan Video** - Overall best t2v quality
2. **Wan 2.1 T2V 14B** - Excellent Wan quality
3. **Wan Vace 14B** - Best for controlled generation
#### Advanced Control
1. **Wan Vace 14B/1.3B** - Motion transfer, object injection
2. **Phantom** - Person/object transfer
3. **FantasySpeaking** - Voice-driven animation
#### Long Videos
1. **LTX Video 13B** - Specialized for length
2. **Sky Reels v2** - Infinite length videos
3. **Wan Vace + Sliding Windows** - Up to 1 minute
#### Lower Hardware
1. **Wan Fun InP 1.3B** - Image-to-video
2. **Wan 2.1 T2V 1.3B** - Text-to-video
3. **Wan Vace 1.3B** - Advanced control
<BR>
## Performance Comparison
### Speed (Relative)
1. **CausVid Lora** (4-12 steps) - Fastest
2. **LTX Video Distilled** - Very fast
3. **Wan 1.3B models** - Fast
4. **Wan 14B models** - Medium
5. **Hunyuan Video** - Slower
6. **MoviiGen** - Slowest
### Quality (Subjective)
1. **Hunyuan Video** - Highest overall
2. **Wan 14B models** - Excellent
3. **LTX Video models** - Very good
4. **Wan 1.3B models** - Good
5. **CausVid** - Good (varies with steps)
### VRAM Efficiency
1. **Wan 1.3B models** - Most efficient
2. **LTX Video** (with WanGP optimizations)
3. **Wan 14B models**
4. **Hunyuan Video**
5. **MoviiGen** - Least efficient
<BR>
## Model Switching
WanGP allows switching between models without restarting:
1. Use the dropdown menu in the web interface
2. Models are loaded on-demand
3. Previous model is unloaded to save VRAM
4. Settings are preserved when possible
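
The load-on-demand and unload-previous behavior described in the steps above can be pictured with a small holder object. This is an illustrative sketch only and does not reflect WanGP's actual implementation: the `ModelSlot` class and the loader callables are hypothetical, and real code would also need to release GPU memory explicitly.

```python
class ModelSlot:
    """Keeps at most one model resident and loads models on demand."""

    def __init__(self, loaders):
        # loaders maps a model name to a zero-argument callable that builds the model.
        self._loaders = loaders
        self.current_name = None
        self.current_model = None

    def switch(self, name):
        """Return the model called `name`, loading it only if it is not already active."""
        if name == self.current_name:
            return self.current_model
        loader = self._loaders[name]  # raises KeyError before anything is unloaded
        # Drop the reference to the previous model before loading the next one.
        self.current_model = None
        self.current_name = None
        self.current_model = loader()
        self.current_name = name
        return self.current_model


# Hypothetical usage with stand-in loaders that return plain strings.
slot = ModelSlot({
    "t2v-1.3B": lambda: "small model",
    "t2v-14B": lambda: "large model",
})
slot.switch("t2v-1.3B")
slot.switch("t2v-14B")  # the 1.3B reference is dropped first
```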
<BR>
## Tips for Model Selection
### First Time Users
Start with **Wan 2.1 T2V 1.3B** to learn the interface and test your hardware.
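
If you launch WanGP from a script, the command-line flags documented in this file can be collected in one place. The sketch below only covers models for which a **Command** entry is given above; `LAUNCH_FLAGS` and `launch_command` are hypothetical helper names, and the code assumes `wgp.py` is in the current working directory.

```python
# Flags copied from the **Command** entries in this document.
LAUNCH_FLAGS = {
    "Wan 2.1 Text2Video 1.3B": "--t2v-1-3B",
    "Wan 2.1 Text2Video 14B": "--t2v-14B",
    "Wan Vace 1.3B": "--vace-1.3B",
    "Wan 2.1 Image2Video 14B": "--i2v-14B",
    "Wan Fun InP 1.3B": "--i2v-1-3B",
}


def launch_command(model_name, python="python"):
    """Build the argument list for starting WanGP with the given model preselected."""
    flag = LAUNCH_FLAGS[model_name]  # KeyError for models without a documented flag
    return [python, "wgp.py", flag]


# Recommended first model for new users.
cmd = launch_command("Wan 2.1 Text2Video 1.3B")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` from the WanGP directory.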
### Production Work
Use **Hunyuan Video** or **Wan 14B** models for final output quality.
### Experimentation
Use **CausVid Lora** or **LTX Video 13B Distilled** for rapid iteration and testing.
### Specialized Tasks
- **VACE** for advanced control
- **FantasySpeaking** for talking heads
- **LTX Video** for long sequences
### Hardware Optimization
Always start with the largest model your VRAM can handle, then optimize settings for speed vs quality based on your needs.