WAN 2.2 FP8 I2V - Image-to-Video and Text-to-Video Models

High-quality text-to-video (T2V) and image-to-video (I2V) generation models in FP8 quantized format for memory-efficient deployment on consumer-grade GPUs.

Model Description

WAN 2.2 FP8 is a 14-billion-parameter video generation model built on a diffusion transformer architecture and quantized to FP8 for efficient deployment. This repository contains FP8 quantized variants that retain near-FP16 quality while cutting VRAM requirements roughly in half (~50% memory reduction).

Key Features:

  • 14B parameter diffusion-based video generation architecture
  • FP8 E4M3FN quantization for memory efficiency
  • Dual noise schedules (high-noise for creativity, low-noise for faithfulness)
  • Support for both text-to-video and image-to-video generation
  • Production-ready .safetensors format

Model Statistics:

  • Total Repository Size: ~56GB
  • Model Architecture: Diffusion transformer (14B parameters)
  • Precision: FP8 E4M3FN quantization
  • Format: .safetensors (secure tensor format)
  • Input: Text prompts or text + images
  • Output: Video sequences (typically 16-24 frames)

Repository Contents

Text-to-Video (T2V) Models

Located in diffusion_models/wan/

| Model | Size | Noise Schedule | Use Case |
|-------|------|----------------|----------|
| wan22-t2v-14b-fp8-high-scaled.safetensors | 14GB | High-noise | Creative T2V, higher variance outputs |
| wan22-t2v-14b-fp8-low-scaled.safetensors | 14GB | Low-noise | Faithful T2V, consistent results |

Total T2V models: 28GB

Image-to-Video (I2V) Models

Located in diffusion_models/wan/

| Model | Size | Noise Schedule | Use Case |
|-------|------|----------------|----------|
| wan22-i2v-14b-fp8-high-scaled.safetensors | 14GB | High-noise | Creative I2V, artistic interpretation |
| wan22-i2v-14b-fp8-low-scaled.safetensors | 14GB | Low-noise | Faithful I2V, accurate reproduction |

Total I2V models: 28GB

Hardware Requirements

| Model Type | Minimum VRAM | Recommended VRAM | GPU Examples |
|------------|--------------|------------------|--------------|
| T2V FP8 | 16GB | 20GB+ | RTX 4080, RTX 3090, RTX 4070 Ti Super |
| I2V FP8 | 16GB | 20GB+ | RTX 4080, RTX 3090, RTX 4070 Ti Super |

System Requirements:

  • VRAM: 16GB minimum, 20GB+ recommended
  • Disk Space: 56GB for full repository (14GB per model)
  • System RAM: 32GB+ recommended
  • CUDA: 11.8+ or 12.1+
  • PyTorch: 2.1+ with FP8 support
  • diffusers: a recent release with WAN pipeline support

Compatible GPUs:

  • NVIDIA RTX 4090 (24GB) - Excellent
  • NVIDIA RTX 4080 (16GB) - Good
  • NVIDIA RTX 3090 (24GB) - Excellent
  • NVIDIA RTX 3090 Ti (24GB) - Excellent
  • NVIDIA RTX 4070 Ti Super (16GB) - Good
  • NVIDIA A5000 (24GB) - Excellent
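
Before loading models, it can help to confirm which GPU PyTorch sees and how much VRAM it has. A quick check in plain PyTorch:

import torch

# Report the active GPU and its total VRAM; these models need 16GB minimum
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")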

Usage Examples

Text-to-Video Generation (FP8)

Note: this sketch assumes a recent diffusers release with native WAN support (WanPipeline, WanTransformer3DModel); the FP8 checkpoint weights are upcast to bfloat16 for computation.

from diffusers import WanPipeline, WanTransformer3DModel
from diffusers.utils import export_to_video
import torch

# Load the WAN 2.2 FP8 T2V transformer from a single .safetensors file
# (low-noise variant for consistent results)
transformer = WanTransformer3DModel.from_single_file(
    "E:/huggingface/wan22-fp8-i2v/diffusion_models/wan/wan22-t2v-14b-fp8-low-scaled.safetensors",
    torch_dtype=torch.bfloat16,
)

# Assemble the pipeline around the quantized transformer; the text encoder,
# VAE, and scheduler come from the base WAN 2.2 repository
pipe = WanPipeline.from_pretrained(
    "path-to-base-wan22-model",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Generate video from a text prompt
video = pipe(
    prompt="a cat walking through a garden, cinematic, high quality",
    num_inference_steps=50,
    num_frames=16,
    guidance_scale=7.5,
).frames[0]

# Save video
export_to_video(video, "output_t2v.mp4", fps=8)

Image-to-Video Generation (FP8)

from diffusers import WanImageToVideoPipeline, WanTransformer3DModel
from diffusers.utils import export_to_video
import torch
from PIL import Image

# Load input image
input_image = Image.open("path/to/your/image.jpg")

# Load the WAN 2.2 FP8 I2V transformer (high-noise variant for creative output)
transformer = WanTransformer3DModel.from_single_file(
    "E:/huggingface/wan22-fp8-i2v/diffusion_models/wan/wan22-i2v-14b-fp8-high-scaled.safetensors",
    torch_dtype=torch.bfloat16,
)

# Assemble the I2V pipeline around the quantized transformer
pipe = WanImageToVideoPipeline.from_pretrained(
    "path-to-base-wan22-model",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Generate video from the image, guided by a motion prompt
video = pipe(
    image=input_image,
    prompt="cinematic camera movement, high quality",
    num_inference_steps=50,
    num_frames=16,
    guidance_scale=7.5,
).frames[0]

# Save video
export_to_video(video, "output_i2v.mp4", fps=8)

Advanced: Memory-Efficient Generation

# Enable memory optimizations for 16GB GPUs
# (call enable_model_cpu_offload instead of pipe.to("cuda"))
pipe.enable_model_cpu_offload()
pipe.enable_xformers_memory_efficient_attention()

# Generate with reduced memory footprint
video = pipe(
    prompt="your prompt here",
    num_inference_steps=50,
    num_frames=12,  # Reduced from 16 for memory savings
    guidance_scale=7.5
).frames[0]

Model Specifications

Architecture Details

  • Model Type: Diffusion transformer for video generation
  • Parameters: 14 billion
  • Precision: FP8 E4M3FN (8-bit floating point)
  • Memory Footprint: ~14GB per model (50% reduction vs FP16)
  • Format: SafeTensors (secure, efficient serialization)

Noise Schedules

High-Noise Models (*-high-scaled.safetensors):

  • Greater noise variance during diffusion process
  • More creative and artistic interpretation
  • Higher output variance and diversity
  • Best for: Abstract content, artistic videos, creative exploration

Low-Noise Models (*-low-scaled.safetensors):

  • Lower noise variance during diffusion process
  • More faithful to input prompts/images
  • More consistent and predictable results
  • Best for: Realistic content, precise control, production use

FP8 Quantization Benefits

  • Memory Efficiency: 50% smaller than FP16 (14GB vs ~28GB per model)
  • Speed: Faster inference on GPUs with FP8 tensor cores (RTX 40 series)
  • Quality: Minimal quality degradation compared to FP16
  • Accessibility: Enables deployment on 16GB consumer GPUs
  • Compatibility: Works with standard diffusers pipelines
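
To keep weights in FP8 end to end rather than upcasting everything at load time, recent diffusers releases offer layerwise casting: weights stay in FP8 storage and each layer is upcast only while it computes. A minimal sketch, assuming a diffusers version that provides enable_layerwise_casting and the transformer object from the usage examples above:

import torch

# Store weights in FP8; upcast each layer to bfloat16 only during its forward pass
transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)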

Performance Tips

Memory Optimization

  1. Enable CPU Offloading: Offload model components to CPU when not in use

    pipe.enable_model_cpu_offload()
    
  2. Enable Attention Optimization: Use xformers for memory-efficient attention

    pipe.enable_xformers_memory_efficient_attention()
    
  3. Reduce Frame Count: Generate fewer frames for memory savings

    num_frames=12  # Instead of 16
    
  4. Sequential CPU Offload: Most aggressive memory savings (use instead of, not together with, model CPU offload)

    pipe.enable_sequential_cpu_offload()
    

Quality Optimization

  1. Choose Appropriate Noise Schedule:

    • Use low-noise models for realistic, faithful generation
    • Use high-noise models for creative, artistic results
  2. Increase Inference Steps: More steps = better quality (50-100 recommended)

    num_inference_steps=75  # Higher quality, slower
    
  3. Adjust Guidance Scale: Control prompt adherence (7.5 is standard)

    guidance_scale=7.5  # Lower = more creative, Higher = more literal
    

Speed Optimization

  1. Use FP8 on RTX 40 Series: Native tensor core acceleration
  2. Reduce Inference Steps: Faster generation with slight quality trade-off
    num_inference_steps=30  # Faster, lower quality
    
  3. Reduce Frame Count: Fewer frames = faster generation
  4. Enable xformers: Faster attention computation

GPU-Specific Recommendations

  • RTX 40 Series (4080, 4090): Excellent FP8 performance, use native precision
  • RTX 30 Series (3090, 3090 Ti): No native FP8 tensor cores; FP8 weights still halve memory, with compute upcast to FP16/BF16
  • 16GB GPUs: Enable CPU offloading and xformers for best results
  • 24GB GPUs: Can run without optimizations, room for larger batches

Model Selection Guide

Noise Schedule Selection

| Content Type | Recommended Model | Reason |
|--------------|-------------------|--------|
| Realistic videos | Low-noise | Faithful reproduction, consistency |
| Artistic/abstract | High-noise | Creative interpretation, variety |
| Product demos | Low-noise | Predictable, professional results |
| Creative exploration | High-noise | Diverse outputs, experimentation |
| Production work | Low-noise | Consistent, reliable results |

Task Selection

| Task | Models | Description |
|------|--------|-------------|
| Text-to-Video | wan22-t2v-* | Generate videos from text prompts only |
| Image-to-Video | wan22-i2v-* | Animate static images with text guidance |
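
For scripting, both selection dimensions can be folded into a small lookup. A minimal sketch using the checkpoint paths from this repository (the helper name is illustrative):

# Map (task, noise schedule) to the checkpoint files shipped in this repository
CHECKPOINTS = {
    ("t2v", "high"): "diffusion_models/wan/wan22-t2v-14b-fp8-high-scaled.safetensors",
    ("t2v", "low"): "diffusion_models/wan/wan22-t2v-14b-fp8-low-scaled.safetensors",
    ("i2v", "high"): "diffusion_models/wan/wan22-i2v-14b-fp8-high-scaled.safetensors",
    ("i2v", "low"): "diffusion_models/wan/wan22-i2v-14b-fp8-low-scaled.safetensors",
}

def select_checkpoint(task: str, faithful: bool) -> str:
    """Low-noise for faithful/production output, high-noise for creative work."""
    return CHECKPOINTS[(task, "low" if faithful else "high")]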

Prompting Guidelines

Effective T2V Prompts

"a cat walking through a garden, cinematic lighting, high quality, 4k"
"drone shot of mountain landscape at sunset, volumetric lighting"
"close-up of coffee being poured, slow motion, professional cinematography"
"time-lapse of city traffic at night, long exposure, urban photography"

Effective I2V Prompts

"cinematic camera movement, smooth motion"
"gentle zoom in, professional cinematography"
"dynamic action, high energy movement"
"subtle animation, natural motion"

Quality Keywords

  • Cinematography: "cinematic", "professional", "high quality", "4k"
  • Lighting: "volumetric lighting", "dramatic lighting", "soft light"
  • Camera: "smooth motion", "stabilized", "professional camera work"
  • Style: "realistic", "photorealistic", "detailed", "sharp"

Intended Uses

Direct Use

  • Content Creation: Video generation for creative projects, advertising, social media
  • Prototyping: Rapid visualization of video concepts and storyboards
  • Research: Academic research in video generation and diffusion models
  • Application Development: Building video generation features in apps and services

Downstream Use

  • Fine-tuning on domain-specific video datasets
  • Integration with video editing and post-production pipelines
  • Custom LoRA development for specialized effects
  • Synthetic data generation for training other AI models

Out-of-Scope Use

The model should NOT be used for:

  • Generating deceptive, harmful, or misleading video content
  • Creating deepfakes or non-consensual content of individuals
  • Producing content violating copyright or intellectual property rights
  • Generating content for harassment, abuse, or discrimination
  • Creating videos for illegal purposes or activities

Limitations

Technical Limitations

  • Temporal Consistency: May produce flickering or motion inconsistencies in long sequences
  • Fine Details: Small objects or intricate textures may lack detail
  • Physical Realism: Generated physics may not follow real-world rules perfectly
  • Text Rendering: Cannot reliably render readable text in generated videos
  • Memory Requirements: Requires 16GB+ VRAM, limiting accessibility
  • Frame Count: Limited to shorter video sequences (typically 16-24 frames)

Content Limitations

  • Training data biases may affect representation of diverse demographics
  • May struggle with uncommon objects, rare scenarios, or niche content
  • Generated content may reflect biases present in training data
  • Complex motions or interactions may be challenging

Bias, Risks, and Limitations

Known Risks

Misuse Risks:

  • Deepfakes: Could be used to create deceptive or misleading content
    • Mitigation: Implement watermarking and content authentication
  • Copyright: May generate content similar to copyrighted material
    • Mitigation: Content filtering and responsible use policies
  • Harmful Content: Could generate inappropriate content
    • Mitigation: Safety filters and content moderation

Ethical Considerations

  • Obtain appropriate permissions before generating videos of identifiable individuals
  • Clearly label AI-generated content to prevent deception
  • Consider environmental impact of compute-intensive inference
  • Respect privacy, consent, and intellectual property rights

Recommendations

  • Implement content moderation and safety filters in production
  • Add watermarks to identify AI-generated content
  • Provide clear disclaimers for AI-generated videos
  • Monitor for misuse and implement usage policies
  • Validate outputs for biases or harmful content

License

This repository uses the "other" license tag. Please check the original WAN 2.2 model repository for specific license terms, usage restrictions, and commercial use permissions.

Citation

If you use WAN 2.2 FP8 in your research or applications, please cite the original model:

@misc{wan22-fp8,
  title={WAN 2.2 FP8: Text-to-Video and Image-to-Video Generation},
  author={WAN Team},
  year={2024},
  howpublished={\url{https://huggingface.co/wan22}},
  note={FP8 quantized variant}
}

Troubleshooting

Out of Memory Errors

Problem: CUDA out of memory during generation

Solutions:

  1. Enable CPU offloading: pipe.enable_model_cpu_offload()
  2. Enable sequential offload: pipe.enable_sequential_cpu_offload()
  3. Reduce frame count: num_frames=12 (instead of 16)
  4. Enable xformers: pipe.enable_xformers_memory_efficient_attention()
  5. Close other GPU applications
  6. Reduce batch size to 1

Quality Issues

Problem: Generated videos have poor quality or artifacts

Solutions:

  1. Try both high-noise and low-noise variants
  2. Increase inference steps to 75-100
  3. Adjust guidance scale (try 6.0-9.0 range)
  4. Improve prompt quality with specific details
  5. Use low-noise models for more consistent results

Slow Generation

Problem: Video generation is too slow

Solutions:

  1. Enable xformers: pipe.enable_xformers_memory_efficient_attention()
  2. Reduce inference steps to 30-40 for testing
  3. Use RTX 40 series GPUs for better FP8 performance
  4. Reduce frame count for faster iteration
  5. Close background applications

Model Loading Issues

Problem: Cannot load model or incorrect format errors

Solutions:

  1. Verify model path is correct with absolute path
  2. Ensure your diffusers version includes the WAN pipelines and FP8-aware loading (recent release)
  3. Check PyTorch version supports FP8 (2.1+)
  4. Verify CUDA version compatibility (11.8+ or 12.1+)
  5. Use from_single_file() method for safetensors loading
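
A quick script covering checks 2-4 above (it prints versions rather than asserting specific numbers):

import torch
import diffusers

print("PyTorch:", torch.__version__)  # FP8 dtypes require 2.1+
print("diffusers:", diffusers.__version__)
print("CUDA:", torch.version.cuda)  # 11.8+ or 12.1+ recommended
print("FP8 dtype available:", hasattr(torch, "float8_e4m3fn"))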

Related Resources

  • WAN 2.2 Official Repository: [Link to official HuggingFace repo]
  • Diffusers Documentation: https://huggingface.co/docs/diffusers
  • FP8 Training Guide: [Link to FP8 documentation]
  • Community Examples: [Link to community resources]

Version History

v1.0 (August 2024)

  • Initial release with 4 FP8 quantized models
  • 2 text-to-video models (high-noise, low-noise)
  • 2 image-to-video models (high-noise, low-noise)
  • Total repository size: ~56GB

Contact

For questions, issues, or contributions:

  • Open an issue in the Hugging Face repository
  • Refer to the original WAN 2.2 model documentation
  • Check community discussions for common questions

Model Card Authors

This model card was created following Hugging Face model card guidelines and best practices for responsible AI documentation.


Last Updated: October 14, 2025
Model Version: WAN 2.2 FP8 I2V v1.0
Repository Type: Quantized Model Weights
Total Size: ~56GB (4 models × 14GB each)
