---
title: VEO3 Free
emoji: ๐
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
short_description: Wan2.1-T2V-14B + Fast 4-step with NAG + Automatic Audio
models:
- VIDraft/Gemma-3-R1984-4B
- google/gemma-3-4b-it
- Wan-AI/Wan2.1-T2V-14B-Diffusers
- vrgamedevgirl84/Wan14BT2VFusioniX
- Kijai/WanVideo_comfy
---
## English Explanation
### Overview
**VEO3 Free** is an AI video generation application that pairs the Wan2.1-T2V-14B text-to-video model with automatic audio generation. It creates a video from a text description and uses MMAudio to generate matching audio.
### Key Features
1. **Text-to-Video Generation**
   - Uses the Wan2.1-T2V-14B diffusion model (14 billion parameters)
   - Fast 4-step generation with NAG (Normalized Attention Guidance)
   - Supports resolutions from 128x128 to 896x896
   - Duration: 1-8 seconds at 16 FPS
   - Cinema-quality output with professional camera movements
2. **Automatic Audio Generation**
   - MMAudio integration for synchronized sound effects
   - Uses the same text prompt for both video and audio
   - Configurable audio quality and guidance strength
   - Optional: can be disabled when audio is not needed
3. **Advanced Controls**
   - **NAG Scale**: Controls guidance strength (1.0-20.0)
   - **Inference Steps**: Trades quality against speed (1-8 steps)
   - **Seed Control**: Makes results reproducible
   - **Negative Prompts**: Specify what to avoid in the generation
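
The ranges listed above can be checked before a request reaches the model. The following is a minimal sketch, not the Space's actual code: `GenerationSettings`, its field names, and the `approx_num_frames` helper are hypothetical, and only the numeric limits (128-896 pixels, 1-8 seconds, 16 FPS, NAG scale 1.0-20.0, 1-8 steps) come from the feature list.

```python
from dataclasses import dataclass

FPS = 16  # frame rate stated in the feature list


def _check_range(name, value, low, high):
    """Raise ValueError when value falls outside the inclusive range."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the documented controls."""
    width: int
    height: int
    duration_seconds: int
    nag_scale: float
    steps: int = 4            # the "fast 4-step" setting
    seed: int = 0             # fixed seed for reproducible results
    negative_prompt: str = ""

    def __post_init__(self):
        _check_range("width", self.width, 128, 896)
        _check_range("height", self.height, 128, 896)
        _check_range("duration_seconds", self.duration_seconds, 1, 8)
        _check_range("nag_scale", self.nag_scale, 1.0, 20.0)
        _check_range("steps", self.steps, 1, 8)

    @property
    def approx_num_frames(self):
        # Assumption: duration times FPS; the real app may round differently.
        return self.duration_seconds * FPS


settings = GenerationSettings(width=896, height=512, duration_seconds=8, nag_scale=11.0)
print(settings.approx_num_frames)
```

Rejecting out-of-range values up front keeps a bad slider value from consuming GPU time; the example NAG scale of 11.0 is an arbitrary value inside the documented range.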
### How It Works
1. **Input**: Enter a detailed scene description
2. **Video Generation**: The model creates video frames from your prompt
3. **Audio Synthesis**: Matching sound effects are generated automatically
4. **Output**: The video is returned with synchronized audio
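
The four steps above can be sketched as a small orchestration function. This is an illustrative outline only: `generate_clip`, `Clip`, and the `make_video` / `make_audio` callables are hypothetical stand-ins for the Wan2.1 and MMAudio calls, which are not shown in this README.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Clip:
    frames: Sequence[bytes]      # placeholder frame type
    audio: Optional[bytes]       # None when audio generation is disabled
    fps: int


def generate_clip(
    prompt: str,
    make_video: Callable[[str], Sequence[bytes]],
    make_audio: Optional[Callable[[str, float], bytes]] = None,
    fps: int = 16,
) -> Clip:
    """Run the input -> video -> audio -> output flow with injected models."""
    if not prompt.strip():                      # step 1: require a scene description
        raise ValueError("prompt must not be empty")
    frames = make_video(prompt)                 # step 2: text-to-video
    audio = None
    if make_audio is not None:                  # audio is an optional feature
        duration = len(frames) / fps
        audio = make_audio(prompt, duration)    # step 3: same prompt drives the audio
    return Clip(frames=frames, audio=audio, fps=fps)  # step 4: combined result


# Usage with stub models in place of the real ones.
def fake_video(prompt):
    return [b"frame"] * 32


def fake_audio(prompt, seconds):
    return b"\x00" * int(seconds * 10)


clip = generate_clip("A lighthouse at dusk", fake_video, fake_audio)
print(len(clip.frames), len(clip.audio))
```

Passing the models in as callables keeps the flow testable without a GPU, and leaving `make_audio` as `None` mirrors the option to disable audio.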
### Example Use Cases
- Film previews and concept visualization
- Music video creation
- Advertising content
- Creative storytelling
- Game cinematics
### Technical Details
- **GPU Acceleration**: Uses CUDA for fast processing
- **Model Architecture**: Transformer-based diffusion model
- **Audio Model**: Flow-matching-based audio synthesis
- **Processing Time**: ~30-70 seconds depending on settings
### Tips for Best Results
- Use detailed, cinematic descriptions
- Include camera movements and visual style
- Specify lighting, colors, and atmosphere
- Add sound descriptions so the audio matches the scene better
- A higher NAG scale makes the output follow the prompt more closely
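
One way to apply these tips consistently is to assemble the prompt from named parts. The helper below is a hypothetical sketch (`build_prompt` and its parameter names are not part of the app); it only joins the non-empty pieces into one comma-separated description.

```python
def build_prompt(scene, camera="", style="", lighting="", sound=""):
    """Join the scene with optional camera, style, lighting, and sound cues."""
    if not scene.strip():
        raise ValueError("scene description must not be empty")
    parts = [scene, camera, style, lighting, sound]
    # Skip empty pieces so the prompt has no stray separators.
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    "A lighthouse on a cliff at dusk",
    camera="slow aerial orbit",
    lighting="warm golden light",
    sound="crashing waves and distant gulls",
)
print(prompt)
```

Because the same prompt drives both video and audio, keeping the sound cue as its own argument makes it harder to forget.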
### Special Features
- **Cinema-grade prompt examples**: Three example prompts that include professional filming techniques
- **Real-time progress display**: Follow the generation process as it runs
- **One-click examples**: Clicking an example automatically applies its settings

This tool is designed to make professional-level video content easy to create, and it is well suited to quickly visualizing creative ideas.