---
title: VEO3 Free
emoji: 🔊
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
short_description: Wan2.1-T2V-14B + Fast 4-step with NAG + Automatic Audio
models:
- VIDraft/Gemma-3-R1984-4B
- google/gemma-3-4b-it
- Wan-AI/Wan2.1-T2V-14B-Diffusers
- vrgamedevgirl84/Wan14BT2VFusioniX
- Kijai/WanVideo_comfy
---
## English Explanation
### Overview
**VEO3 Free** is an AI video generation application that combines the Wan2.1-T2V-14B model with automatic audio generation. It creates videos from text descriptions and generates matching audio with MMAudio.
### Key Features
1. **Text-to-Video Generation**
   - Uses the Wan2.1-T2V-14B diffusion model (14 billion parameters)
   - Fast 4-step generation with NAG (Normalized Attention Guidance)
   - Supports resolutions from 128x128 to 896x896
   - Duration: 1-8 seconds at 16 FPS
   - Cinema-quality output with professional camera movements
2. **Automatic Audio Generation**
   - MMAudio integration for synchronized sound effects
   - Uses the same text prompt for both video and audio
   - Configurable audio quality and guidance strength
   - Optional: audio generation can be disabled
3. **Advanced Controls**
   - **NAG Scale**: Controls guidance strength (1.0-20.0)
   - **Inference Steps**: Trades quality against speed (1-8 steps)
   - **Seed Control**: Makes results reproducible
   - **Negative Prompts**: Specify what to avoid in the generated video
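
The ranges listed above can be collected into a small settings object. The following is a minimal sketch, not the code in `app.py`: the class name `GenerationSettings`, its field names, and every default value are assumptions chosen for illustration. Only the ranges and the 16 FPS rate come from the feature list. The frame count is a plain `duration * FPS` product; the real pipeline may round it to whatever the model requires.

```python
from dataclasses import dataclass

FPS = 16  # frame rate stated in the feature list


def _check_range(name, value, low, high):
    """Raise ValueError when value falls outside [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the range {low}-{high}")


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical settings container; defaults are illustrative only."""
    width: int = 512
    height: int = 512
    duration_seconds: float = 4.0
    nag_scale: float = 10.0
    steps: int = 4  # the "fast 4-step" setting
    seed: int = 0
    negative_prompt: str = ""
    enable_audio: bool = True

    def __post_init__(self):
        # Ranges below are the ones documented in the feature list.
        _check_range("width", self.width, 128, 896)
        _check_range("height", self.height, 128, 896)
        _check_range("duration_seconds", self.duration_seconds, 1, 8)
        _check_range("nag_scale", self.nag_scale, 1.0, 20.0)
        _check_range("steps", self.steps, 1, 8)

    @property
    def frame_count(self) -> int:
        """Frames implied by the duration at 16 FPS (simple product)."""
        return round(self.duration_seconds * FPS)


settings = GenerationSettings(duration_seconds=8, nag_scale=12.5)
print(settings.frame_count)  # 8 s * 16 FPS = 128
```

Validating at construction time means an out-of-range value such as `steps=9` fails immediately with a `ValueError` instead of partway through generation.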
### How It Works
1. **Input**: Enter a detailed scene description
2. **Video Generation**: The AI creates video frames based on your prompt
3. **Audio Synthesis**: Automatically generates matching sound effects
4. **Output**: Combined video with synchronized audio
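
The four steps above can be sketched as a small orchestration function. This is an illustrative outline under stated assumptions, not the application's actual code: `run_pipeline` and the three `fake_*` stages are hypothetical stand-ins, and passing the generated video to the audio stage is an assumption about how synchronization might be achieved.

```python
from typing import Callable


def run_pipeline(
    prompt: str,
    make_video: Callable[[str], dict],
    make_audio: Callable[[str, dict], dict],
    mux: Callable[[dict, dict], dict],
    enable_audio: bool = True,
) -> dict:
    """Run input -> video -> (optional) audio -> combined output."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    video = make_video(prompt)          # step 2: video generation
    if not enable_audio:
        return video                    # audio is an optional feature
    audio = make_audio(prompt, video)   # step 3: same prompt reused for audio
    return mux(video, audio)            # step 4: combined output


# Hypothetical stand-in stages so the sketch runs without any model.
def fake_video(prompt: str) -> dict:
    return {"prompt": prompt, "frames": 64}


def fake_audio(prompt: str, video: dict) -> dict:
    return {"prompt": prompt, "seconds": video["frames"] / 16}


def fake_mux(video: dict, audio: dict) -> dict:
    return {**video, "audio": audio}


result = run_pipeline("rain on a tin roof", fake_video, fake_audio, fake_mux)
print(sorted(result))
```

Passing the stages in as callables keeps the control flow testable without loading any model weights.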
### Example Use Cases
- Film previews and concept visualization
- Music video creation
- Advertising content
- Creative storytelling
- Game cinematics
### Technical Details
- **GPU Acceleration**: Uses CUDA for fast processing
- **Model Architecture**: Transformer-based diffusion model
- **Audio Model**: Flow-matching based audio synthesis
- **Processing Time**: ~30-70 seconds depending on settings
### Tips for Best Results
- Use detailed, cinematic descriptions
- Include camera movements and visual style
- Specify lighting, colors, and atmosphere
- Add sound descriptions for better audio matching
- Higher NAG scale = more prompt adherence
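
One way to apply these tips consistently is to assemble the prompt from named parts. The helper below is a hypothetical sketch: `build_prompt` and its parameter names are not part of the application, and joining the parts with commas is simply one reasonable convention.

```python
def build_prompt(scene: str, camera: str = "", style: str = "",
                 lighting: str = "", sound: str = "") -> str:
    """Join the non-empty prompt parts into one comma-separated description."""
    parts = [scene, camera, style, lighting, sound]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    scene="A lighthouse on a rocky coast at dusk",
    camera="slow aerial orbit",
    lighting="warm golden light, deep blue shadows",
    sound="crashing waves and distant gulls",
)
print(prompt)
```

Because the same prompt drives both video and audio, including the `sound` part gives the audio stage something concrete to match.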
### Special Features
- **Cinematic prompt examples**: Three examples that include professional filming techniques
- **Real-time progress display**: Follow the generation process as it runs
- **One-click examples**: Clicking an example applies its settings automatically

This tool is designed to make professional-level video content easy to create and is well suited to quickly visualizing creative ideas.