rocketmandrey committed on
Commit f846252 · 1 Parent(s): d64091b

Update Space configuration and documentation

Files changed (3)
  1. .DS_Store +0 -0
  2. .gitignore +53 -0
  3. README.md +53 -40
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
.gitignore ADDED
@@ -0,0 +1,53 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual Environment
+ venv/
+ ENV/
+ env/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+
+ # Logs
+ *.log
+
+ # Local development
+ .env
+ .env.local
+
+ # Model weights
+ weights/
+
+ # Generated content
+ outputs/
+ temp/
+ *.mp4
+ *.wav
+
+ # Keep example files
+ !examples/*.json
+ !assets/examples/*
+ !assets/audio/*
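Note on the "Keep example files" block: in a `.gitignore`, the last pattern that matches a path decides its fate, so the trailing `!` rules re-include media under `assets/` that `*.mp4` and `*.wav` would otherwise hide. The sketch below is a deliberately simplified model of that rule, not git's actual matcher: it uses Python's `fnmatch` (whose `*` crosses `/`, unlike git's), ignores directory patterns such as `outputs/`, and the helper name `is_ignored` and the sample paths are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the patterns added in this commit, in file order.
PATTERNS = ["*.mp4", "*.wav", "!assets/examples/*", "!assets/audio/*"]


def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Simplified gitignore model: the last matching pattern wins."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        # Patterns without a slash are compared to the file name only;
        # patterns with a slash are compared to the whole relative path.
        target = path if "/" in body else PurePosixPath(path).name
        if fnmatch(target, body):
            ignored = not negated
    return ignored


print(is_ignored("results/clip.wav"))         # matched only by *.wav
print(is_ignored("assets/audio/voice.wav"))   # re-included by !assets/audio/*
```

For real repositories, `git check-ignore -v <path>` reports which pattern actually applies.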
README.md CHANGED
@@ -1,61 +1,74 @@
  ---
- title: MeiGen MultiTalk Demo
+ title: Phunter Space - Video Generation Demo
  emoji: 🎬
  colorFrom: blue
- colorTo: red
+ colorTo: purple
  sdk: gradio
- sdk_version: 4.19.2
+ sdk_version: 4.12.0
  app_file: app.py
  pinned: false
  license: apache-2.0
- hf_oauth: true
- models:
- - MeiGen-AI/MeiGen-MultiTalk
- - TencentGameMate/chinese-wav2vec2-base
- tags:
- - audio
- - video
- - image
- - text-to-video
  ---
 
- # MeiGen-MultiTalk
+ # Phunter Space - Video Generation Demo
 
- Audio-driven multi-person conversational video generation system based on [MeiGen-AI/MeiGen-MultiTalk](https://huggingface.co/MeiGen-AI/MeiGen-MultiTalk).
+ This is a Gradio demo for generating talking head videos from images and audio using advanced AI models.
 
- ## Features
+ ## 🌟 Features
 
- - 💬 Realistic Conversations - Support single & multi-person generation
- - 👥 Interactive Character Control - Direct virtual humans via prompts
- - 🎀 Generalization Performance - Support generation of cartoon characters and singing
- - 📺 Resolution Flexibility - 480p & 720p output at arbitrary aspect ratios
- - ⏱️ Long Video Generation - Support videos up to 15 seconds
+ - 💬 Generate talking head videos from images and audio
+ - 👥 Support for both single and multi-person video generation
+ - 🎯 High-quality lip synchronization
+ - 📺 Support for multiple resolutions (480p, 720p)
+ - 🎨 Customizable generation parameters
 
- ## Setup
+ ## 🚀 Quick Start
 
- 1. Install dependencies:
- ```bash
- pip install -r requirements.txt
- ```
+ 1. Click "Load Night Studio Example" or "Load Day Studio Example"
+ 2. Upload your audio file (WAV format)
+ 3. Click "Generate Video"
 
- 2. Download required models:
- ```bash
- huggingface-cli download MeiGen-AI/MeiGen-MultiTalk --local-dir ./weights/MeiGen-MultiTalk
- huggingface-cli download TencentGameMate/chinese-wav2vec2-base --local-dir ./weights/chinese-wav2vec2-base
- ```
+ ## 📝 Parameters Guide
 
- ## Usage
+ ### Resolution
+ - 480p: Faster generation, lower quality
+ - 720p: Better quality, slower generation
 
- See the examples directory for sample configurations:
- - `examples/single_example.json` - Single person video generation
- - `examples/multi_example.json` - Multi-person conversation generation
+ ### Audio CFG (1.0-10.0)
+ - Controls lip movement influence
+ - Recommended: 4.0
+ - Higher values = more pronounced articulation
 
- ## License
+ ### CFG Scale (1.0-15.0)
+ - Controls prompt adherence
+ - Recommended: 7.5
+ - Higher values = stricter prompt following
 
- This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
+ ### Max Duration
+ - Limits output video length
+ - Maximum: 15 seconds
+ - Default: 10 seconds
 
- ## Configuration Options
+ ## 💡 Tips
 
- - `image`: Path to reference image
- - `audio`: Path to audio file(s)
- - `
+ 1. Use high-quality reference images
+ 2. Provide detailed prompts
+ 3. Start with example settings
+ 4. Experiment with CFG values
+ 5. Ensure good lighting in reference images
+
+ ## 📋 Requirements
+
+ - Input Image: Clear face photo(s)
+ - Audio: WAV format
+ - Prompt: Detailed scene description
+
+ ## 🛠 Technical Details
+
+ - Model: MeiGen MultiTalk
+ - Framework: Gradio 4.12.0
+ - GPU: T4 (recommended)
+
+ ## 📬 Contact
+
+ For questions or issues, please visit the [GitHub repository](https://github.com/yourusername/phunter_space) or create an issue on Hugging Face Spaces.
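
Note on the Parameters Guide: the ranges and recommended values in the new README could be enforced before a generation request is dispatched. The following is a minimal sketch of such a check, not the Space's actual `app.py`, which this commit does not show. The class name `GenerationParams`, its field names, the 480p default, and the lower bound on duration are assumptions for illustration; only the ranges, the recommended CFG values, and the 10 s / 15 s duration figures come from the README.

```python
from dataclasses import dataclass

RESOLUTIONS = ("480p", "720p")


@dataclass(frozen=True)
class GenerationParams:
    """Hypothetical container mirroring the README's Parameters Guide."""

    resolution: str = "480p"       # assumption: the README names no default
    audio_cfg: float = 4.0         # README: range 1.0-10.0, recommended 4.0
    cfg_scale: float = 7.5         # README: range 1.0-15.0, recommended 7.5
    max_duration_s: float = 10.0   # README: default 10 s, maximum 15 s

    def __post_init__(self) -> None:
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {RESOLUTIONS}")
        if not 1.0 <= self.audio_cfg <= 10.0:
            raise ValueError("audio_cfg must be between 1.0 and 10.0")
        if not 1.0 <= self.cfg_scale <= 15.0:
            raise ValueError("cfg_scale must be between 1.0 and 15.0")
        # Assumption: a duration must be positive; the README gives only a cap.
        if not 0.0 < self.max_duration_s <= 15.0:
            raise ValueError("max_duration_s must be in (0, 15] seconds")


params = GenerationParams(resolution="720p", audio_cfg=5.0)
print(params)
```

Validating in `__post_init__` keeps out-of-range values from ever reaching the model call, whichever UI widget produced them.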