# MeiGen-MultiTalk Demo
This is a demo of [MeiGen-MultiTalk](https://huggingface.co/MeiGen-AI/MeiGen-MultiTalk), an audio-driven multi-person conversational video generation model.
## Features
- πŸ’¬ Generate videos of people talking from still images and audio
- πŸ‘₯ Support for both single-person and multi-person conversations
- 🎯 High-quality lip synchronization
- πŸ“Ί Support for 480p and 720p resolution
- ⏱️ Generate videos up to 15 seconds long
## How to Use
1. Upload a reference image (a photo of the person or people who will be speaking)
2. Upload one or more audio files:
- For single person: Upload one audio file
- For conversation: Upload multiple audio files (one per person)
3. Enter a prompt describing the desired video
4. Adjust generation parameters if needed:
- Resolution: Video quality (480p or 720p)
- Audio CFG: Controls how strongly the audio drives lip and facial motion
- Guidance Scale: Controls how closely the video follows the prompt
- Random Seed: Set a fixed seed for reproducible results
- Max Duration: Video length in seconds
5. Click "Generate Video" and wait for the result
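The parameter checks above can be sketched as a small pre-flight validation before submitting a job. This is a minimal illustration only; the parameter names and bounds (e.g. `max_duration` and the 15-second cap) are assumptions taken from this README, not the Space's actual code.

```python
# Hypothetical pre-flight validation of the demo's generation parameters.
# Names and limits are assumptions based on this README, not the Space's code.

VALID_RESOLUTIONS = {"480p", "720p"}
MAX_DURATION_S = 15  # the demo caps videos at 15 seconds

def validate_params(resolution: str, audio_cfg: float, guidance_scale: float,
                    seed: int, max_duration: float) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if resolution not in VALID_RESOLUTIONS:
        problems.append(f"resolution must be one of {sorted(VALID_RESOLUTIONS)}")
    if audio_cfg <= 0:
        problems.append("audio_cfg must be positive")
    if guidance_scale <= 0:
        problems.append("guidance_scale must be positive")
    if not (0 < max_duration <= MAX_DURATION_S):
        problems.append(f"max_duration must be in (0, {MAX_DURATION_S}] seconds")
    return problems
```

For example, `validate_params("720p", 4.0, 5.0, 42, 15)` returns an empty list, while an unsupported resolution or a 20-second duration would each add an entry.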
## Tips
- Use clear, front-facing photos for best results
- Ensure good audio quality without background noise
- Keep prompts clear and specific
- For multi-person videos, ensure the reference image shows all speakers clearly
## Limitations
- Generation can take several minutes
- Maximum video duration is 15 seconds
- Best results with clear, well-lit reference images
- Audio should be clear and without background noise
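One way to check a clip against the 15-second cap before uploading is a quick duration calculation. The sketch below uses Python's standard `wave` module and assumes a WAV input; other formats (MP3, FLAC) would need a third-party library such as soundfile.

```python
import wave

MAX_DURATION_S = 15  # the demo's stated maximum video length

def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed as frame count / sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def fits_demo_limit(path: str) -> bool:
    """True if the clip is short enough for the 15-second cap."""
    return wav_duration_seconds(path) <= MAX_DURATION_S
```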
## Credits
This demo uses the MeiGen-MultiTalk model created by MeiGen-AI. If you use this in your work, please cite:
```bibtex
@article{kong2025let,
  title={Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation},
  author={Kong, Zhe and Gao, Feng and Zhang, Yong and Kang, Zhuoliang and Wei, Xiaoming and Cai, Xunliang and Chen, Guanying and Luo, Wenhan},
  journal={arXiv preprint arXiv:2505.22647},
  year={2025}
}
```