Bangladeshi Bangla Text-to-Speech (VITS)
This is a fine-tuned VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) model for Bangladeshi Bangla text-to-speech synthesis.
Model Description
- Language: Bangla (Bengali) - Bangladeshi variant
- Architecture: VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
- Task: Text-to-Speech (TTS)
- Training: Fine-tuned on Bangladeshi Bangla speech data
- Voice: Female speaker
- Sample Rate: 22050 Hz
Features
- High-quality Bangla speech synthesis
- Support for Bangladeshi Bangla pronunciation and intonation
- End-to-end neural text-to-speech
- Real-time inference capability
- Proper handling of Bangla numerals and punctuation
Usage
Using with TTS Library
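import numpy as np
import soundfile as sf
from TTS.api import TTS
# Load the fine-tuned checkpoint and its config (replace the placeholder paths)
tts = TTS(model_path="path/to/pytorch_model.pth",
          config_path="path/to/config.json")
# Generate speech; tts() returns the waveform as a sequence of float samples
text = "আমি বাংলা টেক্সট টু স্পিচ ব্যবহার করছি।"  # "I am using Bangla text-to-speech."
audio = np.asarray(tts.tts(text), dtype=np.float32)
# Save audio at the model's 22050 Hz sample rate
sf.write("output.wav", audio, 22050)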
Using with Hugging Face Hub
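from huggingface_hub import snapshot_download
# Download the full repository snapshot; the return value is the local directory path
model_path = snapshot_download(repo_id="EMTIAZZ/bangladeshi-bangla-tts-vits")
# Load the checkpoint and config found under model_path, for example by passing
# their paths to TTS(model_path=..., config_path=...) as in the previous example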
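As a rough sketch of how the downloaded snapshot might be wired into the loading step, the helper below searches the snapshot directory for a checkpoint and a config file. The function name `find_model_files` is hypothetical, and the assumption that the repository contains a `*.pth` checkpoint and a `config.json` is taken from the placeholder paths in the TTS example, not from the repository listing, so the actual file names may differ.

```python
from pathlib import Path


def find_model_files(snapshot_dir):
    """Return (checkpoint_path, config_path) found under snapshot_dir.

    Assumption: the snapshot holds at least one *.pth checkpoint and a
    config.json. If several match, the first in sorted order is used.
    """
    root = Path(snapshot_dir)
    checkpoints = sorted(root.rglob("*.pth"))
    configs = sorted(root.rglob("config.json"))
    if not checkpoints:
        raise FileNotFoundError(f"no .pth checkpoint under {root}")
    if not configs:
        raise FileNotFoundError(f"no config.json under {root}")
    return checkpoints[0], configs[0]


# Possible usage (needs network access and the TTS package):
#   from huggingface_hub import snapshot_download
#   from TTS.api import TTS
#   snapshot = snapshot_download(repo_id="EMTIAZZ/bangladeshi-bangla-tts-vits")
#   ckpt, cfg = find_model_files(snapshot)
#   tts = TTS(model_path=str(ckpt), config_path=str(cfg))
```

If the repository ships more than one checkpoint, picking the first match is unlikely to be the right choice; in that case pass the intended file path explicitly.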
Model Performance
- Training Epochs: 100
- Model Size: ~997MB
- Inference Speed: Real-time capable
- Audio Quality: High fidelity speech synthesis
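The "real-time capable" claim can be checked on your own hardware by measuring the real-time factor (RTF): synthesis time divided by the duration of the generated audio. An RTF below 1 means speech is produced faster than it plays back. The sketch below is a minimal, library-agnostic way to measure it; `real_time_factor` is a hypothetical helper, and `synthesize` stands for any callable that returns a sequence of samples (for example `tts.tts` from the usage section).

```python
import time


def real_time_factor(synthesize, text, sample_rate=22050):
    """Return (rtf, audio_seconds) for a single call of synthesize(text).

    synthesize: any callable that takes text and returns a sequence of samples.
    sample_rate: defaults to the 22050 Hz stated for this model.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize returned no audio")
    return elapsed / audio_seconds, audio_seconds
```

A single call includes any one-time warm-up cost, so averaging over several sentences after a first throwaway call gives a fairer number.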
Training Data
This model was fine-tuned on Bangladeshi Bangla speech data to capture the pronunciation, intonation, and linguistic characteristics specific to the Bangladeshi variety of Bangla.
Limitations
- Optimized for Bangladeshi Bangla variant
- Single female voice
- May mispronounce out-of-vocabulary words
- Output quality may vary on very long texts
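One common workaround for the long-text limitation is to split the input into sentence-sized chunks, synthesize each chunk separately, and concatenate the audio. The sketch below shows only the splitting step. It is not part of this model's pipeline: `split_sentences` and the `max_chars` default are hypothetical, and it assumes sentences end with the Bangla danda (।), a question mark, or an exclamation mark followed by whitespace.

```python
import re

# Split on whitespace that follows a sentence-ending mark:
# the Bangla danda, a question mark, or an exclamation mark.
_SENTENCE_END = re.compile(r"(?<=[।?!])\s+")


def split_sentences(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact rather than cut.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to `tts.tts(...)` and the resulting arrays joined with `numpy.concatenate` before writing the file. Splitting at sentence boundaries keeps prosody intact within each sentence, at the cost of slightly abrupt transitions between chunks.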
Technical Details
- Framework: PyTorch
- Architecture: VITS
- Sampling Rate: 22050 Hz
- Quantization: Model is quantized for efficient inference
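Rather than hard-coding 22050 when saving audio, the sample rate can be read from the model's config file. The sketch below assumes a Coqui-style `config.json` in which the rate is stored under `audio.sample_rate`; that layout is an assumption about this repository's config, and `read_sample_rate` is a hypothetical helper that falls back to the documented 22050 Hz when the key is absent.

```python
import json


def read_sample_rate(config_path, default=22050):
    """Return audio.sample_rate from a Coqui-style config.json, or default.

    Assumption: the config is a JSON object with an optional "audio" section
    containing "sample_rate".
    """
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    audio = config.get("audio") or {}
    return int(audio.get("sample_rate", default))
```

The returned value can be passed as the third argument to `sf.write` so that the saved file always matches the rate the model was trained with.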
Citation
If you use this model in your research, please cite:
@misc{bangladeshi-bangla-tts-vits,
title={Bangladeshi Bangla Text-to-Speech using VITS},
author={EMTIAZZ},
year={2024},
publisher={Hugging Face},
url={https://huggingface.co/EMTIAZZ/bangladeshi-bangla-tts-vits}
}
License
This model is released under the Apache 2.0 License.
Contact
For questions or issues, please open an issue in the repository or contact the author.