Bangladeshi Bangla Text-to-Speech (VITS)

This is a fine-tuned VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) model for Bangladeshi Bangla text-to-speech synthesis.

Model Description

Language: Bangla (Bengali) - Bangladeshi variant
Architecture: VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
Task: Text-to-Speech (TTS)
Training: Fine-tuned on Bangladeshi Bangla speech data
Voice: Female speaker
Sample Rate: 22050 Hz

Features

High-quality Bangla speech synthesis
Support for Bangladeshi Bangla pronunciation and intonation
End-to-end neural text-to-speech
Real-time inference capability
Proper handling of Bangla numerals and punctuation
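The card does not describe how numerals are preprocessed, so the following is only a minimal sketch of one plausible input-cleaning step: converting ASCII digits to Bangla digits so that mixed-script text reaches the model in a single script. The helper name normalize_digits is hypothetical and is not part of this model or of any TTS library.

```python
# Translation table from ASCII digits to Bangla digits (U+09E6 to U+09EF).
ASCII_TO_BANGLA_DIGITS = str.maketrans("0123456789", "০১২৩৪৫৬৭৮৯")


def normalize_digits(text: str) -> str:
    """Replace ASCII digits with Bangla digits; all other characters are kept."""
    return text.translate(ASCII_TO_BANGLA_DIGITS)


print(normalize_digits("100 টাকা"))
```

Whether the model benefits from this step is an assumption; comparing the synthesized audio with and without it is the simplest way to find out.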

Usage

Using with the Coqui TTS Library

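import soundfile as sf
from TTS.api import TTS

SAMPLE_RATE = 22050  # the model's output sample rate in Hz

# Load the fine-tuned checkpoint and its config
tts = TTS(model_path="path/to/pytorch_model.pth",
          config_path="path/to/config.json")

# Generate speech; tts() returns the waveform as a sequence of float samples
text = "আমি বাংলা টেক্সট টু স্পিচ ব্যবহার করছি।"
audio = tts.tts(text)

# Save the waveform as a WAV file
sf.write("output.wav", audio, SAMPLE_RATE)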
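If soundfile is not available, the waveform can also be saved with Python's standard library. The sketch below assumes the synthesized audio is a mono sequence of floats in the range [-1.0, 1.0] at 22050 Hz, which matches the sample rate stated in this card; the helper name write_wav is hypothetical.

```python
import struct
import wave


def write_wav(path, samples, sample_rate=22050):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        # Clip first so that scaling never overflows the 16-bit range.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
```

With the audio variable from the example above, this would be called as write_wav("output.wav", audio).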

Using with Hugging Face Hub

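import os
from huggingface_hub import snapshot_download
from TTS.api import TTS

# Download all files in the model repository to the local cache
model_dir = snapshot_download(repo_id="EMTIAZZ/bangladeshi-bangla-tts-vits")

# Load the model from the downloaded snapshot.
# Assumption: the repository uses the file names shown in the example above;
# adjust them if the downloaded files are named differently.
tts = TTS(model_path=os.path.join(model_dir, "pytorch_model.pth"),
          config_path=os.path.join(model_dir, "config.json"))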
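The file layout of the downloaded snapshot is not documented in this card. As a hedged sketch, a small helper can search the snapshot directory for a checkpoint and a config file instead of relying on fixed names. The helper name find_model_files and the assumption that the checkpoint ends in .pth and the config is called config.json are illustrative, not confirmed by the repository.

```python
from pathlib import Path


def find_model_files(snapshot_dir):
    """Return (checkpoint_path, config_path) found under snapshot_dir.

    Assumes a checkpoint ending in .pth and a file named config.json.
    If several candidates exist, the first in sorted order is returned.
    """
    root = Path(snapshot_dir)
    checkpoints = sorted(root.rglob("*.pth"))
    configs = sorted(root.rglob("config.json"))
    if not checkpoints or not configs:
        raise FileNotFoundError(
            f"No .pth checkpoint or config.json found under {root}"
        )
    return str(checkpoints[0]), str(configs[0])
```

The two returned paths could then be passed as model_path and config_path to the TTS constructor used in the previous section.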

Model Performance

Training Epochs: 100
Model Size: ~997 MB
Inference Speed: Real-time capable
Audio Quality: High fidelity speech synthesis
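The card gives no measured inference speed. One common way to check a "real-time capable" claim on your own hardware is the real-time factor (RTF): synthesis time divided by the duration of the generated audio, where values below 1.0 mean faster than real time. The sketch below assumes synthesize is any callable that takes text and returns a sequence of samples at 22050 Hz; the function name real_time_factor is hypothetical.

```python
import time


def real_time_factor(synthesize, text, sample_rate=22050):
    """Return synthesis time divided by the duration of the generated audio."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize() returned no audio")
    return elapsed / audio_seconds
```

With the Coqui TTS object from the usage section, this could be called as real_time_factor(tts.tts, text). The first call often includes warm-up overhead, so averaging over several runs gives a fairer number.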

Training Data

This model was fine-tuned on Bangladeshi Bangla speech data to capture the pronunciation, intonation, and linguistic characteristics specific to Bangladeshi Bangla.

Limitations

Optimized for the Bangladeshi variant of Bangla; other variants may be pronounced less accurately
Single female voice only
May mispronounce out-of-vocabulary words
Output quality may vary with very long texts
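A common workaround for the long-text limitation is to split the input into sentence-sized chunks and synthesize each chunk separately. The following is a minimal sketch, assuming sentences end with the Bangla danda (।), a question mark, or an exclamation mark; the function name split_into_chunks and the default limit of 200 characters are illustrative choices, not values taken from this model.

```python
import re

# A sentence is a run of non-terminator characters plus any trailing terminators.
SENTENCE_PATTERN = re.compile(r"[^।?!]+[।?!]*")


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    sentences = [s.strip() for s in SENTENCE_PATTERN.findall(text)]
    sentences = [s for s in sentences if s]

    chunks = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized on its own and the resulting waveforms concatenated in order.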

Technical Details

Framework: PyTorch
Architecture: VITS
Sample Rate: 22050 Hz
Quantization: Model is quantized for efficient inference

Citation

If you use this model in your research, please cite:

@misc{bangladeshi-bangla-tts-vits,
  title={Bangladeshi Bangla Text-to-Speech using VITS},
  author={EMTIAZZ},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/EMTIAZZ/bangladeshi-bangla-tts-vits}
}

License

This model is released under the Apache 2.0 License.

Contact

For questions or issues, please open an issue in the repository or contact the author.
