Bangladeshi Bangla Text-to-Speech (VITS)

This is a fine-tuned VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) model for Bangladeshi Bangla text-to-speech synthesis.

Model Description

  • Language: Bangla (Bengali) - Bangladeshi variant
  • Architecture: VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
  • Task: Text-to-Speech (TTS)
  • Training: Fine-tuned on Bangladeshi Bangla speech data
  • Voice: Female speaker
  • Sample Rate: 22050 Hz

Features

  • High-quality Bangla speech synthesis
  • Support for Bangladeshi Bangla pronunciation and intonation
  • End-to-end neural text-to-speech
  • Real-time inference capability
  • Proper handling of Bangla numerals and punctuation

Usage

Using with the Coqui TTS Library

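import numpy as np
import soundfile as sf
from TTS.api import TTS

# Load the fine-tuned checkpoint together with its config
tts = TTS(model_path="path/to/pytorch_model.pth",
          config_path="path/to/config.json")

# Generate speech (the sentence means "I am using Bangla text-to-speech.")
text = "আমি বাংলা টেক্সট টু স্পিচ ব্যবহার করছি।"
audio = np.asarray(tts.tts(text), dtype=np.float32)

# Save audio at the model's 22050 Hz sample rate
sf.write("output.wav", audio, 22050)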

Using with Hugging Face Hub

from huggingface_hub import snapshot_download

# Download the repository files to the local cache;
# snapshot_download returns the path of the downloaded snapshot directory
model_path = snapshot_download(repo_id="EMTIAZZ/bangladeshi-bangla-tts-vits")

# Load and use the model from the files under model_path
# (Add your custom loading code here)
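
One way to fill in the loading step is to locate the checkpoint and config inside the downloaded snapshot directory. The sketch below is only an illustration: `find_model_files` is a hypothetical helper, and it assumes the repository ships a `.pth` checkpoint and a `config.json`, as the file names in the TTS example suggest. Adjust it if the actual layout differs.

```python
from pathlib import Path


def find_model_files(snapshot_dir):
    """Return (checkpoint_path, config_path) found under snapshot_dir.

    Assumption: the snapshot contains at least one .pth checkpoint and a
    config.json. If several match, the first in sorted order is used.
    """
    root = Path(snapshot_dir)
    checkpoints = sorted(root.rglob("*.pth"))
    configs = sorted(root.rglob("config.json"))
    if not checkpoints or not configs:
        raise FileNotFoundError(
            f"no .pth checkpoint or config.json found under {root}"
        )
    return checkpoints[0], configs[0]
```

With `model_path` from `snapshot_download`, calling `checkpoint, config = find_model_files(model_path)` gives two paths that can be passed as `TTS(model_path=str(checkpoint), config_path=str(config))`, in the same way as in the TTS library example.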

Model Performance

  • Training Epochs: 100
  • Model Size: ~997MB
  • Inference Speed: Real-time capable
  • Audio Quality: High fidelity speech synthesis
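
Real-time capability is commonly expressed as a real-time factor (RTF): synthesis time divided by the duration of the generated audio, where a value below 1 means speech is produced faster than it plays back. The sketch below shows one way to measure it on your own hardware. `real_time_factor` is a hypothetical helper, and `synthesize` stands for any callable that returns a sequence of samples, for example `tts.tts` from the usage section.

```python
import time


def real_time_factor(synthesize, text, sample_rate=22050):
    """Measure synthesis time divided by generated audio duration.

    synthesize: callable taking a string and returning a sequence of samples.
    sample_rate: 22050 Hz, the rate stated in this model card.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds
```

The result depends on hardware, text length, and whether the model is already loaded, so it is worth averaging over several sentences after a warm-up call.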

Training Data

This model was fine-tuned on Bangladeshi Bangla speech data to capture the pronunciation, intonation, and other linguistic characteristics specific to the Bangladeshi variant of Bangla.

Limitations

  • Optimized for Bangladeshi Bangla variant
  • Single female voice
  • May not handle out-of-vocabulary words perfectly
  • Performance may vary with very long texts
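
A common workaround for the long-text limitation is to split the input at sentence boundaries and synthesize it piece by piece. The sketch below is one possible approach, not part of the model itself: `split_sentences` and `chunk_text` are hypothetical helpers, and the 200-character limit is an arbitrary assumption to tune by ear.

```python
import re

# A sentence is a run of text followed by its terminators:
# the Bangla danda (।), a question mark, or an exclamation mark.
_SENTENCE = re.compile(r"[^।?!]+[।?!]*")


def split_sentences(text):
    """Split text into sentences, keeping the terminating punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def chunk_text(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `tts.tts` in turn, with the returned samples concatenated before writing the final file.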

Technical Details

  • Framework: PyTorch
  • Architecture: VITS
  • Sampling Rate: 22050 Hz
  • Quantization: Model is quantized for efficient inference

Citation

If you use this model in your research, please cite:

@misc{bangladeshi-bangla-tts-vits,
  title={Bangladeshi Bangla Text-to-Speech using VITS},
  author={EMTIAZZ},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/EMTIAZZ/bangladeshi-bangla-tts-vits}
}

License

This model is released under the Apache 2.0 License.

Contact

For questions or issues, please open an issue in the repository or contact the author.
