Overview
tts-si-F5-TTS is a state-of-the-art Text-to-Speech (TTS) model tailored for the Sinhala (සිංහල) language. It is built upon the advanced F5-TTS (Flow-Matching) architecture, enabling high-quality, natural-sounding speech generation.
This model is a significant resource for the Sinhala language community, supporting research, content creation, and accessibility initiatives.
Model Details
| Attribute | Value |
|---|---|
| Model ID | tharindumihi/tts-si-F5-TTS |
| Architecture | F5-TTS (Flow-Matching based Text-to-Speech) |
| Primary Language | Sinhala (si) |
| Estimated Model Size | Approx. 650 million parameters (estimated from a 1.25 GB checkpoint) |
| Inference Library | F5-TTS Library |
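The size estimate in the table comes from dividing the checkpoint size by the number of bytes stored per parameter. The following is a minimal sketch of that arithmetic, assuming the checkpoint contains only model weights and that 1.25 GB means 1.25 × 10^9 bytes. The card does not state the storage precision, so both common options are shown; the function name is illustrative.

```python
def estimate_params(checkpoint_bytes: float, bytes_per_param: int) -> float:
    """Rough parameter count, assuming the file holds nothing but weights."""
    return checkpoint_bytes / bytes_per_param


# 1.25 GB from the table, read as decimal gigabytes (assumption).
CHECKPOINT_BYTES = 1.25e9

# The storage precision is not stated on the card, so try both common widths.
for dtype_name, width in [("float16", 2), ("float32", 4)]:
    millions = estimate_params(CHECKPOINT_BYTES, width) / 1e6
    print(f"{dtype_name}: ~{millions:.1f}M parameters")
```

With 16-bit weights the estimate is about 625 million parameters, which is in line with the figure in the table. If the checkpoint stores 32-bit weights, the same file size would correspond to roughly half that count.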
Training Data
The model was trained on a single-speaker, custom Sinhala dataset.
| Attribute | Value |
|---|---|
| Dataset Name | Pathnirvana |
| Total Duration | 7 hours 41 minutes 18 seconds |
| Total Utterances | 3,300 files |
| Speaker Count | 1 (Monolingual, single speaker) |
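For a sense of clip length, the average utterance duration can be derived from the two totals in the table. This is a small worked calculation using only those figures; it says nothing about how durations are actually distributed across the 3,300 files.

```python
from datetime import timedelta

# Figures taken from the training data table.
total = timedelta(hours=7, minutes=41, seconds=18)
utterances = 3300

total_seconds = total.total_seconds()
mean_seconds = total_seconds / utterances

print(f"total: {total_seconds:.0f} s, mean utterance: {mean_seconds:.2f} s")
```

The total comes to 27,678 seconds, so the mean utterance is a little under 8.4 seconds.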
Intended Uses & Performance
Primary Intended Uses
- Research and development in Sinhala speech synthesis.
- Generating voiceovers for non-commercial educational content, documentaries, and apps.
- Text-to-speech accessibility tools for Sinhala speakers.
Performance
- Speech Quality: The model produces high-quality and natural-sounding Sinhala speech.
- Voice Cloning: It supports zero-shot voice cloning for Sinhala. Voice similarity varies, and the output may not closely match the reference audio.
How to Use
To use this model, you will need the official f5-tts Python package.
1. Install the necessary libraries
pip install f5-tts
2. Clone the F5-TTS repository
git clone https://github.com/JarodMica/F5-TTS.git
cd F5-TTS
python -m venv venv
source venv/bin/activate
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install -e .
pip install tensorboard
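After the install commands finish, it may be worth confirming that the pinned PyTorch packages are the ones actually present in the virtual environment. The sketch below uses only the standard library; the `PINNED` table and the `check_pins` helper are illustrative names, and the versions are copied from the pip command above.

```python
from importlib import metadata

# Versions pinned in the pip command above.
PINNED = {"torch": "2.4.0", "torchvision": "0.19.0", "torchaudio": "2.4.0"}


def check_pins(pins):
    """Map each package name to (installed version or None, matches pin?)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # CUDA wheels carry a local suffix such as "2.4.0+cu118";
        # compare only the part before the "+".
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, matches)
    return report


for package, (installed, ok) in check_pins(PINNED).items():
    print(f"{package}: installed={installed} pinned={PINNED[package]} ok={ok}")
```

Run it with the virtual environment activated. A package reported as `installed=None` was not found in that environment.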
3. Download the model files
Use huggingface-cli or git clone to get the model files (model_230000_reduced.pt and vocab.txt).
Alternative Method (Manual Download): Navigate to the "Files and versions" tab on the Hugging Face repo page and download model_230000_reduced.pt and vocab.txt manually. Create a folder named ckpts\f5-TTS and place both files directly inside it.
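The download can also be scripted. The following is a sketch using `hf_hub_download` from the `huggingface_hub` package (installed separately with `pip install huggingface_hub`). The repository ID, file names, and target folder are taken from this card; the helper name `download_model_files` is illustrative, and the file names should be checked against the repository's current file listing.

```python
from pathlib import Path

REPO_ID = "tharindumihi/tts-si-F5-TTS"            # Model ID from this card
FILES = ["model_230000_reduced.pt", "vocab.txt"]  # file names from this card
TARGET_DIR = Path("ckpts") / "f5-TTS"             # folder named in this card


def download_model_files(repo_id=REPO_ID, files=FILES, target_dir=TARGET_DIR):
    """Fetch each file into target_dir and return the local paths."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import hf_hub_download

    target_dir.mkdir(parents=True, exist_ok=True)
    return [
        Path(hf_hub_download(repo_id=repo_id, filename=name, local_dir=target_dir))
        for name in files
    ]
```

Calling `download_model_files()` needs network access and creates the target folder relative to the current working directory.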
Note on Inference CLI
The official f5-tts inference CLI is currently known to throw errors. Therefore, the recommended method for local testing and inference is to use the custom Gradio web interface provided with this model.
Using the Custom Gradio UI for GUI Inference
- Download Custom UI Files: Download the provided custom_gradio.py and infer_cli_custom.py files from this repository.
- Place the Files: Place both files inside the infer directory of your local F5-TTS installation: [Root_Directory]\F5-TTS\src\f5_tts\infer\
- Run the UI: From the F5-TTS root directory, start the Gradio application with the Python module runner. The -m flag expects a dotted module name rather than a file path, and the f5_tts package is importable after the editable install in step 2:
python -m f5_tts.infer.custom_gradio
- Access: This command starts a local Gradio server. Open the URL printed in the terminal (http://127.0.0.1:7860 by default) in your web browser to use the model through the graphical interface.
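If the UI fails to start, a quick check that both custom files are in the expected folder can rule out a placement mistake. This is a minimal sketch meant to be run from the F5-TTS root directory; the helper name `missing_ui_files` is illustrative and not part of F5-TTS.

```python
from pathlib import Path

# The two custom files this card asks you to copy into the infer folder.
REQUIRED = ("custom_gradio.py", "infer_cli_custom.py")


def missing_ui_files(repo_root: Path) -> list:
    """Return the required files that are absent from src/f5_tts/infer."""
    infer_dir = repo_root / "src" / "f5_tts" / "infer"
    return [name for name in REQUIRED if not (infer_dir / name).is_file()]


missing = missing_ui_files(Path.cwd())
print("All custom UI files are in place." if not missing else f"Missing: {missing}")
```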
Acknowledgements
This project would not have been possible without the foundational work and the data provided by the following entities:
- F5-TTS Framework: Deepest gratitude to the F5-TTS developers for creating the robust training and inference framework that was used to develop this Sinhala model.
- Pathnirvana Dataset: We acknowledge the Pathnirvana project/contributors for providing the essential high-quality Sinhala speech data used to train this model.
Licensing and Terms of Use
License
This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (cc-by-nc-4.0).
- You are free to: Share (copy and redistribute) and Adapt (remix, transform, and build upon) the material.
- Under the following terms:
  - Attribution (BY): You must give appropriate credit.
  - NonCommercial (NC): You may not use the material for commercial purposes.
Please review the full license terms by following the link in the license metadata at the top of this card.
Ethical Considerations and Limitations
All synthetic voice technology carries potential risks. Users must adhere to the model's license and ethical guidelines.
Misuse Policy
The use of this model to generate audio that impersonates, deceives, or violates the privacy or rights of individuals or groups is strictly prohibited. It must not be used for illegal or unethical activities, including creating malicious deepfakes or spam.