🚀 Overview

tts-si-F5-TTS is a state-of-the-art Text-to-Speech (TTS) model tailored for the Sinhala (සිංහල) language. It is built upon the advanced F5-TTS (Flow-Matching) architecture, enabling high-quality, natural-sounding speech generation.

This model is a significant resource for the Sinhala language community, supporting research, content creation, and accessibility initiatives.

🛠️ Model Details

Attribute	Value
Model ID	`tharindumihi/tts-si-F5-TTS`
Architecture	F5-TTS (Flow-Matching based Text-to-Speech)
Primary Language	Sinhala (`si`)
Estimated Model Size	Approximately 650 million parameters (estimated from the 1.25 GB checkpoint)
Inference Library	F5-TTS Library
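The size estimate above depends on how many bytes the checkpoint spends on each weight, which this card does not state. The sketch below shows the arithmetic under two assumed precisions. The function name `estimate_parameters`, the decimal-gigabyte reading of "1.25 GB", and the assumption that the file holds only weights are illustrative and are not taken from the F5-TTS library.

```python
def estimate_parameters(checkpoint_bytes: float, bytes_per_parameter: int) -> float:
    """Rough parameter count, assuming the file contains only model weights."""
    if bytes_per_parameter <= 0:
        raise ValueError("bytes_per_parameter must be positive")
    return checkpoint_bytes / bytes_per_parameter


# Assumption: "1.25 GB" is read as decimal gigabytes (1 GB = 1e9 bytes).
CHECKPOINT_BYTES = 1.25e9

# Assumption: weights are stored either as float16 (2 bytes) or float32 (4 bytes).
for label, width in (("float16", 2), ("float32", 4)):
    millions = estimate_parameters(CHECKPOINT_BYTES, width) / 1e6
    print(f"{label}: roughly {millions:.1f} million parameters")
```

Under these assumptions, the figure of roughly 650 million parameters matches 2-byte weights; with 4-byte weights the same file would hold about half as many.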

📊 Training Data

The model was trained on a single-speaker, custom Sinhala dataset.

Attribute	Value
Dataset Name	Pathnirvana
Total Duration	7 hours 41 minutes 18 seconds
Total Utterances	3,300 files
Speaker Count	1 (Monolingual, single speaker)
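For a sense of clip length, the figures in the table can be combined into an average utterance duration. This is a minimal sketch that uses only the two numbers reported above; the helper name `mean_utterance_seconds` is illustrative.

```python
from datetime import timedelta

# Values copied from the Training Data table.
TOTAL_DURATION = timedelta(hours=7, minutes=41, seconds=18)
TOTAL_UTTERANCES = 3300


def mean_utterance_seconds(total: timedelta, count: int) -> float:
    """Average clip length in seconds."""
    if count <= 0:
        raise ValueError("count must be positive")
    return total.total_seconds() / count


average = mean_utterance_seconds(TOTAL_DURATION, TOTAL_UTTERANCES)
print(f"Average utterance length: {average:.2f} s")
```

The result is a little over 8 seconds per utterance, which is an average only and says nothing about the shortest or longest clips.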

🎯 Intended Uses & Performance

Primary Intended Uses

Research and development in Sinhala speech synthesis.
Generating voiceovers for non-commercial educational content, documentaries, and apps.
Text-to-speech accessibility tools for Sinhala speakers.

Performance

Speech Quality: The model produces high-quality, natural-sounding Sinhala speech.
Voice Cloning: It supports zero-shot voice cloning for Sinhala. Voice similarity varies, and the output may not closely match the reference audio.

💻 How to Use

To use this model, you will need the official f5-tts Python package.

1. Install the necessary libraries

pip install f5-tts

2. Clone the original repository

git clone https://github.com/JarodMica/F5-TTS.git
cd F5-TTS

python -m venv venv
source venv/bin/activate

pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install -e .
pip install tensorboard

3. Download the model files

Use huggingface-cli or git clone to get the files (model_230000_reduced.pt and vocab.txt).

Alternative Method (Manual Download): Navigate to the "Files and versions" tab on the Hugging Face repo page and download model_230000_reduced.pt and vocab.txt manually. Create a folder named ckpts\f5-TTS and place both files directly inside it.
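The download can also be scripted with the `huggingface_hub` Python package. The following is a minimal sketch, not a procedure supplied by the model author: the helper `fetch_checkpoint` and its `download` parameter are hypothetical, and the target path assumes the `ckpts/f5-TTS` folder sits in the current working directory.

```python
from pathlib import Path

REPO_ID = "tharindumihi/tts-si-F5-TTS"
FILES = ("model_230000_reduced.pt", "vocab.txt")


def fetch_checkpoint(target_dir="ckpts/f5-TTS", download=None):
    """Download the checkpoint and vocabulary into target_dir.

    `download` is an injectable callable (hypothetical, for testing);
    by default it is huggingface_hub.hf_hub_download.
    """
    if download is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # One call per file; each call returns the local path of the file.
    return [
        download(repo_id=REPO_ID, filename=name, local_dir=str(target))
        for name in FILES
    ]
```

Calling `fetch_checkpoint()` with no arguments would perform the real download. If the repository requires accepting access conditions, log in first (for example with `huggingface-cli login`) so that the request is authenticated.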

Note on Inference CLI

The official f5-tts inference CLI is currently known to throw errors. Therefore, the recommended method for local testing and inference is to use the custom Gradio web interface provided with this model.

Using the Custom Gradio UI for GUI Inference

Download Custom UI File: Download the provided custom_gradio.py and infer_cli_custom.py files from this repository.
Place the Files: Place both custom_gradio.py and infer_cli_custom.py inside the infer directory of your local F5-TTS installation: [Root_Directory]\F5-TTS\src\f5_tts\infer\
Run the UI: Execute the Gradio application using the Python module runner from the F5-TTS root directory:

python -m f5_tts.infer.custom_gradio

Access: This command starts a local Gradio server. Open the local URL printed in the terminal (typically http://127.0.0.1:7860) in your web browser to use the model through the graphical interface.
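The file placement step can be scripted if preferred. This is a minimal sketch under stated assumptions: the helper `install_custom_ui` is hypothetical, `download_dir` is wherever the two files were saved, and `f5_root` is the cloned F5-TTS directory.

```python
import shutil
from pathlib import Path

UI_FILES = ("custom_gradio.py", "infer_cli_custom.py")


def install_custom_ui(download_dir, f5_root):
    """Copy the custom UI files into <f5_root>/src/f5_tts/infer."""
    source = Path(download_dir)
    target = Path(f5_root) / "src" / "f5_tts" / "infer"

    # Fail early if the clone does not have the expected layout.
    if not target.is_dir():
        raise FileNotFoundError(f"Expected directory not found: {target}")

    copied = []
    for name in UI_FILES:
        # copy2 preserves file metadata and returns the destination path.
        copied.append(Path(shutil.copy2(source / name, target / name)))
    return copied
```

For example, `install_custom_ui("downloads", "F5-TTS")` would copy both files from a `downloads` folder into the clone; both folder names here are placeholders.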

🙏 Acknowledgements

This project would not have been possible without the foundational work and the data provided by the following entities:

F5-TTS Framework: Deepest gratitude to the F5-TTS developers for creating the robust training and inference framework that was used to develop this Sinhala model.
Pathnirvana Dataset: We acknowledge the Pathnirvana project/contributors for providing the essential high-quality Sinhala speech data used to train this model.

📜 Licensing and Terms of Use

License

This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (cc-by-nc-4.0).

You are free to: Share (copy and redistribute) and Adapt (remix, transform, and build upon) the material.
Under the following terms:
- Attribution (BY): You must give appropriate credit.
- NonCommercial (NC): You may not use the material for commercial purposes.

Please review the full license terms by following the link in the license metadata at the top of this card.

⚠️ Ethical Considerations and Limitations

All synthetic voice technology carries potential risks. Users must adhere to the model's license and ethical guidelines.

Misuse Policy

The use of this model to generate audio that impersonates, deceives, or violates the privacy or rights of individuals or groups is strictly prohibited. It must not be used for illegal or unethical activities, including creating malicious deepfakes or spam.
