
to run the model, do this

#16
by mohitkadiyab - opened

I tried for a long time, ran into many issues, and searched a lot for how to make this run. After much hardship I got it working, so here is how I did it. I USED A GOOGLE INSTANCE FOR THIS.

guide indic f5

AI4Bharat IndicF5: Local Setup with Conda Guide

This guide provides a complete, step-by-step process to set up and run the AI4Bharat IndicF5 Text-to-Speech (TTS) model on your local computer using Conda environments, ensuring consistency with our VM setup and avoiding common dependency conflicts.
Prerequisites (On Your Local Machine)

Before you begin, ensure you have the following installed on your local computer:

Miniconda or Anaconda: If you don't have Conda installed, download and install Miniconda (recommended for a lightweight setup) or Anaconda.

    Download Miniconda from: https://docs.conda.io/en/latest/miniconda.html (Choose the Python 3.x installer for your OS).

    Follow their installation instructions.

git: Version control system (needed to install IndicF5 from GitHub).

    Download from: https://git-scm.com/downloads

ffmpeg: An essential audio/video processing tool (needed by pydub for audio handling).

    Windows: Download a static build from https://ffmpeg.org/download.html (e.g., from gyan.dev or BtbN), extract it, and add its bin directory to your system's PATH environment variable.

    macOS: brew install ffmpeg (if you have Homebrew installed).

    Linux: sudo apt-get install ffmpeg (Debian/Ubuntu) or sudo yum install ffmpeg (CentOS/RHEL).

A Stable Internet Connection: For downloading libraries and the model.

Step 1: Create a Clean Conda Environment

It's crucial to use a dedicated Conda environment to isolate your project's dependencies.

Open your local computer's terminal or command prompt.

Navigate to a directory where you want to create your project (e.g., your Documents folder, or a new Projects folder).

# Example for Windows:
# cd C:\Users\YourUser\Documents\Projects
# Example for macOS/Linux:
cd ~/Documents/Projects

(If the Projects folder doesn't exist, create it with mkdir Projects first).

Create the Conda environment:

conda create -n indicf5_local python=3.10 -y

    This creates a new Conda environment named indicf5_local with Python 3.10.

Activate the Conda environment:

conda activate indicf5_local

    Your terminal prompt should now show (indicf5_local) at the beginning, indicating the environment is active.

Step 2: Install All Required Python Libraries

Now, with your indicf5_local environment activated, install the specific versions of libraries that are known to work with AI4Bharat IndicF5 and avoid the "meta tensor" error.

Ensure pip is up-to-date within your environment

pip install --upgrade pip

Install all packages with corrected versions

pip install --no-cache-dir \
    torch==2.2.0 \
    transformers==4.49.0 \
    accelerate==0.33.0 \
    pydub==0.25.1 \
    soundfile==0.12.1 \
    safetensors==0.4.3 \
    huggingface_hub==0.29.0 \
    scipy==1.13.0 \
    numpy==1.26.4 \
    google-cloud-logging==3.9.0 \
    datasets==2.19.1 \
    sentencepiece==0.2.0 \
    protobuf==4.25.3 \
    git+https://github.com/ai4bharat/IndicF5.git

This step will download and install many packages and may take several minutes, depending on your internet speed.
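After installation, you can optionally sanity-check that the pinned packages landed in the active environment. This small helper is my own addition (not part of the original guide) and uses only Python's standard `importlib.metadata`:

```python
from importlib import metadata


def installed_version(pkg):
    """Return the installed version string for pkg, or None if it is absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None


# Example: check a few of the key packages pinned in the install command above
for pkg in ("torch", "transformers", "accelerate", "soundfile"):
    print(pkg, installed_version(pkg) or "NOT INSTALLED")
```

If any of the key packages prints NOT INSTALLED, re-run the pip command above inside the activated environment before continuing.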
Step 3: Authenticate with Hugging Face

The ai4bharat/IndicF5 model is "gated," meaning you need to log in to Hugging Face to download it.

In your web browser:

    Go to the IndicF5 model page: https://huggingface.co/ai4bharat/IndicF5

    Log in or sign up for a Hugging Face account.

    On the model page, click "Access gated model" or accept any terms to gain access.

    Go to your Hugging Face User Settings > Access Tokens: https://huggingface.co/settings/tokens

    Generate a New token with the "read" role. Copy this token immediately!

In your local terminal (with (indicf5_local) active):

huggingface-cli login

    When prompted Token:, paste your copied Hugging Face API token and press Enter.

    You should see Login successful.

Step 4: Prepare Reference Audio File

The model requires a reference audio file (e.g., PAN_F_HAPPY_00001.wav) and its corresponding text. In this example, the source recording is ElevenLabs_Text_to_Speech_audio.mp3.

Place your reference audio:

    Create a folder named prompts inside your project directory (the same directory where you will keep the test script).

    Place your ElevenLabs_Text_to_Speech_audio.mp3 file inside this prompts folder.

    Important: For best results, convert your MP3 to a WAV file with the sample rate the model expects (often 16kHz or 24kHz). You can use ffmpeg for this.

        In your local terminal (with (indicf5_local) active), navigate to your prompts folder:

        cd prompts

        Run the conversion (adjust sample rate if needed, 24000 is common for IndicF5):

        ffmpeg -i ElevenLabs_Text_to_Speech_audio.mp3 -ar 24000 -ac 1 -c:a pcm_s16le PAN_F_HAPPY_00001.wav

            This creates PAN_F_HAPPY_00001.wav in the prompts folder.

        Then, go back to your project root: cd ..
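To confirm the converted file really has the format the model expects, here is an optional check (my addition, not part of the original guide) using only Python's standard wave module. The 24000 Hz default matches the ffmpeg conversion above; adjust it if you converted to a different rate:

```python
import wave


def check_reference_wav(path, expected_rate=24000):
    """Return True if the WAV at `path` is mono, 16-bit PCM at expected_rate."""
    with wave.open(path, "rb") as wf:
        return (wf.getnchannels() == 1
                and wf.getsampwidth() == 2  # 2 bytes per sample = 16-bit PCM
                and wf.getframerate() == expected_rate)


# Example usage:
# print(check_reference_wav("prompts/PAN_F_HAPPY_00001.wav"))
```

If this returns False, re-run the ffmpeg command with `-ar 24000 -ac 1 -c:a pcm_s16le` as shown above.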

Step 5: Create and Run the Test Script

Now, create the Python script that will load the model and generate audio.

Create the script file:

    In your local terminal (ensure you are in your project root, e.g., ~/Documents/Projects/):

nano test_indicf5_local.py

(You can use any text editor you prefer, like VS Code, Sublime Text, etc.)

Paste the following code into the test_indicf5_local.py file:

import logging

import numpy as np
import soundfile as sf
import torch
from transformers import AutoModel

# Suppress pydub's RuntimeWarning if ffmpeg is not found
# (though you should have installed it in the prerequisites)
logging.getLogger("pydub").setLevel(logging.ERROR)

print("Starting simple IndicF5 local test script (Conda version)...")

# --- Configuration ---

# Hugging Face model repository ID
REPO_ID = "ai4bharat/IndicF5"

# Path to your reference audio file, relative to this script.
# Ensure PAN_F_HAPPY_00001.wav is in a 'prompts' folder next to this script.
REFERENCE_AUDIO_PATH = "prompts/PAN_F_HAPPY_00001.wav"

# The transcript of the reference audio (Hindi)
REFERENCE_TEXT = "प्राचीन एल्डोरिया की धरती पर, जहां आसमान रहस्यमयी रंगों से रंगा था और जंगल पुराने रहस्यों को फुसफुसाते थे, वहां ज़ेफायरोस नाम का एक ड्रैगन रहता था।"

# The text you want to convert to speech for this test
INPUT_TEXT = "नमस्ते! संगीत की तरह जीवन भी खूबसूरत होता है, बस इसे सही ताल में जीना आना चाहिए."

# Output filename for the generated audio
OUTPUT_FILENAME = "local_test_output.wav"

# Determine device (CPU or GPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# --- Load IndicF5 Model ---
try:
    print(f"Loading model '{REPO_ID}' to {device}...")
    # This will download the model if it is not already cached
    model = AutoModel.from_pretrained(REPO_ID, trust_remote_code=True).to(device)
    print("Model loaded successfully.")
except Exception as e:
    print(f"ERROR: Failed to load model. Check your internet connection, "
          f"Hugging Face login, and model access. Error: {e}")
    exit(1)

# --- Generate Speech ---
try:
    print(f"Generating speech for: '{INPUT_TEXT}' using the reference audio and text...")
    audio = model(
        INPUT_TEXT,
        ref_audio_path=REFERENCE_AUDIO_PATH,
        ref_text=REFERENCE_TEXT,
    )
    print("Speech generation complete.")
except FileNotFoundError:
    print(f"ERROR: Reference audio file not found at '{REFERENCE_AUDIO_PATH}'. "
          f"Ensure it is in the 'prompts' folder.")
    exit(1)
except Exception as e:
    print(f"ERROR: Speech generation failed. Error: {e}")
    exit(1)

# --- Normalize and Save Output ---
try:
    print(f"Saving generated audio to '{OUTPUT_FILENAME}'...")
    # The model may return a torch tensor or a NumPy array; handle both.
    if torch.is_tensor(audio):
        audio_np = audio.detach().cpu().numpy()
    else:
        audio_np = np.asarray(audio)
    # Convert int16 samples to float32 in [-1.0, 1.0] if needed
    if audio_np.dtype == np.int16:
        audio_np = audio_np.astype(np.float32) / 32768.0
    # Save as a WAV file (IndicF5 typically outputs at 24000 Hz)
    sf.write(OUTPUT_FILENAME, audio_np.astype(np.float32), samplerate=24000)
    print(f"Audio saved successfully to {OUTPUT_FILENAME}.")
except Exception as e:
    print(f"ERROR: Failed to save audio file. Error: {e}")
    exit(1)

print("Simple IndicF5 local test script finished.")


Save the file.

Run the script:

    In your local terminal (with (indicf5_local) active):

python test_indicf5_local.py

Step 6: Play the Output Audio

Once the script finishes, you should find local_test_output.wav in your project directory. Double-click this file to play it using your computer's default media player.
Troubleshooting (Local)

ffmpeg warning: If you still see a pydub warning about ffmpeg, double-check that ffmpeg is installed and its bin directory is in your system's PATH.
GatedRepoError: Ensure you have logged in to Hugging Face CLI (Step 3) and have accepted the model's terms of use on its Hugging Face page.
FileNotFoundError for reference audio: Double-check that PAN_F_HAPPY_00001.wav is in the prompts folder, and the path in the script is correct.
Slow performance: If you don't have a dedicated GPU, the model will run on your CPU, which will be significantly slower; expect generation to take noticeably longer than on a GPU.




wav audio for indic f5

This is the entire summary of how to load the model; if followed correctly, you can run it. I used AI to summarise it, so you might find it a little robotic, but it is well made.

This loads the model completely, ready to run.

Now a few tips. Do not use the interface shown on the model card, as it produces a lot of noise, possibly due to a text-encoding issue; if the encoding is correct it might work, but in my case it did not. Instead, I made separate text files for the reference text and the input text and used their paths in the script. Keep the reference audio in WAV format and make it about 6 seconds or less, composed so there is roughly 1 second of silence at the start and 1 second at the end, so the model understands the speech in the middle clearly. Keep the reference text identical to what is spoken in the reference audio, and you are all set up.
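The silence-padding tip above can also be done in code rather than in an audio editor. This is a sketch of my own (not from the original post): it pads a 1-D float waveform with about one second of silence on each side, assuming a known sample rate; you would then write the result out with soundfile or ffmpeg as before.

```python
import numpy as np


def pad_with_silence(samples, sample_rate=24000, pad_seconds=1.0):
    """Pad a 1-D float waveform with `pad_seconds` of silence at both ends."""
    pad = np.zeros(int(sample_rate * pad_seconds), dtype=samples.dtype)
    return np.concatenate([pad, samples, pad])


# Example: a 4-second clip becomes roughly 6 seconds after padding
clip = np.random.uniform(-0.5, 0.5, 4 * 24000).astype(np.float32)
padded = pad_with_silence(clip)
print(len(padded) / 24000)  # duration in seconds
```

The padded clip can then be saved as the reference WAV, keeping the total length around the recommended 6 seconds.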

The maximum generation limit is about 25-30 seconds, including the length of the reference audio, which works out to roughly 250-300 characters of text, so you have to manage that. Enjoy the voices.
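To stay under that character budget on longer inputs, one option is to split the input into chunks and synthesize each chunk in a separate call, then concatenate the audio. This helper is my own sketch (not part of the original post); it splits at sentence-ending punctuation, including the Devanagari danda (।):

```python
import re


def chunk_text(text, limit=250):
    """Split text into chunks of at most ~limit characters,
    breaking at sentence boundaries (. ! ? and the danda ।) where possible."""
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the model on its own. Note that a single sentence longer than the limit still comes through as one oversized chunk, so very long sentences need to be shortened by hand.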

If it works for you then, rather than thanking me, please subscribe to the channel on YouTube.

here is the link www.youtube.com/@hooshhhh

I have followed all your steps, but it is not working for me.

If possible, can you help me run this code? I am only getting errors. If you have made any changes to the code, please tell me the changes or give me a link to the complete code.
