Here is how to run the model. I struggled with many issues and spent a long time figuring out how to make it run; after a lot of hardship I got it working, so here is how I did it. Note: I used a Google Cloud instance for this.
AI4Bharat IndicF5: Local Setup with Conda Guide
This guide provides a complete, step-by-step process to set up and run the AI4Bharat IndicF5 Text-to-Speech (TTS) model on your local computer using Conda environments, avoiding common dependency conflicts.
Prerequisites (On Your Local Machine)
Before you begin, ensure you have the following installed on your local computer:
Miniconda or Anaconda: If you don't have Conda installed, download and install Miniconda (recommended for a lightweight setup) or Anaconda.
Download Miniconda from: https://docs.conda.io/en/latest/miniconda.html (Choose the Python 3.x installer for your OS).
Follow their installation instructions.
git: Version control system (needed to install IndicF5 from GitHub).
Download from: https://git-scm.com/downloads
ffmpeg: An essential audio/video processing tool (needed by pydub for audio handling).
Windows: Download a static build from https://ffmpeg.org/download.html (e.g., from gyan.dev or BtbN), extract it, and add its bin directory to your system's PATH environment variable.
macOS: brew install ffmpeg (if you have Homebrew installed).
Linux: sudo apt-get install ffmpeg (Debian/Ubuntu) or sudo yum install ffmpeg (CentOS/RHEL).
A Stable Internet Connection: For downloading libraries and the model.
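As a quick sanity check (my own addition, not part of the original setup), you can ask Python where these command-line prerequisites resolve on your PATH:

```python
import shutil

# Report whether each command-line prerequisite is reachable on PATH
for tool in ("git", "ffmpeg", "conda"):
    path = shutil.which(tool)
    print(f"{tool}: {path or 'NOT FOUND - install it and/or fix your PATH'}")
```

If any line says NOT FOUND, revisit the matching prerequisite above before continuing.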
Step 1: Create a Clean Conda Environment
It's crucial to use a dedicated Conda environment to isolate your project's dependencies.
Open your local computer's terminal or command prompt.
Navigate to a directory where you want to create your project (e.g., your Documents folder, or a new Projects folder).
```bash
# Example for Windows:
# cd C:\Users\YourUser\Documents\Projects

# Example for macOS/Linux:
cd ~/Documents/Projects
```

(If the Projects folder doesn't exist, create it first with mkdir Projects.)
Create the Conda environment:

```bash
conda create -n indicf5_local python=3.10 -y
```

This creates a new Conda environment named indicf5_local with Python 3.10.

Activate the Conda environment:

```bash
conda activate indicf5_local
```

Your terminal prompt should now show (indicf5_local) at the beginning, indicating the environment is active.
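Once activated, a quick way to confirm the interpreter actually comes from the new environment is a small helper like this (my own sketch, not part of the original guide):

```python
import sys

def matches_env_python(version_info, want=(3, 10)):
    """True when the interpreter's major.minor matches the env we just created."""
    return tuple(version_info[:2]) == tuple(want)

print("Running Python", ".".join(map(str, sys.version_info[:3])))
print("Matches indicf5_local's Python 3.10:", matches_env_python(sys.version_info))
```

If the second line prints False, you are probably running the base interpreter instead of the activated environment.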
Step 2: Install All Required Python Libraries
Now, with your indicf5_local environment activated, install the specific versions of libraries that are known to work with AI4Bharat IndicF5 and avoid the "meta tensor" error.
```bash
# Ensure pip is up-to-date within your environment
pip install --upgrade pip

# Install all packages with pinned versions
pip install --no-cache-dir \
    torch==2.2.0 \
    transformers==4.49.0 \
    accelerate==0.33.0 \
    pydub==0.25.1 \
    soundfile==0.12.1 \
    safetensors==0.4.3 \
    huggingface_hub==0.29.0 \
    scipy==1.13.0 \
    numpy==1.26.4 \
    google-cloud-logging==3.9.0 \
    datasets==2.19.1 \
    sentencepiece==0.2.0 \
    protobuf==4.25.3 \
    git+https://github.com/ai4bharat/IndicF5.git
```
This step will download and install many packages and may take several minutes, depending on your internet speed.
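If you want to double-check that the pins actually took, a short helper of my own (not part of the original guide) can compare installed versions against the list above using only the standard library:

```python
from importlib import metadata

# The version pins from the install command above (subset shown; extend as needed)
PINNED = {
    "torch": "2.2.0",
    "transformers": "4.49.0",
    "numpy": "1.26.4",
    "soundfile": "0.12.1",
}

def check_pin(package, wanted):
    """Describe whether `package` is installed at exactly version `wanted`."""
    try:
        have = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: NOT INSTALLED (want {wanted})"
    if have == wanted:
        return f"{package}: OK ({have})"
    return f"{package}: MISMATCH (have {have}, want {wanted})"

for pkg, ver in PINNED.items():
    print(check_pin(pkg, ver))
```

A MISMATCH is not always fatal, but if you hit the "meta tensor" error mentioned above, matching these pins exactly is the first thing to try.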
Step 3: Authenticate with Hugging Face
The ai4bharat/IndicF5 model is "gated," meaning you need to log in to Hugging Face to download it.
In your web browser:
Go to the IndicF5 model page: https://huggingface.co/ai4bharat/IndicF5
Log in or sign up for a Hugging Face account.
On the model page, click "Access gated model" or accept any terms to gain access.
Go to your Hugging Face User Settings > Access Tokens: https://huggingface.co/settings/tokens
Generate a New token with the "read" role. Copy this token immediately!
In your local terminal (with (indicf5_local) active):
```bash
huggingface-cli login
```
When prompted Token:, paste your copied Hugging Face API token and press Enter.
You should see Login successful.
Step 4: Prepare Reference Audio File
The model requires a reference audio file (e.g., PAN_F_HAPPY_00001.wav) and its corresponding transcript. This guide uses ElevenLabs_Text_to_Speech_audio.mp3 as the example source file.
Place your reference audio:
Create a folder named prompts in your project directory.
Place your ElevenLabs_Text_to_Speech_audio.mp3 file inside this prompts folder.
Important: For best results, convert your MP3 to a WAV file with the sample rate the model expects (often 16kHz or 24kHz). You can use ffmpeg for this.
In your local terminal (with (indicf5_local) active), navigate to your prompts folder and run the conversion (adjust the sample rate if needed; 24000 Hz is common for IndicF5):

```bash
cd prompts
ffmpeg -i ElevenLabs_Text_to_Speech_audio.mp3 -ar 24000 -ac 1 -c:a pcm_s16le PAN_F_HAPPY_00001.wav
cd ..
```

This creates PAN_F_HAPPY_00001.wav in the prompts folder and returns you to the project root.
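To confirm the conversion produced what the model expects, a quick check of my own (not part of the original guide, stdlib-only) reads the WAV header:

```python
import wave

def describe_wav(path):
    """Return (sample_rate_hz, channels, sample_width_bytes, duration_seconds)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()
        duration = w.getnframes() / float(rate)
    return rate, channels, width, duration

# Expect 24000 Hz, mono, 2-byte (16-bit PCM) samples for the converted reference:
# rate, ch, width, dur = describe_wav("prompts/PAN_F_HAPPY_00001.wav")
```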
Step 5: Create and Run the Test Script
Now, create the Python script that will load the model and generate audio.
Create the script file:
In your local terminal (ensure you are in your project root, e.g., ~/Documents/Projects/):
```bash
nano test_indicf5_local.py
```
(You can use any text editor you prefer, like VS Code, Sublime Text, etc.)
Paste the following code into the test_indicf5_local.py file:
```python
import logging

import numpy as np
import soundfile as sf
import torch
from transformers import AutoModel

# Suppress pydub's RuntimeWarning if ffmpeg is not found
# (though you should have installed it already)
logging.getLogger("pydub").setLevel(logging.ERROR)

print("Starting simple IndicF5 local test script (Conda version)...")

# --- Configuration ---

# Hugging Face model repository ID
REPO_ID = "ai4bharat/IndicF5"

# Path to your reference audio file, relative to this script.
# Ensure PAN_F_HAPPY_00001.wav is in a 'prompts' folder next to this script.
REFERENCE_AUDIO_PATH = "prompts/PAN_F_HAPPY_00001.wav"

# The transcript of the reference audio (Hindi)
REFERENCE_TEXT = "प्राचीन एल्डोरिया की धरती पर, जहां आसमान रहस्यमयी रंगों से रंगा था और जंगल पुराने रहस्यों को फुसफुसाते थे, वहां ज़ेफायरोस नाम का एक ड्रैगन रहता था।"

# The text you want to convert to speech for this test
INPUT_TEXT = "नमस्ते! संगीत की तरह जीवन भी खूबसूरत होता है, बस इसे सही ताल में जीना आना चाहिए."

# Output filename for the generated audio
OUTPUT_FILENAME = "local_test_output.wav"

# Determine device (CPU or GPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# --- Load IndicF5 Model ---
try:
    print(f"Loading model '{REPO_ID}' to {device}...")
    # This will download the model if it is not already cached
    model = AutoModel.from_pretrained(REPO_ID, trust_remote_code=True).to(device)
    print("Model loaded successfully.")
except Exception as e:
    print(f"ERROR: Failed to load model. Check your internet connection, "
          f"Hugging Face login, and model access. Error: {e}")
    exit(1)

# --- Generate Speech ---
try:
    print(f"Generating speech for: '{INPUT_TEXT}' using the reference audio and text...")
    audio = model(
        INPUT_TEXT,
        ref_audio_path=REFERENCE_AUDIO_PATH,
        ref_text=REFERENCE_TEXT,
    )
    print("Speech generation complete.")
except FileNotFoundError:
    print(f"ERROR: Reference audio file not found at '{REFERENCE_AUDIO_PATH}'. "
          f"Ensure it is in the 'prompts' folder.")
    exit(1)
except Exception as e:
    print(f"ERROR: Speech generation failed. Error: {e}")
    exit(1)

# --- Normalize and Save Output ---
try:
    print(f"Saving generated audio to '{OUTPUT_FILENAME}'...")
    # Move to CPU and convert to NumPy if the model returned a torch tensor
    # (this conversion was the problematic line on the VM; it is safe now)
    if torch.is_tensor(audio):
        audio = audio.detach().cpu().numpy()
    audio_np = np.asarray(audio)
    # Normalize 16-bit integer output to float in [-1, 1] if needed
    if audio_np.dtype == np.int16:
        audio_np = audio_np.astype(np.float32) / 32768.0
    # Save as a WAV file (IndicF5 typically outputs at 24000 Hz)
    sf.write(OUTPUT_FILENAME, audio_np.astype(np.float32), samplerate=24000)
    print(f"Audio saved successfully to {OUTPUT_FILENAME}.")
except Exception as e:
    print(f"ERROR: Failed to save audio file. Error: {e}")
    exit(1)

print("Simple IndicF5 local test script finished.")
```
Save the file.
Run the script:
In your local terminal (with (indicf5_local) active):
```bash
python test_indicf5_local.py
```
Step 6: Play the Output Audio
Once the script finishes, you should find local_test_output.wav in your project directory. Double-click this file to play it using your computer's default media player.
Troubleshooting (Local)
ffmpeg warning: If you still see a pydub warning about ffmpeg, double-check that ffmpeg is installed and its bin directory is in your system's PATH.
GatedRepoError: Ensure you have logged in to Hugging Face CLI (Step 3) and have accepted the model's terms of use on its Hugging Face page.
FileNotFoundError for reference audio: Double-check that PAN_F_HAPPY_00001.wav is in the prompts folder, and the path in the script is correct.
Slow performance: If you don't have a dedicated GPU, the model will run on your CPU, which will be significantly slower than on a GPU.
That is the entire summary of how to load the model; if followed correctly, you can run it. I used AI to summarise it, so you might find it a little robotic, but it is well made. This loads the model completely, ready to run.

Now a few tips:
- Do not use the interface shown on the model card; it produced a lot of noise for me, possibly due to a text-encoding issue. It might work if the encoding is correct, but in my case it did not, so I made separate text files for the reference text and the input text and passed their paths in the script.
- Keep the reference audio in WAV format and about 6 seconds or less, or compose it with roughly 1 second of silence at the start and 1 second at the end, so the model understands the speech in the middle clearly.
- Keep the reference text identical to what is actually spoken in the reference audio.
- The maximum generation length is about 25-30 seconds, including the length of the reference audio, i.e., roughly 250-300 characters, so you have to manage that. Enjoy the voices.
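The silence-padding and character-budget tips above can be sketched in code. The helper below is my own stdlib-only sketch (pad_with_silence and within_budget are hypothetical names, not part of IndicF5); the 250-character default is my conservative reading of the 250-300 range mentioned above:

```python
import wave

MAX_CHARS = 250  # conservative end of the 250-300 character budget

def within_budget(ref_text, input_text, limit=MAX_CHARS):
    """True when reference + input text fit the rough generation budget."""
    return len(ref_text) + len(input_text) <= limit

def pad_with_silence(src_path, dst_path, seconds=1.0):
    """Write a copy of a PCM WAV with `seconds` of silence at each end."""
    with wave.open(src_path, "rb") as w:
        params = w.getparams()
        frames = w.readframes(w.getnframes())
    # PCM silence is all-zero bytes: frames_per_second * bytes_per_frame
    pad = b"\x00" * (int(params.framerate * seconds)
                     * params.sampwidth * params.nchannels)
    with wave.open(dst_path, "wb") as w:
        w.setparams(params)
        w.writeframes(pad + frames + pad)
```

For example, padding a 4-second reference clip yields a 6-second file, which is right at the suggested maximum reference length.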
If it works for you, then rather than thanking me, please subscribe to my channel on YouTube.
Here is the link: www.youtube.com/@hooshhhh
I have followed all your steps, but it is not working for me.
If possible, can you help me run this code? I am only getting errors. If you have made some changes to the code, please tell me the changes or share a link to the complete code.