Commit 9acb9c3 by Avinyaa · Parent: bbaf488 · Message: "new"

Files changed:
- .gitignore +24 -1
- Dockerfile +2 -8
- README.md +123 -37
- XTTS-v2_C3PO +0 -1
- app.py +82 -90
- client_example.py +90 -32
- requirements.txt +3 -6
- test.py +21 -8
- test_kokoro_install.py +1 -0
.gitignore
CHANGED
@@ -1 +1,24 @@
-
+# Generated audio files
+*.wav
+*.mp3
+
+# Python cache
+__pycache__/
+*.pyc
+*.pyo
+
+# Temporary files
+*.tmp
+*.temp
+
+# Environment files
+.env
+.venv/
+
+# IDE files
+.vscode/
+.idea/
+
+# OS files
+.DS_Store
+Thumbs.db
Dockerfile
CHANGED
@@ -6,8 +6,8 @@ WORKDIR /app
 ENV NUMBA_CACHE_DIR=/tmp/numba_cache
 ENV NUMBA_DISABLE_JIT=1
 
-# Install git and
-RUN apt-get update && apt-get install -y git git-lfs && rm -rf /var/lib/apt/lists/*
+# Install git, git-lfs, and espeak-ng for Kokoro TTS
+RUN apt-get update && apt-get install -y git git-lfs espeak-ng && rm -rf /var/lib/apt/lists/*
 
 # Initialize git lfs
 RUN git lfs install
@@ -17,12 +17,6 @@ COPY requirements.txt .
 RUN pip install uv
 RUN uv pip install --no-cache-dir -r requirements.txt --system
 
-# Clone the XTTS-v2_C3PO model and verify it
-RUN git clone https://huggingface.co/Borcherding/XTTS-v2_C3PO && \
-    ls -la XTTS-v2_C3PO/ && \
-    echo "Model directory contents:" && \
-    find XTTS-v2_C3PO/ -type f -name "*.json" -o -name "*.pth" -o -name "*.pt" | head -10
-
 COPY . .
 
 # Expose the port
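Kokoro's phonemization relies on the espeak-ng package installed in the Dockerfile above. A minimal sanity check (a sketch, not part of this commit) that the runtime environment has both the binary and the Python package before starting the API:

```python
# Sketch: verify the Dockerfile's runtime dependencies are present.
# Not part of this commit; intended to be run inside the built image.
import shutil

# espeak-ng is installed via apt-get in the Dockerfile above
assert shutil.which("espeak-ng") is not None, "espeak-ng binary not found on PATH"

# kokoro comes from requirements.txt; importing KPipeline confirms it resolves
from kokoro import KPipeline
print("espeak-ng and kokoro are both available")
```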
README.md
CHANGED
@@ -1,103 +1,175 @@ (new file content)
---
title: Kokoro TTS API
emoji: 🎤
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false
---

# Kokoro TTS API

A FastAPI-based Text-to-Speech API using Kokoro, an open-weight TTS model with 82 million parameters.

## Features

- Convert text to speech using Kokoro TTS
- Multiple voice options (af_heart, af_sky, af_bella, etc.)
- Automatic language detection
- RESTful API with automatic documentation
- Docker support
- Lightweight and fast processing
- Apache-licensed weights

## About Kokoro

[Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

## Setup

### Local Development

1. Install system dependencies:
```bash
# On Ubuntu/Debian
sudo apt-get install espeak-ng

# On macOS
brew install espeak
```

2. Install Python dependencies:
```bash
pip install -r requirements.txt
```

3. Run the API:
```bash
uvicorn app:app --host 0.0.0.0 --port 7860
```

The API will be available at `http://localhost:7860`

### Using Docker

1. Build the Docker image:
```bash
docker build -t kokoro-tts-api .
```

2. Run the container:
```bash
docker run -p 7860:7860 kokoro-tts-api
```

## API Endpoints

### Health Check
- **GET** `/health` - Check API status and device information

### Available Voices
- **GET** `/voices` - Get list of available voices

### Text-to-Speech (Form Data)
- **POST** `/tts` - Convert text to speech using form data
- **Parameters:**
  - `text` (form): Text to convert to speech
  - `voice` (form): Voice to use (default: "af_heart")
  - `lang_code` (form): Language code (default: "a" for auto-detect)

### Text-to-Speech (JSON)
- **POST** `/tts-json` - Convert text to speech using JSON request body
- **Body:** JSON object with `text`, `voice`, and `lang_code` fields

### API Documentation
- **GET** `/docs` - Interactive API documentation (Swagger UI)
- **GET** `/redoc` - Alternative API documentation

## Available Voices

- `af_heart` - Female voice (Heart)
- `af_sky` - Female voice (Sky)
- `af_bella` - Female voice (Bella)
- `af_sarah` - Female voice (Sarah)
- `af_nicole` - Female voice (Nicole)
- `am_adam` - Male voice (Adam)
- `am_michael` - Male voice (Michael)
- `am_edward` - Male voice (Edward)
- `am_lewis` - Male voice (Lewis)

## Usage Examples

### Using Python requests (Form Data)

```python
import requests

# Prepare the request
url = "http://localhost:7860/tts"
data = {
    "text": "Hello, this is Kokoro TTS in action!",
    "voice": "af_heart",
    "lang_code": "a"
}

# Make the request
response = requests.post(url, data=data)

# Save the generated audio
if response.status_code == 200:
    with open("kokoro_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
```

### Using Python requests (JSON)

```python
import requests

# Prepare the JSON request
url = "http://localhost:7860/tts-json"
data = {
    "text": "Kokoro delivers high-quality speech synthesis!",
    "voice": "af_bella",
    "lang_code": "a"
}

headers = {"Content-Type": "application/json"}

# Make the request
response = requests.post(url, json=data, headers=headers)

# Save the generated audio
if response.status_code == 200:
    with open("kokoro_json_output.wav", "wb") as f:
        f.write(response.content)
    print("Speech generated successfully!")
```

### Using curl (Form Data)

```bash
curl -X POST "http://localhost:7860/tts" \
  -F "text=Hello from Kokoro TTS!" \
  -F "voice=af_heart" \
  -F "lang_code=a" \
  --output kokoro_speech.wav
```

### Using curl (JSON)

```bash
curl -X POST "http://localhost:7860/tts-json" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello from Kokoro TTS!","voice":"af_heart","lang_code":"a"}' \
  --output kokoro_speech.wav
```

### Get Available Voices

```bash
curl http://localhost:7860/voices
```

### Using the provided client example

@@ -108,11 +180,25 @@ python client_example.py (new file content, continued)

## Requirements

- Python 3.11+
- espeak-ng system package
- CUDA-compatible GPU (optional, for faster processing)

## Model Information

This API uses Kokoro TTS, which:
- Has 82 million parameters
- Supports multiple voices and languages
- Provides fast, high-quality speech synthesis
- Uses Apache-licensed weights
- Requires minimal system resources compared to larger models

## Testing

Run the standalone test:
```bash
python test.py
```

This will generate audio files demonstrating Kokoro's capabilities.
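A small end-to-end sketch that ties the documented endpoints together: query `/voices`, then request `/tts` with one of the returned voices. This assumes the server from the Setup section is running on localhost:7860; it is an illustration, not part of the commit.

```python
import requests

BASE = "http://localhost:7860"

# Pick a voice from the /voices endpoint documented above
voices = requests.get(f"{BASE}/voices").json()["voices"]
voice = voices[0]  # e.g. "af_heart"

# Request speech from the form-data /tts endpoint
resp = requests.post(
    f"{BASE}/tts",
    data={"text": "End-to-end check.", "voice": voice, "lang_code": "a"},
)
resp.raise_for_status()

with open("end_to_end_check.wav", "wb") as f:
    f.write(resp.content)
print(f"Wrote end_to_end_check.wav using voice {voice}")
```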
XTTS-v2_C3PO
DELETED
@@ -1 +0,0 @@
-Subproject commit 4a9c0315b5b82f33bced654b0773e74832f2bb9a
app.py
CHANGED
@@ -1,169 +1,161 @@ (new file content)
from fastapi import FastAPI, HTTPException, Form
from fastapi.responses import FileResponse
from pydantic import BaseModel
from kokoro import KPipeline
import soundfile as sf
import torch
import os
import tempfile
import uuid
import logging
from typing import Optional

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(title="Kokoro TTS API", description="Text-to-Speech API using Kokoro", version="1.0.0")

class TTSRequest(BaseModel):
    text: str
    voice: str = "af_heart"
    lang_code: str = "a"

class KokoroTTSService:
    def __init__(self):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        logger.info(f"Using device: {self.device}")

        try:
            # Initialize Kokoro pipeline with default language
            self.pipeline = KPipeline(lang_code='a')
            logger.info("Kokoro TTS pipeline loaded successfully")
        except Exception as e:
            logger.error(f"Failed to load Kokoro TTS pipeline: {e}")
            raise e

    def generate_speech(self, text: str, voice: str = "af_heart", lang_code: str = "a") -> str:
        """Generate speech and return the path to the output file"""
        try:
            # Create a unique filename for the output
            output_filename = f"kokoro_output_{uuid.uuid4().hex}.wav"
            output_path = os.path.join(tempfile.gettempdir(), output_filename)

            # Update pipeline language if different
            if self.pipeline.lang_code != lang_code:
                self.pipeline = KPipeline(lang_code=lang_code)

            # Generate speech using Kokoro
            generator = self.pipeline(text, voice=voice)

            # Get the first (and typically only) audio output
            for i, (gs, ps, audio) in enumerate(generator):
                logger.info(f"Generated audio segment {i}: gs={gs}, ps={ps}")
                # Save the audio to file
                sf.write(output_path, audio, 24000)
                break  # Take the first generated audio

            return output_path
        except Exception as e:
            logger.error(f"Error generating speech: {e}")
            raise HTTPException(status_code=500, detail=f"Failed to generate speech: {str(e)}")

    def get_available_voices(self):
        """Return list of available voices"""
        # Common Kokoro voices - you may want to expand this list
        return [
            "af_heart", "af_sky", "af_bella", "af_sarah", "af_nicole",
            "am_adam", "am_michael", "am_edward", "am_lewis"
        ]

# Initialize Kokoro TTS service
tts_service = KokoroTTSService()

@app.get("/")
async def root():
    return {"message": "Kokoro TTS API is running", "status": "healthy"}

@app.get("/health")
async def health_check():
    return {"status": "healthy", "device": tts_service.device}

@app.get("/voices")
async def get_voices():
    """Get list of available voices"""
    return {"voices": tts_service.get_available_voices()}

@app.post("/tts")
async def text_to_speech(
    text: str = Form(...),
    voice: str = Form("af_heart"),
    lang_code: str = Form("a")
):
    """
    Convert text to speech using Kokoro TTS

    - **text**: The text to convert to speech
    - **voice**: Voice to use (default: "af_heart")
    - **lang_code**: Language code (default: "a" for auto-detect)
    """

    if not text.strip():
        raise HTTPException(status_code=400, detail="Text cannot be empty")

    # Validate voice
    available_voices = tts_service.get_available_voices()
    if voice not in available_voices:
        raise HTTPException(
            status_code=400,
            detail=f"Voice '{voice}' not available. Available voices: {available_voices}"
        )

    try:
        # Generate speech
        output_path = tts_service.generate_speech(text, voice, lang_code)

        # Return the generated audio file
        return FileResponse(
            output_path,
            media_type="audio/wav",
            filename=f"kokoro_tts_{voice}_{uuid.uuid4().hex}.wav",
            headers={"Content-Disposition": "attachment"}
        )

    except Exception as e:
        logger.error(f"Error in TTS endpoint: {e}")
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/tts-json")
async def text_to_speech_json(request: TTSRequest):
    """
    Convert text to speech using JSON request body

    - **request**: TTSRequest containing text, voice, and lang_code
    """

    if not request.text.strip():
        raise HTTPException(status_code=400, detail="Text cannot be empty")

    # Validate voice
    available_voices = tts_service.get_available_voices()
    if request.voice not in available_voices:
        raise HTTPException(
            status_code=400,
            detail=f"Voice '{request.voice}' not available. Available voices: {available_voices}"
        )

    try:
        # Generate speech
        output_path = tts_service.generate_speech(request.text, request.voice, request.lang_code)

        # Return the generated audio file
        return FileResponse(
            output_path,
            media_type="audio/wav",
            filename=f"kokoro_tts_{request.voice}_{uuid.uuid4().hex}.wav",
            headers={"Content-Disposition": "attachment"}
        )

    except Exception as e:
        logger.error(f"Error in TTS JSON endpoint: {e}")
        raise HTTPException(status_code=500, detail=str(e))
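The service class above can also be exercised directly, without the HTTP layer. A minimal sketch, assuming the file is saved as app.py and its dependencies are installed (this is an illustration, not part of the commit):

```python
# Sketch: use KokoroTTSService directly, bypassing FastAPI.
from app import KokoroTTSService

service = KokoroTTSService()
wav_path = service.generate_speech("Direct call through the service class.", voice="af_heart", lang_code="a")
print("Audio written to:", wav_path)  # a temp-dir path such as kokoro_output_<hex>.wav
```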
client_example.py
CHANGED
@@ -1,41 +1,70 @@ (new file content)
import requests
import json

def test_kokoro_tts_api():
    """Example of how to use the Kokoro TTS API"""

    # API endpoint
    url = "http://localhost:7860/tts"

    # Text to convert to speech (using the example from the user's request)
    text = """
    [Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, [Kokoro](/kˈOkəɹO/) can be deployed anywhere from production environments to personal projects.
    """

    # Prepare the request data
    data = {
        "text": text,
        "voice": "af_heart",  # Available voices: af_heart, af_sky, af_bella, etc.
        "lang_code": "a"  # 'a' for auto-detect
    }

    try:
        print("Sending request to Kokoro TTS API...")
        response = requests.post(url, data=data)

        if response.status_code == 200:
            # Save the generated audio
            output_filename = "kokoro_generated_speech.wav"
            with open(output_filename, "wb") as f:
                f.write(response.content)
            print(f"Success! Generated speech saved as {output_filename}")
        else:
            print(f"Error: {response.status_code}")
            print(response.text)

    except requests.exceptions.ConnectionError:
        print("Error: Could not connect to the API. Make sure the server is running on http://localhost:7860")
    except Exception as e:
        print(f"Error: {e}")

def test_kokoro_tts_json_api():
    """Example of using the JSON endpoint"""

    # API endpoint
    url = "http://localhost:7860/tts-json"

    # Text to convert to speech
    text = "Hello, this is a test of the Kokoro TTS system using the JSON API endpoint."

    # Prepare the JSON request
    data = {
        "text": text,
        "voice": "af_bella",
        "lang_code": "a"
    }

    headers = {
        "Content-Type": "application/json"
    }

    try:
        print("Sending JSON request to Kokoro TTS API...")
        response = requests.post(url, json=data, headers=headers)

        if response.status_code == 200:
            # Save the generated audio
            output_filename = "kokoro_json_speech.wav"
            with open(output_filename, "wb") as f:
                f.write(response.content)
            print(f"Success! Generated speech saved as {output_filename}")
@@ -44,30 +73,59 @@ def test_tts_api(): (new file content, continued)
            print(response.text)

    except requests.exceptions.ConnectionError:
        print("Error: Could not connect to the API. Make sure the server is running on http://localhost:7860")
    except Exception as e:
        print(f"Error: {e}")

def get_available_voices():
    """Get list of available voices"""
    try:
        response = requests.get("http://localhost:7860/voices")
        if response.status_code == 200:
            voices = response.json()
            print("Available voices:", voices["voices"])
            return voices["voices"]
        else:
            print("Failed to get voices:", response.status_code)
            return []
    except requests.exceptions.ConnectionError:
        print("API is not running. Start it with: uvicorn app:app --host 0.0.0.0 --port 7860")
        return []

def check_api_health():
    """Check if the API is running"""
    try:
        response = requests.get("http://localhost:7860/health")
        if response.status_code == 200:
            print("API is healthy:", response.json())
            return True
        else:
            print("API health check failed:", response.status_code)
            return False
    except requests.exceptions.ConnectionError:
        print("API is not running. Start it with: uvicorn app:app --host 0.0.0.0 --port 7860")
        return False

if __name__ == "__main__":
    print("Kokoro TTS API Client Example")
    print("=" * 35)

    # First check if API is running
    if check_api_health():
        print()

        # Get available voices
        voices = get_available_voices()
        print()

        # Test the TTS functionality with form data
        print("Testing form-data endpoint...")
        test_kokoro_tts_api()
        print()

        # Test the TTS functionality with JSON
        print("Testing JSON endpoint...")
        test_kokoro_tts_json_api()
    else:
        print("\nPlease start the API server first:")
        print("uvicorn app:app --host 0.0.0.0 --port 7860")
requirements.txt
CHANGED
@@ -1,10 +1,7 @@ (new file content)
kokoro>=0.9.2
soundfile
fastapi
uvicorn[standard]
python-multipart
torch
torchaudio
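A quick way to confirm that the dependencies listed above actually resolved after installation (a sketch, not part of the repo):

```python
# Sketch: report installed versions of the packages listed in requirements.txt.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["kokoro", "soundfile", "fastapi", "uvicorn", "python-multipart", "torch", "torchaudio"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")
```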
test.py
CHANGED
@@ -1,10 +1,23 @@ (new file content)
from kokoro import KPipeline
import soundfile as sf
import torch

# Initialize Kokoro pipeline
pipeline = KPipeline(lang_code='a')

# Text to convert to speech
text = '''
[Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, [Kokoro](/kˈOkəɹO/) can be deployed anywhere from production environments to personal projects.
'''

# Generate speech using Kokoro
generator = pipeline(text, voice='af_heart')

# Process and save the generated audio
for i, (gs, ps, audio) in enumerate(generator):
    print(f"Segment {i}: gs={gs}, ps={ps}")
    # Save each segment as a separate file
    sf.write(f'{i}.wav', audio, 24000)
    print(f"Saved segment {i} as {i}.wav")

print("Speech generation completed!")
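test.py writes one WAV file per generated segment; for longer texts the Kokoro pipeline may yield several segments. A variant that stitches them into a single file, under the same assumptions as test.py (a sketch, not part of the commit):

```python
# Sketch: collect every segment from the Kokoro generator and write one combined WAV.
import numpy as np
import soundfile as sf
from kokoro import KPipeline

pipeline = KPipeline(lang_code='a')
segments = [
    np.asarray(audio)
    for _, _, audio in pipeline("A longer passage that Kokoro may split into several segments.", voice='af_heart')
]
sf.write("combined.wav", np.concatenate(segments), 24000)
print(f"Wrote combined.wav from {len(segments)} segment(s)")
```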
test_kokoro_install.py
ADDED
@@ -0,0 +1 @@
+
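test_kokoro_install.py is added essentially empty in this commit. A hypothetical install check it might eventually hold (purely illustrative, not the author's content) could look like this:

```python
# Hypothetical content for test_kokoro_install.py (the committed file is empty).
# Checks that the Kokoro stack imports and can synthesize a short utterance.
import soundfile as sf
from kokoro import KPipeline

pipeline = KPipeline(lang_code='a')
for _, _, audio in pipeline("Kokoro install check.", voice='af_heart'):
    sf.write("install_check.wav", audio, 24000)
    break
print("Kokoro TTS is installed and working; wrote install_check.wav")
```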