Spaces: diabolic6045/tts-api

Divax committed on Jun 2

Commit 71905d8 (1 parent: 6a83fff)

test

Files changed (9)

Dockerfile.coqui +51 -0
README_coqui.md +351 -0
coqui_api.py +372 -0
requirements.txt +13 -11
requirements_coqui.txt +12 -0
start_c3po_api.py +176 -0
test_c3po_model.py +214 -0
test_coqui_api.py +146 -0
test_coqui_tts.py +99 -0

Dockerfile.coqui ADDED

	@@ -0,0 +1,51 @@

+FROM python:3.11
+# Set up a new user named "user" with user ID 1000
+RUN useradd -m -u 1000 user
+# Install system dependencies as root
+RUN apt-get update && apt-get install -y \
+    git \
+    git-lfs \
+    espeak-ng \
+    ffmpeg \
+    libsndfile1 \
+    && rm -rf /var/lib/apt/lists/*
+# Initialize git lfs
+RUN git lfs install
+# Switch to the "user" user
+USER user
+# Set home to the user's home directory
+ENV HOME=/home/user \
+    PATH=/home/user/.local/bin:$PATH \
+    COQUI_TOS_AGREED=1 \
+    HF_HUB_DISABLE_TELEMETRY=1 \
+    HF_HOME=/home/user/.cache/huggingface
+# Set the working directory to the user's home directory
+WORKDIR $HOME/app
+# Upgrade pip
+RUN pip install --no-cache-dir --upgrade pip
+# Install PyTorch with CPU support for Hugging Face Spaces
+RUN pip install --no-cache-dir torch torchaudio --index-url https://download.pytorch.org/whl/cpu
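+# Copy requirements and install dependencies; requirements.txt does not
+# list the web server stack, so install FastAPI, uvicorn and python-multipart too
+COPY --chown=user requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt \
+    "fastapi>=0.104.1" "uvicorn[standard]>=0.24.0" "python-multipart>=0.0.6"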
+# Copy the API file
+COPY --chown=user coqui_api.py .
+# Create necessary directories
+RUN mkdir -p $HOME/.cache $HOME/app/models
+# Expose the port
+EXPOSE 7860
+# Start the Coqui TTS API
+CMD ["uvicorn", "coqui_api:app", "--host", "0.0.0.0", "--port", "7860"]

README_coqui.md ADDED

	@@ -0,0 +1,351 @@

+# 🤖 Coqui TTS C-3PO API for Hugging Face Spaces
+A FastAPI-based text-to-speech service using the Coqui TTS library with the **C-3PO fine-tuned XTTS v2 model** from [Borcherding/XTTS-v2_C3PO](https://huggingface.co/Borcherding/XTTS-v2_C3PO) for authentic C-3PO voice synthesis.
+## ✨ Features
+- 🤖 **C-3PO Voice**: Authentic C-3PO voice using fine-tuned XTTS v2 model
+- 🎯 **Text-to-Speech**: Convert text to natural-sounding speech
+- 🎭 **Voice Cloning**: Clone any voice from a reference audio sample
+- 🌍 **Multilingual**: Support for 17 languages with C-3PO voice characteristics
+- 🚀 **FastAPI**: Modern, fast API with automatic documentation
+- 🐳 **Docker Ready**: Containerized for easy deployment
+- ☁️ **Hugging Face Spaces**: Optimized for HF Spaces deployment
+## 🎭 C-3PO Model Information
+This API uses the fine-tuned C-3PO voice model from [Borcherding/XTTS-v2_C3PO](https://huggingface.co/Borcherding/XTTS-v2_C3PO), which features:
+- **Fine-tuned on 20 unique C-3PO voice lines** from Star Wars
+- **Multi-lingual support** (17 languages) while maintaining C-3PO's distinctive voice
+- **Emotion & Style Transfer** capturing C-3PO's formal, protocol droid characteristics
+- **High-Quality Audio** output at 24kHz sampling rate
+## 📡 API Endpoints
+### 1. Health Check
+```bash
+GET /health
+```
+Returns API status, model information, and C-3PO voice availability.
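Because `coqui_api.py` downloads and loads the model at import time, the server may refuse connections (or `/health` may answer 503) for several minutes after startup. A minimal stdlib sketch of a client that polls `/health` until it reports `"status": "healthy"`. The function names, the 10-minute default deadline and the 5-second interval are illustrative assumptions, not part of the API:

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_health(base_url: str, timeout: float = 5.0) -> Optional[dict]:
    """Return the parsed /health JSON, or None if the API is unreachable or not ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError/HTTPError (connection refused, 503, ...) are OSError subclasses;
        # ValueError covers a body that is not valid JSON.
        return None


def wait_until_healthy(
    base_url: str,
    deadline_s: float = 600.0,
    interval_s: float = 5.0,
    fetch: Callable[[str], Optional[dict]] = fetch_health,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll /health until it reports healthy; raise TimeoutError after deadline_s seconds."""
    give_up_at = clock() + deadline_s
    while True:
        info = fetch(base_url)
        if info is not None and info.get("status") == "healthy":
            return info
        if clock() >= give_up_at:
            raise TimeoutError(f"{base_url} not healthy after {deadline_s} s")
        sleep(interval_s)
```

For example, `wait_until_healthy("http://localhost:7860")["c3po_voice_available"]` tells you whether `/tts-c3po` will accept requests. The `fetch`, `sleep` and `clock` parameters exist only so the loop can be exercised without a running server.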
+### 2. List Models
+```bash
+GET /models
+```
+Returns the first 20 model names from the Coqui TTS model catalog (not only the model this API has loaded).
+### 3. C-3PO Text-to-Speech (Dedicated)
+```bash
+POST /tts-c3po
+```
+**Parameters:**
+- `text` (string): Text to convert to C-3PO voice (2-500 characters)
+- `language` (string): Language code (default: "en")
+**Example using curl:**
+```bash
+curl -X POST "http://localhost:7860/tts-c3po" \
+  -F "text=I am C-3PO, human-cyborg relations." \
+  -F "language=en" \
+  --output c3po_voice.wav
+```
+### 4. General Text-to-Speech
+```bash
+POST /tts
+```
+**Parameters:**
+- `text` (string): Text to convert to speech (2-500 characters)
+- `language` (string): Language code (default: "en")
+- `speaker_file` (file, optional): Reference audio for voice cloning
+- `use_c3po_voice` (boolean): Use C-3PO voice if no speaker file provided (default: true)
+**Example using curl:**
+```bash
+# C-3PO voice (default)
+curl -X POST "http://localhost:7860/tts" \
+  -F "text=The odds of successfully navigating an asteroid field are approximately 3,720 to 1." \
+  -F "language=en" \
+  --output c3po_output.wav
+# Custom voice cloning
+curl -X POST "http://localhost:7860/tts" \
+  -F "text=This will sound like the reference voice." \
+  -F "language=en" \
+  -F "speaker_file=@reference_voice.wav" \
+  -F "use_c3po_voice=false" \
+  --output cloned_voice.wav
+```
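`/tts` picks a voice in the same order as `CoquiTTSService.generate_speech()` in `coqui_api.py`:

1. An uploaded `speaker_file` always wins.
2. Otherwise the C-3PO reference clip is used, but only when `use_c3po_voice` is true and a reference WAV was found in the model folder.
3. Otherwise a preset speaker is used.
4. If none of these is available, no speaker is passed at all.

The pure-Python restatement below uses a hypothetical `choose_voice` helper. It makes the fallback visible: `/tts-c3po` answers 503 when no reference clip exists, but `/tts` silently falls through.

```python
from typing import Optional, Tuple


def choose_voice(
    speaker_file: Optional[str],
    use_c3po_voice: bool,
    c3po_reference: Optional[str],
    default_speaker: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return (mode, voice) using the same precedence as generate_speech()."""
    if speaker_file:
        # Uploaded reference audio: clone that voice.
        return ("clone", speaker_file)
    if use_c3po_voice and c3po_reference:
        # No upload, C-3PO requested, and a reference clip exists in the model dir.
        return ("clone", c3po_reference)
    if default_speaker:
        # The model exposes preset speakers; the service stores the first one.
        return ("preset", default_speaker)
    # Neither cloning nor a preset speaker is available.
    return ("none", None)
```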
+### 5. JSON TTS (C-3PO Voice)
+```bash
+POST /tts-json
+```
+**JSON Body:**
+```json
+{
+  "text": "R2-D2, you know better than to trust a strange computer!",
+  "language": "en"
+}
+```
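Outside the browser, the JSON endpoint needs nothing beyond the standard library. A minimal sketch with `urllib.request`. The names `build_tts_json_request` and `synthesize_json` are illustrative. The 300-second timeout is an assumption meant to leave room for slow CPU inference:

```python
import json
import urllib.request


def build_tts_json_request(base_url: str, text: str, language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a POST /tts-json request with a JSON body."""
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts-json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_json(base_url: str, text: str, language: str = "en", timeout: float = 300.0) -> bytes:
    """Send the request and return the WAV bytes; HTTP errors raise urllib.error.HTTPError."""
    req = build_tts_json_request(base_url, text, language)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

For example, `synthesize_json("http://localhost:7860", "Oh my! How interesting!")` returns the WAV bytes, ready to be written to a `.wav` file.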
+## 🚀 Deployment on Hugging Face Spaces
+### Step 1: Create a new Space
+1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
+2. Click "Create new Space"
+3. Choose "Docker" as the SDK
+4. Set your space name and visibility
+### Step 2: Add files to your Space
+Upload these files to your Hugging Face Space repository:
+```
+your-space/
+├── coqui_api.py          # Main API file with C-3PO integration
+├── requirements.txt      # Dependencies (includes huggingface_hub)
+├── Dockerfile.coqui      # Docker configuration
+├── test_c3po_model.py    # Test script for C-3PO functionality
+└── README.md            # This file
+```
+### Step 3: Configure your Space
+Rename the files in your Space:
+- `Dockerfile.coqui` → `Dockerfile`
+### Step 4: Deploy
+Your Space will build and deploy automatically. Expect the first start to take 15-20 minutes: the image build installs PyTorch and Coqui TTS, and the C-3PO fine-tuned model is downloaded from Hugging Face when the API starts, not during the build.
+## 💻 Local Development
+### Requirements
+- Python 3.11+
+- PyTorch
+- Coqui TTS library
+- Hugging Face Hub
+### Installation
+```bash
+# Clone the repository
+git clone <your-repo>
+cd <your-repo>
+# Install dependencies (requirements_coqui.txt adds FastAPI, uvicorn and python-multipart)
+pip install -r requirements.txt -r requirements_coqui.txt
+# Run the API
+python coqui_api.py
+```
+The API will be available at `http://localhost:7860`
+### Testing
+```bash
+# Run the C-3PO model test suite
+python test_c3po_model.py
+# Run the general test client
+python test_coqui_api.py
+```
+## 🎪 Usage Examples
+### Python Client - C-3PO Voice
+```python
+import requests
+BASE_URL = "http://localhost:7860"
+# C-3PO voice synthesis (form fields); generous timeout because CPU inference can be slow
+data = {"text": "I am C-3PO, human-cyborg relations.", "language": "en"}
+response = requests.post(f"{BASE_URL}/tts-c3po", data=data, timeout=300)
+response.raise_for_status()
+with open("c3po_output.wav", "wb") as f:
+    f.write(response.content)
+# JSON API: json= serializes the body and sets Content-Type: application/json
+data = {"text": "The odds are approximately 3,720 to 1!", "language": "en"}
+response = requests.post(f"{BASE_URL}/tts-json", json=data, timeout=300)
+response.raise_for_status()
+with open("c3po_json.wav", "wb") as f:
+    f.write(response.content)
+```
+### JavaScript/Web - C-3PO Voice
+```javascript
+// C-3PO voice synthesis
+const formData = new FormData();
+formData.append('text', 'Oh my! How interesting!');
+formData.append('language', 'en');
+fetch('http://localhost:7860/tts-c3po', {
+    method: 'POST',
+    body: formData
+})
+.then(response => response.blob())
+.then(blob => {
+    const url = URL.createObjectURL(blob);
+    const audio = new Audio(url);
+    audio.play();
+});
+// JSON API
+fetch('http://localhost:7860/tts-json', {
+    method: 'POST',
+    headers: {'Content-Type': 'application/json'},
+    body: JSON.stringify({
+        text: 'R2-D2, you know better than to trust a strange computer!',
+        language: 'en'
+    })
+})
+.then(response => response.blob())
+.then(blob => {
+    const url = URL.createObjectURL(blob);
+    const audio = new Audio(url);
+    audio.play();
+});
+```
+## 🎨 C-3PO Voice Examples
+Sample lines that show off C-3PO's voice characteristics:
+```bash
+# Classic C-3PO phrases
+curl -X POST "http://localhost:7860/tts-c3po" \
+  -F "text=I am C-3PO, human-cyborg relations." \
+  -F "language=en" --output c3po_intro.wav
+curl -X POST "http://localhost:7860/tts-c3po" \
+  -F "text=The odds of successfully navigating an asteroid field are approximately 3,720 to 1." \
+  -F "language=en" --output c3po_odds.wav
+curl -X POST "http://localhost:7860/tts-c3po" \
+  -F "text=R2-D2, you know better than to trust a strange computer!" \
+  -F "language=en" --output c3po_r2d2.wav
+curl -X POST "http://localhost:7860/tts-c3po" \
+  -F "text=Oh my! How interesting!" \
+  -F "language=en" --output c3po_oh_my.wav
+```
+## 🌍 Multilingual C-3PO Support
+The C-3PO model maintains its distinctive voice characteristics across multiple languages:
+```python
+import requests
+# Multilingual examples
+languages = [
+    ("Hello, I am C-3PO", "en"),
+    ("Hola, soy C-3PO", "es"),
+    ("Bonjour, je suis C-3PO", "fr"),
+    ("Guten Tag, ich bin C-3PO", "de"),
+    ("Ciao, sono C-3PO", "it"),
+    ("Olá, eu sou C-3PO", "pt")
+]
+for text, lang in languages:
+    response = requests.post("http://localhost:7860/tts-c3po",
+                           data={"text": text, "language": lang})
+    with open(f"c3po_{lang}.wav", "wb") as f:
+        f.write(response.content)
+```
+## 🔧 Voice Cloning Guide
+1. **Prepare Reference Audio:**
+   - Duration: 5-10 seconds (optimal)
+   - Format: WAV, MP3, or M4A
+   - Quality: Clear speech, minimal background noise
+   - Content: Natural speaking, preferably in target language
+2. **API Request:**
+   ```bash
+   curl -X POST "http://your-space.hf.space/tts" \
+     -F "text=Your text to synthesize" \
+     -F "language=en" \
+     -F "speaker_file=@your_reference.wav" \
+     --output result.wav
+   ```
+3. **Tips for Best Results:**
+   - Use high-quality reference audio
+   - Match the language of reference and target text
+   - Keep text within the 500-character limit (longer requests are rejected with HTTP 400)
+   - Experiment with different reference samples
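The 500-character limit is enforced by the server (`generate_speech()` returns HTTP 400 above it), so longer passages have to be sent in pieces. A minimal client-side sketch, assuming sentence boundaries can be approximated by `.`, `!` or `?` followed by whitespace. `split_for_tts` is a hypothetical helper. A single sentence longer than the limit is simply cut at character boundaries:

```python
import re
from typing import List

MAX_CHARS = 500  # the server rejects longer text with HTTP 400


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Greedily pack sentences into chunks of at most max_chars characters."""
    # Approximate sentence boundaries: whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at character boundaries.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to `/tts-c3po` in turn and the resulting WAV files concatenated. A very short trailing fragment could still fall below the 2-character minimum.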
+## Supported Languages
+The XTTS v2 model supports multiple languages including:
+- English (en)
+- Spanish (es)
+- French (fr)
+- German (de)
+- Italian (it)
+- Portuguese (pt)
+- Polish (pl)
+- Turkish (tr)
+- Russian (ru)
+- Dutch (nl)
+- Czech (cs)
+- Arabic (ar)
+- Chinese (zh-cn)
+- Japanese (ja)
+- Hungarian (hu)
+- Korean (ko)
+- Hindi (hi)
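The API forwards `language` to the model without checking it, so an unsupported code is likely to surface only as an HTTP 500 from the synthesis call. A small client-side guard can catch typos earlier. This sketch assumes the 17 language codes XTTS v2 documents, which include every code in the list above. `check_language` is a hypothetical helper:

```python
# The 17 language codes documented for XTTS v2.
XTTS_V2_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
})


def check_language(code: str) -> str:
    """Normalize a language code and raise ValueError if XTTS v2 does not list it."""
    normalized = code.strip().lower()
    if normalized not in XTTS_V2_LANGUAGES:
        supported = ", ".join(sorted(XTTS_V2_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized
```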
+## Troubleshooting
+### Common Issues
+1. **Model Download Errors:**
+   - The first run downloads ~1.7GB model files
+   - Ensure stable internet connection
+   - Check Hugging Face Spaces logs
+2. **Audio Quality Issues:**
+   - Use high-quality reference audio for voice cloning
+   - Ensure reference audio matches target language
+   - Try different reference samples
+3. **Memory Issues on HF Spaces:**
+   - The model requires significant memory
+   - Consider upgrading to a higher-tier Space if needed
+4. **API Timeouts:**
+   - Initial model loading takes time
+   - Subsequent requests are faster
+   - Consider warming up the model with a test request
+### Environment Variables
+- `COQUI_TOS_AGREED=1`: Accepts Coqui TTS terms of service
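+- `HF_HUB_DISABLE_TELEMETRY=1`: Disables Hugging Face Hub telemetry
+- `HF_HOME`: Hugging Face cache directory (the Dockerfile sets it to `/home/user/.cache/huggingface`)
+- `TORCH_HOME`: PyTorch cache directory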
+## API Documentation
+Once deployed, visit your Space URL and add `/docs` to access the interactive API documentation:
+```
+https://your-username-your-space-name.hf.space/docs
+```
+## Contributing
+1. Fork the repository
+2. Create a feature branch
+3. Make your changes
+4. Test thoroughly
+5. Submit a pull request
+## License
+This project uses the Coqui TTS library. Please check [Coqui TTS license](https://github.com/coqui-ai/TTS) for usage terms.
+## Credits
+- [Coqui TTS](https://github.com/coqui-ai/TTS) - The underlying TTS engine
+- [XTTS v2](https://arxiv.org/abs/2309.11321) - The voice cloning model
+- [FastAPI](https://fastapi.tiangolo.com/) - Web framework
+- [Hugging Face Spaces](https://huggingface.co/spaces) - Deployment platform

coqui_api.py ADDED

	@@ -0,0 +1,372 @@

+import os
+import torch
+import tempfile
+import uuid
+import logging
+from typing import Optional
+from huggingface_hub import snapshot_download
+from fastapi import FastAPI, HTTPException, UploadFile, File, Form
+from fastapi.responses import FileResponse
+from pydantic import BaseModel
+from TTS.api import TTS
+# Set environment variables for Coqui TTS
+os.environ["COQUI_TOS_AGREED"] = "1"
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+app = FastAPI(
+    title="Coqui TTS C-3PO API",
+    description="Text-to-Speech API using Coqui TTS with C-3PO fine-tuned voice model",
+    version="1.0.0"
+)
+class TTSRequest(BaseModel):
+    text: str
+    language: str = "en"
+class CoquiTTSService:
+    def __init__(self):
+        self.device = "cuda" if torch.cuda.is_available() else "cpu"
+        logger.info(f"Using device: {self.device}")
+        # Download and initialize the C-3PO fine-tuned model
+        try:
+            logger.info("Downloading C-3PO fine-tuned XTTS model from Hugging Face...")
+            # Download the model files from Hugging Face
+            model_path = snapshot_download(
+                repo_id="Borcherding/XTTS-v2_C3PO",
+                local_dir="./models/XTTS-v2_C3PO",
+                local_dir_use_symlinks=False
+            )
+            logger.info(f"Model downloaded to: {model_path}")
+            # Initialize TTS with the downloaded C-3PO model
+            config_path = os.path.join(model_path, "config.json")
+            if os.path.exists(config_path):
+                logger.info("Loading C-3PO fine-tuned model...")
+                self.tts = TTS(
+                    model_path=model_path,
+                    config_path=config_path,
+                    progress_bar=False,
+                ).to(self.device)
+                logger.info("C-3PO fine-tuned model loaded successfully!")
+            else:
+                # Fallback to using the model by name if config not found
+                logger.info("Config not found, trying to load by repo ID...")
+                self.tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(self.device)
+                logger.info("Fallback XTTS v2 model loaded!")
+            # Store model path for reference audio
+            self.model_path = model_path
+            # Check for speakers
+            if hasattr(self.tts, 'speakers') and self.tts.speakers:
+                logger.info(f"Available speakers: {len(self.tts.speakers)}")
+                self.default_speaker = self.tts.speakers[0] if self.tts.speakers else None
+            else:
+                logger.info("No preset speakers available - voice cloning mode")
+                self.default_speaker = None
+        except Exception as e:
+            logger.error(f"Failed to load C-3PO model: {e}")
+            logger.info("Falling back to standard XTTS v2 model...")
+            try:
+                self.tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(self.device)
+                self.model_path = None
+                self.default_speaker = None
+                logger.info("Fallback XTTS v2 model loaded!")
+            except Exception as fallback_error:
+                logger.error(f"Fallback model also failed: {fallback_error}")
+                raise fallback_error
+    def get_c3po_reference_audio(self):
+        """Get reference audio file for C-3PO voice if available"""
+        if self.model_path:
+            # Look for reference audio files in the model directory
+            possible_ref_files = [
+                "reference.wav", "speaker.wav", "c3po.wav",
+                "sample.wav", "reference_audio.wav"
+            ]
+            for ref_file in possible_ref_files:
+                ref_path = os.path.join(self.model_path, ref_file)
+                if os.path.exists(ref_path):
+                    logger.info(f"Found C-3PO reference audio: {ref_path}")
+                    return ref_path
+        return None
+    def generate_speech(self, text: str, speaker_wav_path: Optional[str] = None,
+                       language: str = "en", use_c3po_voice: bool = True) -> str:
+        """Generate speech using Coqui TTS with optional C-3PO voice"""
+        try:
+            # Validate text length
+            if len(text) < 2:
+                raise HTTPException(status_code=400, detail="Text too short")
+            if len(text) > 500:
+                raise HTTPException(status_code=400, detail="Text too long (max 500 characters)")
+            # Generate unique output filename
+            output_filename = f"c3po_tts_output_{uuid.uuid4().hex}.wav"
+            output_path = os.path.join(tempfile.gettempdir(), output_filename)
+            # Determine which speaker to use
+            final_speaker_wav = speaker_wav_path
+            # If no speaker provided and C-3PO voice requested, try to use reference audio
+            if not final_speaker_wav and use_c3po_voice:
+                c3po_ref = self.get_c3po_reference_audio()
+                if c3po_ref:
+                    final_speaker_wav = c3po_ref
+                    logger.info("Using C-3PO reference audio for voice synthesis")
+            if final_speaker_wav:
+                # Voice cloning mode
+                logger.info("Generating speech with voice cloning...")
+                # tts_to_file() writes the WAV at the model's output sample rate
+                # (24 kHz for XTTS v2)
+                self.tts.tts_to_file(
+                    text=text,
+                    speaker_wav=final_speaker_wav,
+                    language=language,
+                    file_path=output_path
+                )
+            elif self.default_speaker:
+                # Use preset speaker
+                logger.info(f"Generating speech with preset speaker: {self.default_speaker}")
+                self.tts.tts_to_file(
+                    text=text,
+                    speaker=self.default_speaker,
+                    language=language,
+                    file_path=output_path
+                )
+            else:
+                # Try without speaker (some models support this)
+                logger.info("Generating speech without specific speaker...")
+                self.tts.tts_to_file(
+                    text=text,
+                    language=language,
+                    file_path=output_path
+                )
+            if not os.path.exists(output_path):
+                raise HTTPException(status_code=500, detail="Failed to generate audio file")
+            logger.info(f"Speech generated successfully: {output_path}")
+            return output_path
+        except Exception as e:
+            logger.error(f"Error generating speech: {e}")
+            if isinstance(e, HTTPException):
+                raise e
+            raise HTTPException(status_code=500, detail=f"Speech generation failed: {str(e)}")
+# Initialize TTS service
+logger.info("Initializing Coqui TTS service...")
+try:
+    tts_service = CoquiTTSService()
+    logger.info("TTS service initialized successfully")
+except Exception as e:
+    logger.error(f"Failed to initialize TTS service: {e}")
+    tts_service = None
+@app.get("/")
+async def root():
+    """Root endpoint with API information"""
+    return {
+        "message": "Coqui TTS C-3PO API",
+        "status": "healthy" if tts_service else "error",
+        "model": "XTTS v2",
+        "voice_cloning": True
+    }
+@app.get("/health")
+async def health_check():
+    """Health check endpoint"""
+    if not tts_service:
+        raise HTTPException(status_code=503, detail="TTS service not available")
+    c3po_ref_available = tts_service.get_c3po_reference_audio() is not None
+    return {
+        "status": "healthy",
+        "device": tts_service.device,
+        "model": "C-3PO Fine-tuned XTTS v2 (Coqui TTS)",
+        "default_speaker": tts_service.default_speaker,
+        "voice_cloning_available": True,
+        "c3po_voice_available": c3po_ref_available,
+        "model_path": getattr(tts_service, 'model_path', None)
+    }
+@app.post("/tts")
+async def text_to_speech(
+    text: str = Form(...),
+    language: str = Form("en"),
+    speaker_file: UploadFile = File(None),
+    use_c3po_voice: bool = Form(True)
+):
+    """
+    Convert text to speech using Coqui TTS
+    - **text**: Text to convert to speech (2-500 characters)
+    - **language**: Language code (default: "en")
+    - **speaker_file**: Reference audio file for voice cloning (optional)
+    - **use_c3po_voice**: Use C-3PO voice if no speaker file provided (default: True)
+    """
+    if not tts_service:
+        raise HTTPException(status_code=503, detail="TTS service not available")
+    if not text.strip():
+        raise HTTPException(status_code=400, detail="Text cannot be empty")
+    speaker_temp_path = None
+    try:
+        # Handle speaker file if provided
+        if speaker_file is not None:
+            if not speaker_file.content_type or not speaker_file.content_type.startswith('audio/'):
+                raise HTTPException(status_code=400, detail="Speaker file must be an audio file")
+            # Save uploaded file temporarily
+            speaker_temp_path = os.path.join(
+                tempfile.gettempdir(),
+                f"speaker_{uuid.uuid4().hex}.wav"
+            )
+            with open(speaker_temp_path, "wb") as buffer:
+                content = await speaker_file.read()
+                buffer.write(content)
+            logger.info(f"Speaker file saved: {speaker_temp_path}")
+        # Generate speech
+        output_path = tts_service.generate_speech(text, speaker_temp_path, language, use_c3po_voice)
+        # Clean up temporary speaker file
+        if speaker_temp_path and os.path.exists(speaker_temp_path):
+            try:
+                os.remove(speaker_temp_path)
+            except OSError:
+                pass
+        # Return the generated audio
+        voice_type = "custom" if speaker_file else ("c3po" if use_c3po_voice else "default")
+        return FileResponse(
+            output_path,
+            media_type="audio/wav",
+            filename=f"c3po_tts_{voice_type}_{uuid.uuid4().hex}.wav",
+        )
+    except Exception as e:
+        # Clean up on error
+        if speaker_temp_path and os.path.exists(speaker_temp_path):
+            try:
+                os.remove(speaker_temp_path)
+            except OSError:
+                pass
+        logger.error(f"Error in TTS endpoint: {e}")
+        if isinstance(e, HTTPException):
+            raise e
+        raise HTTPException(status_code=500, detail=str(e))
+@app.post("/tts-c3po")
+async def text_to_speech_c3po(
+    text: str = Form(...),
+    language: str = Form("en")
+):
+    """
+    Convert text to speech using C-3PO voice specifically
+    - **text**: Text to convert to speech (2-500 characters)
+    - **language**: Language code (default: "en")
+    """
+    if not tts_service:
+        raise HTTPException(status_code=503, detail="TTS service not available")
+    if not text.strip():
+        raise HTTPException(status_code=400, detail="Text cannot be empty")
+    # Check if C-3PO voice is available
+    c3po_ref = tts_service.get_c3po_reference_audio()
+    if not c3po_ref:
+        raise HTTPException(status_code=503, detail="C-3PO reference audio not available")
+    try:
+        # Generate speech with C-3PO voice
+        output_path = tts_service.generate_speech(text, None, language, use_c3po_voice=True)
+        return FileResponse(
+            output_path,
+            media_type="audio/wav",
+            filename=f"c3po_voice_{uuid.uuid4().hex}.wav",
+        )
+    except Exception as e:
+        logger.error(f"Error in C-3PO TTS endpoint: {e}")
+        if isinstance(e, HTTPException):
+            raise e
+        raise HTTPException(status_code=500, detail=str(e))
+@app.post("/tts-json")
+async def text_to_speech_json(request: TTSRequest):
+    """
+    Convert text to speech using JSON request with C-3PO voice
+    - **request**: TTSRequest containing text and language
+    """
+    if not tts_service:
+        raise HTTPException(status_code=503, detail="TTS service not available")
+    if not request.text.strip():
+        raise HTTPException(status_code=400, detail="Text cannot be empty")
+    try:
+        # Generate speech with C-3PO voice by default
+        output_path = tts_service.generate_speech(request.text, None, request.language, use_c3po_voice=True)
+        return FileResponse(
+            output_path,
+            media_type="audio/wav",
+            filename=f"c3po_tts_{request.language}_{uuid.uuid4().hex}.wav",
+        )
+    except Exception as e:
+        logger.error(f"Error in TTS JSON endpoint: {e}")
+        if isinstance(e, HTTPException):
+            raise e
+        raise HTTPException(status_code=500, detail=str(e))
+@app.get("/models")
+async def list_models():
+    """List available TTS models"""
+    try:
+        # Create a temporary TTS instance to list models
+        temp_tts = TTS()
+        models = temp_tts.list_models()
+        return {"models": models[:20]}  # Return first 20 models
+    except Exception as e:
+        logger.error(f"Error listing models: {e}")
+        raise HTTPException(status_code=500, detail="Failed to list models")
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=7860)
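XTTS v2 produces audio at 24 kHz, the rate the README quotes. The raw sample list returned by `tts.tts()` therefore has to be written with that rate. Writing the same samples with a 22050 Hz header stretches playback by about 8.8% (24000 / 22050) and lowers the pitch.

The stdlib-only sketch below shows the underlying step, assuming mono float samples in [-1, 1]. `to_wav_bytes` and `wav_duration` are illustrative names, not part of Coqui TTS:

```python
import io
import struct
import wave
from typing import Sequence


def to_wav_bytes(samples: Sequence[float], sample_rate: int = 24000) -> bytes:
    """Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file in memory."""
    # Clip to [-1, 1], scale to the int16 range, pack little-endian as WAV requires.
    ints = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    pcm = struct.pack(f"<{len(ints)}h", *ints)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    # wave does not close a caller-supplied file object, so the buffer is still readable.
    return buf.getvalue()


def wav_duration(data: bytes) -> float:
    """Playback duration in seconds implied by the WAV header."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return wf.getnframes() / wf.getframerate()
```

With one second of 24 kHz audio, `wav_duration(to_wav_bytes(samples))` is 1.0. Passing `sample_rate=22050` gives roughly 1.088 seconds for the same samples.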

requirements.txt CHANGED

@@ -1,11 +1,13 @@
-TTS @ git+https://github.com/coqui-ai/TTS@v0.21.1
-pydantic==1.10.13
-python-multipart==0.0.6
-typing-extensions>=4.8.0
-cutlet
-mecab-python3==1.0.6
-unidic-lite==1.0.8
-unidic==1.1.0
-langid
-uvicorn
-pydub

+SpeechRecognition>=3.8.1
+gtts>=2.3.2
+openai-whisper>=20240930
+pygame>=2.5.2
+anyascii>=0.3.0
+einops>=0.6.0
+encodec>=0.1.1
+inflect>=5.6.0
+num2words>=0.5.14
+pysbd>=0.3.4
+tqdm>=4.64.1
+coqui-tts == 0.26.2
+huggingface_hub>=0.17.0

requirements_coqui.txt ADDED

	@@ -0,0 +1,12 @@

+fastapi>=0.104.1
+uvicorn[standard]>=0.24.0
+python-multipart>=0.0.6
+coqui-tts==0.26.2
+torch>=2.0.0
+torchaudio>=2.0.0
+numpy>=1.24.0
+scipy>=1.11.0
+pydub>=0.25.1
+librosa>=0.10.0
+soundfile>=0.12.1
+typing-extensions>=4.8.0
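The Dockerfile's `CMD` starts `uvicorn coqui_api:app`, and `coqui_api.py` uses FastAPI `Form`/`File` parameters, which need `python-multipart`. A quick stdlib check shows that the new `requirements.txt` lists none of these packages, while `requirements_coqui.txt` lists all three. It assumes plain one-requirement-per-line files, with no `-r` includes or environment markers. `missing_server_deps` is a hypothetical helper:

```python
import re
from typing import Iterable, Set

# Packages the server process itself needs, as PyPI names.
SERVER_DEPS = {"fastapi", "uvicorn", "python-multipart"}

# Leading distribution name; stops at extras, version specifiers, spaces or "@".
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """PEP 503-style normalization: lowercase, runs of -, _ and . become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(lines: Iterable[str]) -> Set[str]:
    """Collect normalized package names from requirements-file lines."""
    names = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r / --index-url
        match = _NAME.match(line)
        if match:
            names.add(normalize(match.group(1)))
    return names


def missing_server_deps(lines: Iterable[str]) -> Set[str]:
    """Server packages that the given requirements lines do not mention."""
    return SERVER_DEPS - requirement_names(lines)
```

Against real files, `missing_server_deps(open("requirements.txt"))` works as well, since file iteration yields lines.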

start_c3po_api.py ADDED

	@@ -0,0 +1,176 @@

+#!/usr/bin/env python3
+"""
+Startup script for C-3PO TTS API
+Handles model download, initialization, and server startup
+"""
+import os
+import sys
+import subprocess
+import logging
+import time
+from pathlib import Path
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+def check_dependencies():
+    """Check if all required dependencies are installed"""
+    logger.info("🔍 Checking dependencies...")
+    try:
+        import torch
+        import TTS
+        import fastapi
+        import huggingface_hub
+        logger.info("✅ All core dependencies found")
+        return True
+    except ImportError as e:
+        logger.error(f"❌ Missing dependency: {e}")
+        logger.info("💡 Install with: pip install -r requirements.txt")
+        return False
+def check_gpu():
+    """Check GPU availability"""
+    try:
+        import torch
+        if torch.cuda.is_available():
+            gpu_name = torch.cuda.get_device_name(0)
+            logger.info(f"🎮 GPU available: {gpu_name}")
+            return True
+        else:
+            logger.info("💻 No GPU available, using CPU")
+            return False
+    except Exception as e:
+        logger.warning(f"⚠️  GPU check failed: {e}")
+        return False
+def check_disk_space():
+    """Check available disk space for model download"""
+    try:
+        import shutil
+        free_space = shutil.disk_usage('.').free / (1024**3)  # GB
+        if free_space < 5:
+            logger.warning(f"⚠️  Low disk space: {free_space:.1f}GB available")
+            logger.warning("💽 C-3PO model requires ~2GB space")
+        else:
+            logger.info(f"💾 Disk space: {free_space:.1f}GB available")
+        return free_space > 2
+    except Exception as e:
+        logger.warning(f"⚠️  Disk space check failed: {e}")
+        return True
+def setup_environment():
+    """Set up environment variables"""
+    os.environ["COQUI_TOS_AGREED"] = "1"
+    os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"
+    # Create models directory
+    models_dir = Path("./models")
+    models_dir.mkdir(exist_ok=True)
+    logger.info("🌍 Environment configured")
+def install_dependencies():
+    """Install missing dependencies"""
+    logger.info("📦 Installing dependencies...")
+    try:
+        subprocess.check_call([
+            sys.executable, "-m", "pip", "install", "-r", "requirements.txt"
+        ])
+        logger.info("✅ Dependencies installed successfully")
+        return True
+    except subprocess.CalledProcessError as e:
+        logger.error(f"❌ Failed to install dependencies: {e}")
+        return False
+def test_model_download():
+    """Test if the C-3PO model can be downloaded"""
+    logger.info("🤖 Testing C-3PO model availability...")
+    try:
+        from huggingface_hub import repo_info
+        # Check if the repo exists and is accessible
+        info = repo_info(repo_id="Borcherding/XTTS-v2_C3PO")
+        logger.info(f"✅ C-3PO model accessible: {info.id}")
+        logger.info(f"   Last modified: {info.last_modified}")
+        return True
+    except Exception as e:
+        logger.error(f"❌ C-3PO model not accessible: {e}")
+        return False
+def start_api_server():
+    """Start the FastAPI server"""
+    logger.info("🚀 Starting C-3PO TTS API server...")
+    try:
+        # Import and run the API
+        import uvicorn
+        from coqui_api import app
+        logger.info("🎭 C-3PO TTS API starting on http://localhost:7860")
+        logger.info("📖 API documentation available at http://localhost:7860/docs")
+        uvicorn.run(
+            app,
+            host="0.0.0.0",
+            port=7860,
+            log_level="info"
+        )
+    except Exception as e:
+        logger.error(f"❌ Failed to start API server: {e}")
+        # Re-raise so main() reports the failure and exits with status 1
+        raise
+def main():
+    """Main startup sequence"""
+    print("🤖 C-3PO TTS API Startup")
+    print("=" * 50)
+    # Step 1: Check dependencies
+    if not check_dependencies():
+        logger.info("📦 Attempting to install dependencies...")
+        if not install_dependencies():
+            logger.error("❌ Failed to install dependencies. Exiting.")
+            sys.exit(1)
+    # Step 2: Setup environment
+    setup_environment()
+    # Step 3: Check system resources
+    has_gpu = check_gpu()
+    has_space = check_disk_space()
+    if not has_space:
+        logger.error("❌ Insufficient disk space. Exiting.")
+        sys.exit(1)
+    # Step 4: Test model availability
+    if not test_model_download():
+        logger.warning("⚠️  C-3PO model may not be accessible")
+        logger.warning("   The API will fall back to standard XTTS v2")
+    # Step 5: Start the server
+    print("\n" + "=" * 50)
+    logger.info("🎬 All checks passed! Starting C-3PO TTS API...")
+    print("=" * 50)
+    try:
+        start_api_server()
+    except KeyboardInterrupt:
+        logger.info("\n🛑 Server stopped by user")
+    except Exception as e:
+        logger.error(f"❌ Server error: {e}")
+        sys.exit(1)
+if __name__ == "__main__":
+    main()

test_c3po_model.py ADDED

	@@ -0,0 +1,214 @@

+#!/usr/bin/env python3
+"""
+Test script for C-3PO TTS model integration
+"""
+import os
+import requests
+import json
+import tempfile
+from pathlib import Path
+# Test configuration
+API_BASE_URL = "http://localhost:7860"
+TEST_TEXTS = [
+    "I am C-3PO, human-cyborg relations.",
+    "The odds of successfully navigating an asteroid field are approximately 3,720 to 1.",
+    "R2-D2, you know better than to trust a strange computer!",
+    "Oh my! How interesting!"
+]
+def test_health_check():
+    """Test the health check endpoint"""
+    print("🔍 Testing health check...")
+    try:
+        response = requests.get(f"{API_BASE_URL}/health")
+        if response.status_code == 200:
+            data = response.json()
+            print(f"✅ Health check passed")
+            print(f"   Model: {data.get('model', 'Unknown')}")
+            print(f"   Device: {data.get('device', 'Unknown')}")
+            print(f"   C-3PO voice available: {data.get('c3po_voice_available', False)}")
+            print(f"   Model path: {data.get('model_path', 'Not specified')}")
+            return True
+        else:
+            print(f"❌ Health check failed: {response.status_code}")
+            return False
+    except Exception as e:
+        print(f"❌ Health check error: {e}")
+        return False
+def test_c3po_endpoint():
+    """Test the dedicated C-3PO endpoint"""
+    print("\n🎭 Testing C-3PO endpoint...")
+    test_text = "I am C-3PO, human-cyborg relations."
+    try:
+        data = {
+            'text': test_text,
+            'language': 'en'
+        }
+        response = requests.post(f"{API_BASE_URL}/tts-c3po", data=data)
+        if response.status_code == 200:
+            # Save the audio file
+            output_path = Path(tempfile.gettempdir()) / "c3po_test_output.wav"
+            with open(output_path, 'wb') as f:
+                f.write(response.content)
+            print(f"✅ C-3PO endpoint test passed")
+            print(f"   Audio saved to: {output_path}")
+            print(f"   File size: {os.path.getsize(output_path)} bytes")
+            return True
+        else:
+            print(f"❌ C-3PO endpoint failed: {response.status_code}")
+            print(f"   Response: {response.text}")
+            return False
+    except Exception as e:
+        print(f"❌ C-3PO endpoint error: {e}")
+        return False
+def test_general_tts_with_c3po():
+    """Test the general TTS endpoint with C-3PO voice enabled"""
+    print("\n🎤 Testing general TTS with C-3PO voice...")
+    test_text = "The odds of successfully navigating an asteroid field are approximately 3,720 to 1."
+    try:
+        data = {
+            'text': test_text,
+            'language': 'en',
+            'use_c3po_voice': 'true'
+        }
+        response = requests.post(f"{API_BASE_URL}/tts", data=data)
+        if response.status_code == 200:
+            # Save the audio file
+            output_path = Path(tempfile.gettempdir()) / "general_c3po_test_output.wav"
+            with open(output_path, 'wb') as f:
+                f.write(response.content)
+            print(f"✅ General TTS with C-3PO test passed")
+            print(f"   Audio saved to: {output_path}")
+            print(f"   File size: {os.path.getsize(output_path)} bytes")
+            return True
+        else:
+            print(f"❌ General TTS with C-3PO failed: {response.status_code}")
+            print(f"   Response: {response.text}")
+            return False
+    except Exception as e:
+        print(f"❌ General TTS with C-3PO error: {e}")
+        return False
+def test_json_endpoint():
+    """Test the JSON endpoint"""
+    print("\n📄 Testing JSON endpoint...")
+    test_text = "R2-D2, you know better than to trust a strange computer!"
+    try:
+        data = {
+            'text': test_text,
+            'language': 'en'
+        }
+        headers = {'Content-Type': 'application/json'}
+        response = requests.post(f"{API_BASE_URL}/tts-json", json=data, headers=headers)
+        if response.status_code == 200:
+            # Save the audio file
+            output_path = Path(tempfile.gettempdir()) / "json_c3po_test_output.wav"
+            with open(output_path, 'wb') as f:
+                f.write(response.content)
+            print(f"✅ JSON endpoint test passed")
+            print(f"   Audio saved to: {output_path}")
+            print(f"   File size: {os.path.getsize(output_path)} bytes")
+            return True
+        else:
+            print(f"❌ JSON endpoint failed: {response.status_code}")
+            print(f"   Response: {response.text}")
+            return False
+    except Exception as e:
+        print(f"❌ JSON endpoint error: {e}")
+        return False
+def test_multilingual_support():
+    """Test multilingual support with C-3PO voice"""
+    print("\n🌍 Testing multilingual support...")
+    test_cases = [
+        ("Hello, I am C-3PO", "en"),
+        ("Hola, soy C-3PO", "es"),
+        ("Bonjour, je suis C-3PO", "fr"),
+        ("Guten Tag, ich bin C-3PO", "de")
+    ]
+    success_count = 0
+    for text, language in test_cases:
+        try:
+            data = {
+                'text': text,
+                'language': language
+            }
+            response = requests.post(f"{API_BASE_URL}/tts-c3po", data=data)
+            if response.status_code == 200:
+                output_path = Path(tempfile.gettempdir()) / f"c3po_test_{language}.wav"
+                with open(output_path, 'wb') as f:
+                    f.write(response.content)
+                print(f"   ✅ {language}: {text} -> {output_path}")
+                success_count += 1
+            else:
+                print(f"   ❌ {language}: Failed ({response.status_code})")
+        except Exception as e:
+            print(f"   ❌ {language}: Error - {e}")
+    print(f"\n   Multilingual test: {success_count}/{len(test_cases)} languages successful")
+    return success_count == len(test_cases)
+def main():
+    """Run all tests"""
+    print("🚀 Starting C-3PO TTS Model Tests")
+    print("=" * 50)
+    tests = [
+        test_health_check,
+        test_c3po_endpoint,
+        test_general_tts_with_c3po,
+        test_json_endpoint,
+        test_multilingual_support
+    ]
+    passed = 0
+    total = len(tests)
+    for test in tests:
+        if test():
+            passed += 1
+    print("\n" + "=" * 50)
+    print(f"🎯 Test Results: {passed}/{total} tests passed")
+    if passed == total:
+        print("🎉 All tests passed! C-3PO model integration is working correctly.")
+    else:
+        print("⚠️  Some tests failed. Check the API logs for more details.")
+    print("\n💡 Tips:")
+    print("   - Make sure the API server is running on http://localhost:7860")
+    print("   - Check that the C-3PO model downloaded successfully")
+    print("   - Generated audio files are saved in the system temp directory")
+if __name__ == "__main__":
+    main()
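The first tip matters in practice: on a fresh container the model download and load can take a while, and every test fails immediately if `/health` is not answering yet. A small polling helper, sketched here with only the standard library, could be called before the tests run. `wait_for_health` and its timeout defaults are illustrative assumptions, not part of the repository.

```python
import http.client
import time
import urllib.request


def wait_for_health(base_url: str, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll GET {base_url}/health until it answers HTTP 200 or `timeout` seconds elapse.

    Hypothetical helper: the 300 s default is an assumed upper bound for the
    first model download, not a measured value.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError (e.g. 503 while loading), refused connections and
            # socket timeouts all land here: treat them as "not ready yet"
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `main()` could start with `if not wait_for_health(API_BASE_URL): raise SystemExit("API did not become healthy")` so a slow startup is reported once instead of as five separate failures.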

test_coqui_api.py ADDED

	@@ -0,0 +1,146 @@

+import requests
+import os
+import time
+# API base URL (update this to your deployed Hugging Face Space URL)
+BASE_URL = "http://localhost:7860"  # Change to your HF Space URL when deployed
+def test_health():
+    """Test the health endpoint"""
+    print("🔍 Testing health endpoint...")
+    try:
+        response = requests.get(f"{BASE_URL}/health")
+        if response.status_code == 200:
+            print("✅ Health check passed!")
+            print(f"Response: {response.json()}")
+        else:
+            print(f"❌ Health check failed: {response.status_code}")
+            print(f"Response: {response.text}")
+    except Exception as e:
+        print(f"❌ Health check error: {e}")
+def test_list_models():
+    """Test the models endpoint"""
+    print("\n🔍 Testing models endpoint...")
+    try:
+        response = requests.get(f"{BASE_URL}/models")
+        if response.status_code == 200:
+            models = response.json()
+            print("✅ Models endpoint working!")
+            print(f"Found {len(models.get('models', []))} models")
+            # Show first 5 models
+            for i, model in enumerate(models.get('models', [])[:5]):
+                print(f"  {i+1}. {model}")
+        else:
+            print(f"❌ Models endpoint failed: {response.status_code}")
+    except Exception as e:
+        print(f"❌ Models endpoint error: {e}")
+def test_simple_tts():
+    """Test simple text-to-speech without voice cloning"""
+    print("\n🔍 Testing simple TTS...")
+    try:
+        data = {
+            "text": "Hello world! This is a test of Coqui TTS.",
+            "language": "en"
+        }
+        response = requests.post(f"{BASE_URL}/tts", data=data)
+        if response.status_code == 200:
+            # Save the audio file
+            output_file = "simple_tts_output.wav"
+            with open(output_file, "wb") as f:
+                f.write(response.content)
+            print(f"✅ Simple TTS successful! Audio saved to: {output_file}")
+            print(f"File size: {len(response.content)} bytes")
+        else:
+            print(f"❌ Simple TTS failed: {response.status_code}")
+            print(f"Response: {response.text}")
+    except Exception as e:
+        print(f"❌ Simple TTS error: {e}")
+def test_voice_cloning(speaker_file_path=None):
+    """Test voice cloning with uploaded speaker file"""
+    if not speaker_file_path or not os.path.exists(speaker_file_path):
+        print("\n⚠️  Skipping voice cloning test - no speaker file provided")
+        print("   To test voice cloning, provide a .wav file path")
+        return
+    print(f"\n🔍 Testing voice cloning with: {speaker_file_path}")
+    try:
+        data = {
+            "text": "This is voice cloning using Coqui TTS. The voice should match the reference audio.",
+            "language": "en"
+        }
+        with open(speaker_file_path, "rb") as f:
+            files = {"speaker_file": f}
+            response = requests.post(f"{BASE_URL}/tts", data=data, files=files)
+        if response.status_code == 200:
+            # Save the cloned audio
+            output_file = "voice_cloned_output.wav"
+            with open(output_file, "wb") as f:
+                f.write(response.content)
+            print(f"✅ Voice cloning successful! Audio saved to: {output_file}")
+            print(f"File size: {len(response.content)} bytes")
+        else:
+            print(f"❌ Voice cloning failed: {response.status_code}")
+            print(f"Response: {response.text}")
+    except Exception as e:
+        print(f"❌ Voice cloning error: {e}")
+def test_json_tts():
+    """Test JSON endpoint"""
+    print("\n🔍 Testing JSON TTS endpoint...")
+    try:
+        data = {
+            "text": "This is a JSON request test for Coqui TTS API.",
+            "language": "en"
+        }
+        # json= serializes the body and sets Content-Type: application/json
+        response = requests.post(f"{BASE_URL}/tts-json", json=data)
+        if response.status_code == 200:
+            output_file = "json_tts_output.wav"
+            with open(output_file, "wb") as f:
+                f.write(response.content)
+            print(f"✅ JSON TTS successful! Audio saved to: {output_file}")
+            print(f"File size: {len(response.content)} bytes")
+        else:
+            print(f"❌ JSON TTS failed: {response.status_code}")
+            print(f"Response: {response.text}")
+    except Exception as e:
+        print(f"❌ JSON TTS error: {e}")
+def main():
+    print("🐸 Testing Coqui TTS API")
+    print("=" * 50)
+    # Test all endpoints
+    test_health()
+    test_list_models()
+    test_simple_tts()
+    test_json_tts()
+    # Test voice cloning if speaker file is available
+    # You can specify a speaker file path here
+    speaker_file = None  # Change to your speaker file path
+    test_voice_cloning(speaker_file)
+    print("\n🎉 API testing completed!")
+    print("\nTo test voice cloning:")
+    print("1. Record a short audio sample (5-10 seconds)")
+    print("2. Save it as a .wav file")
+    print("3. Set speaker_file in main() to that file's path")
+    print("4. Run the test again")
+if __name__ == "__main__":
+    main()
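Both test scripts treat HTTP 200 plus a non-zero byte count as success, which would also accept an error body served with a 200. A stricter, hypothetical check parses the body as WAV with the standard library's `wave` module. Note that `wave` only reads integer PCM WAV; if `coqui_api.py` writes float samples, this helper would raise `wave.Error` even for valid audio, so treat it as a sketch under that assumption.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of a PCM WAV payload in seconds (hypothetical helper).

    Raises ValueError if the bytes are not a RIFF/WAVE container, and
    wave.Error if the WAV uses an encoding the stdlib cannot parse (e.g. float).
    """
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("payload is not a RIFF/WAVE file")
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return frames / rate
```

For example, `wav_duration_seconds(response.content) > 0` could replace the size check in `test_simple_tts`, and `wav_duration_seconds(Path(speaker_file).read_bytes())` can confirm that a reference clip falls in the suggested 5 to 10 second range.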

test_coqui_tts.py ADDED

	@@ -0,0 +1,99 @@

+import torch
+from TTS.api import TTS
+import os
+def test_coqui_tts():
+    """Test Coqui TTS functionality"""
+    # Get device
+    device = "cuda" if torch.cuda.is_available() else "cpu"
+    print(f"Using device: {device}")
+    try:
+        # List available 🐸TTS models
+        print("\n=== Available TTS Models ===")
+        tts_instance = TTS()
+        models = tts_instance.list_models()
+        # Print first 10 models to avoid overwhelming output
+        print("First 10 available models:")
+        for i, model in enumerate(models[:10]):
+            print(f"{i+1}. {model}")
+        if len(models) > 10:
+            print(f"... and {len(models) - 10} more models")
+    except Exception as e:
+        print(f"Error listing models: {e}")
+        return
+    try:
+        # Initialize TTS with XTTS v2 model
+        print("\n=== Initializing XTTS v2 Model ===")
+        tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
+        print("XTTS v2 model loaded successfully!")
+        # List speakers if available
+        print("\n=== Available Speakers ===")
+        if hasattr(tts, 'speakers') and tts.speakers:
+            print("Available speakers:")
+            for speaker in tts.speakers[:10]:  # Show first 10
+                print(f"- {speaker}")
+            if len(tts.speakers) > 10:
+                print(f"... and {len(tts.speakers) - 10} more speakers")
+        else:
+            print("No preset speakers available or speakers list is empty")
+    except Exception as e:
+        print(f"Error initializing XTTS v2 model: {e}")
+        print("This might be due to model download requirements or missing dependencies")
+        return
+    try:
+        # Test TTS to file with preset speaker (if available)
+        print("\n=== Testing TTS to File ===")
+        output_file = "test_output.wav"
+        # Check if we have speakers available
+        if hasattr(tts, 'speakers') and tts.speakers:
+            # Use first available speaker
+            speaker_name = tts.speakers[0]
+            print(f"Using speaker: {speaker_name}")
+            tts.tts_to_file(
+                text="Hello world! This is a test of Coqui TTS library.",
+                speaker=speaker_name,
+                language="en",
+                file_path=output_file
+            )
+        else:
+            # Try without speaker specification
+            print("No speakers available, trying without speaker specification...")
+            tts.tts_to_file(
+                text="Hello world! This is a test of Coqui TTS library.",
+                language="en",
+                file_path=output_file
+            )
+        if os.path.exists(output_file):
+            print(f"✅ TTS successful! Audio saved to: {output_file}")
+            file_size = os.path.getsize(output_file)
+            print(f"File size: {file_size} bytes")
+        else:
+            print("❌ TTS failed - output file not created")
+    except Exception as e:
+        print(f"Error during TTS generation: {e}")
+    # Note about voice cloning
+    print("\n=== Voice Cloning Information ===")
+    print("To test voice cloning, you need:")
+    print("1. A reference audio file of the target speaker")
+    print("2. Pass its path as speaker_wav to tts.tts() (returns the waveform) or tts.tts_to_file() (writes a file)")
+    print("Example:")
+    print('wav = tts.tts(text="Hello!", speaker_wav="reference.wav", language="en")')
+if __name__ == "__main__":
+    print("🐸 Testing Coqui TTS Library")
+    print("=" * 50)
+    test_coqui_tts()
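`tts_to_file()` accepts the same `speaker_wav` argument as `tts()`, so a cloned voice can be written straight to disk. The wrapper below is a hypothetical helper (its name and the file-existence check are not from the repository); it takes an already-loaded `TTS` object so it can be exercised without downloading XTTS v2.

```python
from pathlib import Path


def clone_voice_to_file(tts, text: str, reference_wav: str, out_path: str, language: str = "en") -> str:
    """Synthesize `text` in the voice of `reference_wav` and write it to `out_path`.

    `tts` is assumed to be a loaded Coqui TTS object, e.g.
    TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device).
    """
    # Fail early with a clear error instead of deep inside the model code
    if not Path(reference_wav).is_file():
        raise FileNotFoundError(reference_wav)
    # speaker_wav supplies the reference clip used for cloning
    tts.tts_to_file(text=text, speaker_wav=reference_wav, language=language, file_path=out_path)
    return out_path
```

With the model loaded as in `test_coqui_tts()`, the call would be `clone_voice_to_file(tts, "Hello!", "reference.wav", "cloned.wav")`.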