Spaces:
Sleeping
Sleeping
File size: 1,686 Bytes
fd1adc1 aacbb48 fd1adc1 aacbb48 fd1adc1 aacbb48 12d303c aacbb48 fd1adc1 4a58eca 12d303c 4a58eca 12d303c 4a58eca aacbb48 4a58eca 12d303c 4a58eca 12d303c 4a58eca aacbb48 4a58eca 12d303c 4a58eca 12d303c 4a58eca 12d303c 4a58eca 12d303c 4a58eca 12d303c 4a58eca 12d303c f31bed1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
---
title: Command_RTC
emoji: 🦀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Text-to-speech using Gradio, FastAPI, and Chatterbox TTS
tags:
- chatterbox-tts
- text-to-speech
- voice-cloning
- gradio
- fastapi
---
# Voice Chat Assistant
A conversational voice assistant powered by AI that responds to your spoken queries with natural-sounding speech.
## Features
- Speech Recognition: Uses OpenAI's Whisper model to accurately transcribe your voice
- Natural Language Understanding: Leverages Cohere's LLM API for intelligent responses
- Text-to-Speech: Generates natural speech using Chatterbox-TTS
- Reply on Pause: Automatically responds when you finish speaking
- Conversation History: Maintains context throughout your dialogue
## Demo
Speak into your microphone and the assistant will respond with voice!
## How It Works
- Your voice is transcribed to text using Whisper
- The text is processed by Cohere's LLM to generate a response
- The response is converted to speech using Chatterbox-TTS
- The conversation continues with full context retention
## Technical Details
This project utilizes:
- Zero-GPU: Efficient GPU memory usage with Hugging Face's Zero-GPU technology
- FastRTC: Real-time communication for seamless voice interaction
- Gradio: Simple and intuitive user interface
## Setup
To run this locally, you'll need a Cohere API key and Python 3.8+.
## Acknowledgements
- OpenAI for the Whisper speech recognition model
- Cohere for the language model API
- Tortoise-TTS for the text-to-speech capabilities
- Hugging Face for the Spaces and Zero-GPU infrastructure |