File size: 1,686 Bytes
fd1adc1
aacbb48
fd1adc1
 
 
 
aacbb48
fd1adc1
 
 
aacbb48
12d303c
aacbb48
 
 
 
 
fd1adc1
 
4a58eca
 
12d303c
4a58eca
12d303c
4a58eca
 
aacbb48
4a58eca
 
12d303c
4a58eca
 
12d303c
4a58eca
 
 
aacbb48
4a58eca
12d303c
 
 
4a58eca
12d303c
4a58eca
 
 
12d303c
4a58eca
12d303c
4a58eca
12d303c
4a58eca
12d303c
f31bed1
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
---
title: Command_RTC
emoji: 🦀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Text-to-speech using Gradio, FastAPI, and Chatterbox TTS
tags:
- chatterbox-tts
- text-to-speech
- voice-cloning
- gradio
- fastapi
---

# Voice Chat Assistant
A conversational voice assistant powered by AI that responds to your spoken queries with natural-sounding speech.

## Features

- Speech Recognition: Uses OpenAI's Whisper model to accurately transcribe your voice
- Natural Language Understanding: Leverages Cohere's LLM API for intelligent responses
- Text-to-Speech: Generates natural speech using Chatterbox-TTS
- Reply on Pause: Automatically responds when you finish speaking
- Conversation History: Maintains context throughout your dialogue

## Demo
Speak into your microphone and the assistant will respond with voice!

## How It Works
- Your voice is transcribed to text using Whisper
- The text is processed by Cohere's LLM to generate a response
- The response is converted to speech using Chatterbox-TTS
- The conversation continues with full context retention

## Technical Details

This project utilizes:

- Zero-GPU: Efficient GPU memory usage with Hugging Face's Zero-GPU technology
- FastRTC: Real-time communication for seamless voice interaction
- Gradio: Simple and intuitive user interface

## Setup

To run this locally, you'll need a Cohere API key and Python 3.8+.

## Acknowledgements

- OpenAI for the Whisper speech recognition model
- Cohere for the language model API
- Tortoise-TTS for the text-to-speech capabilities
- Hugging Face for the Spaces and Zero-GPU infrastructure