Commit · 8f45ae9
1 Parent(s): 39ec667

Second Commit

Fix README.md

Signed-off-by: abraarsyed <abraar.syed01@gmail.com>

README.md CHANGED
@@ -1,128 +1,10 @@
-# VocalPrint AI
-
-VocalPrint AI is a CLI + web based tool that detects spoken English accents, scores fluency, and transcribes speech from public video/audio sources.
-
 ---
-
-
-
-
-
-
-
-
-- Supports YouTube, Loom, and direct MP4 links
-- Web UI built using Gradio for fast testing
-- CLI and Web UI use a shared processing core
-- JSON output for easy API integration
-
+title: VocalPrint AI
+emoji: 🗣️
+colorFrom: indigo
+colorTo: pink
+sdk: gradio
+sdk_version: "4.0.0"
+app_file: app.py
+pinned: false
 ---
-
-## Technical Highlights
-
-- **Models Used**:
-  - Whisper (for transcription + language detection)
-  - `dima806/english_accents_classification` (for accent prediction)
-
-- **Audio Segment Handling**:
-  - Only a 30-second segment is extracted from the middle of the video for analysis (to avoid intros and outros)
-
-- **Transcript Handling**:
-  - Only the first 500 characters of the transcript are returned to keep the result clean
-
-- **Output**:
-  - Returns JSON with detected accent, confidence %, top-3 predictions, fluency score, language code, and sample transcript
-
----
-
-## Project Structure
-
-```
-vocalprint-ai/
-├── core/
-│   ├── __init__.py
-│   ├── processor.py         # shared logic used by both CLI and web
-│   └── logger.py            # shared logger instance
-├── accent_detection_cli.py  # CLI entrypoint
-├── web/
-│   └── app.py               # Web UI via Gradio
-├── requirements.txt
-├── README.md
-└── .gitignore
-```
-
----
-
-## Quick Start
-
-### 1. Install dependencies
-
-```bash
-pip3 install -r requirements.txt
-```
-
-### 2. Run the CLI tool
-
-```bash
-python3 accent_detection_cli.py \
-  --url "https://www.youtube.com/watch?v=W2Jzkl8J2nM" \
-  --device cpu
-```
-
-### 3. Sample output
-
-```bash
-{
-  "accent": "canada",
-  "accent_confidence": 86.0,
-  "top_3_predictions": [
-    {
-      "accent": "canada",
-      "confidence": 86.0
-    },
-    {
-      "accent": "us",
-      "confidence": 13.56
-    },
-    {
-      "accent": "england",
-      "confidence": 0.21
-    }
-  ],
-  "fluency_score": 100,
-  "language_detected_by_whisper": "en",
-  "transcript_sample": " you're a mass of competing short term interests. And so the question is then, well, which short term interest should win out? And the answer to that is none of them. They need to be organized into a hierarchy that makes them functional across time and across individuals. So like a two year old is v"
-}
-```
-
-### 4. Run the Web UI
-
-```bash
-python3 web/app.py
-```
-Then open `http://localhost:7860` in your browser.
-
----
-
-## Example Outputs
-
-### 🎤 Example 1 - Indian Accent
-**URL:** [https://www.youtube.com/watch?v=BZ7v0wVrKDo](https://www.youtube.com/watch?v=BZ7v0wVrKDo)
-
-![Indian Accent Example](https://raw.githubusercontent.com/abraarsyed/vocalprint-ai/refs/heads/main/screenshots/Indian_Accent.png)
-
-### 🎤 Example 2 - Canadian Accent
-**URL:** [https://www.youtube.com/watch?v=W2Jzkl8J2nM](https://www.youtube.com/watch?v=W2Jzkl8J2nM)
-
-![Canadian Accent Example](https://raw.githubusercontent.com/abraarsyed/vocalprint-ai/refs/heads/main/screenshots/Canadian_Accent.png)
-
----
-
-## Known Bottlenecks
-
-- Whisper runs on CPU if no GPU is available - can be slow (~20s on CPU)
-- Video download + audio extraction depends on stable network and FFmpeg
-- Some accent misclassifications may occur for mixed/regional speakers
-- Web UI uses a 30-second middle segment - long videos may not be fully analyzed
-
----
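
The README text removed in this commit says that only a 30-second segment from the middle of the recording is analyzed and that only the first 500 characters of the transcript are returned. The sketch below illustrates that arithmetic under stated assumptions: the helper names `middle_segment_bounds` and `truncate_transcript` are hypothetical and are not taken from the project's `core/processor.py`, and the fallback for recordings shorter than 30 seconds is a guess rather than documented behavior.

```python
def middle_segment_bounds(duration_s: float, segment_s: float = 30.0) -> tuple[float, float]:
    """Return (start, end) in seconds of a window centered in the recording.

    Hypothetical helper: the project's real implementation may differ.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Assumption: recordings shorter than the window are used in full.
    if duration_s <= segment_s:
        return 0.0, duration_s
    start = (duration_s - segment_s) / 2.0
    return start, start + segment_s


def truncate_transcript(text: str, limit: int = 500) -> str:
    """Keep only the first `limit` characters of the transcript."""
    return text[:limit]
```

For a 100-second recording, the window would run from 35 s to 65 s, which skips both the intro and the outro as the README intends.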
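
The removed README also lists "JSON output for easy API integration" and shows a sample result. As a rough illustration of how a caller might consume that result, the sketch below parses a copy of the sample (with `transcript_sample` left out for brevity) and picks the highest-confidence entry. The function name `top_prediction` is hypothetical, and the field names are taken only from the sample output shown in the diff.

```python
import json

# Copy of the sample output from the removed README, minus transcript_sample.
SAMPLE_OUTPUT = """
{
  "accent": "canada",
  "accent_confidence": 86.0,
  "top_3_predictions": [
    {"accent": "canada", "confidence": 86.0},
    {"accent": "us", "confidence": 13.56},
    {"accent": "england", "confidence": 0.21}
  ],
  "fluency_score": 100,
  "language_detected_by_whisper": "en"
}
"""


def top_prediction(result: dict) -> dict:
    """Return the entry with the highest confidence (hypothetical helper)."""
    return max(result["top_3_predictions"], key=lambda p: p["confidence"])


result = json.loads(SAMPLE_OUTPUT)
best = top_prediction(result)
```

In the sample, the best entry agrees with the top-level `accent` and `accent_confidence` fields, so a client could read either one.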