abraarsyed committed on
Commit 8f45ae9 · 1 Parent(s): 39ec667

Second Commit

Fix README.md

Signed-off-by: abraarsyed <abraar.syed01@gmail.com>

Files changed (1)
  1. README.md +8 -126
README.md CHANGED
@@ -1,128 +1,10 @@
1
- # VocalPrint AI
2
-
3
- VocalPrint AI is a CLI + web based tool that detects spoken English accents, scores fluency, and transcribes speech from public video/audio sources.
4
-
5
  ---
6
-
7
- ## Features
8
-
9
- - Detects common English accents:
10
- - Indian, American, British, Australian, and more
11
- - Scores fluency based on actual speaking duration
12
- - Transcribes speech using OpenAI's Whisper model
13
- - Top-3 accent predictions with confidence values
14
- - Supports YouTube, Loom, and direct MP4 links
15
- - Web UI built using Gradio for fast testing
16
- - CLI and Web UI use a shared processing core
17
- - JSON output for easy API integration
18
-
19
  ---
20
-
21
- ## Technical Highlights
22
-
23
- - **Models Used**:
24
- - Whisper (for transcription + language detection)
25
- - `dima806/english_accents_classification` (for accent prediction)
26
-
27
- - **Audio Segment Handling**:
28
- - Only a 30-second segment is extracted from the middle of the video for analysis (to avoid intros and outros)
29
-
30
- - **Transcript Handling**:
31
- - Only the first 500 characters of the transcript are returned to keep the result clean
32
-
33
- - **Output**:
34
- - Returns JSON with detected accent, confidence %, top-3 predictions, fluency score, language code, and sample transcript
35
-
36
- ---
37
-
38
- ## Project Structure
39
-
40
- ```
41
- vocalprint-ai/
42
- ├── core/
43
- │ ├── __init__.py
44
- │ ├── processor.py # shared logic used by both CLI and web
45
- │ └── logger.py # shared logger instance
46
- ├── accent_detection_cli.py # CLI entrypoint
47
- ├── web/
48
- │ └── app.py # Web UI via Gradio
49
- ├── requirements.txt
50
- ├── README.md
51
- └── .gitignore
52
- ```
53
-
54
- ---
55
-
56
- ## Quick Start
57
-
58
- ### 1. Install dependencies
59
-
60
- ```bash
61
- pip3 install -r requirements.txt
62
- ```
63
-
64
- ### 2. Run the CLI tool
65
-
66
- ```bash
67
- python3 accent_detection_cli.py \
68
- --url "https://www.youtube.com/watch?v=W2Jzkl8J2nM" \
69
- --device cpu
70
- ```
71
-
72
- ### 3. Sample output
73
-
74
- ```json
75
- {
76
- "accent": "canada",
77
- "accent_confidence": 86.0,
78
- "top_3_predictions": [
79
- {
80
- "accent": "canada",
81
- "confidence": 86.0
82
- },
83
- {
84
- "accent": "us",
85
- "confidence": 13.56
86
- },
87
- {
88
- "accent": "england",
89
- "confidence": 0.21
90
- }
91
- ],
92
- "fluency_score": 100,
93
- "language_detected_by_whisper": "en",
94
- "transcript_sample": " you're a mass of competing short term interests. And so the question is then, well, which short term interest should win out? And the answer to that is none of them. They need to be organized into a hierarchy that makes them functional across time and across individuals. So like a two year old is v"
95
- }
96
- ```
97
-
98
- ### 4. Run the Web UI
99
-
100
- ```bash
101
- python3 web/app.py
102
- ```
103
- Then open `http://localhost:7860` in your browser.
104
-
105
- ---
106
-
107
- ## Example Outputs
108
-
109
- ### 🎤 Example 1 – Indian Accent
110
- **URL:** [https://www.youtube.com/watch?v=BZ7v0wVrKDo](https://www.youtube.com/watch?v=BZ7v0wVrKDo)
111
-
112
- ![Indian Accent Example](assets/indian-accent.png)
113
-
114
- ### 🎤 Example 2 – Canadian Accent
115
- **URL:** [https://www.youtube.com/watch?v=W2Jzkl8J2nM](https://www.youtube.com/watch?v=W2Jzkl8J2nM)
116
-
117
- ![Canadian Accent Example](assets/canadian-accent.png)
118
-
119
- ---
120
-
121
- ## Known Bottlenecks
122
-
123
- - Whisper runs on CPU if no GPU is available, which can be slow (~20s on CPU)
124
- - Video download + audio extraction depends on stable network and FFmpeg
125
- - Some accent misclassifications may occur for mixed/regional speakers
126
- - Web UI uses a 30-second middle segment, so long videos may not be fully analyzed
127
-
128
- ---
1
  ---
2
+ title: VocalPrint AI
3
+ emoji: 🗣️
4
+ colorFrom: indigo
5
+ colorTo: pink
6
+ sdk: gradio
7
+ sdk_version: "4.0.0"
8
+ app_file: app.py
9
+ pinned: false
10
  ---
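
The removed README describes two processing rules: analysis uses a 30-second segment taken from the middle of the media, and only the first 500 characters of the transcript are returned. The sketch below shows one way those rules could be expressed. The function names `middle_window` and `transcript_sample` are hypothetical and are not taken from the project's `core/processor.py`; the handling of clips shorter than 30 seconds (use the whole clip) is also an assumption, since the README does not say what happens in that case.

```python
def middle_window(duration_s: float, window_s: float = 30.0) -> tuple[float, float]:
    """Return (start, end) in seconds of a window centred in the clip.

    Hypothetical helper: the README only states that a 30-second middle
    segment is used; clamping short clips to their full length is an assumption.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s <= window_s:
        # Clip is shorter than the window: analyse all of it (assumed behaviour).
        return 0.0, duration_s
    start = (duration_s - window_s) / 2
    return start, start + window_s


def transcript_sample(text: str, limit: int = 500) -> str:
    """Return at most the first `limit` characters of the transcript."""
    return text[:limit]
```

For a 100-second clip, `middle_window(100.0)` gives the span from 35 to 65 seconds, which skips equal amounts of intro and outro. The resulting start and end values could then be passed to whatever audio extraction step the project uses.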