Commit 465b605 by kedar432 · 1 parent: f01c181

first commit

Files changed (7)
  1. Dockerfile +5 -14
  2. README.md +73 -4
  3. app.log +28 -0
  4. app.py +36 -0
  5. requirements.txt +5 -3
  6. src/streamlit_app.py +0 -40
  7. utils.py +158 -0
Dockerfile CHANGED
@@ -1,21 +1,12 @@
- FROM python:3.9-slim
+ FROM python:3.12.11-slim
 
  WORKDIR /app
 
- RUN apt-get update && apt-get install -y \
-     build-essential \
-     curl \
-     software-properties-common \
-     git \
-     && rm -rf /var/lib/apt/lists/*
+ RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*
 
  COPY requirements.txt ./
- COPY src/ ./src/
+ COPY app.py utils.py ./
 
- RUN pip3 install -r requirements.txt
+ RUN pip3 install --no-cache-dir -r requirements.txt
 
- EXPOSE 8501
-
- HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
-
- ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
+ ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
README.md CHANGED
@@ -11,9 +11,78 @@ pinned: false
  short_description: Streamlit template space
  ---
 
- # Welcome to Streamlit!
+ # English Accent Detector (SpeechBrain)
 
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
+ This Streamlit app detects English accents in speech from videos at public URLs, using the SpeechBrain accent classification model.
 
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
+ ---
+
+ ## Features
+
+ - Accepts a public video URL (MP4, Loom, etc.)
+ - Downloads the video
+ - Extracts up to 60 seconds of audio
+ - Classifies the English accent with a confidence score
+ - Provides an explanation of the detected accent
+
+ ---
+
+ ## Requirements
+
+ - Python 3.12 or higher
+ - ffmpeg installed and available in PATH (required by `moviepy`)
+ - Internet connection (to download videos and model weights)
+
+ ---
+
+ ## Setup
+
+ 1. **Clone the repo** (or copy your project files):
+
+ ```bash
+ git clone https://github.com/Kedar43/accent_detector.git
+ cd accent_detector
+ ```
+
+ 2. **Create and activate a virtual environment (optional but recommended):**
+
+ ```bash
+ python -m venv venv
+ source venv/bin/activate # On Windows: venv\Scripts\activate
+ ```
+
+ 3. **Install dependencies:**
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ---
+
+ ## Usage
+
+ Run the Streamlit app:
+
+ ```bash
+ streamlit run app.py
+ ```
+
+ - This opens a browser window/tab with the app interface.
+ - Paste a public video URL that points directly to an MP4 file.
+ - Wait while the app downloads the video and processes up to 60 seconds of its audio.
+ - View the detected English accent, confidence score, and explanation.
+
+ ---
+
+ ## Testing the app
+ - Use sample public MP4 videos containing English speech with distinct accents.
+ - The app logs runtime info and errors to `app.log` in the working directory.
+ - If errors occur, check `app.log` for the detailed traceback and messages.
+
+ ---
+
+ ## Notes
+ - The SpeechBrain model is loaded once and cached to improve performance on repeated runs.
+ - Temporary video and audio files are deleted automatically after processing.
+ - Accuracy depends on the quality of the audio and on the SpeechBrain model’s training data.
+ - Make sure video URLs are publicly accessible without authentication.
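Review note (not part of the commit): the README asks for a public, direct MP4 link, but nothing in the diff checks the URL before the download starts. A minimal sketch of such a check follows, assuming a direct link is recognizable by its `http`/`https` scheme and a path ending in `.mp4`. The helper name `looks_like_direct_mp4_url` is hypothetical, and the check cannot tell whether the URL requires authentication.

```python
from urllib.parse import urlparse


def looks_like_direct_mp4_url(url: str) -> bool:
    """Return True if url is an http(s) URL whose path ends in .mp4.

    Hypothetical helper for illustration; it only inspects the URL text
    and never contacts the server.
    """
    parsed = urlparse(url.strip())
    # Require a web scheme and a host name.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Query strings are ignored because only the path is inspected.
    return parsed.path.lower().endswith(".mp4")
```

A caller could run this on the text input and show a hint before attempting the download; share-page URLs (for example a Loom share link) would be rejected by this particular rule.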
app.log ADDED
@@ -0,0 +1,28 @@
+ 2025-06-22 18:07:14,337 - INFO - Downloaded video to /Users/kedarpatel/Downloads/accent_detector/31298032-a128-4fb0-8ab0-49b45db1d7b0_video.mp4
+ 2025-06-22 18:07:14,549 - INFO - Extracted audio to /Users/kedarpatel/Downloads/accent_detector/fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav
+ 2025-06-22 18:07:14,549 - INFO - Fetch hyperparams.yaml: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/hyperparams.yaml.
+ 2025-06-22 18:07:14,549 - INFO - Fetch custom_interface.py: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/custom_interface.py.
+ 2025-06-22 18:07:16,840 - WARNING - speechbrain.lobes.models.huggingface_wav2vec - wav2vec 2.0 is frozen.
+ 2025-06-22 18:07:16,841 - INFO - Fetch wav2vec2.ckpt: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/wav2vec2.ckpt.
+ 2025-06-22 18:07:16,841 - INFO - Fetch model.ckpt: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/model.ckpt.
+ 2025-06-22 18:07:16,841 - INFO - Fetch label_encoder.txt: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/label_encoder.ckpt.
+ 2025-06-22 18:07:16,841 - INFO - Loading pretrained files for: wav2vec2, model, label_encoder
+ 2025-06-22 18:07:17,275 - INFO - Loaded SpeechBrain accent classifier
+ 2025-06-22 18:07:17,276 - INFO - Fetch fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav: Using existing file/symlink in fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav.
+ 2025-06-22 18:07:20,829 - INFO - Classified accent: ['england'] with confidence 67.67%
+ 2025-06-22 18:07:20,829 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav
+ 2025-06-22 18:07:20,830 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/31298032-a128-4fb0-8ab0-49b45db1d7b0_video.mp4
+ 2025-06-22 20:20:37,793 - INFO - Downloaded video to /Users/kedarpatel/Downloads/accent_detector/9c59b57e-1a0e-45fd-9cab-3cc5ffc4fef7_video.mp4
+ 2025-06-22 20:20:38,014 - INFO - Extracted audio to /Users/kedarpatel/Downloads/accent_detector/a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav
+ 2025-06-22 20:20:38,015 - INFO - Fetch hyperparams.yaml: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+ 2025-06-22 20:20:38,253 - INFO - Fetch custom_interface.py: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+ 2025-06-22 20:20:44,671 - WARNING - speechbrain.lobes.models.huggingface_wav2vec - wav2vec 2.0 is frozen.
+ 2025-06-22 20:20:44,673 - INFO - Fetch wav2vec2.ckpt: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+ 2025-06-22 20:20:44,814 - INFO - Fetch model.ckpt: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+ 2025-06-22 20:20:44,955 - INFO - Fetch label_encoder.txt: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+ 2025-06-22 20:20:45,090 - INFO - Loading pretrained files for: wav2vec2, model, label_encoder
+ 2025-06-22 20:20:45,659 - INFO - Loaded SpeechBrain accent classifier
+ 2025-06-22 20:20:45,660 - INFO - Fetch a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav: Using existing file/symlink in a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav.
+ 2025-06-22 20:20:50,454 - INFO - Classified accent: ['england'] with confidence 67.67%
+ 2025-06-22 20:20:50,455 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav
+ 2025-06-22 20:20:50,455 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/9c59b57e-1a0e-45fd-9cab-3cc5ffc4fef7_video.mp4
app.py ADDED
@@ -0,0 +1,36 @@
+ import os
+ import streamlit as st
+ from huggingface_hub import login
+ from utils import process_video_url, explain_accent
+
+ hf_token = os.getenv("HF_HUB_TOKEN")
+ if hf_token:
+     login(hf_token)
+
+ # Configure Streamlit page settings
+ st.set_page_config(page_title="English Accent Detector", layout="centered")
+ st.title("🎤 English Accent Detector (SpeechBrain)")
+
+ # Input field for user to enter a video URL
+ video_url = st.text_input("Paste public video URL (MP4, Loom, etc.):")
+
+ if video_url:
+     try:
+         # Show spinner while processing the video and analyzing accent
+         with st.spinner("Processing video and analyzing accent..."):
+             accent, confidence = process_video_url(video_url)
+
+         # Display results with confidence scores and explanation
+         st.success("Analysis complete!")
+         st.markdown(f"### 🗣️ Detected Accent: **{accent}**")
+         st.markdown(f"### 📊 Confidence Score: **{float(confidence):.2f}%**")
+         st.markdown("---")
+         st.markdown("### ℹ️ Accent Explanation")
+         st.markdown(explain_accent(accent, confidence))
+
+     except RuntimeError as err:
+         # Handle known runtime errors gracefully in the UI
+         st.error(f"⚠️ {err}")
+     except Exception as err:
+         # Catch-all for unexpected errors with generic user message
+         st.error("⚠️ An unexpected error occurred. Please check the log file.")
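Review note (not part of the commit): the catch-all branch in `app.py` tells the user to check the log file, yet that branch itself writes nothing to the log. One possible shape for the error handling is sketched below, assuming the standard `logging` module is used as in `utils.py`. The wrapper `run_safely` and the logger name `accent_detector` are hypothetical names introduced for illustration.

```python
import logging

# Hypothetical logger name; the commit itself logs through the root logger.
logger = logging.getLogger("accent_detector")


def run_safely(func, *args):
    """Call func(*args) and return (result, error_message).

    Known failures (RuntimeError) are returned as a user-facing message.
    Anything else is logged with its traceback before a generic message
    is returned, so the "check the log file" hint has something to point at.
    """
    try:
        return func(*args), None
    except RuntimeError as err:
        return None, str(err)
    except Exception:
        # logger.exception records the active traceback at ERROR level.
        logger.exception("Unexpected error in %s", getattr(func, "__name__", func))
        return None, "An unexpected error occurred. Please check the log file."
```

In the app, the returned message could be passed to `st.error` whenever it is not `None`, keeping the UI code free of nested `try` blocks.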
requirements.txt CHANGED
@@ -1,3 +1,5 @@
- altair
- pandas
- streamlit
+ moviepy==2.2.1
+ soundfile==0.13.1
+ speechbrain==0.5.13
+ streamlit==1.46.0
+ transformers==4.52.4
src/streamlit_app.py DELETED
@@ -1,40 +0,0 @@
- import altair as alt
- import numpy as np
- import pandas as pd
- import streamlit as st
-
- """
- # Welcome to Streamlit!
-
- Edit `/streamlit_app.py` to customize this app to your heart's desire :heart:.
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
-
- In the meantime, below is an example of what you can do with just a few lines of code:
- """
-
- num_points = st.slider("Number of points in spiral", 1, 10000, 1100)
- num_turns = st.slider("Number of turns in spiral", 1, 300, 31)
-
- indices = np.linspace(0, 1, num_points)
- theta = 2 * np.pi * num_turns * indices
- radius = indices
-
- x = radius * np.cos(theta)
- y = radius * np.sin(theta)
-
- df = pd.DataFrame({
-     "x": x,
-     "y": y,
-     "idx": indices,
-     "rand": np.random.randn(num_points),
- })
-
- st.altair_chart(alt.Chart(df, height=700, width=700)
-     .mark_point(filled=True)
-     .encode(
-         x=alt.X("x", axis=None),
-         y=alt.Y("y", axis=None),
-         color=alt.Color("idx", legend=None, scale=alt.Scale()),
-         size=alt.Size("rand", legend=None, scale=alt.Scale(range=[1, 150])),
-     ))
utils.py ADDED
@@ -0,0 +1,158 @@
+ import os
+ import uuid
+ import logging
+ import requests
+ import traceback
+ import streamlit as st
+ from moviepy.video.io.VideoFileClip import VideoFileClip
+ from speechbrain.pretrained.interfaces import foreign_class
+
+ logging.basicConfig(
+     filename="app.log",
+     filemode="a",
+     format="%(asctime)s - %(levelname)s - %(message)s",
+     level=logging.INFO,
+ )
+
+ def download_file(video_url):
+     """
+     Download a file from a URL and save it as a temporary file.
+
+     Args:
+         video_url (str): The URL to download from.
+
+     Returns:
+         str: Path to the downloaded temporary file.
+     """
+     try:
+         video_id = str(uuid.uuid4())
+         video_filename = os.path.join(os.getcwd(), f"{video_id}_video.mp4")
+         with requests.get(video_url, stream=True) as r:
+             r.raise_for_status()
+             with open(video_filename, 'wb') as f:
+                 for chunk in r.iter_content(chunk_size=8192):
+                     if chunk:
+                         f.write(chunk)
+         logging.info(f"Downloaded video to {video_filename}")
+         return video_filename
+     except Exception as e:
+         logging.error(f"Error downloading video: {e}\n{traceback.format_exc()}")
+         raise RuntimeError("Failed to download the video. Please try another video.")
+
+ def extract_audio(video_path):
+     """
+     Extract up to 60 seconds of audio from the input video file.
+     Saves the extracted audio as a temporary .wav file.
+
+     Args:
+         video_path (str): Path to the input video file.
+
+     Returns:
+         str: Path to the extracted audio file.
+     """
+     try:
+         video = VideoFileClip(video_path)
+         audio_duration = min(video.audio.duration, 60)
+         trimmed_audio = video.audio.subclipped(0, audio_duration)
+         audio_id = str(uuid.uuid4())
+         audio_filename = os.path.join(os.getcwd(), f"{audio_id}_audio.wav")
+         trimmed_audio.write_audiofile(audio_filename, codec='pcm_s16le', logger=None)
+         logging.info(f"Extracted audio to {audio_filename}")
+         return audio_filename
+     except Exception as e:
+         logging.error(f"Error extracting audio: {e}\n{traceback.format_exc()}")
+         raise RuntimeError("Sorry, we could not extract audio from the video. Please try another video.")
+
+ @st.cache_resource(show_spinner=False)
+ def load_classifier():
+     """
+     Load the SpeechBrain accent classification model.
+
+     Returns:
+         foreign_class instance: Loaded classifier object.
+     """
+     try:
+         classifier = foreign_class(
+             source="Jzuluaga/accent-id-commonaccent_xlsr-en-english",
+             pymodule_file="custom_interface.py",
+             classname="CustomEncoderWav2vec2Classifier"
+         )
+         logging.info("Loaded SpeechBrain accent classifier")
+         return classifier
+     except Exception as e:
+         logging.error(f"Error loading SpeechBrain classifier: {e}\n{traceback.format_exc()}")
+         raise RuntimeError("Failed to load the Classifier. Please try again later.")
+
+ def classify_accent(classifier, audio_path):
+     """
+     Classify the English accent from the given audio file using the loaded classifier.
+
+     Args:
+         classifier (foreign_class): The loaded SpeechBrain classifier.
+         audio_path (str): Path to the audio file.
+
+     Returns:
+         tuple: (predicted labels (list of str), confidence score as a percentage)
+     """
+     try:
+         out_prob, score, index, text_lab = classifier.classify_file(audio_path)
+         logging.info(f"Classified accent: {text_lab} with confidence {float(score)*100:.2f}%")
+         return text_lab, score * 100
+     except Exception as e:
+         logging.error(f"Error classifying accent: {e}\n{traceback.format_exc()}")
+         raise RuntimeError("The accent model failed to classify the audio. Please try again later.")
+
+ def explain_accent(accent, confidence):
+     """
+     Generate a human-readable explanation for the detected accent and confidence score.
+
+     Args:
+         accent (str): Detected accent label.
+         confidence (float): Confidence score (percentage).
+
+     Returns:
+         str: Explanation markdown string.
+     """
+     return f"""
+     The system detected a **{accent}** English accent with **{float(confidence):.2f}% confidence**.
+     This score reflects how closely your voice matches typical speech patterns of native {accent} English speakers based on pronunciation, rhythm, and intonation.
+
+     The model analyzes vocal features using a neural network trained on speakers with known accents. While it can differentiate between major English accents, its accuracy may vary with noisy audio, strong regional variation, or non-native speakers.
+     """
+
+ def process_video_url(video_url):
+     """
+     End-to-end processing of the video URL:
+     - Download video file
+     - Extract audio (up to 60 seconds)
+     - Load classifier model
+     - Classify the accent
+     - Cleanup temporary files
+
+     Args:
+         video_url (str): URL of the public video file.
+
+     Returns:
+         tuple: (accent label (str), confidence score (percentage))
+     """
+     video_path = None
+     audio_path = None
+
+     try:
+         video_path = download_file(video_url)
+         audio_path = extract_audio(video_path)
+
+         classifier = load_classifier()
+         accent, confidence = classify_accent(classifier, audio_path)
+
+         return accent[0].upper(), confidence
+
+     finally:
+         # Clean up temporary files if they exist
+         for path in [audio_path, video_path]:
+             if path and os.path.exists(path):
+                 try:
+                     os.remove(path)
+                     logging.info(f"Removed temporary file: {path}")
+                 except Exception as e:
+                     logging.warning(f"Failed to remove temp file {path}: {e}")
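Review note (not part of the commit): `classify_accent` passes the classifier output through largely as-is (the log shows the label as a list, `['england']`), and callers convert it later with `accent[0].upper()` and `float(confidence)`. The conversion could be done once, in one place. The sketch below assumes only that the labels are a non-empty sequence and that the score is a fraction between 0 and 1 that `float()` can convert. The helper name `normalize_prediction` is hypothetical.

```python
def normalize_prediction(labels, score):
    """Convert raw classifier output into plain Python values.

    Hypothetical helper for illustration.

    Args:
        labels: Non-empty sequence of label strings, e.g. ["england"].
        score: Fraction in [0, 1]; anything float() accepts.

    Returns:
        tuple: (upper-cased label (str), confidence percentage (float))
    """
    label = str(labels[0]).upper()
    # Round to two decimals, matching the precision shown in the UI and log.
    percent = round(float(score) * 100.0, 2)
    return label, percent


label, percent = normalize_prediction(["england"], 0.6767)
print(f"{label}: {percent:.2f}%")
```

Returning plain `str` and `float` values would also make the `Returns:` lines in the docstrings above literally true for every caller.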