first commit
- Dockerfile +5 -14
- README.md +73 -4
- app.log +28 -0
- app.py +36 -0
- requirements.txt +5 -3
- src/streamlit_app.py +0 -40
- utils.py +158 -0
Dockerfile
CHANGED
@@ -1,21 +1,12 @@
-FROM python:3.
+FROM python:3.12.11-slim
 
 WORKDIR /app
 
-RUN apt-get update && apt-get install -y
-    build-essential \
-    curl \
-    software-properties-common \
-    git \
-    && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*
 
 COPY requirements.txt ./
-COPY
+COPY app.py utils.py ./
 
-RUN pip3 install -r requirements.txt
+RUN pip3 install --no-cache-dir -r requirements.txt
 
-
-
-HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
-
-ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
+ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
README.md
CHANGED
@@ -11,9 +11,78 @@ pinned: false
 short_description: Streamlit template space
 ---
 
-#
+# English Accent Detector (SpeechBrain)
 
-
+This Streamlit app detects English accents from speech in public video URLs using the SpeechBrain accent classification model.
 
-
-
+---
+
+## Features
+
+- Input a public video URL (a direct link to a video file such as MP4)
+- Downloads the video
+- Extracts up to 60 seconds of audio
+- Classifies the English accent and reports a confidence score
+- Provides an explanation of the detected accent
+
+---
+
+## Requirements
+
+- Python 3.12 or higher
+- ffmpeg installed and available in PATH (required by `moviepy`)
+- Internet connection (to download videos and model weights)
+
+---
+
+## Setup
+
+1. **Clone the repo** (or copy your project files):
+
+```bash
+git clone https://github.com/Kedar43/accent_detector.git
+cd accent_detector
+```
+
+2. **Create and activate a virtual environment (optional but recommended):**
+
+```bash
+python -m venv venv
+source venv/bin/activate # On Windows: venv\Scripts\activate
+```
+
+3. **Install dependencies:**
+
+```bash
+pip install -r requirements.txt
+```
+
+---
+
+## Usage
+
+Run the Streamlit app:
+
+```bash
+streamlit run app.py
+```
+
+- This opens a browser window or tab with the app interface.
+- Paste a public video URL that links directly to an MP4 file.
+- Wait while the app downloads the video and analyzes up to the first 60 seconds of its audio.
+- View the detected English accent, confidence score, and explanation.
+
+---
+
+## Testing the app
+- Use sample public MP4 videos containing English speech with distinct accents.
+- The app logs runtime info and errors to `app.log` in the working directory.
+- If errors occur, check `app.log` for the full traceback and error messages.
+
+---
+
+## Notes
+- The SpeechBrain model is loaded once and cached to improve performance on repeated runs.
+- Temporary video and audio files are deleted automatically after processing.
+- Accuracy depends on the quality of audio and the SpeechBrain model’s training data.
+- Make sure video URLs are publicly accessible without authentication.
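The README's note that the model is "loaded once and cached" can be illustrated with a minimal sketch. It uses `functools.lru_cache` from the standard library as a stand-in for Streamlit's `st.cache_resource`, which is what `utils.py` actually uses. The names `load_model` and `LOAD_CALLS` are hypothetical, and the returned dict only stands in for the real classifier object.

```python
from functools import lru_cache

LOAD_CALLS = []  # hypothetical counter, only here to show how often loading runs


@lru_cache(maxsize=1)
def load_model():
    """Pretend to load an expensive model; the body runs only on the first call."""
    LOAD_CALLS.append(1)
    # Stand-in object; the real app would return the SpeechBrain classifier here.
    return {"name": "accent-classifier"}


first = load_model()
second = load_model()  # served from the cache, no second load
```

Both calls return the same object, so the expensive load happens once per process. `st.cache_resource` applies the same idea across Streamlit reruns.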
app.log
ADDED
@@ -0,0 +1,28 @@
+2025-06-22 18:07:14,337 - INFO - Downloaded video to /Users/kedarpatel/Downloads/accent_detector/31298032-a128-4fb0-8ab0-49b45db1d7b0_video.mp4
+2025-06-22 18:07:14,549 - INFO - Extracted audio to /Users/kedarpatel/Downloads/accent_detector/fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav
+2025-06-22 18:07:14,549 - INFO - Fetch hyperparams.yaml: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/hyperparams.yaml.
+2025-06-22 18:07:14,549 - INFO - Fetch custom_interface.py: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/custom_interface.py.
+2025-06-22 18:07:16,840 - WARNING - speechbrain.lobes.models.huggingface_wav2vec - wav2vec 2.0 is frozen.
+2025-06-22 18:07:16,841 - INFO - Fetch wav2vec2.ckpt: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/wav2vec2.ckpt.
+2025-06-22 18:07:16,841 - INFO - Fetch model.ckpt: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/model.ckpt.
+2025-06-22 18:07:16,841 - INFO - Fetch label_encoder.txt: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/label_encoder.ckpt.
+2025-06-22 18:07:16,841 - INFO - Loading pretrained files for: wav2vec2, model, label_encoder
+2025-06-22 18:07:17,275 - INFO - Loaded SpeechBrain accent classifier
+2025-06-22 18:07:17,276 - INFO - Fetch fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav: Using existing file/symlink in fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav.
+2025-06-22 18:07:20,829 - INFO - Classified accent: ['england'] with confidence 67.67%
+2025-06-22 18:07:20,829 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav
+2025-06-22 18:07:20,830 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/31298032-a128-4fb0-8ab0-49b45db1d7b0_video.mp4
+2025-06-22 20:20:37,793 - INFO - Downloaded video to /Users/kedarpatel/Downloads/accent_detector/9c59b57e-1a0e-45fd-9cab-3cc5ffc4fef7_video.mp4
+2025-06-22 20:20:38,014 - INFO - Extracted audio to /Users/kedarpatel/Downloads/accent_detector/a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav
+2025-06-22 20:20:38,015 - INFO - Fetch hyperparams.yaml: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+2025-06-22 20:20:38,253 - INFO - Fetch custom_interface.py: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+2025-06-22 20:20:44,671 - WARNING - speechbrain.lobes.models.huggingface_wav2vec - wav2vec 2.0 is frozen.
+2025-06-22 20:20:44,673 - INFO - Fetch wav2vec2.ckpt: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+2025-06-22 20:20:44,814 - INFO - Fetch model.ckpt: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+2025-06-22 20:20:44,955 - INFO - Fetch label_encoder.txt: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+2025-06-22 20:20:45,090 - INFO - Loading pretrained files for: wav2vec2, model, label_encoder
+2025-06-22 20:20:45,659 - INFO - Loaded SpeechBrain accent classifier
+2025-06-22 20:20:45,660 - INFO - Fetch a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav: Using existing file/symlink in a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav.
+2025-06-22 20:20:50,454 - INFO - Classified accent: ['england'] with confidence 67.67%
+2025-06-22 20:20:50,455 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav
+2025-06-22 20:20:50,455 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/9c59b57e-1a0e-45fd-9cab-3cc5ffc4fef7_video.mp4
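The log format above (`timestamp - LEVEL - message`) is regular enough to be mined for results. The following is a minimal sketch that pulls the classification lines out of such a log. The helper name `parse_classifications` and the regular expression are assumptions written against the lines shown here, not part of the committed code.

```python
import re

# Matches the "Classified accent" lines as they appear in app.log above.
CLASSIFIED_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - INFO - "
    r"Classified accent: \['(?P<accent>[^']+)'\] "
    r"with confidence (?P<confidence>[\d.]+)%$"
)


def parse_classifications(lines):
    """Return (timestamp, accent, confidence) for each classification log line."""
    results = []
    for line in lines:
        match = CLASSIFIED_RE.match(line.strip())
        if match:
            results.append(
                (match["timestamp"], match["accent"], float(match["confidence"]))
            )
    return results


sample = [
    "2025-06-22 18:07:17,275 - INFO - Loaded SpeechBrain accent classifier",
    "2025-06-22 18:07:20,829 - INFO - Classified accent: ['england'] with confidence 67.67%",
]
parsed = parse_classifications(sample)
```

Lines that do not match the pattern are skipped. Applied to the full log above, this would surface that both runs reported the same label and confidence.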
app.py
ADDED
@@ -0,0 +1,36 @@
+import os
+import streamlit as st
+from huggingface_hub import login
+from utils import process_video_url, explain_accent
+
+hf_token = os.getenv("HF_HUB_TOKEN")
+if hf_token:
+    login(hf_token)
+
+# Configure Streamlit page settings
+st.set_page_config(page_title="English Accent Detector", layout="centered")
+st.title("🎤 English Accent Detector (SpeechBrain)")
+
+# Input field for user to enter a video URL
+video_url = st.text_input("Paste public video URL (MP4, Loom, etc.):")
+
+if video_url:
+    try:
+        # Show spinner while processing the video and analyzing accent
+        with st.spinner("Processing video and analyzing accent..."):
+            accent, confidence = process_video_url(video_url)
+
+        # Display results with confidence scores and explanation
+        st.success("Analysis complete!")
+        st.markdown(f"### 🗣️ Detected Accent: **{accent}**")
+        st.markdown(f"### 📊 Confidence Score: **{float(confidence):.2f}%**")
+        st.markdown("---")
+        st.markdown("### ℹ️ Accent Explanation")
+        st.markdown(explain_accent(accent, confidence))
+
+    except RuntimeError as err:
+        # Handle known runtime errors gracefully in the UI
+        st.error(f"⚠️ {err}")
+    except Exception as err:
+        # Catch-all for unexpected errors with generic user message
+        st.error("⚠️ An unexpected error occurred. Please check the log file.")
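The catch-all branch in `app.py` points the user to the log file but does not itself write anything to it. Below is a minimal sketch of one way to record the traceback before showing the generic message. It assumes a hypothetical wrapper `run_safely` and a hypothetical logger name `accent_app`; neither is in the commit, and the Streamlit calls are left out so the example runs on its own.

```python
import logging

logger = logging.getLogger("accent_app")  # hypothetical logger name


def run_safely(func, *args):
    """Call func(*args) and return (result, None) or (None, user_message)."""
    try:
        return func(*args), None
    except RuntimeError as err:
        # utils.py raises RuntimeError with text that is safe to show the user.
        return None, str(err)
    except Exception:
        # Record the full traceback so the log actually contains the failure.
        logger.exception(
            "Unexpected error in %s", getattr(func, "__name__", "callable")
        )
        return None, "An unexpected error occurred. Please check the log file."
```

In the app, the returned message would be passed to `st.error`. `logger.exception` logs at ERROR level and appends the active traceback, so it only makes sense inside an `except` block.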
requirements.txt
CHANGED
@@ -1,3 +1,5 @@
-
-
-
+moviepy==2.2.1
+soundfile==0.13.1
+speechbrain==0.5.13
+streamlit==1.46.0
+transformers==4.52.4
src/streamlit_app.py
DELETED
@@ -1,40 +0,0 @@
-import altair as alt
-import numpy as np
-import pandas as pd
-import streamlit as st
-
-"""
-# Welcome to Streamlit!
-
-Edit `/streamlit_app.py` to customize this app to your heart's desire :heart:.
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).
-
-In the meantime, below is an example of what you can do with just a few lines of code:
-"""
-
-num_points = st.slider("Number of points in spiral", 1, 10000, 1100)
-num_turns = st.slider("Number of turns in spiral", 1, 300, 31)
-
-indices = np.linspace(0, 1, num_points)
-theta = 2 * np.pi * num_turns * indices
-radius = indices
-
-x = radius * np.cos(theta)
-y = radius * np.sin(theta)
-
-df = pd.DataFrame({
-    "x": x,
-    "y": y,
-    "idx": indices,
-    "rand": np.random.randn(num_points),
-})
-
-st.altair_chart(alt.Chart(df, height=700, width=700)
-    .mark_point(filled=True)
-    .encode(
-        x=alt.X("x", axis=None),
-        y=alt.Y("y", axis=None),
-        color=alt.Color("idx", legend=None, scale=alt.Scale()),
-        size=alt.Size("rand", legend=None, scale=alt.Scale(range=[1, 150])),
-    ))
utils.py
ADDED
@@ -0,0 +1,158 @@
+import os
+import uuid
+import logging
+import requests
+import traceback
+import streamlit as st
+from moviepy.video.io.VideoFileClip import VideoFileClip
+from speechbrain.pretrained.interfaces import foreign_class
+
+logging.basicConfig(
+    filename="app.log",
+    filemode="a",
+    format="%(asctime)s - %(levelname)s - %(message)s",
+    level=logging.INFO,
+)
+
+def download_file(video_url):
+    """
+    Download a file from a URL and save it as a temporary file.
+
+    Args:
+        video_url (str): The URL to download from.
+
+    Returns:
+        str: Path to the downloaded temporary file.
+    """
+    try:
+        video_id = str(uuid.uuid4())
+        video_filename = os.path.join(os.getcwd(), f"{video_id}_video.mp4")
+        with requests.get(video_url, stream=True, timeout=30) as r:
+            r.raise_for_status()
+            with open(video_filename, 'wb') as f:
+                for chunk in r.iter_content(chunk_size=8192):
+                    if chunk:
+                        f.write(chunk)
+        logging.info(f"Downloaded video to {video_filename}")
+        return video_filename
+    except Exception as e:
+        logging.error(f"Error downloading video: {e}\n{traceback.format_exc()}")
+        raise RuntimeError("Failed to download the video. Please try another video.") from e
+
+def extract_audio(video_path):
+    """
+    Extract up to 60 seconds of audio from the input video file.
+    Saves the extracted audio as a temporary .wav file.
+
+    Args:
+        video_path (str): Path to the input video file.
+
+    Returns:
+        str: Path to the extracted audio file.
+    """
+    try:
+        with VideoFileClip(video_path) as video:
+            audio_duration = min(video.audio.duration, 60)
+            trimmed_audio = video.audio.subclipped(0, audio_duration)
+            audio_id = str(uuid.uuid4())
+            audio_filename = os.path.join(os.getcwd(), f"{audio_id}_audio.wav")
+            trimmed_audio.write_audiofile(audio_filename, codec='pcm_s16le', logger=None)
+        logging.info(f"Extracted audio to {audio_filename}")
+        return audio_filename
+    except Exception as e:
+        logging.error(f"Error extracting audio: {e}\n{traceback.format_exc()}")
+        raise RuntimeError("Sorry, we could not extract audio from the video. Please try another video.") from e
+
+@st.cache_resource(show_spinner=False)
+def load_classifier():
+    """
+    Load the SpeechBrain accent classification model.
+
+    Returns:
+        foreign_class instance: Loaded classifier object.
+    """
+    try:
+        classifier = foreign_class(
+            source="Jzuluaga/accent-id-commonaccent_xlsr-en-english",
+            pymodule_file="custom_interface.py",
+            classname="CustomEncoderWav2vec2Classifier"
+        )
+        logging.info("Loaded SpeechBrain accent classifier")
+        return classifier
+    except Exception as e:
+        logging.error(f"Error loading SpeechBrain classifier: {e}\n{traceback.format_exc()}")
+        raise RuntimeError("Failed to load the classifier. Please try again later.") from e
+
+def classify_accent(classifier, audio_path):
+    """
+    Classify the English accent from the given audio file using the loaded classifier.
+
+    Args:
+        classifier (foreign_class): The loaded SpeechBrain classifier.
+        audio_path (str): Path to the audio file.
+
+    Returns:
+        tuple: (predicted accent labels (list of str), confidence score as a percentage (float))
+    """
+    try:
+        out_prob, score, index, text_lab = classifier.classify_file(audio_path)
+        logging.info(f"Classified accent: {text_lab} with confidence {float(score)*100:.2f}%")
+        return text_lab, float(score) * 100
+    except Exception as e:
+        logging.error(f"Error classifying accent: {e}\n{traceback.format_exc()}")
+        raise RuntimeError("The accent model failed to classify the audio. Please try again later.") from e
+
+def explain_accent(accent, confidence):
+    """
+    Generate a human-readable explanation for the detected accent and confidence score.
+
+    Args:
+        accent (str): Detected accent label.
+        confidence (float): Confidence score (percentage).
+
+    Returns:
+        str: Explanation markdown string.
+    """
+    return f"""
+The system detected a **{accent}** English accent with **{float(confidence):.2f}% confidence**.
+This score reflects how closely your voice matches typical speech patterns of native {accent} English speakers based on pronunciation, rhythm, and intonation.
+
+The model analyzes vocal features using a neural network trained on speakers with known accents. While it can differentiate between major English accents, its accuracy may vary with noisy audio, strong regional variation, or non-native speakers.
+"""
+
+def process_video_url(video_url):
+    """
+    End-to-end processing of the video URL:
+    - Download video file
+    - Extract audio (up to 60 seconds)
+    - Load classifier model
+    - Classify the accent
+    - Clean up temporary files
+
+    Args:
+        video_url (str): URL of the public video file.
+
+    Returns:
+        tuple: (accent label (str), confidence score (float))
+    """
+    video_path = None
+    audio_path = None
+
+    try:
+        video_path = download_file(video_url)
+        audio_path = extract_audio(video_path)
+
+        classifier = load_classifier()
+        accent, confidence = classify_accent(classifier, audio_path)
+
+        return accent[0].upper(), confidence
+
+    finally:
+        # Clean up temporary files if they exist
+        for path in [audio_path, video_path]:
+            if path and os.path.exists(path):
+                try:
+                    os.remove(path)
+                    logging.info(f"Removed temporary file: {path}")
+                except Exception as e:
+                    logging.warning(f"Failed to remove temp file {path}: {e}")
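`process_video_url` cleans up by tracking each path and deleting it in a `finally` block. An alternative, sketched below rather than taken from the commit, is to create all scratch files inside a `tempfile.TemporaryDirectory`, which removes the directory and its contents on exit even when a step raises. `process_in_temp_dir` and `fake_pipeline` are hypothetical names, and `fake_pipeline` only writes placeholder bytes in place of the real download and extraction steps.

```python
import os
import tempfile
import uuid


def process_in_temp_dir(step):
    """Run step(work_dir) inside a temporary directory that is always removed."""
    with tempfile.TemporaryDirectory(prefix="accent_") as work_dir:
        return step(work_dir)


def fake_pipeline(work_dir):
    """Hypothetical stand-in for download + extract: writes two scratch files."""
    video_path = os.path.join(work_dir, f"{uuid.uuid4()}_video.mp4")
    audio_path = os.path.join(work_dir, f"{uuid.uuid4()}_audio.wav")
    for path in (video_path, audio_path):
        with open(path, "wb") as handle:
            handle.write(b"\x00")
    return video_path, audio_path


video_path, audio_path = process_in_temp_dir(fake_pipeline)
```

This design also keeps scratch files out of the current working directory, which is where the committed code writes them. A partially downloaded file would be removed too, since it lives inside the same directory.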