Space: kedar432/accent_classifier

kedar432 committed on Jun 22

Commit 465b605 (1 parent: f01c181)

first commit

Files changed (7)

Dockerfile +5 -14
README.md +73 -4
app.log +28 -0
app.py +36 -0
requirements.txt +5 -3
src/streamlit_app.py +0 -40
utils.py +158 -0

Dockerfile CHANGED

@@ -1,21 +1,12 @@
-FROM python:3.9-slim
+FROM python:3.12.11-slim
 WORKDIR /app
-RUN apt-get update && apt-get install -y \
-    build-essential \
-    curl \
-    software-properties-common \
-    git \
-    && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*
 COPY requirements.txt ./
-COPY src/ ./src/
-RUN pip3 install -r requirements.txt
-EXPOSE 8501
-HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
-ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
+COPY app.py utils.py ./
+RUN pip3 install --no-cache-dir -r requirements.txt
+ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
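This commit drops the old `HEALTHCHECK` together with `curl`, which it relied on. If a container health probe is still wanted, a minimal standard-library sketch of the same check against Streamlit's `/_stcore/health` endpoint (the URL used by the removed line) is below. The helper name `is_healthy` and the 5-second timeout are assumptions, not part of this repo.

```python
import urllib.request


def is_healthy(url: str = "http://localhost:8501/_stcore/health", timeout: float = 5.0) -> bool:
    """Return True if `url` answers with HTTP 200 (hypothetical helper, not in the repo)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (e.g. 404/500) and socket timeouts all subclass OSError
        return False
```

Since `python` is already in the base image, a `HEALTHCHECK CMD` could call this through `python -c` and exit with status 0 (healthy) or 1 (unhealthy) without reinstalling `curl`.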

README.md CHANGED

@@ -11,9 +11,78 @@ pinned: false
 short_description: Streamlit template space
 ---
-# Welcome to Streamlit!
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).

+# English Accent Detector (SpeechBrain)
+This Streamlit app detects English accents from speech in public video URLs using the SpeechBrain accent classification model.
+---
+## Features
+- Input a public video URL (MP4, Loom, etc.)
+- Downloads the video
+- Extracts up to 60 seconds of audio
+- Classifies English accent with confidence score
+- Provides an explanation of the detected accent
+---
+## Requirements
+- Python 3.12 or higher
+- ffmpeg installed and available in PATH (required by `moviepy`)
+- Internet connection (to download videos and model weights)
+---
+## Setup
+1. **Clone the repo** (or copy your project files):
+    ```bash
+    git clone https://github.com/Kedar43/accent_detector.git
+    cd accent_detector
+    ```
+2. **Create and activate a virtual environment (optional but recommended):**
+    ```bash
+    python -m venv venv
+    source venv/bin/activate   # On Windows: venv\Scripts\activate
+    ```
+3. **Install dependencies:**
+    ```bash
+    pip install -r requirements.txt
+    ```
+---
+## Usage
+Run the Streamlit app:
+```bash
+streamlit run app.py
+```
+- This will open a browser window/tab with the app interface.
+- Paste a public video URL (must be MP4).
+- Wait while the app downloads the video and processes the first 60 seconds of audio (or the whole clip, if shorter).
+- View the detected English accent, confidence score, and explanation.
+---
+## Testing the app
+- Use sample public MP4 videos containing English speech with distinct accents.
+- The app logs runtime info and errors to `app.log` in the working directory.
+- If errors occur, check `app.log` for detailed tracebacks and error messages.
+---
+## Notes
+- The SpeechBrain model is loaded once and cached to improve performance on repeated runs.
+- Temporary video and audio files are deleted automatically after processing.
+- Accuracy depends on the quality of audio and the SpeechBrain model’s training data.
+- Make sure video URLs are publicly accessible without authentication.
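The README lists ffmpeg on `PATH` as a requirement, and the new Dockerfile installs it with `apt-get`. For a local setup, a quick preflight before `streamlit run app.py` could look like the sketch below; `missing_tools` is an illustrative name, not something the repo defines.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the entries of `tools` that shutil.which cannot find on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools(["ffmpeg"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools found.")
```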

app.log ADDED

@@ -0,0 +1,28 @@

+2025-06-22 18:07:14,337 - INFO - Downloaded video to /Users/kedarpatel/Downloads/accent_detector/31298032-a128-4fb0-8ab0-49b45db1d7b0_video.mp4
+2025-06-22 18:07:14,549 - INFO - Extracted audio to /Users/kedarpatel/Downloads/accent_detector/fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav
+2025-06-22 18:07:14,549 - INFO - Fetch hyperparams.yaml: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/hyperparams.yaml.
+2025-06-22 18:07:14,549 - INFO - Fetch custom_interface.py: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/custom_interface.py.
+2025-06-22 18:07:16,840 - WARNING - speechbrain.lobes.models.huggingface_wav2vec - wav2vec 2.0 is frozen.
+2025-06-22 18:07:16,841 - INFO - Fetch wav2vec2.ckpt: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/wav2vec2.ckpt.
+2025-06-22 18:07:16,841 - INFO - Fetch model.ckpt: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/model.ckpt.
+2025-06-22 18:07:16,841 - INFO - Fetch label_encoder.txt: Using existing file/symlink in pretrained_models/CustomEncoderWav2vec2Classifier-a72df039c801fa14a1c3226e95ab8c14/label_encoder.ckpt.
+2025-06-22 18:07:16,841 - INFO - Loading pretrained files for: wav2vec2, model, label_encoder
+2025-06-22 18:07:17,275 - INFO - Loaded SpeechBrain accent classifier
+2025-06-22 18:07:17,276 - INFO - Fetch fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav: Using existing file/symlink in fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav.
+2025-06-22 18:07:20,829 - INFO - Classified accent: ['england'] with confidence 67.67%
+2025-06-22 18:07:20,829 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/fb42e3f5-3e52-4682-a17e-59061379d57f_audio.wav
+2025-06-22 18:07:20,830 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/31298032-a128-4fb0-8ab0-49b45db1d7b0_video.mp4
+2025-06-22 20:20:37,793 - INFO - Downloaded video to /Users/kedarpatel/Downloads/accent_detector/9c59b57e-1a0e-45fd-9cab-3cc5ffc4fef7_video.mp4
+2025-06-22 20:20:38,014 - INFO - Extracted audio to /Users/kedarpatel/Downloads/accent_detector/a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav
+2025-06-22 20:20:38,015 - INFO - Fetch hyperparams.yaml: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+2025-06-22 20:20:38,253 - INFO - Fetch custom_interface.py: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+2025-06-22 20:20:44,671 - WARNING - speechbrain.lobes.models.huggingface_wav2vec - wav2vec 2.0 is frozen.
+2025-06-22 20:20:44,673 - INFO - Fetch wav2vec2.ckpt: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+2025-06-22 20:20:44,814 - INFO - Fetch model.ckpt: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+2025-06-22 20:20:44,955 - INFO - Fetch label_encoder.txt: Delegating to Huggingface hub, source Jzuluaga/accent-id-commonaccent_xlsr-en-english.
+2025-06-22 20:20:45,090 - INFO - Loading pretrained files for: wav2vec2, model, label_encoder
+2025-06-22 20:20:45,659 - INFO - Loaded SpeechBrain accent classifier
+2025-06-22 20:20:45,660 - INFO - Fetch a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav: Using existing file/symlink in a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav.
+2025-06-22 20:20:50,454 - INFO - Classified accent: ['england'] with confidence 67.67%
+2025-06-22 20:20:50,455 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/a69a14a4-dd9f-4498-a9ee-6b6b33c5f70d_audio.wav
+2025-06-22 20:20:50,455 - INFO - Removed temporary file: /Users/kedarpatel/Downloads/accent_detector/9c59b57e-1a0e-45fd-9cab-3cc5ffc4fef7_video.mp4
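Because `utils.py` configures `logging.basicConfig` with the format `%(asctime)s - %(levelname)s - %(message)s`, past predictions can be pulled back out of `app.log`. A minimal sketch, assuming the `Classified accent: [...] with confidence NN.NN%` message stays exactly as logged above and always holds a single label; `CLASSIFIED` and `parse_classifications` are hypothetical names.

```python
import re

# Matches lines such as:
# 2025-06-22 18:07:20,829 - INFO - Classified accent: ['england'] with confidence 67.67%
CLASSIFIED = re.compile(
    r"^(?P<time>\S+ \S+) - INFO - Classified accent: \['(?P<label>[^']+)'\] "
    r"with confidence (?P<conf>\d+(?:\.\d+)?)%$"
)


def parse_classifications(lines):
    """Yield (timestamp, label, confidence_percent) for each classification entry."""
    for line in lines:
        match = CLASSIFIED.match(line.strip())
        if match:
            yield match["time"], match["label"], float(match["conf"])


# Usage: list every prediction recorded in the log file
# with open("app.log", encoding="utf-8") as log:
#     for when, label, conf in parse_classifications(log):
#         print(when, label, conf)
```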

app.py ADDED

@@ -0,0 +1,36 @@

+import os
+import streamlit as st
+from huggingface_hub import login
+from utils import process_video_url, explain_accent
+hf_token = os.getenv("HF_HUB_TOKEN")
+if hf_token:
+    login(hf_token)
+# Configure Streamlit page settings
+st.set_page_config(page_title="English Accent Detector", layout="centered")
+st.title("🎤 English Accent Detector (SpeechBrain)")
+# Input field for user to enter a video URL
+video_url = st.text_input("Paste public video URL (MP4, Loom, etc.):")
+if video_url:
+    try:
+        # Show spinner while processing the video and analyzing accent
+        with st.spinner("Processing video and analyzing accent..."):
+            accent, confidence = process_video_url(video_url)
+        # Display results with confidence scores and explanation
+        st.success("Analysis complete!")
+        st.markdown(f"### 🗣️ Detected Accent: **{accent}**")
+        st.markdown(f"### 📊 Confidence Score: **{float(confidence):.2f}%**")
+        st.markdown("---")
+        st.markdown("### ℹ️ Accent Explanation")
+        st.markdown(explain_accent(accent, confidence))
+    except RuntimeError as err:
+        # Handle known runtime errors gracefully in the UI
+        st.error(f"⚠️ {err}")
+    except Exception as err:
+        # Catch-all for unexpected errors with generic user message
+        st.error("⚠️ An unexpected error occurred. Please check the log file.")
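The catch-all branch in `app.py` tells the user to check the log file, but it does not write anything there itself; only the `RuntimeError`s raised in `utils.py` arrive with their tracebacks already logged. One way to close that gap is a small helper that logs unexpected exceptions before returning the UI message. This is a sketch; `error_message` and `GENERIC_MESSAGE` are hypothetical names.

```python
import logging

logger = logging.getLogger(__name__)

GENERIC_MESSAGE = "An unexpected error occurred. Please check the log file."


def error_message(err: Exception) -> str:
    """Return the text to show in the UI for `err`, logging unexpected errors."""
    if isinstance(err, RuntimeError):
        # utils.py already logged these and wrapped them in a user-friendly message
        return str(err)
    # Passing the exception as exc_info records its traceback in the log
    logger.error("Unexpected error while processing video", exc_info=err)
    return GENERIC_MESSAGE
```

With a helper like this, the two `except` branches in `app.py` could collapse into a single `except Exception as err:` that calls `st.error(f"⚠️ {error_message(err)}")`.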

requirements.txt CHANGED

@@ -1,3 +1,5 @@
-altair
-pandas
-streamlit

+moviepy==2.2.1
+soundfile==0.13.1
+speechbrain==0.5.13
+streamlit==1.46.0
+transformers==4.52.4
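The new `requirements.txt` pins every dependency with `==`. To see whether the active environment actually matches those pins, a small check built on `importlib.metadata` could look like this. `parse_pins` and `mismatches` are illustrative helpers, and the parser assumes plain `name==version` lines only (no extras, markers, or ranges).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_pins(text: str) -> dict[str, str]:
    """Parse plain 'name==version' lines, ignoring blanks and '#' comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" in line:
            name, _, pinned = line.partition("==")
            pins[name.strip()] = pinned.strip()
    return pins


def mismatches(pins: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each unsatisfied pin to its installed version (None if not installed)."""
    result = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            result[name] = installed
    return result


# with open("requirements.txt", encoding="utf-8") as f:
#     print(mismatches(parse_pins(f.read())))
```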

src/streamlit_app.py DELETED

@@ -1,40 +0,0 @@
-import altair as alt
-import numpy as np
-import pandas as pd
-import streamlit as st
-"""
-# Welcome to Streamlit!
-Edit `/streamlit_app.py` to customize this app to your heart's desire :heart:.
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).
-In the meantime, below is an example of what you can do with just a few lines of code:
-"""
-num_points = st.slider("Number of points in spiral", 1, 10000, 1100)
-num_turns = st.slider("Number of turns in spiral", 1, 300, 31)
-indices = np.linspace(0, 1, num_points)
-theta = 2 * np.pi * num_turns * indices
-radius = indices
-x = radius * np.cos(theta)
-y = radius * np.sin(theta)
-df = pd.DataFrame({
-    "x": x,
-    "y": y,
-    "idx": indices,
-    "rand": np.random.randn(num_points),
-})
-st.altair_chart(alt.Chart(df, height=700, width=700)
-    .mark_point(filled=True)
-    .encode(
-        x=alt.X("x", axis=None),
-        y=alt.Y("y", axis=None),
-        color=alt.Color("idx", legend=None, scale=alt.Scale()),
-        size=alt.Size("rand", legend=None, scale=alt.Scale(range=[1, 150])),
-    ))

utils.py ADDED

@@ -0,0 +1,158 @@

+import os
+import uuid
+import logging
+import requests
+import traceback
+import streamlit as st
+from moviepy.video.io.VideoFileClip import VideoFileClip
+from speechbrain.pretrained.interfaces import foreign_class
+logging.basicConfig(
+    filename="app.log",
+    filemode="a",
+    format="%(asctime)s - %(levelname)s - %(message)s",
+    level=logging.INFO,
+)
+def download_file(video_url):
+    """
+    Download a file from a URL and save it as a temporary file.
+    Args:
+        url (str): The URL to download from.
+    Returns:
+        str: Path to the downloaded temporary file.
+    """
+    try:
+        video_id = str(uuid.uuid4())
+        video_filename = os.path.join(os.getcwd(), f"{video_id}_video.mp4")
+        with requests.get(video_url, stream=True) as r:
+            r.raise_for_status()
+            with open(video_filename, 'wb') as f:
+                for chunk in r.iter_content(chunk_size=8192):
+                    if chunk:
+                        f.write(chunk)
+        logging.info(f"Downloaded video to {video_filename}")
+        return video_filename
+    except Exception as e:
+        logging.error(f"Error downloading video: {e}\n{traceback.format_exc()}")
+        raise RuntimeError("Failed to download the video. Please try another video.")
+def extract_audio(video_path):
+    """
+    Extract up to 60 seconds of audio from the input video file.
+    Saves the extracted audio as a temporary .wav file.
+    Args:
+        video_path (str): Path to the input video file.
+    Returns:
+        str: Path to the extracted audio file.
+    """
+    try:
+        with VideoFileClip(video_path) as video:  # closes the file handles when done
+            audio_duration = min(video.audio.duration, 60)
+            trimmed_audio = video.audio.subclipped(0, audio_duration)
+            audio_id = str(uuid.uuid4())
+            audio_filename = os.path.join(os.getcwd(), f"{audio_id}_audio.wav")
+            trimmed_audio.write_audiofile(audio_filename, codec='pcm_s16le', logger=None)
+        logging.info(f"Extracted audio to {audio_filename}")
+        return audio_filename
+    except Exception as e:
+        logging.error(f"Error extracting audio: {e}\n{traceback.format_exc()}")
+        raise RuntimeError("Sorry, we could not extract audio from the video. Please try another video.")
+@st.cache_resource(show_spinner=False)
+def load_classifier():
+    """
+    Load the SpeechBrain accent classification model.
+    Returns:
+        CustomEncoderWav2vec2Classifier: Loaded classifier object (instantiated by foreign_class).
+    """
+    try:
+        classifier = foreign_class(
+            source="Jzuluaga/accent-id-commonaccent_xlsr-en-english",
+            pymodule_file="custom_interface.py",
+            classname="CustomEncoderWav2vec2Classifier"
+        )
+        logging.info("Loaded SpeechBrain accent classifier")
+        return classifier
+    except Exception as e:
+        logging.error(f"Error loading SpeechBrain classifier: {e}\n{traceback.format_exc()}")
+        raise RuntimeError("Failed to load the classifier. Please try again later.")
+def classify_accent(classifier, audio_path):
+    """
+    Classify the English accent from the given audio file using the loaded classifier.
+    Args:
+        classifier (CustomEncoderWav2vec2Classifier): The loaded SpeechBrain classifier.
+        audio_path (str): Path to the audio file.
+    Returns:
+        tuple: (predicted accent labels (list[str]), confidence score as a percentage)
+    """
+    try:
+        out_prob, score, index, text_lab = classifier.classify_file(audio_path)
+        logging.info(f"Classified accent: {text_lab} with confidence {float(score)*100:.2f}%")
+        return text_lab, score * 100
+    except Exception as e:
+        logging.error(f"Error classifying accent: {e}\n{traceback.format_exc()}")
+        raise RuntimeError("Failed to classify the accent. Please try another video.")
+def explain_accent(accent, confidence):
+    """
+    Generate a human-readable explanation for the detected accent and confidence score.
+    Args:
+        accent (str): Detected accent label.
+        confidence (float): Confidence score (percentage).
+    Returns:
+        str: Explanation markdown string.
+    """
+    return f"""
+        The system detected a **{accent}** English accent with **{float(confidence):.2f}% confidence**.
+        This score reflects how closely your voice matches typical speech patterns of native {accent} English speakers based on pronunciation, rhythm, and intonation.
+        The model analyzes vocal features using a neural network trained on speakers with known accents. While it can differentiate between major English accents, its accuracy may vary with noisy audio, strong regional variation, or non-native speakers.
+    """
+def process_video_url(video_url):
+    """
+    End-to-end processing of the video URL:
+    - Download video file
+    - Extract audio (up to 60 seconds)
+    - Load classifier model
+    - Classify the accent
+    - Cleanup temporary files
+    Args:
+        video_url (str): URL of the public video file.
+    Returns:
+        tuple: (accent label (str), confidence score (float))
+    """
+    video_path = None
+    audio_path = None
+    try:
+        video_path = download_file(video_url)
+        audio_path = extract_audio(video_path)
+        classifier = load_classifier()
+        accent, confidence = classify_accent(classifier, audio_path)
+        return accent[0].upper(), confidence
+    finally:
+        # Clean up temporary files if they exist
+        for path in [audio_path, video_path]:
+            if path and os.path.exists(path):
+                try:
+                    os.remove(path)
+                    logging.info(f"Removed temporary file: {path}")
+                except Exception as e:
+                    logging.warning(f"Failed to remove temp file {path}: {e}")
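`download_file` streams the response with `requests` (`stream=True`, 8192-byte `iter_content` chunks), so a large video never has to fit in memory. The same idea using only the standard library is sketched below, with two additions that are assumptions rather than repo behavior: an explicit timeout (the repo's `requests.get` call sets none) and deletion of a partially written file when the download fails. `download_to` is a hypothetical name.

```python
import shutil
import urllib.request
import uuid
from pathlib import Path


def download_to(url: str, dest_dir: Path, timeout: float = 30.0, chunk_size: int = 8192) -> Path:
    """Stream `url` into a uniquely named .mp4 file under `dest_dir` and return its path."""
    dest = dest_dir / f"{uuid.uuid4()}_video.mp4"
    try:
        # urlopen raises HTTPError on 4xx/5xx, much like requests' raise_for_status()
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
            # Copy in fixed-size chunks instead of reading the whole body at once
            shutil.copyfileobj(resp, out, chunk_size)
    except Exception:
        dest.unlink(missing_ok=True)  # don't leave a half-written file behind
        raise
    return dest
```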
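`extract_audio` keeps `min(video.audio.duration, 60)` seconds starting at zero: the whole track when it is shorter than a minute, the first minute otherwise. The sketch below applies the same rule to an existing WAV file with the standard-library `wave` module. It only illustrates the trimming arithmetic (seconds × sample rate = frames) and is not a substitute for the moviepy step, since `wave` cannot read audio out of an MP4; `trim_wav` is a hypothetical name.

```python
import wave


def trim_wav(src: str, dst: str, max_seconds: float = 60.0) -> float:
    """Copy at most the first `max_seconds` of a WAV file; return the kept duration in seconds."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        # Same rule as extract_audio: keep min(total length, limit)
        keep_frames = min(reader.getnframes(), int(max_seconds * rate))
        frames = reader.readframes(keep_frames)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # the frame count in the header is corrected on write
        writer.writeframes(frames)
    return keep_frames / rate
```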
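`load_classifier` is decorated with `@st.cache_resource`, so the SpeechBrain model is built on the first call and the same object is returned afterwards. As a rough standard-library analogy (not how Streamlit implements its cache), memoizing a zero-argument loader with `functools.cache` gives the same load-once behavior; `load_model` and `load_count` are hypothetical stand-ins for the expensive `foreign_class` call.

```python
import functools

load_count = 0


@functools.cache
def load_model():
    """Stand-in for an expensive model load (hypothetical; not the SpeechBrain call)."""
    global load_count
    load_count += 1
    return object()


first = load_model()
second = load_model()
print(first is second, load_count)  # True 1
```

`functools.cache` does not memoize a call that raises, so a loader that fails (as `load_classifier` does when it raises `RuntimeError`) is simply retried on the next call.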
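`classify_accent` unpacks four values from `classify_file` and reports `score * 100` as the confidence. Assuming `score` is the top softmax probability (consistent with the 67.67% figures in `app.log`, though not checked here against the model's `custom_interface.py`), the step from raw class scores to a label and percentage looks like this in plain Python. The logits and label list are made up, and `top_prediction` is a hypothetical helper.

```python
import math


def top_prediction(logits: list[float], labels: list[str]) -> tuple[str, float]:
    """Return (label, confidence %) for the highest-probability class."""
    # Numerically stable softmax: subtract the largest logit before exponentiating
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best] * 100


print(top_prediction([2.0, 0.5, 0.1], ["england", "us", "indian"]))
```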
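`process_video_url` sets `video_path` and `audio_path` to `None` before the `try`, so the `finally` clause deletes only files that were actually created, whether classification succeeded or an earlier step raised. The sketch reproduces that shape with empty temporary files; `make_temp`, `pipeline`, the `created` list, and the `fail` flag are hypothetical and exist only so the cleanup can be observed.

```python
import os
import tempfile


def make_temp(suffix: str) -> str:
    """Create an empty temporary file and return its path (stand-in for download/extract)."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    return path


def pipeline(created: list, fail: bool = False) -> str:
    """Mirror process_video_url: create files, 'classify', and always clean up."""
    video_path = None
    audio_path = None
    try:
        video_path = make_temp("_video.mp4")
        created.append(video_path)
        audio_path = make_temp("_audio.wav")
        created.append(audio_path)
        if fail:
            raise RuntimeError("classification failed")
        return "ENGLAND"
    finally:
        # Runs on success and on error; None entries are steps that never ran
        for path in (audio_path, video_path):
            if path and os.path.exists(path):
                os.remove(path)
```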