Whisper Small – Bengali-English Code-Switching ASR

This model is a fine-tuned version of openai/whisper-small for automatic speech recognition (ASR) on Bengali-English code-switched audio.

It is trained to transcribe audio clips where the speaker switches between Bengali and English in natural conversation.

🧠 Model Details

Base Model: openai/whisper-small
Languages: Bengali (bn), English (en)
Fine-tuning task: Speech-to-text transcription
Use case: Lecture notes, interviews, social media, bilingual speech transcription
Training samples: 194 manually prepared code-switching audio chunks (~30s each)

📊 Evaluation

Metric	Score
WER	0.4123
CER	(your CER here)

Evaluation used a held-out validation split comprising 10% of the original dataset.
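The card does not say which tool produced these scores. As a rough illustration only, the sketch below computes WER and CER from their standard definitions (edit distance divided by reference length). The function names `edit_distance` and `error_rate` and the sample sentences are hypothetical and not part of this repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate: total edits / total reference units.

    unit="word" gives WER, unit="char" gives CER. Characters here are
    Unicode code points, so a Bengali conjunct counts as several units.
    """
    total_edits = 0
    total_units = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        total_edits += edit_distance(ref_units, hyp_units)
        total_units += len(ref_units)
    if total_units == 0:
        raise ValueError("references contain no units to score")
    return total_edits / total_units


# Made-up example pair (not taken from the dataset)
references = ["ami today class e jabo"]
hypotheses = ["ami today class jabo"]
wer = error_rate(references, hypotheses, unit="word")
cer = error_rate(references, hypotheses, unit="char")
```

Reported numbers can differ depending on text normalization (casing, punctuation, Unicode normalization), so the same normalization should be applied to references and predictions before scoring.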

📁 Files

config.json, pytorch_model.bin: Model configuration and fine-tuned weights
tokenizer.json, vocab.json, merges.txt: Whisper tokenizer
preprocessor_config.json: Feature extractor config

💡 Usage

You can use the model directly with transformers:

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio

model_id = "YOUR_USERNAME/whisper-small-benglish"

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load audio, mix down to mono, and resample to 16 kHz if needed
waveform, sr = torchaudio.load("your-audio.wav")
waveform = waveform.mean(dim=0)  # (channels, samples) -> (samples,)
if sr != 16000:
    resampler = torchaudio.transforms.Resample(sr, 16000)
    waveform = resampler(waveform)

# Extract log-mel features, generate token IDs, and decode them to text
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
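
The training clips are about 30 s each, and Whisper's feature extractor works on a 30-second window, so a longer recording has to be split before transcription. The following is a minimal sketch of one way to do that, assuming a mono 16 kHz waveform like the one prepared above. The helpers `split_into_chunks` and `transcribe_long` are hypothetical names introduced for illustration and are not shipped with this model.

```python
def split_into_chunks(samples, sample_rate=16000, chunk_seconds=30.0):
    """Split a 1-D sequence of audio samples into consecutive fixed-length chunks.

    The last chunk may be shorter than the others.
    """
    chunk_size = int(sample_rate * chunk_seconds)
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    return [samples[start:start + chunk_size]
            for start in range(0, len(samples), chunk_size)]


def transcribe_long(samples, processor, model, sample_rate=16000):
    """Transcribe each chunk separately and join the pieces with spaces.

    `processor` and `model` are the WhisperProcessor and
    WhisperForConditionalGeneration objects loaded in the usage example.
    """
    texts = []
    for chunk in split_into_chunks(samples, sample_rate):
        inputs = processor(chunk, sampling_rate=sample_rate, return_tensors="pt")
        predicted_ids = model.generate(inputs.input_features)
        decoded = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
        texts.append(decoded.strip())
    return " ".join(texts)


# Example call, reusing the variables from the usage example:
# full_text = transcribe_long(waveform.numpy(), processor, model)
```

Hard cuts at fixed offsets can split a word across two chunks, so overlapping windows or cutting at silences may give cleaner transcripts.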