Whisper Small – Bengali-English Code-Switching ASR

This model is a fine-tuned version of openai/whisper-small for automatic speech recognition (ASR) on Bengali-English code-switched audio.

It is trained to transcribe audio clips where the speaker switches between Bengali and English in natural conversation.

🧠 Model Details

  • Base Model: openai/whisper-small
  • Languages: Bengali (bn), English (en)
  • Fine-tuning task: Speech-to-text transcription
  • Use cases: Transcribing lectures, interviews, social media clips, and other bilingual speech
  • Training samples: 194 manually prepared code-switching audio chunks (~30s each)

📊 Evaluation

Metric Score
WER 0.4123
CER Not reported

Evaluation was run on a held-out validation set made up of 10% of the original dataset.
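For readers who want to compute comparable scores on their own transcripts, here is a minimal sketch of word error rate (WER) and character error rate (CER) based on Levenshtein edit distance. It is not the evaluation script used for this model. The function names (edit_distance, wer, cer) are illustrative, and the sketch assumes plain whitespace tokenization with no text normalization, which may differ from how the score above was obtained.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pair, for illustration only.
    reference = "ami kal meeting e jabo"
    hypothesis = "ami kal meeting jabo"
    print(f"WER: {wer(reference, hypothesis):.4f}")
    print(f"CER: {cer(reference, hypothesis):.4f}")
```

Scores over a whole validation set are usually computed by summing edit distances and reference lengths across all clips before dividing, rather than averaging per-clip rates.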

πŸ“ Files

  • config.json: Model configuration
  • pytorch_model.bin: Fine-tuned weights
  • tokenizer.json, vocab.json, merges.txt: Whisper tokenizer
  • preprocessor_config.json: Feature extractor config

💡 Usage

You can use the model directly with transformers:

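import torchaudio
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "YOUR_USERNAME/whisper-small-benglish"

# Load the feature extractor/tokenizer and the fine-tuned model
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load audio; waveform has shape (channels, samples)
waveform, sr = torchaudio.load("your-audio.wav")

# Mix down to mono so the processor receives a 1-D array, even for stereo files
waveform = waveform.mean(dim=0)

# Whisper expects 16 kHz input, so resample if needed
if sr != 16000:
    resampler = torchaudio.transforms.Resample(sr, 16000)
    waveform = resampler(waveform)

# Convert the waveform to log-Mel input features and generate token IDs
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)

# Decode the token IDs into text
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)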
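Whisper's feature extractor pads or truncates each input to a 30-second window, so the snippet above only transcribes the first 30 seconds of a longer recording. One simple workaround, sketched below, is to cut the 16 kHz mono waveform into consecutive 30-second pieces and transcribe each piece separately. The helper name split_into_chunks and its parameters are hypothetical, not part of transformers or torchaudio, and the sketch uses hard cuts with no overlap, so a word that straddles a boundary may be split.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper's expected sampling rate
CHUNK_SECONDS = 30     # length of Whisper's input window


def split_into_chunks(audio, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split a 1-D mono waveform into consecutive chunks of at most chunk_seconds."""
    audio = np.asarray(audio)
    if audio.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    chunk_len = int(sample_rate * chunk_seconds)
    # The final chunk may be shorter; the processor pads it to 30 seconds.
    return [audio[start:start + chunk_len] for start in range(0, len(audio), chunk_len)]


if __name__ == "__main__":
    # 70 seconds of silence as a stand-in for a real recording.
    dummy = np.zeros(70 * SAMPLE_RATE, dtype=np.float32)
    for i, chunk in enumerate(split_into_chunks(dummy)):
        print(f"chunk {i}: {len(chunk) / SAMPLE_RATE:.1f} s")
```

Each chunk can then be passed through processor and model.generate exactly as in the snippet above, and the decoded strings joined with spaces to form the full transcript.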