# Whisper Small – Bengali-English Code-Switching ASR
This model is a fine-tuned version of openai/whisper-small
for automatic speech recognition (ASR) on Bengali-English code-switched audio.
It is trained to transcribe audio clips where the speaker switches between Bengali and English in natural conversation.
## Model Details
- Base Model: openai/whisper-small
- Languages: Bengali (bn), English (en)
- Fine-tuning task: Speech-to-text transcription
- Use cases: lecture notes, interviews, social media audio, bilingual speech transcription
- Training samples: 194 manually prepared code-switching audio chunks (~30 s each; see the chunking sketch below)
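The exact preprocessing script is not included in this repository. As a rough, hypothetical sketch of how ~30 s training chunks could be cut from longer recordings (the input path `long_recording.wav` and the `chunks/` output directory are placeholders, not files shipped with the model):

```python
import os
import torchaudio

# Hypothetical preprocessing sketch, NOT the script used to build the actual dataset.
CHUNK_SECONDS = 30
TARGET_SR = 16000  # Whisper expects 16 kHz mono audio

waveform, sr = torchaudio.load("long_recording.wav")  # placeholder input file
if sr != TARGET_SR:
    waveform = torchaudio.transforms.Resample(sr, TARGET_SR)(waveform)
waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono

os.makedirs("chunks", exist_ok=True)
chunk_len = CHUNK_SECONDS * TARGET_SR
for i in range(0, waveform.shape[1], chunk_len):
    chunk = waveform[:, i : i + chunk_len]
    torchaudio.save(f"chunks/chunk_{i // chunk_len:04d}.wav", chunk, TARGET_SR)
```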
## Evaluation
| Metric | Score |
|--------|-------|
| WER    | 0.4123 |
| CER    | (your CER here) |
Evaluation was done on a 10% held-out validation set from the original dataset.
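The evaluation script is not published with this card. As a minimal sketch, WER and CER values of this kind can be computed with the `jiwer` library; the transcripts below are placeholders, not data from the actual validation split:

```python
from jiwer import wer, cer

# Placeholder transcripts; in practice these come from the held-out
# validation split and the model's predictions.
references = ["ami kal office e jabo after lunch"]
hypotheses = ["ami kal office jabo after lunch"]

print("WER:", wer(references, hypotheses))
print("CER:", cer(references, hypotheses))
```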
## Files
- `config.json`, `pytorch_model.bin`: model configuration and fine-tuned weights
- `tokenizer.json`, `vocab.json`, `merges.txt`: Whisper tokenizer files
- `preprocessor_config.json`: feature extractor configuration
## Usage
You can use the model directly with `transformers`:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio

model_id = "YOUR_USERNAME/whisper-small-benglish"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load audio, downmix to mono, and resample to 16 kHz if needed
waveform, sr = torchaudio.load("your-audio.wav")
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)
if sr != 16000:
    waveform = torchaudio.transforms.Resample(sr, 16000)(waveform)

# Extract log-Mel features and generate the transcription
inputs = processor(waveform.squeeze().numpy(), sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
```
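Alternatively, the checkpoint should also work with the high-level `pipeline` API, which handles decoding, resampling, and chunking of long files for you (the model id below is the same placeholder as above):

```python
from transformers import pipeline

# "YOUR_USERNAME/whisper-small-benglish" is a placeholder model id, as above.
asr = pipeline(
    "automatic-speech-recognition",
    model="YOUR_USERNAME/whisper-small-benglish",
    chunk_length_s=30,  # process long recordings in 30 s windows
)

result = asr("your-audio.wav")
print(result["text"])
```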