Whisper Small - Arabic Multi-Dialect
Fine-tuned Whisper Small model for Arabic speech recognition across multiple dialects.
Model Details
Model Description
This is a fine-tuned version of OpenAI's Whisper Small model, trained on Arabic multi-dialect speech data for automatic speech recognition tasks.
- Developed by: MadLook
- Model type: Whisper (Encoder-Decoder Transformer)
- Language(s): Arabic
- License: Apache 2.0
- Finetuned from model: openai/whisper-small
Model Sources
- Repository: https://huggingface.co/MadLook/whisper-small-arabic-multidialect
- Paper: Robust Speech Recognition via Large-Scale Weak Supervision (https://arxiv.org/abs/2212.04356)
Uses
Direct Use
This model can be used for automatic transcription of Arabic speech across multiple dialects. It processes audio files and outputs Arabic text transcriptions.
Downstream Use
Can be integrated into:
- Voice assistants for Arabic speakers
- Subtitle generation systems (see the timestamped-transcription sketch after this list)
- Voice-to-text applications
- Arabic language learning tools
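For the subtitle use case in particular, the Transformers ASR pipeline can return segment-level timestamps. The sketch below assumes a placeholder audio file and prints raw seconds; a real SRT writer would convert these to HH:MM:SS,mmm.

```python
from transformers import pipeline

# chunk_length_s enables long-form audio; return_timestamps adds segment times.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="MadLook/whisper-small-arabic-multidialect",
    chunk_length_s=30,
    return_timestamps=True,
)

result = transcriber("arabic_audio.mp3")

# Each chunk carries a (start, end) pair in seconds plus its text
# (end may be None for a trailing chunk; guard before formatting).
for i, chunk in enumerate(result["chunks"], start=1):
    start, end = chunk["timestamp"]
    print(f"{i}\n{start:.2f} --> {end:.2f}\n{chunk['text'].strip()}\n")
```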
Out-of-Scope Use
- Not suitable for production-critical applications without further validation
- Not designed for languages other than Arabic
- Not recommended for medical or legal transcription requiring high accuracy
Bias, Risks, and Limitations
- Moderate accuracy: 48.85% WER on the validation set, meaning roughly half of output words are substituted, deleted, or inserted relative to the reference
- Performance varies across different Arabic dialects
- Best results on clear, high-quality audio
- Trained on a limited dataset (a 40% subset of the full corpus)
- May not generalize well to domain-specific vocabulary
Recommendations
Users should validate performance on their specific use case before deployment. Consider additional fine-tuning for specific dialects or domains.
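One minimal way to follow this recommendation is to transcribe a handful of in-domain recordings and score them against reference transcripts before committing to the model. The file names and reference strings below are placeholders, and jiwer is just one common choice of WER library.

```python
import jiwer
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="MadLook/whisper-small-arabic-multidialect",
)

# Placeholder (audio path, reference transcript) pairs from your own domain.
samples = [
    ("call_01.mp3", "مرحبا كيف حالك"),
    ("call_02.mp3", "شكرا جزيلا"),
]

predictions = [transcriber(path)["text"] for path, _ in samples]
references = [ref for _, ref in samples]

# Corpus-level word error rate across all samples.
print("WER:", jiwer.wer(references, predictions))
```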
How to Get Started with the Model
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="MadLook/whisper-small-arabic-multidialect"
)
result = transcriber("arabic_audio.mp3")
print(result["text"])
Or with more control:
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import librosa

# Load model and processor
model = WhisperForConditionalGeneration.from_pretrained("MadLook/whisper-small-arabic-multidialect")
processor = WhisperProcessor.from_pretrained("MadLook/whisper-small-arabic-multidialect")

# Load audio, resampled to the 16 kHz rate Whisper expects
audio, sr = librosa.load("arabic_audio.mp3", sr=16000)

# Convert the waveform to log-mel input features
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

# Generate transcription
with torch.no_grad():
    predicted_ids = model.generate(input_features)

# Decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
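Whisper normally detects the spoken language from the first 30 seconds of audio, which can misfire on noisy clips. Recent Transformers releases accept language and task arguments on generate (older versions instead use processor.get_decoder_prompt_ids with forced_decoder_ids); continuing from the snippet above:

```python
# Pin decoding to Arabic transcription instead of relying on language detection.
with torch.no_grad():
    predicted_ids = model.generate(
        input_features,
        language="arabic",
        task="transcribe",
    )
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```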
Training Details
Training Data
Trained on a 40% subset of an Arabic multi-dialect speech dataset:
- Training samples: 37,835
- Validation samples: 2,628
- Test samples: 2,628
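The card does not name the underlying corpus, so the dataset id below is a hypothetical placeholder; the snippet only illustrates one reproducible way to carve out a 40% subset with the datasets library.

```python
from datasets import load_dataset

# Hypothetical dataset id; substitute the actual Arabic multi-dialect corpus.
ds = load_dataset("your-org/arabic-multidialect-speech", split="train")

# Shuffle with a fixed seed, then keep the first 40% of examples.
subset = ds.shuffle(seed=42).select(range(int(0.4 * len(ds))))
print(len(subset))
```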
Training Procedure
Training Hyperparameters
- Training epochs: 7
- Training batch size: 12
- Evaluation batch size: 12
- Learning rate: 1e-5
- Warmup steps: 300
- Optimizer: AdamW
- LR scheduler: Cosine
- Training regime: fp16 mixed precision
- Gradient checkpointing: Enabled
- Gradient accumulation steps: 1
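These values map directly onto Transformers' Seq2SeqTrainingArguments. The sketch below mirrors the listed hyperparameters; the output directory, the optimizer flag spelling, and predict_with_generate are assumptions about the original setup.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-arabic-multidialect",  # placeholder
    num_train_epochs=7,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    learning_rate=1e-5,
    warmup_steps=300,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    fp16=True,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    predict_with_generate=True,  # decode during eval so WER can be computed
)
```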
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluation was performed on the validation split of the Arabic multi-dialect dataset (2,628 samples).
Metrics
- WER (Word Error Rate): Primary metric
- CER (Character Error Rate): Secondary metric
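For reference, WER is word-level Levenshtein distance normalized by reference length: WER = (S + D + I) / N, where S, D, and I count substituted, deleted, and inserted words against a reference transcript of N words. CER applies the same formula at the character level, which tends to be more forgiving for Arabic, where orthographic variants (hamza forms, diacritics) inflate word-level mismatches.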
Results
Validation Set Performance:
- WER: 48.85%
Technical Specifications
Model Architecture and Objective
- Architecture: Whisper Small (Encoder-Decoder Transformer)
- Parameters: ~244M
- Objective: Sequence-to-sequence speech recognition
- Input: 80-channel log-mel spectrogram
- Output: Arabic text tokens
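To see the input representation concretely, the processor's feature extractor pads or trims audio to 30 seconds and produces the 80-channel log-mel tensor checked below (the audio path is a placeholder).

```python
import librosa
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("MadLook/whisper-small-arabic-multidialect")

# 16 kHz mono audio; Whisper pads/trims it internally to 30 seconds.
audio, _ = librosa.load("arabic_audio.mp3", sr=16000)
features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

print(features.shape)  # torch.Size([1, 80, 3000]): 80 mel bins x 3000 frames
```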
Compute Infrastructure
Hardware
- GPU: CUDA-enabled GPU with 25 GB VRAM
- Training time: ~6 hours
Software
- Framework: Transformers + PyTorch
- Precision: FP16 mixed precision training
Citation
BibTeX:
@article{radford2022whisper,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  journal={arXiv preprint arXiv:2212.04356},
  year={2022}
}
Model Card Authors
MadLook