Automatic Speech Recognition for Shona

Model Description 🫐

This model is a fine-tuned version of Wav2Vec2-BERT 2.0 for Shona automatic speech recognition (ASR). It was trained on 72 hours of transcribed Shona speech. The model is reported to be robust, with an in-domain word error rate (WER) below 23%.
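For readers who want to reproduce a WER figure on their own data, the metric is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The following is a minimal sketch in plain Python, not the evaluation script used for this model; the function name `word_error_rate` and the placeholder sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# placeholder tokens, not real Shona text:
# one substitution (b -> x) and one deletion (d) over 4 reference words
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Text normalization (casing, punctuation) affects the result, so the same normalization should be applied to references and hypotheses before comparing scores.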

Developed by: Badr al-Absi
Model type: Speech Recognition (ASR)
Language: Shona (sn)
License: CC-BY-4.0
Finetuned from: facebook/w2v-bert-2.0

Direct Use

The model can be used directly for automatic speech recognition of Shona audio:

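from transformers import Wav2Vec2BertProcessor, Wav2Vec2BertForCTC
import torch
import torchaudio

# load model and processor
processor = Wav2Vec2BertProcessor.from_pretrained("badrex/w2v-bert-2.0-shona-asr")
model = Wav2Vec2BertForCTC.from_pretrained("badrex/w2v-bert-2.0-shona-asr")

# load audio; torchaudio returns a (channels, samples) tensor
waveform, sample_rate = torchaudio.load("path/to/audio.wav")

# downmix to mono and resample to the rate the feature extractor expects
waveform = waveform.mean(dim=0)
target_rate = processor.feature_extractor.sampling_rate
if sample_rate != target_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, target_rate)

# preprocess
inputs = processor(waveform.numpy(), sampling_rate=target_rate, return_tensors="pt")

# inference
with torch.no_grad():
    logits = model(**inputs).logits

# decode: pick the most likely token per frame, then collapse to text
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)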

Downstream Use

This model can be used as a foundation for:

building voice assistants for Shona speakers
transcription services for Shona content
accessibility tools for Shona-speaking communities
research in low-resource speech recognition

Model Architecture

Base model: Wav2Vec2-BERT 2.0
Architecture: transformer-based with convolutional feature extractor
Parameters: ~600M (inherited from base model)
Objective: connectionist temporal classification (CTC)
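Because the model is trained with a CTC objective, it emits one token prediction per audio frame, and a decoding step turns that frame sequence into text by merging consecutive repeats and dropping the blank token. The sketch below illustrates that collapse rule on integer IDs in plain Python. It is an illustration of greedy CTC decoding in general, not this model's tokenizer code; the function name `ctc_greedy_collapse` and the choice of `blank_id=0` are assumptions, since the actual blank ID depends on the model's vocabulary.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse per-frame argmax IDs into a token sequence.

    blank_id=0 is an assumption for this example; check the
    tokenizer's vocabulary for the real blank/pad ID.
    """
    # step 1: merge runs of identical consecutive IDs
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    # step 2: drop the blank token, which only marks boundaries
    return [token_id for token_id in merged if token_id != blank_id]


# the blank between the two runs of 1 keeps them as separate tokens
print(ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

The blank token is what lets CTC output a doubled character: without a blank between the two runs of `1`, they would merge into a single token.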

Funding

The development of this model was supported by CLEAR Global and the Gates Foundation.

Citation

@misc{w2v_bert_shona_asr,
  author = {Badr M. Abdullah},
  title = {Adapting Wav2Vec2-BERT 2.0 for Shona ASR},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/badrex/w2v-bert-2.0-shona-asr}
}

Model Card Contact

For questions or issues, please open a thread in the community discussion section of the Hugging Face model repository.
