metadata
license: apache-2.0
datasets:
  - Chillarmo/common_voice_20_armenian
language:
  - hy
base_model:
  - openai/whisper-base
pipeline_tag: automatic-speech-recognition
library_name: transformers
model-index:
  - name: whisper-base-armenian
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: Common Voice 20 Armenian
          type: Chillarmo/common_voice_20_armenian
        metrics:
          - type: wer
            value: 33.186880780299205
            name: Word Error Rate
          - type: cer
            value: 6.983058800639766
            name: Character Error Rate
          - type: bleu
            value: 47.70616594276946
            name: BLEU Score
          - type: exact_match
            value: 16.49590163934426
            name: Exact Match

Whisper-Base Fine-tuned for Armenian ASR

This model is OpenAI's Whisper-base, fine-tuned for automatic speech recognition (ASR) on the Common Voice 20 Armenian dataset.

Training Results

The model was trained for 5.34 epochs with the following final results:

| Metric | Value |
|---|---|
| Training Loss | 0.122 |
| Training Runtime | 10,924 seconds (≈3.03 hours) |
| Training Samples/Second | 7.32 |
| Training Steps/Second | 0.46 |
| Total Training Steps | 5,000 |
| Epochs | 5.34 |
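
The throughput figures reported above can be cross-checked with simple arithmetic. The sketch below uses only the reported values; the derived samples-per-step and steps-per-epoch numbers are inferences from those values, not hyperparameters stated in this card.

```python
# Values copied from the training results reported above.
runtime_s = 10_924
samples_per_s = 7.32
total_steps = 5_000
epochs = 5.34

hours = runtime_s / 3600                  # ≈ 3.03 hours
steps_per_s = total_steps / runtime_s     # ≈ 0.46, matching the reported rate

# Inferred, not reported: samples processed per optimizer step.
samples_per_step = samples_per_s * runtime_s / total_steps   # ≈ 15.99

# Inferred, not reported: optimizer steps in one pass over the training set.
steps_per_epoch = total_steps / epochs    # ≈ 936

print(f"{hours:.2f} h, {steps_per_s:.2f} steps/s, "
      f"{samples_per_step:.1f} samples/step, {steps_per_epoch:.0f} steps/epoch")
```

A samples-per-step value close to 16 would be consistent with an effective batch size of 16, but the card does not state the batch size or gradient accumulation settings.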

Evaluation Results

| Metric | Value |
|---|---|
| Evaluation Loss | 0.201 |
| Word Error Rate (WER) | 33.19% |
| Character Error Rate (CER) | 6.98% |
| BLEU Score | 47.71 |
| Exact Match | 16.50% |
| Average Prediction Length | 7.69 tokens |
| Average Label Length | 7.77 tokens |
| Length Ratio | 0.989 |
| Evaluation Runtime | 1,590 seconds (≈26.5 minutes) |
| Evaluation Samples/Second | 3.68 |
| Evaluation Steps/Second | 0.46 |
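
For readers unfamiliar with the error-rate metrics above, the following is a minimal sketch of their conventional definitions: the Levenshtein (edit) distance between hypothesis and reference, divided by the reference length, counted over words for WER and over characters for CER. It is not the evaluation script used for this model. The card does not state which library or text normalization was used, and corpus-level scores are commonly computed as total edits over total reference length rather than as a mean of per-sentence values.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance; words are split on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance; spaces count as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def exact_match(reference: str, hypothesis: str) -> bool:
    """True only when the hypothesis reproduces the reference exactly."""
    return reference == hypothesis
```

A single wrong letter makes a whole word count as an error for WER but only one character for CER. This is why a model can show a much lower CER than WER, as in the results above, and why Exact Match is the strictest of the three.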

Model Details

  • Base Model: openai/whisper-base
  • Language: Armenian (hy)
  • Dataset: Chillarmo/common_voice_20_armenian
  • License: Apache 2.0
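
Given the details above, a minimal inference sketch with the Transformers `pipeline` API might look like the following. The repository id `Chillarmo/whisper-base-armenian` is an assumption inferred from the model name in this card, and `transcribe` is a hypothetical helper name; substitute the actual Hub id or a local checkpoint directory.

```python
# Assumption: repository id inferred from the model name in this card.
MODEL_ID = "Chillarmo/whisper-base-armenian"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file to Armenian text (illustrative helper)."""
    # Imported here so the heavy dependency is loaded only when the helper runs.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Pin language and task so Whisper does not rely on language detection.
    result = asr(
        audio_path,
        generate_kwargs={"language": "armenian", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("sample.wav")` with a path to a local recording (a placeholder name here) returns the transcription as a string; decoding audio from a file path requires ffmpeg. If the repository contains only the model weights, the tokenizer and feature extractor may need to be supplied from `openai/whisper-base` through the pipeline's `tokenizer` and `feature_extractor` arguments.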

Notes

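When the model was loaded, the checkpoint was reported as missing the key ['proj_out.weight']. This warning is common with fine-tuned Whisper models: in the Transformers implementation, the output projection shares its weights with the decoder token embeddings, so the key can be absent from the saved checkpoint and is typically restored by weight tying at load time, without a significant effect on performance.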