daniel_whisper_finetune_base_v2

This model is a fine-tuned version of openai/whisper-base on a small single-speaker dataset (see Training and evaluation data below). It achieves the following result on the evaluation set:

Validation loss: 0.3315

Model description

This is a personal fine-tune of the Whisper base model, trained on approximately 1 hour of audio featuring Daniel Rosehill's voice. The training data includes domain-specific vocabulary focused on:

- Technology and software development terminology
- A few Hebrew words and phrases

This model was created as a proof of concept for fine-tuning Whisper for personal use, with the aim of improving transcription accuracy on domain-specific content.

Training Infrastructure

Fine-tuning was performed on Modal's cloud GPU infrastructure.

Converted Formats

In addition to the standard SafeTensors format, this repository includes converted model formats in the converted/ directory:

GGML format (converted/ggml/): For use with whisper.cpp
- Cross-platform inference (desktop, mobile, edge devices)
- Optimized for CPU and CUDA (NVIDIA GPU) acceleration
- Compatible with iOS, Android, Raspberry Pi, and other platforms
CTranslate2 format (converted/ctranslate2/): For use with faster-whisper
- Optimized inference engine (the faster-whisper project reports up to 4x the speed of the reference openai/whisper implementation at the same accuracy)
- CPU and NVIDIA GPU (CUDA) support
- Lower memory usage through optional 8-bit and 16-bit quantization
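A minimal sketch of consuming the two converted formats from Python. It assumes faster-whisper and huggingface_hub are installed for the CTranslate2 path, and that a whisper.cpp build providing the `whisper-cli` binary (older builds call it `main`) is on PATH for the GGML path. The GGML file name `ggml-model.bin` is hypothetical, because this card does not list the files inside converted/ggml/; check the repository before use.

```python
"""Sketch: consuming the converted/ formats of this repository.

Assumptions (not confirmed by the model card): GGML_FILE is a hypothetical
name, and converted/ctranslate2/ is assumed to be a complete CTranslate2
model directory that faster-whisper can load.
"""
from pathlib import Path

REPO_ID = "danielrosehill/daniel_whisper_finetune_base_v2"
# Hypothetical file name; check the actual contents of converted/ggml/.
GGML_FILE = "ggml-model.bin"


def whisper_cpp_command(repo_dir: str, audio_wav: str, language: str = "en") -> list[str]:
    """Build (but do not run) a whisper.cpp CLI invocation.

    whisper-cli expects 16-bit, 16 kHz WAV input by default.
    """
    model_path = Path(repo_dir) / "converted" / "ggml" / GGML_FILE
    return ["whisper-cli", "-m", str(model_path), "-f", audio_wav, "-l", language]


def faster_whisper_compute_type(device: str) -> str:
    """Pick a quantized compute type: float16 on CUDA, int8 otherwise."""
    return "float16" if device == "cuda" else "int8"


def transcribe_with_faster_whisper(audio_path: str, device: str = "cpu") -> str:
    """Download only the CTranslate2 conversion and transcribe one file."""
    # Imported lazily so the helpers above work without these packages installed.
    from huggingface_hub import snapshot_download
    from faster_whisper import WhisperModel

    local_repo = snapshot_download(REPO_ID, allow_patterns=["converted/ctranslate2/*"])
    model = WhisperModel(
        str(Path(local_repo) / "converted" / "ctranslate2"),
        device=device,
        compute_type=faster_whisper_compute_type(device),
    )
    segments, _info = model.transcribe(audio_path, beam_size=5)
    # `segments` is a lazy generator; joining it runs the actual decoding.
    return " ".join(segment.text.strip() for segment in segments)
```

For example, `transcribe_with_faster_whisper("talk.wav")` returns the joined segment text, while `subprocess.run(whisper_cpp_command(local_repo, "talk.wav"), check=True)` would hand the GGML file to whisper.cpp once converted/ggml/ has been downloaded to `local_repo`. int8 on CPU and float16 on CUDA are common faster-whisper choices, not settings prescribed by this card.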

Intended uses & limitations

This model is optimized for:

- Transcribing Daniel Rosehill's voice
- Technical and software development content
- Mixed English with occasional Hebrew terms

Limitations:

- Performance may degrade on voices that differ significantly from the training speaker
- Accuracy gains are likely confined to the vocabulary and accent patterns present in the training set
- Best suited for personal use rather than general-purpose transcription
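For the intended use above, a minimal sketch of transcription with the SafeTensors weights through the transformers automatic-speech-recognition pipeline. It assumes transformers and torch are installed and that ffmpeg is available for decoding audio files. Pinning the language to English is an illustrative choice for mostly-English speech, not a setting documented for this model.

```python
"""Sketch: transcribing with the SafeTensors weights via transformers.

Assumes the repository loads like any other Whisper checkpoint on the
Hugging Face Hub; the language/task settings are illustrative choices.
"""

REPO_ID = "danielrosehill/daniel_whisper_finetune_base_v2"


def asr_generate_kwargs(language: str = "english") -> dict:
    """Pin the decoding language and task instead of relying on
    Whisper's automatic language detection."""
    return {"language": language, "task": "transcribe"}


def transcribe(audio_path: str, device: int = -1) -> str:
    # Lazy import so the helper above is usable without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=REPO_ID,
        device=device,      # -1 = CPU, 0 = first CUDA GPU
        chunk_length_s=30,  # split long audio into Whisper-sized 30 s chunks
    )
    result = asr(audio_path, generate_kwargs=asr_generate_kwargs())
    return result["text"]
```

Calling `transcribe("talk.wav")` returns the transcript as a string, and `device=0` selects the first CUDA GPU. `chunk_length_s=30` lets the pipeline handle recordings longer than Whisper's 30-second input window.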

Training and evaluation data

Training dataset consisted of approximately 1 hour of recorded audio featuring:

- Technical discussions and software development content
- Mixed English with occasional Hebrew vocabulary
- Single speaker (Daniel Rosehill)

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 6
eval_batch_size: 6
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 12
optimizer: AdamW (fused PyTorch implementation, OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 50
training_steps: 200
mixed_precision_training: Native AMP
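The learning-rate settings above can be made concrete with a short worked example. This sketch reproduces the formula of transformers' `get_linear_schedule_with_warmup`, which `lr_scheduler_type: linear` selects: a linear ramp from 0 to 8e-06 over the first 50 steps, then a linear decay to 0 at step 200. It also checks that total_train_batch_size equals train_batch_size × gradient_accumulation_steps, assuming a single device.

```python
# Hyperparameters copied from the list above.
PEAK_LR = 8e-06
WARMUP_STEPS = 50
TRAINING_STEPS = 200
PER_DEVICE_BATCH = 6
GRAD_ACCUM_STEPS = 2


def learning_rate(step: int) -> float:
    """Learning rate applied at optimizer step `step` (0-indexed)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to PEAK_LR.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Linear decay from PEAK_LR down to 0 at TRAINING_STEPS.
    remaining = TRAINING_STEPS - step
    return PEAK_LR * max(0.0, remaining / max(1, TRAINING_STEPS - WARMUP_STEPS))


# Examples per optimizer update (single device assumed).
TOTAL_TRAIN_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

for step in (0, 25, 50, 100, 150, 200):
    print(step, f"{learning_rate(step):.2e}")
```

The peak of 8e-06 is reached at step 50, where the first validation loss was logged; by step 100 the rate has decayed to about 5.33e-06 and by step 150 to about 2.67e-06.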

Training results

Training Loss	Epoch	Step	Validation Loss
1.7005	0.9901	50	0.7727
0.3845	1.9703	100	0.4044
0.212	2.9505	150	0.3443
0.1624	3.9307	200	0.3315

Framework versions

Transformers 4.57.1
PyTorch 2.9.1+cu128
Datasets 4.4.1
Tokenizers 0.22.1


Model size

72.6M params (Safetensors, tensor type F32)


Collection

This model is part of the collection My Whisper Fine-Tunes (V2), 5 items, described as: "Whisper fine-tunes for my voice and vocab (tech, Hebrew). About 1 hour of training data so still very much POCs!"
