Parakeet-DE-Med: German Medical ASR

Fine-tuned NVIDIA Parakeet-TDT-0.6B for German medical documentation transcription.

This model is a fine-tuned derivative of nvidia/parakeet-tdt-0.6b-v3, which is licensed under CC-BY-4.0.

Model Description

This model is a parameter-efficient fine-tuned (PEFT) version of NVIDIA's Parakeet-TDT-0.6B-v3, specialized for German medical documentation (Arztbriefe, i.e. physicians' letters). It uses a decoder+joint training strategy: the encoder is frozen and only the prediction network (decoder) and the joint network are trained, which amounts to 2.89% of the model parameters. On the medical test set this lowers the word error rate (WER) from 11.73% to 3.28%.

Base Model: nvidia/parakeet-tdt-0.6b-v3
Language: German (de-DE)
Domain: Medical documentation
Training Method: PEFT (decoder+joint strategy)
Parameters Trained: 18.1M / 627M (2.89%)

Performance

Evaluated on a German medical documentation test set (122 samples):

Model	WER
Base Parakeet-TDT-0.6B	11.73%
Parakeet-DE-Med	3.28%

Improvement: 72% relative WER reduction (11.73% → 3.28%, i.e. 8.45 percentage points absolute)
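WER counts word substitutions, deletions, and insertions against a reference transcript, divided by the number of reference words. The sketch below is a plain word-level Levenshtein implementation for illustration only. It applies no text normalization (casing, punctuation, number formatting), and the normalization behind the reported numbers is not stated here, so its output is not directly comparable to the table. The German example sentences are made up. The last line reproduces the relative-reduction arithmetic.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words (one DP row at a time)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative improvement of a lower-is-better metric."""
    return (baseline - improved) / baseline


ref = "Patient klagt über Dyspnoe und Thoraxschmerzen"
hyp = "Patient klagt über Dispnoe und Thorax Schmerzen"
print(f"{word_error_rate(ref, hyp):.3f}")        # 0.500 (2 substitutions + 1 insertion over 6 words)
print(f"{relative_reduction(11.73, 3.28):.1%}")  # 72.0%
```

Note how splitting a compound noun ("Thoraxschmerzen" → "Thorax Schmerzen") costs two errors at the word level, which is one reason German medical compounds are hard for general-purpose ASR.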

Training Details

Training Data: 976 German medical documentation samples
Training Epochs: 5
Training Strategy: Freeze encoder, train decoder and joint network only
Precision: BF16 mixed precision
Batch Size: 4 (effective batch size 16 with gradient accumulation)
Learning Rate: 2e-4
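The hyperparameters above imply a fairly small number of optimizer updates. A sketch of that arithmetic, assuming training on a single device (so the effective batch size of 16 comes from accumulating 4 micro-batches of 4) and that no samples are dropped:

```python
import math

# Values taken from the Training Details above
num_samples = 976
micro_batch_size = 4
effective_batch_size = 16
epochs = 5
trainable_params, total_params = 18.1e6, 627e6

# Gradient accumulation: gradients of several micro-batches are summed
# before a single optimizer step (assumes one device, no data parallelism)
accumulation_steps = effective_batch_size // micro_batch_size
micro_batches_per_epoch = math.ceil(num_samples / micro_batch_size)
optimizer_steps_per_epoch = math.ceil(micro_batches_per_epoch / accumulation_steps)
total_optimizer_steps = optimizer_steps_per_epoch * epochs

print(accumulation_steps, micro_batches_per_epoch, optimizer_steps_per_epoch, total_optimizer_steps)
# 4 244 61 305
print(f"trainable fraction: {trainable_params / total_params:.2%}")
# trainable fraction: 2.89%
```

Around 300 updates at a learning rate of 2e-4 is a short schedule, which is consistent with training only the small decoder and joint network on top of a frozen encoder.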

Usage

Prerequisites

pip install "nemo_toolkit[asr]"
# Or install the latest development version from GitHub:
pip install "nemo_toolkit[asr] @ git+https://github.com/NVIDIA/NeMo.git@main"

Basic Transcription

import nemo.collections.asr as nemo_asr

# Load the model from HuggingFace
model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")

# Transcribe a single audio file
transcription = model.transcribe(["path/to/audio.wav"])
# Recent NeMo versions return Hypothesis objects; the transcript is in .text
print(transcription[0].text)
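The return type of transcribe() has changed across NeMo releases. The helper below is a hedged sketch that assumes the output is either a list of Hypothesis-like objects with a .text attribute, a list of plain strings, or (in older releases, for transducer models) a tuple of best and all hypotheses. The name to_texts is hypothetical and not part of NeMo.

```python
from typing import Any, List


def to_texts(output: Any) -> List[str]:
    """Normalize the return value of model.transcribe() to a list of strings.

    Assumption: the output is a list of objects with a .text attribute,
    a list of plain strings, or a (best_hypotheses, all_hypotheses) tuple.
    """
    if isinstance(output, tuple):
        output = output[0]  # keep only the best hypothesis per file
    return [item.text if hasattr(item, "text") else str(item) for item in output]
```

Usage: `print(to_texts(model.transcribe(["path/to/audio.wav"]))[0])`.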

Batch Transcription

import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")

# Transcribe multiple files
audio_files = [
    "patient_recording_1.wav",
    "patient_recording_2.wav",
    "patient_recording_3.wav"
]

transcriptions = model.transcribe(audio_files, batch_size=4)
for path, hyp in zip(audio_files, transcriptions):
    print(f"{path}: {hyp.text}")
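To transcribe every recording in a folder rather than a hand-written list, the file list can be built with pathlib. collect_audio_files is a hypothetical helper, and restricting it to .wav and .flac is an assumption you can adjust.

```python
from pathlib import Path
from typing import List

AUDIO_SUFFIXES = {".wav", ".flac"}  # assumption: lossless formats only


def collect_audio_files(folder: str) -> List[str]:
    """Return audio file paths in a folder (non-recursive), sorted for a reproducible order."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
```

Usage, with a hypothetical folder name: `transcriptions = model.transcribe(collect_audio_files("recordings"), batch_size=4)`. Sorting keeps the output order stable between runs, which makes it easier to match transcripts back to files.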

Transcribing In-Memory Audio (NumPy Arrays)

import nemo.collections.asr as nemo_asr
import soundfile as sf

model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")

# Load audio file (shape: frames, or frames x channels for multi-channel audio)
audio, sample_rate = sf.read("medical_dictation.wav")

# Downmix multi-channel audio to mono
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Resample to 16 kHz if needed
if sample_rate != 16000:
    import librosa
    audio = librosa.resample(audio, orig_sr=sample_rate, target_sr=16000)

# Transcribe from a float32 NumPy array (passed as a one-element list)
transcription = model.transcribe([audio.astype("float32")], batch_size=1)
print(transcription[0].text)
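soundfile returns float64 samples by default, but integer PCM (for example from sf.read(..., dtype="int16") or a raw microphone buffer) is normally scaled to floating point in [-1.0, 1.0) before transcription. A minimal NumPy sketch, assuming signed integer samples and soundfile's (frames, channels) layout; to_mono_float32 is a hypothetical helper name:

```python
import numpy as np


def to_mono_float32(audio: np.ndarray) -> np.ndarray:
    """Scale signed integer PCM to float32 in [-1.0, 1.0) and downmix to mono.

    Assumes shape (frames,) for mono and (frames, channels) otherwise.
    """
    if audio.dtype.kind == "i":  # signed integer PCM, e.g. int16
        # int16 has range [-32768, 32767], so divide by 32768
        audio = audio.astype(np.float32) / float(-np.iinfo(audio.dtype).min)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average the channels
    return audio.astype(np.float32)
```

Resampling to 16 kHz is still a separate step; this helper only fixes the sample type and channel layout.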

Advanced Configuration

import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")

# Configure transcription options
transcription = model.transcribe(
    audio=["recording.wav"],
    batch_size=1,
    return_hypotheses=True,  # Return Hypothesis objects instead of plain text
    num_workers=4,
    channel_selector=0,  # For multi-channel audio: use the first channel
    augmentor=None
)

# Access detailed results
for hyp in transcription:
    print(f"Text: {hyp.text}")
    print(f"Score: {hyp.score}")  # Decoder log-score, not a calibrated confidence
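Because hyp.score is a model score rather than a calibrated confidence, one hedged way to use it is to rank transcripts and route the lowest-scoring ones to human review. The sketch assumes the score is a summed log-score (so it becomes more negative for longer outputs) and normalizes it per word. FakeHypothesis, flag_for_review, and the threshold are hypothetical; a real threshold would need tuning on your own validation data.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FakeHypothesis:
    """Hypothetical stand-in for NeMo's Hypothesis (only .text and .score are used)."""
    text: str
    score: float


def flag_for_review(files: Sequence[str], hyps: Sequence, threshold: float) -> List[Tuple[str, float]]:
    """Return (file, per-word score) pairs below the threshold, lowest first.

    Heuristic assumption: dividing the summed score by the word count makes
    short and long recordings roughly comparable.
    """
    flagged = []
    for path, hyp in zip(files, hyps):
        per_word = hyp.score / max(len(hyp.text.split()), 1)
        if per_word < threshold:
            flagged.append((path, per_word))
    return sorted(flagged, key=lambda item: item[1])
```

This fits the intended use below: scores can prioritize review, but they cannot replace it.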

Using with GPU

import nemo.collections.asr as nemo_asr
import torch

# Select a GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load the model directly onto the selected device
model = nemo_asr.models.ASRModel.from_pretrained(
    "johannhartmann/parakeet_de_med", map_location=device
)

# Transcribe
transcription = model.transcribe(["audio.wav"])
print(transcription[0].text)

Transcribing from Microphone

import os
import tempfile
import wave

import nemo.collections.asr as nemo_asr
import pyaudio

def record_audio(duration=5, sample_rate=16000):
    """Record mono 16-bit audio from the default microphone into a temporary WAV file."""
    p = pyaudio.PyAudio()
    stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=sample_rate,
        input=True,
        frames_per_buffer=1024
    )

    print(f"Recording for {duration} seconds...")
    frames = []
    for _ in range(0, int(sample_rate / 1024 * duration)):
        # Do not abort the recording if the input buffer overflows
        data = stream.read(1024, exception_on_overflow=False)
        frames.append(data)

    print("Recording finished.")
    stream.stop_stream()
    stream.close()
    p.terminate()

    # Create the temporary file and close it before reopening (required on Windows)
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_file:
        temp_path = temp_file.name
    with wave.open(temp_path, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
        wf.setframerate(sample_rate)
        wf.writeframes(b''.join(frames))

    return temp_path

# Load model
model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")

# Record, transcribe, and delete the temporary file afterwards
audio_file = record_audio(duration=5)
try:
    transcription = model.transcribe([audio_file])
    print(f"Transcription: {transcription[0].text}")
finally:
    os.remove(audio_file)
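The recording loop reads int(sample_rate / 1024 * duration) buffers of 1024 frames, and int() rounds down, so a 5-second recording at 16 kHz is 78 buffers, slightly under 5 seconds. A small sketch of that arithmetic with a ceiling-based alternative that records at least the requested duration; buffers_needed and recorded_seconds are hypothetical helpers:

```python
import math

BUFFER_FRAMES = 1024  # frames_per_buffer, as in record_audio


def buffers_needed(duration_s: float, sample_rate: int = 16000, buffer_frames: int = BUFFER_FRAMES) -> int:
    """Number of stream.read() calls needed to capture at least duration_s seconds."""
    return math.ceil(duration_s * sample_rate / buffer_frames)


def recorded_seconds(num_buffers: int, sample_rate: int = 16000, buffer_frames: int = BUFFER_FRAMES) -> float:
    """Audio length captured by num_buffers reads."""
    return num_buffers * buffer_frames / sample_rate


truncated = int(16000 / 1024 * 5)  # rounding-down loop bound
print(truncated, recorded_seconds(truncated))            # 78 4.992
print(buffers_needed(5), recorded_seconds(buffers_needed(5)))  # 79 5.056
```

For dictation the 8 ms difference is negligible, but the same rounding matters for very short clips or small sample rates.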

Expected Input Format

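Sample Rate: 16 kHz (the model's native rate; resample other rates to 16 kHz for best results)
Channels: Mono (for multi-channel audio, downmix first or pick a channel with channel_selector)
Format: WAV or FLAC; other formats supported by soundfile also work (MP3 requires libsndfile 1.1.0 or newer)
Bit Depth: 16-bit PCM recommended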
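A stdlib-only sketch for checking a WAV file against the recommended sample rate, channel count, and bit depth before transcription. It uses Python's wave module, which reads PCM WAV files only, so FLAC or MP3 would need soundfile instead; check_wav_format is a hypothetical helper name.

```python
import wave
from typing import List


def check_wav_format(path: str, expected_rate: int = 16000) -> List[str]:
    """Return a list of deviations from the recommended input format (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != expected_rate:
            problems.append(f"sample rate {wf.getframerate()} Hz, expected {expected_rate} Hz")
        if wf.getnchannels() != 1:
            problems.append(f"{wf.getnchannels()} channels, expected mono")
        if wf.getsampwidth() != 2:  # sample width in bytes; 2 bytes = 16-bit
            problems.append(f"{8 * wf.getsampwidth()}-bit samples, expected 16-bit PCM")
    return problems
```

Usage: `print(check_wav_format("recording.wav") or "format OK")`.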

Medical Domain Coverage

The training data covers the following parts of medical documentation:

Patient presentation and admission
Medical history and examinations
Vital signs and lab results
Diagnoses and treatments
Medications and dosages
Discharge summaries
Follow-up recommendations

Limitations

Optimized for German medical documentation speech
Trained on a small set of synthetic speech data (976 samples) and evaluated on 122 samples
May have reduced accuracy on non-medical German content
Performance may vary with different audio conditions and accents

Intended Use

This model is designed for:

✅ German medical documentation transcription
✅ Clinical note-taking assistance
✅ Medical dictation systems
✅ Research in medical ASR

Not recommended for:

❌ Critical medical decisions without human review
❌ General-purpose German ASR (use base model instead)
❌ Languages other than German

License

This model is licensed under CC-BY-4.0, the same license as the base model NVIDIA Parakeet-TDT-0.6B-v3.

You are free to:

✅ Use commercially
✅ Modify and create derivatives
✅ Distribute and share

Under the following terms:

Attribution — You must give appropriate credit to both NVIDIA (original model) and this fine-tuned version, provide a link to the license, and indicate if changes were made.

Citation

Base model:

@misc{parakeet-tdt-0.6b-v3,
  author = {NVIDIA},
  title = {Parakeet-TDT-0.6B-v3},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}
}

Contact

For questions or issues, please open an issue on the repository.

