Parakeet-DE-Med: German Medical ASR

Fine-tuned NVIDIA Parakeet-TDT-0.6B for German medical documentation transcription.

This model is a fine-tuned derivative of nvidia/parakeet-tdt-0.6b-v3, which is licensed under CC-BY-4.0.

Model Description

This model is a parameter-efficient fine-tuned (PEFT) version of NVIDIA's Parakeet-TDT-0.6B, specialized for German medical documentation (Arztbriefe, i.e. physicians' letters). It uses a decoder+joint training strategy: the encoder is frozen and only the decoder and joint network are trained, which amounts to 2.89% of the model parameters. On the medical test set this reduces WER from 11.73% to 3.28%.

  • Base Model: nvidia/parakeet-tdt-0.6b-v3
  • Language: German (de-DE)
  • Domain: Medical documentation
  • Training Method: PEFT (decoder+joint strategy)
  • Parameters Trained: 18.1M / 627M (2.89%)

Performance

Evaluated on a German medical documentation test set (122 samples):

Model                     WER
Base Parakeet-TDT-0.6B    11.73%
Parakeet-DE-Med           3.28%

Improvement: 72% relative WER reduction (11.73% → 3.28%)
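
The WER figures above can be related to each other with a short calculation. The following is a minimal sketch, assuming WER is computed as the word-level edit distance divided by the number of reference words; the card does not state which scoring tool or text normalization was used, and the example sentence pair is hypothetical rather than taken from the test set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (baseline - improved) / baseline


# Hypothetical sentence pair, not taken from the test set
print(word_error_rate("der Patient ist stabil", "der Patient war stabil"))  # 0.25
# WER values reported in this card
print(round(relative_reduction(11.73, 3.28) * 100))  # 72
```

The 72% figure is therefore a relative reduction; the absolute difference is 8.45 percentage points.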

Training Details

  • Training Data: 976 German medical documentation samples
  • Training Epochs: 5
  • Training Strategy: Freeze encoder, train decoder and joint network only
  • Precision: BF16 mixed precision
  • Batch Size: 4 (effective batch size 16 with gradient accumulation)
  • Learning Rate: 2e-4
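
The freezing strategy listed above can be sketched as follows. This is an illustration rather than the training script used for this model: it assumes a model object that exposes `encoder`, `decoder` and `joint` submodules with a PyTorch-style `parameters()` method, and the helper names `freeze_encoder` and `trainable_fraction` are hypothetical. The accumulation step count of 4 is inferred from the batch sizes (16 / 4) and is not stated explicitly.

```python
def freeze_encoder(model):
    """Disable gradients for the encoder; leave decoder and joint trainable."""
    for param in model.encoder.parameters():
        param.requires_grad = False
    for module in (model.decoder, model.joint):
        for param in module.parameters():
            param.requires_grad = True


def trainable_fraction(model) -> float:
    """Share of parameters that will receive gradient updates."""
    params = list(model.parameters())
    total = sum(p.numel() for p in params)
    trainable = sum(p.numel() for p in params if p.requires_grad)
    return trainable / total


# Parameter counts reported in this card
print(f"{18.1e6 / 627e6:.2%}")  # 2.89%

# Assumption: 4 accumulation steps, inferred from 16 / 4
per_step_batch, accumulation_steps = 4, 4
print(per_step_batch * accumulation_steps)  # 16
```

After calling `freeze_encoder(model)` on a loaded model, `trainable_fraction(model)` should come out near the 2.89% reported above if the same parts are trained.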

Usage

Prerequisites

pip install "nemo_toolkit[asr]"
# Or for the latest version:
pip install "git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[asr]"

Basic Transcription

import nemo.collections.asr as nemo_asr

# Load the model from HuggingFace
model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")

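# Transcribe a single audio file
transcription = model.transcribe(["path/to/audio.wav"])
# Older NeMo releases return plain strings; recent ones return Hypothesis
# objects, in which case the text is in transcription[0].text
result = transcription[0]
print(result.text if hasattr(result, "text") else result)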

Batch Transcription

import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")

# Transcribe multiple files
audio_files = [
    "patient_recording_1.wav",
    "patient_recording_2.wav",
    "patient_recording_3.wav"
]

transcriptions = model.transcribe(audio_files, batch_size=4)
for i, text in enumerate(transcriptions):
    print(f"File {i+1}: {text}")

Transcribing from an In-Memory Audio Array

import nemo.collections.asr as nemo_asr
import soundfile as sf

model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")

# Load audio as float32 (sf.read returns float64 by default)
audio, sample_rate = sf.read("medical_dictation.wav", dtype="float32")

# Downmix to mono if the file has several channels
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Resample to 16 kHz if needed
if sample_rate != 16000:
    import librosa
    audio = librosa.resample(audio, orig_sr=sample_rate, target_sr=16000)

# Transcribe from numpy array
transcription = model.transcribe(audio, batch_size=1)
print(transcription[0])

Advanced Configuration

import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")

# Configure transcription parameters
transcription = model.transcribe(
    audio=["recording.wav"],  # named paths2audio_files in older NeMo releases
    batch_size=1,
    return_hypotheses=True,  # Return Hypothesis objects instead of plain text
    num_workers=4,
    channel_selector=0,  # Use the first channel of multi-channel audio
    augmentor=None
)

# Access detailed results
for hyp in transcription:
    print(f"Text: {hyp.text}")
    print(f"Score: {hyp.score}")  # decoder score, not a calibrated confidence

Using with GPU

import nemo.collections.asr as nemo_asr
import torch

# Ensure GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load model and move it to the selected device
model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")
model = model.to(device)

# Transcribe
transcription = model.transcribe(["audio.wav"])
print(transcription[0])

Transcribing from Microphone

import nemo.collections.asr as nemo_asr
import pyaudio
import wave
import tempfile

def record_audio(duration=5, sample_rate=16000):
    """Record audio from microphone"""
    p = pyaudio.PyAudio()
    stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=sample_rate,
        input=True,
        frames_per_buffer=1024
    )

    print(f"Recording for {duration} seconds...")
    frames = []
    for _ in range(0, int(sample_rate / 1024 * duration)):
        data = stream.read(1024)
        frames.append(data)

    print("Recording finished.")
    stream.stop_stream()
    stream.close()
    p.terminate()

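    # Save to a temporary WAV file (delete=False so it outlives this function)
    temp_file = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    temp_file.close()
    with wave.open(temp_file.name, 'wb') as wf:
        wf.setnchannels(1)
        # Module-level helper, so it does not depend on the terminated PyAudio instance
        wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
        wf.setframerate(sample_rate)
        wf.writeframes(b''.join(frames))

    return temp_file.name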

# Load model
model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")

# Record and transcribe
audio_file = record_audio(duration=5)
transcription = model.transcribe([audio_file])
print(f"Transcription: {transcription[0]}")

Expected Input Format

  • Sample Rate: 16 kHz (the model will work with other rates, but 16 kHz is optimal)
  • Channels: Mono (single channel)
  • Format: WAV, FLAC, MP3, or any format supported by soundfile
  • Bit Depth: 16-bit PCM recommended
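
A recording can be checked against these recommendations before transcription. The following is a minimal sketch using Python's standard `wave` module, so it only covers PCM WAV files (not FLAC or MP3); the helper names `describe_wav` and `format_issues` are hypothetical and not part of NeMo.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read the header fields of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return {
            "sample_rate": wf.getframerate(),
            "channels": wf.getnchannels(),
            "bit_depth": wf.getsampwidth() * 8,
            "duration_s": wf.getnframes() / wf.getframerate(),
        }


def format_issues(info: dict) -> list:
    """List deviations from the recommended 16 kHz, mono, 16-bit input."""
    issues = []
    if info["sample_rate"] != 16000:
        issues.append(f"sample rate is {info['sample_rate']} Hz, expected 16000 Hz")
    if info["channels"] != 1:
        issues.append(f"{info['channels']} channels, expected mono")
    if info["bit_depth"] != 16:
        issues.append(f"{info['bit_depth']}-bit samples, 16-bit recommended")
    return issues
```

For example, `format_issues(describe_wav("medical_dictation.wav"))` returns an empty list when the file already matches the recommended format, and a list of human-readable deviations otherwise.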

Medical Domain Coverage

The model is trained on comprehensive medical documentation including:

  • Patient presentation and admission
  • Medical history and examinations
  • Vital signs and lab results
  • Diagnoses and treatments
  • Medications and dosages
  • Discharge summaries
  • Follow-up recommendations

Limitations

  • Optimized for German medical documentation speech
  • Trained on synthetic speech data
  • May have reduced accuracy on non-medical German content
  • Performance may vary with different audio conditions and accents

Intended Use

This model is designed for:

  • ✅ German medical documentation transcription
  • ✅ Clinical note-taking assistance
  • ✅ Medical dictation systems
  • ✅ Research in medical ASR

Not recommended for:

  • ❌ Critical medical decisions without human review
  • ❌ General-purpose German ASR (use base model instead)
  • ❌ Languages other than German

License

This model is licensed under CC-BY-4.0, the same license as the base model NVIDIA Parakeet-TDT-0.6B-v3.

You are free to:

  • ✅ Use commercially
  • ✅ Modify and create derivatives
  • ✅ Distribute and share

Under the following terms:

  • Attribution: You must give appropriate credit to both NVIDIA (original model) and this fine-tuned version, provide a link to the license, and indicate if changes were made.

Citation

Base model:

@misc{parakeet-tdt-0.6b-v3,
  author = {NVIDIA},
  title = {Parakeet-TDT-0.6B},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}
}

Contact

For questions or issues, please open an issue on the repository.
