Parakeet-DE-Med: German Medical ASR
Fine-tuned NVIDIA Parakeet-TDT-0.6B for German medical documentation transcription.
This model is a fine-tuned derivative of nvidia/parakeet-tdt-0.6b-v3, which is licensed under CC-BY-4.0.
Model Description
This model is a parameter-efficient fine-tuned (PEFT) version of NVIDIA's Parakeet-TDT-0.6B specialized for German medical documentation (Arztbriefe, i.e. physician letters). It uses the decoder+joint training strategy, training only 2.89% of the model parameters while reducing word error rate (WER) on the medical test set from 11.73% to 3.28%.
- Base Model: nvidia/parakeet-tdt-0.6b-v3
- Language: German (de-DE)
- Domain: Medical documentation
- Training Method: PEFT (decoder+joint strategy)
- Parameters Trained: 18.1M / 627M (2.89%)
Performance
Evaluated on a German medical documentation test set (122 samples):
| Model | WER |
|---|---|
| Base Parakeet-TDT-0.6B | 11.73% |
| Parakeet-DE-Med | 3.28% |
Improvement: 72% relative WER reduction (11.73% to 3.28%)
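The relative reduction follows directly from the two table values. As a minimal sketch of how such numbers are usually derived (this is not the evaluation script used for this model, and it assumes plain whitespace tokenization with no text normalization), word-level WER is the word edit distance divided by the reference length. The function names and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers `baseline`."""
    return (baseline - improved) / baseline


# Made-up example pair: one substituted word out of four.
print(word_error_rate("der patient hat fieber", "der patient hatte fieber"))  # 0.25

# WER values taken from the table above.
print(f"{relative_reduction(11.73, 3.28):.1%}")  # 72.0%
```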
Training Details
- Training Data: 976 German medical documentation samples
- Training Epochs: 5
- Training Strategy: Freeze encoder, train decoder and joint network only
- Precision: BF16 mixed precision
- Batch Size: 4 (effective batch size 16 with gradient accumulation)
- Learning Rate: 2e-4
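The freeze-encoder strategy can be sketched independently of any framework. The example below is an illustration, not the training code for this model: `Param` is a hypothetical stand-in for a framework parameter (in PyTorch the equivalent flag is `requires_grad` on each entry of `named_parameters()`), the parameter names are invented, and the decoder/joint split is illustrative. Only the totals (18.1M trainable of 627M) come from this card.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a model parameter tensor."""
    numel: int
    requires_grad: bool = True


def freeze_by_prefix(params: dict, frozen_prefixes: tuple) -> None:
    """Mark every parameter whose name starts with a frozen prefix as non-trainable."""
    for name, param in params.items():
        if name.startswith(frozen_prefixes):
            param.requires_grad = False


def trainable_fraction(params: dict) -> float:
    """Share of parameter elements that will still receive gradient updates."""
    total = sum(p.numel for p in params.values())
    trainable = sum(p.numel for p in params.values() if p.requires_grad)
    return trainable / total


# Illustrative sizes: only the sums (608.9M frozen, 18.1M trainable) match the card.
params = {
    "encoder.weight": Param(608_900_000),
    "decoder.weight": Param(12_000_000),  # assumed split, not reported
    "joint.weight": Param(6_100_000),     # assumed split, not reported
}

freeze_by_prefix(params, ("encoder.",))
print(f"{trainable_fraction(params):.2%}")  # 2.89%
```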
Usage
Prerequisites
pip install "nemo_toolkit[asr]"
# Or for the latest version:
pip install "git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[asr]"
Basic Transcription
import nemo.collections.asr as nemo_asr
# Load the model from HuggingFace
model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")
# Transcribe a single audio file
transcription = model.transcribe(["path/to/audio.wav"])
print(transcription[0].text)  # recent NeMo releases return Hypothesis objects
Batch Transcription
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")
# Transcribe multiple files
audio_files = [
    "patient_recording_1.wav",
    "patient_recording_2.wav",
    "patient_recording_3.wav"
]
transcriptions = model.transcribe(audio_files, batch_size=4)
# Recent NeMo releases return Hypothesis objects; the text is in .text
for i, hyp in enumerate(transcriptions):
    print(f"File {i+1}: {hyp.text}")
Real-time/Streaming Audio
import nemo.collections.asr as nemo_asr
import soundfile as sf
model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")
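# Load audio file as float32 samples
audio, sample_rate = sf.read("medical_dictation.wav", dtype="float32")
# Downmix to mono if the file has more than one channel
if audio.ndim > 1:
    audio = audio.mean(axis=1)
# Resample to 16 kHz if needed
if sample_rate != 16000:
    import librosa
    audio = librosa.resample(audio, orig_sr=sample_rate, target_sr=16000)
# Transcribe from a NumPy array (this processes the whole array at once, not a live stream)
transcription = model.transcribe([audio], batch_size=1)
print(transcription[0].text)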
Advanced Configuration
import nemo.collections.asr as nemo_asr
model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")
# Configure transcription parameters
transcription = model.transcribe(
    ["recording.wav"],  # passed positionally; the keyword name differs between NeMo versions
    batch_size=1,
    return_hypotheses=True,  # Return hypothesis objects with decoding scores
    num_workers=4,
    channel_selector=0,  # For multi-channel audio
    augmentor=None
)
# Access detailed results
for hyp in transcription:
    print(f"Text: {hyp.text}")
    print(f"Score: {hyp.score}")  # decoding score, not a calibrated confidence
Using with GPU
import nemo.collections.asr as nemo_asr
import torch
# Ensure GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
# Load model and move it to the selected device
model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")
model = model.to(device)
# Transcribe
transcription = model.transcribe(["audio.wav"])
print(transcription[0].text)
Transcribing from Microphone
import nemo.collections.asr as nemo_asr
import pyaudio
import wave
import tempfile
def record_audio(duration=5, sample_rate=16000):
    """Record mono 16-bit audio from the microphone and return the path of a WAV file."""
    p = pyaudio.PyAudio()
    stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=sample_rate,
        input=True,
        frames_per_buffer=1024
    )
    print(f"Recording for {duration} seconds...")
    frames = []
    for _ in range(0, int(sample_rate / 1024 * duration)):
        data = stream.read(1024)
        frames.append(data)
    print("Recording finished.")
    stream.stop_stream()
    stream.close()
    sample_width = p.get_sample_size(pyaudio.paInt16)
    p.terminate()
    # Save to temporary file
    temp_file = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    temp_file.close()  # only the path is needed; wave reopens the file below
    with wave.open(temp_file.name, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(b''.join(frames))
    return temp_file.name
# Load model
model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")
# Record and transcribe
audio_file = record_audio(duration=5)
transcription = model.transcribe([audio_file])
print(f"Transcription: {transcription[0].text}")
Expected Input Format
- Sample Rate: 16 kHz (model will work with other rates but 16kHz is optimal)
- Channels: Mono (single channel)
- Format: WAV, FLAC, MP3, or any format supported by soundfile
- Bit Depth: 16-bit PCM recommended
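A quick pre-flight check against the format listed above can catch mismatched recordings before transcription. The following is a minimal sketch using only the standard-library `wave` module, so it covers uncompressed WAV files only; `check_wav_format` is a hypothetical helper, not part of NeMo.

```python
import wave


def check_wav_format(path: str, expected_rate: int = 16000) -> list:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wf.getframerate()} Hz, expected {expected_rate} Hz"
            )
        if wf.getnchannels() != 1:
            problems.append(f"{wf.getnchannels()} channels, expected mono")
        if wf.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"{8 * wf.getsampwidth()}-bit samples, expected 16-bit")
    return problems
```

Calling `check_wav_format("medical_dictation.wav")` returns an empty list for a 16 kHz mono 16-bit file and one message per mismatch otherwise.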
Medical Domain Coverage
The model is trained on comprehensive medical documentation including:
- Patient presentation and admission
- Medical history and examinations
- Vital signs and lab results
- Diagnoses and treatments
- Medications and dosages
- Discharge summaries
- Follow-up recommendations
Limitations
- Optimized for German medical documentation speech
- Trained on synthetic speech data
- May have reduced accuracy on non-medical German content
- Performance may vary with different audio conditions and accents
Intended Use
This model is designed for:
- German medical documentation transcription
- Clinical note-taking assistance
- Medical dictation systems
- Research in medical ASR
Not recommended for:
- Critical medical decisions without human review
- General-purpose German ASR (use base model instead)
- Languages other than German
License
This model is licensed under CC-BY-4.0, the same license as the base model NVIDIA Parakeet-TDT-0.6B-v3.
You are free to:
- Use commercially
- Modify and create derivatives
- Distribute and share
Under the following terms:
- Attribution: You must give appropriate credit to both NVIDIA (original model) and this fine-tuned version, provide a link to the license, and indicate if changes were made.
Citation
Base model:
@misc{parakeet-tdt-0.6b-v3,
author = {NVIDIA},
title = {Parakeet-TDT-0.6B},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}
}
Contact
For questions or issues, please open an issue on the repository.