DJ-AI ASR Grammar Corrector (T5-Small)
A lightweight grammar correction model fine-tuned from t5-small, designed to correct common errors in automatic speech recognition (ASR) output, including homophones, verb tense issues, contractions, duplicated words, and more. It is optimized for fast inference in (near) real-time ASR pipelines.
Model Details
- Base model: t5-small
- Fine-tuned on: 90 million synthetic (noisy → clean) sentence pairs
- Training objective: Correct ASR-style transcription errors into clean, grammatical English
- Token count: ~60 million tokens per epoch
- Framework: Hugging Face Transformers + PyTorch
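To make the idea of synthetic (noisy → clean) pairs concrete, here is a minimal sketch of how such pairs could be produced by corrupting clean sentences. It is not the generator used to train this model: the `CORRUPTIONS` table, the `corrupt` and `make_pairs` functions, and the `dup_prob` parameter are all hypothetical names chosen for illustration.

```python
import random

# Hypothetical corruption table (clean form -> ASR-style noisy form).
# Illustrative only; the real training data generator is not described here.
CORRUPTIONS = {
    "they're": "their",   # homophone mistake
    "isn't": "is not",    # expanded contraction
    "goes": "go",         # subject-verb disagreement
    "saw": "seen",        # verb tense corruption
}


def corrupt(clean, rng, dup_prob=0.1):
    """Return a lowercased, ASR-style noisy version of a clean sentence."""
    noisy = []
    for word in clean.lower().split():
        word = CORRUPTIONS.get(word, word)
        noisy.append(word)
        # Occasionally repeat a word, mimicking ASR stutter.
        if rng.random() < dup_prob:
            noisy.append(word)
    return " ".join(noisy)


def make_pairs(sentences, seed=0, dup_prob=0.1):
    """Build (noisy, clean) training pairs from clean sentences."""
    rng = random.Random(seed)
    return [(corrupt(s, rng, dup_prob), s) for s in sentences]
```

Each pair keeps the untouched sentence as the target, so a seq2seq model trained on these pairs learns to map the noisy form back to the clean one.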
Benchmark Results
Model | Type | Precision | Latency (s/sample) | VRAM (MB) | BLEU | ROUGE-L | Accuracy (%)¹ | Token Accuracy (%)² | Size (MB) |
---|---|---|---|---|---|---|---|---|---|
dj-ai-asr-grammar-corrector-t5-base | HF | fp32 | 0.1151 | 24.98 | 78.92 | 90.31 | 44.62 | 90.39 | 5956.76 |
dj-ai-asr-grammar-corrector-t5-small | HF | fp32 | 0.0648 | 6.27 | 76.47 | 89.54 | 39.59 | 88.76 | 1620.15 |
dj-ai-asr-grammar-corrector-t5-small-streaming | HF | fp32 | 0.0634 | 14.77 | 76.25 | 89.61 | 39.9 | 88.54 | 1620.65 |
- ¹ Accuracy is sentence-level exact match: a prediction counts as correct only if the entire corrected sentence exactly matches the reference sentence. If the model corrects one of two errors but the final output still differs from the expected sentence, it is counted as a failure.
- ² Token Accuracy measures how well the model performs at the token level rather than on whole sentences.
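The two accuracy metrics can be sketched as follows. Sentence accuracy follows the exact-match definition above. The token-level definition is not spelled out here, so this sketch assumes whitespace tokens compared position by position against the reference; the function names are illustrative and not taken from the project's evaluation code.

```python
def sentence_accuracy(predictions, references):
    """Percentage of predictions that exactly match their reference sentence."""
    if not references:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * matches / len(references)


def token_accuracy(predictions, references):
    """Percentage of reference tokens matched at the same position.

    Assumption: whitespace tokenization and position-wise comparison.
    """
    correct = total = 0
    for pred, ref in zip(predictions, references):
        pred_tokens, ref_tokens = pred.split(), ref.split()
        total += len(ref_tokens)
        correct += sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return 100.0 * correct / total if total else 0.0


preds = ["he goes home", "i saw she"]
refs = ["he goes home", "i saw her"]
print(sentence_accuracy(preds, refs))         # 50.0
print(round(token_accuracy(preds, refs), 2))  # 83.33
```

The second prediction gets two of three tokens right, which lifts token accuracy but still counts as a sentence-level failure. This is why the two columns in the table differ so widely.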
Intended Use
Use Case | ✅ Supported | 🚫 Not Recommended |
---|---|---|
Post-ASR correction | ✅ Yes | |
Real-time ASR pipelines | ✅ Yes | |
Batch transcript cleanup | ✅ Yes | |
Grammar education tools | ✅ Yes | |
Formal document editing | 🚫 | Model may be too informal |
Multilingual input | 🚫 | English-only fine-tuning |
Corrects Common ASR Errors:
- Homophone mistakes (`their` → `they're`)
- Subject-verb disagreement (`he go` → `he goes`)
- Verb tense corruption (`i seen` → `i saw`)
- Missing auxiliaries (`you going` → `are you going`)
- Contraction normalization (`she is not` → `she isn't`)
- Repeated words (`i i want` → `i want`)
- Misused articles/prepositions/pronouns
Example
DEMO: https://huggingface.co/spaces/dayyanj/dj-ai-asr-grammar-corrector-demo
Input (noisy ASR):
Git Repository: https://github.com/dayyanj/DJ-AI-ASR-GRAMMAR-CORRECTOR
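A minimal sketch of running the model on ASR segments with Hugging Face Transformers might look like the following. It assumes the repository id `dayyanj/dj-ai-asr-grammar-corrector-small` (taken from the model tree on this page) and assumes the model takes raw text with no task prefix; check the Git repository for the exact input format. The helpers `load_corrector` and `correct_transcript` are hypothetical names, not part of the project.

```python
from typing import Callable, Iterable, List

# Assumption: repo id as shown in the model tree on this page.
MODEL_ID = "dayyanj/dj-ai-asr-grammar-corrector-small"


def load_corrector(model_id: str = MODEL_ID,
                   max_new_tokens: int = 64) -> Callable[[List[str]], List[str]]:
    """Load the model and return a function that corrects a batch of strings."""
    # Imported lazily so the batching helper below works without these packages.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()

    def correct(texts: List[str]) -> List[str]:
        # Assumption: inputs are passed as raw text, without a task prefix.
        inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.batch_decode(output_ids, skip_special_tokens=True)

    return correct


def correct_transcript(segments: Iterable[str],
                       corrector: Callable[[List[str]], List[str]],
                       batch_size: int = 16) -> List[str]:
    """Run a corrector over transcript segments in fixed-size batches."""
    segments = list(segments)
    cleaned: List[str] = []
    for start in range(0, len(segments), batch_size):
        cleaned.extend(corrector(segments[start:start + batch_size]))
    return cleaned


# Usage (downloads the model on first call):
# corrector = load_corrector()
# print(correct_transcript(["i seen it yesterday", "she is not coming"], corrector))
```

Keeping the batching helper separate from model loading means the same loop can serve batch transcript cleanup with a large `batch_size` or a near real-time pipeline with `batch_size=1`.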
Model tree for dayyanj/dj-ai-asr-grammar-corrector-small
Base model
google-t5/t5-small