Quantumhash/Quantum_STT_V2.0

Automatic Speech Recognition

hf-asr-leaderboard

sbapan41 committed on May 19 · Commit af14512 · verified · 1 Parent(s): 78b8b77

Update README.md

Files changed (1)

README.md +4 -5

@@ -22,7 +22,7 @@ widget:
 - example_title: Librispeech sample 2
   src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
 model-index:
-- name: parakeet-tdt-0.6b-v2
+- name: Quantum_STT_V2.0
   results:
   - task:
       name: Automatic Speech Recognition
@@ -155,16 +155,15 @@ img {
 ## <span style="color:#466f00;">Description:</span>
-`parakeet-tdt-0.6b-v2` is a 600-million-parameter automatic speech recognition (ASR) model designed for high-quality English transcription, featuring support for punctuation, capitalization, and accurate timestamp prediction. Try Demo here: https://huggingface.co/spaces/nvidia/parakeet-tdt-0.6b-v2
-This XL variant of the FastConformer [1] architecture integrates the TDT [2] decoder and is trained with full attention, enabling efficient transcription of audio segments up to 24 minutes in a single pass. The model achieves an RTFx of 3380 on the HF-Open-ASR leaderboard with a batch size of 128. Note: *RTFx Performance may vary depending on dataset audio duration and batch size.*
+`Quantum_STT_V2.0` is a 600-million-parameter automatic speech recognition (ASR) model designed for high-quality English transcription, featuring support for punctuation, capitalization, and accurate timestamp prediction. Try Demo here: https://huggingface.co/spaces/Quantamhash/Quantum_STT_V2.0
+This XL variant of the FastConformer [1] architecture integrates the TDT [2] decoder and is trained with full attention, enabling efficient transcription of audio segments up to 24 minutes in a single pass.
 **Key Features**
 - Accurate word-level timestamp predictions
 - Automatic punctuation and capitalization
 - Robust performance on spoken numbers, and song lyrics transcription
-For more information, refer to the [Model Architecture](#model-architecture) section and the [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer).
 This model is ready for commercial/non-commercial use.
@@ -185,7 +184,7 @@ This model serves developers, researchers, academics, and industries building ap
 ### <span style="color:#466f00;">Release Date:</span>
-05/01/2025
+14/05/2025
 ### <span style="color:#466f00;">Model Architecture:</span>