sbapan41 committed on
Commit
af14512
·
verified ·
1 Parent(s): 78b8b77

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -22,7 +22,7 @@ widget:
  - example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
  model-index:
- - name: parakeet-tdt-0.6b-v2
+ - name: Quantum_STT_V2.0
  results:
  - task:
  name: Automatic Speech Recognition
@@ -155,16 +155,15 @@ img {
 
  ## <span style="color:#466f00;">Description:</span>
 
- `parakeet-tdt-0.6b-v2` is a 600-million-parameter automatic speech recognition (ASR) model designed for high-quality English transcription, featuring support for punctuation, capitalization, and accurate timestamp prediction. Try Demo here: https://huggingface.co/spaces/nvidia/parakeet-tdt-0.6b-v2
+ `Quantum_STT_V2.0` is a 600-million-parameter automatic speech recognition (ASR) model designed for high-quality English transcription, featuring support for punctuation, capitalization, and accurate timestamp prediction. Try Demo here: https://huggingface.co/spaces/Quantamhash/Quantum_STT_V2.0
 
- This XL variant of the FastConformer [1] architecture integrates the TDT [2] decoder and is trained with full attention, enabling efficient transcription of audio segments up to 24 minutes in a single pass. The model achieves an RTFx of 3380 on the HF-Open-ASR leaderboard with a batch size of 128. Note: *RTFx Performance may vary depending on dataset audio duration and batch size.*
+ This XL variant of the FastConformer [1] architecture integrates the TDT [2] decoder and is trained with full attention, enabling efficient transcription of audio segments up to 24 minutes in a single pass.
 
  **Key Features**
  - Accurate word-level timestamp predictions
  - Automatic punctuation and capitalization
  - Robust performance on spoken numbers, and song lyrics transcription
 
- For more information, refer to the [Model Architecture](#model-architecture) section and the [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer).
 
  This model is ready for commercial/non-commercial use.
 
@@ -185,7 +184,7 @@ This model serves developers, researchers, academics, and industries building ap
 
  ### <span style="color:#466f00;">Release Date:</span>
 
- 05/01/2025
+ 14/05/2025
 
  ### <span style="color:#466f00;">Model Architecture:</span>
 