sbapan41 committed on
Commit 25de4f6 · verified · 1 Parent(s): af14512

Update README.md

Files changed (1):
  1. README.md +2 -57
README.md CHANGED
@@ -223,7 +223,7 @@ The model is available for use in the NeMo toolkit [3], and can be used as a pre
 
 ```python
 import nemo.collections.asr as nemo_asr
-asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")
+asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="Quantamhash/Quantum_STT_V2.0")
 ```
 
 #### Transcribing using Python
@@ -258,12 +258,6 @@ for stamp in segment_timestamps:
 * NeMo 2.2
 
 
-**Supported Hardware Microarchitecture Compatibility:**
-* NVIDIA Ampere
-* NVIDIA Blackwell
-* NVIDIA Hopper
-* NVIDIA Volta
-
 **[Preferred/Supported] Operating System(s):**
 
 - Linux
@@ -274,7 +268,7 @@ Atleast 2GB RAM for model to load. The bigger the RAM, the larger audio input it
 
 #### Model Version
 
-Current version: parakeet-tdt-0.6b-v2. Previous versions can be [accessed](https://huggingface.co/collections/nvidia/parakeet-659711f49d1469e51546e021) here.
+Current version: Quantum_STT_V2.0. Previous versions can be [accessed](https://huggingface.co/Quantamhash/Quantum_STT) here.
 
 ## <span style="color:#466f00;">Training and Evaluation Datasets:</span>
 
@@ -291,55 +285,6 @@ Training was conducted using this [example script](https://github.com/NVIDIA/NeM
 
 The tokenizer was constructed from the training set transcripts using this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
-### <span style="color:#466f00;">Training Dataset</span>
-The model was trained on the Granary dataset, consisting of approximately 120,000 hours of English speech data:
-
-- 10,000 hours from human-transcribed NeMo ASR Set 3.0, including:
-  - LibriSpeech (960 hours)
-  - Fisher Corpus
-  - National Speech Corpus Part 1
-  - VCTK
-  - VoxPopuli (English)
-  - Europarl-ASR (English)
-  - Multilingual LibriSpeech (MLS English) – 2,000-hour subset
-  - Mozilla Common Voice (v7.0)
-  - AMI
-
-- 110,000 hours of pseudo-labeled data from:
-  - YTC (YouTube-Commons) dataset [4]
-  - YODAS dataset [5]
-  - Librilight [7]
-
-All transcriptions preserve punctuation and capitalization. The Granary dataset will be made publicly available after presentation at Interspeech 2025.
-
-**Data Collection Method by dataset**
-
-* Hybrid: Automated, Human
-
-**Labeling Method by dataset**
-
-* Hybrid: Synthetic, Human
-
-**Properties:**
-
-* Noise robust data from various sources
-* Single channel, 16kHz sampled data
-
-#### Evaluation Dataset
-
-Huggingface Open ASR Leaderboard datasets are used to evaluate the performance of this model.
-
-**Data Collection Method by dataset**
-* Human
-
-**Labeling Method by dataset**
-* Human
-
-**Properties:**
-
-* All are commonly used for benchmarking English ASR systems.
-* Audio data is typically processed into a 16kHz mono channel format for ASR evaluation, consistent with benchmarks like the [Open ASR Leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard).
-
 ## <span style="color:#466f00;">Performance</span>
 
 #### Huggingface Open-ASR-Leaderboard Performance
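The substantive change in this commit is the checkpoint name passed to `from_pretrained`. For context, a minimal sketch of loading and transcribing with that call, assuming NeMo (`nemo_toolkit[asr]`) is installed and the `Quantamhash/Quantum_STT_V2.0` checkpoint is reachable; the `transcribe_files` helper, the sample path, and the `.text` fallback are illustrative assumptions, not part of the README:

```python
def transcribe_files(audio_paths, model_name="Quantamhash/Quantum_STT_V2.0"):
    """Load the checkpoint once and return one transcript string per audio file."""
    # Import inside the function so the helper can be defined without NeMo installed.
    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    outputs = asr_model.transcribe(audio_paths)
    # Recent NeMo versions return Hypothesis objects with a .text attribute;
    # older versions return plain strings, so handle both.
    return [o.text if hasattr(o, "text") else o for o in outputs]


if __name__ == "__main__":
    # "sample.wav" is a placeholder: any 16 kHz mono audio file.
    print(transcribe_files(["sample.wav"]))
```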