Update README.md
README.md CHANGED

@@ -223,7 +223,7 @@ The model is available for use in the NeMo toolkit [3], and can be used as a pre
 
 ```python
 import nemo.collections.asr as nemo_asr
-asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="
+asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="Quantamhash/Quantum_STT_V2.0")
 ```
 
 #### Transcribing using Python

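The transcription code under the heading above falls outside this hunk, so the diff omits it. As a rough sketch of what that section does, assuming the NeMo 2.x `transcribe` API and its timestamp output (the `for stamp in segment_timestamps:` loop appears as context in the next hunk); the audio path is a placeholder:

```python
import nemo.collections.asr as nemo_asr

# Load the checkpoint named in the hunk above.
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="Quantamhash/Quantum_STT_V2.0")

# Transcribe a 16 kHz mono WAV file (placeholder path), requesting timestamps.
output = asr_model.transcribe(["audio_sample.wav"], timestamps=True)

# With timestamps=True, each hypothesis carries char-, word-, and
# segment-level timestamps; iterate over segments and print their boundaries.
segment_timestamps = output[0].timestamp["segment"]
for stamp in segment_timestamps:
    print(f"{stamp['start']}s - {stamp['end']}s : {stamp['segment']}")
```
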
@@ -258,12 +258,6 @@ for stamp in segment_timestamps:
 * NeMo 2.2
 
 
-**Supported Hardware Microarchitecture Compatibility:**
-* NVIDIA Ampere
-* NVIDIA Blackwell
-* NVIDIA Hopper
-* NVIDIA Volta
-
 **[Preferred/Supported] Operating System(s):**
 
 - Linux

@@ -274,7 +268,7 @@ Atleast 2GB RAM for model to load. The bigger the RAM, the larger audio input it
 
 #### Model Version
 
-Current version:
+Current version: Quantum_STT_V2.0. Previous versions can be [accessed here](https://huggingface.co/Quantamhash/Quantum_STT).
 
 ## <span style="color:#466f00;">Training and Evaluation Datasets:</span>
 

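The hunk above names two checkpoints. A hedged sketch of pinning a specific release through the same `from_pretrained` call; the two model ids come from the README, the variable names are illustrative:

```python
import nemo.collections.asr as nemo_asr

# Current release, as renamed in this commit.
model_v2 = nemo_asr.models.ASRModel.from_pretrained(model_name="Quantamhash/Quantum_STT_V2.0")

# Previous release, linked from the Model Version section.
model_v1 = nemo_asr.models.ASRModel.from_pretrained(model_name="Quantamhash/Quantum_STT")
```
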
@@ -291,55 +285,6 @@ Training was conducted using this [example script](https://github.com/NVIDIA/NeM
 
 The tokenizer was constructed from the training set transcripts using this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
-### <span style="color:#466f00;">Training Dataset</span>
-The model was trained on the Granary dataset, consisting of approximately 120,000 hours of English speech data:
-
-- 10,000 hours from human-transcribed NeMo ASR Set 3.0, including:
-  - LibriSpeech (960 hours)
-  - Fisher Corpus
-  - National Speech Corpus Part 1
-  - VCTK
-  - VoxPopuli (English)
-  - Europarl-ASR (English)
-  - Multilingual LibriSpeech (MLS English) – 2,000-hour subset
-  - Mozilla Common Voice (v7.0)
-  - AMI
-
-- 110,000 hours of pseudo-labeled data from:
-  - YTC (YouTube-Commons) dataset [4]
-  - YODAS dataset [5]
-  - Librilight [7]
-
-All transcriptions preserve punctuation and capitalization. The Granary dataset will be made publicly available after presentation at Interspeech 2025.
-
-**Data Collection Method by dataset**
-
-* Hybrid: Automated, Human
-
-**Labeling Method by dataset**
-
-* Hybrid: Synthetic, Human
-
-**Properties:**
-
-* Noise-robust data from various sources
-* Single-channel, 16 kHz sampled data
-
-#### Evaluation Dataset
-
-Huggingface Open ASR Leaderboard datasets are used to evaluate the performance of this model.
-
-**Data Collection Method by dataset**
-* Human
-
-**Labeling Method by dataset**
-* Human
-
-**Properties:**
-
-* All are commonly used for benchmarking English ASR systems.
-* Audio data is typically processed into a 16 kHz mono-channel format for ASR evaluation, consistent with benchmarks like the [Open ASR Leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard).
-
 ## <span style="color:#466f00;">Performance</span>
 
 #### Huggingface Open-ASR-Leaderboard Performance
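
The removed dataset sections above specify single-channel, 16 kHz audio for both training and evaluation, so inputs in other formats should be converted before transcription. A minimal preprocessing sketch, assuming the `librosa` and `soundfile` packages; file names are placeholders:

```python
import librosa
import soundfile as sf

# Load any supported audio file, downmixing to mono and resampling to 16 kHz.
audio, sr = librosa.load("input_audio.wav", sr=16000, mono=True)

# Write a WAV file in the single-channel, 16 kHz format described above.
sf.write("input_audio_16k_mono.wav", audio, sr)
```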