Update README.md
README.md CHANGED

@@ -223,7 +223,7 @@ The model is available for use in the NeMo toolkit [3], and can be used as a pre
 
 ```python
 import nemo.collections.asr as nemo_asr
-asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="
+asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="Quantamhash/Quantum_STT_V2.0")
 ```
 
 #### Transcribing using Python

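The transcription code under the heading above falls outside this hunk, so the diff omits it. As a rough sketch of what that section does, assuming the NeMo 2.x `transcribe` API and its timestamp output (the `for stamp in segment_timestamps:` loop appears as context in the next hunk); the audio path is a placeholder:

```python
import nemo.collections.asr as nemo_asr

# Load the checkpoint named in the hunk above.
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="Quantamhash/Quantum_STT_V2.0")

# Transcribe a 16 kHz mono WAV file (placeholder path), requesting timestamps.
output = asr_model.transcribe(["audio_sample.wav"], timestamps=True)

# With timestamps=True, each hypothesis carries char-, word-, and
# segment-level timestamps; iterate over segments and print their boundaries.
segment_timestamps = output[0].timestamp["segment"]
for stamp in segment_timestamps:
    print(f"{stamp['start']}s - {stamp['end']}s : {stamp['segment']}")
```
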
@@ -258,12 +258,6 @@ for stamp in segment_timestamps:
 * NeMo 2.2
 
 
-**Supported Hardware Microarchitecture Compatibility:**
-* NVIDIA Ampere
-* NVIDIA Blackwell
-* NVIDIA Hopper
-* NVIDIA Volta
-
 **[Preferred/Supported] Operating System(s):**
 
 - Linux

@@ -274,7 +268,7 @@ Atleast 2GB RAM for model to load. The bigger the RAM, the larger audio input it
 
 #### Model Version
 
-Current version:
+Current version: Quantum_STT_V2.0. Previous versions can be [accessed here](https://huggingface.co/Quantamhash/Quantum_STT).
 
 ## <span style="color:#466f00;">Training and Evaluation Datasets:</span>
 

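The hunk above names two checkpoints. A hedged sketch of pinning a specific release through the same `from_pretrained` call; the two model ids come from the README, the variable names are illustrative:

```python
import nemo.collections.asr as nemo_asr

# Current release, as renamed in this commit.
model_v2 = nemo_asr.models.ASRModel.from_pretrained(model_name="Quantamhash/Quantum_STT_V2.0")

# Previous release, linked from the Model Version section.
model_v1 = nemo_asr.models.ASRModel.from_pretrained(model_name="Quantamhash/Quantum_STT")
```
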
@@ -291,55 +285,6 @@ Training was conducted using this [example script](https://github.com/NVIDIA/NeM
 
 The tokenizer was constructed from the training set transcripts using this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
-### <span style="color:#466f00;">Training Dataset</span>
-The model was trained on the Granary dataset, consisting of approximately 120,000 hours of English speech data:
-
-- 10,000 hours from human-transcribed NeMo ASR Set 3.0, including:
-  - LibriSpeech (960 hours)
-  - Fisher Corpus
-  - National Speech Corpus Part 1
-  - VCTK
-  - VoxPopuli (English)
-  - Europarl-ASR (English)
-  - Multilingual LibriSpeech (MLS English) – 2,000-hour subset
-  - Mozilla Common Voice (v7.0)
-  - AMI
-
-- 110,000 hours of pseudo-labeled data from:
-  - YTC (YouTube-Commons) dataset [4]
-  - YODAS dataset [5]
-  - Librilight [7]
-
-All transcriptions preserve punctuation and capitalization. The Granary dataset will be made publicly available after presentation at Interspeech 2025.
-
-**Data Collection Method by dataset**
-
-* Hybrid: Automated, Human
-
-**Labeling Method by dataset**
-
-* Hybrid: Synthetic, Human
-
-**Properties:**
-
-* Noise-robust data from various sources
-* Single-channel, 16 kHz sampled data
-
-#### Evaluation Dataset
-
-Huggingface Open ASR Leaderboard datasets are used to evaluate the performance of this model.
-
-**Data Collection Method by dataset**
-* Human
-
-**Labeling Method by dataset**
-* Human
-
-**Properties:**
-
-* All are commonly used for benchmarking English ASR systems.
-* Audio data is typically processed into a 16 kHz mono-channel format for ASR evaluation, consistent with benchmarks like the [Open ASR Leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard).
-
 ## <span style="color:#466f00;">Performance</span>
 
 #### Huggingface Open-ASR-Leaderboard Performance
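
The removed dataset sections above specify single-channel, 16 kHz audio for both training and evaluation, so inputs in other formats should be converted before transcription. A minimal preprocessing sketch, assuming the `librosa` and `soundfile` packages; file names are placeholders:

```python
import librosa
import soundfile as sf

# Load any supported audio file, downmixing to mono and resampling to 16 kHz.
audio, sr = librosa.load("input_audio.wav", sr=16000, mono=True)

# Write a WAV file in the single-channel, 16 kHz format described above.
sf.write("input_audio_16k_mono.wav", audio, sr)
```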