# Model Card for Omni-router Transformer
The Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing across layers using a shared router in order to learn strong and specialized experts. The Omni-router's routing decisions appear to form consistent temporal segments and structured usage across model depth, suggesting meaningful coordination between layers. Please refer to the paper for details.
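To make the shared-router idea concrete, here is a minimal, hypothetical sketch in pure Python (not the released implementation): a single router weight matrix scores each input, and every MoE layer reuses that same matrix instead of owning a per-layer router, so expert selection is coupled across depth. All names, sizes, and the top-1 routing choice below are illustrative assumptions.

```python
# Hypothetical sketch of shared routing across MoE layers (illustrative only;
# not the actual Omni-router implementation).
import random

random.seed(0)

DIM, NUM_EXPERTS, NUM_LAYERS = 4, 2, 3

def linear(weights, x):
    """Apply a weight matrix (one row per output) to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# One shared router weight matrix, reused by ALL MoE layers.
shared_router = [[random.gauss(0, 1) for _ in range(DIM)]
                 for _ in range(NUM_EXPERTS)]

# Each layer still has its own experts (here: simple linear maps).
experts = [
    [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
     for _ in range(NUM_EXPERTS)]
    for _ in range(NUM_LAYERS)
]

def moe_layer(x, layer_experts):
    # Routing logits come from the shared router, not a per-layer router.
    logits = linear(shared_router, x)
    top = max(range(NUM_EXPERTS), key=lambda e: logits[e])  # top-1 routing
    return linear(layer_experts[top], x), top

x = [random.gauss(0, 1) for _ in range(DIM)]
choices = []
for layer in range(NUM_LAYERS):
    x, picked = moe_layer(x, experts[layer])
    choices.append(picked)

print(choices)  # expert index chosen at each layer
```

Because the hidden state still changes between layers, the chosen expert can differ per layer; the coupling comes from every layer scoring with identical router weights.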
## Model Details

### Model Description
This model is a 2-expert MoE model (246M total parameters, 140M active). It achieves better performance than a similarly sized dense model and the Whisper-small.en model while activating roughly half the parameters.
- Developed by: Apple Machine Learning Research
- Model type: ASR
- Language(s): English
- License: apple-amlr
## Uses

This model performs English automatic speech recognition (ASR).
## How to Get Started with the Model

Please refer to the GitHub repository for detailed usage.
## Training Details

### Training Data
The training data is a large-scale conversational audio dataset collected from publicly accessible sources, named SpeechCrawl. Please refer to the paper for details.
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
This model is evaluated on 10 out-of-distribution (OOD) public evaluation datasets, including AMI-IHM, Chime6, CommonVoice, Fleurs, Callhome, Switchboard, Librispeech, WSJ, Tedlium and Voxpopuli.
#### Metrics
Word Error Rate (WER).
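As a reference for the metric, here is a minimal WER computation using the standard word-level edit-distance definition (insertions + deletions + substitutions, divided by reference length). This is a generic sketch, not the exact scoring pipeline used in the paper.

```python
# Minimal word error rate (WER) via word-level Levenshtein distance.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1/6, one deletion
```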
### Results
| Dataset | Omni-router, 246M (140M active) | Dense, 210M | Whisper-small.en, 244M |
|---|---|---|---|
| AMI-IHM | 17.8 | 17.8 | 17.6 |
| Chime6 | 28.2 | 27.9 | 27.7 |
| CommonVoice | 16.4 | 17.6 | 15.2 |
| Fleurs | 8.4 | 8.8 | 7.6 |
| Callhome | 15.0 | 15.2 | 20.0 |
| Switchboard | 13.5 | 13.8 | 14.9 |
| Librispeech (clean) | 3.3 | 3.7 | 3.1 |
| Librispeech (other) | 7.3 | 8.0 | 7.4 |
| WSJ (nov92) | 3.7 | 4.1 | 3.5 |
| Tedlium | 4.2 | 4.3 | 4.0 |
| Voxpopuli | 8.5 | 9.0 | 8.2 |
| Average | 11.48 | 11.84 | 11.75 |
## Citation
If you find this work useful, please cite our paper:
```bibtex
@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}
```
## Model Card Contact
Contact zijin@apple.com for any issues.