# Model Card for Omni-router Transformer
The Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing across layers using a shared router in order to learn strong and specialized experts. The Omni-router's routing decisions appear to form consistent temporal segments and structured usage across model depth, suggesting meaningful coordination between layers. Please refer to the paper for details.
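To make the shared-router idea concrete, here is a minimal, hypothetical sketch in pure Python (not the released implementation): a single router weight matrix scores each input, and every MoE layer reuses that same matrix instead of owning a per-layer router, so expert selection is coupled across depth. All names, sizes, and the top-1 routing choice below are illustrative assumptions.

```python
# Hypothetical sketch of shared routing across MoE layers (illustrative only;
# not the actual Omni-router implementation).
import random

random.seed(0)

DIM, NUM_EXPERTS, NUM_LAYERS = 4, 2, 3

def linear(weights, x):
    """Apply a weight matrix (one row per output) to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# One shared router weight matrix, reused by ALL MoE layers.
shared_router = [[random.gauss(0, 1) for _ in range(DIM)]
                 for _ in range(NUM_EXPERTS)]

# Each layer still has its own experts (here: simple linear maps).
experts = [
    [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
     for _ in range(NUM_EXPERTS)]
    for _ in range(NUM_LAYERS)
]

def moe_layer(x, layer_experts):
    # Routing logits come from the shared router, not a per-layer router.
    logits = linear(shared_router, x)
    top = max(range(NUM_EXPERTS), key=lambda e: logits[e])  # top-1 routing
    return linear(layer_experts[top], x), top

x = [random.gauss(0, 1) for _ in range(DIM)]
choices = []
for layer in range(NUM_LAYERS):
    x, picked = moe_layer(x, experts[layer])
    choices.append(picked)

print(choices)  # expert index chosen at each layer
```

Because the hidden state still changes between layers, the chosen expert can differ per layer; the coupling comes from every layer scoring with identical router weights.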
## Model Details

### Model Description
This model is a 2-expert MoE model (246M total parameters, 140M active). It achieves better performance than a similarly sized dense model and the Whisper-small.en model while activating roughly half the parameters.
- Developed by: Apple Machine Learning Research
- Model type: ASR
- Language(s): English
- License: apple-amlr
## Uses

This model performs English automatic speech recognition (ASR).
## How to Get Started with the Model

Please refer to the GitHub repository for detailed usage.
## Training Details

### Training Data
The training data is a large-scale conversational audio dataset collected from publicly accessible sources, named SpeechCrawl. Please refer to the paper for details.
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
This model is evaluated on 10 out-of-distribution (OOD) public evaluation datasets, including AMI-IHM, Chime6, CommonVoice, Fleurs, Callhome, Switchboard, Librispeech, WSJ, Tedlium and Voxpopuli.
#### Metrics
Word Error Rate (WER).
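As a reference for the metric, here is a minimal WER computation using the standard word-level edit-distance definition (insertions + deletions + substitutions, divided by reference length). This is a generic sketch, not the exact scoring pipeline used in the paper.

```python
# Minimal word error rate (WER) via word-level Levenshtein distance.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1/6, one deletion
```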
### Results
| Dataset | Omni-router, 246M (140M active) | Dense, 210M | Whisper-small.en, 244M |
|---|---|---|---|
| AMI-IHM | 17.8 | 17.8 | 17.6 |
| Chime6 | 28.2 | 27.9 | 27.7 |
| CommonVoice | 16.4 | 17.6 | 15.2 |
| Fleurs | 8.4 | 8.8 | 7.6 |
| Callhome | 15.0 | 15.2 | 20.0 |
| Switchboard | 13.5 | 13.8 | 14.9 |
| Librispeech (clean) | 3.3 | 3.7 | 3.1 |
| Librispeech (other) | 7.3 | 8.0 | 7.4 |
| WSJ (nov92) | 3.7 | 4.1 | 3.5 |
| Tedlium | 4.2 | 4.3 | 4.0 |
| Voxpopuli | 8.5 | 9.0 | 8.2 |
| Average | 11.48 | 11.84 | 11.75 |
## Citation
If you find this work useful, please cite our paper:
```bibtex
@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}
```
## Model Card Contact
Contact zijin@apple.com for any issues.