MegaMath
Collection
MegaMath is the largest open math pre-training dataset, curated from diverse, math-focused sources, with over 300B tokens.
A proof-of-concept model trained on the MegaMath dataset, capable of both Chain-of-Thought and Program-Aided-Language problem solving.
If you find our work useful, please cite:
@article{zhou2025megamath,
title = {MegaMath: Pushing the Limits of Open Math Corpora},
author = {Zhou, Fan and Wang, Zengzhi and Ranjan, Nikhil and Cheng, Zhoujun and Tang, Liping and He, Guowei and Liu, Zhengzhong and Xing, Eric P.},
journal = {arXiv preprint arXiv:2504.02807},
year = {2025},
note = {Preprint}
}