ReVisual-R1 (7B): Open-Source Multimodal Reasoner
One cold-start, two RL stages, endless reasoning power.
Highlights
SOTA on 9 tough benchmarks covering visual-math and text reasoning.
Three-Stage SRO Training
- Text Cold-Start: seed deep reflection
- Multimodal RL: align vision & logic
- Text RL: polish fluency & brevity
PAD (Prioritized Advantage Distillation) keeps gradients alive.
Efficient-Length Reward = concise, self-reflective CoT.
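As a rough illustration of the two ideas above, here is a minimal Python sketch. It is not the authors' implementation: it assumes that PAD drops samples whose advantage is near zero (they contribute no gradient) and then resamples the rest with probability weighted by absolute advantage, and that the efficient-length reward is a clipped linear function of how far a response is from a token budget. The function names (`pad_select`, `length_reward`) and every parameter and default (`low`, `temperature`, `alpha`, `delta`, `budget`) are hypothetical; see the paper for the actual formulation.

```python
import math
import random


def pad_select(advantages, batch_size, low=1e-3, temperature=1.0, rng=None):
    """Pick sample indices for a policy update (hypothetical PAD-style sketch).

    Assumption: near-zero-advantage samples are filtered out, then the
    remainder is sampled with softmax weights over |advantage|.
    """
    rng = rng or random.Random(0)
    # Step 1: keep only samples with a non-negligible advantage.
    effective = [i for i, a in enumerate(advantages) if abs(a) > low]
    if not effective:
        return []
    # Step 2: softmax over |advantage| / temperature (max-shifted for stability).
    scores = [abs(advantages[i]) / temperature for i in effective]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    # Step 3: sample with replacement, favoring informative samples.
    return rng.choices(effective, weights=weights, k=batch_size)


def length_reward(num_tokens, budget, alpha=0.005, delta=0.5):
    """Clipped linear length reward in [0, 1] (assumed form, not the paper's).

    Responses shorter than the budget score above delta; longer ones below.
    """
    return min(1.0, max(0.0, alpha * (budget - num_tokens) + delta))


if __name__ == "__main__":
    advantages = [0.0, 2.0, 0.0, -1.5]
    print(pad_select(advantages, batch_size=8))
    print(length_reward(num_tokens=900, budget=1000))
```

In this sketch, a batch in which every rollout has zero advantage yields an empty selection, which is the "dead gradient" case the PAD highlight alludes to.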
Resources
Citation
@article{chen2025advancing,
title={Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning},
author={Chen, Shuang and Guo, Yue and Su, Zhaochen and Li, Yafu and Wu, Yulun and Chen, Jiacheng and Chen, Jiayu and Wang, Weijie and Qu, Xiaoye and Cheng, Yu},
journal={arXiv preprint arXiv:2506.04207},
year={2025}
}
Take ReVisual-R1 for a spin and let us know what you build!