Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models • Paper 2509.26628 • Published Sep 30
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR • Paper 2507.15778 • Published Jul 21