Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards Paper • 2507.14783 • Published Jul 20 • 3
Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence Paper • 2505.20325 • Published May 23 • 45