Reasoning with Sampling: Your Base Model is Smarter Than You Think • Paper • 2510.14901 • Published 29 days ago • 47
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values • Paper • 2510.20187 • Published 22 days ago • 18
Accelerating Vision Transformers with Adaptive Patch Sizes • Paper • 2510.18091 • Published 25 days ago • 4
Unified Reinforcement and Imitation Learning for Vision-Language Models • Paper • 2510.19307 • Published 23 days ago • 26
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping • Paper • 2510.18927 • Published 24 days ago • 82
The Art of Scaling Reinforcement Learning Compute for LLMs • Paper • 2510.13786 • Published 30 days ago • 30
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization • Paper • 2510.13554 • Published 30 days ago • 56