Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs Paper • 2506.07240 • Published Jun 8
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Paper • 2506.06941 • Published Jun 7
s3: You Don't Need That Much Data to Train a Search Agent via RL Paper • 2505.14146 • Published May 20
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper • 2506.14245 • Published Jun 17
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published Jul 1
Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models Paper • 2507.14241 • Published 19 days ago
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization Paper • 2507.15061 • Published 17 days ago
Replacing thinking with tool usage enables reasoning in small language models Paper • 2507.05065 • Published 30 days ago
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Paper • 2507.13158 • Published 20 days ago