Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Paper • 2509.22601 • Published Sep 26 • 29
From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature Paper • 2509.16591 • Published Sep 20 • 2
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2 • 220
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7 • 137
OpenMathReasoning Collection Models and datasets from "AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset" • 7 items • Updated about 19 hours ago • 44