Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning • Paper • 2510.25992 • Published 8 days ago • 40
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases • Paper • 2510.20270 • Published 15 days ago • 6
ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning • Paper • 2510.15211 • Published 21 days ago • 2
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping • Paper • 2510.18927 • Published 17 days ago • 82
Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs • Paper • 2510.18279 • Published 17 days ago • 4
Prompt-MII: Meta-Learning Instruction Induction for LLMs • Paper • 2510.16932 • Published 19 days ago • 6
Glyph: Scaling Context Windows via Visual-Text Compression • Paper • 2510.17800 • Published 18 days ago • 64
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search • Paper • 2510.12801 • Published 24 days ago • 13
Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks • Paper • 2510.12635 • Published 24 days ago • 15
Base Models Know How to Reason, Thinking Models Learn When • Paper • 2510.07364 • Published 30 days ago • 1
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models • Paper • 2510.08492 • Published 29 days ago • 8
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs • Paper • 2510.07499 • Published 30 days ago • 48
VISTA: A Test-Time Self-Improving Video Generation Agent • Paper • 2510.15831 • Published 21 days ago • 20
Robust Layerwise Scaling Rules by Proper Weight Decay Tuning • Paper • 2510.15262 • Published 21 days ago • 5