PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold • Paper • 2510.15862 • Published 22 days ago • 9
UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action • Paper • 2510.17790 • Published 19 days ago • 5
Context Engineering 2.0: The Context of Context Engineering • Paper • 2510.26493 • Published 9 days ago • 6
QueST: Incentivizing LLMs to Generate Difficult Problems • Paper • 2510.17715 • Published 19 days ago • 32
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning • Paper • 2510.25992 • Published 10 days ago • 40
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases • Paper • 2510.20270 • Published 17 days ago • 6
ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning • Paper • 2510.15211 • Published 23 days ago • 2
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping • Paper • 2510.18927 • Published 18 days ago • 82
Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs • Paper • 2510.18279 • Published 19 days ago • 4
Prompt-MII: Meta-Learning Instruction Induction for LLMs • Paper • 2510.16932 • Published 20 days ago • 6
Glyph: Scaling Context Windows via Visual-Text Compression • Paper • 2510.17800 • Published 19 days ago • 64
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search • Paper • 2510.12801 • Published 25 days ago • 13
Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks • Paper • 2510.12635 • Published 25 days ago • 15
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models • Paper • 2510.08492 • Published about 1 month ago • 8