MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning Paper • 2510.14265 • Published Oct 16 • 19
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning Paper • 2510.15110 • Published Oct 16 • 15
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models Paper • 2504.10368 • Published Apr 14 • 21