Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning Paper • 2510.05251 • Published Oct 6 • 7
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training Paper • 2509.21500 • Published Sep 25 • 18
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training Paper • 2509.21500 • Published Sep 25 • 18 • 2
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation Paper • 2506.06962 • Published Jun 8 • 28
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published Mar 21 • 24
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails Paper • 2502.05163 • Published Feb 7 • 23
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails Paper • 2502.05163 • Published Feb 7 • 23