MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published Feb 14 • 35
Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models Paper • 2502.08130 • Published Feb 12 • 9
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 165
ReLearn: Unlearning via Learning for Large Language Models Paper • 2502.11190 • Published Feb 16 • 30
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training Paper • 2502.11196 • Published Feb 16 • 23
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs Paper • 2502.10454 • Published Feb 12 • 7
Dyve: Thinking Fast and Slow for Dynamic Process Verification Paper • 2502.11157 • Published Feb 16 • 7
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking Paper • 2502.09083 • Published Feb 13 • 4
Rethinking Diverse Human Preference Learning through Principal Component Analysis Paper • 2502.13131 • Published Feb 18 • 38
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models Paper • 2502.12464 • Published Feb 18 • 28
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? Paper • 2502.12215 • Published Feb 17 • 16
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading Paper • 2502.12574 • Published Feb 18 • 12
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 Paper • 2502.12659 • Published Feb 18 • 7
Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey Paper • 2502.10708 • Published Feb 15 • 4
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20 • 46
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published Feb 19 • 28
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 193
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Paper • 2502.14802 • Published Feb 20 • 13
RLPR: Extrapolating RLVR to General Domains without Verifiers Paper • 2506.18254 • Published Jun 23 • 32
RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models Paper • 2506.18369 • Published Jun 23 • 2
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published Jun 23 • 56
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset Paper • 2506.18851 • Published Jun 23 • 29
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Paper • 2506.18896 • Published Jun 23 • 28
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning Paper • 2506.19767 • Published Jun 24 • 13
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published Jun 25 • 46
ReCode: Updating Code API Knowledge with Reinforcement Learning Paper • 2506.20495 • Published Jun 25 • 8
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge Paper • 2506.21506 • Published Jun 26 • 49
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published 12 days ago • 28
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning Paper • 2507.19457 • Published 12 days ago • 20
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence Paper • 2507.21046 • Published 9 days ago • 73
Goal Alignment in LLM-Based User Simulators for Conversational AI Paper • 2507.20152 • Published 10 days ago • 4
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty Paper • 2507.16806 • Published 15 days ago • 6
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge Paper • 2507.21183 • Published 10 days ago • 12
Persona Vectors: Monitoring and Controlling Character Traits in Language Models Paper • 2507.21509 • Published 8 days ago • 23
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published 7 days ago • 40
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE Paper • 2507.21802 • Published 8 days ago • 10