DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 6 days ago • 198
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 Sentence Similarity • 0.1B • Updated Jan 28 • 49.3M • • 1.24k
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning Paper • 2605.02913 • Published Apr 8 • 9
WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training Paper • 2604.14932 • Published Apr 16 • 11
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 630
Memory-Augmented Vision-Language Agents for Persistent and Semantically Consistent Object Captioning Paper • 2603.24257 • Published Mar 30 • 6
FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol Paper • 2603.24943 • Published Mar 26 • 12
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 351