The Smol Training Playbook: The Secrets to Building World-Class LLMs 📝 Explore loss curves for training LLMs • 1.92k
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning Paper • 2510.15444 • Published 25 days ago • 145
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Paper • 2508.20453 • Published Aug 28 • 63
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 178