view article Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 29 days ago • 611
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 196
view article Article Gotchas in Tokenizer Behavior Every Developer Should Know By qgallouedec • Apr 18 • 40
view article Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention By lwtr and 5 others • Aug 21, 2024 • 39
view article Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It? By Kseniase • Mar 17 • 324
view article Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model By EuroBERT and 3 others • Mar 10 • 146
view article Article Process Reinforcement through Implicit Rewards By ganqu and 1 other • Jan 3 • 29
view article Article SmolLM - blazingly fast and remarkably powerful By loubnabnl and 2 others • Jul 16, 2024 • 403
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 876
view article Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... By srinivasbilla • Jan 20 • 69
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 100
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10, 2024 • 33