KV Cache Steering for Inducing Reasoning in Small Language Models Paper • 2507.08799 • Published Jul 11, 2025 • 40
Article I trained a Language Model to schedule events with GRPO! anakin87 • Apr 29, 2025 • 95
Article A failed experiment: Infini-Attention, and why we should keep trying? +1 neuralink, lvwerra, thomwolf • Aug 14, 2024 • 76