MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments Paper • 2510.01353 • Published Oct 1, 2025 • 4
Zero-To-CAD Collection Datasets (1M & 100K) and model for synthesizing executable CAD programs from an LLM in a CadQuery environment. No real data used. • 3 items • Updated Apr 25 • 19
1930 Coder Collection Fine-tuning the Talkie 13B 1930 model on agentic trajectories • 4 items • Updated 27 days ago • 4
talkie-13b Collection talkie-1930-13b is a vintage language model trained on pre-1931 English-language text. See https://github.com/talkie-lm/talkie to run talkie. • 3 items • Updated Apr 21 • 54
Article CircleGuardBench: New Standard for Evaluating AI Moderation Models whitecircle • May 7, 2025 • 60
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published Mar 16 • 154
Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +7 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 159
Article Custom Kernels for All from Codex and Claude +2 burtenshaw, sayakpaul, ariG23498, evalstate • Feb 13 • 80
LateOn-Code 💻 Collection State-of-the-art late interaction code retrieval models • 6 items • Updated 6 days ago • 20
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling lightonai • Feb 12 • 56
Robust Speech Recognition via Large-Scale Weak Supervision Paper • 2212.04356 • Published Dec 6, 2022 • 54
Article From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output MaziyarPanahi • Feb 7 • 22
Article Universal Assisted Generation: Faster Decoding with Any Assistant Model +6 danielkorat, orenpereg, mber, jmamou, joaogante, lewtun, Nadav-Timor, moshew • Oct 29, 2024 • 61