Article We Got Claude to Fine-Tune an Open Source LLM burtenshaw, evalstate • Dec 4, 2025 • 627
Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 396
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face +3 abidlabs, znation, nouamanetazi, sasha, qgallouedec • Jul 29, 2025 • 223
Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 777
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles Paper • 2505.19914 • Published May 26, 2025 • 46
Sanskrit Collection collection of all Sanskrit text, currently at 115K samples • 8 items • Updated May 24, 2025 • 11
Article Finally, a Replacement for BERT: Introducing ModernBERT +13 bwarner, NohTow, bclavie, orionweller, ohallstrom, staghado, alexisgallagher, rbiswasfc, fladhak, tomaarsen, ncoop57, griffin, jph00, johnowhitaker, iacolippo • Dec 19, 2024 • 741
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025 • 60
Article Gotchas in Tokenizer Behavior Every Developer Should Know qgallouedec • Apr 18, 2025 • 72
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14, 2025 • 160
Tessa-T1 REACT REASONING MODEL Collection Tessa-T1 is a model that generates stateful React with Tailwind styling. It has features of other libraries as well. It is based on Qwen2.5-Coder. • 5 items • Updated Mar 24, 2025 • 9
Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 manu • Jul 5, 2024 • 318
SLM Judge Models Collection Base model(s) merged with the specific evaluation task adapter. Each model performs excellently for its purpose and remains useful for general tasks. • 6 items • Updated Feb 18, 2025 • 1