Comma v0.1 Artifacts Collection A collection of artifacts related to Comma v0.1—a 7B parameter LLM trained on public domain and openly licensed text • 3 items • Updated 20 days ago • 4
Common Pile v0.1 Filtered Data Collection An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated 20 days ago • 13
Common Pile v0.1 Raw Data Collection 8TB of public domain and openly licensed text • 30 items • Updated 20 days ago • 13
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated 20 days ago • 25
BitNet Collection 🔥 BitNet family of large language models (1-bit LLMs). • 7 items • Updated May 1 • 45
Article Yay! Organizations can now publish blog Articles By huggingface and 3 others • Jan 20 • 46
Community Artifacts Collection Datasets, models, and spaces created by the community • 12 items • Updated 15 days ago • 1
NanoBEIR 🍺 Collection A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 17
Positions Datasets Collection Datasets where each row is a chess position • 4 items • Updated Jan 9 • 7
Rated Games Dataset Collection Datasets where each row is a rated chess game • 10 items • Updated 13 days ago • 6
Article EU Training Data Transparency: A Proposal for a Sufficiently Detailed Summary 📑📚🖼️🇪🇺 By yjernite • Jul 3, 2024 • 9