datablations

https://github.com/huggingface/datablations

AI & ML interests

Scaling Data-Constrained Language Models

Recent Activity

craffel authored a paper 19 days ago
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
thomwolf authored a paper 23 days ago
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Muennighoff authored a paper about 2 months ago
Crosslingual Reasoning through Test-Time Scaling

Members: Niklas Muennighoff, Teven Le Scao, Nouamane Tazi, Risto Luukkonen, Aleksandra Piktus, Sampo Pyysalo, Colin Raffel, Thomas Wolf, Sasha Rush

datablations's datasets (13)

datablations/scripts

Viewer • Updated Jun 15, 2023 • 3.48M • 396

datablations/oscar-subsets

Viewer • Updated Jun 14, 2023 • 365k • 429

datablations/c4-subsets

Viewer • Updated Jun 14, 2023 • 729k • 924 • 3

datablations/c4-filter-megatron

Updated May 28, 2023 • 226

datablations/oscar-filter-megatron

Updated May 27, 2023 • 352

datablations/python-megatron

Updated May 22, 2023 • 1.56k • 1

datablations/subsets

Viewer • Updated May 10, 2023 • 365k • 65

datablations/oscar-filter

Viewer • Updated May 10, 2023 • 432M • 1.56k

datablations/oscar-dedup-expanded

Viewer • Updated May 10, 2023 • 432M • 388

datablations/mup

Updated Apr 24, 2023 • 256

datablations/c4-filter

Viewer • Updated Feb 1, 2023 • 365M • 865

datablations/c4-filter-small

Viewer • Updated Jan 17, 2023 • 100k • 152

datablations/oscar-filter-small

Viewer • Updated Nov 24, 2022 • 100k • 9