minpeter's Collections

[Dataset] Pretrain-corpus

Updated 15 days ago

  • PleIAs/common_corpus

    Viewer • Updated Jun 10 • 470M • 21.1k • 304

  • EssentialAI/essential-web-v1.0

    Preview • Updated Jun 22 • 68.8k • 194

  • HuggingFaceFW/fineweb

    Viewer • Updated 25 days ago • 52.5B • 566k • 2.28k

  • HuggingFaceFW/fineweb-edu

    Viewer • Updated 25 days ago • 3.5B • 93.8k • 724

  • HuggingFaceFW/fineweb-2

    Viewer • Updated Jun 27 • 5.02B • 466k • 607

  • data-is-better-together/fineweb-c

    Viewer • Updated 29 days ago • 88.7k • 834 • 54

  • allenai/dolmino-mix-1124

    Viewer • Updated Dec 17, 2024 • 165M • 26.1k • 69

  • allenai/dolma

    Updated Apr 17, 2024 • 713 • 924

  • allenai/olmo-mix-1124

    Viewer • Updated 22 days ago • 620M • 41.9k • 69

  • mlfoundations/dclm-baseline-1.0

    Preview • Updated Jul 22, 2024 • 96.3k • 226

  • Zyphra/Zyda-2

    Viewer • Updated Dec 12, 2024 • 1.62B • 34.1k • 83