Christopher Akiki

christopher

https://twitter.com/christopher

AI & ML interests

Representation Learning, Natural Language Generation, Dataset Creation and Curation, Information Retrieval

Recent Activity

published a dataset about 9 hours ago

christopher/hf-orgs

liked a dataset 7 days ago

frimelle/test-gated-dataset

new activity 7 days ago

biglam/archives-parlementaires-revolution-francaise:[bot] Conversion to Parquet

upvoted an article 19 days ago

Article

The Large Language Model Course

Jan 16 • 194

upvoted 4 collections 24 days ago

Comma v0.1 Artifacts

A collection of artifacts related to Comma v0.1—a 7B parameter LLM trained on public domain and openly licensed text • 3 items • Updated 20 days ago • 4

Common Pile v0.1 Filtered Data

An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated 20 days ago • 13

Common Pile v0.1 Raw Data

8TB of public domain and openly licensed text • 30 items • Updated 20 days ago • 13

Common Pile v0.1

All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated 20 days ago • 25

upvoted a changelog about 1 month ago

Changelog

Filter by MCP compatibility available in HF Spaces

May 21 • 74

upvoted a collection about 1 month ago

LLM evaluation datasets

33 items • Updated Nov 28, 2024 • 10

upvoted an article about 1 month ago

Article

Let's talk about LLM evaluation

May 23, 2024 • 177

upvoted a collection 2 months ago

BitNet

🔥BitNet family of large language models (1-bit LLMs). • 7 items • Updated May 1 • 45

upvoted an article 5 months ago

Article

Yay! Organizations can now publish blog Articles

4 authors • Jan 20 • 46

upvoted a collection 6 months ago

Community Artifacts

Datasets, models, and spaces created by the community • 12 items • Updated 15 days ago • 1

upvoted a collection 7 months ago

NanoBEIR 🍺

A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 17

upvoted 2 collections 8 months ago

Positions Datasets

Datasets where each row is a chess position • 4 items • Updated Jan 9 • 7

Rated Games Dataset

Datasets where each row is a rated chess game • 10 items • Updated 13 days ago • 6

upvoted an article 11 months ago

Article

Train a Llama model from scratch

Jul 29, 2024 • 51

upvoted a collection 12 months ago

♾️ Automerger

1 item • Updated May 27, 2024 • 1

upvoted 2 articles 12 months ago

Article

EU Training Data Transparency: A Proposal for a Sufficiently Detailed Summary 📑📚🖼️🇪🇺

Jul 3, 2024 • 9

Article

Image-based search engine

Jul 4, 2024 • 26

upvoted an article about 1 year ago

Article

2024-04-22 - Hub Incident Post Mortem

May 17, 2024 • 17