# Hugging Face Research

The science team at Hugging Face is dedicated to advancing machine learning research in ways that maximize value for the whole community.

### 🛠️ Tooling & Infrastructure
Tooling and infrastructure are the foundation of ML research, and we are working on a range of tools such as [datatrove](https://github.com/huggingface/datatrove), [nanotron](https://github.com/huggingface/nanotron), [TRL](https://github.com/huggingface/trl), [LeRobot](https://github.com/huggingface/lerobot), and [lighteval](https://github.com/huggingface/lighteval).
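
As a small illustration of how these pieces fit together, here is a minimal supervised fine-tuning sketch with TRL; the model and dataset choices are placeholders, and the exact `SFTTrainer` arguments vary across TRL versions:

```python
# Minimal TRL fine-tuning sketch (illustrative; check the TRL docs for your version).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# A small chat-formatted instruction dataset from the Hub.
dataset = load_dataset("HuggingFaceH4/no_robots", split="train")

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",        # any causal LM checkpoint
    train_dataset=dataset,
    args=SFTConfig(output_dir="smollm2-sft"),  # accepts the usual TrainingArguments fields
)
trainer.train()
```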
### 📖 Datasets
High-quality datasets are the powerhouse of LLMs and require special care and skill to build. We focus on datasets such as [no-robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots), [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb), [The Stack](https://huggingface.co/datasets/bigcode/the-stack-v2), and [FineVideo](https://huggingface.co/datasets/HuggingFaceFV/finevideo).
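
At this scale the data is usually consumed in streaming mode; a minimal sketch with the `datasets` library, assuming the public `sample-10BT` subset of FineWeb:

```python
# Stream a FineWeb sample without downloading the whole dataset to disk.
from datasets import load_dataset

fw = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-10BT",  # a ~10B-token sample; the full dumps are far larger
    split="train",
    streaming=True,
)

# Peek at the first document.
doc = next(iter(fw))
print(doc["text"][:200])
```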
### 🤖 Open Models
The datasets and training recipes of most state-of-the-art models are not released. We build cutting-edge models and release the full training pipeline as well, fostering more innovation and reproducibility; examples include [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), [StarCoder2](https://huggingface.co/bigcode/starcoder2-15b), and [SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct).
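
These checkpoints can be tried in a few lines with `transformers`; a minimal sketch using the chat input of the text-generation pipeline (prompt and generation settings are arbitrary):

```python
# Query SmolLM2-Instruct through the transformers pipeline API.
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

messages = [{"role": "user", "content": "Explain what a tokenizer does in one sentence."}]
out = pipe(messages, max_new_tokens=64)

# Recent transformers versions return the full chat, including the assistant reply.
print(out[0]["generated_text"][-1]["content"])
```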
### 🌸 Collaborations
Research and collaboration go hand in hand. That's why we like to organize and participate in large open collaborations such as [BigScience](https://bigscience.huggingface.co) and [BigCode](https://www.bigcode-project.org), as well as many smaller partnerships such as [Leaderboards on the Hub](https://huggingface.co/blog?tag=leaderboard).
### ⚙️ Infrastructure
The research team is organized in small teams of typically fewer than four people, and the science cluster consists of 96 nodes with 8 H100 GPUs each, plus an auto-scalable CPU cluster for dataset processing. In this setup, even a small research team can build and push out impactful artifacts.
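
The CPU side of that setup is what tools like datatrove are built for; a minimal local sketch (the same pipeline scales out on a cluster via datatrove's Slurm executor), with hypothetical paths and an arbitrary filter:

```python
# Filter a folder of JSONL documents with a small datatrove pipeline.
from datatrove.executor import LocalPipelineExecutor
from datatrove.pipeline.filters import LambdaFilter
from datatrove.pipeline.readers import JsonlReader
from datatrove.pipeline.writers import JsonlWriter

executor = LocalPipelineExecutor(
    pipeline=[
        JsonlReader("data/raw/"),                       # hypothetical input folder
        LambdaFilter(lambda doc: len(doc.text) > 200),  # drop very short documents
        JsonlWriter("data/filtered/"),                  # hypothetical output folder
    ],
    tasks=4,  # number of parallel shards
)
executor.run()
```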
### 📚 Educational material
Besides writing tech reports of research projects, we also like to write more educational content to help newcomers get started in the field and to support practitioners. For example, we built the [alignment handbook](https://github.com/huggingface/alignment-handbook), the [evaluation guidebook](https://github.com/huggingface/evaluation-guidebook), the [pretraining tutorial](https://www.youtube.com/watch?v=2-SPH9hIKT8), and the [FineWeb blog](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1).
### Release Timeline
This is the release timeline so far; click on an element to follow its link:
<!---
TIMELINE UPDATE INSTRUCTIONS:
Go to https://huggingface.co/lvwerra/science-timeline and follow the guide to update the timeline.
-->
### 🤗 Join us!
We are actively hiring for both full-time positions and internships. Check out [hf.co/jobs](https://hf.co/jobs).