## Overview
We sampled 100 billion tokens from the CCI4.0 dataset and trained a 1.4B-parameter MoE model with 0.4B active parameters. The model and the dataset are open-sourced as a baseline for future experiments in areas such as dataset construction, algorithmic strategies, and parallel training frameworks. The model architecture is the same as that of the OpenSeek-Small-v1 model.
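The gap between total and active parameters comes from top-k expert routing: each token passes through only a few experts per MoE layer, so the rest of the expert weights sit idle for that token. A minimal back-of-the-envelope sketch (all numbers below are illustrative assumptions chosen to match the 1.4B / 0.4B split, not the actual model config):

```python
# Illustrative MoE parameter accounting (assumed values, not the real config).
dense_params = 0.2e9       # embeddings, attention, shared layers (assumed)
n_experts = 12             # experts per MoE layer, summed over layers (assumed)
params_per_expert = 0.1e9  # assumed
top_k = 2                  # experts routed per token (assumed)

# Every expert contributes to the parameter count on disk...
total = dense_params + n_experts * params_per_expert
# ...but only the routed top-k experts run per token.
active = dense_params + top_k * params_per_expert

print(f"total: {total / 1e9:.1f}B, active per token: {active / 1e9:.1f}B")
```

With these assumed numbers the sketch reproduces the stated split: 1.4B parameters in total, 0.4B active per token, which is why the model's compute cost tracks the smaller figure.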
## Wandb
Our training curves are recorded in Weights & Biases: [Aquila-1_4B-A0_4B-Baseline](https://wandb.ai/aquila3/OpenSeek-3B-v0.1/runs/Aquila-1_4B-A0_4B-Baseline-rank-31).
## Evaluation
## Usage Instructions
```python
from transformers import AutoModelForCausalLM, AutoTokenizer