## Overview
We sampled 100 billion tokens from the CCI4.0 dataset and trained a 1.4B-parameter MoE model with 0.4B active parameters. The model and the dataset are open-sourced as a baseline for future experiments in areas such as dataset construction, algorithmic strategies, and parallel training frameworks. The model architecture is the same as that of the OpenSeek-Small-v1 model.
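The gap between total and active parameters comes from top-k expert routing: each token passes through only a few experts per MoE layer, so the rest of the expert weights sit idle for that token. A minimal back-of-the-envelope sketch (all numbers below are illustrative assumptions chosen to match the 1.4B / 0.4B split, not the actual model config):

```python
# Illustrative MoE parameter accounting (assumed values, not the real config).
dense_params = 0.2e9       # embeddings, attention, shared layers (assumed)
n_experts = 12             # experts per MoE layer, summed over layers (assumed)
params_per_expert = 0.1e9  # assumed
top_k = 2                  # experts routed per token (assumed)

# Every expert contributes to the parameter count on disk...
total = dense_params + n_experts * params_per_expert
# ...but only the routed top-k experts run per token.
active = dense_params + top_k * params_per_expert

print(f"total: {total / 1e9:.1f}B, active per token: {active / 1e9:.1f}B")
```

With these assumed numbers the sketch reproduces the stated split: 1.4B parameters in total, 0.4B active per token, which is why the model's compute cost tracks the smaller figure.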
## Wandb
Our training curves are recorded in Weights & Biases: [Aquila-1_4B-A0_4B-Baseline](https://wandb.ai/aquila3/OpenSeek-3B-v0.1/runs/Aquila-1_4B-A0_4B-Baseline-rank-31).
## Evaluation
## Usage Instructions
```python
from transformers import AutoModelForCausalLM, AutoTokenizer