ldwang committed · Commit 7622e34 · verified · 1 parent: fc1b1ca

Update README.md

Files changed (1): README.md (+6 −0)
README.md CHANGED
@@ -3,6 +3,12 @@
 ## Overview
 We sampled 100 billion tokens from the CCI4.0 dataset and trained a 1.4B-parameter MoE model with 0.4B active parameters. This model, along with the dataset, is open-sourced as a baseline for future experiments in areas such as dataset construction, algorithmic strategies, and parallel training frameworks. The model architecture is the same as that of the OpenSeek-Small-v1 model.
 
+## Wandb
+Our training curves are recorded in Weights & Biases: [Aquila-1_4B-A0_4B-Baseline](https://wandb.ai/aquila3/OpenSeek-3B-v0.1/runs/Aquila-1_4B-A0_4B-Baseline-rank-31).
+
+## Evaluation
+
+
 ## Usage Instructions
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 ```
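The usage snippet in the diff is cut off at the import line. Below is a minimal generation sketch following standard `transformers` causal-LM conventions; the repo id is a placeholder (the diff does not state the published model path), so substitute the actual Hugging Face id before running.

```python
# Placeholder repo id -- hypothetical; replace with the model's published
# Hugging Face path, which is not stated in this diff.
MODEL_ID = "your-org/Aquila-1_4B-A0_4B-Baseline"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Complete `prompt` with the baseline MoE model via transformers."""
    # Imported inside the function so the helper can be defined
    # even where transformers is not installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("The capital of France is"))
```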