StepLaw/StepLaw-N_268M-D_3.0B-LR3.906e-03-BS32768

@@ -23,17 +23,17 @@ This model is part of the [StepLaw-N_268M-D_3.0B](https://huggingface.co/collect
 - **Feed-forward network size (FFN)**: 9552
 - **Attention heads**: 16
 - **Layers**: 8
-- **Parameter count**: 268MM
+- **Parameter count**: 268M
 ### Training Parameters
 - **Learning rate (lr)**: 3.906e-03
-- **Batch size (bs)**: 16
+- **Batch size (bs)**: 32768
 - **Training iterations**: 122070
 - **Training tokens (D)**: 4.0B
 ## Model Description
-StepLaw models are trained with various hyperparameter settings to enable research on scaling laws and hyperparameter optimization. This specific model was trained with learning rate 3.906e-03 and batch size 16 for 122070 iterations, using a total of 4.0B training tokens.
+StepLaw models are trained with various hyperparameter settings to enable research on scaling laws and hyperparameter optimization. This specific model was trained with learning rate 3.906e-03 and batch size 32768 for 122070 iterations, using a total of 4.0B training tokens.
 ## Usage Example
@@ -48,7 +48,4 @@ model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
 inputs = tokenizer("A long time ago in a galaxy far, far away", return_tensors="pt")
 outputs = model.generate(**inputs, max_length=100)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```## Part of StepLaw Project
-StepLaw is an initiative to provide thousands of models for optimal hyperparameter research.
-Visit [StepLaw Project](https://step-law.github.io/) for more information.
+```
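The corrected batch size can be cross-checked against the other training parameters in the diff. The sketch below assumes that "Batch size (bs)" counts tokens per optimizer step (an assumption; the card does not state the unit) and multiplies it by the iteration count. Under that assumption, 122070 steps at 32768 tokens each come to roughly 4.0B tokens, matching "Training tokens (D)", whereas the old value of 16 would account for only about 2M tokens. The helper names `total_tokens` and `steps_for_budget` are hypothetical and exist only for this illustration.

```python
# Sketch: cross-check the StepLaw training parameters shown in the diff.
# Assumption (not stated in the model card): "Batch size (bs)" is measured
# in tokens per optimizer step, not in sequences.

ITERATIONS = 122070            # "Training iterations"
TOKEN_BUDGET = 4_000_000_000   # "Training tokens (D)": 4.0B


def total_tokens(iterations: int, batch_size_tokens: int) -> int:
    """Total tokens seen = optimizer steps x tokens per step (hypothetical helper)."""
    return iterations * batch_size_tokens


def steps_for_budget(token_budget: int, batch_size_tokens: int) -> int:
    """Whole optimizer steps that fit in a token budget (hypothetical helper)."""
    return token_budget // batch_size_tokens


# Compare the old batch size (16) with the corrected one (32768).
for bs in (16, 32768):
    tokens = total_tokens(ITERATIONS, bs)
    print(f"bs={bs:>5}: {tokens:,} tokens (~{tokens / 1e9:.3f}B)")

# Number of full steps a 4.0B-token budget allows at bs=32768.
print("steps for 4.0B tokens at bs=32768:", steps_for_budget(TOKEN_BUDGET, 32768))
```

The iteration count 122070 also equals 4,000,000,000 // 32768, which is consistent with the step count having been derived from a 4.0B-token budget, though the card itself does not say how it was chosen.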