Best_Model_256_Instruct

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the None dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 64
optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 100
mixed_precision_training: Native AMP

Training Loss	Epoch	Step	Validation Loss
No log	0.2544	29	1.0916
No log	0.5088	58	0.7436
No log	0.7632	87	0.6517
No log	1.0175	116	0.5922
No log	1.2719	145	0.5741
No log	1.5263	174	0.5641
No log	1.7807	203	0.5365
No log	2.0351	232	0.5276
No log	2.2895	261	0.5263
No log	2.5439	290	0.5269
No log	2.7982	319	0.5387
No log	3.0526	348	0.5285