train_codealpacapy_123_1762546585

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the codealpacapy dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4692
  • Num Input Tokens Seen: 24941912
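
This checkpoint is a PEFT adapter on top of the base model. The snippet below is a minimal loading sketch, assuming the adapter is published under the repo id rbelanec/train_codealpacapy_123_1762546585 (as shown in the model tree) and that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights; the prompt, dtype, and generation settings are illustrative, not part of this card.

```python
# Minimal loading sketch (assumptions: adapter repo id and base-model access as noted above).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_codealpacapy_123_1762546585"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned adapter weights to the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative prompt using the Llama 3 chat template.
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```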

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent TrainingArguments configuration follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
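
The sketch below maps the listed hyperparameters onto transformers TrainingArguments. It is a hedged approximation, not the original training script: the dataset loading, PEFT/LoRA configuration, data collation, and Trainer setup are omitted, and the output_dir name is assumed from the model id.

```python
# Hedged sketch: the hyperparameters above expressed as transformers TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_codealpacapy_123_1762546585",  # assumed from the model id
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```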

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|---------------|-------|-------|-----------------|-------------------|
| 0.5856        | 1.0   | 1908  | 0.4838          | 1248304           |
| 0.4289        | 2.0   | 3816  | 0.4704          | 2497016           |
| 0.4096        | 3.0   | 5724  | 0.4692          | 3742552           |
| 0.3453        | 4.0   | 7632  | 0.4742          | 4985200           |
| 0.406         | 5.0   | 9540  | 0.4919          | 6233920           |
| 0.3266        | 6.0   | 11448 | 0.5208          | 7478504           |
| 0.2651        | 7.0   | 13356 | 0.5637          | 8722744           |
| 0.4997        | 8.0   | 15264 | 0.6324          | 9977520           |
| 0.1853        | 9.0   | 17172 | 0.7039          | 11225416          |
| 0.2099        | 10.0  | 19080 | 0.8089          | 12472912          |
| 0.1702        | 11.0  | 20988 | 0.8917          | 13721824          |
| 0.0829        | 12.0  | 22896 | 1.0327          | 14970528          |
| 0.0673        | 13.0  | 24804 | 1.1541          | 16220808          |
| 0.0278        | 14.0  | 26712 | 1.2787          | 17464792          |
| 0.0165        | 15.0  | 28620 | 1.3870          | 18706976          |
| 0.0066        | 16.0  | 30528 | 1.4610          | 19956544          |
| 0.0083        | 17.0  | 32436 | 1.5147          | 21204416          |
| 0.0047        | 18.0  | 34344 | 1.5930          | 22451928          |
| 0.0037        | 19.0  | 36252 | 1.6185          | 23696296          |
| 0.0408        | 20.0  | 38160 | 1.6300          | 24941912          |

The reported evaluation loss of 0.4692 is the minimum in this table, reached at epoch 3; validation loss rises in later epochs while the logged training loss continues to trend downward.

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
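
To check that a local environment matches the versions listed above, a quick convenience sketch (other compatible versions may also work):

```python
# Print installed versions of the libraries listed in "Framework versions".
import datasets
import peft
import tokenizers
import torch
import transformers

for name, module in [
    ("PEFT", peft),
    ("Transformers", transformers),
    ("PyTorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(f"{name}: {module.__version__}")
```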