T5La-Large-WeightedLoss

This model was fine-tuned on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set:

  • Perplexity: 59.9708
  • Loss: 4.0939
  • Accuracy: 0.0396
  • Lookahead Perplexity: 744.4827
  • Lookahead Loss: 6.6127
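
For context, each perplexity figure is the exponential of the corresponding cross-entropy loss, which can be checked directly:

```python
import math

# Perplexity = exp(cross-entropy loss), so each reported pair should agree.
eval_loss = 4.0939
lookahead_loss = 6.6127

print(math.exp(eval_loss))       # ~59.97  -> Perplexity: 59.9708
print(math.exp(lookahead_loss))  # ~744.5  -> Lookahead Perplexity: 744.4827
```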

Model description

More information needed

Intended uses & limitations

More information needed
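
Until the card is filled in, a minimal loading sketch may help. It assumes the checkpoint exposes a standard Transformers seq2seq interface, which has not been confirmed for the custom T5La architecture:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumption: the checkpoint follows the usual T5-style seq2seq interface.
# trust_remote_code=True is only needed if the repo ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained("hrezaei/T5La-Large-WeightedLoss")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "hrezaei/T5La-Large-WeightedLoss",
    trust_remote_code=True,
)
```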

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 524288
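
As a rough guide, these values map onto a transformers TrainingArguments configuration along the following lines (a sketch, not the author's actual script; output_dir is a placeholder and any argument not listed above keeps its default):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="t5la-large-weightedloss",  # placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=524288,
)
```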

Training results

| Training Loss | Epoch  | Step   | Accuracy | Lookahead Loss | Lookahead Perplexity | Validation Loss | Perplexity |
|:-------------:|:------:|:------:|:--------:|:--------------:|:--------------------:|:---------------:|:----------:|
| 4.6301        | 0.0095 | 5000   | 0.0312   | 7.2366         | 1389.3693            | 4.5864          | 98.1387    |
| 4.4871        | 0.0191 | 10000  | 0.0318   | 7.1133         | 1228.2065            | 4.4621          | 86.6700    |
| 4.4417        | 0.0286 | 15000  | 0.0340   | 7.0459         | 1148.1578            | 4.3906          | 80.6868    |
| 4.3978        | 0.0381 | 20000  | 0.0352   | 6.9924         | 1088.3650            | 4.3427          | 76.9114    |
| 4.3419        | 0.0477 | 25000  | 0.0349   | 6.9566         | 1050.0462            | 4.3110          | 74.5165    |
| 4.316         | 0.0572 | 30000  | 0.0361   | 6.9482         | 1041.2903            | 4.3202          | 75.2024    |
| 4.3488        | 0.0668 | 35000  | 0.0346   | 6.9130         | 1005.3051            | 4.2879          | 72.8156    |
| 4.3213        | 0.0763 | 40000  | 0.0347   | 6.8934         | 985.7009             | 4.2729          | 71.7300    |
| 4.2945        | 0.0858 | 45000  | 0.0337   | 6.8687         | 961.7435             | 4.2490          | 70.0335    |
| 4.3156        | 0.0954 | 50000  | 0.0363   | 6.8545         | 948.1267             | 4.2447          | 69.7315    |
| 4.3335        | 0.1049 | 55000  | 0.0359   | 6.8432         | 937.4398             | 4.2384          | 69.2975    |
| 4.2959        | 0.1144 | 60000  | 0.0374   | 6.8479         | 941.9400             | 4.2551          | 70.4671    |
| 4.2732        | 0.1240 | 65000  | 0.0372   | 6.8335         | 928.3968             | 4.2418          | 69.5327    |
| 4.2924        | 0.1335 | 70000  | 0.0384   | 6.8230         | 918.7655             | 4.2447          | 69.7361    |
| 4.2923        | 0.1431 | 75000  | 0.0364   | 6.8392         | 933.7382             | 4.2519          | 70.2380    |
| 4.2252        | 0.1526 | 80000  | 0.0371   | 6.8151         | 911.5434             | 4.2382          | 69.2813    |
| 4.3282        | 0.1621 | 85000  | 0.0372   | 6.8016         | 899.2575             | 4.2262          | 68.4571    |
| 4.2749        | 0.1717 | 90000  | 0.0379   | 6.7829         | 882.5851             | 4.2115          | 67.4584    |
| 4.2947        | 0.1812 | 95000  | 0.0375   | 6.7889         | 887.9544             | 4.2235          | 68.2707    |
| 4.2664        | 0.1907 | 100000 | 0.0377   | 6.7943         | 892.7447             | 4.2326          | 68.8933    |
| 4.295         | 0.2003 | 105000 | 0.0370   | 6.7986         | 896.5748             | 4.2416          | 69.5213    |
| 4.2937        | 0.2098 | 110000 | 0.0380   | 6.7741         | 874.8497             | 4.2202          | 68.0471    |
| 4.2807        | 0.2193 | 115000 | 0.0391   | 6.7792         | 879.3644             | 4.2272          | 68.5222    |
| 4.3136        | 0.2289 | 120000 | 0.0384   | 6.7866         | 885.9135             | 4.2357          | 69.1117    |
| 4.3081        | 0.2384 | 125000 | 0.0381   | 6.7903         | 889.1959             | 4.2430          | 69.6157    |
| 4.3328        | 0.2480 | 130000 | 0.0396   | 6.7835         | 883.1839             | 4.2406          | 69.4502    |
| 4.3221        | 0.2575 | 135000 | 0.0400   | 6.7754         | 876.0138             | 4.2366          | 69.1692    |
| 4.3163        | 0.2670 | 140000 | 0.0389   | 6.7802         | 880.2415             | 4.2404          | 69.4384    |
| 4.2954        | 0.2766 | 145000 | 0.0381   | 6.7683         | 869.8558             | 4.2300          | 68.7163    |
| 4.3153        | 0.2861 | 150000 | 0.0398   | 6.7570         | 860.0185             | 4.2200          | 68.0362    |
| 4.2762        | 0.2956 | 155000 | 0.0391   | 6.7518         | 855.6387             | 4.2128          | 67.5476    |
| 4.2877        | 0.3052 | 160000 | 0.0382   | 6.7626         | 864.9080             | 4.2276          | 68.5495    |
| 4.3023        | 0.3147 | 165000 | 0.0395   | 6.7556         | 858.8960             | 4.2232          | 68.2494    |
| 4.337         | 0.3242 | 170000 | 0.0396   | 6.7766         | 877.0693             | 4.2479          | 69.9564    |
| 4.2891        | 0.3338 | 175000 | 0.0383   | 6.7645         | 866.5396             | 4.2347          | 69.0438    |
| 4.3681        | 0.3433 | 180000 | 0.0395   | 6.7677         | 869.3163             | 4.2391          | 69.3437    |
| 4.3192        | 0.3529 | 185000 | 0.0390   | 6.7627         | 864.9563             | 4.2324          | 68.8804    |
| 4.2911        | 0.3624 | 190000 | 0.0382   | 6.7589         | 861.7325             | 4.2307          | 68.7676    |
| 4.332         | 0.3719 | 195000 | 0.0394   | 6.7499         | 853.9386             | 4.2234          | 68.2637    |
| 4.311         | 0.3815 | 200000 | 0.0392   | 6.7693         | 870.7021             | 4.2438          | 69.6714    |
| 4.2935        | 0.3910 | 205000 | 0.0388   | 6.7643         | 866.3254             | 4.2368          | 69.1861    |
| 4.3381        | 0.4005 | 210000 | 0.0398   | 6.7418         | 847.1075             | 4.2172          | 67.8446    |
| 4.2862        | 0.4101 | 215000 | 0.0387   | 6.7396         | 845.2019             | 4.2118          | 67.4806    |
| 4.2843        | 0.4196 | 220000 | 0.0397   | 6.7369         | 842.9649             | 4.2114          | 67.4486    |
| 4.3212        | 0.4292 | 225000 | 0.0399   | 6.7340         | 840.4940             | 4.2066          | 67.1268    |
| 4.2976        | 0.4387 | 230000 | 0.0390   | 6.7264         | 834.1246             | 4.2024          | 66.8453    |
| 4.322         | 0.4482 | 235000 | 0.0370   | 6.7456         | 850.2819             | 4.2172          | 67.8427    |
| 4.2604        | 0.4578 | 240000 | 0.0389   | 6.7255         | 833.4224             | 4.2010          | 66.7518    |
| 4.2827        | 0.4673 | 245000 | 0.0374   | 6.7311         | 838.0625             | 4.2070          | 67.1551    |
| 4.2629        | 0.4768 | 250000 | 0.0398   | 6.7232         | 831.4937             | 4.2016          | 66.7902    |
| 4.2719        | 0.4864 | 255000 | 0.0387   | 6.7234         | 831.6196             | 4.2018          | 66.8038    |
| 4.2518        | 0.4959 | 260000 | 0.0392   | 6.7199         | 828.7639             | 4.1976          | 66.5268    |
| 4.2439        | 0.5054 | 265000 | 0.0381   | 6.7225         | 830.8789             | 4.1985          | 66.5882    |
| 4.2873        | 0.5150 | 270000 | 0.0383   | 6.7290         | 836.3176             | 4.2064          | 67.1118    |
| 4.2885        | 0.5245 | 275000 | 0.0394   | 6.7120         | 822.2442             | 4.1908          | 66.0741    |
| 4.2603        | 0.5341 | 280000 | 0.0392   | 6.7111         | 821.5043             | 4.1885          | 65.9229    |
| 4.2623        | 0.5436 | 285000 | 0.0399   | 6.7100         | 820.5507             | 4.1877          | 65.8688    |
| 4.2669        | 0.5531 | 290000 | 0.0385   | 6.7113         | 821.6451             | 4.1869          | 65.8204    |
| 4.3208        | 0.5627 | 295000 | 0.0400   | 6.6984         | 811.1413             | 4.1780          | 65.2359    |
| 4.249         | 0.5722 | 300000 | 0.0397   | 6.6998         | 812.2105             | 4.1786          | 65.2716    |
| 4.2383        | 0.5817 | 305000 | 0.0390   | 6.6950         | 808.3597             | 4.1740          | 64.9756    |
| 4.2473        | 0.5913 | 310000 | 0.0387   | 6.7067         | 817.9059             | 4.1827          | 65.5417    |
| 4.2612        | 0.6008 | 315000 | 0.0391   | 6.6928         | 806.6095             | 4.1715          | 64.8147    |
| 4.1997        | 0.6104 | 320000 | 0.0392   | 6.6874         | 802.2146             | 4.1658          | 64.4451    |
| 4.2437        | 0.6199 | 325000 | 0.0400   | 6.6893         | 803.7320             | 4.1681          | 64.5955    |
| 4.2243        | 0.6294 | 330000 | 0.0399   | 6.6818         | 797.7855             | 4.1615          | 64.1649    |
| 4.2524        | 0.6390 | 335000 | 0.0399   | 6.6764         | 793.4797             | 4.1580          | 63.9440    |
| 4.2496        | 0.6485 | 340000 | 0.0396   | 6.6827         | 798.4811             | 4.1633          | 64.2805    |
| 4.2444        | 0.6580 | 345000 | 0.0394   | 6.6818         | 797.7289             | 4.1606          | 64.1105    |
| 4.2127        | 0.6676 | 350000 | 0.0402   | 6.6686         | 787.2920             | 4.1505          | 63.4653    |
| 4.2095        | 0.6771 | 355000 | 0.0390   | 6.6778         | 794.5674             | 4.1568          | 63.8654    |
| 4.2228        | 0.6866 | 360000 | 0.0392   | 6.6670         | 786.0658             | 4.1472          | 63.2587    |
| 4.2044        | 0.6962 | 365000 | 0.0397   | 6.6652         | 784.5846             | 4.1455          | 63.1497    |
| 4.2489        | 0.7057 | 370000 | 0.0395   | 6.6617         | 781.8825             | 4.1434          | 63.0145    |
| 4.2101        | 0.7153 | 375000 | 0.0396   | 6.6568         | 778.0572             | 4.1384          | 62.7031    |
| 4.2117        | 0.7248 | 380000 | 0.0395   | 6.6607         | 781.1064             | 4.1406          | 62.8427    |
| 4.2015        | 0.7343 | 385000 | 0.0397   | 6.6533         | 775.3367             | 4.1356          | 62.5253    |
| 4.2242        | 0.7439 | 390000 | 0.0395   | 6.6529         | 775.0460             | 4.1344          | 62.4528    |
| 4.1964        | 0.7534 | 395000 | 0.0397   | 6.6506         | 773.2814             | 4.1314          | 62.2637    |
| 4.1939        | 0.7629 | 400000 | 0.0401   | 6.6492         | 772.1992             | 4.1299          | 62.1728    |
| 4.229         | 0.7725 | 405000 | 0.0397   | 6.6447         | 768.7131             | 4.1276          | 62.0275    |
| 4.1805        | 0.7820 | 410000 | 0.0399   | 6.6408         | 765.7219             | 4.1229          | 61.7394    |
| 4.2036        | 0.7915 | 415000 | 0.0394   | 6.6451         | 769.0443             | 4.1261          | 61.9348    |
| 4.169         | 0.8011 | 420000 | 0.0404   | 6.6364         | 762.3091             | 4.1192          | 61.5076    |
| 4.1815        | 0.8106 | 425000 | 0.0395   | 6.6399         | 764.9846             | 4.1210          | 61.6187    |
| 4.189         | 0.8202 | 430000 | 0.0397   | 6.6364         | 762.3756             | 4.1181          | 61.4406    |
| 4.1774        | 0.8297 | 435000 | 0.0391   | 6.6365         | 762.4051             | 4.1172          | 61.3861    |
| 4.1925        | 0.8392 | 440000 | 0.0398   | 6.6308         | 758.1226             | 4.1129          | 61.1245    |
| 4.1868        | 0.8488 | 445000 | 0.0391   | 6.6303         | 757.6731             | 4.1124          | 61.0916    |
| 4.1189        | 0.8583 | 450000 | 0.0394   | 6.6331         | 759.8155             | 4.1142          | 61.2059    |
| 4.2156        | 0.8678 | 455000 | 0.0394   | 6.6302         | 757.6308             | 4.1114          | 61.0326    |
| 4.1966        | 0.8774 | 460000 | 0.0398   | 6.6254         | 754.0364             | 4.1070          | 60.7623    |
| 4.158         | 0.8869 | 465000 | 0.0398   | 6.6263         | 754.6634             | 4.1068          | 60.7495    |
| 4.1639        | 0.8965 | 470000 | 0.0392   | 6.6220         | 751.4165             | 4.1035          | 60.5520    |
| 4.1936        | 0.9060 | 475000 | 0.0396   | 6.6204         | 750.2459             | 4.1017          | 60.4407    |
| 4.154         | 0.9155 | 480000 | 0.0393   | 6.6181         | 748.5255             | 4.0998          | 60.3311    |
| 4.1473        | 0.9251 | 485000 | 0.0397   | 6.6186         | 748.9214             | 4.0999          | 60.3326    |
| 4.1857        | 0.9346 | 490000 | 0.0397   | 6.6156         | 746.6764             | 4.0976          | 60.1977    |
| 4.1724        | 0.9441 | 495000 | 0.0397   | 6.6167         | 747.4420             | 4.0977          | 60.1996    |
| 4.166         | 0.9537 | 500000 | 0.0394   | 6.6151         | 746.2841             | 4.0962          | 60.1117    |
| 4.1581        | 1.0095 | 505000 | 0.0395   | 6.6154         | 746.5216             | 4.0965          | 60.1274    |
| 4.1443        | 1.0191 | 510000 | 0.0395   | 6.6157         | 746.7120             | 4.0964          | 60.1217    |
| 4.1641        | 1.0286 | 515000 | 0.0397   | 6.6132         | 744.8850             | 4.0945          | 60.0107    |
| 4.1673        | 1.0381 | 520000 | 0.0397   | 6.6112         | 743.3574             | 4.0928          | 59.9055    |

Framework versions

  • Transformers 4.57.0.dev0
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.1

Model details

  • Format: Safetensors
  • Model size: 0.8B params
  • Tensor type: F32