More details on finetuning params
Hello @armand0e
Thanks for your interest! In addition to the hyperparameters already listed in the README, here are the values you're asking about:
- Weight Decay: 0.01 (used for light regularization, as we're running a single-epoch fine-tune on a large model)
- LoRA Alpha: 16 (aligned with a LoRA rank of 8, it offers a good balance between adaptation strength and training stability)
- LR Scheduler Type: "linear"
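If it's useful, here's a minimal sketch of how these values could be wired into a PEFT `LoraConfig` and a TRL `SFTConfig`. Only the hyperparameters already mentioned in this thread come from the actual run; everything else (dropout, target modules, output dir, precision) is an assumption on my part:

```python
from peft import LoraConfig
from trl import SFTConfig

# Sketch only: values marked "from the thread/README" are the ones discussed above;
# everything else is a placeholder assumption.
peft_config = LoraConfig(
    r=8,                          # LoRA rank (from the README)
    lora_alpha=16,                # from the reply above
    lora_dropout=0.0,             # assumption: not specified in the thread
    target_modules="all-linear",  # assumption: not specified in the thread
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="gpt-oss-20b-lora",    # placeholder
    learning_rate=5e-5,               # from the README
    weight_decay=0.01,                # from the reply above
    lr_scheduler_type="linear",       # from the reply above (the README lists cosine)
    warmup_ratio=0.05,                # from the README
    per_device_train_batch_size=4,    # from the README
    gradient_accumulation_steps=4,    # from the README
    num_train_epochs=1,               # "single-epoch" per the reply; the README lists 2
    max_length=2048,                  # max sequence length (argument name varies across TRL versions)
    bf16=True,                        # assumption
)
```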
Hope that helps — let me know if you'd like to dig deeper into any other part of the configuration or training workflow!
Thanks a bunch! I've tried similar params for my fine-tunes and experienced lots of tool-calling issues; I'll give it another go with these exact settings and see how it does.
While this was helpful, I don't think these settings are 1:1 for my use case. Thanks anyway for your response.
Hi, thanks for the work. I can't run the model; I get the error: "AttributeError: GptOssExperts has no attribute 'down_projs'." Maybe you've encountered this.
Try double-checking that you have the right versions of Transformers and TRL; I noticed that some dependency versions yield weird errors with GptOss. These are the versions I'd check against (see the quick snippet after the list):
- transformers==4.56.2
- trl==0.22.2
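A quick sanity check (just a sketch) to confirm the environment is actually loading those versions:

```python
import transformers
import trl

# Print the versions that are importable in the current environment;
# compare against the pins above (4.56.2 / 0.22.2).
print("transformers:", transformers.__version__)
print("trl:", trl.__version__)
```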
Thanks for stepping in and providing an answer, really appreciate it.
Yes, it turns out the issue was caused by dependency version mismatches.
Happy to help 😀
Noticed the README says this:
Training Configuration
- Base Model: unsloth/gpt-oss-20b (20B parameters)
- Training Method: LoRA (adapter-only fine-tuning)
- LoRA Rank: 8
- Learning Rate: 5e-5
- Batch Size: 4 per device, gradient_accumulation_steps=4
- Epochs: 2
- Max Sequence Length: 2048
- LR Scheduler: Cosine, warmup_ratio=0.05
- Final Training Loss: 1.22
Your earlier response, quoted below, contradicts this. Just confirming which values are correct:
- Weight Decay: 0.01 (used for light regularization, as we're running a single-epoch fine-tune on a large model)
- LoRA Alpha: 16 (aligned with a LoRA rank of 8, it offers a good balance between adaptation strength and training stability)
- LR Scheduler Type: "linear"
Yes, that's because the model files have been updated; in the current version you can access the adapter layers directly. This is expected and ensures you won't encounter any dependency issues. Also, the training parameters you're seeing now are the correct, fully updated ones.
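In case it helps anyone landing here, this is roughly how an adapter-only LoRA checkpoint is usually loaded with PEFT on top of the base model. The adapter repo id below is a placeholder (not this repo's actual id), and the extra kwargs are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model first, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/gpt-oss-20b",   # base model named in the README
    torch_dtype="auto",      # assumption
    device_map="auto",       # assumption; requires accelerate
)
model = PeftModel.from_pretrained(base, "your-org/your-adapter-repo")  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained("unsloth/gpt-oss-20b")
```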