More details on finetuning params

#7
by armand0e - opened

Hello @dousery

First off, great job with this finetune. I was hoping to gain some more insight into your training parameters. I saw some included in your README, but I was curious about these values as well:

weight decay, lora alpha, and lr_scheduler_type

Thanks.

Owner

Hello @armand0e

Thanks for your interest! In addition to the hyperparameters already listed in the README, here are the values you're asking about:

  • Weight Decay: 0.01
    Used for light regularization, as we're running a single-epoch fine-tune on a large model.

  • LoRA Alpha: 16
    Aligned with a LoRA rank of 8, it offers a good balance between adaptation strength and training stability.

  • LR Scheduler Type: "linear"
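
For reference, here's a rough sketch of how these values plug into a TRL SFTTrainer with a PEFT LoraConfig. This is not the exact training script, just an illustration; the output path, dataset, and target modules below are placeholders:

    from datasets import Dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    # Tiny dummy dataset so the sketch is self-contained; swap in your own SFT data
    train_dataset = Dataset.from_list([{"text": "Hello world"}])

    # Adapter settings discussed above: rank 8, alpha 16
    peft_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules="all-linear",  # placeholder: target modules aren't listed in this thread
        task_type="CAUSAL_LM",
    )

    # Optimizer/schedule settings discussed above
    training_args = SFTConfig(
        output_dir="gpt-oss-20b-finetune",  # placeholder output path
        weight_decay=0.01,
        lr_scheduler_type="linear",
        num_train_epochs=1,  # single-epoch run, as mentioned above
        # learning rate, batch size, warmup, max length, etc. as listed in the README
    )

    trainer = SFTTrainer(
        model="unsloth/gpt-oss-20b",
        args=training_args,
        train_dataset=train_dataset,
        peft_config=peft_config,
    )
    trainer.train()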

Hope that helps — let me know if you'd like to dig deeper into any other part of the configuration or training workflow!

Thanks a bunch. I tried similar params for my finetunes and ran into lots of tool-calling issues; I'll give it another go with these exact settings and see how it does.

While this was helpful, I don't think these settings map 1:1 to my use case. Thanks anyway for your response.

armand0e changed discussion status to closed

Hi, thanks for the work. I can't run the model; I get the error: "AttributeError: GptOssExperts has no attribute 'down_projs'." Maybe you've encountered this.

Try double-checking that you have the right versions of Transformers and TRL; I noticed that some dependency versions yield weird errors with GptOss.

  • transformers==4.56.2
  • trl==0.22.2
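
A quick way to check what you actually have installed (just a sanity check):

    # Print the installed versions to compare against the ones above
    import transformers
    import trl

    print("transformers:", transformers.__version__)  # worked for me with 4.56.2
    print("trl:", trl.__version__)                     # worked for me with 0.22.2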
armand0e changed discussion status to open

Thanks for stepping in and providing an answer; really appreciate it.
Yes, it turns out the issue was caused by dependency version mismatches.

Happy to help 😀

armand0e changed discussion status to closed

Noticed the README says this:

Training Configuration
Base Model: unsloth/gpt-oss-20b (20B parameters)
Training Method: LoRA (adapter-only fine-tuning)
LoRA Rank: 8
Learning Rate: 5e-5
Batch Size: 4 per device, gradient_accumulation_steps=4
Epochs: 2
Max Sequence Length: 2048
LR Scheduler: cosine, warmup_ratio=0.05
Final Training Loss: 1.22

Your earlier response contradicts this (single epoch and a linear scheduler vs. two epochs and a cosine scheduler in the README). Just confirming which values are correct:

Weight Decay: 0.01
Used for light regularization, as we're running a single-epoch fine-tune on a large model.

LoRA Alpha: 16
Aligned with a LoRA rank of 8, it offers a good balance between adaptation strength and training stability.

LR Scheduler Type: "linear"

armand0e changed discussion status to open
Yes, because the model files have been updated; in the current version you can access the adapter layers directly. This is expected and ensures you won't encounter any dependency issues. Also, the training parameters you're seeing now are the correct, fully updated ones.
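
If you want to double-check for yourself, the LoRA settings can be read from the adapter config on the Hub. Rough sketch below; the repo ID is a placeholder, and it assumes the adapter files (adapter_config.json) are published in the repo:

    from peft import PeftConfig

    # Load the adapter config from the Hub and print the LoRA hyperparameters it records
    config = PeftConfig.from_pretrained("dousery/your-adapter-repo")  # placeholder repo ID
    print("rank (r):", config.r)
    print("lora_alpha:", config.lora_alpha)
    print("target_modules:", config.target_modules)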

dousery locked this discussion