Built with Axolotl

See axolotl config

axolotl version: 0.12.2

base_model: "google/gemma-3-1b-it"

load_in_4bit: true

# gemma3 doesn't seem to play nice with ddp
ddp_find_unused_parameters: true

chat_template: gemma3
eot_tokens:
  - <end_of_turn>

resume_from_checkpoint: outputs/out/checkpoint-3000

datasets:
  - path: dataset_train_val_split/train_parts
    type: alpaca
    data_files:
      - dataset_train_val_split/train_parts/train_part_01.jsonl
      - dataset_train_val_split/train_parts/train_part_02.jsonl
      - dataset_train_val_split/train_parts/train_part_03.jsonl
      - dataset_train_val_split/train_parts/train_part_04.jsonl
      - dataset_train_val_split/train_parts/train_part_05.jsonl
      - dataset_train_val_split/train_parts/train_part_06.jsonl
      - dataset_train_val_split/train_parts/train_part_07.jsonl
      - dataset_train_val_split/train_parts/train_part_08.jsonl
      - dataset_train_val_split/train_parts/train_part_09.jsonl
      - dataset_train_val_split/train_parts/train_part_10.jsonl

test_datasets:
  - path: dataset_train_val_split/validation.jsonl
    type: alpaca
    split: train

dataset_prepared_path: last_run_prepared
output_dir: ./outputs/out

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - up_proj
  - down_proj
  - gate_proj
  - q_proj
  - k_proj
  - v_proj
  - o_proj

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 6
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0004

# Training on full dataset: 265,590 samples
# Steps per epoch: 265,590 / (micro_batch_size * gradient_accumulation_steps)
# = 265,590 / (1 * 6) = ~44,265 steps per epoch (fewer in practice with sample packing)
# Ensure we use the full dataset
max_steps:  # Leave empty to use all data
eval_strategy: epoch
saves_per_epoch: 1

bf16: true
fp16: false
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
logging_steps: 1
flash_attention: 
eager_attention: true

warmup_ratio: 0.1
weight_decay: 0.0

# save_first_step: true  # uncomment this to validate checkpoint saving works with your config


gradio_max_new_tokens: 512
gradio_temperature: 0.7
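
For orientation, the adapter settings above (lora_r, lora_alpha, lora_dropout, lora_target_modules) correspond roughly to the PEFT configuration sketched below. This is an illustrative equivalent, not the exact object Axolotl constructs internally.

```python
# Rough PEFT equivalent of the adapter settings in the Axolotl config above.
# Sketch only: Axolotl builds its own LoraConfig; values not present in the
# config (bias, task_type) are assumed defaults.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,               # lora_r
    lora_alpha=16,      # lora_alpha
    lora_dropout=0.05,  # lora_dropout
    target_modules=[    # lora_target_modules
        "up_proj", "down_proj", "gate_proj",
        "q_proj", "k_proj", "v_proj", "o_proj",
    ],
    bias="none",            # assumption: PEFT default
    task_type="CAUSAL_LM",
)
```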

outputs/out

This model is a fine-tuned version of google/gemma-3-1b-it, trained with the Axolotl config above on an Alpaca-formatted dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0417
  • Max memory active (GiB): 7.72
  • Max memory allocated (GiB): 7.72
  • Device memory reserved (GiB): 8.93

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
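
The Axolotl config above feeds ten JSONL shards (plus validation.jsonl) through the alpaca dataset type, which expects one JSON object per line with instruction, input, and output fields. The exact contents of those files are not published; the sketch below only illustrates the assumed record shape, with placeholder text and a hypothetical filename.

```python
# Illustrative only: the real train_part_XX.jsonl contents are not published.
# Axolotl's "alpaca" dataset type expects one JSON object per line with
# "instruction", optional "input", and "output" fields.
import json

example_record = {
    "instruction": "…",  # hypothetical task description
    "input": "…",        # hypothetical context (may be empty)
    "output": "…",       # hypothetical target response
}

# One record per line, UTF-8, newline-terminated (JSON Lines).
with open("example_part.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example_record, ensure_ascii=False) + "\n")
```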

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0004
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 6
  • total_train_batch_size: 6
  • optimizer: AdamW (8-bit, bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1946
  • training_steps: 19461
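
These figures are mutually consistent: with micro_batch_size 1 and 6 gradient-accumulation steps, each optimizer step consumes 6 packed sequences, and 19,461 steps over 3 epochs is roughly 6,487 steps per epoch. The quick check below assumes a single training process (world size 1) and the 265,590-sample figure quoted in the config comment.

```python
# Sanity check of the reported hyperparameters (assumes world_size = 1 and the
# 265,590 raw-sample count quoted in the config comment).
micro_batch_size = 1
gradient_accumulation_steps = 6
num_epochs = 3
training_steps = 19_461   # reported above
raw_samples = 265_590     # from the config comment

effective_batch = micro_batch_size * gradient_accumulation_steps   # 6
steps_per_epoch = training_steps / num_epochs                      # ~6487
packed_seqs_per_epoch = steps_per_epoch * effective_batch          # ~38,922
samples_per_packed_seq = raw_samples / packed_seqs_per_epoch       # ~6.8 (sample packing)
warmup_steps = round(0.1 * training_steps)                         # 1946, matches warmup_ratio 0.1

print(effective_batch, round(steps_per_epoch),
      round(samples_per_packed_seq, 1), warmup_steps)
```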

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Mem Reserved (GiB) | Mem Active (GiB) | Mem Allocated (GiB) |
|---------------|--------|-------|-----------------|--------------------|------------------|---------------------|
| No log        | 0      | 0     | 3.9506          | 12.01              | 11.72            | 11.72               |
| 1.2302        | 0.1035 | 500   | 1.2162          | 14.29              | 14.2             | 14.2                |
| 0.898         | 0.2071 | 1000  | 0.8612          | 14.29              | 14.2             | 14.2                |
| 0.4979        | 0.3106 | 1500  | 0.5640          | 14.29              | 14.2             | 14.2                |
| 0.2908        | 0.4142 | 2000  | 0.3491          | 14.3               | 14.2             | 14.2                |
| 0.271         | 0.5177 | 2500  | 0.2368          | 14.3               | 14.2             | 14.2                |
| 0.2208        | 0.6213 | 3000  | 0.1751          | 14.3               | 14.2             | 14.2                |
| 0.2208        | 0.6213 | 3000  | 4.7060          | 6.46               | 6.46             | 6.68                |
| 0.0918        | 0.9999 | 6486  | 0.1107          | 7.72               | 7.72             | 8.91                |
| 0.0163        | 2.0    | 12973 | 0.0470          | 7.72               | 7.72             | 8.93                |
| 0.0332        | 3.0    | 19460 | 0.0417          | 7.72               | 7.72             | 8.93                |

The repeated step-3000 row most likely corresponds to the run being resumed from outputs/out/checkpoint-3000 (see resume_from_checkpoint in the config above).

Framework versions

  • PEFT 0.17.0
  • Transformers 4.55.2
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
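
A minimal way to try the adapter, assuming it is published as SindreLinden/gemma-3-1b-ifs-cloud-qlora and loaded on top of the 4-bit base model. This is a sketch, not an officially provided inference script; the generation settings mirror the gradio_* values in the config.

```python
# Sketch: load the QLoRA adapter on top of the 4-bit base model and generate.
# Repo ids and generation settings come from this card; everything else is an
# assumption rather than an official inference recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "google/gemma-3-1b-it"
adapter_id = "SindreLinden/gemma-3-1b-ifs-cloud-qlora"

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

messages = [{"role": "user", "content": "Hello!"}]  # placeholder prompt
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If a standalone checkpoint is preferred, the usual route is to load the base model in full precision instead and call merge_and_unload() on the PeftModel to fold the adapter into the base weights.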
