Axolotl config

axolotl version: 0.12.2

base_model: "google/gemma-3-1b-it"

load_in_4bit: true

# gemma3 doesn't seem to play nice with ddp
ddp_find_unused_parameters: true

chat_template: gemma3
eot_tokens:
  - <end_of_turn>

resume_from_checkpoint: outputs/out/checkpoint-3000

datasets:
  - path: dataset_train_val_split/train_parts
    type: alpaca
    data_files:
      - dataset_train_val_split/train_parts/train_part_01.jsonl
      - dataset_train_val_split/train_parts/train_part_02.jsonl
      - dataset_train_val_split/train_parts/train_part_03.jsonl
      - dataset_train_val_split/train_parts/train_part_04.jsonl
      - dataset_train_val_split/train_parts/train_part_05.jsonl
      - dataset_train_val_split/train_parts/train_part_06.jsonl
      - dataset_train_val_split/train_parts/train_part_07.jsonl
      - dataset_train_val_split/train_parts/train_part_08.jsonl
      - dataset_train_val_split/train_parts/train_part_09.jsonl
      - dataset_train_val_split/train_parts/train_part_10.jsonl

test_datasets:
  - path: dataset_train_val_split/validation.jsonl
    type: alpaca
    split: train

dataset_prepared_path: last_run_prepared
output_dir: ./outputs/out

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - up_proj
  - down_proj
  - gate_proj
  - q_proj
  - k_proj
  - v_proj
  - o_proj

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 6
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0004

# Training on full dataset: 265,590 samples
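# Optimizer steps per epoch without packing: 265,590 / (micro_batch_size * gradient_accumulation_steps)
# = 265,590 / (1 * 6) = 44,265 steps per epoch
# sample_packing combines several samples into each 2048-token sequence, so the actual count is
# much lower (the trainer reported 19,461 steps for 3 epochs, about 6,487 per epoch)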
# Ensure we use the full dataset
max_steps:  # Leave empty to use all data
eval_strategy: epoch
saves_per_epoch: 1

bf16: true
fp16: false
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
logging_steps: 1
flash_attention: 
eager_attention: true

warmup_ratio: 0.1
weight_decay: 0.0

# save_first_step: true  # uncomment this to validate checkpoint saving works with your config


gradio_max_new_tokens: 512
gradio_temperature: 0.7
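The LoRA settings above (lora_r: 32, lora_alpha: 16) set how strongly the adapter update is weighted. The sketch below is pure Python and assumes PEFT's default scaling of lora_alpha / r, with rsLoRA off. It shows the resulting factor, the number of trainable parameters one adapted projection adds, and a tiny forward pass. The helper names and the 1024 -> 4096 layer size are hypothetical and only illustrate the idea:

```python
def lora_scaling(r: int, alpha: int, use_rslora: bool = False) -> float:
    """Factor applied to the low-rank update B @ A (PEFT default: alpha / r)."""
    return alpha / (r ** 0.5) if use_rslora else alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


def lora_forward(W, A, B, x, scaling):
    """y = W x + scaling * B (A x); matrices are lists of rows, dropout omitted."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


r, alpha = 32, 16  # values taken from the config above
print(lora_scaling(r, alpha))

# Hypothetical 1024 -> 4096 projection, chosen only for illustration.
print(lora_params(1024, 4096, r))

# Toy 2x2 example with rank 1: identity base weight plus a scaled update.
print(lora_forward([[1, 0], [0, 1]], [[1, 1]], [[1], [2]], [1, 2], lora_scaling(r, alpha)))
```

With these values the adapter's contribution is halved relative to B @ A. Because LoRA initializes B to zero, the adapted model starts out identical to the base model.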

outputs/out

This model is a QLoRA fine-tune of google/gemma-3-1b-it on an Alpaca-format dataset that is not named in the card metadata. It achieves the following results on the evaluation set:

Loss: 0.0417
Max active memory: 7.72 GiB
Max allocated memory: 7.72 GiB
Reserved device memory: 8.93 GiB

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
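The config reads the training parts and the validation file with type: alpaca. That format expects one JSON object per line, with instruction and output fields and an optional input field. Below is a minimal sketch of a checker for such files. The function name and the exact rules it enforces are assumptions for illustration, not axolotl's own validation:

```python
import json
from typing import Iterable

REQUIRED = ("instruction", "output")  # "input" is optional in Alpaca records


def check_alpaca_lines(lines: Iterable[str]) -> list[str]:
    """Return human-readable problems found in Alpaca-style JSONL lines (hypothetical helper)."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        for key in REQUIRED:
            # Required fields must be non-empty strings.
            if not isinstance(record.get(key), str) or not record[key].strip():
                problems.append(f"line {lineno}: missing or empty '{key}'")
        if "input" in record and not isinstance(record["input"], str):
            problems.append(f"line {lineno}: 'input' must be a string")
    return problems


sample = [
    '{"instruction": "Summarize the text.", "input": "Some text.", "output": "A summary."}',
    '{"instruction": "Say hi.", "output": "Hi!"}',
    '{"instruction": "", "output": "x"}',
    "not json",
]
for problem in check_alpaca_lines(sample):
    print(problem)
```

To check a real file, pass an open file handle instead of the sample list, e.g. `with open(path, encoding="utf-8") as f: check_alpaca_lines(f)`.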

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0004
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 6
total_train_batch_size: 6
optimizer: ADAMW_BNB (8-bit AdamW from bitsandbytes, set as adamw_bnb_8bit in the config) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 1946
training_steps: 19461
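The derived values above follow from the config:

- The total train batch size is micro_batch_size × gradient_accumulation_steps. This is also multiplied by the number of GPUs, assumed here to be 1.
- The warmup count matches warmup_ratio: 0.1 applied to 19,461 steps and rounded down.

The sketch below reproduces both values and evaluates a linear-warmup cosine schedule. It assumes the standard Transformers cosine-with-warmup formula (half a cosine cycle, decaying to zero). The helper names are hypothetical, and axolotl's exact rounding and scheduler internals may differ:

```python
import math


def total_batch_size(micro_batch_size: int, grad_accum: int, world_size: int = 1) -> int:
    """Sequences per optimizer step (packed sequences when sample_packing is on)."""
    return micro_batch_size * grad_accum * world_size


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length, assuming the ratio is truncated toward zero."""
    return int(total_steps * warmup_ratio)


def cosine_lr(step: int, peak_lr: float, warmup: int, total: int) -> float:
    """Linear warmup to peak_lr, then cosine decay to 0 at `total`."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 19461  # training_steps reported above
PEAK_LR = 4e-4       # learning_rate from the config

print(total_batch_size(1, 6))
# Naive steps per epoch for 265,590 samples (from the config comment) without packing.
print(265_590 // total_batch_size(1, 6))
w = warmup_steps(TOTAL_STEPS, 0.1)
print(w)
for step in (0, w // 2, w, (w + TOTAL_STEPS) // 2, TOTAL_STEPS):
    print(step, f"{cosine_lr(step, PEAK_LR, w, TOTAL_STEPS):.2e}")
```

The gap between the naive 44,265 steps per epoch and the roughly 6,487 actually run per epoch is what sample packing buys.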

Training results

Training Loss	Epoch	Step	Validation Loss	Mem Reserved (GiB)	Mem Active (GiB)	Mem Allocated (GiB)
No log	0	0	3.9506	12.01	11.72	11.72
1.2302	0.1035	500	1.2162	14.29	14.2	14.2
0.898	0.2071	1000	0.8612	14.29	14.2	14.2
0.4979	0.3106	1500	0.5640	14.29	14.2	14.2
0.2908	0.4142	2000	0.3491	14.3	14.2	14.2
0.271	0.5177	2500	0.2368	14.3	14.2	14.2
0.2208	0.6213	3000	0.1751	14.3	14.2	14.2
0.2208	0.6213	3000	4.7060	6.68	6.46	6.46
0.0918	0.9999	6486	0.1107	8.91	7.72	7.72
0.0163	2.0	12973	0.0470	8.93	7.72	7.72
0.0332	3.0	19460	0.0417	8.93	7.72	7.72

Step 3000 appears twice because training was resumed from outputs/out/checkpoint-3000. The second row is the first evaluation of the resumed run.
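If the logged validation loss is the mean token-level cross-entropy in nats, as with the standard causal-LM loss in Transformers, then exp(loss) gives the corresponding perplexity. The short sketch below assumes that interpretation and converts a few values from the table. The `perplexity` helper is hypothetical:

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss)


# (step, validation loss) pairs copied from the table above (first run at step 3000).
val_losses = [(0, 3.9506), (3000, 0.1751), (6486, 0.1107), (12973, 0.0470), (19460, 0.0417)]

for step, loss in val_losses:
    print(f"step {step:>5}: loss {loss:.4f} -> perplexity {perplexity(loss):.3f}")
```

A final perplexity this close to 1 means the model predicts the validation tokens almost deterministically. That is worth cross-checking against possible overlap between the training parts and validation.jsonl.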

Framework versions

PEFT 0.17.0
Transformers 4.55.2
Pytorch 2.8.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4


Model tree for SindreLinden/gemma-3-1b-ifs-cloud-qlora

Base model: google/gemma-3-1b-pt
Fine-tuned: google/gemma-3-1b-it
Adapter: this model
