finetune question

#28 opened by Saicy

I am training Qwen3-32B and Qwen3-14B with the same training dataset.
However, I find Qwen3-32B difficult to train: it often does not follow instructions. How can I solve this?

The prompt format is: <|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{query}<|im_end|>\n<|im_start|>assistant\n\n\n\n\n
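As a sanity check, here is a minimal sketch (assuming the standard transformers tokenizer API) that renders the same messages with the tokenizer's built-in Qwen3 chat template, so the hand-written prompt above can be compared against what the model actually expects:

```python
from transformers import AutoTokenizer

# Load the tokenizer that ships with the model; its chat template is the
# reference for how system/user/assistant turns should be laid out.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

messages = [
    {"role": "system", "content": "{system}"},
    {"role": "user", "content": "{query}"},
]

# add_generation_prompt appends the assistant header the model is trained to
# continue; the Qwen3 template also supports an enable_thinking kwarg, which
# changes what follows the assistant header, so a hand-rolled prompt with bare
# trailing newlines can easily drift from the template the model saw in training.
rendered = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered)
```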

Training parameters for Qwen3-32B:

--stage sft
--model_name_or_path Qwen/Qwen3-14B
--prompt_template qwen
--do_train
--dataset_dir data
--finetuning_type full
--output_dir out/${exp_name}
--max_source_length 32768
--max_target_length 2048
--logging_steps 30
--overwrite_cache
--num_train_epochs 3
--per_device_train_batch_size 1
--per_device_eval_batch_size 1
--gradient_accumulation_steps 360
--save_strategy "epoch"
--save_steps 1000
--learning_rate 6e-6
--weight_decay 0.1
--warmup_ratio 0
--lr_scheduler_type "cosine"
--plot_loss
--bf16 True
--save_only_model True
--deepspeed deepspeed.json
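For what it's worth, with 360 gradient-accumulation steps the effective global batch size becomes very large, so the run may take only a handful of optimizer updates per epoch. A back-of-the-envelope sketch (the GPU count and dataset size below are assumptions, not values from the post):

```python
# Effective batch size and optimizer-step count implied by these flags.
per_device_batch = 1          # --per_device_train_batch_size
grad_accum = 360              # --gradient_accumulation_steps
num_gpus = 8                  # assumption: set to your actual world size
epochs = 3                    # --num_train_epochs
dataset_size = 50_000         # hypothetical number of training samples

global_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = dataset_size // global_batch

print(f"effective global batch size: {global_batch}")
print(f"optimizer updates per epoch: {steps_per_epoch}")
print(f"total optimizer updates:     {steps_per_epoch * epochs}")
```

With few total updates, the cosine schedule and the 6e-6 learning rate leave little room for the model to adapt, which is one possible reason the larger model appears harder to fine-tune.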


I have a question: how much VRAM is needed for Qwen3-32B and Qwen3-14B?
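A rough rule of thumb for full-parameter fine-tuning with Adam in mixed precision is about 16 bytes per parameter of model state (bf16 weights and gradients plus fp32 optimizer state), before activations; ZeRO-3 can shard this across GPUs. A sketch of the arithmetic (the 8-GPU figure is an assumption):

```python
# Rough estimate only: ignores activations, sequence length, and fragmentation.
# ~2 B weights + 2 B grads + 12 B Adam state (fp32 master weights, exp_avg,
# exp_avg_sq) per parameter, shardable across GPUs with ZeRO-3.
def full_ft_memory_gb(num_params_billions: float, bytes_per_param: int = 16) -> float:
    return num_params_billions * 1e9 * bytes_per_param / 1024**3

for name, size_b in [("Qwen3-14B", 14), ("Qwen3-32B", 32)]:
    total = full_ft_memory_gb(size_b)
    print(f"{name}: ~{total:.0f} GB total model state "
          f"(~{total / 8:.0f} GB per GPU on 8 GPUs with ZeRO-3, before activations)")
```

By this estimate, Qwen3-32B needs roughly twice the aggregate memory of Qwen3-14B, which is why full fine-tuning of the 32B model typically requires multi-node setups or heavy ZeRO-3 offloading.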
