Reminder
System Info
llamafactory version: 0.9.2.dev0
Reproduction
torchrun --nproc_per_node $GPUS_PER_NODE \
--master_addr $MASTER_ADDR \
--node_rank $NODE_RANK \
--master_port $MASTER_PORT \
--nnodes $NNODES \
src/train.py \
--deepspeed LLaMA-Factory/examples/deepspeed/ds_z3_config.json \
--stage sft \
--do_train \
--model_name_or_path hf_models/Qwen2-VL-7B \
--dataset mammoth_vl_si \
--buffer_size 128 \
--preprocessing_batch_size 128 \
--streaming \
--dispatch_batches false \
--max_steps 160000 \
--template qwen2_vl \
--finetuning_type full \
--output_dir 1208_sft_qwen2vl_mammoth_si \
--overwrite_cache \
--overwrite_output_dir false \
--warmup_steps 100 \
--weight_decay 0.1 \
--ddp_timeout 9000 \
--learning_rate 5e-6 \
--lr_scheduler_type cosine \
--logging_steps 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 2 \
--cutoff_len 16384 \
--save_steps 1000 \
--report_to wandb \
--run_name train_qwen2vl_1208_si \
--plot_loss \
--num_train_epochs 1 \
--bf16
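Because the command uses `--streaming`, the dataset is read as an iterable with no random access. The toy sketch below is not LLaMA-Factory's or the Trainer's actual code; `CountingStream` and `skip_batches` are hypothetical names. It illustrates why resuming on such a dataset can be slow: fast-forwarding to a saved position means producing every skipped sample and discarding it, so any per-sample preprocessing (for a vision-language model, likely image loading and tokenization) is paid for each skipped item.

```python
from itertools import islice
from typing import Iterator


class CountingStream:
    """Hypothetical streamed dataset that counts how many samples it had to produce."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.preprocessed = 0

    def __iter__(self) -> Iterator[int]:
        for i in range(self.n):
            self.preprocessed += 1  # stand-in for expensive per-sample preprocessing
            yield i


def skip_batches(it: Iterator[int], num_batches: int, batch_size: int = 1) -> Iterator[int]:
    """Fast-forward an iterator by consuming (and discarding) num_batches batches."""
    for _ in range(num_batches):
        if not list(islice(it, batch_size)):
            break  # stream exhausted before all batches were skipped
    return it


ds = CountingStream(1_000)
it = skip_batches(iter(ds), num_batches=300)
first = next(it)
print(first, ds.preprocessed)  # 300 301
```

Skipping 300 batches of size 1 forces 300 samples through preprocessing before the first "new" sample is produced. The cost grows linearly with the number of skipped batches.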
Expected behavior
I saw the following in the training log about 12 hours ago, but training still doesn't appear to have resumed.
What could be causing this?
Log:
Continuing training from checkpoint, will skip to saved global_step
Continuing training from epoch 0
Continuing training from global step 50000
Will skip the first 0 epochs then the first 100000 batches in the first epoch.
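The numbers in the log line up with the flags in the command. This sketch assumes that the resume logic in Hugging Face `transformers`' `Trainer`, which LLaMA-Factory builds on, does two things. First, for a streamed dataset with no known length, it treats `--max_steps` optimizer updates as one epoch. Second, it multiplies the saved global step by `--gradient_accumulation_steps` to get the number of dataloader batches to skip. `resume_skip` is a hypothetical helper, not a library function.

```python
def resume_skip(global_step: int, max_steps: int, grad_accum: int) -> tuple[int, int]:
    """Sketch of how the resume position could be derived for a length-less dataset.

    Assumption: with --streaming, the epoch length is taken to be max_steps updates.
    """
    num_update_steps_per_epoch = max_steps
    epochs_trained = global_step // num_update_steps_per_epoch
    # Each optimizer step consumed grad_accum micro-batches from the dataloader.
    batches_to_skip = (global_step % num_update_steps_per_epoch) * grad_accum
    return epochs_trained, batches_to_skip


print(resume_skip(global_step=50_000, max_steps=160_000, grad_accum=2))  # (0, 100000)
```

Under these assumptions, 50,000 steps × 2 accumulation steps gives the 100,000 batches that the log says will be skipped, all inside the first (and only) epoch.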
Others
No response