LoRA training uses the scripts from https://github.com/Akegarasu/lora-scripts.
LoRA training requires paired text-image data, so the corresponding training data has to be prepared first.
1. Preparing the training data
Use deepbooru or BLIP to generate the captions; for architectural images, BLIP is recommended. A minimal captioning sketch follows.
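A minimal BLIP captioning sketch, assuming the Hugging Face port (Salesforce/blip-image-captioning-base) and hypothetical file paths; note that BLIP support needs a newer transformers than the 4.20.0 pinned below, so run captioning in a separate environment:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Caption one training image and write the caption next to it, so
# lora-scripts picks it up as the paired .txt file.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("data/20_arch/img001.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)

with open("data/20_arch/img001.txt", "w") as f:
    f.write(processor.decode(out[0], skip_special_tokens=True))
```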
2. LoRA environment on Linux
CUDA 10.1, Tesla P40, Python 3.7
accelerate 0.15.0: apparently it only works inside a virtual environment. In train.sh, replace `accelerate launch --num_cpu_threads_per_process 8` with plain `python` (see the sketch after this list); with this change, accelerate's multi-GPU training no longer works.
albumentations 0.2.0
scikit-image 0.14 (newer versions raise an error)
numpy 1.17
There is a skimage/numpy version conflict here that raises: cannot import name '_validate_lengths' from 'numpy.lib.arraypad' (in arraycrop.py); see the CSDN post on this bug by Harry嗷.
safetensors 0.3.0
voluptuous 0.12.1
huggingface-hub 0.12.0
transformers 4.20.0
tokenizers 0.11.6
opencv-python 4.0.0.21
einops 0.3.0
ftfy 6.0
pytorch-lightning 1.2.8
xformers 0.0.9 (works with torch 1.8.1)
diffusers 0.10.0
pyre-extensions 0.3.0
regex 2021.4.4
Upgrade glibc.
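For reference, the single-process fallback mentioned for accelerate above amounts to this change in train.sh (an illustrative sketch; the real script passes many more flags):

```sh
# before: multi-process launch, which misbehaved in this environment
# accelerate launch --num_cpu_threads_per_process 8 ./sd-scripts/train_network.py ...

# after: plain single-process Python (accelerate multi-GPU training is lost)
python ./sd-scripts/train_network.py ...
```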
3. Training with `sh train.sh`
The OpenAI CLIP weights have to be configured: in library/train_util.py, around line 1900, the load_tokenizer function builds the tokenizer via CLIPTokenizer.from_pretrained(), using the openai/clip-vit-large-patch14 weights. A sketch of loading them from a local path follows.
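A hedged sketch of pointing the tokenizer at a local mirror of the weights (the path is hypothetical; the directory must contain the tokenizer files, see note 6.c):

```python
from transformers import CLIPTokenizer

# Load from a local copy of openai/clip-vit-large-patch14 instead of
# downloading from the Hugging Face hub at train time.
tokenizer = CLIPTokenizer.from_pretrained("/path/to/clip-vit-large-patch14")
```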
4. Core code walkthrough of lora-scripts
train_network.py → train:
- train_util.load_tokenizer
- BlueprintGenerator, config_util.generate_dreambooth_subsets_config_by_subdirs
- blueprint_generator.generate, config_util.generate_dataset_group_by_blueprint (load the dataset)
- train_util.prepare_accelerator, train_util.prepare_dtype
- train_util.load_target_model (load the SD model)
- train_util.replace_unet_modules
- vae.to(accelerator.device), vae.requires_grad_(False), vae.eval()
- train_dataset_group.cache_latents
- network_module (LoRANetwork), network.apply_to(text_encoder, unet, train_text_encoder, train_unet)
- network.prepare_optimizer_params, train_util.get_optimizer
- train_dataloader = torch.utils.data.DataLoader(train_dataset_group)
- lr_scheduler = train_util.get_scheduler_fix
- unet, text_encoder, network, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(unet, text_encoder, network, optimizer, train_dataloader, lr_scheduler)
- unet.requires_grad_(False), unet.to(accelerator.device), text_encoder.requires_grad_(False), text_encoder.to(accelerator.device), unet.eval(), text_encoder.eval()
- network.prepare_grad_etc(text_encoder, unet)
- dataset = train_dataset_group.datasets[0]
- noise_scheduler = DDPMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000, clip_sample=False)
- accelerator.init_trackers("network_train")
- per epoch: network.on_epoch_start(text_encoder, unet); then per step:
  - latents = batch["latents"].to(accelerator.device); latents = latents * 0.18215
  - encoder_hidden_states = train_util.get_hidden_states(...)
  - noise = torch.randn_like(latents)
  - timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (b_size,), device=latents.device)
  - noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps), shape [1, 4, 64, 64]
  - noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample, shape [1, 4, 64, 64]
  - target = noise_scheduler.get_velocity(latents, noise, timesteps), shape [1, 4, 64, 64]
  - loss = torch.nn.functional.mse_loss(noise_pred.float(), target.float(), reduction="none"), shape [1, 4, 64, 64]
  - loss = loss.mean([1, 2, 3]); loss_weights = batch["loss_weights"]; loss = loss * loss_weights
  - accelerator.backward(loss)
  - params_to_clip = network.get_trainable_params(); accelerator.clip_grad_norm_(...)
  - optimizer.step(); lr_scheduler.step(); optimizer.zero_grad()
- train_util.sample_images(accelerator, args, None, global_step, accelerator.device, vae, tokenizer, text_encoder, unet)

The sketch below fills in this inner training step.
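A minimal sketch of one training step, using the names from the flow above; `batch`, `unet`, `network`, `noise_scheduler`, `encoder_hidden_states`, `optimizer`, `lr_scheduler`, and `accelerator` come from the surrounding loop, and the shapes are for batch size 1 at 512x512:

```python
import torch

latents = batch["latents"].to(accelerator.device) * 0.18215  # [1, 4, 64, 64], SD latent scale
noise = torch.randn_like(latents)
timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=latents.device)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample  # [1, 4, 64, 64]

# v-parameterization target; with v_parameterization=False (as in the args
# below) the target is simply `noise` instead.
target = noise_scheduler.get_velocity(latents, noise, timesteps)

loss = torch.nn.functional.mse_loss(noise_pred.float(), target.float(), reduction="none")
loss = (loss.mean([1, 2, 3]) * batch["loss_weights"]).mean()  # per-sample weights, then scalar
accelerator.backward(loss)
accelerator.clip_grad_norm_(network.get_trainable_params(), args.max_grad_norm)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
```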
5. Input arguments
```python
args = Namespace(bucket_no_upscale=False, bucket_reso_steps=64, cache_latents=True,
    caption_dropout_every_n_epochs=0, caption_dropout_rate=0.0, caption_extension='.txt',
    caption_extention=None, caption_tag_dropout_rate=0.0, clip_skip=2, color_aug=False,
    dataset_config=None, dataset_repeats=1, debug_dataset=False, enable_bucket=True,
    face_crop_aug_range=None, flip_aug=False, full_fp16=False, gradient_accumulation_steps=1,
    gradient_checkpointing=False, in_json=None, keep_tokens=0, learning_rate=0.0001,
    log_prefix=None, logging_dir='./logs', lowram=False, lr_scheduler='cosine_with_restarts',
    lr_scheduler_num_cycles=1, lr_scheduler_power=1, lr_warmup_steps=0, max_bucket_reso=1024,
    max_data_loader_n_workers=8, max_grad_norm=1.0, max_token_length=225, max_train_epochs=10,
    max_train_steps=1600, mem_eff_attn=False, min_bucket_reso=256, mixed_precision='fp16',
    network_alpha=32.0, network_args=None, network_dim=32, network_module='networks.lora',
    network_train_text_encoder_only=False, network_train_unet_only=False, network_weights=None,
    no_metadata=False, noise_offset=0.0, optimizer_args=None, optimizer_type='',
    output_dir='./output', output_name='/home/sniss/local_disk/lora-scripts/output',
    persistent_data_loader_workers=False,
    pretrained_model_name_or_path='/home/sniss/local_disk/stable-diffusion-webui_23-02-17/models/Stable-diffusion/sd-v1.5.ckpt',
    prior_loss_weight=1.0, random_crop=False, reg_data_dir=None, resolution=(512, 512),
    resume=None, sample_every_n_epochs=None, sample_every_n_steps=None, sample_prompts=None,
    sample_sampler='ddim', save_every_n_epochs=2, save_last_n_epochs=None,
    save_last_n_epochs_state=None, save_model_as='ckpt', save_n_epoch_ratio=None,
    save_precision='fp16', save_state=False, seed=1337, shuffle_caption=True,
    text_encoder_lr=1e-05, tokenizer_cache_dir=None, train_batch_size=1,
    train_data_dir='/home/sniss/local_disk/lora-scripts/data', training_comment=None,
    unet_lr=0.0001, use_8bit_adam=False, use_lion_optimizer=False, v2=False,
    v_parameterization=False, vae=None, xformers=True)
```
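Most of these Namespace values map directly to flags in train.sh; a hypothetical excerpt (flag names taken from the dump above, not a verbatim copy of the script):

```sh
python ./sd-scripts/train_network.py \
  --pretrained_model_name_or_path="/home/sniss/local_disk/stable-diffusion-webui_23-02-17/models/Stable-diffusion/sd-v1.5.ckpt" \
  --train_data_dir="/home/sniss/local_disk/lora-scripts/data" \
  --output_dir="./output" --logging_dir="./logs" \
  --network_module="networks.lora" --network_dim=32 --network_alpha=32 \
  --learning_rate=1e-4 --unet_lr=1e-4 --text_encoder_lr=1e-5 \
  --lr_scheduler="cosine_with_restarts" \
  --resolution=512 --train_batch_size=1 --max_train_epochs=10 \
  --save_every_n_epochs=2 --save_model_as=ckpt --save_precision="fp16" \
  --mixed_precision="fp16" --clip_skip=2 --max_token_length=225 \
  --seed=1337 --shuffle_caption --xformers --enable_bucket --cache_latents
```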
Parameters are set in several places:
1. the flags passed in train.sh;
2. the main section of train_network.py;
3. around line 1536 of train_util.py:
add_sd_models_args / add_optimizer_args / add_training_args / add_dataset_args (see the sketch below).
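Schematically, train_network.py assembles its parser from those train_util helpers (a sketch using the names from these notes; the exact function names and signatures in sd-scripts may differ):

```python
import argparse

from library import train_util

parser = argparse.ArgumentParser()
train_util.add_sd_models_args(parser)  # --pretrained_model_name_or_path, --v2, ...
train_util.add_optimizer_args(parser)  # --optimizer_type, --learning_rate, ...
train_util.add_training_args(parser)   # --max_train_epochs, --mixed_precision, ...
train_util.add_dataset_args(parser)    # --train_data_dir, --resolution, ...
args = parser.parse_args()
```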
6. Notes and pitfalls
a. The dataset folder name starts with a number, e.g. 20_arch; this number is the per-image repeat count, which multiplies with the epoch count to determine the total number of training steps (layout sketch below).
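The expected layout looks roughly like this (illustrative file names):

```
data/
└── 20_arch/          # 20 = repeats per image per epoch, arch = concept name
    ├── img001.jpg
    ├── img001.txt    # caption from blip/deepbooru
    ├── img002.jpg
    └── img002.txt
```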
b. In train_network, PyTorch training can hang; see "PyTorch 训练时中遇到的卡住停住等问题" by yyywxk on CSDN (https://blog.csdn.net/yyywxk/article/details/106323049). Symptom: training freezes at the last batch of the first epoch and stays stuck for a whole day, with CPU and GPU both apparently running normally, no error raised, and earlier training runs having been fine; only Ctrl+C gets out. Likely cause: per that post, reading data with cv2.imread instead of PIL can deadlock OpenCV against PyTorch's DataLoader, and disabling OpenCV's multithreading resolves it. Here, setting args.max_data_loader_n_workers to 0 works around the hang (OpenCV fix sketched below).
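Besides setting args.max_data_loader_n_workers=0, the fix suggested in that post is to turn off OpenCV's own threading before the DataLoader starts (a minimal sketch):

```python
import cv2

# OpenCV's internal thread pool can deadlock against PyTorch DataLoader
# worker processes; disable it before creating the DataLoader.
cv2.setNumThreads(0)
```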
c. max_token_length can be 75, 150, or 225; with 225 it raised RuntimeError: The size of tensor a (227) must match the size of tensor b (77) at non-singleton dimension 1. This was a huge pitfall: the CLIP tokenizer had been initialized without the tokenizer_config.json file.
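For reference, a local clip-vit-large-patch14 tokenizer directory should contain roughly these files (as in the Hugging Face repo):

```
clip-vit-large-patch14/
├── tokenizer_config.json   # the file that was missing here
├── special_tokens_map.json
├── vocab.json
└── merges.txt
```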
d. bash: accelerate: command not found. Not resolved yet.
Multi-GPU training:
```sh
python -m torch.distributed.launch --nproc_per_node 4 --nnodes 1 --node_rank 0 \
  --master_addr localhost --master_port 22222 --use_env ./sd-scripts/train_network.py \
  ...
```

and in the training script:

```python
import torch.distributed as dist

dist.init_process_group(backend="gloo", init_method="env://")
```
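Since --use_env is passed, the launcher exports RANK/LOCAL_RANK/WORLD_SIZE as environment variables instead of a --local_rank argument, so the script side presumably looks roughly like this (a sketch; the gloo backend follows the snippet above, though nccl is the usual choice for GPU training):

```python
import os

import torch
import torch.distributed as dist

# --use_env: rank info arrives via environment variables, not --local_rank.
dist.init_process_group(backend="gloo", init_method="env://")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)  # pin this process to its own GPU
```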