LoRA training here uses the scripts from https://github.com/Akegarasu/lora-scripts

LoRA training needs paired image-text data, so matching captions have to be prepared for the training images.

1. Preparing the training data

Use deepbooru or BLIP to generate the captions; for architecture images, BLIP is recommended.
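
For reference, a minimal BLIP captioning sketch via Hugging Face transformers. Assumptions: the Salesforce/blip-image-captioning-base checkpoint and a transformers version new enough to ship BLIP (newer than the 4.20.0 pinned below); file paths are placeholders. The webui's built-in BLIP/deepbooru interrogators do the same job:

import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("data/20_arch/001.png").convert("RGB")   # placeholder path
inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=40)
caption = processor.decode(out[0], skip_special_tokens=True)

# lora-scripts expects the caption beside the image, same stem,
# with the caption_extension suffix (.txt here).
with open("data/20_arch/001.txt", "w") as f:
    f.write(caption)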

2. LoRA environment on Linux

CUDA 10.1, Tesla P40, Python 3.7

accelerate 0.15.0: apparently it only works inside a virtual environment here. Workaround: in train.sh, replace "accelerate launch --num_cpu_threads_per_process 8" with plain "python". Note that with this change accelerate's multi-GPU training no longer works (see the multi-GPU note in section 6).

albumentations 0.2.0

scikit-image 0.14 (newer versions raise an error)

numpy 1.17

There is a skimage version problem in this stack that raises an error:

cannot import name '_validate_lengths' from 'numpy.lib.arraypad' (a known scikit-image/numpy version mismatch; see the CSDN post by Harry嗷 on this error).

safetensors 0.3.0

voluptuous 0.12.1

huggingface-hub 0.12.0

transformers 4.20.0

tokenizers 0.11.6

opencv-python 4.0.0.21

einops 0.3.0

ftfy 6.0

pytorch-lightning 1.2.8

xformers 0.0.9 (works with torch 1.8.1)

diffusers 0.10.0

pyre-extensions 0.3.0

regex 2021.4.4

Upgrade glibc.

3. Training with sh train.sh

The OpenAI CLIP weights need to be configured:

In library/train_util.py, around line 1900, inside the load_tokenizer function: the tokenizer comes from CLIPTokenizer.from_pretrained().

It uses OpenAI's clip-vit-large-patch14 weights.
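
A minimal sketch of what that call amounts to (the local path is a placeholder for an offline copy):

from transformers import CLIPTokenizer

# Online: pulls vocab.json, merges.txt and tokenizer_config.json from the Hub.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# Offline: point at a local copy instead (placeholder path); make sure the
# directory also contains tokenizer_config.json (see note c in section 6).
tokenizer = CLIPTokenizer.from_pretrained("/path/to/clip-vit-large-patch14")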

4. Core code walkthrough of lora-scripts

Call trace from train_network.py, train():

train_util.load_tokenizer
BlueprintGenerator
config_util.generate_dreambooth_subsets_config_by_subdirs
blueprint_generator.generate
config_util.generate_dataset_group_by_blueprint   # loads the data
train_util.prepare_accelerator
train_util.prepare_dtype
train_util.load_target_model   # loads the SD model
train_util.replace_unet_modules
vae.to(accelerator.device)
vae.requires_grad_(False)
vae.eval()
train_dataset_group.cache_latents
network_module (LoRANetwork)
network.apply_to(text_encoder, unet, train_text_encoder, train_unet)
network.prepare_optimizer_params
train_util.get_optimizer
train_dataloader = torch.utils.data.DataLoader(train_dataset_group, ...)
lr_scheduler = train_util.get_scheduler_fix(...)
unet, text_encoder, network, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(unet, text_encoder, network, optimizer, train_dataloader, lr_scheduler)
unet.requires_grad_(False)
unet.to(accelerator.device)
text_encoder.requires_grad_(False)
text_encoder.to(accelerator.device)
unet.eval()
text_encoder.eval()
network.prepare_grad_etc(text_encoder, unet)
dataset = train_dataset_group.datasets[0]
noise_scheduler = DDPMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000, clip_sample=False)
accelerator.init_trackers("network_train")
network.on_epoch_start(text_encoder, unet)
latents = batch["latents"].to(accelerator.device)
latents = latents * 0.18215
encoder_hidden_states = train_util.get_hidden_states(...)
noise = torch.randn_like(latents, device=latents.device)
timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (b_size,), device=latents.device)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)   # [1, 4, 64, 64]
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample   # [1, 4, 64, 64]
target = noise_scheduler.get_velocity(latents, noise, timesteps)   # [1, 4, 64, 64]; only when v_parameterization, otherwise target = noise
loss = torch.nn.functional.mse_loss(noise_pred.float(), target.float(), reduction="none")   # [1, 4, 64, 64]
loss = loss.mean([1, 2, 3])
loss_weights = batch["loss_weights"]
loss = loss * loss_weights
accelerator.backward(loss)
params_to_clip = network.get_trainable_params()
accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
train_util.sample_images(accelerator, args, None, global_step, accelerator.device, vae, tokenizer, text_encoder, unet)
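
Condensed into a runnable sketch of the per-step loss computation (dummy tensors stand in for the VAE latents, the CLIP text embeddings, and the LoRA-wrapped UNet's output; diffusers and torch as pinned above):

import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

noise_scheduler = DDPMScheduler(beta_start=0.00085, beta_end=0.012,
                                beta_schedule="scaled_linear",
                                num_train_timesteps=1000, clip_sample=False)

# Stand-ins: real latents come from vae.encode(image) scaled by 0.18215,
# and encoder_hidden_states from the CLIP text encoder.
latents = torch.randn(1, 4, 64, 64) * 0.18215
encoder_hidden_states = torch.randn(1, 77, 768)

noise = torch.randn_like(latents)
timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=latents.device)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)  # [1, 4, 64, 64]

# Real call: noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
noise_pred = torch.randn_like(noisy_latents)  # stand-in for the UNet output

# epsilon-prediction target; with v_parameterization the trace instead uses
# noise_scheduler.get_velocity(latents, noise, timesteps).
target = noise

loss = F.mse_loss(noise_pred.float(), target.float(), reduction="none")  # [1, 4, 64, 64]
loss = loss.mean([1, 2, 3])   # per-sample loss, then weighted and backpropagated
print(loss)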

5. Input arguments

args = Namespace(
bucket_no_upscale=False,
bucket_reso_steps=64,
cache_latents=True,
caption_dropout_every_n_epochs=0,
caption_dropout_rate=0.0,
caption_extension='.txt',
caption_extention=None,
caption_tag_dropout_rate=0.0,
clip_skip=2,
color_aug=False,
dataset_config=None,
dataset_repeats=1,
debug_dataset=False,
enable_bucket=True,
face_crop_aug_range=None,
flip_aug=False,
full_fp16=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
in_json=None,
keep_tokens=0,
learning_rate=0.0001,
log_prefix=None,
logging_dir='./logs',
lowram=False,
lr_scheduler='cosine_with_restarts',
lr_scheduler_num_cycles=1,
lr_scheduler_power=1,
lr_warmup_steps=0,
max_bucket_reso=1024,
max_data_loader_n_workers=8,
max_grad_norm=1.0,
max_token_length=225,
max_train_epochs=10,
max_train_steps=1600,
mem_eff_attn=False,
min_bucket_reso=256,
mixed_precision='fp16',
network_alpha=32.0,
network_args=None,
network_dim=32,
network_module='networks.lora',
network_train_text_encoder_only=False,
network_train_unet_only=False,
network_weights=None,
no_metadata=False,
noise_offset=0.0,
optimizer_args=None,
optimizer_type='',
output_dir='./output',
output_name='/home/sniss/local_disk/lora-scripts/output',
persistent_data_loader_workers=False,
pretrained_model_name_or_path='/home/sniss/local_disk/stable-diffusion-webui_23-02-17/models/Stable-diffusion/sd-v1.5.ckpt',
prior_loss_weight=1.0,
random_crop=False,
reg_data_dir=None,
resolution=(512, 512),
resume=None,
sample_every_n_epochs=None,
sample_every_n_steps=None,
sample_prompts=None,
sample_sampler='ddim',
save_every_n_epochs=2,
save_last_n_epochs=None,
save_last_n_epochs_state=None,
save_model_as='ckpt',
save_n_epoch_ratio=None,
save_precision='fp16',
save_state=False,
seed=1337,
shuffle_caption=True,
text_encoder_lr=1e-05,
tokenizer_cache_dir=None,
train_batch_size=1,
train_data_dir='/home/sniss/local_disk/lora-scripts/data',
training_comment=None,
unet_lr=0.0001,
use_8bit_adam=False,
use_lion_optimizer=False,
v2=False,
v_parameterization=False,
vae=None,
xformers=True)

The parameters are set in several places:

1. The flags passed in train.sh

2. The main section of train_network.py

3. Around line 1536 of train_util.py, in the argument-registration helpers:

add_sd_models_args/add_optimizer_args/add_training_args/add_dataset_args
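
The pattern is plain argparse composition: each helper registers one group of flags on a shared parser, and main() parses once. A rough sketch with made-up flag subsets (helper names follow the list above; the real signatures in train_util.py may differ):

import argparse

def add_training_args(parser):
    # representative flags only; the real helper registers many more
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--max_train_epochs", type=int, default=None)

def add_dataset_args(parser):
    parser.add_argument("--train_data_dir", type=str, default=None)
    parser.add_argument("--resolution", type=str, default="512,512")

parser = argparse.ArgumentParser()
add_training_args(parser)
add_dataset_args(parser)
args = parser.parse_args(["--train_data_dir", "./data"])
print(args.learning_rate, args.train_data_dir)   # 0.0001 ./data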

6. Notes

a. The dataset folder name starts with a number, e.g. 20_arch. This leading number is the per-image repeat count, so together with the number of epochs it determines the total training steps (see the sketch below).
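
A quick sanity check of that arithmetic, with made-up numbers (the real bookkeeping lives in train_util.py):

# Folder "20_arch": the prefix 20 is the per-image repeat count.
num_images = 50      # hypothetical number of training images
repeats = 20         # from the folder-name prefix
epochs = 10          # max_train_epochs
batch_size = 1       # train_batch_size

steps_per_epoch = num_images * repeats // batch_size
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)   # 1000 10000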

b. In train_network:

A hang during PyTorch training, per https://blog.csdn.net/yyywxk/article/details/106323049: the first epoch stalls at the last batch and sits there indefinitely, with CPU and GPU both looking normal, no error raised, and code that previously trained fine; the only way out is Ctrl+C. The reported cause is reading images with cv2.imread instead of PIL: OpenCV's internal multithreading can deadlock against PyTorch's DataLoader worker processes, and disabling OpenCV's multithreading resolves it. Here the fix was to set args.max_data_loader_n_workers to 0, as sketched below.
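
A minimal sketch of both workarounds (dummy dataset; torch and cv2 as pinned above):

import cv2
import torch
from torch.utils.data import DataLoader, TensorDataset

# Workaround 1: stop OpenCV from spawning its own threads, so it cannot
# deadlock against the DataLoader's worker processes.
cv2.setNumThreads(0)

# Workaround 2: load data in the main process; this is what setting
# args.max_data_loader_n_workers to 0 amounts to in lora-scripts.
dataset = TensorDataset(torch.zeros(8, 3, 64, 64))   # dummy stand-in dataset
loader = DataLoader(dataset, batch_size=1, num_workers=0)
for (batch,) in loader:
    pass   # training step goes here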

c. max_token_length can be 75, 150, or 225; using 225 raised: RuntimeError: The size of tensor a (227) must match the size of tensor b (77) at non-singleton dimension 1. This was a huge pitfall. The root cause was that the tokenizer_config.json file had been left out when initializing CLIP; see the sketch below.
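
The connection to tokenizer_config.json: CLIPTokenizer reads model_max_length (77 for CLIP) from that file, and without it the tokenizer has no window size to chunk long prompts against. A diagnostic sketch:

from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
print(tok.model_max_length)   # 77, read from tokenizer_config.json

# With max_token_length=225 the prompt becomes 225 + BOS/EOS = 227 ids,
# which the training code splits into 77-token chunks before the text
# encoder; if model_max_length is missing or wrong, the whole 227-long
# tensor hits CLIP's 77 position embeddings, giving the error above.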

d. bash: accelerate: command not found. Never got this resolved (most likely the virtual environment's bin directory, where the accelerate entry point lives, was not on PATH).

Multi-GPU training:

python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --node_rank=0 --master_addr=localhost --master_port=22222 --use_env ./sd-scripts/train_network.py \

and inside the training script, initialize the process group:

import torch.distributed as dist
dist.init_process_group(backend='gloo', init_method='env://')

