Go read this at https://civitai.com/articles/138 instead; that version is kept updated and has images.

Making a LORA is like baking a cake: lots of preparation and then letting it bake. If you didn't do the preparation properly, it will probably come out inedible.
I am making this guide for offline creation; you can also use Google Colab, but I have no experience with that.
First you need some indispensable things:

  1. An NVIDIA video card with at least 6 GB, but realistically 8 GB, of VRAM. Solutions for AMD cards exist but are not mature yet.
  2. Enough disk space to hold the installations.
  3. A working installation of Automatic1111 (https://github.com/AUTOMATIC1111/stable-diffusion-webui) or another UI, plus some models. For anime it is recommended to use the NAI family (NAI, AnythingV3.0, AnythingV4.0, AnythingV4.5); I normally use AnythingV4.5: https://huggingface.co/andite/anything-v4.0/blob/main/anything-v4.5-pruned.safetensors
  4. A collection of images for your character. More is always better.
  5. Kohya’s scripts. A one-click installer for Windows can be found at https://github.com/derrian-distro/LoRA_Easy_Training_Scripts; just run the install file in an empty folder. It requires a specific Python version, so take care of that.

That’s enough for a start. Next begins the tedious part, dataset cleanup:

  1. You have to scrape a booru, either manually or using a script; for rarer things you might end up re-watching that old anime you loved when you were a kid and going frame by frame doing screencaps. MPC-HC (https://github.com/clsid2/mpc-hc/releases/) has a handy feature to save screencaps to PNG via right click->File->Save Image.
  2. Get all your images into a useful format, which means preferably PNG, though JPG might suffice. For GIFs I use a crappy open-source splitter (https://github.com/adnanafzal565/GIF-Splitter). For WebP I use dwebp straight from the Google libraries (https://storage.googleapis.com/downloads.webmproject.org/releases/webp/index.html): dump dwebp.exe from the downloaded zip into your images folder, open cmd in there and run: for %f in (*.webp) do dwebp.exe "%f" -o "%~nf.png" (this will convert all the WebP images into PNGs; a PowerShell version is sketched after this list).
  3. Unless you are making a huge LORA that also accounts for style, remove from your dataset any images that might clash with the others, for example chibi or super-deformed versions of the character. This can be handled with specific tagging, but that can hugely inflate the time required to prepare the LORA.
  4. Exclude any images that have too many elements or are cluttered, for example group photos or gangbang scenes where too many people appear.
  5. Exclude images with watermarks or text in awkward places where they can’t be photoshopped out.
  6. OK, so you have some clean-ish images, or not-so-clean ones that you can’t get yourself to scrap. The next fun part is manual cleaning. Do a scrub image by image, trying to crop, paint over or delete any extra elements. The objective is that only the target character remains in your image (if your character is interacting with another, for example having sex, it is best to crop the other character mostly out of the image). Try to delete or fix watermarks, speech bubbles and SFX text. Resize small images and pass low-res images through img2img to upscale them.
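
If you prefer to stay in PowerShell (the scripts later in this guide use it), a rough equivalent of that cmd one-liner is sketched below; it assumes dwebp.exe sits in the same folder as the .webp files:

#convert every .webp in the current folder to .png using dwebp.exe
Get-ChildItem *.webp | Foreach-Object{
    $out = [System.IO.Path]::ChangeExtension($_.FullName, ".png")
    & .\dwebp.exe $_.FullName -o $out
}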

The next fun step is image regularization. Common training images for SD are 512x512 pixels. That means all your images must be resized, or at least that was the common wisdom when I started training. I still do so and get good results, but most people resort to bucketing, which lets the LORA training script auto-sort the images into size buckets. The common consensus is that too many buckets can cause poor training quality. My suggestion? Either resize everything to your desired training resolution, or choose a couple of bucket sizes and resize everything to its closest appropriate bucket, either manually or by allowing upscaling in the training script:
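
If you want a quick look at what sizes your set actually contains before picking buckets, here is a minimal PowerShell sketch of mine that prints each image's resolution and the nearest multiple of 64 as a rough bucket guess; run it from the image folder (purely informational, the training script does the real bucketing):

#list each image's size plus a rough nearest-64px bucket guess
Add-Type -AssemblyName System.Drawing
Get-ChildItem *.png, *.jpg | Foreach-Object{
    $img = [System.Drawing.Image]::FromFile($_.FullName)
    $bw = [Math]::Round($img.Width / 64) * 64
    $bh = [Math]::Round($img.Height / 64) * 64
    "{0}: {1}x{2} -> ~{3}x{4}" -f $_.Name, $img.Width, $img.Height, $bw, $bh
    $img.Dispose()
}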

  1. One technique is to pass them through a script that simply resizes them, either by cropping or by resizing and filling the empty space with white. If you have more than enough images, probably more than 250, this is the way to go and not an issue: simply review the results and dump any that didn’t make the cut. This can be done in A1111 in the Train->Preprocess images tab.
  2. If, on the other hand, you are on a limited image budget, I recommend doing this manually. Windows Paint 3D is an adequate, if not good, option: just go to the canvas tab and move the limits of your image. Why do this? Because you can commonly get several sub-images from a single image as long as it is high-res. For example, suppose you have a high-res full-body image of a character. You can resize and fill in the blanks to get one image, do a cut at the waist for a portrait shot as a second image, then cut at the neck for a third image with a mugshot only.
  3. To simplify things a bit I have created a script (at the end of this guide) which makes images square by padding them. It is useful for preprocessing images before feeding them to the upscaler. Just copy the code into a text file, rename it to something.ps1, then right-click it and choose Run with PowerShell. (A matching downscale sketch for squares that come out larger than 512 follows this list.)
  4. I have found a good anime upscaler that works better than the one below in point 5. It is a Windows-ready application: https://github.com/lltcggie/waifu2x-caffe/releases. Just download the zip file and run waifu2x-caffe.exe; you can then select multiple images and upscale them to 512x512. For low-res screencaps or old images I recommend the "Photography, Anime" model. You can apply the denoise before or after depending on how crappy your original image is.
  5. For upscaling, my other preferred anime upscaler is escale. I have had good results going from low-res shit to 512. You can get it from the wiki: https://upscale.wiki/wiki/Model_Database. Just drop the model inside the A1111 models\ESRGAN folder and use it from the Extras tab. https://drive.google.com/file/d/15VfEwR61Y1Je8EyPMWl40wW01S6bJuPJ/view?usp=share_link
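
To complement the padding script at the end of this guide, here is a minimal downscale sketch of mine (the _512 output naming is made up) for padded squares that came out larger than 512; small images should still go through a real upscaler like the ones above:

#downscale big padded squares to 512x512, skipping anything 512 or smaller
Add-Type -AssemblyName System.Drawing
Get-ChildItem *.png | Foreach-Object{
    $src = [System.Drawing.Image]::FromFile($_.FullName)
    if($src.Width -gt 512)
    {
        $out = $_.FullName -replace '\.png$', '_512.png'
        $dst = New-Object System.Drawing.Bitmap(512, 512)
        $g = [System.Drawing.Graphics]::FromImage($dst)
        $g.InterpolationMode = [System.Drawing.Drawing2D.InterpolationMode]::HighQualityBicubic
        $g.DrawImage($src, 0, 0, 512, 512)
        $g.Dispose()
        $dst.Save($out, [System.Drawing.Imaging.ImageFormat]::Png)
        $dst.Dispose()
    }
    $src.Dispose()
}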

So all your images are clean and in a nice 512x512 size; the next step is captioning. Captioning can be as shallow as a puddle or as deep as the Mariana Trench. For anime characters it is recommended to use deepbooru, which uses danbooru-style tagging. Here’s the way I do it; alternatively, you can use the stable-diffusion-webui-dataset-tag-editor extension for A1111, which includes a tag manager and the Waifu Diffusion tagger.
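
For reference, a deepbooru caption is just a .txt file with the same name as the image, containing a comma-separated tag list. A made-up example of what the tagger spits out:

image0001.txt: 1girl, solo, blue_hair, school_uniform, smile, looking_at_viewer, outdoors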

  1. Go to the stable-diffusion-webui-dataset-tag-editor tab in A1111, select a tagger in the dataset load settings and tick "use tagger if empty". Then simply load the directory of your images and, after everything finishes tagging, click save. Alternatively, go to the Train->Preprocess images tab in A1111, tick "use deepbooru for caption" and process them.
  2. Now you must decide which tags you are going to use for your LORA. Say you wish to make a LORA for Bulma in her DBZ costume, so you choose the tag "Bulma_DBZ". Wrong! If your character is unknown there is no issue, but if you choose a famous character like Bulma you will get style contamination from the word "Bulma" and the word "DBZ". Underscores, dashes and hyphens are equivalent to spaces in danbooru notation, and even a partial match might cause some bleed-over by tangentially invoking those concepts. So before you assign a tag, run it through A1111 and check that it returns noise. In the previous example I could concatenate it to BulmaDBZ, or do what I did, which was to use the romaji spelling Buruma. An alternative is the use of regularization images, but I will speak about them later.
  3. The next part is tag pruning. Either use the tag editor or manually go to the folder in which A1111 tagged your images. You must remove any tags in which your character was recognized. I do this using either the tagger's replace function or manually with Notepad++'s Search->Find in Files option, replacing for example "Bulma, " with "". You may also want to clean up erroneous or superfluous tags; a good way to quickly spot which caption files need cleaning is to switch the folder to the detailed view and note any caption file bigger than 1 KB. If you are obsessive, or want to do something fancy like tagging specific wardrobe combinations, you will want to remove all the tags for the individual parts of the costume. For example, if a character uses a red dress, red high heels and a yellow choker, you must delete these individual tags and replace the lot with a customized "OutfitName" tag. Or maybe you are a sensible person (noob) and just want your character to appear and let the other tags do their job by themselves. So, after you have deleted any problematic tags, it is time to insert your character tag. What I do is select the folder with the captions, ctrl+shift+right-click and select "Open PowerShell window here". There you can run the following command: foreach ($File in Get-ChildItem *.txt) {"Tohsaka_Rin_Alt, " + (Get-Content $File.fullname) | Set-Content $File.fullname}
    In the command you need to change "Tohsaka_Rin_Alt, " to your trigger tag or tags. The command will insert the new tags at the beginning of every caption file. I prefer this approach even when using the tag editor, as it will undoubtedly insert my trigger as the first tag in all files; it might be superstition (or maybe not) but I like it that way. (A combined prune-and-prepend sketch is given after this list.)
  4. This is only if you are training a style. If you are training a style you will need as many varied pictures as possible, and captions are treated differently: either delete all tags from the caption files and let the LORA take over when it is invoked, or add a tag to act as a trigger and only eliminate the tags associated with the style (for example retro style, a specific background, perspective and tags like that), leaving all particular tags alone (1girl, blue_hair, etc.). It is simpler to just eliminate all tags.
  5. This is only if you are training a concept. First of all, you are shit out of luck: training a concept is 50% art and 50% luck. Make sure to clean up your images as well as possible to remove most extraneous elements. Try to pick images that are simple and obvious about what is happening, and that are as different from each other as possible while only sharing the concept you want. Tagging is similar to the trigger variant for style: you need to eliminate all tags which touch your concept and leave the others alone.
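
As an alternative to the Notepad++ route, here is a minimal PowerShell sketch of mine for the prune-then-prepend pass from point 3, run from the caption folder; "bulma" and "Buruma" are just the example tags from above, swap in your own:

#remove the contaminated tag from every caption, then put the trigger tag first
Get-ChildItem *.txt | Foreach-Object{
    $text = Get-Content $_.FullName -Raw
    $text = $text -replace 'bulma, ', ''
    Set-Content $_.FullName ('Buruma, ' + $text)
}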

So you now have all your images neatly sorted, and you wonder what the heck regularization images are. Well, regularization images are like negative prompts, but not really: they pull your model back to a "neutral" state. Either way, unless you really need them, ignore them.

To me they only have two realistic uses:

  1. Mitigating the bleed-over from your trigger. Suppose I want to train a character called Mona_Taco; the result will be contaminated with images of the Mona Lisa and tacos. So you can go to A1111, generate a bunch of images with the prompts Mona and Taco, and dump them into your regularization folder with their appropriate captioning. Now your LORA will know that Mona_Taco has nothing to do with the Mona Lisa nor tacos. Alternatively, simply use a different tag or concatenate it; MonaTaco will probably work fine by itself without the extra steps. I would still recommend simply using a meaningless word that returns noise.
  2. Style neutralization. Suppose you trained a character LORA with a thousand images, all including the tag 1girl; now whenever you run your LORA and prompt 1girl it will always display your character. To prevent this you would add a thousand different images of different girls, all tagged 1girl, to balance out the training and remove your LORA's influence from the 1girl concept. Of course, you have to do this for as many affected tags as you can.

Now the step you’ve been waiting for: the baking. Honestly, the default options will work fine 99% of the time; you might wish to lower the learning rate for a style. Anyway, for a LORA you must open run_popup.bat from LoRA_Easy_Training_Scripts. I recommend never starting the training immediately, but saving the training JSON file and reviewing it first. The prompts go roughly like this:

  1. Do you want to run multiple trainings: No.
  2. Do you want to load a JSON file: No.
  3. Do you want to save a JSON of your config: Yes.
  4. Select a base model: put the model you wish to train with; I recommend one of the NAI family for anime.
  5. Select your image folder: select the folder above the one with the X_name (see the folder layout example after this list).
  6. Select output folder: just make a new folder and put your stuff there.
  7. Do you want to change the checkpoint name: don’t care.
  8. Save a file containing captions: YES! And pick "by occurrence". That file is useful later for getting an idea of the available tags for your character in that particular LORA.
  9. SD2 model: No. The NAI family is based on 1.5.
  10. Training on a realistic model: No; again, this is anime.
  11. Use regularization images: as I explained above, the most common answer is No.
  12. Optimizer: AdamW, AdamW 8-bit and Lion all work fine, but Lion is for more modern cards.
  13. Dim size is good by default; the larger it is, the more data is added to the LORA to increase similarity, which can cause overfitting and deep-frying in your images at lower weights or high CFG. So click cancel.
  14. Same with alpha, just click cancel.
  15. Now you can choose the type of LORA. LoCon and LoHa require some extra scripts and take longer to train; I have heard you can make do with fewer epochs, but there is no concrete data yet. So pick LORA unless your card is fast and you have the scripts needed to use them. I have tried LoCon; it picks up more detail, which may be a good thing for intricate objects, but keep in mind the quality of your dataset.
  16. Training rate is OK at default (lower it for a style), so click cancel.
  17. Unet LR: just click cancel.
  18. Same for the text encoder LR.
  19. Scheduler: technically important, as schedulers manipulate the learning rate. In practice? Just select cosine with restarts; not only is it the first in the list, I have seen some comparisons and it is OK. Click cancel to set it to 1 restart.
  20. Resolution, if you have been following this guide, is 512, so click cancel; same for height.
  21. The number of images per batch depends on your VRAM; at 8 GB you can do 2, or 1 if you are using image flipping. So just click cancel unless your character is asymmetric (single pigtail, eyepatch, etc.).
  22. Now, choosing between epochs or steps: it is more common to use epochs.
  23. How many epochs to train: depends on your dataset. I normally use 8 epochs at 10 repetitions (X_Name becomes 10_Name) for 100 to 400 images.
  24. Save epochs as you train: Yes! Set it to every 1 epoch; if you overtrain you can just pick an earlier epoch.
  25. Warm-up ratio is an initial ramp-up of the learning rate; just select No.
  26. Shuffle captions: this is meant to prevent some bias, but we want that bias and we put our tags at the beginning, so No.
  27. Keep tokens at the front: same thing, so No.
  28. Unet vs text only: it is best to train them combined, so Both.
  29. Flip your images: Yes! Unless your character is asymmetric.
  30. Comments: just put the LORA triggers, like "Trigger: MonaTaco, Tacodress", so people can find them in the future.
  31. Prevent upscaling: no point, as all your images should already be the correct size, but just click No.
  32. For max compatibility just pick fp16 twice.
  33. Cache latents works fine and is stable, so choose that.
  34. Generate test images: I don’t, but my video card is crappy and I want it to finish as soon as possible.
  35. Just press any key to finish, then go review your tag summary and configuration JSON.
  36. If everything is OK, run run_popup.bat again.
  37. In the second question, select "use JSON file" and pick the JSON you created in the previous run.
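
For reference, the folder layout the script expects looks roughly like this (MonaTaco and the repeat count are just the running examples; in question 5 you select the top-level training folder):

training\
    10_MonaTaco\        <- 10 repeats per epoch, the name part is up to you
        image001.png
        image001.txt    <- caption file with the same name as the image
        image002.png
        image002.txt
output\                 <- empty folder where the finished LORA epochs land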

Finally, let it cook. It is like a cake: if you peek, it will deflate. If you use the computer too much it might mysteriously lower its speed and take twice as long. So just step away, go touch grass, stare directly at the sun, scream at the neighbor kids to get off your lawn.

Finally, your LORA has finished baking. Try it at weight 1, or do an X/Y/Z plot with several weights. If it craps out too early, go back to a previous epoch. Congratulations, you either finished or you screwed up.

BTW visit me at https://civitai.com/user/knxo/models and give me a review with some nice sexy images to watch.

Semi-useful crap

The script below converts images into PNG files and makes them square by adding white padding. They can then be fed to an upscaler or other resizer to get them to the correct resolution.

#load System.Drawing once for the whole script
Add-Type -AssemblyName System.Drawing

#convert jpg, jpeg and bmp images into png copies
Get-ChildItem -Recurse -Include *.jpg, *.jpeg, *.bmp | Foreach-Object{

    $ext = $_.Extension.TrimStart('.')
    $newName = $_.FullName -replace ('\.' + $ext + '$'), ('_from_' + $ext + '.png')
    $bmp = New-Object System.Drawing.Bitmap($_.FullName)
    $bmp.Save($newName, [System.Drawing.Imaging.ImageFormat]::Png)
    $bmp.Dispose()

}

$cnt=0
#pad every png (including the converted copies) into a white square
Get-ChildItem -Recurse -Include *.png | Foreach-Object{

    $newName = $PSScriptRoot + "\resized" + $cnt.ToString().PadLeft(6,'0') + ".png"
    $bmp = [System.Drawing.Image]::FromFile($_.FullName)

    #square canvas: use the larger side and center the image on it
    if($bmp.Width -le $bmp.Height)
    {
        $canvasWidth = $bmp.Height
        $canvasHeight = $bmp.Height
        $OffsetX = [int] ($canvasWidth/2 - $bmp.Width/2)
        $OffsetY = 0
    }
    else
    {
        $canvasWidth = $bmp.Width
        $canvasHeight = $bmp.Width
        $OffsetX = 0
        $OffsetY = [int] ($canvasHeight/2 - $bmp.Height/2)
    }

    #create the padded bitmap, fill it with white and draw the original on top
    $bmpResized = New-Object System.Drawing.Bitmap($canvasWidth, $canvasHeight)
    $graph = [System.Drawing.Graphics]::FromImage($bmpResized)
    $graph.Clear([System.Drawing.Color]::White)
    $graph.DrawImage($bmp, $OffsetX, $OffsetY, $bmp.Width, $bmp.Height)

    #save the padded image as an actual png
    $bmpResized.Save($newName, [System.Drawing.Imaging.ImageFormat]::Png)
    $graph.Dispose()
    $bmpResized.Dispose()
    $bmp.Dispose()

    $cnt++
}