Shitty SDXL loras

Finetune-extracted styles (NoobAI V-PRED v1.0)

Random notes written down over months of baking so I can at least vaguely remember why I'm using x settings.

  • Using this sd-scripts fork (had to fix a bunch of shit manually) and came up with this template .ps1 script / .txt for cowards that uses about 22GB VRAM.
  • Size further reduced by slicing some lora blocks off (in567) for less interference with composition and the base model's built-in character knowledge. Still confused about the noob blocks, need to mess around more with character/concept loras.
  • Testing with/without edm2, the results were massively in favor of edm2, just trust me bro (I cba to dig up the comparison xyz). Most of the public edm2 configs I found were absolute garbage with my setup, not sure why. I don't want to remember how many different optimizer setups I tested.
  • Immiscible noise @ 4096 is beneficial for style reproduction, 1024 was kind of whatever in comparison. The difference was clear when I did some really overfit bakes: with 4096 the lora could generate images nearly 1:1 with the training data.
  • Went up from 5x to 10x repeats, feels like a sweet spot. Smaller datasets can have 10x for everything; for bigger ones maybe cap it at 100 images @10x and the rest @2x? Most likely this is a result of needing more steps for effective baking, since I made my warmup pretty big to make sure edm2 has stabilized before warmup is over.
  • Fixing up tags is even more important with 10x repeats; you have to manually tag things that captioners like to ignore, such as "sound effects".
  • Schizo block_lr guided by LLM hallucinations and visual comparison of roughly 15+ different setups vs having a flat lr.
  • Loss "log_cosh" felt the best after trying a bunch of them; certain prompts that were previously completely cooked even at 1e-5 lr come out okay. The loss value doesn't really say anything, but for some reason the values are about 3x smaller, much like what you would see baking on eps models.
  • Sangoi loss modifier is a rather big "improvement/trade off" (if your dataset is mostly greyscale?). Changed default max_snr=100 to 15 after testing a range of different values like a caveman (5,10,15,25,50,100,150,200). You might lose some 'excessive' details but the balance is generally better.
  • Accuracy autism was beneficial, tested with/without (loss_related_use_float64, edm2_loss_weighting_use_float64, disable_cuda_reduced_precision_operations).
  • Switched finetune extract to ratio (--mode=ratio --linear_ratio=0.4 --conv_ratio=0.3), it was the clear winner after testing multiple values with all the modes. Basically made a 4gb giga extract and went down until I got something that matched it while being as small as possible.
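For anyone wondering what edm2 is doing during that long warmup: as I understand the EDM2 paper, each timestep gets a learned log-variance u(t) and the per-timestep loss becomes raw / exp(u) + u. Minimizing that pushes u toward log(raw loss), which roughly normalizes every timestep's loss to about 1. The toy below uses a single scalar u and made-up loss values (`edm2_weighted_loss` and `fit_log_var` are hypothetical names) just to show that u needs a number of steps to settle; the fork's actual implementation is not reproduced here and may differ.

```python
import math


def edm2_weighted_loss(raw_loss: float, log_var: float) -> float:
    """Uncertainty-weighted loss: raw / exp(u) + u, with u a learned log-variance."""
    return raw_loss / math.exp(log_var) + log_var


def fit_log_var(raw_loss: float, lr: float = 0.1, steps: int = 500) -> float:
    """Plain gradient descent on u for a fixed raw loss (toy stand-in for training)."""
    u = 0.0  # start at weight 1.0, like an untrained weighting
    for _ in range(steps):
        # d/du [raw * exp(-u) + u] = 1 - raw * exp(-u)
        grad = 1.0 - raw_loss * math.exp(-u)
        u -= lr * grad
    return u


if __name__ == "__main__":
    # Hypothetical raw losses for three timestep buckets.
    for t, raw in [(0.1, 0.02), (0.5, 0.10), (0.9, 0.40)]:
        u = fit_log_var(raw)
        print(f"t={t}: u={u:.3f} (log raw={math.log(raw):.3f}), "
              f"effective weight={math.exp(-u):.1f}")
```

At the optimum raw / exp(u) is 1, so low-loss timesteps get a big effective weight and high-loss ones a small one.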

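Immiscible noise, as I understand the Immiscible Diffusion idea, gives each latent a nearby noise sample by solving an assignment problem on latent/noise distances instead of pairing them randomly. The toy sketch below ASSUMES the 4096/1024 number is the size of the noise pool the batch gets matched against; I haven't checked how PR 1395 actually interprets it, and `immiscible_assign` / `hungarian` are made-up names. A real implementation would work on torch tensors and could call scipy.optimize.linear_sum_assignment instead of the hand-written solver.

```python
import random


def hungarian(cost):
    """Min-cost assignment of n rows to distinct columns (needs n <= m).

    Classic O(n^2 * m) potentials algorithm; returns a column index per row.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)
    v = [0.0] * (m + 1)
    p = [0] * (m + 1)    # p[j] = 1-based row currently assigned to column j
    way = [0] * (m + 1)  # back-pointers for the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip assignments along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            result[p[j] - 1] = j - 1
    return result


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def immiscible_assign(latents, pool_size, rng):
    """Hypothetical helper: draw a Gaussian noise pool and give each latent
    its own noise vector so the summed squared distance is minimal."""
    if pool_size < len(latents):
        raise ValueError("pool must be at least as large as the batch")
    dim = len(latents[0])
    pool = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pool_size)]
    cost = [[sq_dist(z, e) for e in pool] for z in latents]
    cols = hungarian(cost)
    return [pool[c] for c in cols], cols, cost


if __name__ == "__main__":
    rng = random.Random(0)
    latents = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
    for pool_size in (4, 1024, 4096):  # 4 == batch size: plain batch-wise matching
        _, cols, cost = immiscible_assign(latents, pool_size, rng)
        total = sum(cost[i][c] for i, c in enumerate(cols))
        print(pool_size, round(total / len(latents), 2))
```

A bigger pool lets every latent find closer noise, which fits the "4096 reproduces the dataset more tightly" observation, but that link is my guess, not something I measured.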
B-lora is recommended, the original is included in case you want to test your own extracts or w/e
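For testing your own extracts: my understanding of the ratio mode in LyCORIS's extract script is that, per layer, it keeps every singular value of the (finetune - base) weight difference that is larger than `ratio * max(S)`. So --linear_ratio=0.4 roughly means "keep directions at least 40% as strong as the strongest one", and raising the ratio shrinks the file. Minimal sketch of just the rank choice; `ratio_rank` is a made-up name and the singular values are hypothetical.

```python
def ratio_rank(singular_values, ratio):
    """Rank kept by a ratio-style cutoff: count of values > ratio * max."""
    cutoff = max(singular_values) * ratio
    return sum(1 for s in singular_values if s > cutoff)


if __name__ == "__main__":
    # Hypothetical singular values of one layer's weight difference.
    s = [10.0, 5.0, 4.5, 3.5, 1.0, 0.2]
    print(ratio_rank(s, 0.4))  # cutoff 4.0 -> 3
    print(ratio_rank(s, 0.3))  # cutoff ~3.0 -> 4
```

Because the cutoff is relative to each layer's own largest singular value, layers end up with different ranks, unlike a fixed-rank extract.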

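For reference, log_cosh is log(cosh(r)) on the prediction residual r. It behaves like r²/2 for small residuals and like |r| - log 2 for large ones, so outliers get a linear instead of quadratic penalty, and it never exceeds half the squared error. That might be part of why the logged values look smaller than with MSE, though I haven't checked that it accounts for the 3x. Numerically stable sketch in plain Python with made-up residuals; how the fork reduces it over the latent tensor is not shown.

```python
import math


def log_cosh(r: float) -> float:
    """Stable log(cosh(r)) = |r| + log1p(exp(-2|r|)) - log(2)."""
    a = abs(r)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def mean_loss(residuals, fn):
    return sum(fn(r) for r in residuals) / len(residuals)


if __name__ == "__main__":
    residuals = [0.05, -0.2, 0.4, -1.5, 3.0]  # made-up per-element errors
    mse = mean_loss(residuals, lambda r: r * r)
    lc = mean_loss(residuals, log_cosh)
    print(f"mse={mse:.4f} log_cosh={lc:.4f}")
    # math.log(math.cosh(1000.0)) would raise OverflowError; this does not.
    print(log_cosh(1000.0))
```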
Made in Abyss manga
Some of the images are i2i from this

Mosha

Finetune-extracted styles (Illustrious v0.1)

Random notes

  • Followed anon's rentry to start with; experiments resulted in this boomer .ps1 script, which uses about 22GB VRAM.
  • Consistently got underbaked results with (1x dataset + 48 epochs) -> switched to (5x dataset + 16 epochs).
  • "Trigger" tokens seem very beneficial even if not used in prompt, didn't really do a thorough test of this
  • Haven't messed with B-LoRA slicer for these extracts (yet)
  • Timestep fuckery using "shift" is a huge positive for styles, using sd-scripts dev branch with these merged in: timestep shit from sd3 branch + soft snr gamma
  • Biggest change while testing was moving away from --full_bf16. Likely a skill issue, but I needed more epochs to reach the same results compared to having it off, despite tinkering with learning rates. This was with adafactor, but I assume lion or some optimi implementation with kahan_sum would fare better? Was too lazy to test more once I got decent results from this setup after weeks of messing with lora bakes.
    - EDIT: full bf16 seems decent enough if you yoink this into your sd-scripts library folder to have automatic stochastic rounding for adafactor
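The "shift" trick, ASSUMING it's the SD3-style mapping t' = s·t / (1 + (s - 1)·t) applied to a uniformly sampled t in [0, 1] (1 = pure noise), pushes training toward noisier timesteps when s > 1. What the sd3-branch code does on SDXL's discrete 1000-step schedule may differ; this only shows the shape of the mapping, and `shift_timestep` is a made-up name.

```python
import random


def shift_timestep(t: float, shift: float) -> float:
    """SD3-style shift: maps t in [0, 1] to shift*t / (1 + (shift - 1)*t)."""
    return shift * t / (1.0 + (shift - 1.0) * t)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.random() for _ in range(100_000)]
    for shift in (1.0, 3.0):
        shifted = [shift_timestep(t, shift) for t in samples]
        high_noise = sum(t > 0.5 for t in shifted) / len(shifted)
        # roughly 0.50 for shift=1 and 0.75 for shift=3
        print(f"shift={shift}: fraction of t > 0.5 = {high_noise:.2f}")
```

With shift=3 any t above 0.25 lands in the upper (noisier) half, which is where composition and broad style get decided, so that may be why it helps styles.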

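Why stochastic rounding matters with --full_bf16: bf16 keeps only 8 bits of mantissa, so near 1.0 the spacing is 2^-7 ≈ 0.0078 and a tiny weight update gets rounded away completely with round-to-nearest. Stochastic rounding rounds up with probability proportional to the remainder, so the update survives on average. Pure-Python sketch working on the float32 bit pattern (NaN/inf not handled); this is the generic trick, not the code from the linked file, and the helper names are made up.

```python
import random
import struct


def f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]


def to_bf16_nearest(x: float) -> float:
    """Round-to-nearest-even onto the top 16 bits of the float32 pattern."""
    b = f32_bits(x)
    b += 0x7FFF + ((b >> 16) & 1)
    return bits_f32(b & 0xFFFF0000)


def to_bf16_stochastic(x: float, rng: random.Random) -> float:
    """Add random bits to the 16 bits being dropped, then truncate."""
    b = f32_bits(x) + rng.getrandbits(16)
    return bits_f32(b & 0xFFFF0000)


if __name__ == "__main__":
    rng = random.Random(0)
    update = 1e-4  # hypothetical per-step update to a weight of 1.0
    w_nearest = w_stochastic = 1.0
    for _ in range(1000):
        w_nearest = to_bf16_nearest(w_nearest + update)
        w_stochastic = to_bf16_stochastic(w_stochastic + update, rng)
    print(f"nearest:    {w_nearest}")     # stuck at 1.0
    print(f"stochastic: {w_stochastic}")  # around 1.1 on average
```

That lost-update effect is one plausible reason full bf16 needed more epochs without stochastic rounding, but I haven't isolated it.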
Mosha style lora, 67 image dataset w/ 5x repeats
1 kept token: artmosha
Training .ps1 script
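Quick arithmetic on the underbaking note, using the 67-image Mosha set above and assuming batch size 1 (`training_totals` is a made-up helper; sd-scripts bucketing can add a few partial batches at higher batch sizes, so treat the ceil as approximate). 1x + 48 epochs shows each image 48 times, 5x + 16 epochs shows it 80 times, so the switch also means about 1.67x more total steps, not just fewer epochs.

```python
import math


def training_totals(num_images, repeats, epochs, batch_size=1):
    """Return (steps_per_epoch, total_steps, views_per_image)."""
    samples_per_epoch = num_images * repeats
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs, repeats * epochs


if __name__ == "__main__":
    for repeats, epochs in [(1, 48), (5, 16)]:
        spe, total, views = training_totals(67, repeats, epochs)
        print(f"{repeats}x + {epochs} epochs: {spe} steps/epoch, "
              f"{total} total steps, each image seen {views} times")
```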

B-LoRA-inspired styles (Pony Diffusion V6 XL)

Styles baked with a custom preset using Lycoris and then stripped down with blora_slicer.py for compatibility. First use takes pretty long at least on my forge setup but it'll be normal afterwards. The goal was to keep a good balance of style and compatibility with character/concept loras and base pony characters etc.

Deadnoodle style lora
Random anon's dataset, didn't test this very much
Trained with score_9, source_anime

Made in Abyss manga style lora
Trained with score_9, source_anime

B-LoRA-inspired styles workflow (Pony Diffusion V6 XL)

Just the basic workflow, I have no clue what dataset sizes, learning rates, algos etc. are actually the best.
This set of sliced traits is just what I ended up with, it may not be optimal for everything. Do some experiments!

Training (sd-scripts):

  1. Copy this blora_conv_ffnet_outallbut0_inall_mid.toml file to use as a preset during training, you'll need lycoris.
  2. Example .ps1 file using the LoKr algo. Uses roughly 15-18GB VRAM, could be optimized more. You can increase the factor to 3 or 4 if you're desperate; I saw improvements going from 4 to 2 in early testing, but I haven't tried again with the final settings.
  3. The .ps1 above uses immiscible noise, so you'll have to merge this pull request https://github.com/kohya-ss/sd-scripts/pull/1395:
    git fetch origin pull/1395/head:immis_noise
    git merge immis_noise
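On the LoKr factor: LoKr stores a layer's update as a Kronecker product kron(W1, W2), and the factor decides how each dimension is split between the two pieces. A rough parameter count, ASSUMING a clean (factor, dim // factor) split with both pieces kept full (LyCORIS has its own fallback for dims not divisible by the factor, plus low-rank options, which this sketch doesn't copy), shows why a bigger factor saves memory and leaves W2 less room. `kron` and `lokr_params` are hypothetical names.

```python
def kron(a, b):
    """Kronecker product of two nested-list matrices."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def lokr_params(out_dim, in_dim, factor):
    """Params for full W1 (factor x factor) + full W2, and for the dense weight."""
    if out_dim % factor or in_dim % factor:
        raise ValueError("sketch only handles dims divisible by factor")
    w1 = factor * factor
    w2 = (out_dim // factor) * (in_dim // factor)
    return w1 + w2, out_dim * in_dim


if __name__ == "__main__":
    # Shape check: kron of a 2x2 and a 3x3 is 6x6.
    big = kron([[1, 2], [3, 4]], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    print(len(big), len(big[0]))
    # 1280x1280 is the size of SDXL's deepest attention projections.
    for f in (2, 4, 8):
        lokr, dense = lokr_params(1280, 1280, f)
        print(f"factor {f}: {lokr:,} params vs {dense:,} dense")
```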
    

Slicing:

  1. Clone https://github.com/ThereforeGames/blora_for_kohya to your sd-scripts folder, the repo has more detailed install/usage instructions.
    1.1 Replace the slicer with this modified blora_slicer.py if you want the lora to retain its metadata.
  2. Add this to your blora_traits.json
    "out012345_in12356_te1_te2":
        {
            "whitelist": ["lora_te1", "lora_te2", "lora_unet_output_blocks_0_1_", "lora_unet_output_blocks_1_1_", "lora_unet_output_blocks_2_1_", "lora_unet_output_blocks_3_1_", "lora_unet_output_blocks_4_1_", "lora_unet_output_blocks_5_1_", "lora_unet_input_blocks_1_1_", "lora_unet_input_blocks_2_1_", "lora_unet_input_blocks_3_1_", "lora_unet_input_blocks_5_1_", "lora_unet_input_blocks_6_1_"],
            "blacklist": ["proj_in", "proj_out", "alpha"]
        }
    
  3. Run the slicer on your chosen epochs using the above --traits and you're done!

    python blora_slicer.py --loras ./input/{file} --traits out012345_in12356_te1_te2 --debug --output_path=./output/{no_safetensors}_out0to5_in12356_te1te2.safetensors

    Personally I add "input" and "output" folders to my blora folder and use this handy python script folders_blora_slicer.py to run the script on all the loras in the input folder. It's very useful when you're testing multiple presets and multiple epochs.

      in your sd-scripts venv:
    
      cd .\blora_for_kohya
      python .\folders_blora_slicer.py
    
    -> grab the results from the output folder
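Roughly what the trait does to the lora's keys, ASSUMING the slicer keeps a key when it contains any whitelist substring and none of the blacklist substrings (check blora_slicer.py for the real matching rules). `keep_key` / `filter_keys` are made-up names and the key strings are illustrative kohya-style SDXL lora keys.

```python
TRAIT = {
    "whitelist": [
        "lora_te1", "lora_te2",
        *[f"lora_unet_output_blocks_{i}_1_" for i in range(6)],
        *[f"lora_unet_input_blocks_{i}_1_" for i in (1, 2, 3, 5, 6)],
    ],
    "blacklist": ["proj_in", "proj_out", "alpha"],
}


def keep_key(key, trait):
    """True if key matches a whitelist entry and no blacklist entry (substring match)."""
    return any(w in key for w in trait["whitelist"]) and not any(
        b in key for b in trait["blacklist"]
    )


def filter_keys(keys, trait):
    return [k for k in keys if keep_key(k, trait)]


if __name__ == "__main__":
    keys = [
        "lora_te1_text_model_encoder_layers_0_mlp_fc1.lora_down.weight",
        "lora_unet_output_blocks_0_1_transformer_blocks_0_attn1_to_q.lora_down.weight",
        "lora_unet_output_blocks_0_1_proj_in.lora_down.weight",
        "lora_unet_output_blocks_0_1_transformer_blocks_0_attn1_to_q.alpha",
        "lora_unet_input_blocks_4_1_transformer_blocks_0_attn1_to_q.lora_down.weight",
        "lora_unet_middle_block_1_transformer_blocks_0_attn1_to_q.lora_down.weight",
    ]
    for k in keys:
        print("keep" if keep_key(k, TRAIT) else "drop", k)
```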
    

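If you don't have the folders script handy, something along these lines should do the same job. It's a hypothetical stand-in, not the linked folders_blora_slicer.py: it builds the slicer command from step 3 for every .safetensors file in ./input and runs it from inside the blora_for_kohya folder (run it in your sd-scripts venv).

```python
import subprocess
import sys
from pathlib import Path

TRAIT = "out012345_in12356_te1_te2"
SUFFIX = "_out0to5_in12356_te1te2"


def build_commands(input_dir: Path, output_dir: Path, trait: str = TRAIT):
    """One blora_slicer.py invocation per .safetensors file in input_dir."""
    commands = []
    for lora in sorted(input_dir.glob("*.safetensors")):
        out = output_dir / f"{lora.stem}{SUFFIX}.safetensors"
        commands.append([
            sys.executable, "blora_slicer.py",
            "--loras", str(lora),
            "--traits", trait,
            "--debug",
            f"--output_path={out}",
        ])
    return commands


def main():
    inp, outp = Path("input"), Path("output")
    if not inp.is_dir():
        print("run this from the blora_for_kohya folder (no ./input found)")
        return
    outp.mkdir(exist_ok=True)
    for cmd in build_commands(inp, outp):
        print("slicing", cmd[3])
        subprocess.run(cmd, check=True)  # stop on the first failing lora


if __name__ == "__main__":
    main()
```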
Styles (Pony Diffusion V6 XL)

Trained with score_9, source_anime

Mosha with new oven settings for baking, "sketch" is a good tag to use. V9 with shittier tagging was somehow superior at "1girl, standing" but had some other issues with prompting. Hands often have issues in both.

Trained with score_9, source_anime, eu03

Quick bake with the 5th eu03 dataset from anon, mostly for science. e16 is probably the one you want.

Trained with score_9, source_anime, tsukushi akihito

9th epoch is probably the best for normal use. Last epoch colors/shading go a bit too hard but it's great for spooky stuff. 1.0 weight works fine but going down to 0.8 brings back more details to the backgrounds.

Trained with score_9, source_anime, moshimoshibe

Shitty loras

Styles (LoKr + Pivotal)

Requires webui 1.7.0 / dev branch
Feels more forgiving to bake while retaining prompt responsiveness & less spaghetti fingers with pivotal? Including the token in prompt seems beneficial but might be cope.

Styles (LoKr)

Styles

Styles (hires)

Styles baked at 1200+ resolution, using these will generate mustard dolphin gas

Concepts

Instant loss 2koma mating press, 2e-4 adam and 2e-6 ada. Which one is better? It's all gacha bullshit that depends on model/prompt/loras.
2e-4 feels gentler on style and 2e-6 feels a bit more consistent. I uploaded both so you can share the pain of indecision. More notes in metadata, tldr: controlnet good.

Questionable

Kaiji

Pub: 25 Feb 2023 20:26 UTC
Edit: 05 Apr 2025 19:53 UTC
Views: 10363