EASYSCRIPTS EZ STYLE LORA GUIDE
This guide assumes you already have a local install set up. The included .toml also assumes you're on an RTX 30xx or 40xx GPU. If you're not, you'll have to disable the bf16 stuff.
PREFACE
This isn't really a guide to get something that looks 100% identical to whatever artist you're trying to get styling from. With appropriate in-model artist anchors and/or supporting artist loras you can probably get pretty close, but it's not going to be some burned-in "looks completely identical" type bullshit. Also, I'm not an expert and I'm not going to go into super specific special options and the pros or cons of random settings. I'm just some fag writing out a fairly simple training guide to post on 4chinz with the intention that it will "just werk." If you want an example of what a lora output from these settings will produce, here's an (updated) catbox 2x1 grid with a baseline and three other style loras. No anchoring artists were used (you can also just check the prompt/negative, the metadata is there).
This guide will be in three and a half parts. The first one and a half cover easyscripts stuff, the second goes into minimal detail about the dataset, and the last part covers tagging. There's also a random additional option at the end relevant to people on 12GB of VRAM. And easyscripts doesn't have fp8 (yet), so vramlets need not apply.
STARTO
Go here and follow the install instructions. I'm not holding your hand through this:
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
Now go and download this generic .toml
This has now been updated and has gradient checkpointing enabled by default. If you're on a 24GB VRAM card you can disable it for speed gains if you want.
To save random autists the effort, this is what the optimizer/LR is set to. It's also U-Net only and 64 dim / 32 alpha. It's a thicc lora that outputs a 332MB file. But it also works, and disk space is fucking cheap.
Also note that there are other viable settings to use. If you want those, go use them. This isn't a guide for settings or the pros/cons of whatever random bullshit. This guide wants to spend as little time as possible detailing that autism and just let retards experience making something that actually functions.
If you do decide to use different settings, the only relevant portion of this guide will be related to dataset and tagging - both of which are pretty simplified.
Double click run.bat and get into the UI. File>load toml, select that .toml you just downloaded.
There's a base model field and a button next to it to bring up file explorer. Point it toward ponyDiffusionV6XL_v6.safetensors (or whatever other XL model you want to train your lora on).
Scroll down to and expand saving args. Set whatever directory you want easyscripts to output the lora into and type whatever name you want the lora to have into the "output name" field with whatever convention you want.
Scroll down to logging args. Change the output directory to wherever you want it to output the training log to. Or just untick the enable button, idk, I'm not your dad.
After this do yourself a favor and go to file and save the toml. You'll appreciate it later when you don't have to reinput all that crap.
Go to the subset args tab from the top.
Change the input image directory to whatever directory you have the dataset in. Also take note of the "keep tokens" field, it'll come up in the tagging section.
THE POINT FIVE
This is the part where it gets a little more complicated but is entirely related to your dataset.
First, it's time for MATH. The goal is to hit around 3000-3300~ steps total, but you also don't really want to smash the dataset with too many repeats. This is all a bit subjective, but you probably don't want to go above 4 repeats if you don't have to, and you should probably stick to around 8-12 epochs. Here's a simple equation.
i × r × e = s
i is the total images in your dataset, r is the number of repeats, e is the epoch count and s is your total step count. So if you have 120 images, use 3 repeats and train for 9 epochs, you end up with 120 × 3 × 9 = 3240 steps.
Again, this stuff is subjective and the guidelines aren't really set in stone. 3k~ steps with the settings in this toml works pretty well for the most part. If your dataset is small enough that you have to choose between >4 repeats or >12 epochs, it's again kind of subjective, but I'd usually opt for a higher repeat count over more epochs.
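The math above can be sketched in a few lines of Python. The 3000-3300 step band and the repeat/epoch ranges come straight from this guide; the helper names (total_steps, suggest_schedule) are mine for illustration, not anything in easyscripts:

```python
# i (images) x r (repeats) x e (epochs) = s (steps), per the equation above.

def total_steps(images: int, repeats: int, epochs: int) -> int:
    return images * repeats * epochs

def suggest_schedule(images: int, target=(3000, 3300)):
    """Brute-force repeat/epoch combos that land in the target step band."""
    lo, hi = target
    candidates = []
    for repeats in range(1, 5):          # guide: stay at <=4 repeats if possible
        for epochs in range(8, 13):      # guide: around 8-12 epochs
            s = total_steps(images, repeats, epochs)
            if lo <= s <= hi:
                candidates.append((repeats, epochs, s))
    # Prefer a higher repeat count over more epochs, per the guide.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates

print(total_steps(120, 3, 9))    # 3240, the guide's example
print(suggest_schedule(120))     # [(3, 9, 3240)]
```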
So once you figure out the math, set the repeats to however many you want, then go back to the main args tab. Up near the top you'll find a "Max Training Time" option with dropdowns for epochs and a number. Change that number to however many epochs you decided you want.
DATASET
This honestly isn't complicated. Get Grabber or whatever. I'm not going to go into detail on how to use it; find some other shitty guide for that. If you specifically want to scrape twitter, go get WFDownloader (it looks sketchy as fuck but it's clean and it works). Just scrape your images and get them all into one directory.
The important part here is pruning the dataset. There's a lot of autism that can go into this, but I'm going to keep it extremely simple. You want "high quality" images at a good enough resolution, specifically without things like excess noise, JPG artifacts, or random chromatic aberration, while still reflecting the artist's style that you want to train. Ideally you can get at least 75 such images, and while you can technically make do with less, more is better. You don't need to bother giving a shit about cropping or resizing anything; buckets will handle that for you. The only thing to worry about is excess runes/words/speech bubbles/etc. If it's something that can be fixed in 30 seconds in an image editor, just go and do it. If it requires more effort/skill than that, then just leave it unless you're actually determined. As long as it's properly tagged it won't be a big deal, but you want it to be as minor a part of the dataset as it can possibly be.
After you've made a lora and run it through its paces, you can revisit this step and refine further. You're probably going to have a better time starting with a larger dataset and cutting it down for later revisions than starting with a smaller dataset and adding more into it.
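One pruning pass that's actually safe to automate is dropping exact duplicate files, which scrapes often pick up. This is an illustrative stdlib-only sketch (the function name and the throwaway demo directory are mine, not a Grabber or easyscripts feature); the real quality pruning still has to be done by eye:

```python
import hashlib
import tempfile
from pathlib import Path

def find_exact_dupes(dataset_dir):
    """Group files by SHA-256 so byte-identical duplicates can be spotted."""
    seen = {}
    dupes = []
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest in seen:
                dupes.append((path, seen[digest]))   # (duplicate, original kept)
            else:
                seen[digest] = path
    return dupes

# Tiny self-contained demo with a throwaway directory:
with tempfile.TemporaryDirectory() as d:
    Path(d, "a.png").write_bytes(b"image-1")
    Path(d, "b.png").write_bytes(b"image-1")   # exact copy of a.png
    Path(d, "c.png").write_bytes(b"image-2")
    for dupe, kept in find_exact_dupes(d):
        print(f"{dupe.name} duplicates {kept.name}")   # b.png duplicates a.png
```

Point it at your real dataset directory and delete (or eyeball) whatever it flags.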
TAGGING
This is as easy or as complicated as you want it to be.
Get the tagger extension for a1111 here
https://github.com/picobyte/stable-diffusion-webui-wd14-tagger
Couldn't tell you if it works in forge or not. There should also be a standalone build of it somewhere but I've got no real experience with it there so I'm not covering it.
After that's installed (and you restart shit to get it going, if you have to), go to the tab in a1111 and select the "batch from directory" tab. Change the input directory and the output directory to wherever you have your image set. Don't click the interrogate button yet, though. Go down to the interrogator dropdown and select WD14 SwinV2 v1. There are other options if you want, but this is a pretty decent default. Next, you have two options: you can use the "score_9" tag or not. It's up to you. If you do, add "score_9" in the additional tags field. And if you do that, remember how I said to make note of the "keep tokens" option in the easyscripts UI? Go and set that to 1. This will keep score_9 as the first tag used in training across all images. You can also try to add the artist to additional tags, but unless the artist itself is picked up by the model I wouldn't bother (if you do, increase keep tokens to 2).
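If you forgot the additional tags field (or want to sanity-check the keep tokens setup), here's a rough sketch that forces "score_9" to be the first tag in every caption file. It assumes the tagger wrote one comma-separated .txt per image, which is its usual batch output; the helper itself is mine, not part of the extension:

```python
import tempfile
from pathlib import Path

def prepend_tag(caption_dir, tag="score_9"):
    """Put `tag` first in every comma-separated caption file, without duplicating it."""
    for txt in Path(caption_dir).glob("*.txt"):
        tags = [t.strip() for t in txt.read_text().split(",") if t.strip()]
        if tag in tags:
            tags.remove(tag)          # already there? move it to the front instead
        txt.write_text(", ".join([tag] + tags))

# Throwaway demo:
with tempfile.TemporaryDirectory() as d:
    cap = Path(d, "001.txt")
    cap.write_text("1girl, solo, smile")
    prepend_tag(d)
    print(cap.read_text())   # score_9, 1girl, solo, smile
```

With keep tokens set to 1, that first tag stays put when the rest get shuffled during training.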
Now for the kind of weird part. You have two sliders: weight threshold and "min tag fraction in batch and interrogations." Weight threshold is basically a minimum confidence score for which tags get applied. You can leave it at the default 0.35 if you want, but increase it to be more stringent or decrease it to be more lax. Simple. "Min tag fraction in batch and interrogations," however, is just a really dumb fucking way of saying "the minimum share of the dataset a tag needs to show up in before it gets kept." So let's say your dataset has a single Hatsune Miku in it. If you have that slider at anything other than 0, even if the tagger recognizes her as Hatsune Miku, it won't tag her as that. Again, you can leave it at the default if you want, but doing so will basically ignore all single-use tags in the dataset, which isn't always ideal. I personally set it to 0, but it's probably going to depend on preference and dataset.
After you have all that shit sorted, I'm going to ignore the rest of the page(it's mostly self explanatory, anyway) and tell you to just mash that interrogate button.
If you have autism you can further go and curate/add individual tags by hand but that's on your own time.
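A cheap way to do that curation pass is to count tag frequencies across all the caption files first, so you can see what the tagger actually produced. Same comma-separated .txt assumption as above, and the helper name is mine, not part of the tagger:

```python
import tempfile
from collections import Counter
from pathlib import Path

def tag_counts(caption_dir):
    """Tally every tag across all caption .txt files in a directory."""
    counts = Counter()
    for txt in sorted(Path(caption_dir).glob("*.txt")):
        counts.update(t.strip() for t in txt.read_text().split(",") if t.strip())
    return counts

# Throwaway demo. Tags with a count of 1 are exactly what a nonzero
# "min tag fraction" slider would have thrown away.
with tempfile.TemporaryDirectory() as d:
    Path(d, "a.txt").write_text("score_9, 1girl, hatsune miku")
    Path(d, "b.txt").write_text("score_9, 1girl, holding sword")
    counts = tag_counts(d)
    print(counts.most_common(2))                       # [('score_9', 2), ('1girl', 2)]
    print([t for t, n in counts.items() if n == 1])    # ['hatsune miku', 'holding sword']
```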
Now that your shit is tagged, go back to easyscripts, hit that ADD button on the bottom right, and then you can mash that START TRAINING button. At the end it "should" work. If it doesn't, just google whatever error you're getting and you'll probably get there eventually.
ADDITIONAL INFO
So, for those of you on 12GB of VRAM, these settings are actually slightly bloated and will kick you into shared VRAM (which is SLOW). However, you do have an option that might trigger some autists.
On the main args tab, there's a little RESOLUTION box. At 1024x1024, a 12GB 3060 will probably take around 7~ hours to train 3k steps. However, there's a little secret: you don't actually need to train XL at 1024x1024. Some datasets can even perform slightly better at lower resolutions for random reasons/conjecture that I'm not going to get into, but that isn't the point. Dropping the resolution from 1024x1024 to 832x832 will reduce training time by more than half (2.5-3~ hours for 3k~ steps on a 12GB 3060). It'll still kick you into shared VRAM, but not quite as badly, so it's a significant speed increase for, as far as I've found, negligible quality loss (PS: the grids at the start were all loras trained at 832x832 or 768x768). This also speeds up training on bigger VRAM cards, too. Those with bigger cards can also increase batch size, but you'll probably need to adjust settings yourself to figure out what's optimal.
There's also gradient checkpointing/gradient accumulation. I've personally never bothered with them and don't know much about them, but they'll lower VRAM consumption and increase training time as a result; no idea if or how they degrade the actual lora, though. Again, I haven't bothered; I tend to just set shit to run overnight, and easyscripts has a convenient ability to queue training scripts, letting a 12GB 3060 spit out two loras in 6~ hours or less.
Light addendum - gradient checkpointing by itself seems(?) to be benign, with differences that can be chalked up to non-determinism in training.
You can still drop the resolution to 832x832 and increase training speed with gradient checkpointing on. A full 1024x1024 train should take around 2-3(?) hours on a 12GB 3060, while an 832x832 train will take around 70~ minutes or so.
I'll also try to adjust a few things to figure out decent batch sizes for additional speed gains.