Before you begin, get the MonkeyPatch extension from here: https://github.com/aria1th/Hypernetwork-MonkeyPatch-Extension. Additionally, install the WD1.4 tagger extension found in the webui's extensions tab. Some swear by the weight decay function, but I have no experience with it.

Once you have those two extensions, you can start gathering your dataset if you haven't already. You'll want at least eighty quality images of the style you want to train on. You don't want duplicates, large swathes of text/dialogue/speech bubbles, or extreme differences in style (i.e. style development over the course of years). With MonkeyPatch you don't have to crop your images, though exceedingly large resolutions and extreme aspect ratios like panorama or dakimakura images aren't recommended. If you want, edit your images with a decent photo editor like Photoshop or GIMP before batch-tagging them with the WD tagger tab. Look over the generated text files and remove any incorrect or superfluous tags; a quick script can help flag problem images, as sketched below.
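
If you'd rather not eyeball every file, a short Python sketch like this can flag oversized images, extreme aspect ratios, and missing tag files. It assumes Pillow is installed and that the tagger wrote one .txt caption next to each image; the folder path and cutoffs are made up for illustration.

```python
# Sketch: flag images that may cause problems for uncropped training.
# Assumes Pillow is installed and the WD tagger wrote one .txt per image.
from pathlib import Path
from PIL import Image

DATASET = Path("training/my_style")  # hypothetical dataset folder
MAX_SIDE = 3072                      # arbitrary cutoff for "exceedingly large"
MAX_ASPECT = 2.5                     # panorama/dakimakura territory beyond this

for img_path in sorted(DATASET.glob("*")):
    if img_path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
        continue
    with Image.open(img_path) as im:
        w, h = im.size
    aspect = max(w, h) / min(w, h)
    if max(w, h) > MAX_SIDE or aspect > MAX_ASPECT:
        print(f"check {img_path.name}: {w}x{h} (aspect {aspect:.2f})")
    if not img_path.with_suffix(".txt").exists():
        print(f"missing tags for {img_path.name}")
```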

Create the hypernetwork using the training tab. Mish and Softsign are both good activation functions for anime styles; of the two, Softsign learns more slowly but is less likely to fry. A layer structure of 1, 1.5, 1.5, 1 will do for style learning. Leave the other options unchecked.
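
For intuition, the layer structure is a list of width multipliers: 1, 1.5, 1.5, 1 means the hidden layers are 1.5x the width of the input. Roughly, and this is only a sketch of the idea rather than the webui's actual code, each hypernetwork module expands to something like:

```python
# Sketch: what a "1, 1.5, 1.5, 1" layer structure roughly expands to.
# The webui's real implementation differs in detail (it builds a pair of
# these per attention dimension, with optional norms and dropout).
import torch
import torch.nn as nn

def build_module(dim: int = 768, mult=(1, 1.5, 1.5, 1), act=nn.Softsign):
    widths = [int(dim * m) for m in mult]      # e.g. 768 -> 1152 -> 1152 -> 768
    layers = []
    for i in range(len(widths) - 1):
        layers.append(nn.Linear(widths[i], widths[i + 1]))
        if i < len(widths) - 2:                # activation between hidden layers
            layers.append(act())
    return nn.Sequential(*layers)

net = build_module()
x = torch.randn(1, 77, 768)                    # dummy text-conditioning tensor
out = x + net(x)                               # output is applied as a residual
print(out.shape)                               # torch.Size([1, 77, 768])
```

Larger multipliers mean more capacity, but also more VRAM and more parameters to fry.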

Now set up for training. Close the webui and its cmd/shell window, open webui-user.bat, and remove launch arguments like --xformers and --medvram; you can add them back once you're done training. Make sure hypernetwork strength is set to 1 and Clip skip is at either 1 or 2. Unselect your VAE and restart the webui (or, if you're paranoid like me, remove it from the VAE folder and restart your computer). Enter a preview prompt somewhat related to the style, or to what you want it to achieve. Make sure you're generating at 512x512 and that hires. fix is off. Sampler and steps don't really matter; keep them reasonable, like Euler or DPM++ 2M Karras at 20 steps.
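
For reference, a stripped-down webui-user.bat for a training session might look like this (the stock file with the arguments line left empty):

```bat
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
rem leave empty while training; restore --xformers --medvram etc. afterwards
set COMMANDLINE_ARGS=

call webui.bat
```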

Train the hypernetwork using the train gamma tab. You can leave the learning rate at the default, but be certain to check both advanced-options boxes. In the first added row of options, set weight decay to 0.1. In the second row of advanced options that appears, check the Cosine Scheduler box and set steps for cycle to 250. The other advanced options can remain unchanged.
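
If you want a mental model of what those two settings do, they map roughly onto a decoupled-weight-decay optimizer plus a cosine schedule that restarts every 250 steps. A hedged PyTorch sketch, purely illustrative since the extension's actual optimizer setup may differ:

```python
# Sketch: rough PyTorch equivalent of "weight decay 0.1" plus a cosine
# scheduler restarting every 250 steps. Illustrative only.
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

params = [torch.nn.Parameter(torch.randn(8, 8))]   # stand-in for HN weights
opt = torch.optim.AdamW(params, lr=1e-5, weight_decay=0.1)
sched = CosineAnnealingWarmRestarts(opt, T_0=250)  # one cosine cycle = 250 steps

for step in range(1000):
    # ... compute loss and call loss.backward() here ...
    opt.step()
    opt.zero_grad()
    sched.step()
```

The restarts are why the learning rate (and the preview quality) will visibly pulse every 250 steps instead of decaying in a straight line.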

Leave batch size at 1; with uncropped images you can't use larger batches. Gradient accumulation increases training-step efficiency at the cost of detail and time. Leave width and height at 512. Set the max steps to 20k. Check the read-parameters box so that previews are generated with the prompt you entered in the txt2img tab. Check the shuffle tags box and set the tag dropout slider to 0.2. Use the deterministic sampling method. Click Train Hypernetwork.
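
Shuffle tags and tag dropout are easy to picture in code: each training step, the caption's comma-separated tags get reshuffled and each tag has a chance of being dropped, which keeps the network from keying the style to any fixed tag order. A rough sketch of the idea:

```python
# Sketch: what "shuffle tags" + "tag dropout 0.2" do to a caption each step.
import random

def augment_caption(caption: str, dropout: float = 0.2) -> str:
    tags = [t.strip() for t in caption.split(",")]
    random.shuffle(tags)                                   # randomize tag order
    kept = [t for t in tags if random.random() > dropout]  # drop ~20% of tags
    return ", ".join(kept or tags)                         # never return empty

print(augment_caption("1girl, solo, watercolor, looking at viewer"))
```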

Let it train while you go do something else, keeping an eye on the preview images. Go read Heathen's discussion of hypernetwork style training on AUTOMATIC1111's GitHub: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/2670. The recommended settings there are out of date by now, but it's still the best resource we have for simple explanations of what the hypernetwork options you just entered actually do.
