Stable Diffusion spoonfeed installation guide
Last edit: 17/07/2024
READ THIS:
This guide is for NVIDIA GPUs on Windows; AMD needs a different guide.
Nvidia is the primary supported manufacturer. AMD has support, but is harder to set up. If you are on an AMD GPU, check here instead: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs
AMD can use reForge too, but it's outside of the scope of what I know.
Updates:
EDIT: 10/11/2024
Changed PonyXL-related things towards NoobAI. Updated stuff everywhere to be less wordy.
What hardware do I need?
Specs
The most important variables are as follows:
1st, VRAM size: The more VRAM, the bigger your images can be, or the more you can generate at once. Once VRAM is saturated, your model will be offloaded onto RAM, which is far, far slower.
For SDXL: 6GB BARE minimum, 8GB 'ok', 12GB is comfortable. Anything above is luxury for most purposes at the moment. I've seen people manage with as low as 3GB, but you will have to use tricks and can expect long generation times. Various optimizations in recent months have helped to gradually push the VRAM requirements down, which is cool.
2nd, card speed: Your actual card's "strength". Somewhat self-explanatory, as regular gaming benchmarks more or less apply. If VRAM isn't overloaded, this will be your primary decider for generation speed.
3rd, RAM size: The speed of your RAM isn't as relevant as the available size. If your model offloads to RAM, you better have enough to load it. To reiterate, once your VRAM is full, even if only small amounts offload to RAM, your generation speed WILL take a massive hit.
More RAM is still useful for some non-essential things you might want to play with, like refiners.
I highly, highly recommend the Nvidia 3060 12GB as a budget or midrange card. With 12GB of VRAM it consistently beats cards at twice its price point. The closest competitor I would consider is the Nvidia 4070 12GB, which should perform noticeably faster. The 5000 series of cards isn't out yet at the time of writing.
Recommendations from the thread:
Additionally, you will need a decent chunk of free hard drive space: I recommend at least 20-40GB for comfortable genning. This can get very large very quickly.
What Frontend?
Nowadays you get several choices:
| WebUI (Automatic1111) | WebUI (reForge) | ComfyUI |
|---|---|---|
| + Widely used and supported | + A1111 with ComfyUI speeds | + Fast |
| - Completely invalidated by reForge | + Currently just a better A1111 WebUI | + Tons of individual support, gets cutting-edge features first |
| - Slowest generation times | +- Official support ended, maintained by the community! | +- Harder to grasp, but actually teaches you how things function under the hood |
| | | - Hardest to set up and, more importantly, understand |
| StabilityMatrix | Invoke | Online Generators |
|---|---|---|
| + Basically a frontend to install other frontends | + Nicest-looking UI | +- Not the focus of this guide. Keep in mind many charge as much in ~3 months as the entire price point of a 3060. Most are straight-up ripoffs. CivitAI somewhat has merit, as the on-site generator lets you use most models on the site, and the lora training features remain popular, even for local genners. |
| - Highly discouraged, paywalls access to free software as of late | +- Aimed at professionals. Great UI, lags in features, but has been catching up. | NAI (NovelAI) is one service that's genuinely good. Throughout AI's young history, NAI has had moments where it straight up beat local image generation. As of 10/11/2024, NoobAI offers a local alternative of similar strength. Paid service, but fair pricing. |
TL;DR
At the time of writing (10/11/2024), Forge, specifically reForge, remains the best general option, combining the ease of use of the WebUI with speed and performance comparable to ComfyUI. Forge supports Flux, reForge does not, so having a separate install for that makes sense. There remains zero reason to use A1111 over reForge.
"Power users" may consider ComfyUI however as I have little experience with it, it's out of the scope of this guide. Another Anon wrote a quick primer for it here, specifically for people switching to it from the WebUI.
'WebUI'
The term "WebUI" refers to the entire family of A1111 forks, so it includes both it and forge/reForge.
Installing reForge
There are 2 ways: through Git, or raw from the ZIP. I do not recommend a raw install through the ZIP.
Git is HIGHLY recommended.
Git
The official install guide can be found here if you get stuck. Specifically we want to be on the dev branch instead of main, as it has important features for the current meta models.
1: Install Git
2: Make sure Python 3.10.6 is installed. The version is important. Make sure to click 'add to path' when installing.
3: Navigate to the folder where you want your install to be. For this example it would be D:\AI
4: Click an empty space in your folder's address bar, type cmd, and press Enter. This will open a command line window in your chosen folder.
5: Type in git clone -b dev https://github.com/Panchovix/stable-diffusion-webui-reForge.git and press Enter (to paste something into cmd you can right-click on Windows 10). It should pull the repository into a new folder.
6: To update, do the cmd trick in the reForge folder, then run git pull.
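Put together, it looks something like this in cmd (the D:\AI path is just the example folder from above; adjust to wherever you want the install):

```
rem open cmd in the folder that should contain the install
cd /d D:\AI

rem pull the dev branch of reForge into a new subfolder
git clone -b dev https://github.com/Panchovix/stable-diffusion-webui-reForge.git

rem later, to update: open cmd inside the reForge folder and pull
cd /d D:\AI\stable-diffusion-webui-reForge
git pull
```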
After installing:
Optional: Edit webui-user.bat and change the respective line to:
set COMMANDLINE_ARGS= --cuda-malloc --pin-shared-memory --cuda-stream
This adds some performance improvements. If your WebUI starts crashing, consider removing them.
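For reference, the edited webui-user.bat would look roughly like this. Only the COMMANDLINE_ARGS line changes; the rest is the stock file and may differ slightly between versions:

```
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
rem performance flags from above; remove them if the WebUI starts crashing
set COMMANDLINE_ARGS= --cuda-malloc --pin-shared-memory --cuda-stream

call webui.bat
```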
Run webui-user.bat. You may see other executables, but you can ignore those.
This will download several dependencies and will take quite a while. Afterwards it will create the venv folder, which is the virtual environment in which the UI runs. In the end it should open a window in your browser.
How to switch your current A1111/Forge install to reForge
Note: I highly recommend just doing a fresh install of reForge to a new folder instead of switching branches. Switching creates more trouble than it's worth; do a raw install via git to a new folder. If you just need to switch from main to the dev_upstream branch, that's fine to do.
In the case of errors
Just google them. Plenty of people run into the exact same issues as you.
Some common issues:
Cannot find python
The WebUI is picky about the Python version: you need Python 3.10.6. If it cannot find Python, then you likely forgot to click 'add to path' when installing.
Alternatively go to the start menu, type "app execution aliases", open, and remove all entries mentioning python. Then try running the webui-user.bat again.
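A quick sanity check you can run in cmd to see which Python ends up on your PATH:

```
rem should report Python 3.10.6; if it errors out or reports another version, fix your PATH first
python --version
```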
Some common ghetto fixes:
Deleting the venv folder:
Lets it regenerate the next time you run the WebUI. Occasionally this can fix a mismatch in dependencies, especially if you are using an older install.
Updating packages:
You can manually update packages inside your venv if the WebUI prompts you for something related to it. Googling the specific error in the log usually turns up a direct how-to.
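As a rough sketch of how a manual package update usually goes on Windows, assuming the example install path from earlier and that the error log names a specific package (the package name below is a placeholder):

```
rem step into the install and activate its virtual environment
cd /d D:\AI\stable-diffusion-webui-reForge
venv\Scripts\activate

rem upgrade whichever package the error log complained about (placeholder name)
pip install --upgrade some-package-from-the-error
```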
Running SDXL (NoobAI) (Recommended!)
1: Download ZoinksNoob.safetensors and place it into models/Stable-diffusion.
2: Download sdxl_vae.safetensors and place it into models/VAE.
The main model is your SDXL checkpoint. A VAE is required for a model to work properly; if you don't have one, your images will look garbled.
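Once both files are in place, the relevant part of your install should look roughly like this (assuming the default reForge folder layout; exact filenames depend on what the downloads are called):

```
stable-diffusion-webui-reForge\
├─ models\
│  ├─ Stable-diffusion\
│  │  └─ ZoinksNoob.safetensors
│  └─ VAE\
│     └─ sdxl_vae.safetensors
```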
With a minimal prompt your generation will look something like this; make sure to use a 1024x1024 base resolution:
For further NoobAI prompt guidance and some template prompts you can look at my NoobAI guide
For PonyXL based models you can look at my older guide: PDXL spoonfeed guide
Noob is newer and the current meta; Pony has broad support.
Additional settings:
1: Go to Settings and search for quick. This brings up the Quicksettings list, which puts things at the top of the screen.
sd_model_checkpoint, sd_vae, CLIP_stop_at_last_layers, tac_tagFile
is what I use. The tagFile entry is available once you install the Autocomplete extension, which you should; you can find it in the useful extensions section.
2: Go to Settings, search for emphasis, and set it to 'No norm'. This fixes a bug with emphasis on SDXL.
Running SD1.5 (Easyfluff/Indigofurrymix) (For weaker hardware)
SD1.5 is an older SD version with lower GPU requirements.
Download EasyFluff v10-PreRelease or IndigoFurryMix SE02-vpred and place it in models/Stable-diffusion.
IMPORTANT: You also need the corresponding .yaml config file in the same folder (models/Stable-diffusion). For IndigoFurryMix you can find the config directly below the main download button. For EasyFluff it's under the same Hugging Face link.
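As a sketch, the checkpoint and its config end up side by side. To my knowledge the WebUI only picks the .yaml up if it shares the checkpoint's filename, so rename it to match if necessary (filenames below are illustrative):

```
models\Stable-diffusion\
├─ EasyFluffV10-PreRelease.safetensors
├─ EasyFluffV10-PreRelease.yaml      <- same base name as the checkpoint
├─ indigoFurryMix-se02vpred.safetensors
└─ indigoFurryMix-se02vpred.yaml
```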
Download: vae-ft-mse-840000-ema-pruned and place in models/vae.
Once in the webUI, choose your model, choose the VAE, set clip skip to 1.
Next, scroll down through the many tabs below your prompt window until you see "RescaleCFG for reForge" and enable it. 0.7 is good for EasyFluff, 0.5 is good for Indigo.
EasyFluff prompting:
EasyFluff has a known tendency to crank up yellow colors. (sepia:1.2) in the negatives helps, though with a simple prompt like this one it obviously still tints the image pretty hard. If you encounter a tendency towards yellow tones, try adding sepia or warm colors to the negatives.
Easyfluff Embeddings:
EasyFluff users like to chuck in the Furtastic negative embeds to help the model with some of its shortcomings. If you ever see 'ubbp', 'bwu', 'dfc', or 'updn' in a prompt, then that's these: Download. They are dropped into the embeddings folder in the root of your reForge install. You add them by putting them into the negative prompt, separated by commas like usual.
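For illustration, a negative prompt using them might look something like this; the embed names are the ones from the download above, the rest of the tags are just placeholders:

```
ubbp, bwu, dfc, updn, (sepia:1.2), watermark, text
```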
Image by anon
Optional things: Read once you have genned a few images properly
Useful Extensions
Extensions can be found in the extensions tab of the WebUI, you can search for them, install them, and restart the WebUI directly from here.
If an extension is not found here, you can still install it via a Github link in the "Install from URL" tab.
Vital:
WebUi Autocomplete Github Link
Lets you enter tags in the prompt box and it will autocomplete them for you based on the boorus.
Once installed, navigate to Settings/Tag Autocomplete and select your tag list.
You can find an updated tag list for NoobAI here; place it in extensions/a1111-sd-webui-tagcomplete/tags.
Protip: Navigate to Settings/User Interface/User Interface and add "tac_tagFile" to your "Quicksettings list". This allows you to quickly change the tag file on the fly, useful if you also gen with other booru tags frequently.
Infinite Image Browser Github Link
Does not seem to be as well known as some other extensions, but this is one of my personal favorites. It offers a scrollable UI tab for all your generations inside the WebUI, with functionality to directly send them to txt2img, inpainting, etc. In general it greatly accelerates how fast I find stuff and copy/paste prompt portions.
You can also add folders to it for organization. I have a dedicated folder for prompt presets and templates, and one for prompts of other people.
Recommended:
Forge Couple Github Link
Forge's solution to regional prompting. I recommend using the 'advanced' selection option and making a custom separator; in my case I named it ~sep~. Structure your prompt as usual; for a scene with 2 characters you would use 2 separators to divide your prompt into 3 regions: background/main scene, char1, char2.
Afterwards go to the Forge Couple tab and manually drag and resize the boxes for the regions. Regional prompting is finicky and styling will look different. You may need to fiddle with the weights (the w value) a little.
Contrary to popular belief, you do not need regional prompting to do duo scenes. What regional prompting primarily helps with is avoiding prompt bleedover between two original characters. SD has no way of differentiating which character is which; by excluding zones, you can ensure each prompt gets associated with the right character.
Another common mistake is misunderstanding how regional prompting effectively functions in the current meta models. You do not want to prompt your 2 character regions as if they are now solo regions, e.g. do not prompt (solo, female, anthro, squirrel). The primary composition gets determined by the first region; the character regions should only contain prompts that are specifically not meant for the other character.
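As a rough illustration of that structure, using the custom ~sep~ separator from above (all tags here are made-up examples): the first region carries the overall scene, the other two only the character-specific tags.

```
duo, forest, detailed background, day ~sep~
anthro, male, wolf, grey body, red scarf ~sep~
human, female, brown hair, sundress
```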
The alternative to regional prompting is to simply inpaint the specific characters. This is my preferred solution these days, especially because characters are rarely divided cleanly. If a human's feet fall into the region of the anthro, they tend to get pawpads, stuff like that.
Wildcards Github Link
The wildcard extension allows you to randomly select prompts from a list of tokens. For example, it lets you prompt __location__ and it selects one entry from a locations.txt file you have specified. This is popular for all kinds of things: species, locations, artist styles.
Once installed, open your extension folder at extensions/stable-diffusion-webui-wildcards/wildcards and create your wildcard files there. Wildcards are called by wrapping the txt filename in double underscores. Each line is one selection, and a line may contain multiple prompts. You may also use the same wildcard twice in a prompt. Below is a simple example of the txt formatting.
locations.txt
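The entries below are made-up examples; each line is one possible pick, and a single line may contain several tags:

```
forest
beach, sunset
inside, bedroom, detailed background
```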
Prompt: __location__
Lora Block Weights Github Link
Lets you schedule loras. More useful than it sounds: if you have a character lora that has a heavy influence on style, you could for example disable it after half the steps. Swapping loras on and off adds a bit of generation time, but it's worth it in some cases.
The syntax isn't very clear on the extension's page; most of what you would do looks like this:
<lora:LucarioSDXL:1:stop=15>
<lora:PussyUncensor:1:start=15>
As per taste:
Lobe UI Github Link
A subjectively nice UI. I personally don't use it but I can imagine some might like this. Sleek black, kinda similar to SD.Next and similar frontends. A bit too much pointless clicking on tiny buttons and unnecessary bloat for me. Also the generate button isn't orange anymore. Unusable. In all honesty, probably not bad.
I heard this extension is especially useful if you use the WebUI on mobile. Supposedly a much much bigger upgrade there.
Photopea Integration Github Link
Embeds the Photopea image editor into the webUI. It's well integrated and primarily useful for inpaint adjustments. Has functions for sending over masks, generally is faster than using Photoshop or GIMP. Personally I'm just used to GIMP and I find the "UI inside of the UI" a little clunky.
Lora
To keep it short, a Lora is something you can add on top of your given model to teach it new concepts or a specific style. Basically targeted training. Examples include a lora for a character, an artist style, or a concept like 'pants on head'. You may also encounter terms like 'LyCORIS' or 'DoRA', which are functionally used in the same way. These models are dropped into models/lora, even for the other types mentioned.
Some loras are trained on a specific checkpoint; for example, a lora might be trained specifically on PonyXL, or be more generic and run under SDXL. If something is trained on SDXL it will generally perform decently in Pony as well, but something trained on Pony will often not work well in another model like SeaArt.
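Once a lora file sits in models/lora it shows up in the WebUI's Lora tab, and clicking it inserts a tag into your prompt. The filename, tags, and 0.8 weight below are just an example; lower the weight if the lora's effect is too strong:

```
<lora:SomeCharacterLora:0.8>, anthro, wolf, solo, sitting, forest
```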
Where to download Loras:
Civit: You can filter for your given model in the top right. There's a dedicated filter for Illustrious (NoobAI) or Pony-based models these days. Keep in mind that 'most downloaded' does not imply best or well trained.
Trashcollects: The /sgd/ kitchen sink rentry for various character and artist loras. Ctrl-F is your friend.
PonyNotes: Another mostly western and anime focused resource that has a ton of PonyXL loras.
Versions:
SD1.5 Lora will not work with SDXL and vice versa. Your WebUI will usually only list loras in the lora tab based on what type of model you have selected, but they are not always tagged properly so pay attention.
Embeddings
Embeddings or Textual Inversions are a smaller-scale solution to targeted training compared to a lora. Whereas a lora changes the 'layers' of the onion that is your model, an embedding instead allows your model to navigate those layers better. These are popular to some degree depending on the model, but much less vital in the SDXL era.
Embeddings, just like lora, are designed to work on a specific model. Don't use SD1.5 embeddings with SDXL either.
SD1.5 Embeddings:
FurtasticNegativeEmbeds. These are popular for EasyFluff specifically. Easyfluff struggles a bit with some anatomy and these embeds are a collection of undesirables for your negative prompt, like multiple extra bodyparts.
I generally recommend these.
SDXL Embeddings:
NONE! SDXL embeddings are the ultimate placebo. These are only popular with people who put (epic:1.2) in their positive, and whose negative is a 10 page essay. A lot of them actively make your model dumber as well.
The only crowd for these is usually the average civit.ai user, who needs to use crutches since they have no access to easy upscaling or inpainting.
Upscalers
AI upscaling in Stable Diffusion is a multi-step process. By default, Stable Diffusion models are trained at a specific resolution; if you try to generate at much higher resolutions you end up with artifacts and 'hallucinations', such as extra limbs or characters, because the model wants to fill that extra space.
Here's a convenient workflow for upscaling images.
First of all, download "4x_NMKD-Siax_200k" here
Download the model, then place it in models/ESRGAN.
If ESRGAN does not exist, create the folder. Afterwards restart the WebUI so it shows up.
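If you need to create the folder, you can do it from the root of your reForge install (path taken from the earlier example):

```
cd /d D:\AI\stable-diffusion-webui-reForge
mkdir models\ESRGAN
```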
Next set these values in hires.fix. Do not actually toggle hires.fix on.
The general strategy for genning/upscaling goes something like this:
1: Generate images at regular resolutions and aspect ratios according to the model (SDXL aspect ratios).
2: When you find a good image you would like to upscale, you press the ✨button under your generated image, it will then upscale your image according to the currently selected hires.fix settings. So make sure to do those beforehand (you don't actually toggle hires.fix on for this).
Hires.fix is usually a 3-step process. First it will generate your image, then it will run a simple upscaler, in this case Siax, then it will run img2img using your selected AI model. We don't want hires.fix always on, since it adds a significant amount of generation time. By only upscaling images you like with the ✨button, you can filter through good candidates much faster.
The old method is to instead toggle hires.fix when you find a candidate and recycle ♻ the seed. With the ✨button we skip the re-generation of the image however, which is faster.
Note: Don't accidentally choose the default 'Latent' as your upscaler. It sucks!