A Stable Diffusion Tutorial by Liunkaya
Basically Stable DifFURsion...
Okay, that pun got me a timeout. But hello! 👋 Yeah, this is me. And welcome to my comprehensive tutorial on how to set up Stable Diffusion locally on your computer. I'll refer to it as SD from here on. Rawr!
If you already happen to know what SD is, good. If you don't, this tutorial may not be where you wanted to go and you might be better off just visiting my Twitter instead. *ahem* So, let's get going!
First things first. What are we trying to do here?
You'll get:
- A Stable Diffusion instance to generate furry art. Woohooo! (or any image for that matter)
- Everything will be running entirely on your computer, independent of any online services
- Everything will be free since we'll be using open source and publicly available projects
If you're familiar with Stable Diffusion and only interested in my experiences, you might want to jump to:
- The model I'm using: Yiffymix
- My frequently used tags and examples
- My negative prompt
- My recommended extensions
🚩 Disclaimer on AI art
Before we get started, I would like to say a few words about AI art from my point of view. There is a lot of back-and-forth discussion out there about whether it should be considered a bad thing or not.
My hot take is generally this:
The ability to generate AI art empowers people to visualize their ideas and dreams in a very direct and independent way, like sketching out something that just came to your mind. While this allows absolute art-idiots like me to show my random ideas to the world, I will also continue commissioning artists for regular pictures like I always have.
This doesn't need to become a this or that war.
Computer-generated pictures (you may call them art or not) are just another tool in the toolbox. Just another tech in the list of things that we can do now. It will not inherently "steal" clients from artists or kill their existence. If you're an artist and you feel threatened by AI art, which is absolutely fine because first and foremost these are your own feelings, you might still want to ask yourself why. What sets you apart? Is it the fact you're a wonderful person to talk to, brainstorm with over characters and poses and backgrounds? In my eyes, as with lots of other things, it is the positive experience that makes clients come back and buy art from a real person.
Bork bork. Now back to the tutorial!
🔨 Setting up Stable Diffusion
There are more tutorials about how to install SD out there under the sun than I can count.
So I will keep it short, while still describing the process for everybody who may not be that familiar with stuff. Feel free to check other tutorials on how to generally set up SD and have it running, if you run into problems.
"But I'll be generating lots of adorable dragon girl pictures after that, right?!"
Yeah yeah! Stick with me and you'll get them in no time. 😉
I will assume a couple things here, because these are the environment conditions I am familiar with:
- You are running a Windows machine like any normal person
- You've got a halfway recent NVIDIA graphics card (GTX 10 series or newer)
- You got some spare disk space > 20 GiB, because BOY, THAT BABY GETS FAT QUICK
While AMD Radeon cards are sort-of supported (afaik), I never tried setting it up and your mileage may vary. It should be possible though!
You can also apparently run SD entirely without a graphics card using the `--use-cpu all` flag. I haven't tried this myself yet though, and it will make the generation process take significantly longer.
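If you do want to try that, the flag goes onto the same `COMMANDLINE_ARGS` line of the `webui-user.bat` file we'll be editing in the settings section below, for example:

```
set COMMANDLINE_ARGS=--use-cpu all
```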
💻 Software you need to install
Running Stable Diffusion requires a few things to be installed on your computer before we can actually run the SD program:
- Python 3.10 - Scroll down to "Files" and pick the "Windows installer (64-bit)"
- Git - This is the tool that allows you to download and update the SD wrapper application
- TortoiseGit (optional) - Download this if you're afraid of black command prompts. It's not strictly needed though.
When installing Python, don't forget to tick the "Add Python to PATH" option, or it might not be picked up by the SD application properly!
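To quickly verify that both installed correctly, you can open any command prompt and ask for the versions. If each command prints a version number (something like `Python 3.10.11` and `git version 2.x`) instead of an error, you're good to go:

```
python --version
git --version
```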
Now, go and decide on a place where you want to put all the good stuff. I mean locally on your drive. You got a folder? Okay, then go there and we'll now "clone" the project that is managing everything. It's called "webui" and was made by AUTOMATIC1111. This application has developed into the de-facto standard to run SD. I may sometimes refer to it as the "wrapper" application as well.
- If you did install TortoiseGit, all you need to do is right-click anywhere in the folder and select TortoiseGit > Clone... from the menu. This should open a window. Enter the URL `https://github.com/AUTOMATIC1111/stable-diffusion-webui.git` and hit OK.
- If you did not install TortoiseGit, you will need to open a command prompt at the given folder and "clone" the webui application from there. The easiest way to do that is to Shift+Right-click and select Open command window here (or Open in Terminal if you're using Windows 11). Now enter `git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git` and hit return. Boom, just as easy, wasn't it? 😉
Either way, cloning the application (called a repository) should be pretty fast, since it only provides the framework to run SD. We haven't yet downloaded anything that SD needs to run. The management application we just downloaded will do all that on its own later.
⬇️ Getting updates
Just a quick note on updating Stable Diffusion ☝️😉
The way we just downloaded the webui allows us to "pull" any updates directly from the GitHub repository. You only need to either use the TortoiseGit > Pull... option (if you installed it) or enter `git pull` in a command prompt opened inside the stable-diffusion-webui folder. This should fetch any webui application updates and should - obviously - be done while SD is not running.
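For the command prompt route, the whole update boils down to two lines (assuming your prompt starts one folder above the webui folder):

```
cd stable-diffusion-webui
git pull
```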
⚙️ Global settings (before we start)
You might want to change some settings before we start SD for the first time.
Open the "stable-diffusion-webui" folder we just created and find the file "webui-user.bat".
Depending on your Windows settings you might not see file extensions and the file may just be called "webui-user".
Open this file with a regular text editor, for example Notepad++. It should look like this:
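In case it looks slightly different on your end (the file changes a bit between webui versions), the default content is roughly this:

```
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=

call webui.bat
```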
Depending on your personal preferences and hardware environment, you might now want to add some settings ("arguments") to line 6, which says
set COMMANDLINE_ARGS=
I'll give you some short recommendations here, basically all the arguments that I use. You can see the full list of available arguments here.
Command line argument | Description |
---|---|
`--theme "dark"` | Makes the interface use a dark theme instead of white. Use it if you prefer dark modes. I prefer it since it provides more clarity. |
`--medvram` | "Enable Stable Diffusion model optimizations for sacrificing some performance for low VRAM usage." In other words, it makes SD run reliably on slightly aged graphics cards like my 1080 Ti. |
`--autolaunch` | Starting SD will open the web interface in your browser automatically (the actual SD program will happily run in the background like a good kobold). |
`--xformers` | Makes SD use "xformers", which is another word for 🦄 fancy unicorn magic, speeding up generation, making it use less video memory, or performing some other borderline Balrog-summoning sorcery. Note that when using xformers, generated images are not perfectly reproducible with the same settings. |
`--no-half-vae` | Makes SD use even more 🦄 magic sparkling unicorn glitter. This option made SD 2.0 models work for me, I believe. You might add this one later at some point in case SD doesn't work or only produces black output images. |
`--reinstall-xformers` | Use this to force the application to reinstall xformers. This is sometimes needed after pulling an update when the startup procedure throws weird errors. Put it in, start once to force the reinstall, then remove the argument again. |
All the settings you want to use need to be placed after the equal sign, for example:
set COMMANDLINE_ARGS=--medvram --xformers --autolaunch --theme "dark" --no-half-vae
💾 Downloading a model
To generate images, you will need what is referred to as a model. That's basically just a pretty big-ass file that contains all the trained AI data about pictures and internally is made up of needlessly complicated unicorn farts.
Model files commonly come in one of two formats, either a `ckpt` file or a `safetensors` file. I won't go into details about which is which, but when in doubt, just use the `safetensors` version, because it is generally safer to use and should not be able to contain any malicious code. They'll produce the same output for our purposes.
You can get them from lots of different places. The "official" base models are hosted on "huggingface" and you can download them if you want:
Those are generally good all-purpose models for humans, real-life objects (like a banana!), realistic photos, that kind of stuff.
For our purpose though, those models do not contain the sweet furry juice we're looking for. That means somebody had to take an existing model as a "base" (for example one of those) and further "train" that model on a set of pictures, for example pictures from our beloved Monosodium Glutamate website e621. 😏
We don't really care how that training works here. Just know that there is a metric buttload of models available out there for every purpose imaginable. Models for furry art, models specifically trained on pixel art, on fantasy landscapes, on anime, you name it. Character-specific models are often trained with a technique called Dreambooth and are very good at producing exactly that character.
Anyway! Without further ado - my currently recommended model for furry stuff is:
*drumroll*
Yiffymix
Yeah yeah, I know. But I swear, it's the best furry model I've come across so far!
To download it, you will need to create an account on CivitAI, but let me tell you, it's worth it. Not just for this model but for all the good stuff they have over there! As of the time of writing, I recommend using the v33 version. I have tried v3 and above and honestly preferred the results of v33. You may download more than one though and just experiment with them. ProTip: Just punch them into the X/Y/Z plot with your prompt of choice!
Download the file and place it at stable-diffusion-webui\models\Stable-diffusion\yiffymix_3.safetensors. You might have to create that last folder yourself when you set things up for the first time. This is where you want to place all Stable Diffusion models (or checkpoints, to be precise).
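Assuming the default folder layout, the result should look roughly like this (the exact file name depends on the version you downloaded):

```
stable-diffusion-webui\
└── models\
    └── Stable-diffusion\
        └── yiffymix_3.safetensors
```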
Now you're all prepared and we can get started making some unicorn images! Or just a dragon. Or whatever, really.
Go make a big tiddy unicorn riding a dragon while throwing water balloons at a werewolf for all I care. I don't judge!
🚀 Starting Stable Diffusion
Okay, let's do this! The first startup will take some time though.
You want to run the `webui-user.bat` file I talked about earlier. This opens a console window and the webui should start downloading all the stuff it needs to run. These are mostly Python dependencies like torch and whatnot.
Note that those will be pretty big (a couple gigaboobs), so now is a good time to grab a coffee and sit back!
Are we there yet? Okay, cool! 🎉
If everything went fine, you should see no error messages or exploding kittens and your computer should still not be on fire.
In case you added the `--autolaunch` option earlier, your browser should launch automatically and open up the webui website that you'll use to control SD. If you didn't, just click on the link in the console window (if you're using the new Windows Terminal). There should be a line saying something like `Running on local URL: http://127.0.0.1:7860` - or just copy that address into your browser.
You should then see something like this. Don't worry about the bit of extra stuff I got at the top and bottom - we'll be covering that later.
📝 Prompts
You see that big empty box at the top that says "Prompt"? Yeah, that's where you throw in all your dreams and desires. In go some words (called the prompt) and, woopedy-weepedy, a bit of GPU suffering later, you've got yourself some magic AI pictures. Boobs (probably) - if you're being honest. 😏
Okay now, go ahead and enter any character or scene you want to see. You can also just use the example prompt below. I will try to explain how to write good prompts in a bit.
beautiful cute anthro (female red dragoness) bride with short brown hair in a wedding dress
Click the big orange "Generate" button on the right and watch it render a picture! This shouldn't take longer than a couple seconds. Yay! It should look very roughly similar to the example below, depending on the random "seed" SD picked for you. A seed is basically a number that makes random things reproducible when using the same number (like the seed of a Minecraft world).
Note that surrounding parts of the prompt with parentheses, like `(female red dragoness)`, will increase the emphasis on those keywords. You can also decrease emphasis using brackets, like `[wedding dress]`, if the effect of that part is maybe overwhelming your image.
To do that quickly, you can just highlight part of the text in the box and press Ctrl + Up/Down to increase or decrease the emphasis. This will additionally display the amount as a decimal, for example like `(brown hair:1.2)`. Parentheses or brackets without a specific decimal correspond to a 1.1 or 0.9 multiplier. Note that the way SD treats these differs from NovelAI, which uses `{}` for increased and `[]` for decreased emphasis.
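As a little cheat sheet, here is what the emphasis syntax boils down to (using an arbitrary keyword):

```
(red dragoness)       → weight × 1.1
((red dragoness))     → weight × 1.1 × 1.1 ≈ 1.21
(red dragoness:1.5)   → weight × 1.5
[red dragoness]       → weight × ~0.9
```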
"But my image looks uglyyyyyyyyyy and nothing like the stuff I saw online and and and... HELP!" π
I hear you! This is where we all started. So far we only entered a very basic prompt with no other changes. We'll get there, big tiddy dragon promise!
I'm a big supporter of free and available information, accessible to anyone. This is why I wanna share all my dirty little (Stable Diffusion) secrets!
From here on I'll try to summarize my experiences with how to improve your overall image quality. It will be categorized into different topics. Feel free to explore them in any order you want.
Before we do this though, you should get a general overview and explanation of the parameters you will mostly be playing with. If you already know how SD works, just skip the next section.
🎛️ Common parameters
You really should learn what the following settings do so you can play with and optimize your results accurately.
There are plenty of clever tutorials out there explaining what makes SD tick. For a good overview I recommend reading this article. It gives some pretty nice example images too!
For convenience I will copy some settings descriptions from the site linked above.
Width & Height
These settings define the resolution and aspect ratio of your images. You can experiment with the width/height as much as you want, but remember: Stable Diffusion is optimized for 512×512. If you change these settings, the generation time and memory consumption can increase sharply.
Seed
A seed is a specific region in the latent space of the Stable Diffusion model. You can think of it as coordinates. Stable Diffusion takes two primary inputs and translates these into a fixed point in its model's latent space: a seed integer and a text prompt. The same seed and the same prompt given to the same version of Stable Diffusion will output the same image every time. In other words, the following relationship is fixed: `seed + prompt = image`.
CFG Scale
This term stands for Classifier Free Guidance Scale and is a measure of how closely you want the model to stick to your prompt when looking for a related image to show you. A CFG scale value of 0 will give you an essentially random image based on the seed, whereas a CFG scale of 20 (the maximum in SD) will give you the closest match to your prompt that the model can produce. The sweet spot is between 7 and 13.
Steps
Increasing the number of steps tells Stable Diffusion to take more steps when generating your final result, which can increase the amount of detail in your image. More steps do not always equal a better result though, especially with huge amounts like 100-150 or even higher. Just like the CFG scale, this depends on your prompt, but going too high with the steps can cause artefacts in your final result. It might seem logical to always run at the maximum number of steps, but this isn't always a good idea. Often you won't see much of a difference when running more than 70-100 steps, depending on your prompt. And in most cases, for simple images, 50 is plenty for most of the samplers.
Samplers
Sampling methods are kind of technical, so I won't go into what they are actually doing under the hood. Basically, the sampler is what Stable Diffusion uses to decide how to generate your final result. You can change this setting and try it out yourself, but the differences between samplers are often very small. Again, it depends a bit on your prompt. Please feel free to experiment.
You should definitely check the source article for some good reasons when to use which sampler.
Batch count and Batch size
Batch count will simply generate the number of images you enter and increase the seed value by 1 for every new image. This way you can start generating more than one image at a time and compare the resulting seeds.
Batch size will do pretty much the same, but simultaneously. Note that this takes vastly more VRAM and only speeds up the process if the benefit of not having to ramp up the generation process over and over again outweighs the drawback of the high VRAM usage.
As a rule of thumb, just tweak the Batch count and don't bother with the Batch size.
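To illustrate what Batch count does with the seed (the numbers here are arbitrary):

```
Seed: 1234, Batch count: 4
→ image 1: seed 1234
→ image 2: seed 1235
→ image 3: seed 1236
→ image 4: seed 1237
```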
📖 Writing good prompts
I found that, to produce high quality pictures, you want to include information about:
- The main subject(s) and their core attributes (like species and color)
- All desired character traits (like hair, clothing, figure, pose)
- Art style and lighting
- Background
- Camera perspective
- Any generic tags to help with image quality (like masterpiece, intricate detail, 8k, cinematic lighting, shaded, hi res, ...)
Generally you will end up building your prompt iteratively. You'll add more ideas and details and change things here and there until you're happy with the results. When you can plonk in your prompt, run it with various seeds and regularly get something you like, I like to call that a "solid prompt". It means to me that I didn't just have lots of luck with a good seed, but rather the prompt is detailed enough to accurately specify what I want. This is normally the kind of state you want to get to.
To keep my prompt input from becoming a chaotic mess of mixed-up keywords, I ended up grouping this information into separate lines. I don't know if this is really a good way, but it definitely worked for me. I prefix every group with a small label (having a weight of 0). So far this label trick didn't seem to negatively influence my results.
So, most times, my current prompts look like this:
(furry art, uploaded on e621:1.4),
(character:0.0),
[...],
(traits:0.0),
[...],
(background:0.0),
[...],
(perspective:0.0),
[...],
(tags:0.0),
[...]
SD pays more attention to the keywords at the beginning of the prompt. The closer a keyword is to the end of the whole prompt, the less weight it receives overall.
What I usually end up putting into those groups you can see in the next chapter.
Just a last thought on artist names at this point: You might want to enter any well-known furry artist names to guide the overall style into that general direction. While I do include this tip here, I want to make it very clear that I do not endorse trying to copy a specific art style of an artist. This is dangerous territory and there is a fine line between AI image acceptance and the legitimate fear of real artists that their work may get copied and recycled. When in doubt, stick to the rule of thumb "Don't be an ass."
🤫 Dirty little secrets
Okay, time to really get down to business.
There's a good chance you might be here to learn about the NSFW side of things. In this case, read on. I'll teach you the tips and tricks to get snuggly.
Do I still need to add a disclaimer here about how I usually go with the more curvy ladies? Yeah, probably not. 😅 It's just what I find visually attractive!
Yeah, yeah, I know! The keywords make it look like I'm the world's most perverted dragoness. But here's the thing: without making it clear, SD will never know what you really want. This part will be embarrassing to a degree, so please be a little sensitive about the keywords I use, okay?
Below is a use-it-wisely dump of some of my hand-picked images, each with its entire positive prompt. Extract from them all the information you like. You will see a lot of similar keywords. That is just what I settled on as a default for these kinds of images. And it seems to do its job pretty well.
This site sadly doesn't allow me to show the original prompts in the table precisely, because all line breaks get eaten by the formatting.
Note that the majority of these examples were done using `DPM++ 2M Karras` at a CFG scale of 7 and with my default negative prompt. I won't be including the seed for each picture, because I consider all of these prompts to be pretty "solid".
Prompt |
---|
(furry art, uploaded on e621:1.4), (style by harnny:2.0), (character:0.0), 1girl, solo_focus, solo female cute sexy (red dragon dragoness:1.6), (red_fur red_body:1.2), (traits:0.0), (long brown hair:1.3), brown_hair, (beautiful long flowing yellow dress:1.5), (big hyper breasts:0.5), straight white dragon horns, white_horn 2_horns dragon_horn, (cleavage:1.2), big huge breasts, huge ass, wide hips, curvy, cute face, red big thick_tail tail, open eyes, happy smile, ear_fins, red_ear_fins, (open mouth, open_mouth open_smile:0.8), blue eyes, blue_eyes, fluffy_tail, long_tail, tail_tuft, (thick_tail:1.5), (background:0.0), (tropic pool background with palms and water, sunlight:1.2), (perspective:0.0), portrait, low-angle view, three-quarter view, looking at viewer, looking at you, (tags:0.0), nsfw, detailed, explicit content, milf, amazing detail, cinematic lighting, detailed background, depth of field, half body shadow, [backlighting], [detailed ambient light], [ambient light on the belly], sharp focus, (questionable content), (shaded), (hi res), ((masterpiece)), highly detailed, intricate detail, rule of thirds |
(furry art, uploaded on e621:1.4), (character:0.0), 1girl, solo_focus, solo female cute sexy (blue king cobra:1.5),( blue_fur blue_body:1.2), (character traits and clothing:0.0), (tight wet white long dress dress with lace and ornaments:1.5), (hard red puffy nipples:0.2), (big hyper breasts:1.5), white horns, cleavage, big huge breasts and huge ass and thick thighs, curvy, wide hips, cute face, one single big fluffy thick_tail tail, milf, open eyes, (action pose:1.2), jumping, snake_hood, king_cobra, (background:0.0), (standing outside in a thunderstorn rain clouds:1.5), lightning, very dramatic, (very happy smilie look:1.2), (perspective:0.0), portrait, low-angle view, three-quarter view, looking at viewer, looking at you, (tags:0.0), nsfw, detailed, explicit content, amazing detail, cinematic lighting, detailed background, depth of field, half body shadow, [backlighting], [detailed ambient light], [ambient light on the belly], sharp focus, (questionable content), (shaded), (hi res), ((masterpiece)), highly detailed, intricate detail, rule of thirds, (art style:0.0), (by drakonst, by harnny, by MLeonheartFA, by taranima:1.3) |
(furry art, uploaded on e621:1.4), (character:0.0), 1girl, solo_focus, solo female sexy (yellow anthro dragon dragoness:1.5), (character traits and clothing:0.0), (tight red long cheongsam chinese dress:1.5), (long brown hair:1.2), (hard red puffy nipples:0.5), (big breasts:1.2), white dragon horns, cleavage, big huge breasts and huge ass and thick thighs, curvy, wide hips, cute face, one single big fluffy thick_tail tail, milf, smiling smile, open eyes, action pose, jumping, (background:0.0), (standing outside in a beautiful (blue:1.2) bright flower field:1.5), (perspective:0.0), portrait, low-angle view, three-quarter view, looking at viewer, looking at you, (tags:0.0), nsfw, adult, mature, detailed, explicit content, amazing detail, cinematic lighting, detailed background, depth of field, half body shadow, [backlighting], [detailed ambient light], [ambient light on the belly], sharp focus, (questionable content), (shaded), (hi res), ((masterpiece)), highly detailed, intricate detail, rule of thirds, (art style:0.0), (by drakonst, by harnny, by MLeonheartFA, by taranima:1.5) |
masterpiece, best quality, furry art, uploaded on e621, 1girl, solo_focus, nsfw, a sexy female anthro {{dragon dragoness}} barkeeper with {red scales}, in tight {{french maid dress}} with {{big hyper breasts}}, in {{cyberpunk ramen noodle bar}} behind counter, offering a cup bowl of steaming ramen, futuristic clothing, scifi cyberpunk gadgets, long brown hair, string tanktop, dragon horns, smiling, happy, breast focus, cleavage, curvy, very big plump breasts, cute face, mature, milf, open eyes, seductive, thick dragon tail, leather collar, high angle shot, close up shot, 85mm, highly detailed, intricate detail, street level view, amazing detail, colorful ambient light, volumetric lighting, BioShock, 8k, backlit, rim lighting, starbucks, hooters, in the style of vader-san, colorful |
(furry art, uploaded on e621:1.4), 1girl, solo_focus, nsfw, detailed, a female sexy (hyper:1.7) (anthro yellow scales dragon dragoness furry girl:1.5) in tight beautiful dress laying in grass field at night under (candlelight candles), nipples, cleavage (big huge breasts) and huge ass and thick_thighs, curvy, wide hips, big breasts, cute face, long hair, dragon horns, tail, mature, milf, looking at you, amazing detail, smiling smile, at night, moon light, close up shot, highly detailed, intricate detail, 85mm, open eyes, orange lighting, (yellow fur) |
(furry art, uploaded on e621:1.4), by nuzzo, by harnny, (character:0.0), fucking in a tropic outdoor wooden lake surrounded by plants, (wet body:1.3), showering, sex with a sexy (female red fox vixen:1.5) orange_fur orange_body, (cowgirl_position:1.4), vaginal vaginal_penetration, anthro_on_anthro, penetration, knot_fucking, huge big canide penis, cum_in_pussy, cum_inside, female_penetrated, deep_penetration, orgasm, (leaking cum from pussy:1.2), hands tied with rope, (traits:0.0), (long gold blonde hair:1.3), blonde_hair, (cleavage:1.2), (big huge gigantic hyper breasts:1.4), huge ass, wide hips, curvy, thick thighs, cute face, open eyes, happy smile, (ahegao:1.2), blue eyes, blue_eyes, fluffy_tail, fox tail, pussy, hard nipples, (lactating:1.2), pasties, (big puffy nipples:1.2), big balls big_balls, huge_balls, (knot:1.3), vaginal_knotting, (big_knot:1.2), cock_ring, (perspective:0.0), low-angle view, breast focus, sunlight, (tags:0.0), nsfw, detailed, explicit content, milf, amazing detail, detailed background, depth of field, [backlighting], [detailed ambient light], sharp focus, (questionable content), (shaded), (hi res), ((masterpiece)), highly detailed, intricate detail, rule of thirds, highly detailed, small details, smooth, illustration, realism, octane render, 8k, unreal engine 5, artstation, Artgerm |
(with minor Photoshop edits): (character:0.0), a (sexy female anthro red dragon dragoness in tight black bikini:1.4), (hiking outdoors next to a waterfall leaning against a rock:1.4), coming out of the water, in a forest surrounded by trees, (traits:0.0), (red fur red body:1.2), (full heavy plump big giant gigantic huge hyper natural breasts:1.4), (long brown hair:1.3), (very thick big (fluffy:1.2) brown tail:1.1), (reflective shiny glossy oiled up body:1.1), (two polished smooth straight white horns on head:1.1), (open mouth:0.8), detailed scale texture on body, (white chin:1.1), white fin frill ear fins, joy joyful happy smile, (curious open blue big anime eyes:1.2), long pretty eyelashes, (wide hips:0.8), curvy body, (hourglass figure:0.9), (thick thighs:0.8), milf, digitigrade legs, (beautiful face:1.1), curious look, short cute dragon snout, cream colored belly, very detailed and beautiful snout and face, (wet body:1.2), water droplets, wet hair, (nipple outlines visible under clothing:1.2), (perspective:0.0), full body portrait, wide angle shot, sunset, golden hour, godrays, hyperrealistic water and waterfall, looking at viewer, (visual tags:0.0), 1girl, (furry art, uploaded on e621:1.5), ((masterpiece)), (shaded), (hi res), amazing detail, cinematic lighting, solo focus, highly detailed, detailed background, small details, wide dynamic range, hdr, intricate detail, pinup, realistic body proportions, realistic arms, cinematography, lightroom, absurd res, ultra realistic, dynamic composition, artbook, professional lighting, film lighting, diffraction, surface based rendering, Caustics, Ray-tracing, volume-marching, Global Illumination, Subsurface Scattering, Iridescence, skin diffraction, glistening, glistening body, bump mapping, realism, octane render, unreal engine 5, artstation, Artgerm, bokeh, rimlight, depth of field, backlighting, detailed ambient light, sharp focus, rule of thirds, high dynamic range, colorful saturation, (complementary colors:1.5), |
🤬 Use a negative prompt
By far the easiest way to improve your output quality is to just punch in a number of negative prompts. Those act as a counterweight to your (positive) prompt. Basically, put in all the things you *don't* want to see.
Oftentimes you will have a list of generic "bad boys", and in case there is something really bothering you in a picture, you can add that to the list.
Below is my entire negative prompt that I just leave in "all the time". Try it 😉
(boring_e621_v4:1.1), (easynegative:1.1), (worst quality:1.4), (low quality:1.4), morbid, poorly drawn hands, long neck, bad anatomy, (fused lips and teeth:1.2), duplicate, missing hand, blurry, ugly, normal quality, ugly, abnormal fingers, worst quality, ((anime)), acnes, poorly drawn hands, disconnected limbs, floating limbs, jpeg artifacts, ugly, face cropped, boring, poorly drawn feet, extra fingers, low quality, worst quality, bad proportions, missing legs, mutated hands, extra arms, cut off, disappearing arms, cut off, jpeg artifacts, ((grayscale)), extra arms, paintings, text, normal quality, dull, out of frame, extra limbs, low quality, missing fingers, artist name, extra legs, missing arms, text, low quality, mutated, abnormal eye proportion, mutated, deformed, lowres, extra legs, tiling, (ugly eyes), extra digit, extra limbs, missing fingers, fused fingers, disappearing calf, disfigured, bad anatomy, (deformed iris), close up, ((cartoon)), ((monochrome)), cross eyed, deformed, extra limbs, mutation, missing legs, (render:1.2), cropped, mutated hands, skin spots, deformed, malformed hands, blurry, bad hands, abnormal feet, close up, amputee, out of frame, deformed breasts, bad anatomy, poor lighting, bad anatomy, bad anatomy, blurry, duplicate, fused fingers, poorly drawn face, lowres, ugly, censored, fused hand, extra ears, blur, malformed limbs, poorly drawn hands, missing arms, poorly, bad hands, fused fingers, watermark, disfigured, fusion, pointy, mutated, misshapen, dehydrated, skin blemishes, crosseyed, deformed, disfigured, missing limb, extra pussy, ((sketches)), watermark, bad art, dehydrated, (sketch:1.2), cloned face, fewer digits, malformed limbs, signature, extra head, huge calf, blurred, mutation, missing limbs, poorly drawn face, Watermark, mutilated, error, (low quality), poorly_drawn_face, blur, grainy, drawn, fewer fingers, cropped, cloned face, long neck, poorly drawn face, draft, censored, normal quality, worst quality, Abnormal hands, too many fingers, close up, disappearing legs, cropped, bad proportions, warped, mutation, signature, (un-detailed skin:1.2), missing fingers, abnormal legs, extra fingers, deformed hands, disappearing thigh, username, ((3d)), (cartoon:1.2), poorly drawn hands, out of view, poorly drawn face, bad hands, face out of frame, (worst quality:2), unnatural colors, morbid, unclear, extra limb, body out of frame, (deformed pupils), ugly, disconnected head, too many fingers, mutilated, gross proportions, obscure, out of frame, asymmetrical body, age spot, glans, imperfections, deformed, gross proportions, Text, mutated hands and fingers, extra fingers, disfigured
Note that the tags `boring_e621` and `easynegative` are not referring to normal words; they trigger what's called a "textual inversion". That's basically a list of further "unwanted characteristics" (or wanted ones) that I grabbed from here. Just put the file at stable-diffusion-webui\embeddings\boring_e621.pt and boom. That's it. Now, whenever you use the trigger keyword in the negative prompt, it'll work its magic. The same goes for easynegative, which you can get from here.
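Assuming the default folder layout, both embeddings simply end up next to each other like this (the exact file names depend on where you grabbed them from):

```
stable-diffusion-webui\
└── embeddings\
    ├── boring_e621.pt
    └── easynegative.safetensors
```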
Using a negative prompt can really make a big difference. Let's look at our trusty beach example with and without negative prompts:
📊 Fine-tuning using X/Y/Z plots
Once you've come up with what I call a "solid prompt" (one that reliably generates the overall picture idea you are looking for), don't hesitate to fine-tune your generation settings. Oftentimes you might even get better, "similar" pictures for the same seed by just tweaking a few parameters.
Luckily there's an easy way you can explore those by generating an image grid, each image representing one specific setting you want to control.
Say, for example, you found a nice-looking seed and want to see variations of the CFG scale, maybe increments between 5 and 10. Or you want to see how your settings work on a couple of different samplers. Or how your character looks with a red, green or yellow dress. That's where the "X/Y/Z plot" script comes in.
To do that, go down to the "scripts" section at the bottom and select "X/Y/Z plot". Instead of just generating one picture, this will create a grid of pictures, each one labelled with its corresponding setting. You can put any parameter on any axis and enter the values you want to try for this parameter next to it.
Some final thoughts:
- Parameters with a fixed list of possible options will show a little 📒 button next to the value input, allowing you to quickly pick options.
- Note that the Z axis is just a list of X/Y plots that get repeated next to each other.
- You can use -1 as a seed value and it will use the same seed consistently for all other axes.
- I recommend experimenting with plots varying CFG scale, sampler, steps and CLIP skip.
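Just as a hypothetical example, a configuration to compare a range of CFG scales against two samplers could look like this:

```
Script:  X/Y/Z plot
X type:  CFG Scale    X values: 5, 6, 7, 8, 9, 10
Y type:  Sampler      Y values: Euler a, DPM++ 2M Karras
Z type:  Nothing
```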
With a plot configuration like that, you might get an output grid like the one below:
🧩 Use LoRA models for specific features
I'm still working on this part. Come back later!
🔍 Creating larger images
I'm still working on this part. Come back later!
- By AI upscaling (postprocess step)
- By using "Hires fix" (during generation)
ℹ️ See image parameters with PNG Info
When generating images, Stable Diffusion will add metadata about most generation parameters into the image file.
You can use the tab "PNG Info" to read those parameters from any image you put into it, and (assuming it still has that metadata embedded) the webui will show what the prompt was for that image, what seed was used, etc.
With the buttons at the bottom you can even "send" those parameters, i.e. use them as your current SD input parameters. Keep in mind though: if you transfer the parameters of an image somebody else generated, you might not have the corresponding checkpoint model or LoRA files available.
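For reference, the raw text PNG Info shows looks roughly like this - positive prompt first, then the negative prompt, then the remaining parameters (all values here are made up for illustration):

```
beautiful cute anthro (female red dragoness) bride with short brown hair in a wedding dress
Negative prompt: (worst quality:1.4), (low quality:1.4), bad anatomy, ...
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1234567890, Size: 512x512, Model: yiffymix_33
```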
Metadata is often stripped from image files when uploading them to social media to make sure no location or sensitive personal data gets accidentally shared with others.
🧩 Recommended extensions
Quality of life stuff:
Extension | Description |
---|---|
Infinite Image Browsing | A pretty sexy viewer for all your generated images. I like it. |
Tagcomplete | Allows you to use autocompletion for known e621 (and booru) keywords/tags when entering your prompt. |
Stable Diffusion related extensions:
Extension | Description |
---|---|
Regional Prompter | Okay, so this is AMAZING! Allows you to use different prompts for different image regions. Those regions can then be specified by parts of the prompt! Basically allows you to specify where exactly that red umbrella should go. |
Cutoff | VERY recommended. Improves the coherence of subjects and their attributes like "blue hair, red shirt, green eyes". I'm not sure yet whether to prefer this extension or the Regional Prompter. Both can shine in certain situations. |
ControlNet | Recommended if you want to use an input image to guide your image composition or use a specific character pose. Also allows you to colorize sketched-out black/white images (with a bit of luck). |
🅰️ Model glossary
I included this because all those terms confused the heck out of me for a very long time. Source is here.
Term | Description |
---|---|
Checkpoint models | These are the real Stable Diffusion models. They contain all you need to generate an image. They are large, typically 2 β 7 GB. |
LoRA models | They are small patch files to checkpoint models for modifying styles. They are typically 10 β 200 MB. |
Textual inversions (embeddings) | They are small files defining new keywords to generate new objects or styles. They are small, typically 10 β 100 KB. |
Hypernetworks | They are additional network modules added to checkpoint models. They are typically 5 β 300 MB. |
💡 Random tips & tricks
- Use a single keyword prompt to check if SD knows what it "means"
- Add your frequently changed settings to the quick access bar at the top (Settings > User interface > Quicksettings list) - see the example after this list
- Try CLIP skip 1 with Yiffymix to have much more species-specific traits, use 2 for a wider interpretation
- Find useful tags from real drawings with "Interrogate DeepBooru" in the img2img tab. (Don't even bother with CLIP, really)
- Use this massive list of examples to find artist names to guide your image style: https://rentry.org/artists_sd-v1-4
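As an example for the quick access tip above: the Quicksettings list takes internal setting names, separated by commas. As far as I know, adding `CLIP_stop_at_last_layers` next to the default `sd_model_checkpoint` gives you a model dropdown and a CLIP skip slider right at the top of the interface:

```
sd_model_checkpoint, CLIP_stop_at_last_layers
```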
❌ Things I don't plan to cover
- Using the img2img mode to generate variations
- Inpainting or outpainting images
- Training your own checkpoint or LoRA
- Depth estimation
- Interrogate for prompt keywords
- Deforum
You made it all the way to the end. Have a cookie! 🍪