/vtai/ FAQ


FAQ - General

This is cool, how do I get started?

Do you have an nVIDIA GPU?

Yes

NAI Leak Speedrun

Note

You likely won't ever use the leaked NovelAI model, but if you plan on merging HLL with other stuff, you'll still want animefull-final-pruned.ckpt anyways

Local Install

Nvidia: https://rentry.org/voldy | https://github.com/AbdBarho/stable-diffusion-webui-docker

No

Are you retarded?
Yes

Colab for Complete Retards

No
Are you OK with using Linux?
Yes

AMD: Native: https://rentry.org/sd-nativeisekaitoo | Docker: https://rentry.org/sdamd

Requires extra setup for Polaris based GPUs

Polaris = RX480, RX580, and derivative cards, so RX460, RX550, Radeon Pro WX2100-5100, etc.

No

Onnx: https://rentry.org/ayymd-stable-diffustion-v1_4-guide

This won't actually let you use "webui"

It's an alternative and requires converting a model

Running webui directly on AMD using DirectML

BSODs await ye

You will get different output, and you'll need to use several launch flags to even get it to work

Do you want to try running it directly on your CPU?

https://rentry.org/cputard | https://rentry.org/webui-cpu

You'll get different output

wHat mODel is bEsT?

Everyone will give you a different answer. Look at catbox links that anons post or ask them what model they're using. Generally speaking, /vtai/ is merging some version of hll with another merged model, at the simplest level, or merging multiple. Nobody's trained an entire checkpoint of their own beyond hll anon training hll itself.

Short, non-conclusive answer, merges with:

  • AOM2
    • Slightly more realistic gens
  • 7th anime
    • Stylized
  • Counterfeit
    • "watercolor" lookin

Look through the cookbook and look at what images generated by the base models used in merges look like. Except now they can draw chubas.

wHiCh upScalEr iS bEst?

They're different, one isn't any better than the other. I wouldn't be surprised if half of the images you see on /vtai/ are using 4x-AnimeSharp or another ESRGAN upscaler, which can be downloaded here. They go in your stable-diffusion-webui/models/ESRGAN folder.

Here is an old comparison I made, with sliders, that lets you compare the results of different upscalers. It's flawed, since all of the images were made with denoising set to 0.7, which is far from ideal - it's more a comparison of what SD will do with the different upscaler results.

Most of the ESRGAN upscalers, like 4x-AnimeSharp, lollypop, etc., work best at around 0.4-0.5 denoising strength. Go below that and you start to see more of the upscaler and less of what hi-res fix is doing. Go above 0.5 and you see more of what hi-res fix did and less of what the upscaler did. At 1.0 you'll have a different image entirely, and at 0 you'll have (close to) the upscaler's raw output.

If you set denoising strength to 0 and Hi-Res Fix steps to 1, you can get an idea of what the "raw" output from your selected upscaler looks like!

VAE

SET. YOUR. VAE.
No VAE
The VAE does more than just "add color" - it's the component that actually "makes the image" you see at the end.

Why do I still get an image even if I don't set one?

I think every model still has "one" in it, it's just not "complete". Some models have one of the VAEs below "baked in" and will make the same images whether you have your VAE set to Automatic or None - you will however get a different image if you set a different VAE manually. See below for results with NAI SFW, NAI SFW with the NAI VAE "baked in", NAI SFW with the NAI VAE baked in and all pruned to fp16, and a model merge with no VAE baked in.
xyz-grid-0001-1335167568-kapuvania-cropped.png
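If you want to peek at whether a checkpoint carries VAE weights at all, here's a rough sketch. It assumes an SD 1.x-style checkpoint saved as .safetensors, where the VAE tensors sit under the `first_stage_model.` prefix. The function name is made up for this example. It only lists tensor names from the file header, so it can't tell you which VAE is baked in or whether it's a good one.

```python
import json
import struct


def vae_tensor_names(path):
    """List tensor names under 'first_stage_model.' in a .safetensors file."""
    with open(path, "rb") as f:
        # safetensors layout: 8-byte little-endian header length, then a JSON header
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # Assumption: SD 1.x-style checkpoints keep the VAE under this prefix
    return sorted(name for name in header
                  if name.startswith("first_stage_model."))
```

An empty list would suggest the file has no VAE tensors under that prefix. A non-empty list only tells you something is in there.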

Stable Diffusion is a text-to-image model that uses a variational autoencoder (VAE) to encode and decode images to and from a smaller latent space. The VAE helps the model to capture the semantic meaning of the image and render fine details such as eyes and faces. In the final steps of image generation, the VAE decoder takes the latent vector from the diffusion process and reconstructs the image in pixel space. The VAE decoder can be fine-tuned with additional data to improve the quality of the image.

GrugTL powered by GPT-4

  • Latent space like secret code. Latent space where small picture live.
  • Secret code have numbers that mean something.
  • Grug not see small picture. Small picture [0.12, -0.34, 0.56, -0.78] look like cloudy letter O.
  • Grug see big picture. VAE make big picture. Decoder change small picture to big picture with pixels. Pixels small squares that make picture for grug.
  • This big picture in pixel space:
    [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0]
    ]
    
  • This picture have five by five pixels. Picture look like letter O.

Set Your VAE

Download one. Put it here:
/stable-diffusion-webui/models/VAE
Set it.
SET YOUR VAE

wHaT VaE iS beSt?

There are no new VAEs, and to my knowledge, nobody has trained any other than NovelAI, Stable Diffusion, and Waifu Diffusion. The answer does not change, and hasn't changed. Any model that "comes with a VAE" comes with one of these.
bite my shiny metal ass

Yes, I'm aware I prompt like a moron
Yes, I'm aware webui had an oopsie and didn't store the xyz grid settings for some reason

best quality, ((roboco-san), hololive, orange eyes, gradient eyes, round eyewear), (center opening:1.6), (plunging neckline, downblouse, short hair, gradient hair, bodysuit, humanoid robot), 1girl, solo, mechanical arms, mechanical legs, ((latex, ((fine fabric emphasis)))), (innerboob, breasts apart, unzipped:1.4), (narrow waist, from side, tight:1.2), (underbutt, (ass)), seductive smile, :D, ultra-detailed, illustration, official art, cel shading, emphasis lines, bodypaint, large breasts, full-length zipper, arms up, dutch angle, presenting armpit, leather, ((patterned clothing), textured bodysuit), see-through leotard, cyberpunk, close-up, neon lights, night sky, skyline, wide hips,
Negative prompt: (worst quality, low quality, zipper, zipper pull tab:1.4), (depth of field, cape, blurry, hands, breast strap, holding:1.2), (censored, cropped shirt, crop top, censorship:1.3), (greyscale, monochrome, speech bubble), error, bad anatomy, bad hands, missing digits, paws, cropped, lowres, jpeg artifacts, username, artist name, trademark, letterbox, bad feet, error, missing fingers, extra digit, fewer digits, extra limb, missing limb, mutation, fused digits, claws, cowboy hat, multiple views, bad multiple views, extra arms, extra legs, (fat, fat rolls, flesh, blob), huge breasts, large areolae, cleavage, bikini top only, bra, shirt,
Steps: 24, Sampler: DPM++ 2S a Karras, CFG scale: 32, Seed: 86137747, Size: 512x768, Model hash: a6c54f42f1, Model: Counter_v2_A_Juice_31, Denoising strength: 0.5, Clip skip: 2, ENSD: 31337, Hires upscale: 2, Hires steps: 12, Hires upscaler: 4x-AnimeSharp, Dynamic thresholding enabled: True, Mimic scale: 8, Threshold percentile: 97.9, Mimic mode: Power Up, Mimic scale minimum: 3.5, CFG mode: Half Cosine Up, CFG scale minimum: 3.5, Power scheduler value: 3.5

NovelAI VAE

Used by any number of different models/merges. The full SHA256 checksum of the unpruned NAI VAE "as leaked" is f921fb3f29891d2a77a6571e56b8b5052420d2884129517a333c60b1b4816cdf

  • orangemix.vae.pt
    It's the NAI VAE
  • Anything-V3.0.vae.pt
    It's the NAI VAE
  • lol-top-kek-420-69.vae.pt
    Is it 784 MiB? It's the NAI VAE
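If you'd rather check than guess, you can hash the file and compare it to the checksum above. A minimal sketch using only the standard library. The function names are made up for this example. Note that this only matches the unpruned VAE "as leaked" - a pruned or safetensors conversion will hash differently even if the weights are the same.

```python
import hashlib

# Checksum of the unpruned NAI VAE "as leaked", copied from the text above
NAI_VAE_SHA256 = "f921fb3f29891d2a77a6571e56b8b5052420d2884129517a333c60b1b4816cdf"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so an ~800 MB VAE doesn't have to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_nai_vae(path):
    """True if the file is byte-for-byte the unpruned NAI VAE."""
    return sha256_of_file(path) == NAI_VAE_SHA256
```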
Pruned Version

Yes, apparently VAE can be pruned. For some reason, the NAI VAE was ~800 MB. Here's a 335 MB safetensors version. Save yourself some space I guess
Pruned Direct Download, Safetensors

ClearVAE

https://civitai.com/models/22354/clearvae
wiwa

best quality, elira pendora, nijisanji en, large breasts, light blue hair, multicolored hair, white hair, head wings, x hair ornament, hair over one eye, blue eyes, messy hair, (plugsuit, blue bodysuit, test plugsuit), (unzipped, center opening breasts apart:1.5), stomach, navel, groin, groin tendon, lying, on back, seductive smile, arms up, dutch angle, spoken heart, heart-shaped pupils, fine fabric emphasis, 1girl, solo,heterochromia, purple eyes, (wet skin, sagging breasts:0.75), art by akasaai,
Negative prompt: (worst quality, low quality, ribs, underbust, zipper:1.4),
(depth of field, blurry, art by sakimichan), (censored, censorship, sideless outfit:1.2),
(detached sleeves, text, speech bubble, lowres, jpeg artifacts, (extra arms), extra digits, missing limb, fewer digits, (extra digit, extra limb), missing limb, mutation, extra legs, missing finger, bad anatomy, bad hands, bad feet),
error, paws, username, artist name, trademark, letterbox, error, fused digits, claws, multiple views, bad multiple views, fat, fat rolls, blob, nipples, breasts out, areolae, (bad_prompt_version2, EasyNegative:0.8),  
Steps: 30, Sampler: Euler a, CFG scale: 11, Seed: 364117191, Size: 512x768, Model hash: 454c9e8daa, Model: 25d_hll4p1, Clip skip: 2, ENSD: 31337, Script: X/Y/Z plot, X Type: VAE, X Values: "clearvae_main.safetensors, vae-ft-mse-840000-ema-pruned.vae.pt, novelai.vae.safetensors"

From what I can understand, it's a block merge of the NAI VAE and one of the WD 1.4 VAEs, kl-f8-anime2. It's actually kinda nice!
This is the only actual "new" VAE I'm aware of. There are two versions, and in my opinion, the main one is far better. The alternate is almost too saturated, and introduces some weird artifacts that I can't recall seeing with either the SD 1.5 or WD 1.4 VAEs.

SD 1.5 VAE

They're more saturated. No, I don't know what the difference is between the two. I use the MSE one because lol 840000 > 560000

vae-ft-mse-840000-ema-pruned.ckpt

https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt

vae-ft-ema-560000-ema-pruned.ckpt

https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/vae-ft-ema-560000-ema-pruned.ckpt

WD 1.4 VAE

They're even more saturated. No, I don't know what the difference is between the two.

kl-f8-anime.ckpt

https://huggingface.co/hakurei/waifu-diffusion-v1-4/resolve/main/vae/kl-f8-anime.ckpt

kl-f8-anime2.ckpt

https://huggingface.co/hakurei/waifu-diffusion-v1-4/resolve/main/vae/kl-f8-anime2.ckpt

  • 4chan removes the prompt information, along with any other metadata, when you post an image.
    • catbox does not

      catbox is down

      Grass is green, sky is blue, water is wet
    • litterbox (temporary, 1 hr - 3 day) does not
    • pixiv does not
    • byte.wtf (temporary, up to 1 month) does not
      • byte.wtf links don't (embed) properly though
    • postimages (temporary or permanent, up to you) does not, provided you choose not to resize your image
    • imgur always removes it

Having the metadata means you can load the image into the PNG Info tab in webui and view the prompt that was used to generate the image.
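If you want to see what's actually stored in the file, here's a rough stdlib-only sketch that walks a PNG's chunks and pulls out the text ones. It assumes the prompt lives in a text chunk with the keyword `parameters`, which is what webui writes as far as I know. The function name is made up for this example, and it skips CRC checks and zTXt chunks to stay short.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(path):
    """Return {keyword: text} for the tEXt and iTXt chunks of a PNG file."""
    found = {}
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError("not a PNG file")
        while True:
            head = f.read(8)
            if len(head) < 8:
                break
            length, chunk_type = struct.unpack(">I4s", head)
            data = f.read(length)
            f.read(4)  # skip the CRC, this sketch doesn't verify it
            if chunk_type == b"tEXt":
                key, _, text = data.partition(b"\x00")
                found[key.decode("latin-1")] = text.decode("latin-1")
            elif chunk_type == b"iTXt":
                key, _, rest = data.partition(b"\x00")
                compressed = rest[0]      # compression flag
                rest = rest[2:]           # drop flag + compression method
                _lang, _, rest = rest.partition(b"\x00")
                _translated, _, text = rest.partition(b"\x00")
                if compressed:
                    text = zlib.decompress(text)
                found[key.decode("latin-1")] = text.decode("utf-8")
            elif chunk_type == b"IEND":
                break
    return found
```

Something like `read_png_text("some_gen.png").get("parameters")` would then give you the prompt text, or `None` if the host stripped it.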

Can I view metadata for catbox uploads while I'm browsing 4chan, or upload directly to catbox while I'm posting?

Yes. Install this userscript for 4chanx.

If your image was >4MB and webui converted it to a JPG, the userscript won't display the metadata because it can't read EXIF tags out of a JPG, only iTXT chunks from a PNG.

Why do some filenames look like catbox_1a2b3.png?

They were uploaded using said userscript. The rest of us can download the image with the metadata directly when it looks like that, or view it in our browsers.

Is there another way to look at the prompt info?

Yes. You can also use it to do things like "put the metadata back" into an image if you Photoshopped it or something.

WHY CAN'T I MAKE THE SAME EXACT IMAGE AS SO AND SO?

  • Were they using LoRA or TI embeds?
    Get the same ones.
  • Do you have xformers enabled? Did they have xformers enabled?

    They didn't have it enabled

    So disable it.

    They had it enabled (most do)

    Neither of you can ever make the same image again, get over it.

    I don't know

    The world is your oyster and you want to make the same image again?

FAQ - Extensions

How do I 2girls in the same picture? - What is 2Shot? - What is latent couple?

https://github.com/opparco/stable-diffusion-webui-two-shot

This extension is an extension of the built-in Composable Diffusion. This allows you to determine the region of the latent space that reflects your subprompts.

GrugTL powered by GPT-4

  • “Extension make Composable Diffusion better. Extension let you pick part of hidden space for subprompts.”
    • it cut picture into pieces, and let user ask different things for each piece of picture

{{{grugger}}}

  • “Extension make thing better. Extension let you pick part of hidden thing for small asks.”
    • it break picture into small parts, and let user say different things for each part of picture

How do I use this?

You have to prompt for it the way it's "expecting" you to. The bare minimum is three "lines" - yes, you have to hit "Enter", and yes, each line after the first must start with AND

00178-3676936066-best-quality-2girls-chibi-snow-outdoors-snowing-snowflakes-black-hair-e.png

best quality, 2girls, ((chibi)), snow, outdoors, snowing, snowflakes, black hair, extremely detailed, extremely intricate, absurdres, incredibly absurdres, illustration, official art, (emphasis lines, cel shading:0.8), 
AND best quality, 2girls, ((chibi)), ookami mio, hololive, orange eyes, (wolf ears), wolf tail, ^ ^, :D, mittens, snow, outdoors, snowing, snowflakes, black hair, extremely detailed, extremely intricate, absurdres, incredibly absurdres, illustration, official art, (emphasis lines, cel shading:0.8), 
AND best quality, 2girls, (((chibi))), kurokami fubuki, hololive, fox ears, fox tail, (((black hair))), red eyes, ^ ^, :D, mittens, snow, outdoors, snowing, snowflakes, black hair, extremely detailed, extremely intricate, absurdres, incredibly absurdres, illustration, official art, (emphasis lines, cel shading:0.8), 
Negative prompt: (worst quality, low quality:1.4), (depth of field, blurry), (censored, censorship), (text, speech bubble, lowres, jpeg artifacts, (extra arms), extra digits, missing limb, fewer digits, (extra digit, extra limb), missing limb, mutation, extra legs, missing finger, bad anatomy, bad hands, bad feet), error, paws, username, artist name, trademark, letterbox, error, fused digits, claws, multiple views, bad multiple views, fat, fat rolls, blob,
AND (worst quality, low quality:1.4), (depth of field, blurry), (censored, censorship), (text, speech bubble, lowres, jpeg artifacts, (extra arms), extra digits, missing limb, fewer digits, (extra digit, extra limb), missing limb, mutation, extra legs, missing finger, bad anatomy, bad hands, bad feet), error, paws, username, artist name, trademark, letterbox, error, fused digits, claws, multiple views, bad multiple views, fat, fat rolls, blob,
AND (worst quality, low quality:1.4), (depth of field, blurry), (censored, censorship), (text, speech bubble, lowres, jpeg artifacts, (extra arms), extra digits, missing limb, fewer digits, (extra digit, extra limb), missing limb, mutation, extra legs, missing finger, bad anatomy, bad hands, bad feet), error, paws, username, artist name, trademark, letterbox, error, fused digits, claws, multiple views, bad multiple views, fat, fat rolls, blob, (((white hair))),
Steps: 40, Sampler: Euler a, CFG scale: 11, Seed: 3676936066, Size: 768x512, Model hash: dcffbff16c, Model: Gekijōban Shinseiki Dai Rantō HLL Mix 1.0 + 3.1 You Can (Not) Proompt II HD Remix &IRYS, Denoising strength: 0.5, Clip skip: 2, ENSD: 31337, Latent Couple: "divisions=1:1,1:2,1:2 positions=0:0,0:0,0:1 weights=0.2,0.8,0.8 end at step=30", Hires upscale: 2, Hires steps: 20, Hires upscaler: 4x-AnimeSharp

I am retarded

I have no idea if AND-ing the negatives works, but I tried it anyways to keep Kurokami from having white hair.

catbox-x5a1h6.png

(masterpiece, best quality, highres, detailed:1.1), <lora:allVtubersLora_hll31:1>, 2girls, indoors
AND (masterpiece, best quality, highres, detailed:1.1), 2girls, mori calliope, red eyes, pink hair, tiara, (fat:1.3), wide hips, thick thighs, ripped thighhighs, indoors, sweaty
AND (masterpiece, best quality, highres, detailed:1.1), 2girls, (takanashi kiara:1.1), (orange hair:1.1), (green hair:0.9), gradient hair, purple eyes, (fat:1.3), wide hips, thick thighs, ripped thighhighs, orange clothes, indoors, sweaty
Negative prompt: (worst quality, low quality:1.4), blur, blurry, depth of field, motion lines, fat legs, (bad background:1.1), flat color, sketch, 3d, cartoon, toon \(style\), leotard, cartoon
Steps: 28, Sampler: DPM++ 2S a Karras, CFG scale: 7, Seed: 1180464368, Size: 640x512, Model hash: 9c2e7b9e99, Model: anything-v4.5-pruned-fp16-fp16, Denoising strength: 0.49, Latent Couple: "divisions=1:1,1:2,1:2 positions=0:0,0:0,0:1 weights=0.2,0.8,0.8 end at step=24", Hires upscale: 2, Hires upscaler: Latent

00001-3178764140-0-best-quality-2girls-pantyshot-from-below-medium-breasts-upskirt-extend.jpg tmp5r-m6chg.png

best quality, 2girls, pantyshot, (from below), medium breasts, (((upskirt, extended upskirt))), indoors, light rays, panties from heaven, ((((((fine fabric emphasis)))))), absurdres, incredibly absurdres, ultra-detailed, illustration, official art, (emphasis lines, cel shading:0.8), (ornate clothes, floating clothes), wa maid, maid apron, maid headdress, japanese clothes, (mansion), gradient eyes,  panties,
AND best quality, 2girls, pantyshot, (from below), inui toko, pantyhose, medium breasts, panties, nijisanji, (from below, (upskirt, extended upskirt)), skirt lift, dog ears, heterochromia, yellow eyes, red eyes, black hair, gradient hair, beautiful shampoo commercial hair, hair ornament, wa maid, maid apron, maid headdress, (((grimace, disgust))) japanese clothes, (angry, full-face blush), (((fine fabric emphasis))), wide hips, narrow waist, (mansion), absurdres, incredibly absurdres, ultra-detailed, illustration, official art, 
AND best quality, 2girls, pantyshot, (from below), hoshimachi suisei, smug, panties, small breasts, presenting panties, ((upskirt, extended upskirt)), side ponytail, light blue hair, (((((blue eyes))))), gradient eyes, wa maid, maid apron, maid headdress, japanese clothes, grin, pervert, (((fine fabric emphasis))), action, floating hair, motion lines, you gonna get raped, (mansion), absurdres, incredibly absurdres, ultra-detailed, illustration, official art, 
Negative prompt: (worst quality, low quality:1.4), (depth of field, blurry), (monochrome, censored, censorship, underbust:1.2), (text, speech bubble, lowres, jpeg artifacts, (extra arms), extra digits, missing limb, fewer digits, (extra digit, extra limb), missing limb, mutation, extra legs, missing finger, bad anatomy, bad hands, bad feet), error, paws, username, artist name, trademark, letterbox, error, fused digits, claws, multiple views, bad multiple views, fat, fat rolls, blob, (ribs:1.3), teeth, covered navel, holding, covered nipples, (topless), armpit hair, bottomless, white background, animal ear fluff, ((((loli)))),
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 30, Seed: 3178764140, Size: 512x768, Model hash: dcffbff16c, Model: Gekijōban Shinseiki Dai Rantō HLL Mix 1.0 + 3.1 You Can (Not) Proompt II HD Remix &IRYS, Denoising strength: 0.5, Clip skip: 2, ENSD: 31337, ControlNet Enabled: True, ControlNet Module: none, ControlNet Model: control_openpose-fp16 [9ca67cc5], ControlNet Weight: 1, ControlNet Guidance Start: 0, ControlNet Guidance End: 1, Latent Couple: "divisions=1:1,1:2,1:2 positions=0:0,0:0,0:1 weights=0.2,0.8,0.8 end at step=20", Hires upscale: 2, Hires steps: 15, Hires upscaler: 4x-AnimeSharp, Dynamic thresholding enabled: True, Mimic scale: 11, Threshold percentile: 100, Mimic mode: Half Cosine Up, Mimic scale minimum: 3, CFG mode: Half Cosine Up, CFG scale minimum: 3

Line 1

Put a 2girls prompt in here. This is the prompt for the image as a whole. Look at the examples above, and you can see that anons kept their "flavor"/quality tags in there, the desired scenery, etc. By default, the other divisions are weighted 4 times as hard, so you don't have to go too hard. The Mori/Kiara example is probably the better example here.

Line 2, onwards

Start with AND and describe each "division" of the image.

iT's NoT woRkiNG!!11

skill issue

In all seriousness, it can definitely "not work" sometimes. ControlNet was doing a lot of the heavy lifting above in the Toko/Suisei upskirt, and as soon as I turned it off...

00047-3178764143-best-quality-2girls-pantyshot-from-below-medium-breasts-upskirt-extended.png

Plus, some of the best 2shot examples you see are still inpainted afterwards!

4girls

4girls, inpainted https://files.catbox.moe/wscr6u.png

masterpiece, best quality, (4girls), (indoors, scenery, hotel living room, couch, night:1.4),
AND (4girls), masterpiece, best quality, best illustration, (indoors, scenery, hotel living room, couch, night:1.4),
(white shirt, black jeans:1.4),
(:D, closed eyes, open mouth:1.2) (looking away),
(oozora subaru), hololive, black hair, turquoise eyes, short hair,
AND (4girls), masterpiece, best quality, best illustration, (indoors, scenery, hotel living room, couch, night:1.4),
(white coat, black leggings:1.4), 
(grin, closed eyes:1.2), (looking away), holding phone,
(shirakami fubuki:1.2), hololive, green eyes, (white hair), (fox tail), long hair, (fox ears:1.2), ahoge, (small breasts:0.8),
AND (4girls), masterpiece, best quality, best illustration, (indoors, scenery, hotel living room, couch, night:1.4),
(white turtleneck sweater, black leggings:1.4),
smile, (looking away),
(ookami mio:1.2), animal ears, tail, (wolf ears), wolf girl, wolf tail, yellow eyes, (black hair), red hair, streaked hair, (long hair), animal ear fluff, (small breasts:1.4),
AND (4girls), masterpiece, best quality, best illustration, (indoors, scenery, hotel living room, couch, night:1.4),
(red winter coat, black skirt, pantyhose:1.4),
(grin, happy:1.2), (looking away),
(nakiri ayame:1.2), hololive, bangs, black ribbon, gradient hair, hair ribbon, long hair, (ponytail), (white hair), oni, (oni horns:1.2), red eyes, red hair, ribbon, silver hair, small breasts,
<lora:style-yojiShinkawaStyle_offset:0.2> <lora:style-paintedMiniature_v35:0.4> <lora:style-mika_pikazo-v1:0.2>
Negative prompt: (worst quality, low quality:1.4), (depth of field, blurry, bokeh, shallow depth of field:1.5), (blurry background:1.2), (tanlines, white skin, pale skin, multiple bellybuttons, swimming pool), signature, watermark, username, artist name, text, censored, cropped, text, speech bubble, letterboxed, JPEG artifacts, lowres, bad anatomy, (bad hands, error, missing fingers, extra digit, fewer digits:1.4), cropped, worst quality, low quality, normal quality, 3D, black background, (3girls), nude, nsfw, bare legs, (bench), (bare shoulders), (bare arms), (skin-tight clothing:1.3), (cleavage), (looking at viewer:1.4), collared shirt, black gloves,  gloves, apple inc., bottomless, (lips, pink lips), (cityscape, skyline, city, city lights, sun, sun light:1.4), 
Steps: 25, Sampler: DPM++ 2M Karras, CFG scale: 6, Seed: 1851769969, Size: 768x432, Model hash: cd12b7cc22, Denoising strength: 0.6, Clip skip: 2, ENSD: 31337, Mask blur: 4, Latent Couple: "divisions=1:1,1:4,1:4,1:4,1:4 positions=0:0,0:0,0:1,0:2,0:3 weights=0.2,0.8,0.8,0.8,0.8 end at step=30", AddNet Enabled: True, AddNet Module 1: LoRA, AddNet Model 1: thickerLinesAnimeStyle_loraVersion(58c5f51b2b68), AddNet Weight A 1: 0.2, AddNet Weight B 1: 0.2

Line Breaks

It's probably a good time to point out that the line breaks aren't what separate the sections of your latent couple prompt. AND is. You can otherwise use line breaks in your prompts for other reasons.
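To make that concrete, here's a simplified sketch of splitting a prompt into its sections. It is not webui's actual parser (that one also deals with weights and other syntax), and the function name is made up. It just shows that the uppercase word AND is the separator and that newlines are only whitespace.

```python
import re


def split_subprompts(prompt):
    """Split a prompt on the uppercase word AND; newlines are just whitespace."""
    # \bAND\b is case-sensitive and whole-word, so "and" or "band" are left alone
    parts = re.split(r"\bAND\b", prompt)
    # Collapse runs of spaces and line breaks inside each section
    return [" ".join(part.split()) for part in parts]


prompt = """2girls, indoors,
(night:1.4)
AND 2girls, ookami mio
AND 2girls, kurokami fubuki"""

for section in split_subprompts(prompt):
    print(section)
```

The first section spans two lines in the prompt box but still comes out as one section.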

Before Inpainting

4girls https://files.catbox.moe/npcteh.png ControlNet Source, openpose Pre-Processor/Module
https://files.catbox.moe/r4htcr.jpg

masterpiece, best quality, (4girls), (indoors, scenery, hotel living room, couch, night:1.4),
AND (4girls), masterpiece, best quality, best illustration, (indoors, scenery, hotel living room, couch, night:1.4),
(white shirt, black jeans:1.4),
:D, (looking away),
(oozora subaru), hololive, black hair, turquoise eyes, short hair,
AND (4girls), masterpiece, best quality, best illustration, (indoors, scenery, hotel living room, couch, night:1.4),
(white coat, black leggings:1.4), 
(grin, closed eyes:1.2), (looking away), holding phone,
(shirakami fubuki:1.2), hololive, green eyes, (white hair), (fox tail), long hair, (fox ears:1.2), ahoge, (small breasts:0.8),
AND (4girls), masterpiece, best quality, best illustration, (indoors, scenery, hotel living room, couch, night:1.4),
(white turtleneck sweater, black leggings:1.4),
smile, (looking away),
(ookami mio:1.2), animal ears, tail, (wolf ears), wolf girl, wolf tail, yellow eyes, (black hair), red hair, streaked hair, (long hair), animal ear fluff, (small breasts:1.4),
AND (4girls), masterpiece, best quality, best illustration, (indoors, scenery, hotel living room, couch, night:1.4),
(red winter coat, black skirt, pantyhose:1.4),
(grin, happy:1.2), (looking away),
(nakiri ayame:1.2), hololive, bangs, black ribbon, gradient hair, hair ribbon, long hair, (ponytail), (white hair), oni, (oni horns:1.2), red eyes, red hair, ribbon, silver hair, small breasts,
<lora:style-yojiShinkawaStyle_offset:0.2> <lora:style-paintedMiniature_v35:0.4> <lora:style-mika_pikazo-v1:0.2>
Negative prompt: (worst quality, low quality:1.4), (depth of field, blurry, bokeh, shallow depth of field:1.5), (blurry background:1.2), (tanlines, white skin, pale skin, multiple bellybuttons, swimming pool), signature, watermark, username, artist name, text, censored, cropped, text, speech bubble, letterboxed, JPEG artifacts, lowres, bad anatomy, (bad hands, error, missing fingers, extra digit, fewer digits:1.4), cropped, worst quality, low quality, normal quality, 3D, black background, (3girls), nude, nsfw, bare legs, (bench), (bare shoulders), (bare arms), (skin-tight clothing:1.3), (cleavage), (looking at viewer:1.4), collared shirt, black gloves,  gloves, apple inc., bottomless, (lips, pink lips), (cityscape, skyline, city, city lights:1.4), 
Steps: 25, Sampler: DPM++ 2M Karras, CFG scale: 9, Seed: 447411579, Size: 768x432, Model hash: cd12b7cc22, Denoising strength: 0.45, Clip skip: 2, ENSD: 31337, ControlNet Enabled: True, ControlNet Module: openpose, ControlNet Model: control_sd15_openpose [fef5e48e], ControlNet Weight: 0.85, ControlNet Guidance Start: 0, ControlNet Guidance End: 1, Latent Couple: "divisions=1:1,1:4,1:4,1:4,1:4 positions=0:0,0:0,0:1,0:2,0:3 weights=0.2,0.8,0.8,0.8,0.8 end at step=30", Hires upscale: 2.5, Hires steps: 20, Hires upscaler: 4x-AnimeSharp, AddNet Enabled: True, AddNet Module 1: LoRA, AddNet Model 1: thickerLinesAnimeStyle_loraVersion(58c5f51b2b68), AddNet Weight A 1: 0.2, AddNet Weight B 1: 0.2

Divisions

These are the default settings:

divisions=1:1,1:2,1:2 positions=0:0,0:0,0:1 weights=0.2,0.8,0.8

This splits the image in half, straight down the middle as seen in both examples above.
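The settings string is just three comma-separated lists that line up entry by entry: one division, one position, and one weight per line of your prompt. A small sketch of reading it, using a made-up function name and only the format shown in the settings above (this is not the extension's own code):

```python
import re


def parse_latent_couple(settings):
    """Parse 'divisions=... positions=... weights=...' into (division, position, weight) tuples."""
    def field(name):
        match = re.search(rf"{name}=(\S+)", settings)
        if match is None:
            raise ValueError(f"missing {name}= in settings")
        return match.group(1).split(",")

    def pair(text):
        y, x = text.split(":")  # every pair is written y:x
        return float(y), float(x)

    divisions = [pair(t) for t in field("divisions")]
    positions = [pair(t) for t in field("positions")]
    weights = [float(t) for t in field("weights")]
    if not len(divisions) == len(positions) == len(weights):
        raise ValueError("divisions, positions and weights must have the same length")
    return list(zip(divisions, positions, weights))


defaults = parse_latent_couple(
    "divisions=1:1,1:2,1:2 positions=0:0,0:0,0:1 weights=0.2,0.8,0.8")
for division, position, weight in defaults:
    print(division, position, weight)
```

With the defaults, entry 1 is the whole image at weight 0.2, and entries 2 and 3 are both 1:2 divisions at x positions 0 and 1, each at weight 0.8.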

fuck math

You can use this tool to draw your desired divisions onto a grid and just have it spit the latent couple settings out for you!

I used this very helpful tool made by an anon to play with the division settings and see how the split changes in real-time and to make all the screenshots you see below.

The example in the next section just so happens to split the image differently, so this topic will continue there.

Can you use it with ControlNet?

Yes. Toko/Suisei upskirt above was made using openpose ControlNet and the posex extension.
yes.png

best quality, (2girls), kitchen, laundry room, washing machine, door, cabinet, vent \(object\), oven hood, suit, black necktie, (blood on clothes), maytag, large breasts, hololive, wallpaper \(object\), best shadow, official art, looking at viewer,
AND best quality, 2girls, kitchen, laundry room, washing machine, door, cabinet, vent \(object\), suit, black necktie, blood on clothes, yukihana lamy, hololive, light blue hair, pointy ears, yellow eyes, smug, drinking, mug, looking down, large breasts, black suit, white shirt, slouching, best shadow, official art, looking at viewer, straight-on, 
AND best quality, 2girls, kitchen, laundry room, washing machine, door, cabinet, vent \(object\), suit, black necktie, (blood on clothes), ((shiranui flare, hololive, pointy ears, red eyes, blonde hair)), afro, curly hair, tan skin, coffee mug, pointing to the side, large breasts, confident, best shadow, official art, looking at viewer, glance,
Negative prompt: fused digits, missing digits, paws, worst quality, low quality, depth of field, blurry, 3D face, photorealistic, cropped, lowres, jpeg artifacts, username, blurry, artist name, trademark, Reference sheet, letterbox, censored, censorship, comic, bad hands, bad feet, bad anatomy, error, missing fingers, extra digit, fewer digits, extra limb, missing limb, poorly drawn hands, mutated limbs, mutated feet, mutated hand, fused digits, claws, cowboy hat, multiple views, bad multiple views, extra arms, extra legs, fat, shelf, 1boy, 2boys, male focus, long fingers, white suit, refrigerator, microwave,
Steps: 35, Sampler: DPM++ 2S a Karras, CFG scale: 12, Seed: 1182590391, Size: 768x328, Model hash: 6f06efaec6, Model: GrapefruitJuice.pruned, Denoising strength: 0.5, Clip skip: 2, ENSD: 31337, ControlNet Enabled: True, ControlNet Module: hed, ControlNet Model: control_hed-fp16 [13fee50b], ControlNet Weight: 1, Latent Couple: "divisions=1:1,1:1.5,1:3 positions=0:0,0:0,0:2 weights=0.3,0.9,0.9 end at step=25", Hires resize: 1280x544, Hires upscaler: lollypop

tmpbb1mv6vz.png mpv-shot0102.png

Hey, this isn't split right down the middle!

I know

I ain't reading all that shit

You can use this tool to draw your desired divisions onto a grid and just have it spit the latent couple settings out for you!
Skip TL;DR

Screenshot-2023-03-30-151831.png
In my opinion, the way it handles the divisions and positions makes no sense whatsoever, but here is how I understood it:

1:1 for the first division

(almost) always. This covers the entire picture with your first line 2girls prompt

akshully...

There is nothing stopping you from dividing the image multiple times, and subdividing it multiple times. But I'm not about to try and explain it. You could, for example, use latent couple to split the image up into very specific regions while e.g. upscaling and "make your own" super SD Upscale. In theory at least...

1:1.5 and 1:3 for divisions 2 and 3

Each ratio is y:x, and each number is a divisor: a division of a:b is 1/a of the whole image's height and 1/b of its width.

1:1.5 = 1:1 y, 1:1.5 x = 100% of y, 2/3 of x
1:3 = 1:1 y, 1:3 x = 100% of y, 1/3 of x

Yes, it's as dumb as it sounds. Had I done e.g. 2:1.5 I'd have only gotten Lamy in the top half.

Screenshot-2023-03-30-152759.png

OK, what about the positions?

It gets dumber

positions=0:0,0:0,0:2

0:0 is easy enough. The origin is the top, and the left. Again, they're y:x, but now they're being compared to the sizes of the divisions themselves and are independent of the other positions. The only thing the position value is compared against is the size of that particular division itself.

0:2 = y origin is still 0, x origin is twice the size of the division itself away from the left
division is 1/3
2x 1/3 is 2/3
so it's on the right, flush

And now Lamy is in the bottom half

Positions: 0:0,1:0,0:2

Because the 1: in the second position tells it to place this division one entire height of the division itself away from the top of the image
Screenshot-2023-03-30-153756.png

Here's another stupid, exaggerated example. Yes, I know it overlaps.

Screenshot-2023-03-30-153231.png

Divisions: 1:1,1:1.5,1:4
Positions: 0:0,0:0,0:0.5

The size of the Flare division is now 1/4 of the width of the image because of the :4 - the 1: only affects the height!
Because the position is 0:0.5, the division starts 1/8th of the image width away from the left. Half of 1/4th is 1/8th.
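Putting divisions and positions together, here is a rough sketch that works out the pixel box each pair would cover. It follows the interpretation in this section, not the extension's real implementation. `region` is a hypothetical helper, and the 768x512 canvas is just an assumed size for illustration.

```python
from fractions import Fraction


def region(division, position, width, height):
    """Return (left, top, right, bottom) in pixels for one division/position pair.

    Assumptions: division is 'y:x' divisors of the whole image, position is
    'y:x' offsets measured in units of the division's own size.
    """
    div_y, div_x = (Fraction(v) for v in division.split(":"))
    pos_y, pos_x = (Fraction(v) for v in position.split(":"))
    cell_h = Fraction(height) / div_y   # height of this division in pixels
    cell_w = Fraction(width) / div_x    # width of this division in pixels
    top = pos_y * cell_h
    left = pos_x * cell_w
    return tuple(round(v) for v in (left, top, left + cell_w, top + cell_h))


# The exaggerated example, on an assumed 768x512 canvas
divisions = "1:1,1:1.5,1:4".split(",")
positions = "0:0,0:0,0:0.5".split(",")
for d, p in zip(divisions, positions):
    print(d, p, region(d, p, 768, 512))
```

On that canvas the 1:4 division is 192 px wide and starts 96 px from the left, i.e. 1/8th of 768.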

Hope you paid attention in math class!

ogey, how 2girls with LoRA?

Keep reading

What is Composable LoRA?

I'm still writing this

  • Composable LoRA allows you to apply LoRA to a specific part of the prompt
  • It also uses AND just like 2shot
  • It also works with 2shot - you can use both at the same time

Here's a really silly example - the first time I tried using it

can I give Watame IRyS's horns and nothing else?

Kinda
kek.jpg

You don't need to see any more of my ridiculous prompts, but here's the important part:

big stupid prompt, silly words, center opening my beloved, booba, etc.
AND horns <lora:IRyS-2.51_6epoch:0.5>

Both images used the same long-ass prompt. The one on the right had the AND line added after it, so the IRyS LoRA was applied only to the "horns" part of the prompt.
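If you end up scripting prompts, the structure is just "base prompt, then AND, then the part the LoRA should stick to". A tiny sketch, assuming nothing beyond the `<lora:name:weight>` tag and AND syntax shown above. `lora_tag` and `compose_prompt` are made-up helper names, and this only builds the text, it doesn't talk to webui or the extension.

```python
def lora_tag(name, weight):
    """Build a LoRA tag in the <lora:name:weight> form used in the example."""
    return f"<lora:{name}:{weight}>"


def compose_prompt(base, *subprompts):
    """Join a base prompt and extra subprompts with AND, one per line."""
    return "\nAND ".join([base, *subprompts])


prompt = compose_prompt(
    "big stupid prompt, silly words",
    "horns " + lora_tag("IRyS-2.51_6epoch", 0.5),
)
print(prompt)
```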

What is ControlNet?

whAt cONtroLneT moDel iS beSt?

use the right tool for the job

There is no "best" ControlNet model. Each one does something different, and is good for something else, and will work better or worse depending on the input image used.

mpv-shot0099-768.png

canny

  • Edge detection. Think of it like tracing
    Canny-tmpmuo73hif.png Canny result
    • It didn't work well here, since the pre-processor couldn't see the outlines clearly. Not enough contrast or something. The output image bears little resemblance to the source.
  • canny works better on 2D art with clear line art.

kajou-ayame-and-blue-snow-megami-magazine-and-1-more-drawn-by-fujii-masahiro-c840322ae113f944a202.jpg Source canny pre-processor output tmp2w97pp1t.png Result OH HO HO HO Result

best quality, houshou marine, hololive, akasaai, (((panties on head))), 1girl, solo, leg lift, kajou ayame, blue snow, (naked cape), action, smug, ofuda, shimoneta to iu gainen ga sonzai shinai taikutsu na sekai, red eyes, ;D, ^ ^, red hair, sidelighting, best shadow, official art, promotional art, megami magazine, slim legs, wide hips, groin tendon, narrow waist, floating hair, dutch angle, underboob, fang, light rays, light particles, soles, barefoot, one eye closed, ((bottomless), ass), outstretched arms, choker, cape, bed sheet, fleeing, explosion, outstretched hand, convenient censoring, (bare hips, bare ass), butt crack, completely nude, see-through silhouette, 
Negative prompt: fused digits, missing digits, paws, (worst quality, low quality), lowres, jpeg artifacts, username, blurry, bokeh, artist name, trademark, Reference sheet, letterbox, censored, censorship, comic, bad hands, bad feet, bad anatomy, error, missing fingers, extra digit, fewer digits, extra limb, (missing limb), poorly drawn hands, mutated limbs, mutated feet, mutated hand, fused digits, claws, cowboy hat, multiple views, bad multiple views, extra arms, extra legs, fat, leather, snow, snowing, winter, winter clothes, ((nipples, pussy)), mosaic censoring, eyewear on head, ribbon, serafuku, sailor collar, ascot, on bed, bedroom, jacket, buttons, epaulettes, jacket on shoulders, paws, claws, broken finger, leggings, bra, bikini, rope, shibari, strap slip, panty straps, thigh strap, garter straps, bridal garter, tanlines, 
Steps: 30, Sampler: DPM++ SDE Karras, CFG scale: 11, Seed: 2806942403, Size: 768x1152, Model hash: d179dccea9, Model: mighty_mix.pruned, Clip skip: 2, ENSD: 31337, ControlNet Enabled: True, ControlNet Module: canny, ControlNet Model: control_canny-fp16 [e3fe7712], ControlNet Weight: 1.05
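If you want to pull the settings out of a line like that (e.g. to compare two gens), here is a minimal sketch that splits it into a dictionary. `parse_settings` is a hypothetical helper, not something webui provides. It assumes no value contains ", " itself, which holds for this line but won't hold for every settings line out there.

```python
def parse_settings(line):
    """Split a 'Key: value, Key: value' settings line into a dict of strings."""
    settings = {}
    for pair in line.split(", "):
        key, sep, value = pair.partition(": ")
        if sep:  # skip fragments that are not 'Key: value'
            settings[key.strip()] = value.strip()
    return settings


line = (
    "Steps: 30, Sampler: DPM++ SDE Karras, CFG scale: 11, Seed: 2806942403, "
    "Size: 768x1152, Model hash: d179dccea9, Model: mighty_mix.pruned, "
    "Clip skip: 2, ENSD: 31337, ControlNet Enabled: True, "
    "ControlNet Module: canny, ControlNet Model: control_canny-fp16 [e3fe7712], "
    "ControlNet Weight: 1.05"
)
settings = parse_settings(line)
width, height = (int(v) for v in settings["Size"].split("x"))
print(settings["Sampler"], width, height)
```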

Yes, ControlNet means you are far more likely to get away with a higher base resolution without getting weird artifacts. I actually got better results with this particular example just generating directly at 768x1152 than I did with e.g. trying to use Hi-Res Fix on a 512x768 base gen.

HED

  • Also edge detection. Less like tracing, more like an edge detection filter you might use in Photoshop.
    HED-tmpkxkdkyfn.png 00017-1657544479-best-quality-kaela-kovalskia-hololive-hololive-indonesia-I-m-iron-man-iron-man.png 00003-1420779218-best-quality-houshou-marine-hololive-I-m-iron-man-iron-man-cosplay-snappin.png
    • This worked really well. The pre-processor could see the outlines we wanted it to. It worked better on Kaela, go figure.

openpose

  • For...you guessed it, copying poses.
  • If you use the openpose pre-processor/module, you should be using it on an image of real people as that is what it was designed for.
  • Otherwise, you can use an extension like posex to manipulate your own poses.
    Open-Pose-tmp9a3l0atp.png 00007-1420779218-best-quality-houshou-marine-hololive-I-m-iron-man-iron-man-cosplay-snappin.png
    • Again, the source material just wasn't right for the openpose pre-processor. It could see his eyes and his ears very clearly, but got thrown way off by his hand - the two lines at the bottom are where it thought his arms were. It's not a "normal" image of people, it's covered in Hollywood SFX. You'd have had better luck manually posing a stick figure using posex.

What is SD Upscale?

What is Dynamic Thresholding?

What is cutoff?

Samplers, Settings, Upscaling, Etc.

TBD

Samplers

  • DPM++ Samplers
    • Fewer steps needed
  • Euler, Euler a
    • Anything beyond ~40 steps is unnecessary
    • "Good" results as low as ~28
  • DDIM
    • Go up to 100 steps if you want, it will keep doing things
  • Other
    • Dunno

Settings

Resolution

  • Most models you use will work best at "base" resolutions like 512x768, 512x512
    • Stick with that if you're in doubt
  • Landscape gens = you have to prompt better
    • Prompt an "impossible pose" at 768x512?
    • hello fleshblobs my old friend
  • Larger base gens, like 768x768 = you have to prompt even better

Abbreviations, Terms, Etc.

venv

  • [Python] Virtual Environment
  • If you're running webui locally and you need or want to pip install something for webui or for an extension, you really ought to be doing it with your venv active
  • screenshots
  • shift right click context menu can differ
  • activate.bat / activate.ps1
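Not sure whether your venv is actually active before you pip install something? A quick check, as a sketch: in a venv created by Python's own `venv` module, `sys.prefix` differs from `sys.base_prefix`. `in_virtualenv` is a made-up helper name, and very old virtualenv releases worked differently, so treat this as a sanity check rather than gospel.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True if the given (or current) interpreter prefixes indicate a venv."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("venv is active:", sys.prefix)
    else:
        print("no venv active, pip would install into:", sys.prefix)
```

Run it with the same python you're about to pip install with.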

git

  • Go on, git. Go. Leave!
  • Use it to upgrade, downgrade, install webui itself and extensions

    While at a command prompt, in your .../stable-diffusion-webui/ folder

    git pull
    Updates WebUI
    git checkout 1234abc
    "Downgrades" to hypothetical commit 1234abc

    While at a command prompt, in your .../stable-diffusion-webui/extensions/ folder

    git clone https://github.com/someguy/someextension.git
    Installs someguy's someextension for webui

Consider all following explanations overly simplified

TI, embed

  • Textual inversion
  • embedding
  • Small .png or .pt files that make it easier to generate something the model you're using was already capable of

I want some

all of /vtai/'s embeds
You can find more on places like civitai or huggingface

LoRA

  • Low-rank Adaptation
  • Think of them like "mini models"
  • Can "teach" whatever model you're using to generate something it couldn't, like:
    • a different character
    • a new concept
    • an artist's style

I want some

Too many places to list...there are some links in the OP

What's an overbaked LoRA?

go to civitai
get LoRA
set strength to 0.3

That's an overbaked LoRA.

Warning

Under Construction lol

Anything that is already better answered or explained in the OP or a guide linked there won't be rehashed here

Pub: 29 Mar 2023 18:57 UTC
Edit: 04 May 2023 01:58 UTC