Inpainting
Inpainting is selectively doing Img2Img. No really. I feel like a lot of people don't realize this. All the notes from Img2Img still apply. Inpainting adds 'padding' to the list of variables. This padding is a margin added to the mask you draw, and it determines how large the 'context' of the generated area will be. Say I draw a mask that's about 100px big around a foot and add 0 padding. Now the AI only sees said foot, gets a prompt that talks about dork cats, and attempts to fill this 100px space with the generation capability of a 1024 resolution image. The result is that it will likely place the same character from the full generation into this 100px space, albeit highly detailed. I usually call this a pseudolimb; it can be an entire char, it can be a partial limb.
We can adjust the prompt to make the AI realize we are talking about a foot. If we remove all references to anything not present in this space we will probably get a foot, but it will not know which way to go! It still only knows about this 100px space. Does it know where the leg is coming from? Probably not. If we raise the padding, say to 128px, it will enlarge this context space. Together with an appropriately chosen denoise, this is often more than sufficient to avoid garbled generations during inpainting.
The trade-off to a higher padding is fidelity. Since we are no longer dedicating our 1024px generation power to a 100px space, we don't get as much detailing. However, in a lot of cases this is negligible. In fact, depending on the resolutions you are working with, it's very viable to always leave the padding on the maximum of 256. I only work at around 2048x2048 resolutions and it is more than sufficient there.
The reason why I don't often change the prompt during inpainting is largely related to convenience. A high padding is powerful in getting the AI to understand what it's supposed to generate, regardless of what the rest of the prompt says. For areas that require more detailing, I go lower.
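To make the padding idea concrete, here's a rough sketch of how "only masked" inpainting picks what the model actually sees: take the mask's bounding box, grow it by the padding, and that crop gets generated at the model's base resolution. This is illustrative pseudologic, not the WebUI's actual internals.

```python
def context_region(mask_bbox, padding, image_w, image_h):
    """mask_bbox: (left, top, right, bottom) of the drawn mask in pixels."""
    left, top, right, bottom = mask_bbox
    # Grow the box by the padding on every side, clamped to the image edges.
    left = max(0, left - padding)
    top = max(0, top - padding)
    right = min(image_w, right + padding)
    bottom = min(image_h, bottom + padding)
    return left, top, right, bottom

# A 100px mask around a foot in a 2048x2048 image:
print(context_region((1000, 1800, 1100, 1900), 0, 2048, 2048))    # AI sees only the foot
print(context_region((1000, 1800, 1100, 1900), 128, 2048, 2048))  # sees the leg and ground too
```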
To illustrate some of the many applications I will just showcase a series of adjustments to an image.
Feel free to download the image and gen side by side or something; I removed the loras for this one so it's plug and play. You can also run AutismMix for this, I doubt it makes a huge difference.
Unless otherwise noted, the prompt is always this.
Prompt:
Model: PonyXL
score_9, score_8_up, score_7_up, score_6_up, source_furry, realistic,
detailed background, atmospheric lighting,
medieval, fantasy, tower, forest, city, standing, raised arms, upper body, celebration,
female, anthro, kemono, [siamese], round glasses, librarian, summer, skirt, love, blush, cleavage cutout, bob cut, white hair, medium breasts, happy, smiling, 4 toes, eyes closed, <3, open mouth, fangs, tail, clothes, [white|green] clothing,
Negative:
blurry, low res, text,
multi anus,
score_3,score_2,score_1,
Dork Cat with no loras... dios mio... la creatura de base ponyXL...
General stuff
Inpaint Bugs:
Drawing masks in Inpainting is oddly buggy, at least for me on Firefox. Occasionally the image will flicker, some of the controls don't work well, sometimes only a small rectangle shows. Usually this can be solved by pressing the X and re-sending the image. Sometimes it's switching tabs. Sometimes it's my browser window size! Who knows! Had the same issue on the regular WebUI too.
First of all, drop the image in PNG Info -> Send to Inpainting.
- Resize Mode: Leave default.
- Mask Blur: Like the feathering option in image editors. This smooths out your mask edges. 4 is fine. Could go a bit higher at higher res.
- Mask mode: Self Explanatory.
- Masked Content: IMPORTANT! This determines what the mask actually does. "Fill" fills the masked area with colors from the image, which is good if you need to generate something different in that spot. "Original" takes what's already there as its base, which is good for refining details. I haven't messed with the other two (latent noise, latent nothing) much.
- Inpaint Area: IMPORTANT: "Whole picture" makes it do an actual regular Img2Img pass on the image. If you are inpainting on a Hires.Fix or upscaled image this is bad, since you always want to match your model's resolution. Most if not all of inpainting's strengths lie in the second option, "Only masked". This lets you selectively regenerate a region; we ALWAYS use this.
- Padding: IMPORTANT: Often greatly overlooked. This denotes how much of the surrounding area the AI takes into account. If we mask a face and leave it at 32, it will dedicate the majority of its 1024x1024 resolution to genning the face. However, we often need to give the AI more "context", and having this too low is a primary cause of pseudolimbs or badly oriented body parts. In the case of the face it might start making the character look the wrong way because the object they are looking at is offscreen. You can easily tell how large this area is when looking at the preview while generating. This is often a value that needs to be changed case by case. For detailing you want it as low as possible; for removing or changing major parts of the image, maxing it out is very good. Even for detailing I keep it fairly high.
- Soft Inpainting: I believe this is a Forge feature? It blends inpainted content more closely into the original image, which is neat, but adds a bunch of generation time. I haven't used it enough to make a strong statement. Keep it off for now.
- The rest you should know. ALWAYS keep the resolution at your model's base res. When iterating inpaint passes, the resolution resets. For ease of use the dork cat above is already at SDXL base res.
- Sampler: IMPORTANT: DDIM is a sampler specifically good for inpainting and you should give it a try. It's context-sensitive and tends to blend much better into surrounding areas. If DDIM doesn't work well for you, use the same sampler you used for your base generation to avoid mismatches.
- Make sure to change the seed to random. If you'd rather drive all of this from a script, see the API sketch right below this list.
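Here's a minimal sketch of what those settings map to on the AUTOMATIC1111 img2img API, which Forge also exposes. This assumes a local WebUI launched with --api; the file names are placeholders.

```python
import base64
import requests

# Encode the base image and the drawn mask (placeholder file names).
with open("dork_cat.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
with open("mask.png", "rb") as f:
    mask_b64 = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [image_b64],
    "mask": mask_b64,
    "prompt": "score_9, score_8_up, ...",  # the full prompt from above
    "negative_prompt": "blurry, low res, text, multi anus, score_3, score_2, score_1,",
    "mask_blur": 4,                   # Mask Blur
    "inpainting_fill": 1,             # Masked Content: 0 = fill, 1 = original
    "inpaint_full_res": True,         # Inpaint Area: only masked
    "inpaint_full_res_padding": 128,  # Padding
    "denoising_strength": 0.5,
    "sampler_name": "DDIM",
    "width": 1024,                    # always your model's base res
    "height": 1024,
    "seed": -1,                       # random seed
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload).json()
with open("inpainted.png", "wb") as f:
    f.write(base64.b64decode(r["images"][0]))
```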
Iterations.
Often during inpainting you get candidates that are slightly better than what you started with. An example: you're trying to fix a hand with 6 fingers, and one inpaint gen gets you 5, but they still look kinda wonky or washed out. In this case you can send your gen straight back into inpaint using a button below the finished image. Importantly, this resets some values in the UI, such as the resolution. You always want to gen at your model's res; for SDXL that's 1024x1024, and for SD1.5 it's usually between 512 and 768.
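Scripted, the iteration loop is just feeding the previous output back in while pinning the resolution. A rough sketch, continuing the `payload` setup from the API sketch above:

```python
# Continues the `payload` / `requests` setup from the earlier API sketch.
best = image_b64
for denoise in (0.6, 0.5, 0.4):  # refine a little gentler each pass
    payload["init_images"] = [best]
    payload["denoising_strength"] = denoise
    # Pin the resolution every pass; the UI equivalent silently resets it.
    payload["width"], payload["height"] = 1024, 1024
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload).json()
    best = r["images"][0]  # in practice: look at it and keep the best candidate
```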
Inpainting to remove/regenerate
Gacha
As with most inpainting tasks, it will rarely work on the first try. Be prepared to gen a couple of times until you see proper results. Remember that most of this is about understanding how padding and denoise work. Some of it is bound to be trial and error.
Easy Fixes: Remove smaller mistakes.
Let's try an easy one first.
Mission: Remove the heart.
Use the mask tool to draw a circle over the heart. Doesn't have to be perfect.
Change to these settings:
It's that easy.
Why no prompt change?
We could remove the "<3" from the prompt, and for something more major we probably would need to adjust the prompt. Here it's simple enough that you don't have to bother. There is a "proper" way and there's a "convenient" way. This one's the latter.
Why Fill?
We use fill to get some random colors into the masked area to remove the composition, making it more likely for the AI to regenerate it with something more coherent in relation to the rest. If we use original it is more likely to use the heart in its composition.
Why that size Padding?
Could be anything for a simple region like this. Since the sky has basically no detailing we can easily max it out. If we generate with too low a padding in conjunction with too high a denoise we might get pseudolimbs.
Why that denoise?
Together with fill we need a sufficiently high denoise to actually blend a sky over the fill color. Fill takes random colors of the image, it doesn't ensure it's gonna be blue.
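Expressed as deltas against the API payload sketch from earlier, the settings for this fix look roughly like this. The exact denoise value here is my guess; anything high enough to paint sky over the fill colors works.

```python
payload["inpainting_fill"] = 0             # Fill: wipe the heart's composition
payload["inpaint_full_res_padding"] = 256  # sky has no detail, max the padding out
payload["denoising_strength"] = 0.7        # high enough to blend sky over the fill
```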
Major fixes: Regenerate larger regions
Now for a harder one:
Mission: Remove the garbled background chars.
I suspect we need to adjust the prompt this time. Why? The masked padding area would not catch most of the dork, making the AI believe it needs to place the prompt instructions into the masked area. If you try it with the same settings as the last example, that's exactly what you get too: mostly just pseudolimbs, in this case more cats. I chose to draw a generously sized mask over what I presume are market stands as well, since I already suspected the AI would have trouble blending them in properly, and I don't care about them existing.
Simply remove the prompt from "standing" to the end. You may also remove "forest", as that area shouldn't have one. Generally speaking this is how you adjust your prompts to the masked area. Issues only tend to arise from this if part of the prompt is somehow related to the masked area, but the dork and the background are fairly well separated here.
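For reference, the trimmed prompt looks like this:
Prompt:
score_9, score_8_up, score_7_up, score_6_up, source_furry, realistic,
detailed background, atmospheric lighting,
medieval, fantasy, tower, city,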
My generations didn't deviate enough, so I cranked the denoise a bit higher to 0.75. I did this after the 3rd attempt or so.
This was the 5th gen. It doesn't make a whole lot of sense, but for the tutorial's sake this is good enough.
Why original now? Aren't we still attempting to get something "new" here?
Yes, but I want to keep the general layout intact. We have a higher likelihood of keeping the floor for example.
Why the higher denoise?
Tried a bunch at 0.7; some came out muddy or looking "half genned". That's usually when you need to steer higher. As a base rule, you rarely ever go beyond 0.8 or under 0.2.
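As payload deltas against the earlier sketch (only the fill mode and denoise change; the padding stays generous like before):

```python
payload["inpainting_fill"] = 1     # Original: keep the general layout, e.g. the floor
payload["denoising_strength"] = 0.75  # cranked up after 0.7 came out muddy
```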
Finicky stuff: Fixing hands
Let's fix her hand.
If you changed your prompt from the last example then set it back to the full one.
Draw a mask over her left hand, doesn't have to be perfect. Since the hand is surrounded by sky we can freely use that area; I often need to be more strict when focus regions overlap, such as the face.
Now we have two paths:
1: We don't care about detailing too much, we simply want to fix the hand to not look ass.
2: We care enough about the detailing that we turn our padding down and let the AI really focus on the hand.
Luckily for us, there are a couple of arguments that make the first path much more enticing. First, the hand occupies a fairly small region of the image, meaning that even with high padding it would receive more than enough attention. Second, path 2 would be finicky given how far away the hand is from the rest of the image.
This is a classic example of a trial-and-error run where you need to find the correct padding-to-denoise ratio. Hands do need a certain degree of detail to look anatomically correct, but if they are disconnected from the rest they lack the context (padding) to really detail them.
With the mask drawn over the hand:
After a few tries I got this.
Still looks like ass, but it's better than before. One issue particular to this image is the lack of loras on base PonyXL, in addition to the general orientation of the hand, which requires a high padding so the AI knows which way it goes. What I'm getting at is that this is bound to be a bit of a gacha and needs multiple tries. I also attempted a detailing pass for this, with a low padding and low denoise to add detail, but it didn't significantly improve the hand. It's still something worth trying when you have to progressively improve an area through iterations.
Summary:
- Inpainting is largely about finding the correct ratio between padding and denoise. High padding adds more context, making the AI smarter, but sacrifices detail. Realistically speaking, this loss of detail is negligible for major regions such as the background and fixing mistakes, given the resolutions we are actually generating at.
- Use "Only Masked"
- Use the base resolution of your model.
- Occasionally you want to do multiple passes, sending the best candidate back into inpainting and going from there, often at a lower denoise to "refine" the area further.
- "Fill" can help if you need to significantly alter an area and basically let the AI create something from scratch. This works well with rough background areas that you don't care about, as well as areas with a highly guided prompt to progressively add something specific. In all other cases "Original" makes more sense.
- A lot of this is "feels" based, unfortunately.
- Adjust your prompt if the area you are generating in does not get enough "Context". Is the context reliant on the surrounding area? Up the padding. Is the area you are generating in very disconnected from the overall prompt? Adjust the prompt. Watch the generation preview to judge: "Is the AI capable of 'getting' what it needs to generate here?".
- For background adjustments that do not need high fidelity you can make your life much, much easier by just maxing out the padding. In fact, unless your resolution is very high, like 3x-4x the model resolution, the potential gain from a lower padding hyperfocusing on one area is likely not even noticeable.
Inpainting to refine
Since inpainting effectively regenerates an area and dedicates its entire thinking space to that masked area, it can also add details. Depending on your upscaling and resolution this can be very, very significant, and sometimes very, very tedious. Let me showcase one inpaint task that's super consistent, super easy, and very impactful.
Refining faces.
Faces often don't get the care they deserve. Since they occupy a smaller region of the image relative to the rest of the body, they tend to lack details in key areas. If you're thinking "hey, isn't this what Adetailer does?": yes. This is exactly the same. I just don't like having it run while genning and like to finetune the results. You can also extend this to everything, not just faces. Need to see each individual pore on someone's asshole? Inpainting's got you covered.
Once again your mask can safely overshoot a little; often it makes sense to, since you are giving the AI a bit more wiggle room.
5 total attempts at various Denoise values. First gen with 0.5 denoise was this, and it felt good enough for me.
Most of the finicky stuff I usually encounter here is related to finding a good denoise value. Set it too low: no changes. Set it too high: too many. Since the direction of the face is very clearcut, neither the padding needs to be high nor does the prompt need to change much. Once again the teeth are a decent indicator of quality. The left fang is a bit misplaced, but again, for the tutorial's sake I'm gonna let it pass.
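If you want to take the guesswork out of finding the denoise, a sweep via the API saves one candidate per value; this again continues the earlier payload sketch, with the face mask loaded into `mask`. The padding value here is my assumption of a "modest" setting.

```python
# Sweep denoise for a face-detail pass and pick the winner by eye.
payload["inpainting_fill"] = 1            # Original: refining, not replacing
payload["inpaint_full_res_padding"] = 64  # face direction is clear, modest padding suffices
for denoise in (0.3, 0.4, 0.5, 0.6):
    payload["denoising_strength"] = denoise
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload).json()
    with open(f"face_d{denoise}.png", "wb") as f:
        f.write(base64.b64decode(r["images"][0]))
```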
There's a ton of areas where this is incredibly useful. Faces, eyes, mouth, nipples, penis, pussy, etc.
Something you might want to try as an exercise is to change her eyes from closed to open.
Small convenience trick for adjusting your prompt.
One thing I've been doing that seemingly works well is to wrap the entire prompt in a (:0.9) weight bracket and then prepend the detailing you want at a high strength as the first block (there's a small helper sketch after the prompt below). This ensures the detailing is in the first token block while the rest is still somewhat there, albeit at lower weight. Remember, since we want to work at lower paddings we kinda have to nudge the AI to create the correct thing in the area.
Prompt:
(puffy nipples, nipple outline:1.3),
(score_9, score_8_up, score_7_up, score_6_up, source_furry, realistic,
detailed background, atmospheric lighting,
medieval, fantasy, tower, forest, city, standing, raised arms, upper body, celebration,
female, anthro, kemono, [siamese], round glasses, librarian, summer, skirt, love, blush, cleavage cutout, bob cut, white hair, medium breasts, happy, smiling, 4 toes, <3, open mouth, fangs, tail, clothes, [white|green] clothing:0.9)
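If you do this a lot, a trivial helper saves the retyping. `detail_prompt` is just an illustrative name, not anything built into the UI:

```python
# Wrap the base prompt in a (:0.9) bracket and prepend the detail tags
# at high strength, per the trick described above.
def detail_prompt(base_prompt: str, detail_tags: str,
                  up: float = 1.3, down: float = 0.9) -> str:
    return f"({detail_tags}:{up}),\n({base_prompt}:{down})"

print(detail_prompt("score_9, score_8_up, ...", "puffy nipples, nipple outline"))
```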
Little mask over the breasts. You can double dip by just masking the full chest area to get more general detail at the same time.
A side by side of the final result. As PNG for once. I want to reiterate this doesn't look crazy good, but I think it showcases some general usecases for inpainting. Detail passes like these are more visible on things that actually go into detail, and this is a fairly flat image.
PDXL without loras lmao
| Base | Inpainted |
|---|---|
Summary
- When inpainting to add details, a lower padding dedicates more generation space to the area, allowing for more intricate details.
- Since low paddings make it progressively harder for the AI to know what it's generating, you may need to nudge it more by adjusting the prompt.
- For detailing passes, very high denoise values are often unnecessary.
- 0.3-0.5 is a safe denoise value for adding details without corrupting the image with pseudolimbs.
- Iterate! Get a decent gen? Send it back into inpaint, keep the mask, gen again. Maybe lower the denoise slightly in conjunction with a slightly bigger mask to 'smooth out' the area.
General stuff
What I do.
Might expand this.
- Start with a prompt template.
- Add prompt data progressively, hit generate.
- While it's generating, I look whether I wanna add/subtract something; if yes I hit 'Skip' to cancel the gen.
- Progressively do this until I get what I want.
- Generate 3-5 candidates.
2 Choices, depending on how much I like it.
- If it's good, I hit Hires.Fix with 4x_NMKD-Siax_200k at 0.3 denoise and either 1.5x or 1.7x res. If it's great, I do an UltimateSDUpscale at 2-3x res, 0.2-0.3 denoise instead.
- Inpaint anatomy errors away, noise, shit that makes no sense, basically fix it.
- Give the face an inpaint pass for detailing. I basically always do this.
- Detailing of the focus points of the image. This is often the limbs and tits/pussy/penis.
- Post.
- ???
- Get (you)'s.
Other options:
- Instead of running Hires/USD-Upscale, run a regular upscale from the Extras tab, then inpaint the details into the most important areas. The downside is you might end up with strong fidelity mismatches where some areas are way more detailed than others. A combination of Hires -> raw upscale might work here.
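For the raw-upscale route, here's a sketch against what I believe is the extras endpoint of the AUTOMATIC1111 API (Forge exposes it too); again assuming a local WebUI launched with --api, with placeholder file names.

```python
import base64
import requests

# Plain 2x upscale with no diffusion pass, then inpaint details afterwards.
with open("final.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "image": image_b64,
    "upscaling_resize": 2,              # 2x raw upscale
    "upscaler_1": "4x_NMKD-Siax_200k",  # any upscaler installed in your UI
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/extra-single-image", json=payload).json()
with open("upscaled.png", "wb") as f:
    f.write(base64.b64decode(r["image"]))
# ...then run inpaint detail passes on the faces/hands/focus points as described above.
```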