OC creation and character consistency with NAID V3 (and other similarly capable models)
Updates
01.01.2024:
-Added Tips, Tricks & Insights
-Improved the structure
16.01.2024:
-Thanks to Granitehands in my Discord for pointing it out: I made a tiny adjustment to call the main technique this guide is about what it is, reference insets
21.06.2024:
-Fixed all the broken image links
23.06.2024:
-Added a section about zero shot generations
27.06.2024:
-Added a section about vibe transfer (and how it's not useless but also not that great)
09.01.2025:
-Edited the vibe transfer section slightly, no big new info
Introduction & Theory
The Sky Is The Limit (With Some Caveats)
This rentry is here as a full-fledged guide to making your own OCs in NAID V3 (and an image editor of your choice, primarily for stitching things together, maybe for a little bit of drawing if you're so inclined), and then making more art of them. In a "sky is the limit" kind of way, with very, very few caveats.
(I am using NAID V3+Krita for the whole tutorial and my own creations, and if you're using NAI... Opus is insanely valuable here since there is a ton that can be done with free generations. But this should be doable with other similarly capable models as well, and Krita can also be replaced at your discretion.)
One of those caveats is obviously what the AI has seen, but even this can be worked around to a fair degree. Has the AI ever seen bracketed slit pupils that look like (|)? I severely doubt it, and if it has, then only a handful of times. And it can still be done.
The other caveat... this requires quite a bit of work and understanding. And theory. But no LoRAs or anything beyond that. So let's get that theory out of the way first; I will try to keep it reasonably quick and simple while still giving it sufficient depth.
When you hear character consistency, you might typically think of two ways to get there: either one of the aforementioned LoRAs trained on that character, or the base model understanding the character well enough to make it. LoRAs need to be trained, which can be a bit of a hurdle if you don't have images of the character to train on, and though NAID V3 is pretty damn good at what it knows, you can't use LoRAs with it as it stands. Nor have the devs yet implemented any other fancy idea. As you'll see, that is surprisingly fine, since we're about to implement such a fancy idea here ourselves.
And maybe you are already thinking about a certain blog post by Anlatan: https://blog.novelai.net/tutorial-en-creating-consistent-characters-with-novelai-diffusion-anime-female-538b4b678a4e
Now that is a good read, but let's say your character has demon horns, multiple horns? Such as these ones in particular:
Yeah... just a simple prompt isn't gonna cut it here. No matter how good your prompt is, there's nothing you can write with 225 tokens, heck probably not even with 2k+ tokens in a newer model, that'd truly get this consistent just from text. This isn't even a "skill issue" on any model's part; written language itself isn't sufficient here. How else would you put this exact thing into text, short of actually using formulas to describe the precise physical shapes? Of course your prompt may occasionally produce something like this if you pray... but why pray if there is a proper path we can take?
Something Better Than Text
But what will help us is inpainting and the AI's own vision and understanding of what it sees. Think of reference sheets and having the same character on the canvas more than once. The AI can do that:
And what if we inpaint over one of those views of Yae? If you think it'll probably generate another fitting view, you would be right. Especially if we have a visual seed, more on that later. Let's try to get a blue bodysuit. Lots of repetition and brackets:
...Yeah nah. Okay we got blue in but the AI clearly still heavily referenced the other stuff in the image.
This isn't really a curse, it will be the weapon we'll wield in this guide, because now you might already have a good hunch where this is going.
When inpainting you can use some of the visual space on your image to get points across that the prompt won't be adequate for.
If you've got an image that is a decent enough reference of your character, and a scene to paint it into, then you can combine those, load the whole image into the AI, leave the reference untouched and inpaint your character where you please. As such, I would generally recommend putting the references you want the AI to use on the image, with borders around them to distinguish them from the surroundings if necessary.
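(If you'd rather script the stitching than do it by hand in Krita, here's a minimal Pillow sketch of the idea; the file names, inset placement and border width are just placeholders to adapt to your own canvas.)

```python
# Minimal sketch: build a canvas with a bordered reference inset next to the scene.
# File names and the border width are placeholders.
from PIL import Image, ImageOps

scene = Image.open("scene.png").convert("RGB")          # the scene to paint the character into
reference = Image.open("reference.png").convert("RGB")  # your character reference

# Give the reference a solid border so the AI can tell it apart from the scene.
reference = ImageOps.expand(reference, border=8, fill="black")

# Scale the reference to the scene height so it only occupies a strip on the left.
ref_height = scene.height
ref_width = int(reference.width * ref_height / reference.height)
reference = reference.resize((ref_width, ref_height))

# New canvas: reference inset on the left, scene on the right.
canvas = Image.new("RGB", (ref_width + scene.width, scene.height), "black")
canvas.paste(reference, (0, 0))
canvas.paste(scene, (ref_width, 0))
canvas.save("inset_canvas.png")
```

(Keep the final canvas dimensions in mind; more on the 64px rule further down.)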
Back to the theory. Let's get a slightly better feeling for how and why inpainting can do so much of the heavy lifting without any LoRA or the likes at all. We'll take that image above and inpaint it again. Except this time we'll give inpainting nothing aside from the other two images. Empty prompt and negative prompt, just this:
And we get this:
See how it's taken both of the other views as references and still managed to draw a sane alternative with a similar style and quality?
Visual seeding:
What we did above with that inpainting mask that's attached to the back of her head is visual seeding. It serves as a "seed" for the rest of the person to "grow" on. This will be a very important technique during any generation of new images from a reference too, so take a moment to understand and memorize this. When I refer to a visual seed in this guide I do not mean the number in your settings.
Vibe Transfer (Or Not)
This guide was made before vibe transfer (VT) even became a thing, and you might be wondering if VT can supercharge your OC endeavors.
Unfortunately I have to pop this bubble here: while it isn't useless per se, it's just not spectacularly useful either. I made lots of organized experiments using ClusterVisionF, my own NAID UI (https://github.com/Xovaryu/ClusterVisionF), and what I have to tell you is that VT digests what it sees according to what it knows.
See, what makes OC generation via reference insets and inpainting so powerful is that it draws on the model's understanding of how multiple depictions of the same character on a single image work. When you employ this technique correctly, the AI, rather than improvising, will for the most part really and properly attempt to use the image you provide to redraw that and exactly THAT character. It uses spatial understanding of what it sees. Say when you see an image of a character with a tail, even if you do not see where that tail attaches, you'll likely have an extremely educated guess, right?
Vibe transfer doesn't do that at all.
Let me put it this way. Extremely oversimplified, what VT does is take an image and digest it into a secondary prompt based on its own understanding. Therein lies the problem. See, if you put an image of a popular and well-trained character into VT, then even at an information extracted value of 0.01 the model may well understand that it is that exact character, and generate almost as if you had put that character into the prompt itself. But when it is a completely foreign character, VT basically breaks that character down into constituent parts and affects the prompt like that instead. So for example Tipal, the OC I use in here as my main example? Putting her through VT is basically like a glorified demon girl in the prompt; the results are all over the place, and really, really not close. They are so divergent in fact that OC generation with Tipal in VT is kind of more difficult to get right, not less. VT breaks a foreign and non-mainstream character down so much that it distracts rather than guides the AI.
This isn't to say VT is completely useless for OC generation with reference insets and inpainting, but temper your expectations and know when to use it. VT is still usable for all the usual stuff it excels at, like styles. If, and only if, you have a character that you can get more or less consistently by prompt alone, then VT should likewise help you set that baseline without distracting the AI. Otherwise it's just not going to work miracles for you.
Crystallizing Your Own OC
Alright, you now have an idea of the basic theory but what if you don't have an image of your character to start with? Let's take a step back and first address that. Some of the techniques here will also be relevant when we actually get to making new stuff of your character so I'd advise not to skip.
What you will first want is a bucket list. What features does your character need? What can you compromise on or leave open to the AI's inspiration, and what are must-haves?
With that in mind you'll want to make an initial proper, standing, straight-on reference. Use a flat color background, such as black or white, or maybe some other color depending on what colors your OC will have and may need to be contrasted with. If your character has pale white skin you probably don't want a white background.
Also you will probably want to generate them in tight-fitting clothes (such as a leotard, bodysuit or underwear, or even nude); after all this will be an image to be used as a reference, and you likely won't want the AI to have to guess the body shape. Using that same reference body to put different clothes on that character later will be fairly easy. But having wider clothes on your reference shouldn't be devastating either so long as you get the anatomy right, or right for what you are going for.
Now it's time to start experimenting. Use style or artist tags if you already know of a certain direction you want (and also possibly to help NAID V3 not muck up the eyes too badly), and then just try to make a prompt that approximates your character. What you ideally want is a prompt that could generate all of the relevant features of your character, so with your list in mind just keep improving your prompt. (And if you can't get a simple prompt to generate a particular feature you want, it's still quite likely possible with inpainting, though you may have to draw it yourself, at least to the point that the AI gets the idea.) If you get any results that look even partly useful, save them. Don't get caught up in this step; you can still iterate and refine later, and once we're done your reference image will do much more of the heavy lifting than your prompt. If you want my subjective weighting: depending on the exact step and thing you are trying to do, your visual reference may do as little as 60% of the lifting or as much as all 100%.
Now eventually you will want to actually start to crystallize your OC. What you should be aiming for is to cross things off of your bucket list. And there are two ways to do that. Either you generate all the necessary parts separately and stitch them together, or you take an initial okay-enough starting image, and then start to inpaint new parts of it until you've approximated your character well enough. Naturally if you actually have the chops to draw, this will help as well.
Imagine that this following image has something you want and you want to preserve it. Like only the horns. Then you could inpaint like so:
Now when you generate again you'll get a completely new image, except for the horns. Maybe something with green hair instead?
And a lot about the body changed too. The general body style, now there are two tails, wings... you get it. The eyes are terrible in both cases. None of this should deter you however, all of these issues can be addressed in steps. Especially if you have peculiar eyes, don't get caught up on them just yet. Take what you need and keep iterating.
Overlay original image:
Do not be afraid of seams. Always overlay the original image when doing this unless you know exactly what inpainting otherwise does to the rest of the image and you're explicitly fine with that (such as making seam patches mentioned later in this guide).
Seams can be fixed in a single low strength image2image step or even manually drawn over in an image program like Krita. But if you do not overlay the original image and use images that you have inpainted, the corrosion from the inpainting steps will accumulate, and particularly for fine and unique details even a single inpainting step without overlaying can be devastating, and it will ruin your references when you use them as well.
Imagine this is the reference or any other part of the image you're building:
Then this is what a single step might do to that reference without overlaying:
...Yeah it's not so good.
And if you assemble a new OC in many steps, then you might go through many iterations. The last thing you want is to realize midway that, because you didn't overlay (so there were none of those jarring seams), some of the original details are already broken, because then you would have to manually stitch things back together, giving you seams again anyway, but now with a ton of extra work on top.
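(For the curious: conceptually, overlaying boils down to keeping every pixel outside the inpainting mask from the original image. This little Pillow sketch is not what NAI does internally, just an illustration of the idea with placeholder file names, e.g. if you ever need to merge a non-overlaid result back over the original yourself.)

```python
# Rough equivalent of "overlay original image": inpainted pixels only where the
# mask is white, untouched original pixels everywhere else.
from PIL import Image

original = Image.open("original.png").convert("RGB")
inpainted = Image.open("inpainted_result.png").convert("RGB")
mask = Image.open("inpaint_mask.png").convert("L")  # white = inpainted area, black = keep original

# Repeated passes done this way cannot corrode the parts you never masked.
merged = Image.composite(inpainted, original, mask)
merged.save("merged.png")
```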
Maintaining an inpainting mask:
If you gradually inpaint things in steps you will want to maintain your inpainting mask and simply remove the parts of the mask where you made progress. With an image that is a step forward selected and an inpainting mask still active, the two buttons on the right (edit/inpaint) will keep your mask.
If you import a new image externally such as after editing and stitching you will have to make a new mask, I am not aware of a way of circumventing that at this time. Not that it should be terribly much extra work.
Ideally, combine both techniques as you see fit or as feels best to you. You should become at least a little comfortable with stitching parts of images together anyway.
Avoiding rescaling trouble:
When you make images for processing with the AI, do try to keep the width and height divisible by 64px, or you may get all sorts of little issues. Your images would otherwise be rescaled, and you might not even notice until you download the result and bring it back into your image editor, only to see it doesn't fit perfectly anymore.
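(If you script parts of your workflow, a tiny helper like this hypothetical Pillow snippet can snap a canvas to safe dimensions before you upload it; whether you crop a few pixels or pad instead is up to you.)

```python
# Snap a canvas down to the nearest dimensions divisible by 64 so it won't be rescaled.
from PIL import Image

def snap_to_64(img: Image.Image) -> Image.Image:
    w = (img.width // 64) * 64
    h = (img.height // 64) * 64
    return img.crop((0, 0, w, h))  # crops the overhang; you could pad the canvas instead

img = Image.open("canvas.png")
snap_to_64(img).save("canvas_64.png")
```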
(You could also start with a face portrait, and then outpaint your way down in steps by resizing the canvas and painting the mask over the new empty space like this:
While this could give you a higher resolution without using Anlas, you run a much higher risk of the AI bungling up the anatomy and you not noticing while zoomed in, so I do not recommend that. Starting at higher resolutions can give you a higher quality starting point quicker, but at a substantial cost. Standard free generation resolutions should be enough and you can always upscale it later.)
Character Consistency
Alright, you've got a proper reference for your character ready? Then you're now almost ready to actually use that character and make new art of them! Before we go in with anything fancy, think about the scene you want to put your character in, make a prompt for that, and make some raw generations. Keep in mind that we will replace whatever character comes with it, so while it would be prudent to leverage your bucket list of features and the prompts you made, really all you want here is a good perspective and place in a desired style, and ideally some visual seed that already comes from the generation itself; no need for extreme attention to detail just yet.
My character, a young adult and average sized demon/dragon girl with a very weird pupil shape, has pale skin and green hair, so I took only those features for now. A bit of pale skin should be enough to get started. And I want her in a festive situation, so a simple prompt with the according features is all we're starting with. I got an image that is good enough even though the hair color was unusable. But a part of the image was usable as the visual seed, and since I already had two references of her ready to go, I went into Krita and patched it all together. We're making an image with reference insets, one reference on each side; the profile reference on the right isn't even finished and pruned, the hand is bad and the anatomy is still way off. But it'll be enough anyway, as you'll see.
(Using reference inset in the prompt may or may not help, though I am leaning towards it being unnecessary or unhelpful due to how inpainting works and uses the prompt.)
This is what that first step with the references looks like:
(I won't be including the references for the other steps that still kept using them.)
A bit of the pale thighs, the face and some of the clothes are maintained, the rest inpainted.
And this is where that gets us:
The hair is immediately right, so we're getting somewhere. Let's completely ignore the eyes for now, those happen later. Also I failed a little and left the Christmas toggle on so that nuked away the horns in favor of that head. Already in the next step with that fixed I got almost perfect horns. Now what I should have done is to make more liberal use of the visual seeding, so keep that in mind. The wings wouldn't generate, so I have simply cheaply copied a tip over from the left reference, and inpainted accordingly. So our fourth output is:
See that copypasted wing claw besides her head? It still has the black around it, but it's good enough to get the AI to basically go "ah, so there is the tip of that wing, I should add the rest of it".
(Little did I know at this point I probably should've kept the tail from this one... that was a lot of extra work for probably a worse tail. But it's a learning curve.)
By now the principle and the workflow should be clear. Adjust your prompt to the area you still need to fix (still don't be afraid of seams), provide visual seeds when needed (you can and should inpaint over those after they have served their purpose), and claw your way forward. In this case, once I had the complicated features I really needed the references for down, I excised the 1024² image from the center and resumed refining and inpainting it for a while on free generations.
Once I was at a decent point I wanted to make another experiment (so it's also looking a little crude). What about those bracketed pupils and eyes? Surely getting the AI to draw that by itself is a really tall order. The other two times I got those eyes I only had the central slit generated properly, and then went in with Krita and added the other two slits myself. Could I get the AI to do even that?
Yes I could:
All it took was being... a little on the nose with the references and using an upscaled and zoomed in version. And those eyes aren't perfect, but they make the cut and prove a point. And so they got cut out to be used later when putting everything together.
What's under the mask:
What's under the mask shouldn't be seen by the AI, and shouldn't matter. Generally speaking it doesn't matter, and it's best to work that way. However, tests with 100% identical settings and a locked seed, changing only what's under the mask, suggest otherwise, and in the case of these eyes what was under the mask seems to have affected the output quite a bit. This goes against what the devs said, so take it with a grain of salt... but I don't exactly have any other way of explaining those experiments.
Now after 5 hours or so of work with a couple of brief interruptions and lots of learning experiments:
Sure there are still things I could improve, like how the claws on the wings shouldn't be curved. But I can also take what I learned and make more stuff, and that is a better use of time, isn't it? It's not that different from any other involved way of creating art, there's a point where it's best to be aware of the flaws in your creation but leave them be and keep practicing and creating.
Zero Shots (OC generation without visual seeding)
This guide goes on a lot about visual seeding and crystallizing your image step by step. Not only is there a lot of preparation needed, but if you provide the visual seeds you also already heavily constrain the AI's output. What if you want the AI to spit out at least fair and inspiring images of your OC in nearly the same way you can generate characters otherwise?
It can be done, but with the current state of the technology and NAID V3... I'm warning you right now, while it's possible, it is absolutely anything and everything aside from easy.
First, have some collages of what I was able to get with zero shots:
Sure. This looks "amazing". Considering that the model has never before seen Tipal, and once the setup was done these were all generated by simply hitting that button, those results are nothing to scoff at. No visual seeding, no LoRA or other training, just a canvas with reference insets, the prompt and settings, inpainting the whole center area, a click and that was it.
This makes it entirely obvious that a better model and architecture should well be capable of high fidelity instant zero shots of your own OCs with merely a visual reference, and frankly, I would be disappointed if V4 comes around without being able to do this and more. I mean, here we are, doing it with V3.
But the process is exceedingly complex, and there are big caveats.
The references cost resolution, and NAID only allows you to have so much, so the outputs are just kinda small and not amazingly detailed. The eyes? Especially considering that V3 struggles with eyes anyway, just completely forget about it. Not happening. Surprisingly the horns work out damn near perfectly, but just not everything does. The model likes to ignore some traits a lot more than others. The wings for example. Improvising her from behind? Yeah... not really. Though you will face the same issues to a lesser degree with visual seeding.
Let's briefly acknowledge the cursed complexity of getting zero shots to pop off at all. I've burned a lot of Anlas on experiments and the prompts that actually come through can look pretty cursed at times.
See, normally current generative models are pretty lenient with their requirements. You roughly describe a character in the prompt, and you will get some more or less sane output with a very wide range of settings. Low or high scale, lower steps, most samplers: hit the button and the model should at least try. You might have a missing finger or one too many, but usually you won't have deeply corrupted limb salad.
But when trying to use inpainting without even visual seeding, be ready for results like these that tried but... fell more than a little flat:
Yeah. Failure to colorize, limb salad, the classic evil version from a parallel universe and a strange barely visible style no one asked for.
The issue with the alternate color schemes was particularly interesting and confusing, and for a while that happened consistently. Replacing commas , with hearts ♥ helped because... well, I'd love to be able to tell you. Hearts were useful in V1 and they were useful here. I took my shots in the dark and some landed.
And those were only all the images that did try. Count all these wonderful Tipals here:
A couple totally wasted zero shots with zero Tipals.
What are good settings then? The real problem here seems to be that they heavily depend on your input and character. Very heavily. The relative volume of sane outputs in the latent space when attempting this is dramatically smaller than the volume you will be used to with much more normal use. You'll probably feel a little like you're lost, wandering in the dark and hitting your head on invisible walls. I know I did.
The guidance/scale is overly sensitive with generations like these, and should be your first thing to fiddle with. You'll likely want that setting low-ish, think 4-6. Higher values tend to just not even really try to make the character, though this is no hard rule.
Your prompts might want some targeted repetition and strengthening, particularly when elements of your character get ignored.
I also want to say that ancestral samplers seem to perform somewhat better. Intuitively this makes sense if you know what separates them from the other samplers, but take that with a grain of salt anyway.
So it can be done. Whether you want to bother and will be able to persevere is a different question.
Tips, Tricks & Insights
Fixing Seams
Using the image overlay function is generally the vastly superior option, but those seams also need to be dealt with. Depending on where you are with your picture the options and best practices may vary wildly.
Prevention
Prevention would be best but there is only so much that can be done, and it isn't much. You may simply get lucky but there's no control over that. So the only real thing you can do is to place seams at places where they are less conspicuous, which are generally places with fairly even colors, mild noise and no high frequency details. Which is rarely feasible.
Image2Image
A really obvious and often very good choice is some simple image2image. A strength of just about 0.3 should do it, and is definitely something I would recommend when working on a character more directly. Especially when crystallizing an OC you may build up a ton of seams, and a single low strength i2i step can wipe them all out. It's also generally speaking pretty simple. Image with seams goes in, image without seams comes out. However this isn't really an option when you can't bring the strength up even that high, or maybe if you are working on something larger where you would then get seams along the edges of that image where it would connect to the rest.
Brush It Up
You simply fix the seams by drawing yourself. This can be somewhat easy when a simple smudge brush is enough, or when you actually have the manual drawing chops to deal with whatever situation you're faced with. But you may not have the skill (I know I can deal with some situations, but not all) or maybe it's not really worth the time.
Manual Crossfade
This is a fringe option for when you're tiling an image and doing so with some overlap. In that case you can make a crossfade from one tile to another, but I think there are better ways to do it, like the next one. A sketch of the idea follows below anyway.
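(For completeness, here is a minimal Pillow sketch of such a crossfade over two horizontally overlapping tiles; the file names and overlap width are placeholders, and both tiles are assumed to be the same height.)

```python
# Linear crossfade over the overlap region of two horizontally adjacent tiles.
from PIL import Image

left = Image.open("tile_left.png").convert("RGB")
right = Image.open("tile_right.png").convert("RGB")
overlap = 64  # placeholder overlap width in pixels

# Alpha ramp across the overlap: 0 (keep left tile) -> 255 (keep right tile).
ramp = Image.linear_gradient("L").rotate(90, expand=True).resize((overlap, left.height))

canvas = Image.new("RGB", (left.width + right.width - overlap, left.height))
canvas.paste(left, (0, 0))
canvas.paste(right, (left.width - overlap, 0))

# Blend the two edges inside the overlap and paste the result back in.
left_edge = left.crop((left.width - overlap, 0, left.width, left.height))
right_edge = right.crop((0, 0, overlap, left.height))
canvas.paste(Image.composite(right_edge, left_edge, ramp), (left.width - overlap, 0))
canvas.save("crossfaded.png")
```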
The Nuclear Option - Inpainting Seam Patches
When you're out of reasonable options this should always get the job done, assuming you are leveraging it against single seams that aren't on a character. For this we actually use inpainting, but without overlaying. Do yourself a favor and turn overlaying back on once you're done.
First you want to get an image of the offending seam, and it can well be just a very small focused area, there is generally no reason to burn Anlas here as the amount of pixels we actually need from the AI is really low to begin with. If your whole image fits, that's good, otherwise take as little as possible/reasonable from your image with the seam, and then load that into NAI. You will also probably not need to adjust the prompt much or even at all for this.
See that seam in the middle? I already weakened it a little but the transition is still evidently unnatural, and it gets more jarring on the whole image. Let's inpaint only straight over that seam as narrowly as possible.
Download the result if it seems fine, and if you do this for only a part of your overall picture, put it exactly back in the place where you got it from. Now you will want to only use a faded patch of the exact area that needs fixing. In Krita that would mean selecting the seam area, feathering your selection a little, inverting the selection, and then just deleting the rest. You should have something like this:
And then you just merge it onto and over that seam.
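(If you'd rather do this step outside of Krita, the following is a rough Pillow equivalent of the feathered patch merge; the paste position, seam box and feather radius are placeholders for wherever your seam actually sits.)

```python
# Feathered seam patch: keep only a soft strip of the inpainted crop around the seam,
# then merge that strip back onto the full image.
from PIL import Image, ImageDraw, ImageFilter

base = Image.open("image_with_seam.png").convert("RGB")
patch = Image.open("inpainted_crop.png").convert("RGB")  # result of the no-overlay inpaint
paste_pos = (512, 0)                     # placeholder: where the crop was taken from
seam_box = (120, 0, 160, patch.height)   # placeholder: narrow box straight over the seam

# Feathered mask: white over the seam, soft falloff to black elsewhere.
mask = Image.new("L", patch.size, 0)
ImageDraw.Draw(mask).rectangle(seam_box, fill=255)
mask = mask.filter(ImageFilter.GaussianBlur(8))

# Paste only the feathered strip of the patch onto the original image.
base.paste(patch, paste_pos, mask)
base.save("seam_fixed.png")
```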
This way you can target seams directly without just subjecting your whole image to the acid that inpainting sans overlaying brings. It's not a perfect technique and needs a bit of a manual workflow, but if you can do it, it won't take more than a couple minutes for a very passable result.
Style Migration
This isn't a grand point and might have come to mind already, but it's too good to not be aware of.
Even when your character is drawn into a scene from a reference with an unfitting style, image2image should make it easy to migrate to the new style.
Somewhat unfitting initial insertion based on a provided reference:
After using some image2image at around 0.35 strength
Contact/Other Stuff:
I'm running more operations than just this one rentry. I've got a Discord server where you can drop in to look around, or even deliver some criticism/feedback that might make it into this rentry. I've got a fully visual UI on GitHub for NAID that is focused on making cluster collages and image sequences for organized experiments and understanding of the models, so you can use it too. It's not able to use local SD and SDXL yet, emphasis on yet. And I've got a Patreon too, should you be able and willing to support me, since creating this stuff... takes a lot of time and dedication. If you don't support me, I hope you will at least get some mileage out of this guide and make something great for yourself and maybe others.
All of that can be found here: https://linktr.ee/xovaryu