Creating a specific character via Dreambooth
Prerequisites
Motivation
Why use Dreambooth?
We can train on just ~20 images of your character, whereas the other methods overfit easily or don't work at all. Dreambooth also lets you easily transfer your character to different models.
- Hypernetworks & LoRA
  - Prone to overfitting, which means they won't transfer your character's exact design to different models
  - For LoRA, some people get decent results on weak GPUs. I've trained one "successfully" with LoRA, but I get varied results across datasets. With Dreambooth I get it in one try, and the setup & documentation are far easier.
- Textual Inversion
  - Can't capture specific details of a character, especially if the character is unknown to the base model
Examples of characters being used in different models with different art styles (a sign of not overfitting):
(Example images generated with NAI + WD v1.3, PVC, and AnythingV3.)
Dataset Preparation
- ~20+ quality images (example). Make sure they're 512x512 (very important! Crop to this size rather than resizing)
- ~20+ text files, one per image, each containing tags that describe that image
- Use https://github.com/toriato/stable-diffusion-webui-wd14-tagger to tag images automatically. Make sure the tags are high quality!
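To sanity-check the dataset, here's a minimal Python sketch that center-crops each image to a square, scales it to 512x512, and warns about missing tag files. The folder names are hypothetical, and hand-cropping gives better framing than an automatic center crop:

```python
from pathlib import Path
from PIL import Image  # pip install pillow

SRC = Path("dataset_raw")   # hypothetical input folder
DST = Path("dataset_512")   # hypothetical output folder
DST.mkdir(exist_ok=True)

for img_path in SRC.iterdir():
    if img_path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
        continue
    img = Image.open(img_path).convert("RGB")
    # Center-crop the largest square, then scale it to 512x512.
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((512, 512), Image.LANCZOS)
    img.save(DST / f"{img_path.stem}.png")

    # Each image needs a matching .txt tag file for [filewords].
    if not (SRC / f"{img_path.stem}.txt").exists():
        print(f"missing tags for {img_path.name}")
```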
My Dreambooth Settings
sd-webui is always updating, so the settings may change.
Create Model
Source Checkpoint: I've had the most success with Waifu Diffusion v1.3. I'm also experimenting with an 80/20 NAI/WD v1.3 mix, which works as well.
Scheduler: ddim
Make sure to click "Create Model" before proceeding.
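If you want the 80/20 mix mentioned above, it can be made in the webui's Checkpoint Merger tab with the Weighted sum method, which computes primary * (1 - multiplier) + secondary * multiplier; with NAI as primary, WD v1.3 as secondary, and multiplier 0.2, you get the 80/20 blend.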
Parameters
Training Steps Per Image (Epochs): 10000000 (you'll interrupt training anyway)
Max Training Steps: 10000000
Use Lifetime Steps/Epochs When Saving: No
Save Preview/Ckpt Every Epoch: Yes
Save Checkpoint Frequency: 10
Save Preview(s) Frequency: 10
LR: 0.000001
Resolution: 512
Use 8bit adam: yes
Mixed precision: fp16
Memory attention: xformers
Don't Cache Latents: yes
Train Text Encoder: no (I've never tried this, because I don't have enough VRAM)
Concepts
Instance Token: the keyword(s) that will identify your character (e.g. star guardian neeko)
Instance Prompt: the instance token followed by [filewords], separated by a comma (e.g. star guardian neeko, [filewords])
Class Prompt: [filewords] (make sure your instance token doesn't show up in the class prompt at all)
Negative Prompt: use any negative prompt you like.
Sample Image Prompt: Include your instance token here and copy the tags from a text file, so you can see the progress as you train.
Sample Image Negative Prompt: copy the same Negative Prompt
Total Number of Class/Reg Images: 10 * the number of images in your dataset (e.g. 200 for a 20-image dataset)
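If you'd rather train outside the webui, the settings above map roughly onto the DreamBooth example script in Hugging Face's diffusers repo. A minimal sketch, assuming that script and the hypothetical folder names below; flag names can drift between diffusers versions, and the script has no direct equivalent of [filewords], so a generic class prompt stands in:

```python
import subprocess

# Hypothetical invocation of diffusers' examples/dreambooth/train_dreambooth.py.
subprocess.run([
    "accelerate", "launch", "train_dreambooth.py",
    "--pretrained_model_name_or_path", "hakurei/waifu-diffusion",  # stand-in for WD v1.3
    "--instance_data_dir", "dataset_512",
    "--class_data_dir", "class_images",
    "--instance_prompt", "star guardian neeko",
    "--class_prompt", "1girl",            # generic stand-in for [filewords]
    "--with_prior_preservation",
    "--num_class_images", "200",          # 10 * 20 dataset images
    "--resolution", "512",
    "--learning_rate", "1e-6",
    "--lr_scheduler", "constant",
    "--use_8bit_adam",
    "--mixed_precision", "fp16",
    "--max_train_steps", "10000000",      # interrupt manually, as above
    "--output_dir", "dreambooth_out",
], check=True)
```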
Training the model
When you click train, your model will first generate "regularization images", which help the network not overfit. You can view these images under the classifiers_0 folder. Make sure they look similar to what the model should generate for your class prompts. After it's done generating them, the actual training starts.
Usually the model gets good around 4000+ training steps with ~20 images, but you'll need to trade off character accuracy against flexibility when transferring the character to other models. In your model's folder there is a samples folder generated during training. When the samples start picking up fine details, that's a good sign to train a little longer and then stop. You can always continue later if you want to.
To know which training step is good, you'll have to come up with an overfitting test. This is when you use a prompt to generate an image that does not look like your training dataset. For example, my datasets are usually only front-pose, so I'll test with side/back poses. At low training steps, it'll easily be able to do it, but at the cost of character accuracy. At very high training steps, your character model will not be able to do it.
(Comparison images: Underfitting | Just Right | Overfitting, where the overfit model can't generate side/back poses.)
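To automate the overfitting test, you can script the webui's API (launch it with --api) and render the same side/back-pose prompts against each saved checkpoint. A minimal sketch; the instance token and tags below are examples, and you'd switch checkpoints in the UI (or via the options endpoint) between runs:

```python
import base64
import requests  # pip install requests

# Assumes a local webui started with --api.
URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

for view in ("from side", "from behind"):
    payload = {
        "prompt": f"star guardian neeko, {view}, standing",  # example instance token
        "negative_prompt": "lowres, bad anatomy",            # example negatives
        "steps": 28,
        "width": 512,
        "height": 512,
    }
    resp = requests.post(URL, json=payload, timeout=300)
    resp.raise_for_status()
    image_b64 = resp.json()["images"][0]
    with open(f"overfit_test_{view.replace(' ', '_')}.png", "wb") as f:
        f.write(base64.b64decode(image_b64))
```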
I often also create a second model by continuing training to around 12k+ steps, but only for consistent front poses, since that's what I usually include in my dataset. This is "overfitting", but we're overfitting on frontal poses, so it merges well with any model where you generate frontal poses, e.g. a figurine model.
Graphing loss vs. time
This step isn't necessary, but it can help you decide which ckpt file to use. It can also indicate whether training is working at all.
Run `tensorboard --logdir .../dreambooth/your_model_name/logging/dreambooth` to see the loss graph. Set smoothing (on the right) to 0.99.
Note: lower loss correlates with a more accurate character, but the longer you train, the more likely you are to overfit. Overfitting leads to bad merges with other models. I would still test manually, since ML is not an exact science.
Transferring your character to new models
Go to the checkpoint merger tab in webui.
Primary model: choose the trained model
Secondary model: choose the new model that you want to use
Tertiary model: choose the source model that was used for training
Set a custom name so that you don't forget.
Multiplier: 1
Interpolation method: Add difference
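With a multiplier of 1, "Add difference" computes primary + (secondary - tertiary) for every weight, i.e. it takes what Dreambooth taught the source model and grafts it onto the new model. A minimal sketch of the same arithmetic in torch, assuming plain .ckpt files with the usual state_dict layout and hypothetical filenames (the webui tab does all of this for you):

```python
import torch

# Hypothetical filenames; SD .ckpt files keep weights under "state_dict".
primary = torch.load("trained_character.ckpt", map_location="cpu")["state_dict"]
secondary = torch.load("new_style_model.ckpt", map_location="cpu")["state_dict"]
tertiary = torch.load("source_wd13.ckpt", map_location="cpu")["state_dict"]

multiplier = 1.0
merged = {}
for key, weight in primary.items():
    if key in secondary and key in tertiary:
        # Add difference: primary + (secondary - tertiary) * multiplier
        merged[key] = weight + (secondary[key] - tertiary[key]) * multiplier
    else:
        merged[key] = weight

torch.save({"state_dict": merged}, "merged_character.ckpt")
```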