Kohya Dreambooth Method Mini Guide
A. Activation Word
- instance token and class token should not appear in .txt or .caption files. They are tokens, not captions. If some script puts them there, don't blame me; it's just a different implementation. You can get the same result by using the 4.2.3. Custom Caption/Tag (Optional) cell to add a custom tag as an activation word and setting keep_tokens to 1.
- instance token and class token are not used if you train Dreambooth with captions, but they are still useful for separating your dataset and concept.
- instance token and class token are used if you set caption_extension to none.
- My implementation of instance token and class token is the same as kohya_ss's example; only the configuration has moved away from the folder naming scheme:
  - Before: folder naming scheme <num_repeats>_<instance_token> <class_token>
  - After: using the dataset config (dataset_config.toml)
B. Multi-Concept Training
Thanks to the latest updates, there are no more folder naming schemes, and you no longer have to set the train data path to a parent folder.
We are now adopting a new, more flexible way to configure the dataset that makes life easier: --dataset_config.
Because the configuration lives in a .toml file, we can now train at a separate resolution for each [[datasets]] block, and you no longer need to rename your train data folder like this: <num_repeats>_<instance_token> <class_token>
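As a rough illustration of training at separate resolutions, a dataset_config.toml could look something like the sketch below. The key names (resolution, image_dir, num_repeats) follow the sd-scripts dataset config documentation listed under Source. The paths and numbers are made-up placeholders, not the notebook's defaults.

```toml
# Sketch only: every path and number here is a hypothetical placeholder.

[[datasets]]
resolution = 512                      # images in this dataset are trained at 512

  [[datasets.subsets]]
  image_dir = "/path/to/images_512"   # hypothetical folder, no special name required
  num_repeats = 10

[[datasets]]
resolution = 768                      # a second dataset with its own resolution

  [[datasets.subsets]]
  image_dir = "/path/to/images_768"   # hypothetical folder
  num_repeats = 5
```

Note that num_repeats now lives in the file, so the folder itself can keep whatever name it already has.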
But what about multi-concept training? Is it still viable? The answer is: it is.
However, setting up multi-concept training with the new --dataset_config needs more work, which means more cells and more lines of code.
So I'm adjusting the notebook to support only one-concept training, from defining the train data folder through scraping and preprocessing to training.
BUT you can still do it by editing dataset_config.toml, and you can do a lot more than multi-concept training by editing that file!
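Before --dataset_config, we configured the dataset through the folder naming scheme <num_repeats>_<instance_token> <class_token>, with the train data path set to the parent folder.
After --dataset_config, you can do more than define a trigger token, and the dataset path no longer has to be set to a parent directory.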
One-concept training is configured in dataset_config.toml by pointing a single [[datasets.subsets]] entry at your train data folder.
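A minimal sketch of such a one-concept config is shown below. It is not the file the notebook generates. The key names come from the sd-scripts dataset config documentation, while the path, the token "mksks girl", and all values are assumptions chosen for illustration.

```toml
# Sketch only: path, token, and values are hypothetical placeholders.

[general]
enable_bucket = true
caption_extension = ".txt"   # captions are read from .txt files when they exist
shuffle_caption = true
keep_tokens = 1              # keep the first tag (the activation word) in place

[[datasets]]
resolution = 512

  [[datasets.subsets]]
  image_dir = "/path/to/train_data"   # the concept folder itself, not its parent
  class_tokens = "mksks girl"         # "<instance_token> <class_token>", used for images without a caption file
  num_repeats = 10                    # replaces the <num_repeats>_ folder prefix
```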
Multi-concept training is configured in the same dataset_config.toml by adding one [[datasets.subsets]] entry per concept.
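A hedged sketch of a two-concept config could look like this. Again, the key names follow the sd-scripts dataset config documentation, and the paths, tokens, and repeat counts are invented for the example.

```toml
# Sketch only: paths, tokens, and repeat counts are hypothetical placeholders.

[[datasets]]
resolution = 512

  # First concept
  [[datasets.subsets]]
  image_dir = "/path/to/concept_a"
  class_tokens = "mksks girl"
  num_repeats = 10

  # Second concept, with its own folder, tokens, and repeats
  [[datasets.subsets]]
  image_dir = "/path/to/concept_b"
  class_tokens = "sks boy"
  num_repeats = 5
```

Each subset carries its own image_dir, so the concept folders do not need to share a parent directory.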
C. Custom Tags/Caption
- If your tag has spaces, replace the spaces ( ) with underscores (_):
  - custom_tag: blue_berry
  - output: blue berry, 1girl, brown hair, school uniform, smile
- If you set append to True, your custom tag will be added to the end of the line instead:
  - custom_tag: blue_berry
  - output: 1girl, brown hair, school uniform, smile, blue berry
- If you want to add or remove multiple tags, separate them with a space ( ):
  - custom_tag: blue_berry red_juice
  - output: blue berry, red juice, 1girl, brown hair, school uniform, smile
- If you want to remove a tag, set remove_tag to True:
  - custom_tag: brown_hair
  - output: 1girl, school uniform, smile, blue berry
D. LoRA Module Config
- networks.lora is the normal, default kohya-ss/sd-scripts LoRA.
- lycoris.kohya is a Python package for LoRA modules, previously LoCon. Currently there are 2 LoRA algorithms: LoCon and LoRA with Hadamard Product representation. Put algo=lora for LoCon or algo=loha for Hadamard Product in network_args. Read: KohakuBlueleaf/LyCORIS.
- locon.locon_kohya is LoRA for convolutional networks. In short, it's the same LoRA but it trains almost all layers, including the normal LoRA layers. Read: KohakuBlueleaf/LoCon.
E. Backward Compatibility
- use_old_dataset_config: Specify this option if you want to use the old dataset config, in which the folder naming scheme still exists, e.g. <repeats>_<instance_token> <class_token>. It is old but useful for multi-concept training without bothering to edit dataset_config.toml. Just remember to set train_data_dir and reg_data_dir to the parent folder path. This option doesn't support the data preprocessing section, so please preprocess your data locally.
Source:
- Folder naming scheme; an old way to define trigger prompts, dataset repeats, and class token: https://note.com/kohya_ss/n/nee3ed1649fb6 (original article)
- Dataset Config Documentation : https://github.com/kohya-ss/sd-scripts/blob/dev/config_README-ja.md