Kohya Dreambooth Method Mini Guide

A. Activation Word

  1. instance_token and class_token should not appear in your .txt or .caption files; they are tokens, not captions. If some other script puts them there, that is simply a different implementation, not a fault of this one. You can get the same effect by using the 4.2.3. Custom Caption/Tag (Optional) cell to add a custom tag as an activation word and setting keep_tokens to 1 (see the sketch at the end of this section).
  2. instance_token and class_token are not used if you train Dreambooth with captions, but they are still useful for separating your datasets and concepts.
  3. instance_token and class_token are used if you set caption_extension to none.
  4. My implementation of instance_token and class_token is the same as kohya_ss's example; only the configuration has changed, from the folder naming scheme to a dataset config:

    • Before: folder naming scheme <num_repeats>_<instance_token> <class_token>
    • After: using dataset config:
      {
          "image_dir": train_data_dir,
          "class_tokens": f"{instance_token} {class_token}",
          "num_repeats": train_repeats,
      },
      {
          "is_reg": True,
          "image_dir": reg_data_dir,
          "class_tokens": class_token,
          "num_repeats": reg_repeats,
      },
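
To illustrate point 1, here is a minimal sketch of why an activation word at the front of a caption survives caption shuffling when keep_tokens is set to 1. This is only an illustration of the idea (the function name and the mksks style tag are examples), not sd-scripts' actual shuffling code:

    import random

    def shuffle_caption(caption: str, keep_tokens: int = 1) -> str:
        """Shuffle a comma-separated caption, keeping the first `keep_tokens` tags fixed."""
        tags = [t.strip() for t in caption.split(",")]
        fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
        random.shuffle(rest)
        return ", ".join(fixed + rest)

    # "mksks style" was added as the activation word by the Custom Caption/Tag cell,
    # so with keep_tokens=1 it always stays at the front while the other tags shuffle.
    print(shuffle_caption("mksks style, 1girl, brown hair, school uniform, smile"))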
      

B. Multi-Concept Training

Thanks to the latest updates, there is no more folder naming scheme and no more setting the train data path to the parent folder.
We now adopt a new feature to make life easier: a flexible way to configure the dataset, --dataset_config.
Using a .toml file, we can now train with a separate resolution for each [[datasets]] block, and you no longer need to
rename your train data folders like this: <num_repeats>_<instance_token> <class_token>

But how about multi-concept training? Is it still viable? The answer is: yes, it is.

However, supporting multi-concept training with the new --dataset_config needs more work, which means more cells and more lines of code.
So I'm adjusting the notebook to only support one-concept training, from defining the train data folder > scraping > preprocessing > training.
BUT you can still do multi-concept training by editing dataset_config.toml, and you can do a lot more than that by editing that file!

Before --dataset_config, we used to configure our dataset like this:

<train_data_dir>
    - 10_concept1
    - 20_concept2 

After --dataset_config, you can do more than define trigger tokens, and the dataset path no longer needs to be set to the parent directory. For example:

This is how you do one-concept training in dataset_config.toml:

...
        # this is concept1 dataset
        [[datasets.subsets]]
        image_dir = "/content/LoRA/concept1"
        class_tokens = "mksks  style"
        num_repeats = 10

        # this is regularization images folder, optional
        [[datasets.subsets]]
        is_reg = true
        image_dir = "/content/LoRA/reg_data"
        class_tokens = " style"
        num_repeats = 1
...

And this is how you do multi-concept training in dataset_config.toml:

...
        # this is concept1 dataset
        [[datasets.subsets]]
        image_dir = "/content/LoRA/concept1"
        class_tokens = "bocchi"
        num_repeats = 10

        # this is concept2 dataset
        [[datasets.subsets]]
        image_dir = "/content/LoRA/concept2"
        class_tokens = "nijika"
        num_repeats = 5

        # this is concept3 dataset
        [[datasets.subsets]]
        image_dir = "/content/LoRA/concept3"
        class_tokens = "ryo"
        num_repeats = 10

        # this is regularization images folder, optional
        [[datasets.subsets]]
        is_reg = true
        image_dir = "/content/LoRA/reg_data"
        class_tokens = " style"
        num_repeats = 1
...
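
If you edit dataset_config.toml by hand, it is easy to break the [[datasets]] / [[datasets.subsets]] structure. Below is a small optional sketch (not part of the notebook; the file path is just an example) that loads the file and lists every subset so you can sanity-check it before training:

    import tomllib  # Python 3.11+; on older versions, use the tomli package instead

    # Example path; point this at wherever your notebook writes dataset_config.toml.
    with open("/content/LoRA/config/dataset_config.toml", "rb") as f:
        config = tomllib.load(f)

    # Walk every [[datasets]] block and its [[datasets.subsets]] entries.
    for dataset in config.get("datasets", []):
        for subset in dataset.get("subsets", []):
            kind = "reg" if subset.get("is_reg") else "train"
            print(kind, subset["image_dir"], subset.get("class_tokens"), subset.get("num_repeats"))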

C. Custom Tags/Caption

Tag: 1girl, brown hair, school uniform, smile (the rules below are summarised in a code sketch after this list)
  • if your custom tag contains spaces, replace the spaces with underscores (_):
    custom_tag : blue_berry
    output : blue berry, 1girl, brown hair, school uniform, smile
  • if you set append to True, your custom tag will be added to the end of the line instead:
    custom_tag : blue_berry
    output : 1girl, brown hair, school uniform, smile, blue berry
  • if you want to add or remove multiple tags, separate them with a space:
    custom_tag : blue_berry red_juice
    output : blue berry, red juice, 1girl, brown hair, school uniform, smile
  • if you want to remove a tag, set remove_tag to True (here applied to the output of the append example above):
    custom_tag : brown hair
    output : 1girl, school uniform, smile, blue berry
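
A small sketch of the rules above, with an assumed function name (edit_caption) and example values; the notebook's actual cell code may differ:

    def edit_caption(caption: str, custom_tag: str, append: bool = False, remove_tag: bool = False) -> str:
        """Add or remove custom tags in a comma-separated caption, following the rules above."""
        tags = [t.strip() for t in caption.split(",") if t.strip()]
        # Multiple custom tags are separated by spaces; underscores inside a tag become spaces.
        custom = [t.replace("_", " ") for t in custom_tag.split()]
        if remove_tag:
            return ", ".join(t for t in tags if t not in custom)
        if append:
            return ", ".join(tags + custom)   # add to the end of the line
        return ", ".join(custom + tags)       # default: add to the front

    caption = "1girl, brown hair, school uniform, smile"
    print(edit_caption(caption, "blue_berry"))                   # blue berry, 1girl, brown hair, ...
    print(edit_caption(caption, "blue_berry red_juice"))         # blue berry, red juice, 1girl, ...
    print(edit_caption(caption, "brown_hair", remove_tag=True))  # 1girl, school uniform, smile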

D. LoRA Module Config

  • networks.lora is the normal, default kohya-ss/sd-scripts LoRA.
  • lycoris.kohya is a Python package providing additional LoRA modules, previously called LoCon. It currently offers 2 LoRA algorithms: LoCon and LoRA with Hadamard product representation (LoHa). Put algo=lora for LoCon or algo=loha for the Hadamard product variant in network_args (see the sketch after this list). Read: KohakuBlueleaf/LyCORIS.
  • locon.locon_kohya is LoRA for convolutional networks. In short, it's the same LoRA, but it trains almost all layers, including the normal LoRA layers. Read: KohakuBlueleaf/LoCon.
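
For reference, these modules are selected through sd-scripts' --network_module and --network_args options. A hedged sketch of how the arguments fit together; the chosen values are examples, and your notebook cell may build them differently:

    # Illustrative values; --network_module and --network_args are the real sd-scripts flags.
    network_module = "lycoris.kohya"   # or "networks.lora" / "locon.locon_kohya"
    network_args = ["algo=loha"]       # "algo=lora" for LoCon, "algo=loha" for the Hadamard product variant

    args = [f"--network_module={network_module}"]
    if network_args:
        args += ["--network_args", *network_args]
    print(" ".join(args))  # --network_module=lycoris.kohya --network_args algo=loha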

E. Backward Compatibility

  1. use_old_dataset_config
    Specify this option if you want to use the old dataset config, where the folder naming scheme still exists, e.g. <repeats>_<instance_token> <class_token> (see the sketch below). Old, but useful for multi-concept training without even bothering to edit dataset_config.toml. Just remember to set train_data_dir and reg_data_dir to the parent folder paths. It doesn't support the data preprocessing section, so please preprocess your data locally.
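
To make the old scheme concrete, here is a minimal sketch of how a '<repeats>_<instance_token> <class_token>' folder name can be split into repeats and tokens. The parent folder path is just an example, and this is an illustration rather than the actual kohya-ss parsing code:

    import re
    from pathlib import Path

    def parse_concept_folder(folder: Path):
        """Split '<repeats>_<instance_token> <class_token>' into its parts."""
        match = re.match(r"^(\d+)_(.+)$", folder.name)
        if match is None:
            return None  # folder does not follow the old naming scheme
        return {
            "image_dir": str(folder),
            "num_repeats": int(match.group(1)),
            "class_tokens": match.group(2),
        }

    # Example parent folder with subfolders like 10_concept1, 20_concept2 (see section B).
    for folder in sorted(Path("/content/LoRA/train_data").iterdir()):
        print(parse_concept_folder(folder))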
