Kohya Dreambooth Method Mini Guide
A. Activation Word
- `instance_token` and `class_token` should not appear in `.txt` or `.caption` files. They are tokens, not captions. If some script puts them there, don't blame me; that's just a different implementation.
- You can get the same thing by using the `4.2.3. Custom Caption/Tag (Optional)` cell to add a custom tag as an activation word and setting `keep_tokens` to `1`.
- `instance_token` and `class_token` are not used if you train Dreambooth with captions, but they are still useful for separating your dataset and concept.
- `instance_token` and `class_token` are used if you set `caption_extension` to `none`.
- My implementation of `instance_token` and `class_token` is the same as kohya_ss's example, since the folder naming scheme configuration changed:
  - Before: folder naming scheme `<num_repeats>_<instance_token> <class_token>`
  - After: using dataset config (see the example below)
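For illustration, a minimal `dataset_config.toml` subset that covers the same ground as the old folder scheme; the path and tokens are hypothetical, while `class_tokens`, `num_repeats`, and `keep_tokens` are options from the dataset config documentation linked below:

```toml
[general]
caption_extension = ".txt"
keep_tokens = 1                            # keep the activation word in front when captions are shuffled

[[datasets]]
resolution = 512

  [[datasets.subsets]]
  image_dir = "/content/LoRA/train_data"   # hypothetical path
  class_tokens = "sls frog"                # used as the caption only when no caption file exists
  num_repeats = 10                         # replaces <num_repeats> from the folder name
```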
B. Multi-Concept Training
Thanks to the latest updates, there are no more folder naming schemes and no more setting the train data path to a parent folder.
The notebook now adopts a new option to make life easier: `--dataset_config`, a flexible way to configure the dataset.
Using a `.toml` file, we can now train at a separate resolution for each `[[datasets]]` block, and you don't need to
rename your train data folder like this: `<num_repeats>_<instance_token> <class_token>`
But how about multi-concept training? Is it still viable? The answer is: it is.
However, supporting multi-concept training with the new `--dataset_config` needs more work, which means more cells and more lines of code.
So I'm adjusting the notebook to only support one-concept training, from defining the train data folder > scraping > preprocessing > training.
BUT you can still do it by editing `dataset_config.toml`, and you can do a lot more than multi-concept training by editing that file!
Before `--dataset_config`, we used to configure our dataset like this:
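For example, the train data had to be organized with the repeats and tokens encoded in the folder names (the concept here is hypothetical):

```
train_data_dir/
└── 10_sls frog/        # <num_repeats>_<instance_token> <class_token>
    ├── image_01.jpg
    └── image_02.jpg
reg_data_dir/
└── 1_frog/             # <num_repeats>_<class_token>
    └── reg_01.jpg
```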
After `--dataset_config`, you can do more than define a trigger token, and the dataset path no longer has to be set to a parent directory.
This is how you do one-concept training in `dataset_config.toml`:
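A minimal sketch, assuming hypothetical paths; every option shown here comes from the dataset config documentation linked below:

```toml
[general]
enable_bucket = true
caption_extension = ".txt"
keep_tokens = 1

[[datasets]]
resolution = 512
batch_size = 2

  [[datasets.subsets]]
  image_dir = "/content/LoRA/train_data"   # hypothetical path
  num_repeats = 10
```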
And this is how you do multi-concept training in `dataset_config.toml`:
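A sketch with hypothetical paths and tokens; each `[[datasets.subsets]]` entry has its own folder, `class_tokens`, and `num_repeats`, and `is_reg = true` marks a regularization set:

```toml
[general]
enable_bucket = true
caption_extension = ".txt"

[[datasets]]
resolution = 768
batch_size = 2

  [[datasets.subsets]]
  image_dir = "/content/LoRA/train_data/concept_a"   # hypothetical path
  class_tokens = "sls frog"
  num_repeats = 10

  [[datasets.subsets]]
  image_dir = "/content/LoRA/train_data/concept_b"   # hypothetical path
  class_tokens = "cpc rabbit"
  num_repeats = 10

  [[datasets.subsets]]
  is_reg = true
  image_dir = "/content/LoRA/reg_data"               # hypothetical path
  class_tokens = "frog"
  num_repeats = 1
```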
C. Custom Tags/Caption
- If your tag has spaces, replace the spaces with an underscore (`_`):
  - custom_tag: `blue_berry`
  - output: `blue berry, 1girl, brown hair, school uniform, smile`
- If you set `append` to `True`, your custom tag will be added to the end of the line instead:
  - custom_tag: `blue_berry`
  - output: `1girl, brown hair, school uniform, smile, blue berry`
- If you want to add or remove multiple tags, separate them with a space:
  - custom_tag: `blue_berry red_juice`
  - output: `blue berry, red juice, 1girl, brown hair, school uniform, smile`
- If you want to remove a tag, set `remove_tag` to `True`:
  - custom_tag: `brown hair`
  - output: `1girl, school uniform, smile, blue berry`
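To make the behavior above concrete, here is a minimal sketch of the tag editing logic, not the notebook's actual code; the helper name `edit_caption` is hypothetical:

```python
def edit_caption(caption: str, custom_tag: str, append: bool = False, remove_tag: bool = False) -> str:
    """Add or remove tags in a comma-separated caption line."""
    # Tags are typed with underscores in the cell; captions use spaces.
    new_tags = [t.replace("_", " ") for t in custom_tag.split()]
    tags = [t.strip() for t in caption.split(",") if t.strip()]

    if remove_tag:
        # Drop every occurrence of the given tag(s).
        return ", ".join(t for t in tags if t not in new_tags)

    # Avoid duplicating tags that are already present, then prepend or append.
    tags = [t for t in tags if t not in new_tags]
    tags = tags + new_tags if append else new_tags + tags
    return ", ".join(tags)


print(edit_caption("1girl, brown hair, school uniform, smile", "blue_berry"))
# blue berry, 1girl, brown hair, school uniform, smile
print(edit_caption("1girl, brown hair, school uniform, smile", "blue_berry", append=True))
# 1girl, brown hair, school uniform, smile, blue berry
print(edit_caption("1girl, brown hair, school uniform, smile, blue berry", "brown_hair", remove_tag=True))
# 1girl, school uniform, smile, blue berry
```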
D. LoRA Module Config
- `networks.lora` is the normal, default kohya-ss/sd-scripts LoRA.
- `lycoris.kohya` is a Python package of LoRA modules, previously LoCon. Currently there are 2 LoRA algorithms: LoCon and LoRA with Hadamard Product representation. Put `algo=lora` for LoCon or `algo=loha` for Hadamard Product in `network_args`. Read: KohakuBlueleaf/LyCORIS.
- `locon.locon_kohya` is LoRA for convolutional networks. In short, it's the same LoRA, but it trains almost all layers, including the normal LoRA layers. Read: KohakuBlueleaf/LoCon.
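For example, training with the LyCORIS module would pass arguments roughly like the following to `train_network.py`; the dim/alpha values are illustrative, and `conv_dim`/`conv_alpha` are LyCORIS-specific `network_args` not covered above:

```
--network_module=lycoris.kohya \
--network_dim=16 \
--network_alpha=8 \
--network_args "conv_dim=8" "conv_alpha=1" "algo=loha"
```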
E. Backward Compatibility
`use_old_dataset_config`
Specify this option if you want to use the old dataset config, i.e. a config where the folder naming scheme still exists, e.g. `<repeats>_<instance_token> <class_token>`. Old, but useful for multi-concept training without even bothering to edit `dataset_config.toml`. Just remember to set the parent folder paths as `train_data_dir` and `reg_data_dir`. It doesn't support the data preprocessing section, so please preprocess your data locally.
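For illustration, a hypothetical multi-concept layout for the old config, with `train_data_dir` and `reg_data_dir` pointing at the parent folders:

```
/content/LoRA/train_data/    <- train_data_dir
├── 10_sls frog/
└── 5_cpc rabbit/
/content/LoRA/reg_data/      <- reg_data_dir
├── 1_frog/
└── 1_rabbit/
```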
Source:
- Folder naming scheme, an old way to define trigger prompts, dataset repeats, and class tokens: https://note.com/kohya_ss/n/nee3ed1649fb6 (original article)
- Dataset Config Documentation: https://github.com/kohya-ss/sd-scripts/blob/dev/config_README-ja.md