DreamBooth training guide
Using DreamBooth with AUTOMATIC1111 (A1111)
Download all anime images for the dataset | API: rule34, danbooru, gelbooru - [Script]
Download the software: https://github.com/Bionus/imgbrd-grabber > Tools > Options > Save > Separate log files > separate log file:
Name: %md5%.%ext% - Suffix: .txt - Text file content: %all:,excludenamespace=general,unsafe,separator=^, %
Search: ~tag1 -tag2 (prefix ~ to add, - to remove, or just <space>) [API: rule34, danbooru, gelbooru]
Fix1 - [Script]
Once downloaded, use this batch file next to the download folder (here "anime") to fix the extra extension and move videos to a folder (the same can be done with images).
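A minimal Python sketch of the same idea (the "anime" folder name and the video extensions are assumptions):

# fix_download.py - sketch: strip doubled extensions and move videos to a subfolder
import os, shutil

FOLDER = 'anime'                 # assumption: the download folder used above
VIDEO_EXTS = {'.mp4', '.webm'}   # assumption: which extensions count as video

os.makedirs(os.path.join(FOLDER, 'video'), exist_ok=True)
for name in os.listdir(FOLDER):
    path = os.path.join(FOLDER, name)
    if not os.path.isfile(path):
        continue
    root, ext = os.path.splitext(name)
    # "file.jpg.jpg" -> "file.jpg": drop the duplicated extension
    if ext and root.lower().endswith(ext.lower()):
        os.rename(path, os.path.join(FOLDER, root))
        path, name = os.path.join(FOLDER, root), root
        ext = os.path.splitext(root)[1]
    if ext.lower() in VIDEO_EXTS:
        shutil.move(path, os.path.join(FOLDER, 'video', name))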
Fix2 - [Script]
This removes every .txt from the anime512x folder that does not have a corresponding/secondary file (color correction).
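A Python sketch of that cleanup (assuming a .txt is "orphaned" when no non-.txt file shares its base name):

# remove_orphan_txt.py - sketch: delete .txt files with no matching image/secondary file
import os

FOLDER = 'anime512x'
files = os.listdir(FOLDER)
# base names of every non-.txt file in the folder
stems = {os.path.splitext(f)[0] for f in files if not f.lower().endswith('.txt')}
for f in files:
    if f.lower().endswith('.txt') and os.path.splitext(f)[0] not in stems:
        os.remove(os.path.join(FOLDER, f))
        print('removed', f)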
Fix3: if adding more CLIP captions, remove duplicated words. - [Script]
This cleans and removes duplicated comma-separated words in every .txt in the anime512x folder (limited to 1024 characters).
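A Python sketch of the dedupe (first occurrence wins; the 1024-character cap comes from the note above):

# dedupe_tags.py - sketch: remove duplicated comma-separated words, cap at 1024 characters
import os

FOLDER = 'anime512x'
for name in os.listdir(FOLDER):
    if not name.lower().endswith('.txt'):
        continue
    path = os.path.join(FOLDER, name)
    with open(path, encoding='utf-8') as f:
        words = [w.strip() for w in f.read().split(',')]
    seen, unique = set(), []
    for w in words:
        if w and w.lower() not in seen:   # keep the first occurrence, case-insensitive
            seen.add(w.lower())
            unique.append(w)
    with open(path, 'w', encoding='utf-8') as f:
        f.write(', '.join(unique)[:1024])  # the 1024-character limit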
Next?
In the WebUI, install the sd_dreambooth_extension, sd_smartprocess, and training-picker extensions, then apply and restart.
In Settings, point training-picker to your video path (or move the videos there); reload if necessary.
Then, in the Training Picker tab, extract keyframes from each video and remove any unwanted images.
Use the Smart Preprocess tab (which can zoom in on images) or Train > Preprocess images to crop and generate caption .txt files. Example: set the path to the directory and enable:
- Crop Images + Generate Captions + Add DeepDanbooru Tags to Caption
Or crop manually with Training Picker, birme.net, or Edit in Photo > save this frame + crop.
Reduce duplicates with Visual Similarity Duplicate Image Finder: File > New Project (select the directory > OK) > Start Scan (adjust parameters if needed) and wait for it to finish > Auto-check & Delete > auto-check images with the smaller file size (adjust the checkboxes to choose which images to remove) > Delete Checked Files > Perform.
Count tags to a text file - [Script]
(Optional) Create allTags.txt to count and order all tags from every .txt inside folderName. Usage: py CombineAllTags.py folderName | CombineAllTags.py:
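A minimal sketch of what CombineAllTags.py does (the tab-separated output format is an assumption):

# CombineAllTags.py - sketch: count every comma-separated tag across folderName/*.txt
import os, sys
from collections import Counter

folder = sys.argv[1]
counts = Counter()
for name in os.listdir(folder):
    if name.lower().endswith('.txt'):
        with open(os.path.join(folder, name), encoding='utf-8') as f:
            counts.update(t.strip() for t in f.read().split(',') if t.strip())
with open('allTags.txt', 'w', encoding='utf-8') as f:
    for tag, n in counts.most_common():   # ordered by frequency
        f.write(f'{n}\t{tag}\n')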
(Optional) Convert any image metadata "parameters" to .txt and vice versa. - [Script]
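A Python sketch of the conversion; AUTOMATIC1111 writes its generation parameters into a PNG text chunk named "parameters" (the sidecar naming here is an assumption):

# parameters_txt.py - sketch: PNG "parameters" metadata <-> sidecar .txt
import sys
from PIL import Image
from PIL.PngImagePlugin import PngInfo

path = sys.argv[1]  # a .png; writes <name>.txt, or reads it back if no metadata found
txt_path = path.rsplit('.', 1)[0] + '.txt'
img = Image.open(path)
params = img.info.get('parameters')  # where A1111 stores the prompt and settings
if params:  # metadata -> .txt
    with open(txt_path, 'w', encoding='utf-8') as f:
        f.write(params)
else:  # .txt -> metadata
    with open(txt_path, encoding='utf-8') as f:
        meta = PngInfo()
        meta.add_text('parameters', f.read())
    img.save(path, pnginfo=meta)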
-
Texture Training
Download all textures and add a caption to each image, quickly, with a script.
Scrape textures, building the dataset | polyhaven - [Script]
If you have Python installed, just run the script: python polyhavenScraper.py
:
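A rough sketch of such a scraper against the public api.polyhaven.com endpoints (the exact JSON layout and the 'Diffuse' key are assumptions; check the API docs):

# polyhavenScraper.py - rough sketch, not the original script
import os, requests

OUT = 'polyhaven_textures'
os.makedirs(OUT, exist_ok=True)

def find_url(node):
    # Walk the nested file listing defensively and return the first download URL found
    if isinstance(node, dict):
        if 'url' in node:
            return node['url']
        for value in node.values():
            url = find_url(value)
            if url:
                return url
    return None

assets = requests.get('https://api.polyhaven.com/assets', params={'t': 'textures'}).json()
for asset_id in list(assets)[:10]:  # first 10 assets as a demo
    files = requests.get(f'https://api.polyhaven.com/files/{asset_id}').json()
    url = find_url(files.get('Diffuse', files))
    if url:
        data = requests.get(url).content
        with open(os.path.join(OUT, asset_id + os.path.splitext(url)[1]), 'wb') as f:
            f.write(data)
        print('saved', asset_id)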
Add a text CLIP caption to each image - [Extension]
Install the clip-interrogator-ext extension for AUTOMATIC1111; it lets you use more models for captioning.
Interrogator > Batch > Prompt Mode > Best, enter the correct images folder > .\path\to\polyhaven_textures, and choose a CLIP Model > Go!
This adds a text file inside polyhaven_textures describing each image.
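The same batch captioning can also be scripted with the clip-interrogator package that the extension wraps; a minimal sketch (the folder path and default model are placeholders, swap in a model from the list below):

# batch_caption.py - sketch using the clip-interrogator package
import os
from PIL import Image
from clip_interrogator import Config, Interrogator

FOLDER = r'.\path\to\polyhaven_textures'
ci = Interrogator(Config(clip_model_name='ViT-L-14/openai'))  # pick a CLIP model
for name in os.listdir(FOLDER):
    if name.lower().endswith(('.png', '.jpg', '.jpeg')):
        image = Image.open(os.path.join(FOLDER, name)).convert('RGB')
        caption = ci.interrogate(image)  # equivalent of Prompt Mode > Best
        with open(os.path.join(FOLDER, os.path.splitext(name)[0] + '.txt'), 'w') as f:
            f.write(caption)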
The best CLIP models I found for textures, in order:
- xlm-roberta-large-ViT-H-14/frozen_laion5b_s13b_b90k (top 1)
- ViT-bigG-14/laion2b_s39b_b160k
- coca_ViT-L-14/mscoco_finetuned_laion2b_s13b_b90k; convnext_large_d_320/laion2b_s29b_b131k_ft_soup
Result: captions end up wrong in most cases...
Download all textures and name them | [Manually]
Download the images and create a "<ImageName>.txt" for each one in the same directory. (Log the total tags using my script above.)
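A tiny sketch to pre-create the caption files (the folder name is an assumption):

# make_caption_files.py - sketch: create an empty "<ImageName>.txt" next to each image
import os

FOLDER = 'textures'   # assumption: wherever the manually downloaded images live
for name in os.listdir(FOLDER):
    stem, ext = os.path.splitext(name)
    if ext.lower() in ('.png', '.jpg', '.jpeg'):
        txt = os.path.join(FOLDER, stem + '.txt')
        if not os.path.exists(txt):
            open(txt, 'w').close()   # then fill with the image's descriptive name/tags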
DreamBooth config settings
- Batch Size: 1
- Set Gradients to None When Zeroing: ✓
- Gradient Checkpointing: ✓
- Lora UNET Learning Rate: 0.0001
- Lora Text Encoder Learning Rate: 0.00002
- Learning Rate Scheduler: constant
- Learning Rate Scheduler: 1
- Constant/Linear Starting Factor: 1
- Scale Position: 1
- Max Resolution: 512
- Optimizer: 8bit AdamW
- Mixed Precision: fp16
- Memory Attention: xformers
- Cache Latents: ✓
- Train UNET: ✓
- Step Ratio of Text Encoder Training: 0
- Freeze CLIP Normalization Layers: no
- Instance Prompt: diffuse texture, [filewords]
- Deterministic: ✓
FixText
Script that generates letter/word images and checks them with OCR,
DatasetLetterMaker.py
:
# rough letter OCR synthetic dataset, augmented
import os, string, random, math, wordninja, unicodedata, easyocr, webcolors, nltk, hashlib
import matplotlib.font_manager as fm
from PIL import Image, ImageDraw, ImageFont
from nltk.corpus import brown
from nltk.probability import FreqDist
from torchvision.transforms import ColorJitter

# Define the number range
START_NUMBER = 1
END_NUMBER = 26

# Create a directory for the images and text files
os.makedirs('dataset', exist_ok=True)

# Define the data augmentation transform
COLOR_JITTER = ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)

def sanitize_filename(text):
    return hashlib.sha256(text.encode()).hexdigest()  # Hash the text using SHA-256

def calculate_font_size(draw, text, image_size):
    font_name = random.choice(fm.findSystemFonts())  # Use matplotlib to find a random font file
    font_size = 1
    font = ImageFont.truetype(font_name, font_size)
    text_size = font.getbbox(text)[2:4]
    while text_size[0] < image_size[0] and text_size[1] < image_size[1]:
        font_size += 1
        font = ImageFont.truetype(font_name, font_size)
        text_size = font.getbbox(text)[2:4]
    return font_size - 1

def place_text_randomly(img, draw, text, bg_color, image_size):
    while True:
        font_name = random.choice(fm.findSystemFonts())  # Pick a random font file
        font_type = os.path.basename(font_name).rsplit('.', 1)[0]  # Font name from the file name
        font = ImageFont.truetype(font_name, 10)  # Create a test font object
        if all(font.getbbox(c) != (0, 0, 0, 0) for c in text):  # Font must support every character
            break
    text_color = tuple(255 - c for c in bg_color)  # Complement of the background color
    angle = random.uniform(-30, 30)  # Random angle for the text, in degrees
    # Randomly choose a size for the text that fits within the image when rotated
    min_font_size = 20
    max_font_size = calculate_font_size(draw, text, image_size)
    if max_font_size < min_font_size:
        return False, min_font_size, font_type
    font_size = random.randint(min_font_size, max_font_size)
    font = ImageFont.truetype(font_name, font_size)
    bbox = draw.textbbox((0, 0), text, font=font)  # textsize() was removed in Pillow 10
    text_width, text_height = bbox[2] - bbox[0], bbox[3] - bbox[1]
    # Choose a random position for the text that fits within the image when rotated
    cos_a = abs(math.cos(math.radians(angle)))  # math.cos expects radians, not degrees
    sin_a = abs(math.sin(math.radians(angle)))
    max_x_position = image_size[0] - max(text_width * cos_a, text_height * sin_a)
    max_y_position = image_size[1] - max(text_height * cos_a, text_width * sin_a)
    if max_x_position < 0 or max_y_position < 0:
        return False, min_font_size, font_type
    position = (random.randint(0, int(max_x_position)),
                random.randint(0, int(max_y_position)))
    # Create a transparent image for the text and rotate it
    text_img = Image.new('RGBA', (text_width, text_height), color=(0, 0, 0, 0))
    text_draw = ImageDraw.Draw(text_img)
    text_draw.text((0, 0), text, fill=text_color + (255,), font=font)
    rotated_text_img = text_img.rotate(angle, expand=1)
    # Composite the already-rotated text onto the original image at the chosen position
    img.alpha_composite(rotated_text_img, position)
    return True, font_size, font_type

def is_text_readable(image_path, text):
    reader = easyocr.Reader(['en'])  # specify the language(s); re-created per call, hoist for speed
    result = reader.readtext(image_path)
    # Reject if OCR found nothing or any detection is below the confidence threshold
    if not result or any(res[2] < 0.5 for res in result):
        return False
    # Check that the recognized text matches the input text
    recognized_text = ' '.join(res[1] for res in result)
    return recognized_text == text

def has_two_colors(image_path):
    img = Image.open(image_path)
    colors = img.getcolors(maxcolors=2 ** 24)  # the default limit would return None for rich images
    if colors is None:
        return True  # more colors than the limit: certainly at least two
    return len(colors) >= 2

def closest_color(requested_color):
    min_colors = {}
    for key, name in webcolors.CSS3_NAMES_TO_HEX.items():  # requires webcolors < 24.6
        r_c, g_c, b_c = webcolors.hex_to_rgb(name)
        rd = (r_c - requested_color[0]) ** 2
        gd = (g_c - requested_color[1]) ** 2
        bd = (b_c - requested_color[2]) ** 2
        min_colors[rd + gd + bd] = key
    closest_color_name = min_colors[min(min_colors.keys())]
    return ' '.join(wordninja.split(closest_color_name))

def rgb_to_name(rgb):
    try:
        color_name = webcolors.rgb_to_name(rgb, spec='css3')
    except ValueError:
        color_name = closest_color(rgb)
    return ' '.join(wordninja.split(color_name))

def describe_font_size(font_size):
    if font_size < 30:
        return "very small"
    elif font_size < 40:
        return "small"
    elif font_size < 50:
        return "medium"
    elif font_size < 60:
        return "large"
    else:
        return "very large"

def generate_image_and_text(text):
    sanitized_text = sanitize_filename(text)
    img_path = f'dataset/{sanitized_text}.png'
    for _ in range(3):  # Limit the number of attempts to 3
        # Create a new image with a random background color
        bg_color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
        img = Image.new('RGBA', (1024, 1024), color=bg_color + (255,))
        d = ImageDraw.Draw(img)
        # Place the text at a random position with a random font
        success, font_size, font_type = place_text_randomly(img, d, text, bg_color, img.size)
        if len(text) == 1 and text.isalnum():  # Never skip single letters or digits
            success = True
        if not success:
            continue
        # ColorJitter's hue adjustment needs an RGB image, so convert before augmenting
        COLOR_JITTER(img.convert('RGB')).save(img_path, 'PNG')
        # If the image has at least two colors and the text is readable, write the caption and stop
        if has_two_colors(img_path) and is_text_readable(img_path, text):
            text_color_name = rgb_to_name(tuple(255 - c for c in bg_color))
            bg_color_name = rgb_to_name(bg_color)
            size_description = describe_font_size(font_size)
            with open(f'dataset/{sanitized_text}.txt', 'w') as f:
                f.write(f'The word "{text}" is written in {text_color_name} letters on a {bg_color_name} background. The {font_type} font size is {size_description}.')
            break
    else:
        # If after 3 attempts the image is still not recognized and it exists, delete it
        if os.path.exists(img_path):
            os.remove(img_path)

def main():
    # Download the necessary resources from nltk
    nltk.download('brown')
    # Frequency distribution over the Brown corpus, limited to the 200 most common words
    fdist = FreqDist(word.lower() for word in brown.words())
    words = [word for word, frequency in fdist.most_common(200)]
    # Add printable UTF-8 characters (skip control characters and whitespace,
    # which no font would pass the support check for)
    utf8_chars = [chr(i) for i in range(32, 127)] + [chr(i) for i in range(161, 172)] + [chr(i) for i in range(174, 256)]
    utf8_chars = [c for c in utf8_chars if unicodedata.category(c)[0] not in ('C', 'Z')]
    # Add numbers and single letters to the texts
    numbers = [str(i) for i in range(START_NUMBER, END_NUMBER + 1)]
    letters = list(string.ascii_lowercase) + list(string.ascii_uppercase)
    texts = words + utf8_chars + numbers + letters
    # Generate an image and a caption file for every entry (words, characters, numbers, letters)
    for text in texts:
        generate_image_and_text(text)

if __name__ == "__main__":
    main()
requirements.txt
:
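Inferred from the script's imports, it would need at least:

easyocr
matplotlib
nltk
Pillow
torchvision
webcolors<24.6  # the CSS3_NAMES_TO_HEX mapping used above was removed in newer releases
wordninja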
Misc
Miscellaneous Tips
Adding full model A to B
(Optional) Combine models in the Checkpoint Merger: A = pix2pix/inpaint/your own model, B = Protogen, C = v1-5-pruned, M = 1. With "Add difference" the result is A + (B − C) × M, so M = 1 adds the full Protogen-minus-v1.5 difference onto A.