Training Chroma using diffusion-pipe on Linux (install WSL2 on Winblows for Linux support)

Notes

  • Eval datasets use the same format as training datasets. Copy 1-4 image/text pairs from your training set into the eval folder (see the sketch after this list). Normally you wouldn't poison eval with training data, but we're intentionally biasing the model here.
  • I put all my configs in the examples folder (out of laziness)
  • Losses around 0.2-0.4 are normal
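
A minimal sketch of setting up the eval folder, assuming the training data lives at /path/datasets/lora_dataset (the path used later in this guide) and a hypothetical eval folder at /path/datasets/eval_dataset:

# copy a few training image/caption pairs into the eval folder (filenames are examples)
mkdir -p /path/datasets/eval_dataset
cp /path/datasets/lora_dataset/image1.{jpg,txt} /path/datasets/eval_dataset/
cp /path/datasets/lora_dataset/image2.{jpg,txt} /path/datasets/eval_dataset/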

Installation

(install uv from your distro's package manager; the package may be named python-uv)
git clone --recurse-submodules https://github.com/tdrussell/diffusion-pipe
cd diffusion-pipe
uv venv
source .venv/bin/activate
uv pip install torch torchvision torchaudio
uv pip install -r requirements.txt
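
Optionally, sanity-check that PyTorch can see the GPU before continuing (not part of the original steps, just a quick check):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

You will also need the Chroma weights on disk at the paths referenced in the config below. As an example (assuming the weights are published in a Hugging Face repo named lodestones/Chroma1-HD; check the actual repo name and that its layout matches the diffusers_path/transformer_path you set):

uv pip install "huggingface_hub[cli]"
huggingface-cli download lodestones/Chroma1-HD --local-dir /path/models/Chroma1-HD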

Dataset

  • Put as many good images as you want in a folder (more isn't always better).
  • Captions go in a file with the same name but a .txt extension (on Winblows, google "show file extensions"). Example: image1.jpg and image1.txt (layout sketched below).
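
As a sketch, the dataset folder might look like this (filenames are only examples):

/path/datasets/lora_dataset/
    image1.jpg
    image1.txt   # caption for image1.jpg
    image2.png
    image2.txt   # caption for image2.png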

Config

examples/chroma.toml

output_dir = '/path/diffusion-pipe/output'
dataset = 'examples/chromadataset.toml'
eval_datasets = [
    {name = 'eval', config = 'examples/eval.toml'},
]
epochs = 100
micro_batch_size_per_gpu = 1
pipeline_stages = 1
gradient_accumulation_steps = 1
gradient_clipping = 1.0
warmup_steps = 100

eval_every_n_epochs = 1
eval_before_first_step = true
eval_micro_batch_size_per_gpu = 1
eval_gradient_accumulation_steps = 1
#disable_block_swap_for_eval = true

save_every_n_epochs = 1
checkpoint_every_n_minutes = 120
activation_checkpointing = true
partition_method = 'parameters'
save_dtype = 'bfloat16'
caching_batch_size = 1
steps_per_print = 1
video_clip_mode = 'single_beginning'

[model]
type = 'chroma'
diffusers_path = '/path/models/Chroma1-HD/'
transformer_path = '/path/models/Chroma1-HD/Chroma1-HD.safetensors'
dtype = 'bfloat16'
transformer_dtype = 'float8' # Minimal quality loss
flux_shift = true

[adapter]
type = 'lora'
rank = 16 # up to 128 if 16GB+
dtype = 'bfloat16'
#init_from_existing = '/data/diffusion_pipe_training_runs/something/epoch50'

[optimizer]
type = 'adamw_optimi'
lr = 1e-4
betas = [0.9, 0.99]
weight_decay = 0.01
eps = 1e-8

#[optimizer]
#type = 'Prodigy'
#lr = 1
#betas = [0.9, 0.99]
#weight_decay = 0.01
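
To gauge how epochs and warmup_steps interact, a rough estimate (assuming a hypothetical 40-image dataset; aspect-ratio bucketing can shift the exact count):

# optimizer steps per epoch ~ images * num_repeats / (micro_batch_size_per_gpu * gradient_accumulation_steps * num_gpus)
# 40 * 1 / (1 * 1 * 1) = 40 steps per epoch, so warmup_steps = 100 covers roughly the first 2-3 epochs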

examples/chromadataset.toml (the eval dataset config referenced above is identical apart from the path; a sketch of it follows the block below)

resolutions = [512] # Increase resolution to 640, 768, etc for more detail if 24GB+ VRAM

enable_ar_bucket = true

min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 9

frame_buckets = [1]

[[directory]]
path = '/path/datasets/lora_dataset'

num_repeats = 1
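
A sketch of the matching eval dataset config (referenced as examples/eval.toml in chroma.toml above); it is identical apart from the directory path, which here points at a hypothetical eval folder:

resolutions = [512]

enable_ar_bucket = true

min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 9

frame_buckets = [1]

[[directory]]
path = '/path/datasets/eval_dataset'

num_repeats = 1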

Running

NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config examples/chroma.toml
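
If a run is interrupted, the diffusion-pipe README documents a --resume_from_checkpoint flag for resuming from the most recent checkpoint in output_dir (check your checkout's README for exact usage):

NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config examples/chroma.toml --resume_from_checkpoint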

Pub: 15 May 2025 16:45 UTC

Edit: 23 Aug 2025 12:34 UTC