Training Chroma using diffusion-pipe on Linux (install WSL2 on Winblows for Linux support)

Notes

  • Eval datasets use the same format as training datasets. Copy 1-4 image/text pairs from your training set into the eval folder (see the sketch after this list). Normally you wouldn't poison eval with training data, but we're intentionally biasing the model here.
  • I put all my configs in the examples folder (out of laziness)
  • Losses around 0.2-0.4 are normal
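
A minimal sketch of setting up the eval folder, assuming the training data lives at /path/datasets/lora_dataset (the path used later in this guide) and a hypothetical eval folder at /path/datasets/eval_dataset:

# copy a few training image/caption pairs into the eval folder (filenames are examples)
mkdir -p /path/datasets/eval_dataset
cp /path/datasets/lora_dataset/image1.{jpg,txt} /path/datasets/eval_dataset/
cp /path/datasets/lora_dataset/image2.{jpg,txt} /path/datasets/eval_dataset/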

Installation

(install uv from your distro's package manager; the package may be named python-uv)
git clone --recurse-submodules https://github.com/tdrussell/diffusion-pipe
cd diffusion-pipe
uv venv
source .venv/bin/activate
uv pip install torch torchvision torchaudio
uv pip install -r requirements.txt
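
Optionally, sanity-check that PyTorch can see the GPU before continuing (not part of the original steps, just a quick check):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

You will also need the Chroma weights on disk at the paths referenced in the config below. As an example (assuming the weights are published in a Hugging Face repo named lodestones/Chroma1-HD; check the actual repo name and that its layout matches the diffusers_path/transformer_path you set):

uv pip install "huggingface_hub[cli]"
huggingface-cli download lodestones/Chroma1-HD --local-dir /path/models/Chroma1-HD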

Dataset

  • Put as many good images as you want in a folder (more isn't always better).
  • Captions go in a file with the same name but a .txt extension (on Winblows, google "show file extensions"). Example: image1.jpg and image1.txt (layout sketched below).
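
As a sketch, the dataset folder might look like this (filenames are only examples):

/path/datasets/lora_dataset/
    image1.jpg
    image1.txt   # caption for image1.jpg
    image2.png
    image2.txt   # caption for image2.png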

Config

examples/chroma.toml

output_dir = '/path/diffusion-pipe/output'
dataset = 'examples/chromadataset.toml'
eval_datasets = [
    {name = 'eval', config = 'examples/eval.toml'},
]
epochs = 100
micro_batch_size_per_gpu = 1
pipeline_stages = 1
gradient_accumulation_steps = 1
gradient_clipping = 1.0
warmup_steps = 100

eval_every_n_epochs = 1
eval_before_first_step = true
eval_micro_batch_size_per_gpu = 1
eval_gradient_accumulation_steps = 1
#disable_block_swap_for_eval = true

save_every_n_epochs = 1
checkpoint_every_n_minutes = 120
activation_checkpointing = true
partition_method = 'parameters'
save_dtype = 'bfloat16'
caching_batch_size = 1
steps_per_print = 1
video_clip_mode = 'single_beginning'

[model]
type = 'chroma'
diffusers_path = '/path/models/Chroma1-HD/'
transformer_path = '/path/models/Chroma1-HD/Chroma1-HD.safetensors'
dtype = 'bfloat16'
transformer_dtype = 'float8' # Minimal quality loss
flux_shift = true

[adapter]
type = 'lora'
rank = 16 # up to 128 if 16GB+
dtype = 'bfloat16'
#init_from_existing = '/data/diffusion_pipe_training_runs/something/epoch50'

[optimizer]
type = 'adamw_optimi'
lr = 1e-4
betas = [0.9, 0.99]
weight_decay = 0.01
eps = 1e-8

#[optimizer]
#type = 'Prodigy'
#lr = 1
#betas = [0.9, 0.99]
#weight_decay = 0.01
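
To gauge how epochs and warmup_steps interact, a rough estimate (assuming a hypothetical 40-image dataset; aspect-ratio bucketing can shift the exact count):

# optimizer steps per epoch ~ images * num_repeats / (micro_batch_size_per_gpu * gradient_accumulation_steps * num_gpus)
# 40 * 1 / (1 * 1 * 1) = 40 steps per epoch, so warmup_steps = 100 covers roughly the first 2-3 epochs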

examples/chromadataset.toml (the eval dataset config referenced above is identical apart from the path; a sketch of it follows the block below)

resolutions = [512] # Increase resolution to 640, 768, etc for more detail if 24GB+ VRAM

enable_ar_bucket = true

min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 9

frame_buckets = [1]

[[directory]]
path = '/path/datasets/lora_dataset'

num_repeats = 1
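
A sketch of the matching eval dataset config (referenced as examples/eval.toml in chroma.toml above); it is identical apart from the directory path, which here points at a hypothetical eval folder:

resolutions = [512]

enable_ar_bucket = true

min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 9

frame_buckets = [1]

[[directory]]
path = '/path/datasets/eval_dataset'

num_repeats = 1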

Running

NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config examples/chroma.toml
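
If a run is interrupted, the diffusion-pipe README documents a --resume_from_checkpoint flag for resuming from the most recent checkpoint in output_dir (check your checkout's README for exact usage):

NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config examples/chroma.toml --resume_from_checkpoint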

Pub: 15 May 2025 16:45 UTC

Edit: 23 Aug 2025 12:34 UTC