AMD LoRA Training Guide

Written by Anon; mostly a copy-paste of https://rentry.org/lora_train, adjusted to work on AMD + Linux.

Limitations

This guide should work on

  • Radeon VII (Vega 20)
  • 6700XT up to 6950XT

and might work on:

  • 6700

with some VRAM tweaking
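
Before going further, you can check which GPU ISA the ROCm stack reports (this assumes ROCm itself is already installed; the grep is just a convenience filter):

    rocminfo | grep -i gfx

If your card does not show up as a supported ISA, the workaround commonly reported for RDNA2 cards is to export HSA_OVERRIDE_GFX_VERSION=10.3.0 before launching anything; treat that as a community workaround, not a guarantee.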

Prerequisites

You will need:

  • A Linux installation (I'm using Ubuntu 22.04)
  • Python 3.10
  • git
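
A quick sanity check that the software prerequisites are in place (version numbers will vary, but Python should report 3.10.x):

    python3 --version                                  # should print Python 3.10.x
    python3 -m venv --help >/dev/null && echo "venv module OK"
    git --version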

Installation

  1. Open a terminal and navigate to where you want to install it.
  2. Download the repo:
    git clone https://github.com/kohya-ss/sd-scripts
    
  3. Installation instructions (different from standard/NVIDIA):

    cd sd-scripts
    python3 -m venv venv
    source venv/bin/activate
    
    pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.2
    pip3 install --upgrade -r requirements.txt
    pip3 uninstall tensorflow
    pip3 install tensorflow-rocm
    

    Run these commands in exactly this order.
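
    Once everything has installed, it's worth sanity-checking that the ROCm build of PyTorch can actually see the GPU. The ROCm build reuses the torch.cuda API, so "cuda" here really means your AMD card:

    python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available() and torch.cuda.get_device_name(0))"

    If this prints False on an RDNA2 card, try the HSA_OVERRIDE_GFX_VERSION=10.3.0 workaround mentioned under Limitations.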

  4. Grab one of the two Python training scripts from https://github.com/derrian-distro/LoRA_Easy_Training_Scripts. If you have no idea what you're doing, grab the lora_train_popup.py version, as it basically walks you through everything.
  5. Edit lora_train_popup.py and set the following:

    self.xformers: bool = False
    

    sadly, we cannot use xformers, since facebookresearch only builds it for NVIDIA (CUDA) hardware

  6. Either set self.use_8bit_adam: bool = False, or follow the optional step "Building bitsandbytes for AMD" below (a scripted way to make this edit and the xformers one is sketched after this list)
  7. Then prepare your dataset.
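
If you'd rather script the edits from steps 5 and 6, here is a minimal sed sketch. It assumes both flags default to True in lora_train_popup.py; check the file first, since the exact lines may differ between versions:

    sed -i 's/self.xformers: bool = True/self.xformers: bool = False/' lora_train_popup.py
    sed -i 's/self.use_8bit_adam: bool = True/self.use_8bit_adam: bool = False/' lora_train_popup.py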

Building bitsandbytes for AMD (optional/read above!)

  1. Uninstall the current bitsandbytes
    pip uninstall bitsandbytes
    
  2. Get this unofficial AMD fork of bitsandbytes
    git clone https://github.com/broncotc/bitsandbytes-rocm
    cd bitsandbytes-rocm
    
  3. Build it with

    make hip
    

    if it fails, you might need to install make and libstdc++-12-dev, and pull in the HIP SDK:

    sudo apt install make libstdc++-12-dev
    amdgpu-install --usecase=rocm,hiplibsdk
    
  4. On success, install bitsandbytes with:
    python3 setup.py install
    
  5. Now you can leave the bitsandbytes-rocm directory with cd .. and prepare your dataset (a quick import check is sketched below)
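
To verify the ROCm build actually loads, a quick import check; as far as I can tell, kohya's scripts pick an 8-bit Adam variant from bitsandbytes.optim when use_8bit_adam is enabled:

    python3 -c "import bitsandbytes as bnb; print(bnb.optim.Adam8bit)"

If the import fails with a HIP/CUDA setup error, the build did not pick up your ROCm installation.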

Dataset preparation (shameless copy-paste)

Create a directory layout as shown below:

  • Example directory layout: https://mega.nz/folder/p5d3haJR#SmDSpaldBGcYzvZOx8sqbg
  • You can have one concept subfolder or ten, but you must have at least one.
  • Concept folders follow this format: <number>_<name>
    • the <number> determines the number of repeats your training script will do on that folder.
    • the <name> is purely cosmetic, as long as you have matching txt caption files.
    • Caption files are effectively mandatory; without them, the LoRA will train using the concept name as the caption for every image.
  • Learn how to do captions here
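
A minimal sketch of the layout; my_dataset, 10_mychar, and the file names are all hypothetical placeholders:

    mkdir -p my_dataset/10_mychar
    cp ~/pictures/mychar/*.png my_dataset/10_mychar/
    # every image needs a caption file with the same basename, e.g.
    # my_dataset/10_mychar/001.png -> my_dataset/10_mychar/001.txt
    echo "1girl, red hair, smiling" > my_dataset/10_mychar/001.txt

When the training script later asks for the dataset folder, point it at my_dataset (the parent), not at 10_mychar.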

Start training

  1. Open a terminal and navigate to your sd-scripts folder OR just reuse the terminal from the installation
  2. Make sure your venv is active or just:
    source venv/bin/activate
    
  3. Launch it with (see the note after this list about the thread count):
    venv/bin/accelerate-launch --num_cpu_threads_per_process 12 lora_train_popup.py
    
  4. Follow the popups, and make sure to select the top-level folder (the one holding the <number>_<name> subfolders) when asked.
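
The --num_cpu_threads_per_process 12 above matches my CPU; a simple way to match it to yours (nproc prints the thread count on Linux):

    venv/bin/accelerate-launch --num_cpu_threads_per_process "$(nproc)" lora_train_popup.py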

Tips and tweaks

  1. 512x512 training with batch size 1 uses 10 GB of VRAM on my system, plus about 1 GB for other apps.
  2. Neither batch size 2 nor 768x768 fits in 16 GB of VRAM.
  3. If you have an IGPU you can actually run all other apps on it by following this: Linux + IGPU + ANY DGPU
  4. Ignore the MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_40.kdb Performance may degrade message. There is no fix, and the performance loss only affects startup.
  5. View VRAM usage with radeontop in another terminal (a sysfs alternative is sketched after this list)
  6. This works sometimes: AMD GPU Linux Overclocking
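
If you'd rather not install radeontop, the amdgpu driver exposes VRAM counters through sysfs; a minimal sketch (card0 is an assumption, check /sys/class/drm/ for your card's index):

    watch -n1 'echo "$(( $(cat /sys/class/drm/card0/device/mem_info_vram_used) / 1048576 )) / $(( $(cat /sys/class/drm/card0/device/mem_info_vram_total) / 1048576 )) MiB"'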