LoRA Training for Big Lust on Runpod

Introduction

Since running a runpod instance costs money, try to do as much prep work as you can locally before starting an instance. And since even a stopped instance costs a little bit of money, you may want to terminate (i.e. delete, along with its data) instances between training sessions.

This means there's a little more setup involved at the start of each session, but it's minimal. And if desired, you can train multiple LoRAs in one session.

For simplicity, at various points in this guide you will find 'celebrity name' in an instruction. Anywhere you see this, substitute the name of the celebrity you are training, following the format in the instruction.

Prepare training data and config file

Before you deploy a runpod instance, you'll need to prepare your training images and captions, as well as your kohya configuration file. Curating and captioning a good dataset is outside the scope of this guide. See this excellent guide for more info on that.

  1. Gather your training images into one folder and caption them using Taggui or whichever method you prefer. You should end up with txt files named the same as your images - one txt file for each image.
  2. Make a new file, paste in the config json from this link, and save it as celebrity name.json. This is your training config file.
  3. In your config file, modify the following parameters as indicated (a JSON sketch of these fields follows the list):
Parameter            Value
metadata_author      Your name or alias
metadata_tags        celebrity name, or whatever trigger words you added to the caption files
metadata_title       celebrity name
output_name          celebrity-name-biglust16.safetensors, or whatever you like
sample_prompts       iphone selfie of celebrity name woman, 1girl, solo, yellow bra, headshot, cleavage, smirk, auburn hair --w 1024 --h 1024 --n ugly, old, weird
                     (This is the prompt that will be used to generate sample images during training. Change it to whatever you like, or just enter your trigger word.)
train_data_dir       /workspace/kohya_ss/dataset/images/CelebrityName
epoch                See the 'Steps and Epochs' info box below
save_every_n_epochs  See the 'Steps and Epochs' info box below
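To make the edits concrete, here's a rough sketch of how those fields might look once filled in. The values are placeholders built from the table above, not a complete config: leave all the other keys alone, and keep value types exactly as they appear in the file you downloaded. The epoch and save_every_n_epochs values assume the 70-image example from the info box below.

    "metadata_author": "YourName",
    "metadata_tags": "celebrity name",
    "metadata_title": "celebrity name",
    "output_name": "celebrity-name-biglust16.safetensors",
    "sample_prompts": "iphone selfie of celebrity name woman, 1girl, solo, yellow bra, headshot, cleavage, smirk, auburn hair --w 1024 --h 1024 --n ugly, old, weird",
    "train_data_dir": "/workspace/kohya_ss/dataset/images/CelebrityName",
    "epoch": 50,
    "save_every_n_epochs": 5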

Steps and Epochs

You want to aim for around 3500 steps. Remember that steps = images x epochs. So if for example you have 70 training images, 3500 steps / 70 images = 50 epochs. If you have more or fewer images, giving an awkward number of epochs (usually the case), round up to the nearest multiple of 3, 4 or 5. It is worthwhile to have around 10 to 12 LoRAs to pick between, so in this example you would set save_every_n_epochs to 5 to get 10 output files.
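If you want to let the shell do the arithmetic, here's a quick sketch. The 70 is an example image count; substitute your own, then round the result as described above.

    IMAGES=70
    echo $(( (3500 + IMAGES - 1) / IMAGES ))    # epochs needed to hit ~3500 steps (here: 50)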

Before moving on, go back and double-check what you've done so far. Ideally you don't want to be troubleshooting errors while your instance is running, costing you money. That said, chances are you will hit stumbling blocks your first try, so don't sweat it too much.
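One part of that check is easy to script. Here's a small bash sketch you can run locally in your image folder to list any images that are missing a matching caption file (it assumes .jpg and .png images; adjust the extensions to suit your dataset):

    for f in *.jpg *.png; do
      [ -e "$f" ] || continue                       # skip extensions with no matches
      [ -e "${f%.*}.txt" ] || echo "missing caption: $f"
    done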

Deploy a Runpod instance

Now that your training and config data is prepared, it's time to deploy a runpod instance. If you deployed an instance previously which you did not terminate, you can skip this stage - just start the instance and connect to the services.

  1. In your runpod dashboard, click 'Deploy a Pod'.
  2. Select the 'Kohya_ss GUI' template.
  3. Under GPU select 'RTX 4090'.
  4. Leave all other options as they are, enter a name for the pod, and click 'Deploy On-Demand'.
  5. The pod takes a few minutes to start up. You can watch the logs as it does so. This is a good time to have a final check over your training config and data.
  6. Once it has started, click 'Connect', and from the list that appears, select the following services which will open in new tabs:
Port  Service                     Description
3000  Kohya_ss                    The UI for running the LoRA training
8000  Runpod Application Manager  Two buttons to allow you to stop and start Kohya_ss. Shouldn't normally be needed, but useful if you mess something up and need to restart.
8888  JupyterLab                  A graphical frontend for managing files in the pod. You will use this to upload the training config and data.

Upload training and config data

Now you can prepare the runpod instance for training by downloading the Big Lust model into it, and uploading your training config and data.

  1. Open the JupyterLab tab.
  2. Open a terminal and run the following:
    wget -O /workspace/kohya_ss/biglust16.kst6.safetensors "https://civitai.com/api/download/models/1081768?type=Model&format=SafeTensor&size=full&fp=fp16"
    This will download the Big Lust v1.6 model to the correct location. nb. If this is a previously deployed instance that you didn't terminate, skip this step.
  3. Navigate to /workspace/kohya_ss and upload your training config file, which should be named celebrity name.json
  4. Navigate to /workspace/kohya_ss/dataset/images. Here, create a new folder named CelebrityName, and a folder within that named 1_CelebrityName. So your full path is: /workspace/kohya_ss/dataset/images/CelebrityName/1_CelebrityName (if you prefer the terminal, there's a one-line alternative sketched after this list).
  5. In the folder you've just created, upload your training images and caption txt files.
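If you'd rather use the terminal than the JupyterLab file browser for step 4, the same folder structure can be created in one command (substitute the celebrity name as usual):

    mkdir -p /workspace/kohya_ss/dataset/images/CelebrityName/1_CelebrityName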

Take special care over the images folder. You should have a folder named /workspace/kohya_ss/dataset/images/CelebrityName. This should match the train_data_dir option in the config file. Within this you should have a folder named 1_CelebrityName - this is where your images and caption files go.

Make sure everything has finished uploading before proceeding.
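A quick way to confirm the uploads landed, assuming the paths above: count the captions and images in a pod terminal and check they match what you have locally.

    cd /workspace/kohya_ss/dataset/images/CelebrityName/1_CelebrityName
    ls *.txt | wc -l            # number of caption files
    ls | grep -vc '\.txt$'      # number of images; should match the caption count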

Do the training

Ironically, this is the easiest part. Switch to the Kohya_ss tab, and continue.

  1. Switch to the LoRA tab!!! So easy to forget because the tabs look identical - but if you do, your training will go nowhere and you won't know why. Even worse, if you change an option in your config and save it on the wrong tab, you'll mess it up and need to re-upload it. Don't forget!
  2. Under 'Configuration File' select your configuration file (duh), then click 'Load' (the last button to the right).
  3. Scroll down, check through the options to make sure they look right, then click 'Start Training'. Don't worry if there's no visual indication that anything is happening - the training will be running.

Download the results

Now, if everything is set up right, it's time to monitor the training and download the LoRAs as they appear. Check and monitor the following:

  • The training log will appear at /workspace/logs/kohya_ss.log. Check this for progress, or if nothing is appearing in the output folder. To keep a running view of it, open a terminal and run the following command:
    tail -f /proc/$(pgrep -f kohya_gui)/fd/1
  • Sample images will start to appear within a minute or two at /workspace/kohya_ss/outputs/samples. Don't worry if they don't look much like the celebrity at first, but do check that they're appearing.
  • Various output files, including the LoRAs, will appear at /workspace/kohya_ss/outputs. Some data files appear straight away; the LoRAs start to appear after a few minutes. A snippet for keeping an eye on this folder follows the list.
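If you'd like a running view of that outputs folder, something like this works in a terminal (assuming the watch utility is present in the container; if not, just re-run the ls by hand):

    watch -n 60 ls -lh /workspace/kohya_ss/outputs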

It's a good idea to download the latter 50% or so of the LoRAs, so given the example of 50 epochs, saving every 5 epochs, you would download files 25 through to the last (not numbered - but it's file 50). Then start genning with them and compare the results.

Terminate the instance

Finally, to ensure you won't continue to be charged, once you've downloaded your LoRAs, return to the runpod dashboard, stop the instance, and terminate it. When it's terminated, all the data within it will be lost, so make sure you've got the LoRAs downloaded!

If you don't mind being charged a little bit in return for a quicker start-up next time, just stop the instance. All the data in it will be retained. However, be warned that depending on demand, sometimes when you go to start up a stopped instance, there will be no GPUs available for use, and so you won't be able to use it for training. This, in addition to the cost factor, is why this guide recommends terminating instances and starting from scratch each time.
