LoRA training for LLaMA
A simple guide that hopefully doesn't miss anything.
This was tested on a RunPod instance running Debian. I don't think it'll work natively on Windows, but it does work on WSL Ubuntu. You'll also need Python 3.10 and Anaconda/Miniconda.
Setup
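If you don't already have a conda environment with a recent Python, create and activate one first (the environment name here is just an example):

conda create -n llama-lora python=3.10 -y
conda activate llama-lora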
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/GamerUntouch/minimal-llama
cd minimal-llama
pip install -r requirements.txt
This installs the modules needed to run minimal-llama, which I've edited slightly so it generates the LoRA files properly.
Before you can train, you need the HF-converted int8 weights and the tokenizer.model file.
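If you're starting from the original Meta checkpoint, the conversion script that ships with the transformers library produces the HF-format weights (paths here are placeholders):

python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /path/to/original/llama --model_size 7B --output_dir /path/to/llama-7b-hf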
Tokenization
The LoRA trainer needs a tokenized .arrow file; this script generates one from the .txt file you provide.
python tokenize_dataset.py --tokenizer_path tokenizer/tokenizer.model --text_path text/textfile.txt --save_path output/folder --max_seq_length 256
The JSON loader doesn't like invalid escape codes, so sequences like "\!" should be changed to "!" before running this. The max_seq_length here sets the length of each tokenized sample and affects how much VRAM the next step needs.
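For reference, here's roughly what this step produces, assuming the script tokenizes the whole file and splits it into max_seq_length-sized chunks saved with the datasets library (names and details here are illustrative, not the actual contents of tokenize_dataset.py):

from transformers import LlamaTokenizer
from datasets import Dataset

# Illustrative sketch only; see tokenize_dataset.py for the real logic.
tokenizer = LlamaTokenizer.from_pretrained("tokenizer/")  # folder containing tokenizer.model
max_seq_length = 256

text = open("text/textfile.txt", encoding="utf-8").read()
ids = tokenizer(text)["input_ids"]

# Split the token stream into max_seq_length-sized training samples.
chunks = [ids[i:i + max_seq_length] for i in range(0, len(ids), max_seq_length)]
Dataset.from_dict({"input_ids": chunks}).save_to_disk("output/folder")  # writes the .arrow dataset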
LoRA Training
python finetune_peft.py --model_path folder/with/int8/converted/weights --dataset_path output/folder --peft_mode lora --lora_rank 8 --per_device_train_batch_size 1 --gradient_accumulation_steps 1 --max_steps 300 --learning_rate 2e-4 --fp16 --logging_steps 10 --output_dir lora/output/folder
Based on my testing, a 3060 can handle a max_seq_length of 196 with a per_device_train_batch_size of 1. You can fiddle with the training parameters from there.
The script is fairly slow.
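If you're curious what --peft_mode lora and --lora_rank 8 translate to, it's roughly the following PEFT setup (a minimal sketch with the peft library; the alpha, dropout, and target module values are assumptions, not necessarily what finetune_peft.py uses):

from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

# Sketch only; finetune_peft.py adds the int8 loading details and the training loop.
model = LlamaForCausalLM.from_pretrained("folder/with/int8/converted/weights", load_in_8bit=True, device_map="auto")
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # --lora_rank
    lora_alpha=16,                        # assumed default, check the script
    lora_dropout=0.05,                    # assumed default, check the script
    target_modules=["q_proj", "v_proj"],  # typical LLaMA attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapter weights are trainable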
Merging
python lora_merge.py --model_path /path/to/llama-7b --peft_path /path/to/peft --output_path /path/to/output --tokenizer_path /path/to/tokenizer
The merge script seems somewhat faulty, but it will merge the weights and let you run finetuned models. It also seems to inflate the file size, possibly because the output is saved as 16-bit instead of 8-bit.
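For reference, a plain peft merge looks roughly like this (a sketch, not the actual lora_merge.py code); the file size growth is consistent with the merged model being written out in 16-bit:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

# Sketch only; lora_merge.py may differ in details.
base = LlamaForCausalLM.from_pretrained("/path/to/llama-7b", torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "/path/to/peft")
merged = merged.merge_and_unload()           # folds the LoRA deltas into the base weights
merged.save_pretrained("/path/to/output")    # saved as fp16, hence the larger files
LlamaTokenizer.from_pretrained("/path/to/tokenizer").save_pretrained("/path/to/output")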
Running LoRAs
The latest oobabooga text-generation-webui version allows LoRAs to be used at int8, though I haven't tested that myself. Hypothetically, merging a LoRA into the int8 weights and then converting the result to 4-bit should also work, but again, I haven't tried it.
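If you want to try the unmerged route, loading an adapter directly in the webui looks something like this (llama-7b and my-lora are placeholder folder names under the webui's models/ and loras/ directories; double-check the flags against the webui's own docs):

python server.py --model llama-7b --lora my-lora --load-in-8bit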