LLaMA 4-bit LoRA Support

Inference-only support

(all from https://github.com/oobabooga/text-generation-webui/issues/332)
Install the custom 4-bit inference PEFT fork:

pip install git+https://github.com/Curlypla/peft-4bit-fix
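
For reference, a minimal inference sketch, assuming the quantized checkpoint was produced with GPTQ-for-LLaMa. The paths are placeholders, and load_quant's exact signature varies between revisions of that repo, so treat this as an outline rather than copy-paste code:

from transformers import AutoTokenizer
from peft import PeftModel
from llama import load_quant  # helper defined in GPTQ-for-LLaMa's llama.py; signature varies by revision

# Load the 4-bit quantized base model (placeholder paths; wbits=4)
model = load_quant('models/llama-7b-hf', 'models/llama-7b-4bit.pt', 4).cuda()
tokenizer = AutoTokenizer.from_pretrained('models/llama-7b-hf')

# The patched PEFT fork knows how to wrap the quantized linear layers
model = PeftModel.from_pretrained(model, 'loras/alpaca-lora-7b')
model.eval()

inputs = tokenizer('The quick brown fox', return_tensors='pt').to('cuda')
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))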

LoRA 4-bit Training Support

(all from https://github.com/qwopqwop200/GPTQ-for-LLaMa and https://github.com/johnsmith0031/alpaca_lora_4bit)
Install the custom GPTQ fork:
git clone https://github.com/Curlypla/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install

Install the custom PEFT fork:

pip install git+https://github.com/Curlypla/peft-GPTQ
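
The fork's job is to make PEFT's LoRA layers work on top of the quantized linears; the LoRA configuration itself goes through the standard PEFT API. A minimal sketch (the hyperparameters below are illustrative, not taken from the repos above):

from peft import LoraConfig, get_peft_model

# 'model' is the 4-bit quantized LLaMA loaded as in the inference sketch above
lora_config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,                        # scaling factor
    target_modules=['q_proj', 'v_proj'],  # attention projections to adapt
    lora_dropout=0.05,
    bias='none',
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train

From here a standard causal-LM training loop updates only the adapter weights, which is what keeps 4-bit LoRA training cheap.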

With the new changes, LLaMA models need to be re-quantized to work with the newest code (see https://github.com/oobabooga/text-generation-webui/issues/445).
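
Re-quantization is done with GPTQ-for-LLaMa's llama.py; the invocation looks along these lines (the model path, calibration set, and group size are placeholders, so check the repo's README for the current flags):

python llama.py /path/to/llama-7b-hf c4 --wbits 4 --groupsize 128 --save llama-7b-4bit-128g.pt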
