LLaMA 4-bit LoRA support
Quicklinks:
- alpaca-lora-7b : https://huggingface.co/tloen/alpaca-lora-7b
- alpaca-lora-13b : https://huggingface.co/chansung/alpaca-lora-13b
- alpaca-lora-30b : https://huggingface.co/chansung/alpaca-lora-30b
- alpaca-lora-65b : not released yet? (LLaMA's largest size is 65B, not 60B)
Inference-only support
(all from https://github.com/oobabooga/text-generation-webui/issues/332)
Install the custom 4-bit inference PEFT fork: pip install git+https://github.com/Curlypla/peft-4bit-fix
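As a rough sketch of what inference looks like once the fork is installed, assuming a checkpoint already quantized with GPTQ-for-LLaMa and its `load_quant` helper from `llama.py` (paths, checkpoint names, and the exact `load_quant` signature are assumptions and vary between revisions):

```python
# Minimal sketch: attach a LoRA to a GPTQ 4-bit LLaMA for inference.
# Run from the GPTQ-for-LLaMa directory so `llama.py` is importable.
from transformers import LlamaTokenizer
from peft import PeftModel        # must be the Curlypla/peft-4bit-fix build
from llama import load_quant      # helper from GPTQ-for-LLaMa's llama.py

# Placeholder paths; the wbits argument (and any groupsize argument in
# newer revisions) must match how the checkpoint was quantized.
model = load_quant("models/llama-7b-hf", "models/llama-7b-4bit.pt", 4)
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b")
model.to("cuda").eval()

tokenizer = LlamaTokenizer.from_pretrained("models/llama-7b-hf")
prompt = "### Instruction:\nExplain LoRA in one sentence.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```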
LoRA 4-bit training support
(all from https://github.com/qwopqwop200/GPTQ-for-LLaMa and https://github.com/johnsmith0031/alpaca_lora_4bit)
Install the custom GPTQ fork:
git clone https://github.com/Curlypla/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install
Install the custom PEFT fork: pip install git+https://github.com/Curlypla/peft-GPTQ
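With both forks installed, training follows the usual PEFT LoRA pattern. The sketch below is illustrative only: the rank, target modules, and other hyperparameters are assumptions, and alpaca_lora_4bit contains a complete training script.

```python
# Minimal sketch: wrap a 4-bit GPTQ LLaMA (loaded as in the inference
# example above) in a trainable LoRA adapter via the patched PEFT fork.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # adapter rank (assumption)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
# From here, train with transformers.Trainer or a plain PyTorch loop;
# alpaca_lora_4bit shows a full end-to-end example.
```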
With the latest changes, LLaMA models need to be re-quantized to work with the newest code (see https://github.com/oobabooga/text-generation-webui/issues/445)
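Re-quantization is done with GPTQ-for-LLaMa's llama.py, with a command along these lines (the dataset argument, flag names, and any --groupsize option vary between revisions of the fork):
CUDA_VISIBLE_DEVICES=0 python llama.py models/llama-7b-hf c4 --wbits 4 --save llama-7b-4bit.pt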