How to run OpenAssistant locally on an NVIDIA GPU on Windows
- Download and install git: https://git-scm.com/downloads
- Download and install Python: https://www.python.org/downloads/
- Download and install the CUDA drivers for your GPU: https://developer.nvidia.com/cuda-11-7-0-download-archive
- Download the text-generation-webui one-click installer (Windows): https://github.com/oobabooga/one-click-installers/archive/refs/heads/oobabooga-windows.zip
- Run `install.bat` and follow the instructions
- Run `download-model.bat` and type K for "none of the above"
- Type in `OpenAssistant/oasst-sft-1-pythia-12b`, press Enter, and wait until everything downloads. The whole model is 24+ GB, so it will take a while.
- Alternatively, if the above method doesn't work for any reason, open the command line (cmd) from Windows search and run `git clone https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b`. After it's downloaded, move the whole `oasst-sft-1-pythia-12b` folder to `one-click-installers-oobabooga-windows\text-generation-webui\models`
- Launch `start-webui.bat`
- It will automatically load the model. If you have more than one model downloaded, it will ask you which model to load; select `oasst-sft-1-pythia-12b`
- Once it's loaded, you will see a message like "Running on local URL..."
- Enter that URL in your browser and you will be in the text-generation UI, ready to start chatting
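As a rough sanity check on the "24+ GB" figure above (not an exact file size, just back-of-the-envelope arithmetic): the model has about 12 billion parameters, and the weights on Hugging Face are stored as 16-bit floats, i.e. 2 bytes per parameter.

```python
# Rough estimate of the download size of oasst-sft-1-pythia-12b:
# ~12 billion parameters stored in fp16 (2 bytes each).
params = 12e9          # ~12 billion parameters
bytes_per_param = 2    # 16-bit float storage
size_gb = params * bytes_per_param / 1e9
print(size_gb)  # 24.0
```

The same arithmetic explains the VRAM pressure mentioned in the tips below: just holding the weights in fp16 takes about 24 GB of memory before any activations or chat history.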
Some tips
The model will likely run out of VRAM after several responses. Just press "Stop" and "Clear history" and you can continue chatting.
You can experiment with the model parameters in the "Parameters" tab to find what works best. There you can also control how long OA's responses will be by tweaking the "max_new_tokens" parameter. I'm currently using these and they seem to work alright: temperature 0.7, repetition_penalty 1.17, top_k 40, top_p 0.1, max_new_tokens 500.
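If you're wondering what those knobs actually do, here is a toy pure-Python sketch of temperature, top-k, and top-p (nucleus) sampling. This is only an illustration of the concepts, not text-generation-webui's actual implementation; the `logits` input is a made-up example.

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.1):
    """Toy illustration of the sampling parameters in the UI.

    logits: dict mapping candidate token -> raw model score (hypothetical).
    (repetition_penalty, not shown here, additionally penalises the
    scores of tokens that have already been generated.)
    """
    # Temperature: divide logits before softmax; values < 1 sharpen the
    # distribution, making high-scoring tokens even more likely.
    scaled = {t: score / temperature for t, score in logits.items()}

    # Top-k: keep only the k highest-scoring tokens.
    kept = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Softmax over the kept tokens (subtracting the max for stability).
    m = max(v for _, v in kept)
    exps = {t: math.exp(v - m) for t, v in kept}
    total = sum(exps.values())
    probs = {t: e / total for t, e in exps.items()}

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalise. A low top_p like 0.1
    # usually leaves only the single most likely token.
    nucleus, cum = {}, 0.0
    for t, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        nucleus[t] = p
        cum += p
        if cum >= top_p:
            break
    total = sum(nucleus.values())
    tokens, weights = zip(*((t, p / total) for t, p in nucleus.items()))
    return random.choices(tokens, weights=weights)[0]
```

With top_p as low as 0.1, sampling is close to greedy: the nucleus almost always contains just the top token, which is why these settings produce fairly focused, deterministic-feeling replies.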