How to run OpenAssistant locally on an NVIDIA GPU on Windows

  1. Download and install git
    https://git-scm.com/downloads
  2. Download and install python
    https://www.python.org/downloads/
  3. Download CUDA drivers for your GPU
    https://developer.nvidia.com/cuda-11-7-0-download-archive
  4. Download the text-generation-webui one-click installer (Windows)
    https://github.com/oobabooga/one-click-installers/archive/refs/heads/oobabooga-windows.zip
  5. Run install.bat and follow the instructions
  6. Run download-model.bat and type K for "none of the above"
  7. Type in OpenAssistant/oasst-sft-1-pythia-12b, press Enter, and wait for everything to download. The full model is over 24 GB, so this will take a while.

    Alternatively, if the above method doesn't work for any reason, open the command line (cmd) from Windows search and run git clone https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b. Once it has downloaded, move the whole oasst-sft-1-pythia-12b folder to one-click-installers-oobabooga-windows\text-generation-webui\models

  8. Launch start-webui.bat
  9. It will automatically load the model. If you have more than one model downloaded, it will ask which model to load; select oasst-sft-1-pythia-12b
  10. Once it's loaded, you will see a message like "Running on local URL... "
  11. Enter that URL in your browser and you will be in the text generation UI, ready to start chatting

Some tips

The model will likely run out of VRAM after several responses. Just press stop, clear the history, and you can continue chatting.
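As a rough sanity check on the numbers above (my own back-of-the-envelope math, not from the guide): a 12-billion-parameter model stored as 16-bit floats needs about 24 GB just for the weights, which matches the download size. On top of that, each chat turn grows the attention (KV) cache, which is why VRAM eventually runs out and clearing the history helps.

```python
# Back-of-the-envelope sizing for oasst-sft-1-pythia-12b.
# Assumption (mine, not from the guide): weights stored as 16-bit floats.
PARAMS = 12e9        # ~12 billion parameters
BYTES_PER_PARAM = 2  # fp16/bf16

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"approx. weight size: {weights_gb:.0f} GB")

# The growing chat history adds KV-cache memory on top of this fixed cost,
# so memory use climbs during a long conversation; clearing history frees it.
```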

You can experiment with the model parameters in the "Parameters" tab to find what works best. There you can also set how long OA's responses will be by tweaking the "max_new_tokens" parameter. I'm currently using these and they seem to work alright: temperature 0.7, repetition_penalty 1.17, top_k 40, top_p 0.1, max_new_tokens 500
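For intuition about what those parameters actually do, here is a minimal pure-Python sketch of one sampling step. This is an illustration of the standard temperature / repetition-penalty / top-k / top-p technique, not text-generation-webui's actual code, and the toy tokens and scores are made up:

```python
import math
import random

def sample_next(logits, recent_tokens, temperature=0.7,
                repetition_penalty=1.17, top_k=40, top_p=0.1):
    """One sampling step using the parameters from the 'Parameters' tab."""
    logits = dict(logits)
    # repetition_penalty > 1 makes recently used tokens less likely.
    for t in recent_tokens:
        if t in logits:
            l = logits[t]
            logits[t] = l / repetition_penalty if l > 0 else l * repetition_penalty
    # temperature < 1 sharpens the distribution (more predictable output).
    scaled = {t: l / temperature for t, l in logits.items()}
    # top_k: keep only the k highest-scoring tokens.
    kept = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the survivors.
    m = max(l for _, l in kept)
    exps = {t: math.exp(l - m) for t, l in kept}
    z = sum(exps.values())
    probs = {t: e / z for t, e in exps.items()}
    # top_p (nucleus): keep the smallest set of top tokens whose probabilities
    # sum to at least top_p, then renormalise. A low value like 0.1 often
    # leaves only the single most likely token.
    nucleus, total = [], 0.0
    for t, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        nucleus.append((t, p))
        total += p
        if total >= top_p:
            break
    z = sum(p for _, p in nucleus)
    return random.choices([t for t, _ in nucleus],
                          weights=[p / z for _, p in nucleus])[0]

# Toy vocabulary with made-up scores; "hello" is penalised for repetition.
logits = {"hello": 3.0, "hi": 2.5, "banana": 0.5, "hello2": 2.9}
print(sample_next(logits, recent_tokens=["hello"]))
```

Note how the settings interact: with top_p as low as 0.1, the nucleus usually collapses to the single most probable token, so the low temperature and repetition penalty are doing most of the work of keeping responses coherent without looping.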

Pub: 20 Mar 2023 23:44 UTC
Edit: 21 Mar 2023 15:41 UTC