How to run OpenAssistant locally on an NVIDIA GPU on Windows

  1. Download and install git
    https://git-scm.com/downloads
  2. Download and install python
    https://www.python.org/downloads/
  3. Download CUDA drivers for your GPU
    https://developer.nvidia.com/cuda-11-7-0-download-archive
  4. Download the text-generation-webui one-click installer (Windows)
    https://github.com/oobabooga/one-click-installers/archive/refs/heads/oobabooga-windows.zip
  5. Run install.bat and follow the instructions
  6. Run download-model.bat and type K for "none of the above"
  7. Type in OpenAssistant/oasst-sft-1-pythia-12b, press Enter, and wait for everything to download. The full model is over 24 GB, so this will take a while.

    Alternatively, if the above method doesn't work for any reason, open the command line (cmd) from Windows search and run git clone https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b. Once it has downloaded, move the whole oasst-sft-1-pythia-12b folder to one-click-installers-oobabooga-windows\text-generation-webui\models

  8. Launch start-webui.bat
  9. It will automatically load the model. If you have more than one model downloaded, it will ask which model to load; select oasst-sft-1-pythia-12b
  10. Once it's loaded, you will see a message like "Running on local URL... "
  11. Enter that URL in your browser and you will be in the text generation UI, ready to start chatting

Some tips

The model will likely run out of VRAM after several responses. Just press stop, clear the history, and you can continue chatting.
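As a rough sanity check on the numbers above (my own back-of-the-envelope math, not from the guide): a 12-billion-parameter model stored as 16-bit floats needs about 24 GB just for the weights, which matches the download size. On top of that, each chat turn grows the attention (KV) cache, which is why VRAM eventually runs out and clearing the history helps.

```python
# Back-of-the-envelope sizing for oasst-sft-1-pythia-12b.
# Assumption (mine, not from the guide): weights stored as 16-bit floats.
PARAMS = 12e9        # ~12 billion parameters
BYTES_PER_PARAM = 2  # fp16/bf16

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"approx. weight size: {weights_gb:.0f} GB")

# The growing chat history adds KV-cache memory on top of this fixed cost,
# so memory use climbs during a long conversation; clearing history frees it.
```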

You can experiment with the model parameters in the "Parameters" tab to find what works best. There you can also set how long OA's responses will be by tweaking the "max_new_tokens" parameter. I'm currently using these and they seem to work alright: temperature 0.7, repetition_penalty 1.17, top_k 40, top_p 0.1, max_new_tokens 500
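For intuition about what those parameters actually do, here is a minimal pure-Python sketch of one sampling step. This is an illustration of the standard temperature / repetition-penalty / top-k / top-p technique, not text-generation-webui's actual code, and the toy tokens and scores are made up:

```python
import math
import random

def sample_next(logits, recent_tokens, temperature=0.7,
                repetition_penalty=1.17, top_k=40, top_p=0.1):
    """One sampling step using the parameters from the 'Parameters' tab."""
    logits = dict(logits)
    # repetition_penalty > 1 makes recently used tokens less likely.
    for t in recent_tokens:
        if t in logits:
            l = logits[t]
            logits[t] = l / repetition_penalty if l > 0 else l * repetition_penalty
    # temperature < 1 sharpens the distribution (more predictable output).
    scaled = {t: l / temperature for t, l in logits.items()}
    # top_k: keep only the k highest-scoring tokens.
    kept = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the survivors.
    m = max(l for _, l in kept)
    exps = {t: math.exp(l - m) for t, l in kept}
    z = sum(exps.values())
    probs = {t: e / z for t, e in exps.items()}
    # top_p (nucleus): keep the smallest set of top tokens whose probabilities
    # sum to at least top_p, then renormalise. A low value like 0.1 often
    # leaves only the single most likely token.
    nucleus, total = [], 0.0
    for t, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        nucleus.append((t, p))
        total += p
        if total >= top_p:
            break
    z = sum(p for _, p in nucleus)
    return random.choices([t for t, _ in nucleus],
                          weights=[p / z for _, p in nucleus])[0]

# Toy vocabulary with made-up scores; "hello" is penalised for repetition.
logits = {"hello": 3.0, "hi": 2.5, "banana": 0.5, "hello2": 2.9}
print(sample_next(logits, recent_tokens=["hello"]))
```

Note how the settings interact: with top_p as low as 0.1, the nucleus usually collapses to the single most probable token, so the low temperature and repetition penalty are doing most of the work of keeping responses coherent without looping.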

Pub: 20 Mar 2023 23:44 UTC
Edit: 21 Mar 2023 15:41 UTC