LongGameAnon's Homepage
LongGameAnon's Retard Guide to using Oobabooga and llama2 with SillyTavern
Disclaimer
| This guide was made with Windows+Nvidia in mind, and assumes you have a GPU with a minimum of 8GB of VRAM.
| This guide is for the quickest, easiest, simplest way to get your llamas working in SillyTavern with your bots. If you want to know more and have more options, read the links below.
Helpful links
Models
Other setup guide
Llama guide
Community and ways to ask for help
Table of contents
Step 1: Download Oobabooga Text-gen Webui
1.) Get latest One-Click Installer (select the correct one for your OS)
2.) Extract the installer
Step 2: Run Ooba
1.) Double click start_windows.bat (or the respective start file for your OS). This will install everything you need and start Ooba (this might take a minute).
2.) During the install, you will be prompted to select your GPU type, select the one that applies to you.
3.) Once complete, you should be able to browse to the Ooba interface at http://127.0.0.1:7860.
4.) Congrats you now have a llama running on your computer!
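If you want to confirm from a script that the web UI actually came up, here is a minimal sketch. The address is the guide's default (http://127.0.0.1:7860); adjust it if you changed ports.

```python
# Quick check that the Ooba web UI is answering at its default address.
from urllib import request, error

def ooba_running(url="http://127.0.0.1:7860"):
    """Return True if something answers at the Ooba web UI address."""
    try:
        with request.urlopen(url, timeout=5):
            return True
    except (error.URLError, OSError):
        return False  # not started yet, or a different port

if __name__ == "__main__":
    print("Ooba is up!" if ooba_running() else "Ooba not reachable yet")
```

You can also just open the address in your browser; this is only handy if you script your launches.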
Step 3: Download a Model
In Ooba (Easy)
1.) Go to the "Model" tab and paste TheBloke/Llama-2-7B-GPTQ (or your model of choice) into the "Download custom model or LoRA" field, then select Download.
Manually (Harder)
1.) Install Git
2.) Download the model from its Hugging Face page, e.g. Llama-2-7B-GPTQ
To download, open the repository menu on the model page, select "Clone Repository", then copy and run the commands shown.
3.) Move the entire model folder that you downloaded in the previous steps into the "models" folder of your Ooba install.
.../oobabooga_windows/text-generation-webui/models
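The clone commands from step 2 look roughly like the sketch below. This assumes Git plus the git-lfs extension are installed (the model weights are stored via Git LFS, so a plain clone without it only fetches pointer files); the repo name is the one from the guide's model link.

```shell
MODEL_REPO="TheBloke/Llama-2-7B-GPTQ"            # swap in your model of choice
git lfs install                                  # enable large-file support
git clone "https://huggingface.co/${MODEL_REPO}"
```

Expect the clone to take a while; GPTQ weights for a 7B model are several gigabytes.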
Step 4: Loading a Model
1.) Select the "Model" tab in the web UI, pick your model from the dropdown, and choose ExLlama as the model loader.
2.) You should see the words "Successfully Loaded ..."
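If you don't want to re-select the model every launch, the one-click installers read extra launch options from a CMD_FLAGS.txt file next to the start script (older versions use a CMD_FLAGS line in webui.py instead). The flags below are a sketch; check `python server.py --help` in your install for the exact names, and note that models downloaded through Ooba get folder names with underscores:

```
--model TheBloke_Llama-2-7B-GPTQ --loader exllama
```

With these set, Ooba loads the model automatically on startup.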
Step 5: Getting Ooba into Silly Tavern.
1.) In Ooba, go to the "Session" tab and check the "api" boxes under both Extensions and Command-line flags.
Click "Apply and restart" when finished.
2.) Open SillyTavern and click the API (plug) icon:
3.) Select Text Gen WebUI (Ooba) and paste the localhost API endpoint (typically http://127.0.0.1:5000/api).
You should see a green light and the name of the loaded model if you did this correctly.
4.) For your presets select one of the NovelAI presets as they are usually decent (tweak as needed).
And with that you are finished!
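If SillyTavern won't connect, you can poke the API yourself to see whether Ooba is actually serving it. The sketch below uses the legacy blocking API from the older text-generation-webui API extension (default port 5000, endpoint /api/v1/model); both the port and the endpoint path are assumptions and may differ in newer versions.

```python
# Ask Ooba's legacy blocking API which model is loaded (assumed defaults).
import json
from urllib import request, error

API_BASE = "http://127.0.0.1:5000/api/v1"  # assumed default API address

def loaded_model(base=API_BASE):
    """Return the name of the currently loaded model, or None if unreachable."""
    try:
        with request.urlopen(f"{base}/model", timeout=5) as resp:
            return json.load(resp).get("result")
    except (error.URLError, OSError):
        return None  # server not running, or the api flag isn't enabled

if __name__ == "__main__":
    name = loaded_model()
    print(name if name else "API not reachable - did you enable the api flag?")
```

If this prints your model name but SillyTavern still shows a red light, double-check the endpoint URL you pasted into SillyTavern.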
Other considerations
What models can I run?
Note: ExLlama has improved VRAM usage, but these figures are still a good rough guide for what you will need.
I want moar context
Advances in local model development have introduced new methods of squeezing additional context space out of existing models.
See this guide for how to set up NTK to expand your context.