RAGs and co.

splitclover@proton.me mail me some time

About this rentry

I opened this rentry to share some of my experiences with using RAG in SillyTavern; I hope some will find this helpful. To sweeten the deal, I even have a cheeky hack that adds reverse proxy support for the OpenAI embedding model, letting you outsource and speed up the vectorization step.
Knowing that most will want to skip to the good stuff, the guide starts just below this paragraph, but if you're interested you can read up on my understanding of how it works and some details and quirks I encountered.

Guide

I think this is only available on the staging branch for now; everything I've done here was on version 1.12.0.

Let me start by giving a short overview on how everything works.

Data bank

First there's the "Data Bank" which contains all the files you want to use to augment your reply. This is where you upload your files or add websites to scrape.
Open it in this menu here:

Open Data Bank

This should open a menu listing all your attachments. It should be self-explanatory, but there are three categories: global, character, and chat attachments.
Note that you can transfer attachments from one section to another, but this removes the calculated vector data (the vectorization step), so I'd recommend against it.

Let's try to add some files of our own. I'll use the built-in scraper to add the contents of a wiki page, but the other options should be intuitive as well. The notepad allows you to simply copy and paste content directly.

Add attachment

Once that's done you should see your file(s) added. If you're scraping or adding files other than .txt, I'd recommend looking into the attachment and editing out any unwanted sections that might have been added. Sites like Wikipedia or Fandom like to add redundant text at the bottom, such as references; I usually delete it since it doesn't add anything.

Added attachment

Here you can also:

  • disable/hide an entry without deleting it
  • move the entry
  • download the plaintext
  • or delete the entry

Once that's done you can close the tab and head to the next step to actually use the files in your prompts.

Vectorization

To keep things short: SillyTavern needs to work out which sections of your files are relevant enough to include in your prompt. This step is called vectorization. Basically, the text gets cut into smaller chunks (usually 2500 characters, not tokens) that are translated into lists of numbers (embeddings) summarizing their contents.
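The idea can be sketched roughly like this. This is a toy illustration, not SillyTavern's actual code: the real chunker may split more cleverly than a fixed character cut, and real embeddings come from the model, but the relevance lookup boils down to comparing vectors, typically by cosine similarity.

```javascript
// Cut text into chunks of at most `size` characters (mirrors ST's 2500-char default).
function chunkText(text, size = 2500) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Cosine similarity between two embedding vectors:
// 1 = pointing the same way (very similar), 0 = unrelated.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

At query time, the recent chat messages get embedded the same way, and the chunks whose vectors score highest against the query vector are the ones injected into your prompt.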
Open your extensions tab, there you should see "Vector Storage".

Extension tab

Opening it might seem daunting at first, but let's review this step by step.
First there's the vectorization source:

Vectorization source

Unlike with LLMs, local models for this use case have caught up significantly and offer roughly the same quality as cloud models like OpenAI's.
One thing I encountered while using the local source is significant lag when processing large files; after all, the model used has only 137M parameters. I occasionally had to restart the instance because of this, which is why I switched to other sources (OAI). Since ST doesn't have reverse proxy support for this, I've hacked together some sloppy support of my own, see oai reverse proxy support.
Once you've chosen a source (local is fine), you should check this box to include files:

Checkbox

This opens another huge menu, but I'll spare you the details here - I might explain the values later.
What matters is the button labeled "Vectorize All":

Vectorize all

This button starts the translation step which might take a long time depending on the size of your file(s).
Press it, and while you're waiting you could read about the injection settings in the following section.

Injection settings

Because RAG only includes a few relevant excerpts from your files, you might want to tell the AI that it may not have the full picture.
The default injection template works fine, but I like to wrap it in XML tags and add an instruction elsewhere in the prompt that references them.
One thing I really dislike is the lack of available injection positions; I set it to the main prompt because in-chat doesn't fit my needs. There's also a setting for the injected role.
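For reference, a wrapped template might look something like the following. The tag name and wording here are just my own choices; `{{text}}` is the macro the retrieved excerpts get substituted into:

```
<reference_excerpts>
{{text}}
</reference_excerpts>
```

Then, somewhere in the main prompt, an instruction along the lines of "Excerpts from reference documents appear in <reference_excerpts>; they may be incomplete" ties it together.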

Finishing steps

At some point a completion message should pop up, saying that all files were vectorized. In case you missed it, you can press the vectorize button again; it won't run twice if it finished correctly.
If you send a message now, things should work and the excerpts should get sent correctly. You should check this inside your console or here to confirm.

I'd recommend reading a little about the available parameters next, but the default ones should be fine.

About parameters

I'll talk about some parameters here, ordered by importance, but keep in mind I might be talking nonsense here.

| Parameter | Explanation |
| --- | --- |
| Query messages | The number of chat messages used for scanning; these messages decide what is deemed relevant in your files. I prefer to set it to 1 so only the most recent user message gets used. |
| Retrieve chunks (under Data Bank files) | Because the files get split into chunks, this value decides how many of those chunks to include in your prompt. More chunks means more token usage. |
| Chunk size (chars) (under Data Bank files) | How big the chunks are. Keep in mind that characters are used instead of tokens, and you might need to purge and vectorize again if you change this. Smaller chunks give a "higher resolution" (the sent excerpts are more fine-grained) but take a lot more time to vectorize. I keep it at 2500 for large files. |
| Translate files | Most embedding models are trained on English; if your files aren't, this could be helpful. |
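To get a feel for how retrieve chunks and chunk size interact, here's a back-of-the-envelope token estimate. The ~4 characters per token figure is only a rough rule of thumb for English text, not an exact conversion:

```javascript
// Rough estimate of how many prompt tokens the retrieved excerpts will use.
// Assumes ~4 characters per token, which only roughly holds for English text.
function estimateRagTokens(retrieveChunks, chunkSizeChars, charsPerToken = 4) {
  return Math.ceil((retrieveChunks * chunkSizeChars) / charsPerToken);
}
```

For example, retrieving 3 chunks of 2500 characters each adds something in the ballpark of 1875 tokens to every prompt, so these two sliders are where your token budget gets spent.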

About oai proxy support

I wrote some rudimentary reverse proxy support for the OAI model "text-embedding-ada-002" to outsource the computation; use this at your own risk.
It relies on a few dirty workarounds, but since I doubt the devs will add something like this, I might as well share it here. I'm not a coder by any means, but I tried to make things as simple as possible to install.

Setup

First you should check whether your proxy even has the required model available, otherwise this won't work. Select the proxy with the OAI endpoint (it should end in "/openai"). When you press connect, look at the console for this model:

text-embedding-ada-002
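Proxies that mirror the OpenAI API usually expose a models list, which is what that console output comes from. If you'd rather check programmatically, a tiny helper like this can scan an OpenAI-style models response (the `example` object below is just an illustration of the response shape):

```javascript
// Given the parsed JSON of an OpenAI-style GET /v1/models response,
// check whether a specific model id is offered.
function hasModel(modelsResponse, id) {
  return Array.isArray(modelsResponse.data) &&
         modelsResponse.data.some((m) => m.id === id);
}

// Example shape of an OpenAI-style models response (illustrative only):
const example = {
  object: 'list',
  data: [{ id: 'gpt-4' }, { id: 'text-embedding-ada-002' }],
};
```

If `text-embedding-ada-002` isn't in the list your proxy returns, stop here, the rest of the setup won't help.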

For things to work properly, SillyTavern must think you have a valid OAI key, so enter some random letters in this box:

Api key here

Make sure to select OAI and press connect; use the OAI proxy endpoint just in case. You can switch back to Claude after this, don't worry.
Reload the page and the box should look like this:

Key saved

At this point you can switch back to Claude.
Now you need to replace the file responsible for vectorization; download it here: [openai-vectors.js - V1]
Once that's done, open it in a text editor. You need to edit values like the rate limit, proxy URL, and proxy key:

  • Replace the number with your desired rate limit. If the proxy page says the rate limit is 4, set it to 3; otherwise things will break because it might hit the limit.
    Ratelimit
  • In line 37, replace the -URL here- with your proxy URL; make sure to delete the surrounding "-".
    proxy url
  • Just below this there's also the proxy key; replace this too. It should look like this: `Bearer 8ab7e64c-8556-4b44-9253-6f885be5fbc1` or `Bearer password`
    pass
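I can't reproduce the file here, but after editing, the values should end up looking roughly like this. The variable names and layout are hypothetical (the real openai-vectors.js may differ), and the URL and key are placeholders for your own:

```javascript
// Hypothetical shape of the edited values -- names may differ in the real file.
const RATE_LIMIT = 3; // stay below the proxy's advertised limit (e.g. 4 -> set 3)
const PROXY_URL = 'https://my-proxy.example.com/proxy/openai'; // placeholder URL
const PROXY_KEY = 'Bearer password'; // or 'Bearer <uuid>', depending on the proxy

// The key gets sent as an Authorization header on each embeddings request:
const headers = { Authorization: PROXY_KEY };
```

The main thing to check is that the key keeps its `Bearer ` prefix and that the URL ends in the proxy's OpenAI route, since those are the two values that most commonly get mangled.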

Fingers crossed, everything you entered is right; if not, it's probably the URL.
Next, open your SillyTavern installation in a file manager and navigate to /src/vectors/.
Rename your downloaded and modified file to "openai-vectors.js" and replace the existing file with it.
Restart SillyTavern and the OAI vectorization source should work now. Keep in mind that the rate limit causes long wait times when vectorizing large amounts of data.

In-depth stuff

lazy... tired... maybe later
migu

Pub: 08 May 2024 20:43 UTC
Edit: 09 May 2024 08:50 UTC