Vector Storage Quick Rundown
What's an embedding model
An ML model that embeds texts into a vector space (that is, it encodes a text into a sequence of numbers).
The idea is that you can then compute the geometric distance between two texts (texts with similar meanings end up close together in the space), which is useful for things like text search, or for activating lorebook entries.
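To make that concrete, here's a toy sketch of the usual distance measure, cosine similarity. The vectors here are made up for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import math

# Cosine similarity: values near 1.0 mean "same direction" (similar meaning),
# values near 0 mean unrelated.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vec_dragon = [0.8, 0.1, 0.3, 0.2]  # pretend embedding of "the dragon attacks"
vec_wyvern = [0.7, 0.2, 0.4, 0.1]  # pretend embedding of "a wyvern swoops down"
vec_taxes  = [0.1, 0.9, 0.0, 0.6]  # pretend embedding of "filing income taxes"

print(cosine_similarity(vec_dragon, vec_wyvern))  # ~0.97: similar topics
print(cosine_similarity(vec_dragon, vec_taxes))   # ~0.30: unrelated topics
```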
What's vector storage in ST
Instead of wrangling keys, ST can use an embedding model to determine which lorebook entries to activate, by comparing the last few messages in context and the contents of the LB entries.
Note: if a vectorized entry has keys, and those keys show up in the chat, it still gets activated like a normal entry. The vectorization just gives you a second way of enabling the entry.
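Under the hood, the retrieval step is conceptually something like the sketch below. This is an illustration of the idea, not ST's actual code; the threshold and entry-count knobs here just stand in for the similar settings ST exposes:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Illustration only: score every vectorized entry against the recent chat
# and activate the best matches above some similarity threshold.
def activate_entries(recent_chat_vec, entries, threshold=0.5, max_entries=3):
    """entries: list of (entry_text, entry_vec) pairs, embedded ahead of time."""
    scored = sorted(
        ((cosine_similarity(recent_chat_vec, vec), text) for text, vec in entries),
        reverse=True,  # best matches first
    )
    return [text for score, text in scored[:max_entries] if score >= threshold]
```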
How do I enable vector storage on LB entries
When presented with the "Strategy" options, Blue (constant), Green (normal keyed activation), and the chain link (vectorized), select the chain link.
How do I configure vector storage
Go to the extensions menu (the three boxes), select Vector Storage, and play around. Pay attention to the World Info options!!! Vectorization ignores World Info entries by default, so you have to enable it there!!!
How do I run embedding models locally
You can download embedding models from Hugging Face, like https://huggingface.co/Qwen/Qwen3-Embedding-0.6B-GGUF (~600 MB at Q8) or https://huggingface.co/Qwen/Qwen3-Embedding-8B-GGUF (~8 GB at Q8).
Download llamacpp (if you know which build you want, great; otherwise the plain CPU version is fine), then run the llama-server binary from PowerShell/a terminal/etc.:
./llama-server --embedding -m /path/to/your/embedding.gguf --pooling cls
If you grabbed a build with GPU support, append -ngl 99999 to the command to offload all layers to the GPU (lower the number to offload fewer layers).
You CAN run embedding models on CPU though. It's not as annoying as running LLMs.
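Once the server is up, you can sanity-check it before touching ST. A minimal sketch, assuming llama-server's defaults (it listens on http://127.0.0.1:8080 and exposes an OpenAI-compatible /v1/embeddings endpoint; adjust if you changed --host/--port):

```python
import json
import urllib.request

# Ask the local llama-server for an embedding of a test sentence.
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/embeddings",
    data=json.dumps({"input": "a quick test sentence"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

vec = result["data"][0]["embedding"]
print(f"got a {len(vec)}-dimensional embedding")  # e.g. 1024 dims for Qwen3-0.6B
```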
Then go back to the extensions menu and point ST's vectorization source at llamacpp instead of "Local (Transformers)" (which runs a ~130 MB embedding model from 2023 via transformers.js).
It should also be possible to run embedding models on koboldcpp but I couldn't figure that out lol.
ST caches your vectorizations WITH COMPLETE DISREGARD for which model llamacpp is actually running. Flush the cache whenever you switch embedding models by going to data/default-user and nuking the vectors folder.
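If you'd rather script the flush, something like this works (the path assumes a default install with the default-user profile; ST should just re-vectorize as needed afterwards):

```python
import shutil

# Delete the vector cache; run from the SillyTavern root directory.
# Path assumes the default user profile.
shutil.rmtree("data/default-user/vectors", ignore_errors=True)
```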
What embedding model does ST use by default
https://huggingface.co/Cohee/jina-embeddings-v2-base-en
You can double-check by going to https://github.com/SillyTavern/SillyTavern/blob/release/default/config.yaml and checking the value of extensions.models.embedding
(just CTRL+F "embedding" and it should show up).
Do bigger models actually perform better
Yeah, IMO. And since the caching is so aggressive, you don't really have to worry much about the extra compute.