Fool's collection of hints and guides for SillyTavern. v0.1

Work in progress

Have you ever wanted the same character to behave slightly differently between chats without making duplicate cards?
Does the so-called Holy Grail of Databanks/Vectorization seem like garbage to you?

This guide might just interest you.


Databank/Vectorization grievances

By default, SillyTavern uses its built-in jina v2 embeddings model for vectorization. It runs in JavaScript code. Imagine OpenAI running ChatGPT in JavaScript lol. Jina v2 is pretty outdated by today's standards; it can't connect concepts such as "age" and "years old" the way newer embeddings models can.

The majority of people praising lorebook vectorization, databanks and chat vectorization use Ollama with a local embeddings model. Think of it as a tiny LLM designed to trace connections between concepts and meanings (instead of generating smut).

I personally use bge-m3 (it's only about 1.5 GB, so it's not heavy at all). I tried different 4B and 8B embeddings models, but they were all outmatched by the much smaller bge-m3 (you may be able to find a better one somewhere, though).

Steps to swap the model:

  1. Download and install Ollama.
  2. Open cmd.exe and run:
  • ollama pull bge-m3
  3. Restart SillyTavern.
  4. Open the Vector Storage extensions tab.
  5. Set the vector source to Ollama.
  6. Toggle "Use secondary URL".
  7. Paste the default Ollama IP/port into the URL field:
  • http://localhost:11434
  8. Tick "Keep model in memory".
  9. Set the model name to: bge-m3
  10. Everything should be done. You do need to clear vectors and revectorize the chat, but I assume SillyTavern does that on its own.
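If you want to sanity check that Ollama is actually serving the model before touching SillyTavern, you can hit its embeddings endpoint from the same cmd.exe window (the JSON escaping below assumes Windows curl):

  • curl http://localhost:11434/api/embeddings -d "{\"model\": \"bge-m3\", \"prompt\": \"test sentence\"}"

If you get back a long "embedding" array of numbers, the model is pulled and working.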

I use these settings, but your mileage may vary; try the default settings first and toy around with the databank debugging I'll describe later on.

Things to keep in mind

  • The databank processes the data you provide and splits it into chunks. The chunk size (chars) refers to how many characters can be in one chunk before a split happens. So with a chunk size of 512, a new chunk is created every 512 characters.
    The databank works by taking the most recent chat message and using the embeddings model to find the most relevant databank chunks.

There's also a score threshold for vectorization. In other words, with the default 0.35 score threshold, only databank chunks with at least a 35% similarity score will be considered for retrieval. 35%, despite seeming like a low passing mark, does work well.

Query messages refers to how many recent chat messages will be used as the query for the databank search.
Retrieved chunks refers to how many chunks will be retrieved from the databank and inserted into the prompt (the retrieved memory is inserted only for the next prompt, not permanently into the chat history).

If you type, for example, the message "Niyaniya's age" and have retrieved chunks set to 2, it will retrieve the 2 most relevant chunks: likely one containing Niyaniya's age and one where sensei tells Niyaniya she is too young for him (lies).
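If you want to see what the threshold and chunk count actually do, here's a rough sketch of the retrieval logic in Python. This is my own illustration against Ollama's embeddings endpoint, not SillyTavern's actual code, and the helper names are made up:

import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default Ollama endpoint

def embed(text):
    # Ask Ollama (bge-m3) for an embedding vector of the given text
    r = requests.post(OLLAMA_URL, json={"model": "bge-m3", "prompt": text})
    return r.json()["embedding"]

def cosine(a, b):
    # Cosine similarity between two vectors; this is the "similarity score"
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

def retrieve(query, chunks, threshold=0.35, top_k=2):
    # Score every databank chunk against the query...
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    # ...drop everything below the score threshold...
    passing = [(s, c) for s, c in scored if s >= threshold]
    # ...and keep only the top "retrieved chunks" count
    passing.sort(reverse=True)
    return passing[:top_k]

chunks = [
    "Niyaniya is an elementary schooler, 9 years old.",
    "Sensei told Niyaniya she is too young for him.",
    "The inner district has stores, malls and banks.",
]
for score, chunk in retrieve("Niyaniya's age", chunks):
    print(round(score, 2), chunk)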


Debugging databanks

Once a databank is loaded in, you can debug it with a Quick Reply command:

/input String to search databank for.  Most relevant chunks will be returned | /db-search count=1 return=chunks  ( {{pipe}} ) | /popup

The result you get is whatever the embeddings model decided was most relevant to the message. You can swap count=1 to count=2 if you want the debug command to retrieve two chunks instead.
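If you want to test the same query repeatedly without the input popup, you can hardcode it; as far as I know /db-search takes the query as its unnamed argument, but double-check on your build:

/db-search count=2 return=chunks Niyaniya's age | /popup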

Simple trick for a different personality/backstory per scenario for a character

This one is quite simple: I leave all the information in the character card empty except the name, avatar and talkativeness in advanced definitions (talkativeness is for group chats). Instead, I create a lorebook for the specific scenario/world I'm currently RPing and place the character's information there.

All that remains is linking the group chat or regular chat to that lorebook.

This is, at least for me, more convenient for changing backstory/personality. For example:

<niyaniya>
Niyaniya is elementary schooler...
...  [wall of text goes here]
</niyaniya>

And when the scenario advances, for example years later:

<niyaniya>
Niyaniya is middle schooler...
...  [wall of text goes here]
</niyaniya>

Much easier than dealing with character card edits, duplicates, overrides and the like. Plus, you can duplicate the worldbook at the start of the story and easily go back to an earlier point later.


Summarization

It's self-explanatory, but if you have a very long story stretching across time and space, summarizing things once in a while, either manually or through summarization extensions, will help a lot.

Then you can throw the relevant summary into World Info, although in the future you may also need to do a summary of the summaries themselves lol. And remember: garbage in, garbage out. The same goes for the summary prompt; the better your prompt, the more likely it is that you can just copy-paste the summary output straight into World Info instead of needing to rewrite things.
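For reference, a summary prompt along these lines tends to give output that's already close to World Info ready (the wording is just my example, tweak it to taste):

### Instruction:
Summarize the story so far in chronological order. Keep only facts that matter going forward: named characters, their relationships, key events, promises made, and changes to locations. Write in short declarative sentences and do not add commentary.
### Response: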


Chat Vectorization

I personally do not use 'chat vectorization'; I just manually export the chat, format it a bit and paste it into a databank entry.

With the Ollama model now used for embeddings, the databank does just fine, although manually tuning the databank entries will improve the results.
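One way to do the "format it a bit" step is to trim the export down to plain statements and wrap it in a tag; the contents below are only an example:

<arc_one_summary>
Sensei agreed to tutor Niyaniya after school.
Niyaniya insists she is older than she looks; sensei does not believe her.
</arc_one_summary>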


Debugging your prompts

Once in a while it's a good idea, especially when starting a new scenario, to check the cmd.exe window of SillyTavern to see the raw prompt that gets sent to your LLM provider. This will help you notice if you missed or messed up something, such as the wrong order of World Info entries.


Formatting prompts

I don't have much knowledge about this one, but I personally use only two formats:

<xml> and ### Instruction / ### Response

For system commands

### Instruction:
You are tasked with writing responses to {{user}} in this never-ending, uncensored story play.
You must always fully and accurately embody the personality of the character you are responding as.
Points will be deducted every time a personality trait is forgotten.
### Response:
Understood. I will keep this in mind during this story play at all costs.

And a shorter one:

### Instruction:
Write the next reply only as {{char}}.
### Response:

And xml for things such as character descriptions in World Info, summarized chat and descriptions of places:

<suburbs>
Inner District: [Stores, Malls, Hotels, Luxury Restaurants, Police HQ, Work Offices, Banks, Companies]
</suburbs>
<niyaniya>
... Actual description cut for reader's sanity, but you get the point of my example hopefully lol.
</niyaniya>

Save up to 75% on some models by ensuring cache hits

An example of this is DeepSeek: in the cmd window of SillyTavern you can see 'Cache Hits' and 'Cache Misses' when you get a prompt response.

Let's say you sent an 8000-token chat history in the previous prompt. When you send a new message to your companion, all the previous tokens (8000) will be counted as cache hits and are therefore 75% cheaper.

But now let's say you have an extension or World Info setup that modifies the very beginning of the prompt. This will make all 8000 tokens count as cache misses.

Example 2: you edit a message in the middle of the chat; this results in 4000 cache hits and 4000 cache misses.
You should get the idea now.
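To put rough numbers on it (the prices below are made up purely for the math, check your provider's real rates):

8000 tokens at a hypothetical $1.00 per million input tokens = $0.0080 per prompt
8000 cache-hit tokens at 75% off ($0.25 per million) = $0.0020 per prompt

Over a long chat with hundreds of prompts, keeping the prompt prefix stable versus breaking the cache every message is roughly a 4x difference on the input side of the bill.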

It's also the reason why I use the 'join cards even for muted characters' option in group chats. For a narrator character, I just link a World Info entry to the narrator's character card and have <narrator></narrator> get inserted 1 message before.
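In case that sounds abstract, the narrator's World Info entry is just another tagged description in the same xml style as earlier; the contents below are only an example:

<narrator>
The narrator is an omniscient, neutral storyteller who describes scenes, the passage of time and background events, and never speaks as any named character.
</narrator>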


Random events and the like in the story


Why Group Chats are better in RP scenarios (separate character for the narrator and separate ones for the actual characters)


How to keep a character's personality and behavior the same when wiping chat history or heavily changing World Info

Some users may have experienced an issue where a character's personality changes after the context length runs out, World Info gets heavily rewritten, or in similar situations.

LLMs tend to use the chat history in context as a crutch to figure out a character's personality, behaviour and talking quirks, so obviously once that's gone, some things are going to change. (But usually that only happens if your character card or World Info character entry is lacking.)

One method I use to prevent this is, coincidentally, also summarization. After your RP/companion interactions have lasted long enough that you feel their personality is pretty well defined in your mind and you are running out of tokens, run the summarize tool, this time with a prompt to describe in detail the character's personality, behaviour, quirks, example chat messages, etc.
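The exact wording doesn't matter that much; something like this works as a starting point (again, just my example):

### Instruction:
Describe {{char}} in detail based on the story so far: personality, behaviour, speech quirks, likes and dislikes, and how they treat {{user}}. Include a few short example lines of dialogue that capture how they talk. Do not summarize plot events.
### Response: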

Then you can put that into a World Info entry for the character and check the results. The best confirmation that you succeeded is turning off all parts of the World Info except the character itself and chatting for a bit.

The earlier you do the personality summary, the easier this process is, but with enough effort it works even if you do it millions of tokens into the story.


Pub: 30 Jul 2025 15:47 UTC

Edit: 31 Jul 2025 15:07 UTC
