Last 3 changes
16/12/2024 - Added debug section
15/12/2024 - Made rentry, added general tutorial
TOTAL PROXY DEATH
Email: E_Newton_Contact@proton.me
This is a general tutorial on how to optimize prompt caching on SillyTavern (ST). Since most Claude proxies are basically dead, many people have resorted to paying for access to Nonnet and Opus. These models (especially Opus) can get very expensive since they are not priced for RPs, i.e. Anthropic's target userbase does not swipe ten times at 32k context. Fortunately, Anthropic has a price-reducing feature called prompt caching (see Anthropic docs).
Prompt caching, in very simple terms, saves the prompt a user has sent for 5 minutes and reuses it on the next generation, billing the reused part at a much lower rate. However, the caching implementation on both ST and OpenRouter (OR) is absolute dogshit, so we need to do some extra wrangling to make this work.
I will try to keep this as simple as possible and just focus on how to use this properly on ST rather than on how prompt caching works. For a more in-depth explanation of prompt caching and how it works on ST, check out the OG guide.
Note: A lot of the settings shown in this tutorial may not work for you. You will need to adapt them based on your card and jb. I will try to clearly explain how to adapt this based on your settings.
How DO I cache (stolen from OG guide)
Make sure this commit is merged into your ST: https://github.com/SillyTavern/SillyTavern/commit/54db4983f4663d77db79ec1246888a5791bdb619 (that is, make sure you've git pulled staging after the date on it). You can also just check if cachingAtDepth shows up in your default/config.yaml.
Edit your config.yaml to ensure the Claude section reads somewhat like the snippet below, where cachingAtDepth is SOME non-negative number that MAY OR MAY NOT be 0. It depends on where you like to make your injections at depth, your PHIs, etc.
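Something along these lines (a sketch, not a definitive config: key names can vary slightly between ST versions, and enableSystemPromptCache is only my assumption about the related system-prompt toggle, it is not required for this guide):

```yaml
claude:
  # Assumed optional toggle that also caches the system prompt block
  enableSystemPromptCache: true
  # Roughly: places the cache breakpoint this many messages from the end of the chat.
  # -1 disables caching; 0 or any positive number enables it.
  cachingAtDepth: 8
```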
General Settings
This section is about how to get prompt caching working for most users. As long as you don't have some extreme bloat in your jbs or some crazy lorebook with random injections, it should work for you. The goal here is simple: get rid of any randomness between prompts.
- Set cachingAtDepth to 8.
- Get rid of {{random}} or dice rolls.
- Set ALL lorebook entries to constant or disabled (change the green dot to blue). If a new entry gets triggered and added to the prompt, the cache will not be hit.
- If you are using groupchats then join the character cards, don't swap.
- Make sure the Author's Note has an insertion frequency of 1 (read more about the Author's Note in the debug section).
- Important: The context size in ST has to be set to a GREATER value than the chat's actual context. This is because if an old message gets pushed out of the context, the start of the prompt changes and the cache will not be hit.
If you follow all these steps, the caching should work for swipes and new messages.
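To get an intuition for why depth 8 plays nicely with the usual injections, here is a rough picture of where the cache breakpoint ends up (illustrative only, assuming cachingAtDepth: 8; ST's actual breakpoint handling is a bit more involved):

```yaml
# [system prompt / card / jb / constant lorebook entries]   <- cached prefix
# [older chat messages]                                      <- cached prefix
# ----- cache breakpoint, roughly 8 messages from the end -----
# [last ~8 messages, depth 0-7 injections, author's note, your new message]
#
# Anything that changes BELOW the breakpoint (swipes, new messages, low-depth
# injections) still reuses the cached prefix. Anything that changes ABOVE it
# (a {{random}} in the jb, a newly triggered lorebook entry, a message falling
# out of context) forces the cache to be rewritten, and a cache write costs
# slightly MORE than a normal prompt.
```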
For all my Adventure RP chads who have 1000+ message roleplays: I recommend setting the ST context size to 20-25k. Summarize the chat, then start a new chat. I don't know much about summarization, so try and follow this summarization guide.
Debugging
If your prompt cache is not WRITING, then try these solutions. If prompt caching is disabled, your generation details on OR will look like the linked image: https://files.catbox.moe/ggvitb.png, with the caching discount shown as --. If it shows a negative or positive number, then caching is enabled.
I have received emails and read 4chan posts from many Anons about possible debugging techniques, so if your caching is still disabled, try these.
- Change the caching depth in the main config.yaml file. The ST folder has two of these files: one in the main folder where the start.bat file is, the other in the default folder. Some anons have changed both to 8. My settings are 8 in the main folder (the one with start.bat) and -1 in the default folder, which I haven't touched (my full setup is sketched after this list).
- On OR, change the provider to Anthropic. In the API connections tab in ST there should be a dropdown called 'Model Providers' below where you select your OR models; only add Anthropic to it.
- Really Stupid!!! If it is still not working, set your cachingAtDepth to 8 and make sure your author's note is injected at depth 4 and non-empty (just add any character to it, mine is just :)). I have no real explanation for why, but for my chats with empty author's notes, or author's notes added after the main prompt, OR just doesn't write the cache. So just try this. You can try other depths too, but I already spent $2 just debugging this and don't want to waste any more money. Let me know if you know why this happens or if you find more stupid bugs like this.
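For reference, this is roughly the combination that works for me (a sketch; your paths and UI labels may differ slightly):

```yaml
# <ST folder>/config.yaml (the one next to start.bat)
claude:
  cachingAtDepth: 8
---
# <ST folder>/default/config.yaml (left untouched at the default)
claude:
  cachingAtDepth: -1
```

Plus, in the ST UI: Model Providers set to only Anthropic, and the Author's Note non-empty with insertion depth 4 and frequency 1.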
Proof of Concept
Here I have a proof of concept for this caching method. On 15th December 2024 I called Nonnet 76 times at a minimum of 12k context. Under normal Nonnet prices, this should cost me $2.75. However, all these prompts only cost me $1.11, which is a 60% decrease. Remember that 12k was just the minimum context; the average context was 14,073 tokens, which would cost $3.21 for 76 prompts. $3.21 to $1.11 is a 65% decrease.
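If you want to sanity check those numbers yourself (assuming Nonnet here is billed at Claude 3.5 Sonnet's standard $3 per million input tokens, and ignoring output tokens):

```
76 prompts x 14,073 avg tokens  ≈ 1,069,548 input tokens
1,069,548 x $3 / 1,000,000      ≈ $3.21   (what it would have cost uncached)
actual spend with caching        =  $1.11  (~65% less)
```

The saving comes from cache reads being billed at roughly a tenth of the normal input price, while cache writes cost a bit more than normal, which is why avoiding cache misses matters so much.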
Watch this video for proof: https://files.catbox.moe/vqvh09.webm
Future updates
I intend to add a more in-depth explanation of how exactly caching works and how to customize it for your own cards, lorebooks, and jbs. For now this should do. If you guys are having issues, contact me and I will try to help.