BoT (Balaur of Thought): The friendly manual

If you're looking for download links and/or installation instructions, they can be found HERE.

Wait, what?

BoT is a SillyTavern script and was initially made available on the r/SillyTavernAI subreddit under the name VoT at version 3.2. Since then I have learned some proper Romanian spelling and had to rename it to BoT, which stands for Balaur of Thought.

BoT is meant to enhance the RP experience, be it E or not, by forcing short-term memory, basic logic, and common sense on (modern) LLMs. Just keep in mind this is experimental work, and your mileage may vary for any specific combination of LLM, sampler configuration, persona, character card, writing style, and expected results.

The idea behind it comes from the observation that LLMs like Llama-3 or Mistral are fairly decent when asked basic logic questions directly, but fail catastrophically at them during RP when left on their own. The obvious next step was to inject specific information directly into the context, and then to craft prompts that force the LLM to reflect upon certain things. One thing led to the next, and BoT became a thing.

What does it do exactly?

BoT is basically a collection of prompts within a qr-set (a bunch of scripts) that ask questions to the LLM and inject the replies into the context prior to each reply. This should help the LLM produce more situationally aware responses.
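To give an idea of the basic drill, each analysis boils down to the same three STscript steps: generate with /gen, store the result, and inject it into the context. The following is only a bare-bones sketch with made-up prompt text and variable names, not the actual qr-set code:

```
/gen Pause the roleplay. Answer out of character: where and when does the current scene take place? |
/setvar key=exampleScene {{pipe}} |
/inject id=exampleScene position=chat depth=0 [Scene analysis: {{getvar::exampleScene}}]
```

BoT layers prompt building, mindread comments, rethinking, and so on on top of that pattern.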

In a nutshell

I recommend you use BoT on a fresh new chat.
On chat load, BoT will decide whether the character card belongs to a single character or to multiple characters and format prompts specifically for each case. The LLM is not used for this; instead, if the {{char}} macro contains " and ", "And ", or "&", the card is considered multi-character (e.g. a card named "Jane and John Doe" or "Alice & Bob").
Upon your first message, a number of things will happen:

  1. A compact comment titled Scene analysis will appear in the chat.
  2. A compact comment titled Spatial analysis will appear in the chat.
  3. A compact comment titled Dialog analysis will appear in the chat.
  4. A compact comment titled Branching analysis will appear in the chat.
  5. The actual reply from the LLM-driven character will appear, ideally informed by all the prior analyses and thus improving the chat.

From the second user message on, something slightly different will happen. First, all prior analyses will disappear from the chat. Then, steps 2 to 5 will occur. This is because the scene analysis is generated only once.

In detail

When a chat is opened, BoT attempts to determine whether it involves a single-character card (something like a character card named John Doe) or multiple characters in a single card (like a single card named Jane and John Doe). Either way, prompts are adjusted accordingly.

In the following, the terms user's last message and char's last message are used. These refer to the most recent roleplay text messages; image generations, comments, notes, and the like are not taken into account.

When user sends a message, a number of things happen:

  1. If the scene analysis is enabled and no scene analysis exists yet, a /gen command is triggered, asking about place and time, as well as char's abilities, impairments, and accents. Once the analysis is generated, it is stored, and if mindread is enabled for scene, it is inserted into the chat log as a compact comment that is invisible to the LLM.

  2. If the spatial analysis is enabled, it is performed similarly, with one key difference: if no prior spatial analysis exists, a "first spatial analysis" prompt is used. This prompt encourages the LLM to mildly hallucinate answers, or parts of answers, for which it does not have enough information.

     This can be useful when, for instance, a persona definition makes no mention of user's clothes. Left on its own, an LLM could infer that, since there are no references to them, user must be naked (in the center of Tokyo on a cold winter night, right). The mild-hallucination method forces the LLM to consider a larger amount of text when inferring clothing, which is necessary for a plausible answer.
     If a previous spatial analysis exists, it is ephemerally injected immediately before char's last message (its depth is calculated dynamically), and a different prompt (same questions, different wording) is used. Both measures ensure, or at least facilitate, that the LLM keeps analyses consistent across successive messages (see the sketch after this list).
     Another key difference from the scene analysis is that a spatial analysis is generated for every user message. After a spatial analysis is generated, if mindread is enabled for it, the result is displayed as a compact, LLM-invisible comment.

  3. If the dialog analysis is enabled, a /gen command is used to prompt for it in the same way; the branching analysis, when enabled, follows the same pattern.
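For reference, the ephemeral, depth-based injection described in point 2 can be approximated with plain STscript. This is only a rough sketch: the id, the depth value, the prompt wording, and the variable name are placeholders, BoT calculates the depth dynamically instead of hardcoding it, and the ephemeral=true flag assumes a reasonably recent ST build:

```
/inject id=examplePrevSpatial position=chat depth=2 ephemeral=true [Previous spatial analysis: {{getvar::exampleSpatial}}] |
/gen Answer out of character: what is each character wearing, and where is everyone positioned? |
/setvar key=exampleSpatial {{pipe}} |
/comment {{getvar::exampleSpatial}}
```

The final /comment line stands in for the mindread display; BoT only shows it when mindread is enabled for the spatial analysis.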

Them fancy buttonz!

Additionally, a toolbar with a number of icons is attached to the top of the input bar:

  • [👀] - Viewing menu
  • [🧠] - Analyses menu
  • [🤔] - Rethink menu
  • [📝] - Edit menu
  • [🛠️] - Additional tools
  • [🗃️] - Databank/RAG
  • [❓] - About

The choice of emoji icons is a way to make BoT mobile-friendly, since I am a mobile user myself.

[👀] Viewing menu

Opening it will give access to three submenus:

View prompts: A list of the different prompts is displayed. Upon selecting one, it is shown as a compact comment in the chat, with color-coded text.

View analyses: An analysis type has to be selected and a message number entered (the message number corresponds to a user message). The analysis is then displayed as a popup.

View injections: Slightly different from the others, this does not display what is actually being injected for any specific message ID; rather, it shows you the syntax of the injection. Injections are classified as: scene, prior spatial, current spatial, dialog, and branches. As in its sibling submenus, the selected item appears in the chat log (hidden from the LLM) in a color-coded format.

[🧠] Analyses menu

This menu allows the user to customize the behavior of analyses, but in order to understand its use, there are a couple of things you need to know:

  1. Every analysis generated incorporates the results of all previous analyses of the same batch into its prompt.
  2. Whether the analyses are shown in the chat or not is irrelevant to generating the character reply; they are put in the chat just so you have something to read while the reply generates. What matters is that the analyses are injected into the context in-chat at depth 0, that is, after the last message.

Knowing those two things, it becomes obvious that there is more than one way to do the same prompting-and-injecting drill. The analyses menu is meant to configure exactly how:

  • Analyze: Allows enabling or disabling the different types of analyses. If a type is disabled, no new analysis of that type will be generated, shown in the chat, or injected into the context.
  • Mindread: Controls whether new analyses are shown in the chat. When disabled, a new analysis of that type may still be generated (if enabled), but it will not be shown in the chat.
  • Injection: Allows the user to customize which types of analysis are injected into the context. In order to be generated at all, an analysis type needs to be enabled in the Analyze submenu.
  • Delay: Again, an explanation is due: users have reported issues with some backends complaining about too many requests. Why services nerf the user experience is anyone's guess, but BoT attempts to work around it by (perhaps naively) introducing a delay between generations (see the sketch right after this list). This submenu allows the user to toggle the delay ON and OFF and to specify the delay time in milliseconds. It is OFF by default.
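For the curious, the workaround amounts to chaining a /delay call between two generations; in STscript that looks more or less like the sketch below (the prompts, variable names, and the 2000 ms figure are placeholders, not BoT's actual values):

```
/gen Answer out of character: summarize the tone of the current dialog. |
/setvar key=exampleDialog {{pipe}} |
/delay 2000 |
/gen Answer out of character: list three plausible directions the scene could take next. |
/setvar key=exampleBranches {{pipe}}
```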

[🤔] Rethink

Similar to ST's regenerate function, but with finer control over what is being rethought and how. Rethink all has the same functionality as in prior versions: it deletes the last batch of analyses (spatial, dialog, and branches), regenerates them from scratch, and then regenerates a char reply.

Individual analyses can be rethought; if so, the user is prompted for an additional instruction to be added to the prompt in order to avoid repeating the error. If the additional instruction is empty or cancelled, the analysis is regenerated anyway. Either way, the new analysis replaces the old one.

Finally, just char's last message can be rethought (the last message of each character in a group chat). On message ID > 0 (that is, after the character's greeting message), this performs the same ephemeral injection into the context that produced the original message. It requests an optional additional instruction (cancelling at this point makes the LLM generate the alternative message without any additional instruction) and displays the result. The new message, only if accepted, is added as a new swipe when possible, or replaces the previous one when swipes are not available; swiping is only available on the last message in the chat log, and in a group chat it is possible to rethink the last message of a character that was not the last to speak.
On a chat with a single card and at char's greeting, this does a radically different thing: it creates a new greeting message for the character, taking user's persona into account (and hopefully not speaking for the user).

[📝] Edit menu

This one gives a lot of fine control over the prompts and analyses. It contains five submenus, namely edit prompt, reset prompt, edit analysis, edit inject, and reset injection. But first, some context regarding prompts.

For each analysis, there's a prompt that generates it. A prompt is typically made of three types of text:

  • Common strings such as the "do not roleplay..." at the beginning of each one.
  • A prefix such as the "... keep spatial awareness in mind..." for spatial analysis.
  • Questions such as "What items of clothing is {{char}} wearing?"
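Assembled from those bits, the prompt sent for, say, the spatial analysis ends up reading roughly like the sketch below; the bracketed ellipses stand for text not quoted in this manual, and the exact wording of the real prompts differs:

```
Do not roleplay [...] keep spatial awareness in mind [...]
What items of clothing is {{char}} wearing?
[... further questions ...]
```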

The prompt edit and reset options allow the user to control every individual bit of text that makes up each prompt.

  • Edit prompt: The different analysis types are displayed, and upon selecting one, its individual bits can be modified. If the modified bit is a question, the user may enter a label for it afterwards; this label only exists so the user can navigate through prompt bits with ease, and the LLM has no access to labels.
  • Reset a prompt: It displays a submenu like the one for prompt editing, and upon selecting an analysis type, a list of the edited prompt bits appears (or a warning if there are none). If the specific prompt bit being reset has a custom label, the label is reset to its default value as well.
  • Edit analysis: A list with the analysis types is displayed. Only the last batch of analyses can be edited. The modified version replaces the old one in both the injection and the chat (if mindread is on). Keep in mind that for an edited analysis to have any effect, char's last message should be rethought with the new analysis version.
  • Edit inject: Similar to edit prompt. The result of every analysis that is set to be injected in the config menu is wrapped in a series of strings, namely prefix, header, and suffix. This option allows the user to edit said strings.
  • Reset injection: Again, the same as its prompt equivalent: it allows the user to revert the injection strings to their default values.

[🛠️] Additional tools

These are simple but convenient RP tools.

  • Interrogation: This allows the user to either ask the LLM why char's last reply was the way it was, or to input a question manually.
  • Rephrase: Useful when char uses a different person or tense from the one it has been using. You can also specify an instruction manually; you choose which in the submenu. Once generated, the rephrased text will appear in a new input box so you can proofread and modify it. If accepted, it will replace char's last message. Cancelling will discard the rephrased version and keep the old one instead.
  • Translate: NOT meant as an alternative to ST's built-in translation. This exists instead to evaluate the LLM's translation capabilities.
  • Sync: Similar to the like-named slash command. It replaces all instances of the old {{user}} macro value with the new value of the same macro in the prompts and injections. If you want to sync old messages, you still need to use the /sync slash command.

[🗃️] Databank

This builds upon the Vector Storage and Databank extensions, so make sure they are installed (they're bundled with ST by default) and enabled (Vector Storage is disabled by default).
There are some pretty good explainers out there on what RAG is and how it works. In simple terms, you can take a bunch of files, vectorize them, and then automatically retrieve them and put them into the context when they're relevant. It's kind of like WI/lorebooks, except it doesn't rely on keywords but on similarity instead. This is what Vector Storage and Databank do in tandem.
What BoT does on this front is help organize and manage databank entries, avoid contradictory information, and basically make the user's life easier.

In BoT, databank entries represent memories, skills, and whatnot that a character (or characters) has or knows. This is due to BoT being oriented toward RP.
Every entry has the text itself and a topic. The topic has a number of functions:

  1. It is used when listing entries.
  2. It is used as the databank entry name, which is what RAG retrieval relies on. The databank allows duplicate entry names, which is fine for many applications, but not for RP.
  3. BoT does not allow two entries on the same topic; this is to avoid contradictory statements. If the user attempts to create an entry on an already existing topic, BoT will ask whether to merge both entries, replace the old one with the new, or discard the new one and keep the old.

While a copy of each entry is kept in an array for editing and viewing, the actual databank entries contain some extra strings, namely prefix, header, and suffix. The structure is: [prefix] [header] [entry] [suffix]. The user can edit these strings, which gives enough flexibility to implement whatever works best for a given LLM.

Databank entries can be stored for the character or for the conversation. Due to limitations of the databank plugin, the character databank entries cannot be accessed by BoT (through slash commands, that is) in group chats. If there's a workaround, I haven't found it.

The exact syntax is:
(Entry prefix) (Header prefix) Topic (Header suffix) Entry (Followup header) Additional entry (Entry suffix)
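Filled in with the default string values listed below, a made-up entry on the topic "Favorite tavern" with one follow-up update would therefore be stored roughly as follows (give or take whitespace):

```
<memory topic='Favorite tavern'>{{char}} is a regular at the Silver Stag tavern near the docks. <update /> The Silver Stag burned down last winter.</memory>
```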

Default values for the different strings are:

  • Entry prefix: <memory
  • Header prefix: topic='
  • Header suffix: '>
  • Followup header: <update />
  • Entry suffix: </memory>
  • Manage entries: View, edit, and delete entries. First, the user has to choose the source of the entries (char or chat). Then a list of entries is presented. Upon selecting one, a set of options is offered:
    • View entry: Displays the raw entry in a popup.
    • View file name: Displays the name of the file where the entry is physically stored.
    • Edit entry: Opens an input box that can be used to edit the entry.
    • Delete entry: Deletes the databank entry.
  • Add entry: The user may add entries manually or prompt the LLM to generate one. BoT will detect duplicated (same topic) entries and ask the user how to handle them.
  • Edit parameter: Allows the user to change the value of the different prefix/suffix strings. It does not update preexisting entries, yet.
  • Reset parameters: Returns individual strings to their default value.

[❓] About

You probably don't need help to figure this one out.

Thanks! I hate it!

If you need to clean up the roughly 200 global variables and/or the roughly 100 local ones, just run /run BOTKILL before disabling BoT.

Wtf is a balaur and why would someone want an LLM to think like one?

A balaur is a three-headed dragon from Romanian mythology. I named my script Balaur of Thought in line with train, chain, and tree of thought. Initially I was just trying to write a script to implement tree of thought, but realized all branches led to the land of hallucination and stupidity; I added a corpus of analyses to the branch-generating prompt and got way better results.

So I ended up with a prompt made of a body of analyses and options on where to head next. The name Balaur sounded appropriate. So, in typical me style, I proceeded to misspell it lol. By version 3.3 I corrected the name, and by 3.4 (hopefully) all variable names, comments, and databank entries will be corrected too.
