Balaur of thought 5.21: The friendly manual

If you're looking for download links and/or installation instructions, they can be found HERE.

Theory behind BoT

Balaur of thought (BoT) is an attempt to mitigate in-principle flaws of LLMs observed across multiple models and scales; namely, the tendency to hallucinate.

Observation

Empirical observation has shown that modern LLMs are (somewhat) decent at basic logic (as long as it doesn't involve spatial reasoning) when directly prompted, just as they are very bad at the same thing when left on their own. Likewise, common sense can be predicted by an LLM when directly asked, but will be ignored when not.
Sure, one can ask a bunch of questions manually, add a bunch of guidelines, and then ask for a character reply, which is good-ish in some settings but very bad for RP and immersion. BoT is an attempt to streamline the process of mitigating hallucinations by putting situational awareness into the context.

The BoT technique

The actual technique involves two parts:

The body: A series of prompts force the LLM to shift attention to the kind of things that typically don't get it, by directly asking about them. LLM replies are injected into the context, putting those conclusions last to maximize their power.
The heads: The many heads of the mythological beast represent the branching of paths that happens afterwards. With the body of analyses in the context, the LLM is asked to generate a number of actions its character/s could take, limiting the possible plot to only a few feasible options.
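
The two parts can be sketched roughly as follows. This is only an illustration of the idea, not BoT's actual code: the function names, the prompt wording, and the fake_llm stand-in are all made up for the example.

```python
from typing import Callable, List

# Hypothetical stand-in for a real model call: takes a prompt, returns text.
LLM = Callable[[str], str]

def run_body(context: List[str], questions: List[str], llm: LLM) -> List[str]:
    """The body: ask each question directly and append every answer to the context."""
    for question in questions:
        prompt = "\n".join(context + [question])
        context = context + [llm(prompt)]  # newest conclusions end up last
    return context

def run_heads(context: List[str], n_options: int, llm: LLM) -> List[str]:
    """The heads: ask for a few candidate actions given the analysed context."""
    prompt = "\n".join(context + [f"List {n_options} feasible actions, one per line."])
    return llm(prompt).splitlines()[:n_options]

def fake_llm(prompt: str) -> str:
    """Toy model so the sketch runs offline (an assumption, not a real backend)."""
    if "feasible actions" in prompt:
        return "wait\nspeak\nleave"
    return "answer to: " + prompt.splitlines()[-1]
```

Calling run_body first and run_heads afterwards mirrors the order described above: the analyses land in the context, and only then is the model asked for a handful of options.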

This implementation

The full BoT technique as described above was fully implemented in version 4 of BoT, but since it was buggy and restrictive, the whole codebase ended up in the trash bin. The current 5.20 lets users write their own analyses and group them into batteries in the order and length they see fit; furthermore, batteries can be crudely scripted.
Guidelines and RAG can be used to aid in the BoT technique and in RP in general. The array of tools provided (which will grow with time) also adds uses related to RP and creative writing.

BoT overview

Nearly all BoT features can be classified in four broad categories: Injection-related, Data bank, QoL & transparent features, and Tools.

Injection

This set of features is arguably the most important and certainly receives the most development focus. Mainly, AGS (analyses, guidelines, and sequences) take care of inducing a TTC (test-time compute) process, even on non-reasoning models, in one or more CoTs (chains of thought).
Each part of the AGS complex has its own role:

Analyses: Prompt the LLM with a questionnaire or other instruction. The result is then injected.
Guidelines: Instructions that get injected straight into the context.
Sequences: A series of analyses and guidelines with configurable behaviors; this is by far the most sophisticated BoT injection-related feature and the one that can make the most of small models.

As stated, the purpose of the described components is to produce strings to inject into the context. Where exactly they are injected depends on how an AGS is run:

Manual: Each AGS has its own menu, and they can be manually run from there. If so, injects produced by AGS are put into the context at depth 1.
Automated: From the automation menu you can set analyses or sequences (but not guidelines) to be performed at certain intervals. They are performed after a user message arrives and injected at depth 0.
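
To make the depth numbers concrete, here is a minimal sketch. It assumes that depth counts messages from the end of the chat (0 means after the last message, 1 means just before it); inject_at_depth is a made-up helper for illustration, not a BoT or ST function.

```python
from typing import List

def inject_at_depth(chat: List[str], inject: str, depth: int) -> List[str]:
    """Return a copy of chat with inject placed `depth` messages from the end.

    Assumption: depth 0 is after the last message, depth 1 is before it.
    """
    position = max(len(chat) - depth, 0)  # clamp so large depths land at the top
    return chat[:position] + [inject] + chat[position:]
```

Under that assumption, a depth 0 inject is the very last thing the model reads, while a depth 1 inject sits right before the latest message.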

In addition to AGS, the rethink option lets you reuse injects, or regenerate a message/batch either from manual injects performed afterwards or by using the prior version as an anti-example.

Data bank

While RAG can be used for a variety of things, in the context of RP, BoT's primary use case, RAG can be used as a long-term memory for characters. This usage in turn comes with its own rules and constraints. Past events need to be described once, and that description might be updated during conversation; the same goes for people. Avoiding contradiction is a good way to keep hallucination as low as possible.
In order to avoid documents (memories, for BoT) being retrieved based on format, or at least to keep format from playing a role in that decision, memories need to be uniformly formatted. That format can be changed without having to manually go through individual documents, which makes it easy to try different formatting.
Each file represents what the character remembers about a person, event, or object and can contain an arbitrary number of entries, thus making it updateable as conversation advances, or as different conversations create different memories for a character.
Entries can be manually created and modified by just writing them out in the New DB entry dialog, or be created by the LLM upon user request. There are a number of auto (more like semi-auto, really) entry creation modes:

New: When the LLM is asked to create an entry for a novel event it will treat it as such.
Replace: The LLM can be asked to ignore all current entries and generate a new entry from scratch. This effectively makes the character "forget" what it "knew".
Update: With the prior entries in the context, the LLM is asked to add a new entry based on the current conversation. The new entry is then formatted and appended to the memory file.
Merge: The LLM is asked to summarize the contents of a memory file. The created summary then replaces all of the previous contents of the memory file.
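
The difference between the four modes comes down to which prior entries the LLM gets to see and what happens to them afterwards. A small sketch of that, with the LLM replaced by a hypothetical write callable (the names and the data model are assumptions for the example):

```python
from typing import Callable, List

def apply_entry(entries: List[str], mode: str,
                write: Callable[[List[str]], str]) -> List[str]:
    """Return the entry list of a memory file after a semi-automatic operation.

    `write` is a hypothetical stand-in for the LLM: it receives the prior
    entries it is allowed to see and returns the text of the new entry.
    """
    if mode in ("new", "replace"):
        return [write([])]                 # nothing prior is shown or kept
    if mode == "update":
        return entries + [write(entries)]  # prior entries kept, new one appended
    if mode == "merge":
        return [write(entries)]            # summary replaces everything
    raise ValueError(f"unknown mode: {mode}")
```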

QoL and user-transparent tools

There are a number of features that are either simple or not immediately obvious. The way AGS lists are handled is one such tool. The cascade of events following the deletion of an analysis or guideline is one such QoL feature. Upon deleting one, the following happens without user intervention:

The analysis/guideline is deleted from the list by creating a new array that does not include the deleted entry.
Sequences are checked for definitions matching both source and index number of the deleted item.
In every sequence, items with a matching source and an index number higher than the one deleted have their index decreased by one, so that they point at the correct analysis/guideline.
If the deleted item was an analysis, the same process repeats for automation.
If a sequence is left empty by the deletion of an item, it is removed from the list.
If the deletion of an analysis/guideline causes the removal of a sequence, the automation list is checked and updated.
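
The cascade is easier to follow in code. The sketch below covers the list and sequence steps only (automation would be updated along the same lines) and assumes a simple data model that is not necessarily the one BoT uses: each sequence step is a (source, index, behavior) tuple.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, int, str]  # (source, index, behavior); an assumed layout

def delete_item(items: List[str], sequences: Dict[str, List[Step]],
                source: str, index: int) -> Tuple[List[str], Dict[str, List[Step]]]:
    """Delete items[index] and repair every sequence that references `source`."""
    # Build a new list that does not include the deleted entry.
    remaining = [item for i, item in enumerate(items) if i != index]
    updated: Dict[str, List[Step]] = {}
    for name, steps in sequences.items():
        kept: List[Step] = []
        for src, idx, behavior in steps:
            if src == source and idx == index:
                continue  # this step pointed at the deleted item
            if src == source and idx > index:
                idx -= 1  # shift higher indices down by one
            kept.append((src, idx, behavior))
        if kept:  # sequences left empty are dropped
            updated[name] = kept
    return remaining, updated
```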

The translation feature also integrates BoT seamlessly with ST for non-English speakers. Even though no international UI for BoT exists as of 5.20, it is still usable and can be navigated with very little knowledge of English.
There's an unslop feature too. If you're using KoboldCPP or another backend that supports banned tokens, you're probably better off using that. For everyone else, unslop is better than regex because it allows more comfortable control over what is being replaced and with what.
The mindread and mindwrite features let you see and control the reasoning process, including every step of BoT's chains of thought:

Mindread: Creates an HTML-formatted copy of a reasoning step and puts it in the chat log. This message is invisible to the LLM. Old mindreads are deleted when a new character message (or batch for group chats) arrives.
Mindwrite: Displays reasoning steps in an inputbox, letting you check and correct errors/hallucinations before moving to the next step.

In addition to this, analyses and sequence steps with the behavior Send or Both are MD-formatted and added to the reasoning block of character messages, prior to the model's own reasoning if it exists.

Tools

There is a small suite of tools in BoT that are either experimental or do simple things that are nonetheless useful in some cases. These are explained in the Tools section of this document.

Analysis

Analyses are the basic building blocks of the BoT technique. An analysis is a prompt containing questions about the context (typically an RP scenario). The LLM answers it and the result is injected into the context to aid further responses.

The analysis menu can be accessed from the brain icon.

Create: The user needs to input a prompt and a name. Both can include any number of macros, such as {{char}} or {{group}}, written as they look here; no escaping is needed.
View: Shows the analysis prompt; macros are replaced in the shown version to reflect their current state.
Perform: Generates a reply to the analysis prompt. All analysis results are ephemerally injected after the next user message arrives.
Edit: The prompt is shown in an input box; macros are not replaced. The user can then edit the prompt. If not cancelled, the new version is saved and a new input box is shown, giving the option to change the analysis name; cancelling this will leave the name unchanged.
Remove: Removes the analysis from the list. It is also removed from any sequence that may contain it. If a sequence is left empty after this operation, the sequence itself is removed as well.

Practical knowledge

Since version 5.20, BoT checks for and prevents duplicate names.
As analyses are performed, the results are wrapped in a prefix and a suffix; this behavior can be changed in configs, under the wrappers item. Once wrapped, they are put in a chat-wide array. As mentioned above, they are ephemerally injected (single use) after a user message arrives in a single-card chat, or after the next speaker is drafted in a group chat.
By default BoT comes with a number of analyses, all oriented to implementing the BoT technique on a limited universe of RP settings. However, the same questions that explain a romantic date do not explain a battlefield, so users are strongly encouraged to add their own prompts for their specific needs. Default analyses can be deleted/modified too, which can also help keep the list tidy.
When an analysis is removed, BoT automatically removes it from any sequence that contained it. If a sequence is left empty by this action, it is also removed. If the deleted analysis is set for automation, it is removed from there too.
All analyses performed from the analysis menu are injected in the order they are performed, prior to the next user message in the context.
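
A rough sketch of the wrap-then-inject-once behavior described in this section. The class name and the default prefix and suffix are placeholders chosen for the example; the real wrappers are whatever is set in the config menu.

```python
from typing import List

class InjectQueue:
    """Chat-wide list of wrapped results that is emptied once it has been used."""

    def __init__(self, prefix: str = "[", suffix: str = "]") -> None:
        # Placeholder wrappers; the actual defaults are not shown in this manual.
        self.prefix, self.suffix = prefix, suffix
        self.pending: List[str] = []

    def add_result(self, result: str) -> None:
        """Wrap a fresh analysis result and queue it in the order performed."""
        self.pending.append(f"{self.prefix}{result}{self.suffix}")

    def flush(self) -> List[str]:
        """Hand over the pending injects and clear the queue (single use)."""
        out, self.pending = self.pending, []
        return out
```

Calling flush a second time returns an empty list, which is what "ephemeral" means here: each result is injected for one generation only.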

Guidelines

A guideline is a simple instruction that you type and have injected into the context.

Guidelines can be accessed from the signpost icon.

Quick guideline: Lets you type a guideline, then generate a character reply. You can decide between adding the reply as a new message or as a swipe on the last one.
New: An input box allows you to type a new guideline; macros such as {{user}} or {{char}} can be used as shown, with no need to escape them. Once the guideline is created, a name needs to be entered; the guideline is then saved for later use.
View: The actual text of the guideline will be shown, macros are replaced in order to reflect the current value.
Inject: Sets the selected guideline to be injected after the next user message has arrived. Only one guideline can be active at a time.
Edit: Displays the selected guideline in an input box. After editing, the guideline can also be renamed in a separate input box.
Remove: Eliminates the selected guideline from the list.

Practical knowledge

If a guideline is injected and then removed, that will not remove the injection.
Regardless of when a guideline is set for injection, it is always injected after all analyses/batteries results and after the next user message in the context, basically at depth 0.

Sequence

Sequences (batteries of analyses on v5.10) are two steps above analyses in terms of sophistication, as they allow you to group and order individual analyses as well as defining what to do with each one's result.

The batteries menu can be accessed from the literal battery icon.

New: A list of all entries for the selected source is displayed. They can be added one by one, and the text above displays the sources and names of the ones added so far. Every item must be associated with a behavior, which is basically what to do with the analysis result or guideline. These are: Pass, to have the next analysis see the current result; Send, to have it injected; and Both, which does both things. As you might have noticed, this menu has no save button. To save the new sequence you need to cancel, which opens another menu where you can save or discard the current sequence, or remove the last added entry.
View: A basic list with the sources, names, and behaviors of the entries contained in the sequence is displayed.
Perform: Each individual analysis is performed and every guideline injected in order. Depending on their behavior, one or more results will be injected at depth 1 after the next user message arrives.
Rename: Lets you change the name of the selected sequence. A sequence name is only used for you to identify it in the list and has no effect on what's being actually generated/injected.
Edit: Displays a list with the names and behaviors of the individual analyses contained in the sequence. You can then switch them, reorder them, remove them, and change their behavior. New ones can also be added.
Export: Creates an import string in the chatbox that contains the sequence itself as well as every analysis and guideline it contains. This big friendly string can then be stored as plain text, sent to others, or posted online. In order to import a sequence, just paste the import string into the chatbox and send it as if it were a message.
Remove: Eliminates the sequence. If the deleted sequence is set for automation, it will also be removed from there.

Practical knowledge

Sequence names must be unique; however, as of 5.20 it is your duty to keep it that way.
Each sequence can run an arbitrary number of chains of thought, outputting the result of each one to the context. In order to tailor a sequence that actually does what you want, it is important to understand that there are two arrays at play:

Chain of thought: This one gets filled with the results of the individual analyses. It is injected into the context before performing an analysis. This array can be emptied by an analysis with the Send behavior.
Sequence result: Contains the results of the analyses with the behaviors Send or Both. Once the sequence has run the results in this array are the ones to be injected at the appropriate moment.

The precise action of the different behaviors is:

Pass: Adds the result of the current analysis to the chain of thought, this chain is injected into the context before performing the next one. Pass also excludes the current result from the result array of the sequence.
Send: Clears the chain of thought and does not add the current result to it. The current analysis result is added to the sequence's result array.
Both: Adds the current analysis result to the chain of thought. Sends the current analysis result to the sequence result array.
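
The interplay between the two arrays and the three behaviors can be sketched like this. It is a simplified model that only handles analyses (guidelines are left out), and analyse is a hypothetical stand-in for the actual LLM call.

```python
from typing import Callable, List, Tuple

def run_sequence(steps: List[Tuple[str, str]],
                 analyse: Callable[[str, List[str]], str]) -> List[str]:
    """Run (prompt, behavior) steps in order and return the sequence result array."""
    chain: List[str] = []    # chain of thought, visible to the next analysis
    results: List[str] = []  # sequence result, injected once the sequence is done
    for prompt, behavior in steps:
        result = analyse(prompt, list(chain))  # current chain is in the context
        if behavior == "pass":
            chain.append(result)               # seen by later steps, not injected
        elif behavior == "send":
            chain.clear()                      # starts a fresh chain of thought
            results.append(result)
        elif behavior == "both":
            chain.append(result)
            results.append(result)
        else:
            raise ValueError(f"unknown behavior: {behavior}")
    return results
```

In this model a Send step closes one chain of thought and the following step starts the next one with an empty chain, which is how a single sequence can hold several independent chains.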

All sequences performed from this menu will be injected in the same order they are performed prior to the next user message in the context.

Theory&tips

A well-planned sequence acts like a two-level chain of thought: it runs a series of chains of thought, each one sending a result to the sequence result. This array of results, if each one is a component of a larger reasoning process, can act as a higher-level chain of thought.
Setting more than one sequence for automation, each with a different frequency, can help remind the LLM to keep certain elements in mind.

Automation

Automation gives a flexible way to have analyses and sequences performed at a regular interval.

The automation menu can be opened with the car icon.

New: Use this to add a new analysis or sequence to be automatically performed. Once one is selected, you need to enter how often it should run, counted in user messages.
View: Displays a list of the included analyses and their behaviors if a sequence is selected, or the actual text when a single analysis is selected.
Change: Lets you change the current analysis/sequence for a different one, while retaining the original frequency and countdown.
Frequency: Lets you set how often, in user messages, the current analysis/sequence is performed, and manipulate the state of the counter.
Remove: The selected analysis/sequence will not run automatically anymore.
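
The frequency and countdown mentioned above can be pictured with a small sketch. The dictionary layout and the function name are assumptions made for the example, not BoT internals.

```python
from typing import Dict, List

def on_user_message(jobs: List[Dict]) -> List[str]:
    """Advance every counter by one user message and return the jobs that are due.

    Each job is assumed to look like
    {"name": str, "frequency": int, "countdown": int}.
    """
    due: List[str] = []
    for job in jobs:
        job["countdown"] -= 1
        if job["countdown"] <= 0:
            due.append(job["name"])
            job["countdown"] = job["frequency"]  # restart the counter
    return due
```

With this model, changing the frequency and resetting the countdown are separate knobs, which matches the Frequency option letting you manipulate the counter state.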

Practical knowledge

All analyses/sequences performed automatically are injected in the same order they are performed, after the user message but before the guideline (if any).

Rethink

While having BoT does not keep you from regenerating/swiping, the rethink option adds a few different ways to do it.

The rethink menu icon is an arrow pointing in a circle. It has three basic options, the third being far more nuanced than the other two.

Same injects: Reinjects the same analyses/guidelines as when the original message was generated and then adds a new swipe to the last message.
Current injects: Injects the results of all analyses generated after the last character message arrived.
Rethink generator: First, you select how bad the last message was, then type why it was bad, and finally select whether to simply generate a new message or use a two-step approach. The simple mode takes the prior input, injects it as an instruction, and generates a new character reply. In the two-step mode a middle step is generated, in which the LLM is asked to reason about how the message should be before actually generating it.
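
The difference between the simple and the two-step mode of the rethink generator can be sketched as follows. The prompt wording is invented for the example and the llm argument is a hypothetical stand-in for the backend; only the control flow is the point.

```python
from typing import Callable

def rethink(last_reply: str, severity: str, reason: str, two_steps: bool,
            llm: Callable[[str], str]) -> str:
    """Regenerate a reply using the previous one as an anti-example."""
    critique = (f"The previous reply was {severity} because: {reason}\n"
                f"Previous reply (do not repeat it): {last_reply}")
    if two_steps:
        # Middle step: reason about how the reply should be before writing it.
        plan = llm(critique + "\nDescribe how the reply should be written instead.")
        return llm(critique + "\nPlan: " + plan + "\nNow write the new reply.")
    # Simple mode: the critique is injected as an instruction, one generation.
    return llm(critique + "\nWrite a new reply.")
```

The two-step mode costs one extra generation, so it is the slower of the two.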

Practical knowledge

Given the non-deterministic nature of LLMs, reusing the same injections that produced a bad reply is not as dumb as it sounds; however, this largely depends on the temperature parameter you are using.

Auto unslopper

LLMs have the annoying habit of repeating certain words/expressions far too many times, which is detrimental to text quality. Unslop aims to solve this by replacing known slop expressions with alternative constructs. This can be toggled on and off from the config menu.
KoboldCPP and other backends have their own solution to slop, called banned tokens. This is better than BoT's unslop but requires bare-metal access, which is not available for all backends. If your backend supports banned tokens, stick to that.

What counts as slop and what a slop can be replaced with can be viewed/modified from the trash-can icon.

New: Allows you to type an expression to add it to the list of known slop. Then you'll be asked to enter a list of alternatives to replace that slop with.
View: Shows a list of unslops for the selected slop.
Add single: Allows you to add a single unslop string for the selected slop. Keeps duplicate slops from being added.
Edit: Lets you edit the full unslop list for the selected slop. Each unslop string is presented as a separate line.
Edit raw: Lets you edit the full unslop list as a JSON string.
Remove: Deletes the currently selected slop string from the list of slops, as well as all the alternatives for unslopping.
Export: Lets you select which slop strings to export. It then creates a big friendly string in the chatbox with all the code required to import them. This string can be stored, sent, or posted. In order to import, simply paste the import string into the chatbox and send it as if it were a message.

Practical knowledge

Import strings tend to be quite long, so it is usually a good idea to export the slop list in manageable portions.
When a character message is unslopped, the new version is added as a new swipe, regardless of whether the original was actually modified. This is meant to keep the unslop process fast enough on slower systems. The original version is maintained as second-to-last swipe.
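
As an illustration of the idea (not BoT's actual code), unslopping can be thought of as a plain string substitution that picks one of the listed alternatives for every occurrence, followed by adding the result as a new swipe whether or not anything changed. The function names and the use of a random pick are assumptions made for the sketch.

```python
import random
from typing import Dict, List

def unslop(text: str, slop_map: Dict[str, List[str]], rng: random.Random) -> str:
    """Replace every occurrence of each known slop with one of its alternatives."""
    for slop, alternatives in slop_map.items():
        if not slop or not alternatives:
            continue  # nothing to search for, or nothing to replace it with
        parts = text.split(slop)
        # Re-join the pieces, choosing an alternative for each occurrence.
        text = parts[0] + "".join(rng.choice(alternatives) + part for part in parts[1:])
    return text

def unslop_message(swipes: List[str], slop_map: Dict[str, List[str]],
                   rng: random.Random) -> List[str]:
    """Add the unslopped last swipe as a new swipe, even if nothing changed."""
    return swipes + [unslop(swipes[-1], slop_map, rng)]
```

Skipping the "did anything change?" comparison is what keeps the original version as the second-to-last swipe in every case.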

Data bank & RAG

The data bank feature relies on ST's Vector Storage and Data Bank extensions, so you will want to have them up and running and correctly configured if you plan to use it. This feature will still be able to manage DB files even if vector storage is disabled; however, in that case the files will have no effect on the conversation.
The data bank feature aids you in RP and/or creative writing by keeping entries organized by topic, with a consistent format, and even lets them be generated by the LLM itself on demand.

The vault icon gives access to the data bank menu; here you can create and manage files from ST's data bank.

New: New entries must belong to either the current character or the current chat but can be transferred between them once created. Type a topic and choose whether you want to write the entry manually or let the LLM create it for you.
View: All entries for the current topic are displayed in order.
Rename: This allows you to change the name of the current topic. For DB files, the topic is important because it is part of the actual file.
Edit: Lets you edit each entry in the current topic individually.
Transfer: Here you can take a file from the current character's DB and move/copy it to the chat DB, or the other way around. This option is not available for group chats due to an ST limitation.
Remove: Deletes the current topic and its file from the data bank.
ST Databank: Opens Silly Tavern's vanilla interface for the data bank.

Practical knowledge

The actual format of DB files is applied automatically when creating/modifying topics and/or entries. The format itself can be modified from the config menu.
BoT allows only a single file per topic; each file/topic can have multiple entries.
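
One way to picture "one file per topic, many entries, one shared format" is a render function that builds the file text from its parts. The format strings below are invented placeholders; the real ones are whatever is set in the config menu.

```python
from typing import List

def render_topic(topic: str, entries: List[str],
                 topic_fmt: str = "## {topic}\n",
                 entry_fmt: str = "- {entry}\n") -> str:
    """Build the text of a DB file from its topic and entries.

    topic_fmt and entry_fmt are hypothetical defaults used for illustration.
    """
    body = "".join(entry_fmt.format(entry=entry) for entry in entries)
    return topic_fmt.format(topic=topic) + body
```

Because the file text is always derived from topic, entries, and format strings, changing the format and rendering every file again is enough to restyle the whole data bank.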

Tools

This menu is accessible from the tools icon and offers an array of smaller features that can be useful in some circumstances.

Rephrase tool

Sometimes LLMs spit out good content in the wrong tense or person; rephrasing aims to fix that.

Write now: Lets you manually input criteria for the LLM to rephrase the last message.
Select now: Opens a series of menus where you can select the desired tense, person and POV.

Rephrased versions are generated using the genraw command; this means that the LLM only has access to the rephrasing prompt and the message, not the whole conversation. In order to generate a different message, use ST's swipe, regenerate, or BoT's rethink.

Reverse-prompt tool

Unlike analysis, where the LLM answers user-generated questions, this asks the LLM to write the questions, which you then need to answer.

Edit prompt generator: The prompt that asks the LLM to write questions can be modified from here.
Perform reverse-prompting: The LLM will generate a bunch of questions to better understand what's going on.

The actual questions are displayed in the same input box you use to write the answers; whether you keep them or delete them, leaving only your answers, is up to you.

Clean chatlog

This is a simple-ish tool that clears out all mindreads from the chat log, in case something has gone wrong.

Impersonate

Takes an instruction from either the chatbox or an input box that opens, and uses it to generate a user dialog. The resulting text gets put in the chatbox.

Configuration

Global BoT settings can be modified here, and some maintenance procedures can be run.

Mindread: When enabled, AGS get HTML-formatted and inserted in the chat log as compact comments. These are meant for you to be able to inspect each step, as LLMs do not have access to this type of message. Old mindreads are removed from the chat log automatically.
Mindwrite: When enabled, BoT-generated text, such as analysis results, is displayed, giving the user the chance to edit or cancel it. If disabled, generation results are always assumed accepted.
Delay: Some APIs reject too many consecutive requests. Delay is a workaround that can be enabled or disabled, with a configurable delay time. It works by opening an input box: cancelling disables delay, while entering a valid time (in milliseconds) enables delay and sets the time in a single step.
Translate: Enable to have analysis prompts and results, guidelines, and every other part of the chat translated to your language. In order for this to work, however, the Translate extension needs to be configured with the correct language.
Unslop: Enabling this will cause character messages to have known slop expressions automatically replaced; the unslopped version is added to the same message as a new swipe. Exactly what counts as slop and what it's replaced with can be modified in the unslop menu (trash-can icon).
Wrappers: Analysis prompts, results, and guidelines are by default given a prefix and a suffix when generated/injected. Individual wrappers can be edited and/or disabled.
Databank: The prompts used to auto-generate entries can be edited here. The strings used to format DB files can also be toggled on or off and edited; just keep in mind that unless you want topic and entries to be a single, continuous line, you need to put in at least a few newlines. Finally, the update option takes all BoT-managed DB files for the current character/chat and rewrites them with the current format, which is useful to keep a uniform style across all of them.
Reset: Deletes all custom prompts, batteries, guidelines and settings, restoring original values. Also eliminates strings set for injection. Reverts all wrappers and DB file format strings to their default values but does not modify nor release the actual files.
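
The single-step behavior of the Delay setting described above can be sketched like this. The function name is made up, and what happens on invalid input is not stated in this manual, so the sketch simply assumes the current setting is left untouched in that case.

```python
from typing import Optional, Tuple

def apply_delay_input(raw: Optional[str],
                      current: Tuple[bool, int]) -> Tuple[bool, int]:
    """Map the input-box result to a new (enabled, milliseconds) setting.

    raw is None when the box was cancelled, which disables delay.
    Assumption: invalid input leaves the current setting as it is.
    """
    if raw is None:
        return (False, current[1])
    try:
        ms = int(raw.strip())
    except ValueError:
        return current
    return (True, ms) if ms > 0 else current
```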

Delete last

Two QoL buttons: Delete last (the eraser icon) and Restore deleted (the paper sheets icon):

Delete last: Deletes the last message on the chat-log but stores it in a message buffer. Deleting more than one will permanently delete the older one.
Restore deleted: Adds the stored message (if any) last on the chat log. This action removes it from the message buffer.
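
The message buffer behaves like a single slot, which is why deleting twice loses the older message for good. A minimal sketch of that behavior, with a made-up class name and a plain list standing in for the chat log:

```python
from typing import List, Optional

class MessageBuffer:
    """Single-slot buffer: holds only the most recently deleted message."""

    def __init__(self) -> None:
        self.stored: Optional[str] = None

    def delete_last(self, chat: List[str]) -> None:
        """Remove the last message and keep it, overwriting any older one."""
        if chat:
            self.stored = chat.pop()

    def restore(self, chat: List[str]) -> None:
        """Put the stored message back at the end and empty the buffer."""
        if self.stored is not None:
            chat.append(self.stored)
            self.stored = None
```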

Greetings

Upstream greetings

ST dev team and contributors, AI Horde devs and volunteers.

Reddit greetings

Mamelukturbo, h666777, Grobuk, Sad-Flatworm-3240, Weird_Internet_2642, JoeySalmons, HowWasRoyadinTaken, Individual-Web-5391, Mimotive11, Wasted_Election_8361, sorosa, SnussyFoo, disposable66, Samueras, BlueEye1814, RedX07, og_mrrubberducky, Gr3y_Matter, HornyMonke1, jmsfindorff, ShiftShido, Cool-Hornet4434, Agreeable_apraline_15, This_Speaker_6767, Comprehensive-Joke13, amanpgmh, guchdog, IZA_does_the_art, Geechan, Targren, SensitiveFlamingo12, Jarwen87, pixelnull, DoJo_Mast3r, BeneficialScheme6010, Alternative-Fox1982, CommonPurpose1969, National_cod9546, Immediate-One-7862, Responsible_Fee_6164, Vegetable-Eye5946, SukinoCreates
