Sukino ✦ Guides and Tips
I am a poor guy from the global south, so no fancy models like ChatGPT and Claude for me. I just try to make the most of smaller local models.
These are the things I learned about how to make them suck less. Feel free to jump around — this isn't some structured guide, just me sharing what I've figured out along the way. Want to learn more? Check out my findings page, where I share things that even smarter people have discovered too.
- Make the Most of Your Turn; Low Effort Goes In, Slop Goes Out!
- The AI Wrote Something You Don't Like? Get Rid of It. NOW!
- Put Words in the AI’s Mouth to Make It Write What You Want
- Unslop Your Roleplay with Banned Tokens
- Use Randomness in Your Sessions for Variety and Unpredictability
- You May Be Able to Use a Better Model Than You Think
- Use Connection Profiles to Make Swapping AI Models Easier
- Use a Clipboard Manager for Easy Backup and Versioning of Your Bots and Interactions
Latest Updates:
2025-02-21 — Added section Use a Clipboard Manager for Easy Backup and Versioning of Your Bots and Interactions.
2025-02-20 — Small additions to The AI Wrote Something You Don't Like? Get Rid of It. NOW! and reworked Make the Most of Your Turn; Low Effort Goes In, Slop Goes Out!
2025-02-17 — Added section Use Connection Profiles to Make Swapping AI Models Easier.
2025-02-14 — Added section You May Be Able to Use a Better Model Than You Think.
2025-02-11 — Added section Use Randomness in Your Sessions for Variety and Unpredictability and minor fixes to previous ones.
Make the Most of Your Turn; Low Effort Goes In, Slop Goes Out!
Let's talk about something that often gets overlooked: how to actually use your turn when roleplaying with a small AI. Many people write minimal, low-effort responses and expect magic in return. While big AI models like ChatGPT and Claude can sometimes make up for a bad roleplayer, with smaller models, your input matters way more. This means you need to step up your game.
The key is giving your partner (yes, even an AI one) something substantial to work with, so put in the effort! Don't waste your turn writing meaningless actions or just reacting to what the AI did. Don't just react, act! Reactions matter because they show the AI it should react back to you, but your turn should also push the narrative forward a bit.
See this guy? Don't be this guy! Unless you can pay for the big models.
When you fall into these passive habits, the results are predictable: generic responses, the AI taking over your character's actions, endless monologues about inner feelings without any real action, and repeated content from previous turns. This happens because the AI is trying to match its previous response length but has nothing new to work with.
Why Is That? Because The AI Is Artificial, but Not Intelligent!
Here's a crucial point that some people miss: The AI is a lie! It doesn't think, and it isn't actually intelligent or creative. What is an AI model really? It's basically a super fancy text autocomplete. It predicts the most probable next word, over and over, replicating the data it has been trained on and that you have provided, so it sticks to patterns by design. Moreover, most AI models are trained to follow instructions, which requires them not to make things up; what is creativity to us is hallucination to them, and they're trained to avoid hallucinating — that's why relying on them to drive the story forward independently usually leads to stale sessions or nonsensical responses.
Think of the roleplay session more like a conversation: the more you give, the more you get back. The more details you provide, the more chances the AI has to match them with some of its own data. You don't need to be Shakespeare; you can even mirror the AI's writing style if you're stuck. With practice, you'll discover how subtle your narrative nudges can be. Often, just a vague hint at where you want the story to go is enough for the AI to figure out what to write and flesh out the details.
Getting good at this is part of the fun, and trust me, seeing how the AI responds to your well-crafted turns is super rewarding.
So Here Is an Example Session
Let me show you what I mean in practice. Check out this log of a quick session I just spun up with the lovely Lucille from our friend momoura, using Mistral Small Instruct 2501, which isn't even finetuned for RP: Link 1 (Catbox) / Link 2 (Neocities)
Notice how I don't write much, but instead focus on making each line count. Let me break down how I approached my first turn: I don't do much action-wise, but I hint that I don't like the staff watching us, that I act differently when we are alone, that I'm patient and kind with Lucille, that I respect her disability without treating her as fragile, and I've left an action hook for the AI to work with at the end.
The AI picks up on these things and makes Lucille respond like the long-time friends we are supposed to be. But the real magic happens in each following turn when I help it just a bit — Lucille's responses grow longer and more detailed, giving us more material to work with, making my turns easier and easier. I then make the AI's job easier by creating small sequences of actions after reacting to the previous one. It's like a snowball effect.
Pay attention to how the AI even introduces and builds upon its own details. For instance, it wrote about "the numerous books stacked neatly on her shelves" and then used that in its next turn with "I think it might be under that pile of books by the window. I was trying to read, but then the dress caught my eye, and then[...]". Or when I went to pick up the dress, it had Lucille's "fingers tap anxiously at the bench where she sits" and later recontextualized this detail as "Today, I thought I might try to play the piano in the music room. I've been practicing a new piece," she admits, her fingers tapping out a soft rhythm on her lap. See how giving the AI room to generate small details pays off just a few messages later? AI models love patterns, so they naturally build upon established details. All you have to do is help it not get stuck on the same detail forever by giving it a chance to come up with new ones.
Look, I am not claiming to be an AI RP wizard, or that this was an impressive session. My writing here was nothing special, but that's exactly what I am trying to show you: it doesn't have to be. I basically just listed actions and dialogue with intent. The key is to put that little bit of effort and thought into each turn. That's all it takes to get good, engaging responses back.
The AI Wrote Something You Don't Like? Get Rid of It. NOW!
Here's something crucial you need to know: the chat history is a bible for the AI. Everything it writes that you let slide? That's basically you giving it a thumbs up to do more of the same. The AI pays special attention to two things: the start (your system prompt and character card) and the end (the recent messages) of the context. These have the biggest impact on what comes next.
So when the AI falls into those annoying habits — maybe it's writing clichés, being redundant, acting out of character, taking over your character's actions, or going off on long tangents — letting it slide is like saying "Hey, more of that please!" The solution? Edit those messages. Cut out the bad parts. There's barely anything worth keeping? Swipe it and let the AI try again. You want every message in that history to be completely acceptable.
Now, I can hear you asking: "Wait, so I have to keep correcting it all the time? That's way too much work! Why can't it just play right?"
You are partly right. Yes, sometimes you'll need to put in some effort at the start of a new session because the AI doesn't remember your previous chats. But remember what I said about chat history being the AI's bible? That works in your favor too. Once you have a couple of good messages, it catches on pretty quick. This is why a well-thought-out First Message is so important, and why Example Messages are so powerful: as far as the AI knows, it wrote them itself, so they heavily influence the start of your session — and this is why you will always see people warning against putting user actions or dialogue in these sections.
Combine this with putting effort into your own responses (like we talked about above), and you'll find yourself editing less and less as the chat goes on and in future sessions.
There's a shortcut you can use: you can save particularly good responses from your character and add them to the card as example messages. The AI will use these as a guide, since they will be formatted as if they were its own previous responses. But honestly? I'm not a huge fan of this trick. Sometimes the AI gets confused and thinks these example messages actually happened earlier in your current session, which can make things weird. Handmade examples, with things that actually happened in your character's past, or that are generic enough to have happened or not, work a little better.
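For reference, SillyTavern's Example Messages field separates each example chat with <START> and prefixes each line with the speaker's name, so a saved response goes in looking roughly like this (the content here is just a made-up placeholder, and note it only has {{char}} lines, per the warning above):

<START>
{{char}}: *She folds the letter carefully before answering.* "I kept every one you sent, you know. Even the ones that were only complaints about the weather."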
But here's a real time-saver: If you're using SillyTavern, there's this extension called Rewrite that's super helpful. It lets you select any part of a message and gives you some options, including a simple Delete button. So instead of editing messages, you can just highlight the parts you don't like and get rid of them. Trust me, install it, it makes cleaning up responses a breeze.
Put Words in the AI’s Mouth to Make It Write What You Want
Here's a neat trick you might not have realized yet — you can practically gaslight the AI into thinking it started writing something. It'll pick up right where you leave off. Here's how:
- Hit the edit button on the AI's last response
- Delete the parts you don't want to keep
- Add your own half-finished sentence pointing in the direction you want
- Hit Continue
- Watch as the AI naturally continues writing as if it had written that part itself
You are essentially giving the AI a new jumping-off point. This is super useful in a bunch of situations.
- Let's say you're using a multi-character card, but the AI seems to have forgotten about one of them. No problem! Just start a new paragraph with that character's name and hit Continue — the AI will have to bring them back into the scene.
- If the AI responses start to get too short, and hitting Continue isn't generating anything new, try adding a relevant word or phrase at the end of the response and hit Continue again, and the model will pick up from there.
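A quick made-up illustration of that second trick: say the response ends flat with "Lucille nods quietly." Edit it so it ends on an unfinished hook instead:

Lucille nods quietly. Then, from the hallway, there's the sound of

Hit Continue, and the model has no choice but to finish that sentence, which usually pulls the rest of the response in the direction you hinted at.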
Using The Last Assistant Prefix
You can actually use this knowledge to reinforce rules to the AI while keeping it invisible to you. Is it constantly writing for you, and you don't know why? Is it not following the character card? The best thing to do is find the root of the problem, sure. But you can also go to your instruct template and add things to the Last Assistant Prefix that the AI will "say" at the start of every response without printing it out in the chat. Something like this:
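Something along these lines (treat the wording and the <|model|> prefix as placeholders; adapt the rules to whatever your actual problem is and to your own template):

<|model|>Understood. I will write the next response for {{char}} only, never acting or speaking for {{user}}, staying true to the character card.

---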
Remember to check how your instruct template actually formats the Assistant Message Prefix. Does it start with something like <|model|>? Then you have to do the same here. And the --- has a purpose too; Mistral and Gemma do this when they start to write fiction after confirming that they're complying with an instruction.
This is much more powerful than using OOC commands or Author's Notes, because you are not asking it to do anything; you are making your additions part of its own thought process — you are basically tricking it into thinking that it has agreed to generate what you want. That's how most AI jailbreaks actually work; you can see how I used this exact trick with Gemma 2 on my settings page. Pretty clever, right?
Unslop Your Roleplay with Banned Tokens
Are you tired of ministrations sending shivers down your spine? Do you swallow hard every time their eyes sparkle with mischief and they murmur to you barely above a whisper? Maybe, just maybe, I have the solution for you!
AI models tend to get stuck in patterns, spitting out the same phrases over and over. While you could try prompting them to stop, that's not always reliable — and it sucks when these repetitive phrases slip in and break your immersion. But if you're on KoboldCPP, there's a better way: you can straight-up ban these phrases from the AI's vocabulary using the Banned Tokens field in the Text Completion presets window. Here's how to format it:
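The field itself is basically one entry per line. To give you an idea, a small list built from the phrases mentioned in this section could look like this (depending on your SillyTavern version, the quotes around each entry may or may not be needed, so check how yours expects them):

" ministrations "
" shivers down "
" barely above a whisper "
" steeling h"
"as you turn to leave"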
- It's good practice to add spaces before and after the phrase you want to ban. This prevents you from accidentally banning parts of other legitimate phrases. For example, banning "led with" without the spaces also bans "filled with" and "led without".
- However, there's one catch: if the AI response starts or ends with your banned phrase, there may not be a space before or after it. In these cases, you'll need to ban the phrase without the spaces. For example, I had to ban "as you turn to leave" without the leading space because it kept showing up at the start of responses, with a comma right after it.
- Think about words you can partially ban to remove multiple slops at the same time. For instance, banning "steeling h" will prevent the AI from using phrases like "steeling herself", "steeling himself", "steeling hard", and "steeling harder" — all with a single ban. This helps you save those precious ban slots while cleaning up more slop.
- But doing this could cause another problem. Let's say you're doing a roleplay where you're fighting a faction called "Steeling Hearts"... and you already see the problem, right? Now the AI can't write the name of their own faction. So think carefully about what you are banning.
- You can check my full ban list on my settings page.
Now, whenever the AI outputs slop, or something that breaks your immersion, just add it to the banned list so it literally can't write it again. When the AI tries to output these exact phrases, it'll backtrack and come up with something different instead. It works great!
Use Randomness in Your Sessions for Variety and Unpredictability
The first macros we learn when roleplaying are {{user}} and {{char}}, but there's another really useful one you should know about: the {{random}} macro. It allows you to list different text options that will be randomly selected each time it's used or when the AI starts a new turn. There's also {{roll}} for dice rolls. You can use these anywhere you'd use the basic macros — in the text box, character descriptions, lorebook, author's notes, you name it.
Sounds cool, right? But how is it useful? First, here's how they work:
- {{random::option1::option2}} or {{random:option1,option2}} — add as many options as you want with more ::s or ,s. Control the chances by repeating options or leaving blanks. I suggest using the :: syntax, since it supports commas in its options.
- {{roll:XdY+Z}} — D&D style dice. X is number of dice, Y is sides per die, Z is what to add. X and Z are optional. So 2d6+3 means roll two six-sided dice and add 3.
Here's a simple example:
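Something along these lines (the exact wording doesn't matter; the run of ::s in the weather macro is the three blank options):

Today is {{random::Monday::Tuesday::Wednesday::Thursday::Friday::Saturday::Sunday}}. {{random::It is raining.::The sky is clear.::The sky is clear.::::::}} {{random::It is daytime.::It is nighttime.}}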
This gives us a random day, weather (1/6 chance of rain, 2/6 clear, 3/6 nothing), and time (50/50 day/night).
Easy, right? The tricky part is working with probabilities since you can't nest randoms. But you can get creative with repeats and blanks to control the chances. So, how can we use it in roleplay? I have a few examples for you:
Action Buttons With Random Outcomes
Let's say you're doing a simple medieval adventure and want a basic food foraging system to make it feel more videogame-like. We could use something like this: {{random::I forage for food and find something that looks edible. I can't really tell what it is, so I examine it more closely.::I forage for food, but it seems like all I can find is dirt. It's frustrating.::I forage for food, but I end up getting bitten by something. Shit, what is it? Does it look poisonous?::I forage for food and find what looks like {{roll:d4+1}} coins. I take a closer look at it, trying to figure out if it's worth anything.}}
But copying and pasting this every time you want to forage would be a pain. That's where Quick Reply buttons come in handy. Here's how to set it up:
- Open your extensions menu
- Go to the Quick Reply section
- Create a new bar
- Add a button with a command like this:
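Roughly, it's just the foraging macro from above with /send and {{input}} in front of it:

/send {{input}} {{random::I forage for food and find something that looks edible. I can't really tell what it is, so I examine it more closely.::I forage for food, but it seems like all I can find is dirt. It's frustrating.::I forage for food, but I end up getting bitten by something. Shit, what is it? Does it look poisonous?::I forage for food and find what looks like {{roll:d4+1}} coins. I take a closer look at it, trying to figure out if it's worth anything.}}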
/send will send the message as you, and {{input}} includes whatever is in your text box when you run the Quick Reply. So hitting the button will send your message plus the foraging prompt. If your text box is empty, it'll just do the foraging.
Randomness To Create Depth in Characters and Scenarios
Want to see a character who uses randomness pretty well? Let's check out Eudora, a character from knickknack, where I first learned this trick. While she has detailed descriptions and lorebook entries, her character notes are by far the most interesting part. These notes get inserted into context every turn, and look at how they use randomness:
If Eudora is unsure about an action, Eudora will always pick the one that she perceives as 'safe', which is {{random: trusting only herself., trusting in the words of the King James Bible with a relevant quote., trusting the voices in her head.,trusting what she believes the corvids are telling her.}}]
[{{random: The voices are telling Eudora that she should not trust {{user}}; write voices in italics., The voices are telling Eudora that {{user}} is dangerous; write voices in italics., The voices are telling Eudora that her aunt was mad and that the Lord does not exist; write voices in italics., The voices are telling Eudora that she must do something drastic related to the current situation; write voices in italics.,The voices are telling Eudora that she is damned; write voices in italics., The voices are telling Eudora that what she is doing is correct for the Lord's Prophetess; write voices in italics.,The voices are telling Eudora that Eudora is insane; write voices in italics., The voices are silent., The voices are silent.}}]
Really cool concept: this setup randomly changes her behavior and the voices in her head every turn. Notice even the clever use of "The voices are silent" instead of blanks.
You can get even more control over these outputs using lorebook entries. For example, you could set up entries that only trigger every few turns, or wait a bit after certain keywords appear. Like having those voices only show up a few messages after someone mentions religion, or making them more violent under certain conditions.
The only limit is your creativity! You could create a sick character who randomly vomits blood or faints, a companion with multiple personalities who switches between them, or a yandere girlfriend with wild emotional swings.
And you don't have to limit yourself to characters; there's nothing stopping you from using randomness to create dynamic environments or random events. Imagine an excavation site that becomes increasingly dangerous over time. You could set up a lorebook entry that triggers a message to the AI every few turns with something like: In your next response, describe how {{random::the walls start to creak.::a support beam groans.::small rocks tumble down nearby.::the ground beneath {{user}}'s feet feels unstable.}} and have this trigger a second, delayed entry with more serious events. You could do similar things with the weather, haunted houses, or any environment you want to feel alive and unpredictable.
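I won't go through the exact setup, but in SillyTavern's lorebook this kind of pacing comes from an entry's probability and timed effects settings (Sticky / Cooldown / Delay in recent versions; check your version if the names differ). A rough sketch of that first entry:

Keywords: excavation, tunnel, collapse
Probability: 50%
Cooldown: 4 (can't fire again for 4 messages)
Content: In your next response, describe how {{random::the walls start to creak.::a support beam groans.::small rocks tumble down nearby.::the ground beneath {{user}}'s feet feels unstable.}}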
You May Be Able to Use a Better Model Than You Think
There's a common rule of thumb that you should use the biggest model you can fit in your VRAM with at least Q4 quantization. While this made sense when models were smaller and quantization methods weren't as efficient, things aren't so clear cut anymore — especially for larger models above 12B. For roleplay, even a slightly dumbed-down 20B model still outsmarts a well-balanced 12B one; you just need to accept that you're not getting the bigger model's full capacity.
A game-changer has been the new IQ quantizations, particularly IQ3 for ~20B models and IQ2 for the bigger ones. These methods minimize losses better than traditional quantization, and every bit you can save makes a huge difference when you start to go under Q4.
Personally, I have a 12GB GPU, which is in an awkward spot right now. There isn't a perfect model size for it. 12B-14B models feel like they leave performance on the table, while 22B-24B models are too big to fit with a decent context size. For roleplay, I find 16K context size ideal, with 12K being the minimum acceptable.
Luckily for us, KoboldCPP offers three main tricks to fit more context in these situations:
- Enable Low VRAM Mode. This moves context data to RAM — that's not exactly what it does, but it's close enough. This will slow down generation speed somewhat, so you'll want to use the biggest quantization that still fully loads into VRAM. On some systems, generation also suffers from additional slowdowns as the context fills up.
- Load Model Layers into the CPU/RAM. Instead of context, you can offload some model layers while keeping the context in VRAM. This also affects speed, but in a different way than Low VRAM mode, so you may need to test both to see which works better for your system.
- Quantize the Context (KV Cache). You can quantize the context itself using KV Cache at 8-bit or 4-bit. This reduces context memory usage, but you lose the ability to quickly swap context sections, requiring reprocessing for any context changes. More importantly, in roleplay, quantizing the context affects perceived intelligence more than quantizing the model itself. It can be pretty annoying when the AI suddenly misses important details or misinterprets character information.
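If you launch KoboldCPP from the command line instead of the GUI, the same knobs are available as flags. Roughly something like this, with the model name as a placeholder (flag names can change between versions, so check --help for yours):

python koboldcpp.py MyModel-IQ3_M.gguf --contextsize 16384 --gpulayers 999 --usecublas lowvram
python koboldcpp.py MyModel-IQ3_M.gguf --contextsize 16384 --gpulayers 35 --usecublas

The first line is trick 1: everything loaded on the GPU, with Low VRAM mode handling the context. The second is trick 2: only some layers on the GPU, context kept in VRAM. KV cache quantization has its own flag as well (--quantkv, if I remember right, and it needs flash attention enabled), but the GUI is the easier place to toggle that one.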
If you have an NVIDIA GPU, remember to set CUDA - Sysmem Fallback Policy to Prefer No Sysmem Fallback ONLY for KoboldCPP, or your backend of choice, in the NVIDIA Control Panel, under Manage 3D settings. This is important because, by default, if your VRAM is near full (not full), the driver will fall back to system RAM, slowing things down even more.
IQ quantized models are notorious for taking an even bigger speed hit when layers are loaded into the CPU/RAM in some setups. Test both IQ and Q_K quants if you go that route.
If you want to use Low VRAM mode, make sure that the model is fully loaded into the GPU, otherwise you will reduce your generation speed too much. You can do this by setting the GPU Layers value to something absurdly high, like 999. If you get a memory error, you are trying to load a model bigger than the VRAM you have available; try a smaller model or free up more VRAM.
If you want to load layers into the RAM, it will take some trial and error to find the right number. Set the context size to the desired amount and use your system's task manager to check how much VRAM KoboldCPP is using while you increase the value of the GPU Layers setting. On Windows 11, you can do this by going to the details pane of the Task Manager and enabling the Dedicated GPU Memory column.
In my setup, by combining a model at IQ3_M with Low VRAM Mode, I can use 22B-24B models like Mistral Small and Cydonia with a full, unquantized 16K context. The speed is pretty acceptable, around 8-10 tokens/second, and I still have about 1.5GB of VRAM available for other applications like a browser or music. This setup made for a much more enjoyable experience than being limited to smaller models just to adhere strictly to the old VRAM rule.
If you have an iGPU, you can plug your monitor into it and have your browser and desktop use it, leaving as much of your discrete GPU as possible for the AI model.
My point is, explore options beyond the typical advice, especially with newer models and quantization methods. It really depends on your hardware, how fast your GPU, CPU, and RAM are, and how tolerant you are of possible speed hits, but you might find you can get more out of your setup than you initially thought.
Use Connection Profiles to Make Swapping AI Models Easier
Connection Profiles are an underutilized feature in SillyTavern that can really simplify your life when experimenting with different models. This feature allows you to save multiple configurations — including jailbreaks, system prompts, instruct and context templates, tokenizer settings, and sampler presets — making it easy to switch between various APIs, models, and formatting templates.
Managing profiles is straightforward – you'll find the save and management buttons at the top of the API Connections window. This feature comes from a built-in extension, so make sure it's enabled in the Extensions window if you don't see it.
For an even smoother experience, combine Connection Profiles with the Chat Top Info Bar extension. This combination lets you switch between AI connections or KoboldCPP model templates in less than a second while chatting, without messing with settings menus.
My recommendation? Create a profile for each setting you frequently adjust. In my setup, I maintain separate generic profiles for each instruct template commonly used by local models (ChatML, Mistral, Gemma 2, and Metharme/Pygmalion). I also keep profiles, with their own jailbreaks, for the free APIs I have access to: Mistral Large and the various Gemini models.
The time saved is invaluable, and having the ability to instantly revert to previous configurations when you've modified settings too much is a real lifesaver.
Use a Clipboard Manager for Easy Backup and Versioning of Your Bots and Interactions
Every computer user knows they should make backups and keep multiple versions of their work. But man, do I suck at it! I just can't remember to do it and keep telling myself that everything will be fine. Over time, I have actually found a solution that works for me. It requires minimal effort and has saved me a few times: relying on a robust clipboard manager.
The built-in Windows clipboard manager is too unreliable for this use case — it wipes itself every time you log off and constantly overwrites older entries. Get a third-party solution like Ditto for Windows, or CopyQ, which is multiplatform; they offer significant advantages:
- Keep copied items for extended periods (up to a year or more).
- Every version of anything you ever copied will always be there.
- Easy search through your clipboard history; you just need to remember a keyword from the clip.
- Create organized groups for different types of content.
- Custom keyboard shortcuts for different clipboard groups.
Just set your clipboard manager of choice to keep your clips for an absurdly long time (my Ditto is configured to keep the last 500,000 clips I made), and get into the habit of doing a Ctrl+A Ctrl+C before making any changes to your cards or prompts. Are you already using your system's clipboard manager? Go to your settings and disable it to free up the shortcut, and set it for your new one to make using it as natural as possible. Now you have an effortless backup and versioning system for all your roleplaying needs.
Keep in mind that anyone with access to your user account, or an administrator account, will be able to see anything you copy, even outside of your roleplaying sessions. So I wouldn't recommend doing this if you share your user account with others, or if you can't trust people around you not to snoop on your computer, unless you encrypt the database file with your clips and keep it in a safe place.
Want to change your bot's description? Copy it first. Want to try some changes to your system prompt? Copy it first. Do you roleplay the same character multiple times and have your favorite startup and interactions? Copy it so you can use it again in another session. Got a good interaction and want to use it later as an example message? Copy it. You do not need to think about organizing things, where to put them, how to find them later, so just copy everything you can see yourself using again later.
Trust me, it's no fun to have the bot you've been working on disappear because you pasted the wrong thing in the wrong place. This will save your ass sooner or later.