Sukino's Findings: A Practical Index to AI Roleplay

Finding learning resources for AI roleplaying can be tricky, as most of them are hidden away in Reddit threads, Neocities pages, Discord chats, and Rentry notes. It has a lovely Web 1.0, pre-social media vibe to it, with nothing really indexed or centralized and always something cool buried somewhere you haven't discovered yet.

To make things a little easier, I've compiled a list of interesting, up-to-date information about it. Think of it as a crash course to help you get a modern AI roleplaying setup, understand how everything works, and know where to find things. Want to know more? Check out my Guides page, where I share little quality of life things I have discovered.

If you have any feedback, want to talk, make a request, or share something, reach me at: sukinocreates@proton.me or @sukinocreates on Discord.


Latest Updates:
2025-03-22 — I made conversions of Deepseek presets for Text Completion connections, check them out. Added warning that Gemini got worse at roleplaying lately.
2025-03-20 — Added debased-AI and theatreJB presets. Added WyvernChat to character card providers.
2025-03-18 — The Local LLM/Models section has been reworked with a new tool, but the information itself is the same.
2025-03-15 — Changed 24B recommendations to Dan's Personality Engine and Cydonia 2.1. Added more 12B recommendations for variety, Rocinante 1.1 and NemoMix Unleashed.
2025-03-13 — Updated list of Presets. Expanded the Local LLM/Models section. Added Guided Generations.


Getting Started

Picking an Interface

First, you will need a frontend, the interface where the roleplaying takes place and where your characters live.

I will only recommend solutions that are open source, private, secure, well maintained, and don't lock you into a closed ecosystem. So if you've heard of a service that's not listed here, it's probably because it doesn't meet these criteria.

  • Install SillyTavern: https://docs.sillytavern.app/installation/ — SillyTavern stands as the de facto frontend for AI roleplaying. While alternatives exist, it remains the most mature and feature-rich platform, consistently receiving updates first and offering extensive customization options with broad system compatibility and robust community support. Follow the official guide for step-by-step instructions on Windows, Linux, Mac, Android, or Docker. For iOS users, see the solution below.
    • Access SillyTavern Remotely Via Tailscale: https://sillytavernai.com/tailscale-config / https://docs.sillytavern.app/administration/tunneling — Tailscale is a private and secure tunnel that connects all your devices like a LAN connection, but over the Internet. This means that you can host SillyTavern on one device, and access the same instance on all your other devices anywhere you have an Internet connection, and maybe even share it with your friends. This is the best way to use it on Android, and the only way to really get it on iOS, if you have a dedicated device to host it — like a computer, unused phone, Raspberry Pi, or homeserver. You can even rent a small, inexpensive remote server or VPS to host it on, if you are tech savvy enough, it's pretty lightweight.
  • Or Use Agnaistic: https://agnai.chat/ — If you really can't install SillyTavern, or you just want a simple online solution, Agnaistic is becoming a good alternative. It's free, can be used without an account, and runs completely in your browser. It even has some free models for you to get your feet wet — there are better free models in the next section, so don't pick it just because of this.
  • Or RisuAI: https://risuai.xyz/ — Another online alternative. Has a different set of features than Agnaistic, and some users find the UI more friendly, so it might be more to your liking.

There's nothing stopping you from starting with these online frontends and later migrating to SillyTavern if you feel the need for a more complete solution. Just keep in mind that you'll miss out on most of the modern and advanced features, and that most of the content and setups you find online won't apply to you.

Throughout this guide, I'll assume you're using SillyTavern, but the instructions should be easily applicable to the alternatives—you'll just need to look for the equivalent options.

Setting Up an AI Model

If You Want to Run an AI Locally

It's uncensored, free, and private. Requires a computer or server with a dedicated GPU, or a Mac with an M-series chip. If you don't know whether you have a dedicated GPU, Google or ask ChatGPT how to check on your system.

You'll need a backend, the program that runs your AI models and connects to your frontend via a local API. There are two main model formats to pick from, GGUF and EXL2. If you don't have a preference yet, go with GGUFs: they are easier to find, easier to use, and come in more sizes to fit any amount of memory.

Choose a backend and go pick a model and a suitable preset.

  • KoboldCPP: https://github.com/LostRuins/koboldcpp — Runs GGUF models. Don't know what to pick? Go with this one. Designed with roleplaying in mind, so it has some exclusive features for us roleplayers that will come up later in the guide. Comes with its own roleplaying frontend that you can use if you want to, but you don't have to interact with it. Read the notes on the release page to know which version you need to download.
    • I have a guide that will help you set it up and optimize it to your system, check it out.
  • TabbyAPI: https://github.com/theroyallab/tabbyAPI — Runs EXL2 models. Probably will be the most performant if you have enough VRAM to run everything smoothly.
  • LM Studio: https://lmstudio.ai/ — Runs GGUF models. Pretty barebones, but has its fans for how easy it is to use, and for being able to download and manage the models within its UI.
  • TextGen WebUI/Oobabooga: https://github.com/oobabooga/text-generation-webui — Runs GGUF and EXL2 models. The most versatile option; its strength is having the best integrated UI for chatting with the AI model.

If You Want to Use an Online AI

This is where censorship and privacy become an issue, as you will be sending everything to these services, and they can log your activity, block your requests, or ban you at will. Stay safe, use burner accounts if you feel like it would be bad to have your sessions tied to your name, and be careful not to accidentally send sensitive information, as most of the time your data will be used to train new AI models.

You'll need a service that provides the AI model of your choice and an API key to connect to it with your frontend.

Choose a service and go pick a suitable preset.

  • The Free Ones: These change all the time, but I will try to keep this updated with the options I know of.
    • Google Gemini: https://aistudio.google.com/apikey — Google currently offers free API keys through AI Studio. Gemini used to be the best roleplaying model available for free, but Google has gradually made it worse. It has many safety checks, so a good preset is essential, and you may still encounter refusals. Use the model version suggested by your chosen preset, as the models are updated frequently and the ideal one for roleplaying changes. Requires a Google account, and your data will be used for training (if used outside the UK/CH/EEA/EU), but since it's Google, you can't expect much else.
    • MistralAI: https://console.mistral.ai/api-keys — Mistral currently offers trial API keys through La Plateforme. Mistral Large 2411 is their best model. Requires opting into data training and may ask for phone number verification.
    • Cohere: https://dashboard.cohere.com/api-keys — Cohere currently offers evaluation API keys. Command R+ 104B (not 08-2024) is their best model. Requires registration and is rate limited, read their documentation to know more.
    • Model Providers With Free Key Rotations: https://github.com/cheahjs/free-llm-api-resources — There are providers/resellers that host the same AI models that people run locally and offer some of them for free in rotation. However, you cannot verify the real quality of the models; they may be serving a very low-quality version to free users.
    • KoboldAI Colab: https://koboldai.org/colabcpp — You can borrow a GPU for a few hours to run KoboldCPP at Google Colab. It's easier than it sounds: just fill in the fields with the desired GGUF model link and context size, and run. The borrowed GPUs are usually good enough to handle small models, from 8B to 12B, and sometimes even 24B if you're lucky. Check the section on where to find local models to get an idea of which models are good.
    • AI Horde: https://stablehorde.net/ — A crowdsourced solution that allows users to host models on their systems for anyone to use. The selection of models depends on what people are hosting at the time. It's free, but there are queues, and people hosting models get priority. By default, the host can't see your prompts, but the client is open source, so they could theoretically modify it to see and store them, though no identifying information (like your ID or IP) would be available to tie them back to you. Read more on their FAQ to be aware of any real risks.
  • The Paid Ones: Most of these options operate on a pay-per-request model, so the more you play, the more expensive it gets.
    • There are providers/resellers that host the same AI models that people run locally, at all price points. The most famous is OpenRouter, but you can find alternatives if you look around, including cheaper and subscription-based ones. Shop around and check the section on where to find local models to get an idea of which models are good.
    • You can also pay for the big, corporate models like GPT, Claude and Deepseek, they are pretty smart and will give you the best experience you can get. But only go this route if you have disposable income because they are quite expensive and it turns into a money sink pretty fast. And remember, you need the API key, so don't buy a ChatGPT subscription or anything like that.
  • /aicg/ meta: https://rentry.org/aicg_meta — Comparison of how the different services/models perform in roleplay. Don't take this as gospel, they vary depending on the preset and bots you use, but it can help you set your expectations for what you can pay for.

Your model's provider/proxy isn't available via Chat Completion in your frontend?
You'll need to find out if they offer an OpenAI-compatible endpoint. Basically, it mimics the way OpenAI's ChatGPT connects, adding compatibility with almost any program that supports GPT itself. Check their documentation for an endpoint address; it should look something like this: https://api.provider.ai/v1. If they have one, select Custom (OpenAI-compatible) as your chat completion provider, and manually enter that address and your API key. If the model list loads, you are golden; just select the right model there.
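If you want to sanity-check such an endpoint outside your frontend, the request shape is simple. Here's a minimal stdlib-only Python sketch; the base address https://api.provider.ai/v1 is the placeholder from above, and the model and key names are made up:

```python
import json
import urllib.request

def openai_compatible_request(base_url, api_key, model, messages):
    """Build a chat-completion request for any OpenAI-compatible endpoint.

    base_url is the address from the provider's docs, e.g. the
    placeholder https://api.provider.ai/v1 mentioned above.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",   # the API key goes here
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(url, data=body, headers=headers)

# Sending it requires a live endpoint, so it's left commented out:
# req = openai_compatible_request("https://api.provider.ai/v1", "sk-...",
#                                 "some-model", [{"role": "user", "content": "Hi"}])
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If a request built like this returns a normal JSON response, the Custom (OpenAI-compatible) option in your frontend should work with the same address and key.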


Where to Find Stuff

Chatbots/Character Cards

Chatbots, or simply bots, come as image files, or rarely as JSON files, called character cards. The chatbot's definitions are embedded in the image's metadata, so never convert the image to another format or resize it, or it will become a simple image. You simply import the character card into your roleplaying frontend and the bot will be configured automatically.
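For the curious, this is roughly how a frontend reads a card. A minimal stdlib-only Python sketch, assuming the common convention of a PNG tEXt chunk keyed "chara" holding base64-encoded JSON (chunk keys vary between card formats, so treat this as illustrative):

```python
import base64
import json
import struct

def read_card(png_bytes):
    """Extract the embedded character definition from a PNG card.

    Walks the PNG chunk by chunk looking for a tEXt chunk keyed
    'chara'. Re-encoding or resizing the image strips this chunk,
    which is why a converted card becomes a plain picture.
    """
    assert png_bytes[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    pos = 8
    while pos + 8 <= len(png_bytes):
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = data.partition(b"\x00")
            if key == b"chara":
                return json.loads(base64.b64decode(value))
        pos += 8 + length + 4  # skip past data and the 4-byte CRC
    return None  # no character data found: it's just an image
```

Nothing about the pixels matters; only that one metadata chunk carries the bot.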

  • Chub AI: https://chub.ai/ — This is the primary hub for chatbot sharing, but it's overwhelmed with frustratingly low-quality bots. It's hard to find the good stuff without knowing who the good creators are. So, for a better experience, create an account and follow creators whose bots you enjoy.
  • Chatbots Webring: https://chatbots.neocities.org/ — A webring in 2025? Cool! Automated index of bots from multiple creators directly from their personal pages. Could be a great way to find interesting characters without drowning in pages of low-effort sexbots on Chub. I mean, if the creator went to the trouble of setting up a website to host their bots, they must be into something, right?
  • Anchored Bots: https://partyintheanchorhold.neocities.org/ — Consistently updated list of bots shared on 4chan without having to access 4chan at all, what a blessing.
  • WyvernChat: https://app.wyvern.chat/ — A strictly moderated bot repository that is gaining popularity.
  • Character Tavern: https://character-tavern.com/ — Community-driven platform dedicated to creating and sharing AI Roleplay Character Cards.
  • AI Character Cards: https://aicharactercards.com/ — Promises higher-quality cards through stricter moderation.
  • RisuRealm Standalone: https://realm.risuai.net/ — Bots shared through the RisuRealm from RisuAI.
  • JannyAI: https://jannyai.com/ — Archive of bots ripped from JanitorAI. If you are a migrating user, this may be of interest to you.
  • PygmalionAI: https://pygmalion.chat/explore — Pygmalion isn't as big on the scene anymore, but they still host bots.
  • Character Archive: https://char-archive.evulid.cc/ — Archived and mirrored cards from various sources. Can't find a bot you had or that was deleted? Look here.
  • Chatlog Scraper: https://chatlogs.neocities.org/ — Want to read random people's funny/cool interactions with their bots? This site tries to scrape and catalog them.

Local LLM/Models

Figuring Out Which Models You Can Run

Want to run a model locally, but are confused by all those names and numbers? No worries! Here's a quick crash course, plus two tools that will help you find the perfect model. First, you just need to understand these four key concepts:

  • Total VRAM is the memory available on your GPU, your graphics card. This is different from your RAM. If you don't know how much VRAM you have, or whether you have a dedicated GPU, Google or ask ChatGPT how to check on your system.
  • In roleplay, the Context Length is how many past messages the AI can hold in memory. It's measured in tokens, each roughly between a syllable and a word. 8192 tokens is pretty good; users generally prefer 16384 for long roleplaying sessions, but you may need to choose a worse model to fit everything in your GPU. An oversized context is useless if your model can't use all the information, so don't go beyond 16K for now, as most models that run on common domestic hardware can't use it effectively.
  • Models come in sizes, measured in billions of parameters and represented by a number followed by B. Larger models are generally smarter, but not necessarily better at roleplaying, and require more memory to run. So, as a rule of thumb, a 12B model is smarter than an 8B one.
  • Models are shared in various quantizations, or quants. The lower the number, the dumber the model gets, but the less memory you need to run it. The best balance between compatibility and intelligence for AI roleplaying purposes is a GGUF IQ4_XS (or Q4_K_S if one isn't available), or an EXL2 between 4.0~4.5 bpw.

Simple, right? Total VRAM, context length, model sizes, and quants. Now we will use this information with one of these two calculators:

  • https://sillytavernai.com/llm-model-vram-calculator/ — This tool is the easiest to use. Just enter your Total VRAM and desired Context Size, then click Load Models to see a list of compatible options. Once it loads, sort by Total VRAM and find the highest number followed by B — this indicates the largest model your hardware can run smoothly at IQ4_XS or Q4_K_S. For example, if your system can handle an 8B model, you can run basically any model in that size range or smaller. But I suggest that you choose a Default Recommendation below instead of the ones suggested by the calculator; their algorithm favors older models not fine-tuned for roleplaying, as they are more widely used and have had more time to gather reviews and downloads.
  • https://smcleod.net/vram-estimator/ — If you are a bit more tech-savvy, this calculator is pretty self-explanatory and will let you find the perfect model size and quant for your system. Just adjust the values until the FP16 K/V Cache bar fits into the available VRAM of your GPU.
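If you prefer to eyeball it, the back-of-envelope math these calculators perform can be sketched like this. The per-1K-token cache cost and the fixed overhead below are rough assumed ballparks that vary by model architecture and backend, so use the calculators above for real decisions:

```python
def estimate_vram_gb(params_b, bpw=4.25, context_tokens=8192,
                     kv_gb_per_1k_tokens=0.12, overhead_gb=1.0):
    """Very rough VRAM estimate for running a quantized model.

    params_b: model size in billions of parameters (the B number).
    bpw: bits per weight of the quant (IQ4_XS sits around 4.25 bpw).
    kv_gb_per_1k_tokens / overhead_gb: ballpark assumptions, not
    measured values; real numbers depend on the model and backend.
    """
    weights_gb = params_b * bpw / 8                       # quantized weights
    kv_cache_gb = context_tokens / 1024 * kv_gb_per_1k_tokens  # context memory
    return weights_gb + kv_cache_gb + overhead_gb

# Example: a 12B model at ~4.25 bpw with 8K context needs very roughly
# 6.4 + 1.0 + 1.0 ≈ 8.3 GB, so it fits comfortably on a 12 GB card.
```

Notice how doubling the context only adds the cache term, while stepping up a size class adds whole gigabytes of weights; that's why the context/model-size trade-off above exists.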

Default Recommendations

These are the most commonly recommended models as of 2025-03. They're not necessarily the freshest or my favorites, and there's no one best model for everyone, but they're tried and true. It's a good idea to test and keep a few models around for variety, as small local models can get repetitive over time and different models tend to have different flavors. Choose a model and go pick a suitable preset.

Finding More Models
  • HuggingFace: https://huggingface.co/models — This is where you actually download models from, but browsing through it is not very helpful if you don't know what to look for.
    • Bartowski/mradermacher: https://huggingface.co/bartowski / https://huggingface.co/mradermacher — I don't know how they do it, but these two release GGUF quants of every slightly noteworthy model almost as soon as it comes out. Even if you don't use GGUF models, it's worth checking their profiles to see what new models have been released.
  • SillyTavernAI Subreddit: https://www.reddit.com/r/SillyTavernAI/ — Want to find what models people are using lately? Do not start a new thread asking for them. Check the weekly Best Models/API Discussion, including the last few weeks, to see what people are testing and recommending. If you want to ask for a suggestion in the thread, say how much VRAM and RAM you have available, or the provider you want to use, and what your expectations are.
  • HobbyAnon: https://venus.chub.ai/users/hobbyanon — This page offers a curated list of models of multiple sizes and instruct templates, along with an easy-to-follow tutorial for getting started with KoboldCPP.

Presets, Prompts and Jailbreaks

Always use a good preset that is appropriate for your model of choice. They are also called prompts or jailbreaks, although this name can be a bit misleading as they are not just for making these AI models write smut and violence — the NSFW part is usually optional.

LLM models are first and foremost corporate-made assistants, so giving them well-structured instructions on how to roleplay and what the user generally expects from a roleplaying session is really beneficial to your experience. Each preset will play a little differently, based on the creator's preferences and the quirks they found with the models, so try different ones to see which one is more to your liking.

Presets are listed by the model or instruct template with which they are compatible. If you're using a finetune and the instruct template isn't obvious from the model name alone, you can usually find that information on the model's original creator page.

Presets for Text Completion Models

To import these presets on SillyTavern, click on the AI Response Formatting button, the third one with an A in the top bar, and press the Master Import button on the top-right of the window. Make sure the ones you downloaded are selected in the drop-down menus. Always read their descriptions to make sure you don't need to tweak any other setting.

Presets for Chat Completion Models

To import these presets on SillyTavern, click on the AI Response Configuration button, the first one with the sliders in the top bar, and a window titled Chat Completion Presets should pop up — if it has another name, you aren't connected via Chat Completion, fix that first. Now, just press the Import preset icon on the top-right of the window, and make sure the one you downloaded is selected in the drop-down menu. Always read their descriptions to make sure you don't need to tweak any other setting.

Your model's provider/proxy isn't available via Chat Completion in your frontend?
Go back to the If You Want to Use an Online AI section to learn how to add it.

Your model wastes time explaining itself before playing its turn?
It means that you are using a reasoning model. This new type of model will always "think" before writing its responses.
This reasoning step shouldn't be visible to you unless you open the Thinking... window above the model's turn.
If it is getting mixed with your bot's actual responses, make sure your frontend is updated to a version that actually supports reasoning models, and that support for them isn't disabled.
In SillyTavern, to find this option, click on the AI Response Formatting button, the third one with an A in the top bar, and expand the Reasoning section to enable the Auto-Parse option.
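Under the hood, auto-parse just separates the reasoning block from the reply. A rough Python sketch, assuming DeepSeek-style <think>...</think> tags (the exact tag names differ between models, so this is illustrative):

```python
import re

def split_reasoning(text):
    """Separate a <think>...</think> reasoning block from the reply.

    Roughly what a frontend's auto-parse option does: the reasoning
    goes into a collapsible window, and only the reply is shown as
    the bot's turn.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return None, text.strip()  # not a reasoning turn
    reasoning = match.group(1).strip()
    reply = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, reply
```

If your frontend lacks this parsing, the raw tags and everything inside them leak straight into the chat, which is exactly the mixed-up output described above.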

You will see these pages mention Latte from time to time; it is just a nickname for GPT Latest.

SillyTavern Resources

Extensions
Themes
Quick Replies
Novel Roleplaying Setups

Learning How To Roleplay

Build Your Basic Knowledge

Handy Resources for Botmaking

Botmaking is pretty free-form. Almost anything you write will work, and everyone does it a little differently, so don't think you need to follow templates or formats to make good bots; plain text is more than fine...


Other Indexes

More people sharing collections of stuff. Just pay attention to when these guides and resources were created and last updated; they may contain outdated practices. A lot of these guides come from a time when AI roleplaying was pretty new and we didn't have advanced models with big context windows, when everyone was learning and experimenting with what worked best.

Pub: 08 Feb 2025 03:42 UTC
Edit: 23 Mar 2025 18:18 UTC