Idiot-Proof Mixtral Guide (April 2024 Edition)

1. Prerequisites

Have at least 30-40GB of combined memory. This total includes both your system RAM and the VRAM on your graphics card, or GPU.
If you're not sure, open Task Manager and click the Performance tab to see your GPU and how much RAM you have.
A good setup, for example, would be 32GB of RAM and an RTX 3060 with 12GB of VRAM, totaling 44GB.
This guide will assume you have an Nvidia GPU, as that is the easiest one to use for LLMs.
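The memory math above can be sketched in a few lines. This is just a sanity check, assuming the guide's 30GB floor; the helper names are mine, not part of any tool.

```python
# Rough memory-budget check for running a Mixtral GGUF locally.
# The 30GB floor comes from this guide; the helpers below simply
# add system RAM and VRAM together and compare against it.

def total_memory_gb(system_ram_gb: float, vram_gb: float) -> float:
    """Combined memory budget: system RAM plus GPU VRAM."""
    return system_ram_gb + vram_gb

def meets_minimum(system_ram_gb: float, vram_gb: float, floor_gb: float = 30.0) -> bool:
    return total_memory_gb(system_ram_gb, vram_gb) >= floor_gb

# The example setup from this guide: 32GB RAM + a 12GB RTX 3060.
print(total_memory_gb(32, 12))  # 44
print(meets_minimum(32, 12))    # True
```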

2. Download Kobold.cpp

The following page is the Releases tab for Kobold.cpp, a GUI program built on llama.cpp that makes using LLMs easy and accessible.
https://github.com/LostRuins/koboldcpp/releases
Download the release at the top, as that is the most recent. You will want the file named koboldcpp.exe. This program has all the features you need, and opening it is as simple as downloading and double-clicking it. Ignore the popup from Windows, if you get one; that is just a false positive.

3. Download a Mixtral Model

Go to huggingface.co to find a Mixtral merge or finetune of your choice. There are more than a few out there, and the best one is up to personal preference. However, I will link a few of my recommendations below.

Mixtral Instruct:
https://huggingface.co/Artefact2/Mixtral-8x7B-Instruct-v0.1-GGUF

BagelMisteryTour (BMT):
https://huggingface.co/ycros/BagelMIsteryTour-v2-8x7B-GGUF

Nous-Hermes Mixtruct:
https://huggingface.co/InferenceIllusionist/Nous-Hermes-2-Mixtruct-v0.1-8x7B-DPO-DARE_TIES-iMat-GGUF

I recommend using the Q4_K_M quants, as they strike the best balance between quality and speed. If you have the extra RAM and are willing to sacrifice some speed for extra quality, you can get the Q5_K_M quant instead.
I also recommend using the iMat (importance matrix) quants where available, as they improve output quality at the same file size.

4. Download SillyTavern

This isn't entirely necessary, as Kobold.cpp has a built-in frontend, but SillyTavern is better, especially for roleplay and chat purposes.

https://github.com/SillyTavern/SillyTavern
Follow the installation guide at the link above to install it. The staging branch, though unstable, is recommended as of this writing due to features that have not yet made it to the release branch.

To open SillyTavern, find the SillyTavern folder that you just cloned. If you haven't altered anything in your Git installation, your SillyTavern folder should be in your Users folder, inside the folder named after your Windows username. Run the start.bat file and it will install its requirements, then launch. If you want to get any updates, run the UpdateAndStart.bat file instead; it will automatically pull anything that's been merged into the branch you chose.

5. Setup

Upon opening Kobold.cpp, you will be greeted with this window.

Blank Kobold.cpp startup screen

You will need to adjust the settings for the best experience. First, load your GGUF model by clicking the Browse button. It will take you to your files, where you can find the location of your downloaded model and load it.

Next, determine how many GPU layers to offload. If your setup is similar to mine (12GB of VRAM), you can use 12. Otherwise, add or subtract layers as needed.
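If you want a starting point rather than pure trial and error, a common rule of thumb is to divide the model file size by its layer count and see how many layers fit in spare VRAM. A hedged sketch: the layer count (33) and the 2GB overhead reserve are my assumptions for a Mixtral-sized GGUF, not values from the guide, so treat the result as a first guess.

```python
# Rule-of-thumb estimate for the GPU layers setting: assume each layer
# takes an equal slice of the model file, reserve some VRAM for
# overhead, and offload as many layers as fit. The defaults here
# (33 layers, 2GB overhead) are assumptions; adjust for your card.

def estimate_gpu_layers(model_size_gb: float, vram_gb: float,
                        total_layers: int = 33, overhead_gb: float = 2.0) -> int:
    per_layer_gb = model_size_gb / total_layers
    usable_vram = max(vram_gb - overhead_gb, 0.0)
    return min(int(usable_vram / per_layer_gb), total_layers)

# e.g. a ~26GB Q4_K_M file on a 12GB card:
print(estimate_gpu_layers(26.0, 12.0))  # 12, matching the guide's setting
```

If generation crashes or spills out of VRAM, lower the number; if you have headroom, raise it.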

After that, you will need to allocate context for the model. This is essentially the model's working memory: how much text, measured in tokens, it can keep track of at once. Mixtral models support a maximum context of 32k, but you may need to use less to fit everything in memory.
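Context costs memory because the KV cache grows linearly with it. A rough sketch of that cost, assuming Mixtral's published architecture (32 layers, 8 KV heads, head dimension 128) and an fp16 cache; the exact figure on your machine will differ, so treat this as an estimate only.

```python
# Rough KV-cache size for a given context length. 2x covers keys and
# values; the architecture defaults below are Mixtral's published
# numbers (32 layers, 8 KV heads via grouped-query attention, head
# dim 128), with 2 bytes per value for an fp16 cache.

def kv_cache_gib(context_tokens: int, n_layers: int = 32,
                 n_kv_heads: int = 8, head_dim: int = 128,
                 bytes_per_value: int = 2) -> float:
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 2**30

print(kv_cache_gib(32768))  # full 32k context -> 4.0 GiB
print(kv_cache_gib(8192))   # 8k context -> 1.0 GiB
```

So dropping from 32k to 8k context frees roughly 3 GiB under these assumptions, which is often the difference between fitting and not fitting.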

Finally, there are the checkboxes. I advise disabling MMAP and enabling MMQ; that gives me the best experience. Disabling the browser UI isn't necessary, but neither is letting it launch.

Your setup should look similar to this if you followed the instructions.

My Kobold.cpp settings

You can click the Save button at the bottom to save your settings as a preset file. You will then be able to click Load to restore them any time you open Kobold.cpp.

Once you have both kobold.cpp and SillyTavern open, you will have to connect the two so that you can use SillyTavern as a frontend while kobold.cpp actually operates the model.

Kobold.cpp API drop down in SillyTavern

When you open SillyTavern, there will be a red plug button at the top. Click it to reveal a drop-down menu for API Connections.
Under API, select Text Completion, and under API Type, select KoboldCpp. It will ask for an API URL to connect to KoboldCpp. Check the black terminal window for Kobold and look at the bottom: your URL will be there. Copy and paste it into the API URL text box in SillyTavern, and click Connect.

With that, you will be connected to Koboldcpp and ready to chat with your Mixtral model!
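Under the hood, SillyTavern is just sending HTTP requests to that URL. A minimal sketch of the same kind of request, assuming KoboldCpp's KoboldAI-compatible endpoint (/api/v1/generate) on its default port 5001; the field names reflect my understanding of the API and may vary between versions.

```python
# Build (but don't send) a generation request like the one SillyTavern
# makes to KoboldCpp. The endpoint path and JSON fields are assumptions
# based on KoboldCpp's KoboldAI-compatible API.
import json
from urllib.request import Request

api_url = "http://localhost:5001"  # the URL shown in Kobold's terminal window

payload = {
    "prompt": "Hello, Mixtral!",
    "max_context_length": 32768,
    "max_length": 200,
    "temperature": 4.0,  # sampler values from the Samplers section below
    "min_p": 0.05,
}

req = Request(api_url + "/api/v1/generate",
              data=json.dumps(payload).encode(),
              headers={"Content-Type": "application/json"})
print(req.full_url)
# Actually sending it (urllib.request.urlopen(req)) only works while Kobold is running.
```

This is also a handy way to check that Kobold is up if SillyTavern refuses to connect.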

6. Cards

SillyTavern will have character cards that you can find by clicking the ID-card button at the top right of SillyTavern. Click one and you can chat with that character. If the ones provided aren't enough, you can look for cards to download online. The most notable website is chub.ai. Be warned that a lot of the cards on there are very NSFW and can potentially be disturbing.
Alternatively, you can make your own! Local models used to need special formats for writing cards, but that isn't necessary with Mixtral. It is smart and has plenty of context, so plain text will suffice. In the character card drop-down menu, click the button at the top left that says "Create New Character". Write the prompt for your character in the Description field, and create a first message in the Greeting box at the bottom. There are more settings to fiddle around with, but those two are more than enough for a good card.

SillyTavern Character Creator

7. Samplers

On the top left of SillyTavern, there is a button that looks like three sliders. These are the Sampler Settings.
There are no hard rules for what you should do with samplers. The common consensus right now is to click the Neutralize Samplers button and use only Temperature, Min P, and Smooth Sampling. Temp should be high (around 4), Min P should be very low (less than 0.1), Smoothing Factor should be around 0.2, and Curve should also be around 4 or so. These can be adjusted, but results may vary; experimentation will be needed to get the best results for what you want.
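If you're wondering what Min P actually does: after temperature scaling, it drops every token whose probability falls below min_p times the top token's probability, then renormalizes. A self-contained sketch of the idea; real backends operate on the model's full logit vector, but the mechanics are the same.

```python
# Toy demonstration of temperature + Min P filtering over a tiny
# "vocabulary" of three tokens with made-up logit scores.
import math

def min_p_filter(logits: dict, temperature: float, min_p: float) -> dict:
    # Temperature scaling, then a numerically stable softmax.
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    exp = {t: math.exp(v - m) for t, v in scaled.items()}
    z = sum(exp.values())
    probs = {t: e / z for t, e in exp.items()}
    # Keep only tokens within min_p of the most likely token.
    cutoff = min_p * max(probs.values())
    kept = {t: p for t, p in probs.items() if p >= cutoff}
    # Renormalize the survivors.
    z2 = sum(kept.values())
    return {t: p / z2 for t, p in kept.items()}

# High temperature flattens the distribution, which is exactly why the
# guide pairs it with Min P: the filter prunes the long tail of
# unlikely tokens that a temp of 4 would otherwise let through.
print(min_p_filter({"the": 5.0, "a": 4.0, "zebra": -2.0}, temperature=1.0, min_p=0.1))
```

Raising min_p makes output more conservative; lowering it (as the guide suggests, below 0.1) keeps more variety while still cutting the nonsense tail.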

For reference, these are my settings.
My Samplers for BagelMisteryTour

Addendum: Asking for Help

It is never a bad thing to ask for help, especially when Google isn't giving the answer you need. It should be noted, however, that 4chan, the site where this rentry was first posted, isn't very friendly at times. That also applies to /lmg/, the general thread for local LLMs. As of the time of writing, /lmg/ has a massive stick up its ass and is hostile towards people asking for help. This is not a new thing. /lmg/ will sometimes go into a temper tantrum over seemingly insignificant things, such as the release of a new proprietary model like Claude 3, a dry spell (which, in /lmg/ terms, is a few days or weeks without any groundbreaking news or discoveries in one of the newest and most quickly growing technologies right now), or transgender women, the latter of which has caused the general to split into two threads multiple times.

You might still be able to get good advice during periods of emotional instability like these. However, this comes at the cost of your sanity, as you have to read and post in extremely cancerous threads. Much like with a woman, the best course of action is to leave /lmg/ alone for a couple of days so they can sort out their feelings, and come back later when they are in a better mood.

Pub: 02 Apr 2024 15:22 UTC
Edit: 02 Apr 2024 16:26 UTC