Idiot-Proof Mixtral Guide (April 2024 Edition)

1. Prerequisites

Have at least 30-40GB of total memory, counting both your system RAM and the VRAM on your graphics card (GPU).
If you're not sure how much you have, open Task Manager and click the Performance tab to see your RAM and your GPU's memory.
A good setup, for example, would be 32GB of RAM and an RTX 3060 with 12GB of VRAM, totaling 44GB.
This guide assumes you have an Nvidia GPU, as that is the easiest kind to use for LLMs.
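
If you want to double-check the math, here is a minimal sketch of the memory check described above. The function names are made up for illustration, and the 30GB threshold is simply the lower end of the range given in this section.

```python
def total_memory_gb(system_ram_gb, vram_gb):
    """Combined memory available to the model: system RAM plus GPU VRAM."""
    return system_ram_gb + vram_gb


def meets_minimum(system_ram_gb, vram_gb, minimum_gb=30):
    """True if the combined memory reaches the guide's lower bound (30GB by default)."""
    return total_memory_gb(system_ram_gb, vram_gb) >= minimum_gb


# Example setup from this section: 32GB of RAM and an RTX 3060 with 12GB of VRAM.
print(total_memory_gb(32, 12))
print(meets_minimum(32, 12))

# A smaller machine (16GB of RAM, 8GB of VRAM) falls short of the 30GB lower bound.
print(meets_minimum(16, 8))
```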

2. Download Kobold.cpp

The following page is the Releases tab for Kobold.cpp. It's a GUI program built on llama.cpp that makes using LLMs easy and accessible.
https://github.com/LostRuins/koboldcpp/releases
Download the release at the top, as that is the most recent. You will want the file named koboldcpp.exe. This program has all the features you need, and running it is as simple as downloading and double-clicking it. If you get a warning popup from Windows, ignore it. That is just a false positive.

3. Download a Mixtral Model

Go to huggingface.co to find a Mixtral merge or finetune of your choice. There are more than a few out there, and the best one is up to personal preference. However, I will link a few of my recommendations below.

Mixtral Instruct:
https://huggingface.co/InferenceIllusionist/Mixtral-Instruct-ITR-8x7B-GGUF?not-for-all-audiences=true

BagelMisteryTour (BMT):
https://huggingface.co/ycros/BagelMIsteryTour-v2-8x7B-GGUF

Nous-Hermes Mixtruct:
https://huggingface.co/InferenceIllusionist/Nous-Hermes-2-Mixtruct-v0.1-8x7B-DPO-DARE_TIES-iMat-GGUF

I recommend using the Q4_K_M quants, as they are the best balance between quality and speed. If you have the extra RAM and are willing to sacrifice some speed for extra quality, you can get the Q5_K_M quant instead.
I also recommend using the quants made with iMat (an importance matrix), if they are available, as they improve the quality of a quant at the same file size.
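
To get a feel for how much memory each quant needs, here is a rough sketch that estimates file size from the parameter count and the average bits per weight. The numbers are assumptions on my part, not official figures: roughly 46.7 billion total parameters for Mixtral 8x7B, and about 4.5 and 5.5 bits per weight for Q4_K_M and Q5_K_M. Always check the actual file size on the model page.

```python
# Assumed figures for illustration only; check the model page for real file sizes.
MIXTRAL_8X7B_PARAMS = 46.7e9  # approximate total parameter count
APPROX_BITS_PER_WEIGHT = {"Q4_K_M": 4.5, "Q5_K_M": 5.5}  # rough averages


def estimated_size_gb(quant, n_params=MIXTRAL_8X7B_PARAMS):
    """Estimate the size of a quantized model in GB (1 GB = 1e9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9


for quant in APPROX_BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimated_size_gb(quant):.1f} GB")
```

The model also needs room for its context on top of this, so leave a few GB of headroom when comparing the estimate against your total memory.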

4. Download SillyTavern

This isn't entirely necessary, as Kobold.cpp has a built-in frontend, but SillyTavern is the best experience, especially for roleplay and chat purposes.

https://github.com/SillyTavern/SillyTavern
Follow the installation guide at the link above to install it. The staging branch, though unstable, is recommended as of this writing because it has features that have not yet made it to the release branch.

5. Setup

Upon opening Kobold.cpp, you will be greeted with this window.

[Screenshot: the Kobold.cpp launcher window]

You will need to adjust the settings for the best experience. First, load your GGUF model by clicking the Browse button. It will open a file picker, where you can navigate to your downloaded model and load it.

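Next, determine how many GPU layers to use. This is the number of model layers that get loaded into your VRAM instead of system RAM. If your setup is similar to mine, then you can use 12. Otherwise, add layers if you have more VRAM to spare, or subtract layers if the model fails to load or runs out of VRAM.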
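
If you'd rather estimate a starting point than guess, here is a minimal sketch. It assumes Mixtral 8x7B has 32 layers of roughly equal size, that a Q4_K_M file is about 26.4GB, and that you keep about 2GB of VRAM free for context and overhead. All three numbers are assumptions for illustration, so treat the result as a first guess and adjust from there.

```python
import math


def estimate_gpu_layers(model_size_gb, vram_gb, total_layers=32, reserve_gb=2.0):
    """Estimate how many layers fit in VRAM after reserving some for context and overhead."""
    per_layer_gb = model_size_gb / total_layers   # assumes all layers are about the same size
    usable_gb = max(vram_gb - reserve_gb, 0.0)    # VRAM left over for model layers
    return min(total_layers, math.floor(usable_gb / per_layer_gb))


# A 12GB card with an (assumed) 26.4GB Q4_K_M model
print(estimate_gpu_layers(26.4, 12))
```

With these assumptions, a 12GB card comes out at 12 layers, which matches the number suggested above.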

After that, you will need to allocate context for the model. This is basically how much text, measured in tokens (pieces of words), your model can keep in memory at once. Mixtral models have a maximum context of 32k tokens, but you may need to use less for the model to fit in your memory.
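
Context costs memory because the model stores a key and a value for every token in every layer (the KV cache). Here is a rough sketch of that cost, assuming Mixtral 8x7B's published architecture (32 layers, 8 key/value heads, head size 128) and a 16-bit cache. Kobold.cpp's real usage will differ somewhat because of extra buffers, so this is only a ballpark.

```python
def kv_cache_bytes(context_tokens, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Approximate KV cache size in bytes for a given context length."""
    # The leading 2 accounts for storing one key and one value per token, per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


for ctx in (8192, 16384, 32768):
    print(f"{ctx} tokens: {kv_cache_bytes(ctx) / 2**30:.1f} GiB")
```

Under these assumptions the full 32k context costs about 4 GiB, so halving the context is an easy way to free up memory.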

Finally, there are the check boxes. I advise that you disable MMAP and enable mmq. That gives me the best experience. Disabling the browser UI launch is optional. You won't need the browser UI if you use SillyTavern, but it does no harm if it opens.

Your setup should look similar to this if you followed the instructions.

[Screenshot: the Kobold.cpp launcher with the settings above applied]

You can click the Save button on the bottom to save your settings as a preset file. You will then be able to click Load to load it immediately any time you open kobold.cpp.
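
If you prefer launching from a script instead of the GUI, the same settings can be passed as command-line flags. The sketch below only builds and prints the command. The flag names (--model, --gpulayers, --contextsize, --usecublas mmq, --nommap) are what I believe Kobold.cpp accepted as of writing, so run koboldcpp.exe --help to confirm them for your version. The model filename and the context size are placeholders.

```python
import shlex


def build_koboldcpp_command(model_path, gpu_layers=12, context_size=16384, exe="koboldcpp.exe"):
    """Build a Kobold.cpp command line mirroring the GUI settings in this section."""
    return [
        exe,
        "--model", model_path,                # the GGUF file you downloaded
        "--gpulayers", str(gpu_layers),       # layers to offload to the GPU
        "--contextsize", str(context_size),   # context length in tokens
        "--usecublas", "mmq",                 # Nvidia backend with mmq enabled (assumed flag form)
        "--nommap",                           # disable MMAP (assumed flag name)
    ]


# "mixtral-q4_k_m.gguf" is a placeholder filename.
cmd = build_koboldcpp_command("mixtral-q4_k_m.gguf")
print(shlex.join(cmd))
```

To actually start Kobold.cpp from Python, you could pass the list to subprocess.run(cmd).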

Once you have both kobold.cpp and SillyTavern open (to open SillyTavern, click the start.bat file in the SillyTavern folder), you will have to connect the two so that you can use SillyTavern as a frontend while kobold.cpp actually operates the model.

When you open SillyTavern, there will be a red plug button on the top. Click it, and it will reveal a drop down menu for API Connections.
Under API, select Text Completion, and under API Type, select Koboldcpp. It will ask for an API URL to connect to Koboldcpp. Check the black terminal window for Kobold and look at the bottom. You will see your URL there, so copy and paste it into the API URL text box in SillyTavern, and click Connect.

With that, you will be connected to Koboldcpp and ready to chat with your Mixtral model!
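
If SillyTavern refuses to connect, you can check whether Kobold.cpp is answering at all with a small script. This is a sketch that assumes Kobold.cpp exposes the KoboldAI-style endpoint /api/v1/generate and returns the text under results[0].text, which is my understanding of its API. The URL http://localhost:5001 is only an example, so use whatever URL your terminal window shows.

```python
import json
import urllib.request


def build_generate_request(base_url, prompt, max_length=64):
    """Build a POST request for the (assumed) /api/v1/generate endpoint."""
    url = base_url.rstrip("/") + "/api/v1/generate"
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, prompt, max_length=64, timeout=120):
    """Send the prompt to Kobold.cpp and return the generated text."""
    request = build_generate_request(base_url, prompt, max_length)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["results"][0]["text"]
```

With Kobold.cpp running, calling generate("http://localhost:5001", "Hello, my name is") should return a short continuation. If it raises a connection error instead, the problem is on the Kobold.cpp side and not in SillyTavern.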

6. Cards

SillyTavern comes with Character cards that you can find by clicking the ID card button on the top right of SillyTavern. Click on one and you can chat with it. If the ones provided aren't enough, you can look for cards to download online. The most notable website is chub.ai. Be warned that a lot of the cards on there are very NSFW and can potentially be disturbing.
Alternatively, you can make your own! Local models used to need special formats for cards, but that isn't really necessary with Mixtral. It is smart and has plenty of context, so plain text will suffice. In the character card drop-down menu, click the button on the top left that says "Create New Character". Write the prompt for your character in the Description field of the character creator in SillyTavern, and create a first message in the Greeting box at the bottom. There are more settings to fiddle around with, but those two are more than enough for a good card.

7. Samplers

You will need to experiment here to get the best results. On the top left of SillyTavern, there is a button that looks like three sliders. These are the Sampler Settings.
There are no predefined rules for what you should do with Samplers. The common consensus right now is to neutralize all the samplers and just use Temperature, Min P, and Smooth Sampling. Temp should be high, Min P should be very low (less than 0.1), Smoothing Factor should be low, and Curve should be around 4 or so. These can be adjusted, but results may vary.
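
To see why a high Temp and a low Min P work well together, here is a toy sketch of the two samplers. Min P throws away every token whose probability is below min_p times the probability of the most likely token, and Temperature then flattens the odds among the survivors. This is a simplified illustration and not Kobold.cpp's actual code. It applies Min P before Temperature, which is only one possible sampler order, and it leaves out Smooth Sampling entirely.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_survivors(logits, min_p=0.05):
    """Indices of tokens whose probability is at least min_p times the top probability."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def sample(logits, temperature=1.5, min_p=0.05, rng=random):
    """Pick a token index: filter with Min P first, then apply temperature."""
    survivors = min_p_survivors(logits, min_p)
    heated = softmax([logits[i] for i in survivors], temperature)
    return rng.choices(survivors, weights=heated, k=1)[0]


# Toy scores for four tokens; the last two are far less likely than the first.
toy_logits = [5.0, 4.0, 0.0, -5.0]
print(min_p_survivors(toy_logits, min_p=0.1))
print(sample(toy_logits, temperature=1.5, min_p=0.1))
```

In this toy example only the first two tokens survive the filter, so even a very high temperature can never pick the nonsense tokens at the bottom.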

Addendum. Asking for Help

It is never a bad thing to ask for help, especially when Google isn't giving you the answer you need. It should be noted, however, that 4chan, the site where this rentry was first posted, isn't very friendly at times, and that includes /lmg/, the general thread for local LLMs. As of writing, /lmg/ has a massive stick up its ass and is, for unknown reasons, hostile towards people asking for help. This is not a new thing. /lmg/ will sometimes throw a temper tantrum over seemingly insignificant things, such as the release of a new proprietary model like Claude 3, a dry spell (which, in /lmg/ terms, is a few days or weeks without any groundbreaking news or discoveries in one of the newest and fastest-growing technologies right now), or transgender women, the last of which has caused the general to split into two threads multiple times.

You might be able to get good advice during periods of emotional instability like these. However, this comes at the cost of your sanity, as you have to navigate and post in extremely toxic threads. The best course of action is to leave /lmg/ alone for a couple of days so they can sort out their feelings, and come back later when they are in a better mood.

Pub: 02 Apr 2024 14:35 UTC
Edit: 02 Apr 2024 14:45 UTC