Fast Mixtral on vast.ai easily

Using this method, initial prompt processing will take a while (around 30~60 seconds), but once a conversation gets started it should be near-instant. For generation itself I get ~25 tokens/second on a new chat (7 seconds for a 150-token response).

Please let me know in the thread if you have corrections.

Getting your Kobold URL
SillyTavern settings
Method without SSH (not recommended)

Getting your Kobold URL

Go to https://cloud.vast.ai/templates/ and select Cuda:12.0.1-Devel-Ubuntu20.04

You should land on https://cloud.vast.ai/create - set your filters as outlined below:

Click on "RENT" for any of the offers. Visit https://cloud.vast.ai/instances/. When the light blue "CREATING..." button turns into ">_ CONNECT", click there to get the ssh command. Vast.ai will give you instructions to provide it with a public SSH key if you haven't already, just follow those. Then open a terminal on your desktop and run the SSH command. If you're stuck on this step somehow then consult "Method without SSH (not recommended)"

Once you are in your server's command line, copypaste this entire sequence of commands into the prompt, or for short curl https://files.catbox.moe/8914or.sh | bash
- This installs Kobold and downloads Mixtral, which can take between 15 and 30 minutes depending on your instance's network speed.)

⎗
✓
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
wget -c -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
chmod +x cloudflared-linux-amd64
make LLAMA_CUBLAS=1
wget https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q5_0.gguf
(sleep 60 && ./cloudflared-linux-amd64 tunnel --url http://localhost:5001) &
./koboldcpp.py ./mixtral-8x7b-instruct-v0.1.Q5_0.gguf 5001 --ropeconfig 1.0 1000000 --contextsize 32768 --usecublas --gpulayers 33 --blasbatchsize -1

When you see this in your terminal, you're ready. Append /api to your Cloudflare URL and that's your Kobold API URL.

When you're done using Mixtral for the day I recommend destroying your instance with the trashcan icon to save money.

SillyTavern settings

API Connections

Advanced Formatting (simply select the "Alpaca" preset)

AI Response Configuration: Here are some recommended settings from the MixtralForRetards rentry. The most important parts are:
- Make sure Mirostat Mode: 0 (other settings reported to work poorly on Mixtral)
- Ban EOS Token: unticked (otherwise Mixtral will always continue generating until your response max tokens, in schizo ways).
- Repetition Penalty 1, Repetition Penalty Slope 0 (higher settings reported to cause schizo replies)

Method without SSH (not recommended)

Go to https://cloud.vast.ai/templates/ and EDIT Cuda:12.0.1-Devel-Ubuntu20.04
In "On-start Script", paste:

⎗
✓
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
wget -c -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
chmod +x cloudflared-linux-amd64
make LLAMA_CUBLAS=1
wget https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q5_0.gguf
(sleep 60 && ./cloudflared-linux-amd64 tunnel --url http://localhost:5001) &
./koboldcpp.py ./mixtral-8x7b-instruct-v0.1.Q5_0.gguf 5001 --ropeconfig 1.0 1000000 --contextsize 32768 --usecublas --gpulayers 33 --blasbatchsize -1

Click "SELECT AND SAVE"
Rent an instance
Periodically check the instance's logs with the below outlined button. After 15 to 30 minutes, you will see your Cloudflare URL, just append /api to it, that's your Kobold API URL.

Fast Mixtral on vast.ai easily

Getting your Kobold URL

SillyTavern settings

Method without SSH (not recommended)

Warning