Fast Mixtral on vast.ai easily
Using this method, initial prompt processing will take a while (around 30 to 60 seconds), but once a conversation gets started it should be near-instant. For generation itself I get ~25 tokens/second on a new chat (about 7 seconds for a 150-token response).
Please let me know in the thread if you have corrections.
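To sanity-check the speed numbers above on your own instance, here is a minimal sketch of the arithmetic. The function names are made up for illustration. At exactly 25 tokens/second, pure generation of 150 tokens would take 6 seconds; the roughly 7 seconds quoted above presumably includes some overhead (that is my assumption, not a measurement).

```python
def generation_seconds(response_tokens: int, tokens_per_second: float) -> float:
    """Time spent generating, ignoring prompt processing and network latency."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return response_tokens / tokens_per_second


def effective_tokens_per_second(response_tokens: int, wall_seconds: float) -> float:
    """Throughput as seen by the user, from total wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return response_tokens / wall_seconds


# 150 tokens at the quoted ~25 tokens/second
print(generation_seconds(150, 25))                     # 6.0
# 150 tokens in the quoted ~7 seconds of wall-clock time
print(round(effective_tokens_per_second(150, 7), 1))   # 21.4
```

If your own numbers are far below these, the instance you rented is probably the bottleneck rather than your SillyTavern settings.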
Getting your Kobold URL
- Go to https://cloud.vast.ai/templates/ and select Cuda:12.0.1-Devel-Ubuntu20.04
- You should land on https://cloud.vast.ai/create - set your filters as outlined below:
- Click on "RENT" for any of the offers, then visit https://cloud.vast.ai/instances/. When the light blue "CREATING..." button turns into ">_ CONNECT", click it to get the SSH command. If you haven't given vast.ai a public SSH key yet, it will show you instructions for that; just follow them. Then open a terminal on your desktop and run the SSH command. If you get stuck on this step, see "Method without SSH (not recommended)".
- Once you are in your server's command line, copy and paste this entire sequence of commands into the prompt, or, for short, run:
curl https://files.catbox.moe/8914or.sh | bash
- This installs Kobold and downloads Mixtral, which can take between 15 and 30 minutes depending on your instance's network speed.
- When your Cloudflare URL appears in your terminal, you're ready. Append
/api
to that URL to get your Kobold API URL.
- When you're done using Mixtral for the day, I recommend destroying your instance with the trashcan icon to save money.
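If you want to double-check the API URL before pasting it into SillyTavern, the steps above can be sketched in Python as follows. The hostname is a placeholder, and the `/v1/model` route is the usual KoboldAI-style endpoint; I am assuming the Kobold build installed by the script exposes it, so treat this as a sketch rather than a guarantee.

```python
import json
import urllib.request


def kobold_api_url(tunnel_url: str) -> str:
    """Append /api to the Cloudflare URL, tolerating a trailing slash."""
    return tunnel_url.rstrip("/") + "/api"


def current_model(api_url: str, timeout: float = 10.0) -> str:
    """Ask the server which model is loaded.

    Assumes a KoboldAI-style GET <api_url>/v1/model route that
    returns JSON of the form {"result": "<model name>"}.
    """
    with urllib.request.urlopen(api_url + "/v1/model", timeout=timeout) as resp:
        return json.load(resp)["result"]


# Placeholder hostname: replace with the URL printed in your terminal.
api = kobold_api_url("https://example-tunnel.trycloudflare.com/")
print(api)  # https://example-tunnel.trycloudflare.com/api

# Uncomment once your instance is up:
# print(current_model(api))
```

If `current_model` returns a name containing "mixtral", the URL is good to paste into SillyTavern.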
SillyTavern settings
- API Connections
- Advanced Formatting (simply select the "Alpaca" preset)
- AI Response Configuration: Here are some recommended settings from the MixtralForRetards rentry. The most important parts are:
- Make sure Mirostat Mode is 0 (other settings are reported to work poorly with Mixtral)
- Ban EOS Token: unticked (otherwise Mixtral will always keep generating until it hits your max response tokens, in schizo ways).
- Repetition Penalty 1, Repetition Penalty Slope 0 (higher settings are reported to cause schizo replies)
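For anyone calling the Kobold API directly instead of going through SillyTavern, here is a rough sketch of what the settings above look like as a request body. The field names (`rep_pen`, `rep_pen_slope`, `mirostat`, `use_default_badwordsids`) are the ones I believe KoboldCpp-style servers accept; whether your install honors all of them is an assumption, so check its API docs. The prompt template is a simplified Alpaca-style layout, not a copy of SillyTavern's preset.

```python
import json


def alpaca_prompt(instruction: str) -> str:
    """Simplified Alpaca-style prompt (not SillyTavern's exact preset)."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


def generate_payload(instruction: str, max_length: int = 150) -> dict:
    """Request body mirroring the recommended sampler settings.

    Field names are assumed from KoboldCpp-style APIs.
    """
    return {
        "prompt": alpaca_prompt(instruction),
        "max_length": max_length,            # response max tokens
        "rep_pen": 1.0,                      # Repetition Penalty 1
        "rep_pen_slope": 0.0,                # Repetition Penalty Slope 0
        "mirostat": 0,                       # Mirostat Mode 0 (assumed field name)
        "use_default_badwordsids": False,    # assumed: False leaves EOS unbanned
    }


payload = generate_payload("Say hello in one sentence.")
print(json.dumps(payload, indent=2))
```

The body would then be POSTed as JSON to the generate route under your Kobold API URL (typically `/v1/generate`, again an assumption about your build).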
Method without SSH (not recommended)
- Go to https://cloud.vast.ai/templates/ and EDIT Cuda:12.0.1-Devel-Ubuntu20.04
- In "On-start Script", paste:
- Click "SELECT AND SAVE"
- Rent an instance
- Periodically check the instance's logs with the button outlined below. After 15 to 30 minutes, you will see your Cloudflare URL. Append
/api
to it to get your Kobold API URL.
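Once you have the URL from the logs, you may want to wait until the API actually answers before pointing SillyTavern at it. Below is a minimal polling sketch. The `/v1/model` route and the placeholder hostname are assumptions, and `wait_until_ready` is a helper name I made up; the default 30-minute timeout just mirrors the upper install estimate above.

```python
import time
import urllib.request
from typing import Callable


def api_responds(api_url: str, timeout: float = 10.0) -> bool:
    """True if GET <api_url>/v1/model answers with HTTP 200 (assumed route)."""
    try:
        with urllib.request.urlopen(api_url + "/v1/model", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 1800.0,   # 30 minutes
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() every interval_s seconds until it succeeds or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Example with a placeholder URL (uncomment and replace the hostname):
# ready = wait_until_ready(
#     lambda: api_responds("https://example-tunnel.trycloudflare.com/api")
# )
# print("ready" if ready else "gave up")
```

The probe is passed in as a function so you can swap in a different check if your build does not expose that route.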