HUFFLEPUFF!
Currently serving:
Model: Mixtral-Instruct-3.75bit
Hardware: RTX 3090
API URL: https://parking-coupled-regarded-pi.trycloudflare.com
Mixtral is very sensitive to prompt format. Follow it to the T or get suboptimal output!
Story string (respect the spaces):
Chat start:
Example separator: Empty
System prompt:
Sequences for this model:
Input Sequence: [INST]
Output Sequence: [/INST]
Last Output Sequence: [/INST]
Stop Sequence: </s>
Separator: </s>
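As a rough illustration of how the sequences above wrap each turn, here is a minimal Python sketch. The helper name `build_prompt`, the single spaces around the message text, and the exact placement of the character name are assumptions made for illustration. The real story string and spacing come from your frontend's settings, so treat this as a way to read the console output, not as the exact template.

```python
# Sequences as listed above.
INPUT_SEQ = "[INST]"
OUTPUT_SEQ = "[/INST]"
SEPARATOR = "</s>"


def build_prompt(history, new_user_message, char_name):
    """Assemble a prompt from finished (user, bot) turns plus a new user message.

    Hypothetical helper: spacing and name placement are assumptions.
    """
    parts = []
    for user_text, bot_text in history:
        # A finished exchange: user text between the input and output
        # sequences, then the bot reply closed by the separator.
        parts.append(
            f"{INPUT_SEQ} {user_text} {OUTPUT_SEQ} {char_name}: {bot_text}{SEPARATOR}"
        )
    # The open turn ends with the output sequence followed by the character
    # name, so the model continues as that character.
    parts.append(f"{INPUT_SEQ} {new_user_message} {OUTPUT_SEQ} {char_name}:")
    return "".join(parts)


prompt = build_prompt([("Hi", "Hello!")], "How are you?", "Aria")
print(prompt)
```

Under these assumptions the printed prompt ends with `[/INST] Aria:`, which is the shape to look for when something misbehaves.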
Recommended parameters for this model:
https://rentry.org/llm-settings
How to use:
1. Proxy menu:
2. Advanced menu:
Are my prompts logged?
Nope. I have no interest in your ahh ahh mistress loli smut. I store nothing, no statistics, not even prompt count.
Known issues
- Small models suffer from limited spatial reasoning, but are still excellent at conversations. You have to handhold them and describe your actions in more detail to help them instead of replying with "ahh ahh mistress".
- If you start a chat from scratch, you may have to wrangle the first few messages. If the model does something wrong, correct it by editing the reply. The model will learn and fall into the pattern. Alternatively, use cards with good and diverse example dialogues.
- Asterisks in replies are fucked? Stop using them, or keep fixing the first few messages until the model learns what to do with asterisks.
- Model keeps narrating your actions? Check your chat history: one of the replies narrated your action, and the model keeps clinging to that.
- If you think the bot isn't behaving correctly (auto-completing for you, saying gibberish, or saying nothing), your setup is most likely wrong, so check it again. When in doubt, check SillyTavern's console output: the prompt should always end with the Output Sequence, followed by {{char}}:
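The last check can be done by eye, but if you copy the prompt out of the console, a small script can do it for you. The sketch below is only an illustration: the function name `prompt_ends_correctly` is made up, and it assumes any amount of whitespace (including a newline) may sit between the Output Sequence and the character name.

```python
import re


def prompt_ends_correctly(prompt, char_name, output_seq="[/INST]"):
    """Return True if the prompt ends with the output sequence, then '<char_name>:'.

    Hypothetical helper; whitespace between the two parts is tolerated.
    """
    pattern = re.escape(output_seq) + r"\s*" + re.escape(char_name) + r":\s*$"
    return re.search(pattern, prompt) is not None


# A prompt copied from the console (shortened here) and the card's name.
print(prompt_ends_correctly("[INST] Hi [/INST] Aria:", "Aria"))
print(prompt_ends_correctly("[INST] Hi [/INST]", "Aria"))
```

If the check fails, the instruct sequences or the name settings in your frontend are the first place to look.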
How to host your own proxy
On your own machine/rented VM: https://rentry.org/hostfreellamas
On Google Colab for free:
https://colab.research.google.com/github/LostRuins/koboldcpp/blob/concedo/colab.ipynb
SOTA model: https://huggingface.co/InferenceIllusionist/mini-magnum-12b-v1.1-iMat-GGUF/resolve/main/mini-magnum-12b-v1.1-iMat-IQ4_NL.gguf
- 32k context, Mistral prompt format. Remember to enable the BOS token.