HUFFLEPUFF!
Currently serving:
Model: Mixtral-Instruct-3.75bit
Hardware: RTX 3090
API URL: https://parking-coupled-regarded-pi.trycloudflare.com
Mixtral is very sensitive to prompt format. Follow it to a T or you will get suboptimal output!
Story string (respect the spaces):
Chat start:
Example separator: Empty
System prompt:
Sequences for this model:
Input Sequence: [INST]
Output Sequence: [/INST]
Last Output Sequence: [/INST]
Stop Sequence: </s>
Separator: </s>
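For reference, here is a rough sketch of how the sequences above end up wrapped around a chat. The function name `build_prompt` and the `gap` parameter are made up for illustration, and the single space used as `gap` is an assumption: the exact spacing comes from the story string, so adjust it to match. This is not SillyTavern's actual code, just the general shape of the result.

```python
def build_prompt(turns, input_seq="[INST]", output_seq="[/INST]",
                 separator="</s>", gap=" "):
    """Assemble an instruct-style prompt from (user_text, reply) pairs.

    A reply of None means the model has not answered yet, so the prompt
    is left open right after the output sequence.
    `gap` is an ASSUMED single space; the real spacing depends on your settings.
    """
    parts = []
    for user_text, reply in turns:
        # Each user message is wrapped in the input and output sequences.
        parts.append(f"{input_seq}{gap}{user_text}{gap}{output_seq}")
        if reply is not None:
            # Finished replies are closed with the separator.
            parts.append(f"{gap}{reply}{separator}")
    return "".join(parts)


prompt = build_prompt([
    ("Hello there.", "Hi!"),
    ("How are you?", None),  # the model continues from here
])
print(prompt)
```

The last turn has no reply, so the prompt stops at the output sequence and the model writes from there.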
Recommended parameters for this model:
https://rentry.org/llm-settings
How to use:
1. Proxy menu:
2. Advanced menu:
Are my prompts logged?
Nope. I have no interest in your ahh ahh mistress loli smut. I store nothing, no statistics, not even prompt count.
Known issues
- Why are replies short? => llama sticks very closely to the first message, example dialogues and chat history. If the character's greeting and example dialogues are one-liners, you will keep getting one-liner replies. You can also trick the model into writing more by prefilling a paragraph and then pressing the "Continue" button.
- Here's an ideal card that will generate medium responses: https://files.catbox.moe/1ytt9w.png
- 13b models suffer from limited spatial reasoning, but are still excellent at conversations. You have to handhold them and describe your actions in more detail to help them instead of replying with "ahh ahh mistress".
- If you start a chat from scratch, you may have to wrangle the first few messages. If the model does something wrong, correct it by editing the reply. It will pick up on the corrected history and fall into the pattern. Alternatively, use cards with good and diverse example dialogues.
- Asterisks in replies are fucked? Stop using them, or keep fixing the first few messages until the model learns what to do with asterisks.
- Model keeps narrating your actions? Check your chat history: one of the replies narrated your action, and the model keeps clinging to that.
- If you think the bot isn't behaving correctly (auto-completing for you, saying gibberish, saying nothing), your setup is most likely wrong, so check it again. When in doubt, check SillyTavern's console output: the prompt should always end with the Output Sequence, followed by {{char}}:
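If you would rather not eyeball the console output, the last check can be sketched in a few lines. `prompt_tail_ok` is a hypothetical helper, not part of SillyTavern, and it assumes any whitespace (space or newline) between the Output Sequence and the character name is acceptable.

```python
def prompt_tail_ok(prompt, char_name, output_seq="[/INST]"):
    """Return True if the prompt ends with output_seq followed by '<char_name>:'.

    Whitespace between the two pieces is ignored (an assumption, since the
    exact spacing depends on your settings).
    """
    tail = prompt.rstrip()
    suffix = f"{char_name}:"
    if not tail.endswith(suffix):
        return False
    # Drop the character name and look at what comes right before it.
    before = tail[: -len(suffix)].rstrip()
    return before.endswith(output_seq)


# Paste the prompt from the console output in place of these examples.
print(prompt_tail_ok("[INST] Hi [/INST]\nAlice:", "Alice"))
print(prompt_tail_ok("[INST] Hi", "Alice"))
```

If this returns False for the prompt in your console, recheck the sequences in your setup.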
How to host your own proxy
Because I won't keep hosting forever. This is experimental and will stop any day.
On your own machine/rented VM: https://rentry.org/hostfreellamas
On Google Colab for free:
- Import this into Google Colab: https://files.catbox.moe/77vood.ipynb
- Frostwind-10.7B
- 11k ctx, alpaca prompt format
- temp 0.75, min_p 0.04, rep_pen 1.03
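In case you wonder what min_p 0.04 does: as commonly described, min-p sampling drops every token whose probability is below min_p times the probability of the most likely token, then renormalizes the rest. Below is a minimal sketch of that idea on a toy distribution. The function name and the example numbers are made up, and real backends may differ in details such as whether temperature is applied before or after this step.

```python
def min_p_filter(probs, min_p=0.04):
    """Zero out tokens below min_p * max(probs) and renormalize the rest."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # never zero: the top token always survives
    return [p / total for p in kept]


# Toy next-token distribution (illustrative numbers only).
probs = [0.70, 0.20, 0.08, 0.02]
filtered = min_p_filter(probs)
print(filtered)
```

Here the cutoff is 0.04 * 0.70 = 0.028, so only the last token (0.02) is removed and the other three are rescaled to sum to 1.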