How to Mixtral for beginners:
Grab the latest KoboldCpp release:
https://github.com/LostRuins/koboldcpp/releases/
Grab mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf:
https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/tree/main
Common Mixtral pitfalls:
Using quants below 5-bit (at least for now; apparently quantization behaves differently for MoE models)
Using bad quants with wrong rope settings:
Pass --ropeconfig 1.0 1000000 to force the right rope settings in case the quant you're using doesn't have them built in.
Bad formatting if following its official format:
https://desuarchive.org/g/thread/97876734/#97880774
Using mirostat: seen at least 3 reports that it causes repetition and makes Mixtral noticeably dumber.
Prompt processing is not optimized for MoE yet, so for now you might have better luck using --noblas or setting --blasbatchsize -1 when using Mixtral.
Apparently SillyTavern has multiple formatting issues, but the main one is that the card's example messages need to use the correct formatting:
https://desuarchive.org/g/thread/97876734/#97880774
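
The launch flags mentioned above can be put together roughly like this. It's only a sketch: it assumes you run KoboldCpp from source as `python koboldcpp.py` (the release binary has a different name) and that the GGUF sits in the current directory. The helper name `build_koboldcpp_args` is made up for illustration; only --model, --ropeconfig, --noblas and --blasbatchsize are actual KoboldCpp flags.

```python
import shlex


def build_koboldcpp_args(model_path, disable_blas=False):
    """Build an argv list with the Mixtral-related flags from this post.

    Hypothetical helper, not part of KoboldCpp itself.
    """
    args = [
        "python", "koboldcpp.py",           # assumption: running from source
        "--model", model_path,
        "--ropeconfig", "1.0", "1000000",   # rope scale 1.0, rope base 1000000
    ]
    if disable_blas:
        # Option 1 from the post: turn BLAS prompt processing off entirely.
        args.append("--noblas")
    else:
        # Option 2 from the post: set the BLAS batch size to -1.
        args += ["--blasbatchsize", "-1"]
    return args


cmd = build_koboldcpp_args("mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf")
# Print a copy-pasteable command line instead of launching anything.
print(shlex.join(cmd))
```

Paste the printed line into a terminal, or hand `cmd` to `subprocess.run` if you'd rather launch it from Python. Use one of --noblas or --blasbatchsize -1, not both, and try the other if prompt processing is still slow.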