HUFFLEPUFF!


Currently serving:

Model: Behemoth-v1.1 123B
Hardware: HGX H100s

API URL (Aphrodite): https://rncsz-203-127-135-238.a.free.pinggy.link

Can't use cloudflare, and other providers keep hitting limits or randomly dying. ST calls the endpoint 10 times to tokenize each prompt. Fix your shit STdevs.


Master Import template for this model:
https://files.catbox.moe/f6cpwd.json


Recommended parameters for this model:

1
2
3
4
5
Context: [50k]
New Tokens: [1000] - use "Continue" button if the reply gets cut off
Temperature: [1.2] - slide this up as your chat gets longer for more creativity, low context + high temp = gibberish
Min_p: [0.1] - minimum chance for a token to be selected compared to the best token, higher = more accurate but more slop
Do_sample: [On]

https://rentry.org/llm-settings


How to use with SillyTavern:

1. Proxy menu:

proxy-menu

2. Advanced menu:

classic


Known issues
  • Small models suffer from limited spatial reasoning, but are still excellent at conversations. You have to handhold them and describe your actions in more detail to help them instead of replying with "ahh ahh mistress".
  • Asterisks in replies are fucked? Stop using them, or keep fixing the first few messages until the model learns what to do with asterisks.
  • Model keeps narrating your actions? Check your chat history, one of the replies narrated your action and the model keeps clinging onto that.
  • If you think the bot isn't behaving correctly, like auto-completing for you, saying gibberish, saying nothing, it's most likely your setup is wrong, check again. When in doubt, check SillyTavern's console output, the prompt should always end with Output Sequence, followed by {{char}}:
    Example ST Prompt
How to host your own proxy

On your own machine/rented VM: https://rentry.org/hostfreellamas

On Google Colab for free:
https://colab.research.google.com/github/LostRuins/koboldcpp/blob/concedo/colab.ipynb
SOTA model: https://huggingface.co/mradermacher/Rocinante-12B-v1.1-i1-GGUF/resolve/main/Rocinante-12B-v1.1.i1-Q4_K_S.gguf - 32k context, Mistral format

SillyTavern settings for this model:
Hyperparameters (temperature, etc.): https://files.catbox.moe/fnvgql.json
Template for Master Import: https://files.catbox.moe/f6cpwd.json

Contact

sandwich4093@proton.me

Edit
Pub: 27 Jul 2023 19:44 UTC
Edit: 27 Oct 2024 19:54 UTC
Views: 46685