splitcloverqr + bloatmaxx

by meatrocket meatyrocket@proton.me discord: meat.rocket

| archive | presets | prompts |


cut down more than 2x the gen time of CoT with this QR preset


latest updates

(27/6/24): New Preset 1 & Preset 2 update (CoTmaxx v3.0 & Bloatmaxx 2.0)

So I've been experimenting with this dual setup for a couple of days now, and I think I nailed the format for this dual setup, in order to achieve the best balance between overall quality, length, pacing, descriptive emphasis, and dialogue :

First preset only contains a CoT prompt to have gpt-4o create a <thinking> box with free reign, only letting it describe how {{char}} would respond & nothing else.

  • Thankfully gpt-4o can be used for this, which to their credit, if you give it LESS instructions and giving it full reign with only a few restrictions, it does a way better job at the task than bloating the CoT with additional instructions & parameters)

Second preset contains everything else, as the CoT & tasks are mostly isolated from chat history and primarily applied for the immediate response.
This means this should work with both Claude & GPT simultaneously. Hooray for GPT.


setup
  1. download the QR preset by splitclover (setup at the bottom of this rentry)
  2. download Preset 1 (CoT v3.0) (27/06/24)
  3. download Preset 2 (bloatmaxx 2.0) (27/06/24)
  4. set Preset 1 to Claude 3.5 Sonnet
  5. set Preset 2 to Claude 3.0 Opus
older version
  1. Preset 1 (CoT v2.0) (24/06/24) | Preset 2 (simple v1.2) (23/06/24)

wtf is this?

a QR preset with bloatmaxxed CoT which uses Claude 3.5 Sonnet (or gpt-4o if you are a degenerate) to generate the CoT & prefill, and finishes it with either 3.0 Opus for the rest of the response

  • Preset 1 contains CoT, if you want to change the prose you can enable/add toggles here
  • Preset 2 contains just the jailbreak & simple preset and if you want to bloatmaxx HTML & use other gimmick prompts

the benefits?

primarily cuts down on time generating a bulky CoT-like prompt & sends it as part of the prompt for completion with a better RP model

may also have additional benefits CoT has such as reduced repetition, improved instruction handling, spatial awareness and handling complex character definitions.

which model(s) do I use?

for the first preset, if you value speed, choose the faster model you have for the CoT task:

  • Claude 3.5 Sonnet is ~30-40 tk/s, a great all rounder and can very effectively handle all kinds of scenarios.
  • mistral-large & 8x22b are both decently capable and are fast as well, if you have access to these models they are quite good for this task.
  • gpt-4o is ~80-95 tk/s, albeit monstrously fast and quite capable, does falter a bit in complex scenarios or involving nsfw. It isn't very capable of handling RP.

for the second preset, just choose the smartest model you have available (opus/sonnet).

how much time does each model take?

assuming the CoT = ~600 tokens, and the 2nd preset Prompt = ~1000 tokens (max, with HTML bloat) for a total of about 1600 tokens per full reply:

  • mistral-8x22b > 3.5 Sonnet (~8 sec. CoT + ~25 sec. = 33 seconds total gen)
  • mistral-8x22b > 3.0 Opus (~8 sec. CoT + ~50 sec. = 58 seconds total gen)
  • mistral-large > 3.5 Sonnet (~11 sec. CoT + ~25 sec. = 36 seconds total gen)
  • mistral-large > 3.0 Opus (~11 sec. CoT + ~50 sec. = 61 seconds total gen)
  • 3.5 Sonnet > 3.0 Opus (~13 sec. CoT + ~50 sec. = 63 seconds total gen)

for mainly SFW scenarios or for speed, i can also recommend the following::

  • mixtral 8x7b > 3.5 Sonnet (~2 sec. CoT + ~25 sec. = 27 seconds total gen) (via groq)
  • 3.0 Haiku > 3.5 Sonnet (~8 sec. CoT + ~25 sec. = 31 seconds total gen)
  • gpt-4o > 3.5 Sonnet (~7 sec. CoT + ~25 sec. = 32 seconds total gen)
  • gpt-4o > 3.0 Opus (~7 sec. CoT + ~50 sec. = 57 seconds total gen)

without dual preset jb, this is how long claude would take:

  • 3.5 Sonnet (~13 sec. CoT + ~25 sec. = 38 sec. total gen) (baseline CoT & prompt on same model)
  • 3.0 Opus (~30 sec. CoT + ~50 sec. = 80 sec. total gen) (baseline CoT & prompt on same model)

how to set up the QR-preset
  1. download the QR-preset, import it into the Quick Reply extensions and enable it as global chat

  1. download both preset json files and import them into your prompt presets
  2. make sure preset 1 is connected to your GPT-4o proxy and saved on that model & preset
  3. do the same with preset 2 but with your Claude 3.5 proxy of choice
  4. edit the presets to your output desire
  5. now click the :gear: icon in the QR menu, and set the api, model, and proxy
  6. also change the names of the presets in the QR menu to preset 1 and preset 2 respectively
  7. to generate a reply click the [djb] Send button, if all goes well, you will see a CoT pop up first, before it disappears and Sonnet takes over with the response

special thanks to:

  • splitclover (for the QR preset)
  • /aicg/ anons & jbmakies for suggestions + ideas
  • sturdycord for their support & proxies
Edit
Pub: 22 Jun 2024 17:03 UTC
Edit: 27 Jun 2024 04:06 UTC
Views: 9182