Test Objective
- I tested a theory I had: we can loosely measure how good a finetune is at following RP instructions by generating without any context and seeing whether it produces a full, well-formed prompt on its own.
- The hugely popular 13B 'MythoMax' model leans towards 'SillyTavern' prompt styles more strongly than the alternatives that came before it, and notably more than some of the models being tested/popularized right now.
Key Findings:
- MythoMax has a unique trait: it produces valid SillyTavern-style prompts more reliably than the others, which are more open-ended and inconsistent formatting-wise, while Mytho expects ST prompting conventions pretty consistently (e.g. scenario, personality, the same system and human prompts). But certain parts of the prompting, like the "Never respond as X" line we have in the SillyTavern prompts, don't make a lot of sense. In a future finetune, you might want to replace those with "Always respond as the character you are portraying" or something with a stronger positive bias, and change ST itself to reflect this (see the sketch after this list). This is anecdotal, but I feel like LLMs do a better job of following "DO" instructions than "DON'T" ones, as a general trend.
- Some models labeled or 'marketed' as being for RP don't naturally produce RP responses. Explicit prompting can help, but the results might not always have the desired format, with 'MLewdBoros' standing out in this respect (it much prefers to write C code). That doesn't mean these tunes are inherently bad, but the fact that they don't expect to follow directions in the style we've provided should be considered. Otherwise, we are merging a bunch of models with distinct instruction styles and hoping the result learned to follow prompts correctly based on formatting that isn't even trained into any of them (wtf is `#### {{char}}:` lmao).
- For better results, finetuners probably need to find a better balance between 'consistent formatting' and 'creative dialogue', or at least put more emphasis on consistent instructions.
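To illustrate the positive-bias suggestion, here's a rough sketch of an ST-style prompt with the "Never respond as X" line swapped out. The character, field contents, and exact wording are hypothetical, not pulled from any of the tests below:

```
Enter RP mode. You shall reply to {{user}} while staying in character.
Always respond as the character you are portraying.
{{char}}'s Personality: dry, witty, fiercely protective of her crew
Scenario: {{user}} boards the ship for the first time and meets {{char}}.
```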
User Recommendations:
- For better interaction with MythoMax models, use the tokens `<<SYSTEM>>` and `<<HUMAN>>` to separate 'bot response' from 'human input'. The model doesn't naturally output 'Alpaca' formatting, despite what MythoMax's page on HF says.
- The token `<<AIBOT>>` sometimes denotes narration, likely influenced by parts of the merge like 'Kimiko' or 'LimaRP'. A sketch of this token layout follows the list.
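As a concrete sketch of that layout (the dialogue is made up; only the three tokens come from the model's actual outputs):

```
<<SYSTEM>>
Enter RP mode. Always respond as the character you are portraying.
<<HUMAN>>
Hello? Is anyone there?
<<AIBOT>>
*The lights flicker on as a voice crackles over the intercom.*
```

Note the asterisk narration after `<<AIBOT>>`, matching the 'Kimiko'/'LimaRP' influence mentioned above.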
Finetuner Recommendations:
- Embrace the 'SillyTavern' prompt style, and make slight adjustments for better results.
- Avoid being too rigid with pre-existing presets. Consider creating a new standard based on the ones we have, or refining the current ones.
- Normalizing or adapting the finetune data before training might help significantly.
- Maybe a 'SillyTavern formatting' LoRA would be more adaptable? Or you could 'calibrate' a finetune on top of your merge so that it follows ST instruction style (a rough sketch of the LoRA idea is below)?
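For the LoRA idea, here's a minimal sketch of what that could look like with Hugging Face transformers + peft. Everything here is an assumption rather than a tested recipe: the dataset file st_formatted.jsonl (ST-style prompts normalized to one consistent template), the hyperparameters, and the target modules are all placeholders:

```python
# Minimal sketch of a 'SillyTavern formatting' LoRA pass (untested; all
# hyperparameters and file names below are hypothetical placeholders).
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "Gryphe/MythoMax-L2-13b"  # or your own merge
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA models ship without a pad token

model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
# Small adapter on the attention projections only; the goal is formatting
# consistency, not new knowledge, so keep the rank low.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# st_formatted.jsonl (hypothetical): one {"text": "<ST-style prompt + reply>"}
# per line, all normalized to a single consistent template before training.
data = load_dataset("json", data_files="st_formatted.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
                remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments("st-format-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=1e-4, fp16=True),
    train_dataset=data,
    # mlm=False pads batches and copies input_ids into labels for causal LM loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

Merge the resulting adapter back in (or keep it as a toggleable LoRA) and rerun the no-context test above to see whether `<<SYSTEM>>` shows up more reliably.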
Miscellaneous:
- Tested using llama.cpp version llama-b1226.
MythoMax 4_K_M
Using `main.exe -m mythomax-l2-13b.Q4_K_M.gguf --color -c 4096 --keep -1 -n -1 -t 1 -b 512 -i --gpu_layers 43`
At Temp 1.4, attempts often resulted in the starter tokens not being `<<SYSTEM>>`, leading to unhinged nonsense outputs (usually code) about half of the time. I let this happen in case a recognizable system prompt following `<<SYSTEM>>` could emerge, but it never did in my limited testing.
Example #1 [Temp 0.2]:
Example #2 [Temp 0.2]:
Example #3 [Temp 0.8]:
Example #4 [Temp 0.8]:
Example #5 [Temp 1.4]:
Example #6 [Temp 1.4]:
Mythalion 4_K_M
Using `main.exe -m mythalion-13b.Q4_K_M.gguf --color -c 4096 --keep -1 -n -1 -t 1 -b 512 -i --gpu_layers 43`
Example #1 [Temp 0.2]:
Example #2 [Temp 0.2]:
Example #3 [Temp 0.8]:
Example #4 [Temp 0.8]:
Example #5 [Temp 1.4]:
Example #6 [Temp 1.4]:
MLewdBoros 4_K_S
I could not get it to write a roleplay by itself without externally provided context. It always produces some variation of a program, usually C / C#. No, seriously. So I tried giving it some decent starter tokens:
This didn't help much. The writing was still quite dry, and it even mentioned 'her programming' later on (GPT RLHF seeping through...). It certainly doesn't adhere to SillyTavern prompting styles.