Local LLM Testing (for AMD GPUs)
Tested on an AMD RX 6600 with KoboldCpp and its ROCm-supported fork. These are mostly just my observations. I tested using Sukino's character card, Sarah Ashworth, with presets from Sphiratrioth and Sukino. Context size is 4096 unless specified otherwise.
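For reference, a typical KoboldCpp launch on this setup might look like the sketch below. The flag names (`--usecublas`, `--usevulkan`, `--gpulayers`, `--contextsize`) are KoboldCpp CLI options; the model filename is just an example, not a specific file from these tests:

```shell
# Launch the ROCm fork with full GPU offload and a 4096-token context.
# --usecublas selects the hipBLAS/ROCm backend in the ROCm fork;
# swap in --usevulkan for the stock Vulkan backend instead.
python koboldcpp.py --model Impish_Mind_8B.i1-Q4_K_M.gguf \
  --usecublas --gpulayers 99 --contextsize 4096
```

`--gpulayers 99` simply requests offloading every layer to the GPU; KoboldCpp clamps it to the model's actual layer count.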
✅Impish_Mind_8B.i1-Q4_K_M
- ROCm required?
- Llama-3-Instruct
- 8k context size benchmark with full offload went OK.
- Follows the character's style well at 8k context, but gets slower as the context fills up.
✅Forgotten-Abomination-8B-v4.0.i1-Q4_K_M
- ROCm recommended.
- With 8k context, it's only marginally faster than the 12B model.
✅Forgotten-Abomination-12B-v4.0.i1-IQ3_XXS
- ROCm required.
- Pretty good and fast.
- It just goes with the flow and doesn't pay much attention to the character card.
- Benchmarked at 4k context on ROCm; at 8k it gets slow.
☑️patricide-12B-Unslop-Mell.i1-Q4_K_S
- Pretty good
- Slow on Vulkan
- Faster on ROCm
dans-personalityengine-v1.1.0-12b-i1-IQ3_XXS
- ROCm required.
- Fast.
- Creative.
PocketDoc_Dans-SakuraKaze-V1.0.0-12b-IQ3_XXS
- ROCm required.
- Fast.
- Creative.
- It keeps falling into the same pattern of splitting sentences over and over. The Conversation preset seems to work better.
- Overall not bad.
L3-8B-Lunar-Stheno.i1-Q4_K_M
- Vulkan recommended.
- Fastest of the variants; faster on Vulkan.
- Seemed uncreative at first, but further testing showed it's very creative, though inconsistent.
- "Excellent. Confusion suits you." - OK, I really like this one.
gemma-2-9b-it-abliterated-IQ4_XS
- ROCm required.
- Fast on ROCm, slow on Vulkan. Gets slower as the context fills up.
- Maybe not the most creative, but it keeps the character's style even in longer conversations.
- Doesn't like to say mean stuff.
Gemma-2-Ataraxy-9B-IQ4_XS
- ROCm required.
- Dumb and slow on Vulkan; smart and fast on ROCm with MMQ and FA.
- Short responses. Gets confused at the start and isn't very creative, though it started to get creative after the second message.
- Further testing: it's pretty boring, honestly.
MN-12B-Mag-Mell-R1-Q4_K_S
- ROCm required.
- ChatML
- This one might be as good as the IQ4_XS quant. Maybe a little too chatty?
- Focuses more on conversation than world details; it fits the details into the conversation instead.
- Not too fast on ROCm, but not too slow either.
MN-12B-Mag-Mell-R1.IQ4_XS
- ROCm recommended.
- ChatML
- Very slow on both Vulkan and ROCm, but very creative. It's a little faster with MMQ and Flash Attention enabled on ROCm.
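Several notes here mention MMQ and Flash Attention on ROCm. In KoboldCpp's CLI that corresponds roughly to the flags below; the model path is a placeholder, and exact flag behavior may differ between KoboldCpp versions:

```shell
# ROCm fork with MMQ kernels and Flash Attention enabled.
# 'mmq' is passed as an argument to --usecublas, and
# --flashattention toggles FA, which helps as the context fills up.
python koboldcpp.py --model MN-12B-Mag-Mell-R1.IQ4_XS.gguf \
  --usecublas mmq --flashattention --gpulayers 99 --contextsize 8192
```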
L3-8B-Stheno-v3.2-Q4_K_M-imat
- Works on Vulkan.
- Coherent; I think I prefer this over Lunaris. Can hold a conversation, but doesn't build a world like Mag-Mell or SakuraKaze.
etherealaurora-12b-v2-i1-IQ3_XXS
- ROCm required.
- I like this one. Slow but smart and creative.
gemma-2-Ifable-9B-Uncensored-DeLMAT.i1-Q4_K_S
- ROCm; needs more testing with Vulkan.
- Slow.
- Creative.
- It talks on my behalf. Doesn't seem smart.
- Much better with Sphiratrioth's GM and RP1.3 settings, with Sukino's Context and Instruct.
- It's not that slow in actual usage.
Not Recommended
❌Silicon-Maid-7B.i1-Q4_K_S
- Very fast. Might work with a lot of tuning, but it's pretty dry overall; it stayed dry even after trying different settings.
❌gemma-2-9b-it-IQ4_XS
- The abliterated version is better.
❌gemma-3-12b-it-abliterated.i1-IQ3_XXS
- Slow and not creative. Tested on ROCm with MMQ and FA.
❌NemoMix-Unleashed-12B-Q4_K_S
- It forgot the context by the first message. Slow, and also slow with Vulkan.
❌Phi-lthy4-IQ3_XS
- Very stupid. Maybe a Vulkan issue? No, it's stupid on ROCm too.
❌MN-12B-Mag-Mell-R1-IQ3_XXS-imat
- Much faster on ROCm, but it also feels dumber and less creative. Definitely dumber; the IQ4_XS quant is much more creative.
❌NemoMix-Unleashed-12B-IQ4_XS
- Unusably slow on Vulkan. Barely usable on ROCm. Creative.
❌Kunoichi-7B.i1-Q5_K_M
- Can't follow instructions. Talks on user's behalf. Fast.
❌Kunoichi-DPO-v2-7B.i1-Q4_K_M
- Can't follow instructions. Talks on user's behalf. Fast.
❌L3-8B-Lunaris-v1.i1-Q4_K_M
- Coherent, but responses are shorter than I'd like. It can be creative at times, though most of the time it isn't. Not bad; similar to Stheno. Works on Vulkan.