Local LLM Testing (for AMD GPUs)
Tested on an AMD RX 6600 with KoboldCpp and its ROCm-supported fork. These are mostly just my observations. I tested using Sukino's character card, Sarah Ashworth, with presets from Sphiratrioth and Sukino. Context size is 4096 unless specified otherwise.
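For reference, a typical KoboldCpp launch on this setup might look like the sketch below. The flag names (`--usecublas`, `--usevulkan`, `--gpulayers`, `--contextsize`) are KoboldCpp CLI options; the model filename is just an example, not a specific file from these tests:

```shell
# Launch the ROCm fork with full GPU offload and a 4096-token context.
# --usecublas selects the hipBLAS/ROCm backend in the ROCm fork;
# swap in --usevulkan for the stock Vulkan backend instead.
python koboldcpp.py --model Impish_Mind_8B.i1-Q4_K_M.gguf \
  --usecublas --gpulayers 99 --contextsize 4096
```

`--gpulayers 99` simply requests offloading every layer to the GPU; KoboldCpp clamps it to the model's actual layer count.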
✅Impish_Mind_8B.i1-Q4_K_M
- ROCm required?
- Llama-3-Instruct
- 8k context size benchmark with full offload went OK.
- Follows the character's style well at 8k context, but gets slower as the context fills up.
✅Forgotten-Abomination-8B-v4.0.i1-Q4_K_M
- ROCm recommended.
- With 8k context, it's only marginally faster than the 12B model.
✅Forgotten-Abomination-12B-v4.0.i1-IQ3_XXS
- ROCm required.
- Pretty good and fast.
- It just goes with the flow and doesn't pay much attention to the character card.
- Benchmarked at 4k context on ROCm; at 8k it gets slow.
☑️patricide-12B-Unslop-Mell.i1-Q4_K_S
- Pretty good
- Slow on Vulkan
- Faster on ROCm
dans-personalityengine-v1.1.0-12b-i1-IQ3_XXS
- ROCm required.
- Fast.
- Creative.
PocketDoc_Dans-SakuraKaze-V1.0.0-12b-IQ3_XXS
- ROCm required.
- Fast.
- Creative.
- It keeps falling into the same pattern of splitting sentences over and over. The Conversation preset seems to work better.
- Overall not bad.
L3-8B-Lunar-Stheno.i1-Q4_K_M
- Vulkan recommended.
- Fastest of the variants; faster on Vulkan.
- Seemed uncreative at first, but further testing showed it's very creative, though inconsistent.
- "Excellent. Confusion suits you." - OK, I really like this one.
gemma-2-9b-it-abliterated-IQ4_XS
- ROCm required.
- Fast on ROCm, slow on Vulkan. Gets slower as the context fills up.
- Maybe not the most creative, but it keeps the character's style even in longer conversations.
- Doesn't like to say mean stuff.
Gemma-2-Ataraxy-9B-IQ4_XS
- ROCm required.
- Dumb and slow on Vulkan; smart and fast on ROCm with MMQ and FA.
- Short responses. Gets confused at the start and isn't very creative, though it started to get creative after the second message.
- Further testing: it's pretty boring, honestly.
MN-12B-Mag-Mell-R1-Q4_K_S
- ROCm required.
- ChatML
- This one might be as good as the IQ4_XS quant. Maybe a little too chatty?
- Focuses more on conversation than world details; it fits the details into the conversation instead.
- Not too fast on ROCm, but not too slow either.
MN-12B-Mag-Mell-R1.IQ4_XS
- ROCm recommended.
- ChatML
- Very slow on both Vulkan and ROCm, but very creative. It's a little faster with MMQ and Flash Attention enabled on ROCm.
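Several notes here mention MMQ and Flash Attention on ROCm. In KoboldCpp's CLI that corresponds roughly to the flags below; the model path is a placeholder, and exact flag behavior may differ between KoboldCpp versions:

```shell
# ROCm fork with MMQ kernels and Flash Attention enabled.
# 'mmq' is passed as an argument to --usecublas, and
# --flashattention toggles FA, which helps as the context fills up.
python koboldcpp.py --model MN-12B-Mag-Mell-R1.IQ4_XS.gguf \
  --usecublas mmq --flashattention --gpulayers 99 --contextsize 8192
```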
L3-8B-Stheno-v3.2-Q4_K_M-imat
- Works on Vulkan.
- Coherent; I think I prefer this over Lunaris. Can hold a conversation, but doesn't build a world like Mag-Mell or SakuraKaze.
etherealaurora-12b-v2-i1-IQ3_XXS
- ROCm required.
- I like this one. Slow but smart and creative.
gemma-2-Ifable-9B-Uncensored-DeLMAT.i1-Q4_K_S
- ROCm; needs more testing with Vulkan.
- Slow.
- Creative.
- It talks on my behalf. Doesn't seem smart.
- Much better with Sphiratrioth's GM and RP1.3 settings, with Sukino's Context and Instruct.
- It's not that slow in actual usage.
Not Recommended
❌Silicon-Maid-7B.i1-Q4_K_S
- Very fast. Might work with a lot of tuning, but it's pretty dry overall; it stayed dry even after trying different settings.
❌gemma-2-9b-it-IQ4_XS
- The abliterated version is better.
❌gemma-3-12b-it-abliterated.i1-IQ3_XXS
- Slow and not creative. Tested on ROCm with MMQ and FA.
❌NemoMix-Unleashed-12B-Q4_K_S
- It forgot the context by the first message. Slow, and also slow with Vulkan.
❌Phi-lthy4-IQ3_XS
- Very stupid. Maybe a Vulkan issue? No, it's stupid on ROCm too.
❌MN-12B-Mag-Mell-R1-IQ3_XXS-imat
- Much faster on ROCm, but it also feels dumber and less creative. Definitely dumber; the IQ4_XS quant is much more creative.
❌NemoMix-Unleashed-12B-IQ4_XS
- Unusably slow on Vulkan. Barely usable on ROCm. Creative.
❌Kunoichi-7B.i1-Q5_K_M
- Can't follow instructions. Talks on user's behalf. Fast.
❌Kunoichi-DPO-v2-7B.i1-Q4_K_M
- Can't follow instructions. Talks on user's behalf. Fast.
❌L3-8B-Lunaris-v1.i1-Q4_K_M
- Coherent, but responses are shorter than I'd like. It can be creative at times, though most of the time it isn't. Not bad; similar to Stheno. Works on Vulkan.