The choice is between running a dense model or a mixture-of-experts (MoE) model. Dense models use all of their parameters to generate each token; MoE models use only a fraction. A MoE model is typically faster to run than a dense model of comparable intelligence, but it is also larger. MoE models are currently better suited for home PCs because they can be run reasonably fast even when loaded into RAM or split between RAM and VRAM. The memory a model uses is its file size plus 2-4 GB, depending on your context window. Below is a list of model links; each model comes in several quantization options, for example Q4_K_M. A lower number means a smaller file in exchange for lower quality (see the example download command after the list).
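As a rough worked example: a Q4_K_S quant of a 12B model is about a 7 GB file, so budget roughly 9-11 GB of combined RAM+VRAM once the context is allocated (the exact overhead varies with backend and context size).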
Picking the right model for rp/writing:
From best to worst (excluding DeepSeek, Kimi, and the full-size GLM):
1- https://huggingface.co/unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF #MoE (must fit into RAM+VRAM). Use at least Q2_K.
2- https://huggingface.co/unsloth/GLM-4.5-Air-GGUF #MoE (must fit into RAM+VRAM). Use at least Q2_K.
Each of the following three has its own strengths:
3- Better prose, but safetyslopped: https://huggingface.co/lmstudio-community/gemma-3-27B-it-qat-GGUF #dense model (must fit into VRAM).
4- More explicit/mature: https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF #dense model (must fit into VRAM). Use at least Q4_K_S.
5- Smarter: https://huggingface.co/unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF #MoE (must fit into RAM+VRAM). Use at least IQ4_XS.
6- https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF #dense model (must fit into VRAM). Last resort; epic cope.
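To download a single quant instead of the whole repo, the huggingface-cli tool works. A sketch (the repo and filename pattern here are just an example; match them to the model and quant you picked):

  pip install -U "huggingface_hub[cli]"
  huggingface-cli download bartowski/Rocinante-12B-v1.1-GGUF --include "*Q4_K_S*" --local-dir .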
Run the model with kobold.cpp as the backend, then use SillyTavern as the frontend for rp and Mikupad for writing.
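A typical kobold.cpp launch looks something like the sketch below; flag names can change between releases, so check --help. --gpulayers sets how many layers are offloaded to VRAM (use a large number to offload everything, or lower it for a RAM/VRAM split), and --contextsize sets the context window that drives the extra 2-4 GB of memory use mentioned above:

  python koboldcpp.py --model Rocinante-12B-v1.1-Q4_K_S.gguf --contextsize 8192 --gpulayers 999 --usecublas

Once it is running, point SillyTavern or Mikupad at the local API it exposes (http://localhost:5001 by default).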