/lmg/ recommended models

The number of gigabytes in parentheses is the minimum amount of memory (VRAM + RAM) required to run the model at a reasonable quant. For smaller models this is widely considered to be at least Q4_K_M. Large models can be quanted down to around 1 bit while remaining coherent, but the exact impact on their performance, especially on long-context tasks, is still unexplored. Using a smaller quant makes the model dumber; having more memory than the minimum lets you fit more context.
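As a rough sketch of where those numbers come from: weights alone take roughly parameters × bits-per-weight / 8 bytes. The function and the ~4.5 bits/weight figure for Q4_K_M below are illustrative assumptions, and the estimate ignores KV cache and runtime overhead:

```python
def quant_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight-only memory estimate in GB. Ignores KV cache, activations,
    and runtime overhead, so treat it as a loose lower bound."""
    return params_billion * bits_per_weight / 8

# A 12B model (e.g. Nemo) at ~4.5 bits/weight (roughly Q4_K_M):
weights_gb = quant_footprint_gb(12, 4.5)  # ~6.75 GB for the weights alone
```

The gap between that figure and the listed minimum is what the context (KV cache) and overhead eat.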

Ideally the entire model fits in your VRAM. You can cope by loading part of the model into RAM instead, but it will be much slower.

MoE (mixture of experts) models don't use all of their parameters for each token, so they are much faster than a dense model of the same size. Because of this, loading part of the model into RAM instead of VRAM is viable. All of the large models listed here are MoE. If you're using llama-server from llama.cpp and you're short on VRAM, it will automatically load the model in the most optimal way for your hardware; the only thing you need to set yourself is your preferred context size, since it otherwise defaults to 4096.
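Why the MoE speedup works, as a toy back-of-the-envelope model: token generation is roughly memory-bandwidth-bound, so what matters is the bytes of *active* parameters read per token, not the total size. All numbers below are made up for illustration:

```python
def tokens_per_sec(active_params_billion: float, bits_per_weight: float,
                   bandwidth_gb_s: float) -> float:
    """Toy model: generation speed ~= memory bandwidth divided by the
    bytes of active weights that must be read for each token."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token

# Hypothetical 100B dense model vs a 100B-total MoE with ~10B active
# params, both at ~4.5 bits/weight on 500 GB/s of bandwidth:
dense = tokens_per_sec(100, 4.5, 500)  # ~8.9 t/s
moe = tokens_per_sec(10, 4.5, 500)     # ~88.9 t/s
```

Same total size on disk and in memory, roughly 10x the generation speed, which is also why the RAM-offloaded portion hurts MoE models less than it would a dense model.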

ERP

  • Nemo (12GB) - The model every vramlet started with, now showing its age. Uncensored with a system prompt.
  • Gemma 4 31B (24GB) - A proper successor to Nemo with a different writing style. Worth trying even if you can run bigger models. Supports vision so it can comment on your dick pics. Uncensored with a system prompt. Anons often say that for this use case it's as good as much larger models. You can also try the MoE and smaller versions listed below.
  • GLM-4.5 Air (80GB) - The middle ground between Nemo and DeepSeek. Like Nemo and Gemma, its pretraining doesn't seem to have been filtered at all. Needs a prefill to get around refusals. MoE model.
  • GLM-4.6 (200GB) / 4.7 - Same as the above but with even more parameters, and thus smarter. 4.7 has better benchmark scores but some Anons think it's more safetyslopped.
  • DeepSeek V3 (200GB) / R1 0528 / V3.1 Terminus - R1 is a thinking model and Terminus is a hybrid thinking model. V3 has repetition issues in long chats; R1 is more resistant but still requires sampler trickery; Terminus has almost none but less variety. Even the smallest quants of DeepSeek, like the UD-IQ1_S, are very good.
  • Kimi K2.6 (400GB) - DeepSeek architecture but bigger. Similar unfiltered dataset. Some Anons prefer it over DeepSeek. Supports vision.

Programming & General

Like most benchmarks, public programming benchmarks have found their way into the training dataset of most models. In my experience, if you work on anything other than webshit, bigger model = better despite the benchmark scores. Test them yourself on your own codebases.

For general assistant and "claw" type shit that searches the web and makes tool calls, small models are good enough.


Pub: 20 Jul 2025 09:15 UTC

Edit: 26 Apr 2026 09:25 UTC
