Current Local Model Meta
This isn't an exhaustive list, nor a set of personal preferences, but rather an attempt to objectively lay out what could be considered the current local SOTA meta, with some background information on the interfaces and models.
Image Generation Interfaces
forge/reForge
forge and reForge are continuations of Automatic1111's WebUI, which was the first widely accepted UI for image generation after Stable Diffusion was released. They're based on the same codebase, have an almost identical Gradio interface, and most extensions are compatible with both. If you've used Automatic's old UI, you'll be very familiar with the layout.
forge/reForge are considered newbie-friendly given their ease of use and simple interfaces. In general, they have all the tools you'll likely need for image generation, i.e. txt2img, img2img, inpainting and controlnet support. Upscaling is achieved through a few third-party extensions, the most notable and widely used of which is Ultimate SD Upscale.
The most important difference between forge and reForge is that forge works with both SDXL models and flux, while reForge is only for SDXL-based models. reForge has a few optimizations that aren't present in forge and slightly better extension compatibility, making it the better option of the two for SDXL models.
A common criticism of these Gradio UIs is that they lack the extended functionality and customization of ComfyUI, and that they're slow to update, especially forge, which still doesn't have built-in controlnet support for flux. reForge is also not being actively developed for the time being, though the UI itself is feature complete and fully functional.
- forge: https://github.com/lllyasviel/stable-diffusion-webui-forge
- reForge: https://github.com/Panchovix/stable-diffusion-webui-reForge
ComfyUI
This is a node-based interface and backend that aims to be a general-use interface for all kinds of generative AI. There's a huge range of extensions and custom nodes which allow you to run image, video, audio and text generation AI. It's the most frequently updated interface, with new models often getting support and nodes within days of their release.
The downside is that it's relatively complex, with a steep initial learning curve, especially for those used to UIs like forge. ComfyUI operates with node-based workflows, which don't come stock with the software; you have to either build them yourself or download them from other sources. The developer tends to post simple workflows on his site, and Civitai is another source, so workflows are generally easy to find. While initially complex, it becomes quite intuitive once you've built a few workflows of your own.
Going in blind, without understanding how the system works, isn't advised. If you're totally new to Comfy, I recommend watching a few tutorials on the node-based system. This is a good beginner's series, and it goes through all the steps required to build txt2img, img2img, inpainting and controlnet workflows.
ComfyUI is sometimes criticized for being too complex and unwieldy, with more work required to get a model up and running after it's first installed. For example, to do what forge/reForge does by default, with no extra extensions or steps needed from the user, you would need either one massive custom-made workflow or individual workflows for txt2img, img2img, inpainting and controlnets, all of which require multiple nodes and connections. A sketch of what a minimal workflow looks like under the hood follows below.
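To make the node system concrete, here's a minimal sketch of a txt2img graph expressed in ComfyUI's API format and submitted to a locally running instance over its HTTP endpoint. The node class names are ComfyUI's stock nodes; the server address is the default, and the checkpoint filename and prompts are placeholder assumptions.

```python
# Minimal sketch: submit a txt2img workflow to a local ComfyUI instance.
# Assumes ComfyUI is running on the default 127.0.0.1:8188 and that
# "model.safetensors" exists in models/checkpoints (both are assumptions).
import json
import urllib.request

workflow = {
    # Each node is keyed by an arbitrary id; inputs reference other nodes
    # as [node_id, output_index].
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "1girl, solo, outdoors", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "lowres, bad anatomy", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "api_txt2img"}},
}

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())  # returns the queued prompt id
```

Every connection you'd drag between nodes in the UI shows up here as a `[node_id, output_index]` reference, which is why even a basic workflow involves this much wiring.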
Image Generation Models - Anime
IllustriousXL
The most popular anime-centric SDXL finetune, trained on massive booru datasets. The current release is v2.0, though v1.0 remains quite popular, with some considering it superior to 2.0. Illustrious has the widest range of LoRA support and aesthetic finetunes, giving the model a huge amount of flexibility and creativity.
- v1.0: https://civitai.com/models/1232765?modelVersionId=1389133
- v2.0: https://civitai.com/models/1369089/illustrious-xl-20
NoobAI
A massive finetune of IllustriousXL on a 13M image dataset consisting of the entire catalogue of Danbooru (up to 2024-10-23) and e621. It enhances the model's capabilities with a wider range of artist styles and creativity. The VPRED (v-prediction) version has better color depth than the EPS (epsilon-prediction) version, though VPRED requires specific settings to work correctly, so make sure you read the model page and use the correct sampler settings (see the sketch after the link below for what this looks like in code).
Noob is often considered a "pro" prompting model, given that it's much more difficult to produce "quality" outputs with basic prompts. It reacts poorly to simple tagging, especially without artist tags, but excels when you give it very detailed prompts for the character(s), pose, setting, lighting, etc., using as much detail as possible. Knowledge of booru tags is essential. It's also recommended to scan Danbooru and e621 for artist styles you like and keep them in a file for ease of reference.
- Various versions: https://civitai.com/models/833294/noobai-xl-nai-xl
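For illustration, here's a hedged sketch of what the VPRED requirement looks like in code, using diffusers rather than a UI: the scheduler has to be switched to v-prediction with zero-terminal-SNR rescaling, otherwise outputs come out washed out or broken. The checkpoint filename is a placeholder, and the exact sampler/CFG values should come from the model page, not from here.

```python
# Hedged sketch: loading a v-prediction (VPRED) SDXL checkpoint with diffusers.
# The filename is a placeholder; defer to the model page for exact settings.
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler

pipe = StableDiffusionXLPipeline.from_single_file(
    "noobai-xl-vpred.safetensors",  # placeholder filename
    torch_dtype=torch.float16,
).to("cuda")

# Without these two flags the model is sampled as if it were epsilon-prediction
# and produces washed-out or broken images.
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config,
    prediction_type="v_prediction",
    rescale_betas_zero_snr=True,
)

image = pipe(
    "1girl, solo, night, neon lights, detailed background",
    negative_prompt="lowres, bad anatomy",
    num_inference_steps=28,
    guidance_scale=5.0,
).images[0]
image.save("noob_vpred.png")
```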
Both Illustrious and Noob have LoRAs for characters and styles not present in the original datasets, though Illustrious has far more. There's also a huge range of aesthetic finetunes available which focus the models on specific looks, ranging from western cartoon styles and 2.5D/3D to semi-realistic. The most up-to-date resource for finding them is Civitai. Applying a LoRA programmatically is simple; see the sketch below.
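As a reference point, applying one of these LoRAs in a diffusers script takes only a couple of lines. The filenames and scale value below are placeholder assumptions; typical strengths run somewhere around 0.6-1.0 depending on the LoRA.

```python
# Minimal sketch: applying a character/style LoRA on top of an SDXL-based
# checkpoint in diffusers. Filenames and the scale are placeholders.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "illustrious-v2.safetensors", torch_dtype=torch.float16  # placeholder
).to("cuda")
pipe.load_lora_weights("loras", weight_name="some-style-lora.safetensors")  # placeholder
pipe.fuse_lora(lora_scale=0.8)  # typical strengths run ~0.6-1.0

image = pipe("1girl, cowboy shot, sunset").images[0]
image.save("lora_test.png")
```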
Prompt Engineering Tools for Ill and Noob
You should get tagcomplete for the forge/reForge UIs or Autocomplete-Plus for ComfyUI. These are handy extensions that suggest possible booru tags as you type out a prompt. If you use the ComfyUI Autocomplete extension, get this updated tag list.
Another good tool is the TIPO extension for both ComfyUI and forge/reForge, a small LLM plugin designed to enhance your prompts using the DanTagGen model (HF demo here). You can give it a small set of booru tags and it will extrapolate from them, adding various related details/tags; the sketch below illustrates the idea.
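To illustrate the concept (not the extension's exact internals), here's a rough sketch of tag upsampling with a causal LM via transformers. The HF repo id is an assumption, and the real extensions feed DanTagGen a specific prompt template rather than raw tags, so treat this purely as a demonstration of the idea.

```python
# Rough sketch of TIPO-style prompt upsampling: feed a small set of booru
# tags to a tag-generation LLM and let it append related tags.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KBlueLeaf/DanTagGen-beta"  # assumed HF repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

seed_tags = "1girl, solo, rain, city street"
inputs = tok(seed_tags, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.9)
print(tok.decode(out[0], skip_special_tokens=True))  # seed tags + generated extras
```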
Illustrious aesthetic finetunes
These are models that were further finetuned on smaller datasets to "aesthetically" tune Illustrious or Noob. They're generally more newbie-friendly, requiring less prompting effort to produce gens that "look" good. The trade-off is less overall creativity, given that this kind of aesthetic training "focuses" the model on the specific concepts, styles and poses of the smaller dataset it's trained on. Many still prefer these for their ease of use, and they're a good starting point for newcomers to image gen.
Wai NSFW Illustrious
A very popular aesthetic finetune of Illustrious 1.0. It was (and to a point still is) considered a bit of a "slop" producer, given its generic default style; however, v14 is much more creative than previous iterations and has been well received.
HassakuXL
Another popular finetune. The latest version (2.2) is seemingly inferior to 2.1fix, so 2.1fix is currently recommended.
Image Generation Models - Anime Photorealism/2.5D/3D
For SD, "Photorealism" is an umbrella term for models that blur the lines between digital art and reality, ie high fidelity 3D animation or styles that mix both real photography with elements of traditional media or 3D art. Illustrious and Noob can't really achieve this on their own as they're trained mostly on traditional media. There's a huge list of LoRA's that replicate various photorealistic or 3D styles, and there's also specialized finetunes and merges. These are some of the most popular and/or well regarded of the latter.
Uncanny Valley
iLustMix
Pony Realism
Illustrious Realism
Image Generation Models - Realism NSFW
Models that produce realistic NSFW generations. These are still mostly based on SDXL given the difficulty in training flux to reproduce NSFW material and the fact that new local model releases are few and far between. See the Chroma section for a new realistic NSFW flux-based model that's currently in training.
Lustify
Currently the most popular NSFW SDXL finetune, with very good anatomy.
BigLust
Another popular NSFW model. A wide range of LoRA's have been trained specifically on this model, which you can find floating around certain threads on /b/.
Image Generation Models - Realism SFW
Models that excel at producing realistic or artistic generations, though they lack the ability to produce NSFW material.
Flux
Still widely considered the best local model for SFW realism and art, especially fantasy art. The model has issues with realistic generation, like "plastic"-looking skin and the notorious "flux chin" (women with cleft chins and sharp cheekbones). However, the model is helped enormously by the many thousands of LoRAs available from places like Civitai, which mitigate issues like this.
Flux can't do NSFW on its own. Flux models are also "distilled", meaning it's much harder to teach them new concepts they have little existing knowledge of, including NSFW. This is why SDXL NSFW finetunes are superior to flux NSFW LoRAs and finetuning attempts.
The 8-bit/Q8 version has relatively steep VRAM requirements, but various quantizations (including SVDQuant) allow it to run on lower-end GPUs.
There are two basic versions of flux: dev and schnell. Dev is the standard version; schnell is the "turbo" version, which requires fewer steps to gen but gives you lower-quality gens as a result. A hedged loading sketch follows the links below.
- flux dev (ggufs): https://civitai.com/models/647237?modelVersionId=724149
- flux schnell (ggufs): https://civitai.com/models/648580?modelVersionId=725877
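As a loading sketch, diffusers can run a GGUF-quantized dev transformer directly, which illustrates both points above: quantization to fit lower-end GPUs, and the step/guidance difference between dev and schnell. The local GGUF filename is a placeholder assumption; the repo id is flux dev's official one.

```python
# Hedged sketch: running a GGUF-quantized flux dev transformer via diffusers.
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

transformer = FluxTransformer2DModel.from_single_file(
    "flux1-dev-Q8_0.gguf",  # placeholder path to a downloaded GGUF quant
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # trades speed for VRAM headroom

# dev wants a normal step count with guidance; schnell is distilled for ~4 steps.
image = pipe(
    "a knight in ornate armor, oil painting, dramatic lighting",
    num_inference_steps=28,  # schnell: ~4
    guidance_scale=3.5,      # schnell: 0.0
).images[0]
image.save("flux_dev_q8.png")
```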
Flux Finetunes and model merges
PixelWave
A popular and well-regarded aesthetic finetune of flux that enhances its artistic capabilities, though it remains an SFW-only model.
- Multiple versions for both dev and schnell: https://civitai.com/models/141592?modelVersionId=992642
Chroma
A full finetune of flux schnell that aims to replace flux as a base model by "de-distilling" it and radically altering its architecture, using a 5M image dataset containing a mix of anime, furry, artistic and photographic images. The goal is to give flux enhanced capabilities, including built-in NSFW, increased creativity and better prompt adherence, and to let it accept both natural-language and booru-style tag inputs. It's still in training, with new epochs being released every four days or so.
The model shows a lot of promise, demonstrating prompt adherence and creativity superior to the original flux model, and the dataset has resolved classic flux issues such as flux chin, flux skin and the model's tendency to favor out-of-focus backgrounds.
However, given that the model is still a work in progress, it has a few notable issues, such as limb coherence/body horror (especially hands) and artifacts and residual noise in fine-grained details. The model's creator has stated that these issues should resolve once training is complete.
The other downside is the relatively long generation time, even compared to flux, since Chroma is not distilled and uses CFG, which requires two model evaluations per step (one conditional, one unconditional) and thus roughly doubles per-step cost. On a 3090, a 30-step generation takes about 70 seconds at 1024x1024.
The "detail-calibrated" epochs are recommended, as they are divergent versions of the model from when it began to train at a higher base resolution (1024x1024 vs 512x512).
- 16bit epochs (requires a 24GB GPU): https://huggingface.co/lodestones/Chroma/tree/main
- GGUF quantizations: https://huggingface.co/silveroxides/Chroma-GGUF