Guides |
|
Quick Start Guide |
Anon's tutorial for getting models running locally |
SillyTavern Guide |
Instructions for roleplaying via koboldcpp. Additional GBNF grammar usage |
LM Tuning Guide |
Training, fine-tuning, and LoRA/QLoRA information |
LM Settings Guide |
Explanation of various settings and samplers with suggestions for specific models |
LM GPU Guide |
Current as of the 40 series. Alternatively some Anons made a few different build guides |
|
|
Models |
|
HuggingFace |
Best source for current quants (filter by GGUF or EXL2) |
LLM VRAM Calc |
Tool to estimate VRAM usage for GGUF/EXL2/GPTQ quants |
OpenModelDB |
Specifically models for upscaling images and videos |
Voice Models |
Easily searchable list for use mainly with RVC 1/2 |
Models Info Table |
Google Sheet of models, AI labs, datasets, and various other ML info by Alan Thompson |
Chat Leaderboard |
Closed and local models rated by Elo, with additional MMLU/MT-bench scores |
|
|
Papers |
|
Local Models Papers |
Papers and articles that I've found to be interesting with a way to search via abstracts |
Arxiv ML |
Primary source of machine learning papers |
PapersWithCode |
Indexer that allows sorting by GitHub stars |
Semantic Scholar |
Scientific literature semantic search tool |
Scholar Inbox |
ML-focused paper recommendations based on personal preferences |
|
|
News |
|
AI Explained |
General AI news with well sourced links (Youtube) |
AI News Blog |
Lesswrong cultist so "AI Bad" takes but does a good weekly AI news roundup (Blog) |
ML Resources |
Broader sporadically updated list (not fully local) |
Previous Threads |
Always good to search for previous questions before asking |
|
|
Learn |
|
LLM Course |
Collection of articles, videos, courses, and colabs for learning applied ML |
Andrej Karpathy YT |
In-depth videos of LLM construction from one of OpenAI's founding members |
TF From Scratch |
Blog post with a Jupyter notebook that goes step by step through coding and training a small GPT |
LLM-Sampling |
Token Probability visualizer with support for current popular samplers |
LLM Visualization |
Drag and pull 3D model of various LLMs with explanation for components |
Intro to DNN |
Book format of a Neural Networks course that serves as an introduction to ML |
Principles of DL |
Textbook that introduces the math behind Deep Learning |
|
|
LLM Inferencing |
|
Text Gen WebUI |
Frontend to most GPU/CPU model backends |
WebUI Extensions |
Most notably XTTSv2 and Stable Diffusion |
|
|
llama.cpp |
Main CPU inferencing development with GPU acceleration (GGUF models) |
kobold.cpp |
llama.cpp fork with Kobold UI and additional features (with support for older GGML models) |
|
|
exllama2 |
Inference library for local LLM with new quant style (70B llama2 on 24GB VRAM) |
TabbyAPI |
FastAPI application for the exllama2 backend, for use with SillyTavern |
|
|
SillyTavern |
Frontend that is a heavily modified TavernAI fork |
vllm |
Inference library with fast inferencing and PagedAttention for KV management |
|
|
LLM Tools |
|
Axolotl |
Fine-tuning tool for various architectures with integrated support for flash attention and rope scaling |
Mergekit |
Toolkit for merging LLMs including piecewise assembly of layers |
promptfoo |
Tool for testing and evaluating LLM output quality, with a side-by-side comparison feature |
Floneum |
Graph/node editor for AI workflows with a focus on community made plugins |
OpenRLHF |
Framework for RLHF generation optimized for performance and distributed models |
FedModule |
Framework for federated learning with over 20 implemented algorithms |
|
|
LLM Guiding |
|
Langchain |
Set of resources for getting more out of LLMs: chains, tool integrations, agents, etc. |
llama_index |
Central interface to connect LLMs with external data |
TextGrad |
Framework with API to backpropagate textual gradients with user defined loss functions |
SGLang |
Structured generation language designed for LLM/VLMs |
DSPy |
Composable and declarative modules for instructing LMs in a familiar Pythonic syntax |
Continue |
Open source code assistant that works with local models |
|
|
Datasets |
|
Huggingface |
Best source for datasets |
Wiki Embeddings |
Precomputed embeddings for various languages of Wikipedia |
ERP Scrapes (1)(2) |
Raw RP/ERP/ELIT content |
VN JP/EN Scrape |
60 million tokens of dialogue and actions/narration |
WN JP/EN Scrape |
100k chapters of webnovels paired with fan-translations |
janitorai-cards |
190k character cards converted to v2 format and viewable as local webpage |
chub.ai |
Archive of various character cards from chub as well as from some other sources |
|
|
Dataset Tools |
|
augmentoolkit |
Generates multi-turn instruct-tuning data from input documents |
dswav |
Audio dataset preparation tool using whisper and ffmpeg to transcribe and split inputs |
lilac |
Dataset curation tool for RAG or tuning with annotating/clustering/labeling support |
Data-Juicer |
Dataset preparation tool with support for multimodal data |
InfoGrowth |
Online dataset curation framework for data cleaning and selection |
|
|
Non-LLM Models |
|
Vision/Image |
|
ComfyUI |
Node based stable diffusion GUI. User submitted workflows |
LDSR ComfyUI |
Image super resolution upscaler with fewer artifacts than others, but slower |
MambaIRv2 |
Image restoration model that uses attentive state-space modeling for improved results |
ControlNeXt |
90% fewer parameters than ControlNet and works with other LoRA techniques |
Molmo |
Multimodal LLMs with image/video understanding and a fully open-sourced VLM component |
ColPali |
VLM that indexes documents from their visual features (PDF focused) |
Surya |
OCR, layout analysis, reading order, line detection in 90+ languages |
ShareCaptioner |
Image captioning model with fewer hallucinations than LLaVA |
BSQ-ViT |
Image/Video tokenizer with Binary Spherical quant that has best image/video restoration performance |
Spandrel |
Library for loading various upscaling models for use with chaiNNer or SD WebUI |
DiffEditor |
Tuning-free method for fine-grained image editing using score-based diffusion |
MASA |
Match anything via SAM for use in finding similar objects across different domains |
Depth-Anything-V2 |
Robust monocular depth estimation that works well with semantic segmentation |
ProLab |
Semantic segmentation via property-level label space rather than just categories |
SUPIR |
Image restoration and upscale method with semantic adjustment editing ability |
DDColor |
Vivid and natural colorization for black and white photos (and possibly video) |
lama-cleaner |
Local inpainting tool (remove or erase and replace) |
TRELLIS |
Image-to-3D-assets generative model that uses a unified structured latent representation |
|
|
Video |
|
HunyuanVideo |
Video foundation model with SOTA results and NSFW output ability. ComfyUI wrapper |
Efficient Track Anything |
Allows for real time video segmentation on mobile devices or 2x FPS improvements over SAM2 on GPU |
Upscale Hub |
Set of resources and models for image and video upscaling (anime focused) |
Ground-A-Video |
Video Editing via Text-To-Image diffusion models with groundings/motion/depth data |
EasyAnimate |
Text-to-Video model that maxes out at 6s usable with various framerates and resolutions |
LivePortrait |
Real time face swap with extended controllability (eyes, lips, stitching) |
MegActor |
Animate images from audio/image with consistent motion via diffusion |
|
|
Audio/Speech |
|
Amphion |
Audio/Music/Speech toolset of various models with visualization capability |
Fish Speech 1.4 |
Text-to-Speech model with good CHN/JPN and decent ENG audio |
DiffEditor |
Speech Editing model with improvements to OOD text output |
Qwen2-Audio |
Audio-Language model that can voice chat and do audio analysis without specific prompting |
FluxMusic |
Text-to-Music model with large improvements over AudioLDM2 |
UniMuMo |
Text/Music/Motion foundational model capable of mixing all modalities for output generation |
GPT-SoVITS |
Few-shot voice cloning and Text-to-Speech WebUI (ENG/JPN/CHN). Rentry guide |
ControlSpeech |
Text-to-Speech with voice clone capability that takes in voice/style/content prompts |
whisper.cpp |
Speech-to-Text inference library with CPU/GPU support for various whisper based models |
Whisper Diarization |
STT via Transformers.js with word-level timestamps and speaker segmentation |
STAR-Adapt |
ASR unlabeled finetune method that reduces WER for specific accents/noise |
RVC |
Retrieval-based Voice Conversion model |
Urhythmic |
Unsupervised rhythm modeling for voice conversion |
Descript |
High-Fidelity audio compression with improved RVQGAN (can drop-in replace EnCodec) |
DeepFilterNet |
Real time noise suppression using deep filtering |
UVR |
Audio source separation GUI for various models with full Demucs and MDX23C support |
AudioSR |
Audio super resolution (any -> 48kHz) |
EAT |
Audio and speech classification |
|
|
Other |
|
Genesis |
Generative physics simulation framework for a wide array of modalities |
AnythingLLM |
RAG and agent focused frontend with support for local and cloud models |
MemoRAG |
RAG framework that leverages its memory model by recalling with query-specific clues |
T-Ragx |
Translation fine-tune method that works with RAG (glossaries) and preceding text |
GenTranslate |
Fine-tune of SeamlessM4T from N-best hypotheses dataset for MT and Speech-to-Text |
Dragon+ |
Dual-encoder based dense retriever for use with the RA-DIT FT approach with paired LLM |
Magika |
File content type detector model |
AutoACT |
Automatic agent learning framework using a division-of-labor strategy |
LOCOST |
State-space model for long document abstractive summarization |
NV Embed v1 |
Decoder-only LLM embedding model that outperforms T5/BERT/similar models |
ESPN |
GPUDirect Storage implementation for multi-vector embedding retrieval and bindings |
FastFit |
Text few-shot classification fine-tuning method with high accuracy and fast training time |
Prithvi-WxC |
Weather forecasting foundation model trained on 160 types of atmospheric data |
Time-MoE |
Time series MoE foundation models with largest having 1.1B active parameters |