LLM Reference Glossary
(v 0.2, Last Updated 2023-06-15)
- LLM Reference Glossary
- Terms
- General
- Card, Character Card
- Context Size, Max Context Size
- Fine-tuning
- GGML
- GLoRA
- GPTQ
- Infinite Context, Infinite Memory
- k-quantization, k-quants
- Landmark Attention
- LLM
- LoRA
- Model Size, Parameter Count, Parameters
- Perplexity
- Prompt
- QLoRA
- Quantization
- RPTQ
- Soft Prompt
- SpQR
- SqueezeLLM, Dense-and-Sparse Quantization
- SuperCOT
- TART
- Token, Tokenization
- xPos
- Models
- 4chan-specific
- Sites
- People
- General
- See Also
Terms
General
Card, Character Card
A pseudo-standardized way of sharing LLM prompt material for the purposes of roleplay and chat. Can come in the form of a raw .json
file, or PNG/WebP images with the JSON data embedded. Supported by some generator front-ends and websites like chub.ai.
More Info: https://github.com/malfoyslastname/character-card-spec-v2
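For illustration, a minimal Python sketch of pulling the embedded JSON out of a PNG card, assuming the common TavernAI-style convention of a base64-encoded JSON payload stored in a tEXt chunk keyed "chara" (the filename is made up):
```python
import base64
import json

from PIL import Image  # Pillow

def read_card(path: str) -> dict:
    """Extract the character JSON embedded in a PNG card."""
    img = Image.open(path)
    # PNG cards conventionally store base64-encoded JSON in a tEXt chunk
    # under the keyword "chara"; Pillow exposes text chunks via .text.
    raw = img.text["chara"]
    return json.loads(base64.b64decode(raw))

card = read_card("character.png")  # hypothetical file
print(sorted(card.keys()))         # e.g. name, description, first_mes, ...
```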
Context Size, Max Context Size
The number of tokens a generator can work with at once. Common values are 1024 and 2048, though some models support much larger contexts. If the input exceeds the maximum context size, it is truncated to fit.
It can help to think of the input to a model as a sheet of paper. The prompt is all of the words that you've written, and the output is the model's attempt to fill out the rest of the sheet. In that analogy, the Context Size is the size of the sheet of paper.
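A rough sketch of how a front-end might handle this, trimming old chat history so everything fits; the numbers and the token-counting stand-in are illustrative only:
```python
# Illustrative only: real front-ends count tokens with the model's tokenizer.
MAX_CONTEXT = 2048           # the size of the "sheet of paper"
RESERVED_FOR_OUTPUT = 256    # room left for the model's reply

def count_tokens(text: str) -> int:
    return max(1, len(text) // 4)   # crude stand-in for a real tokenizer

def build_prompt(system: str, history: list[str]) -> str:
    budget = MAX_CONTEXT - RESERVED_FOR_OUTPUT - count_tokens(system)
    kept = []
    for message in reversed(history):       # walk newest-to-oldest
        cost = count_tokens(message)
        if cost > budget:
            break                            # oldest messages fall off first
        kept.append(message)
        budget -= cost
    return "\n".join([system] + list(reversed(kept)))
```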
Fine-tuning
Taking an existing LLM and specializing the model's capabilities, optimizing its performance on a narrower, task-specific dataset. Most models in actual use are fine-tunes of other foundational models (or other fine-tuned models).
More Info: https://rentry.org/llm-training#the-basics
GGML
Georgi Gerganov (ggerganov) Machine Learning
A C library and binary storage format for LLMs, supporting quantization. Used by CPU-based generators like llama.cpp, koboldcpp, etc.
More Info: https://github.com/rustformers/llm/blob/main/crates/ggml/README.md
GLoRA
Generalized Low-Rank Adaptation
More Info: https://github.com/Arnav0400/ViT-Slim/tree/master/GLoRA
GPTQ
Generative Pre-trained Transformer (GPT) Quantization
A post-training quantization method and weight storage format used by GPU-based generators.
Infinite Context, Infinite Memory
A shorthand term for the strategy of using a vector database like ChromaDB to query for snippets from past conversations with an LLM and inject context-relevant data into the current prompt. This gives conversations a sense of long-term memory while still using a limited context size.
More Info:
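A rough sketch of the idea using ChromaDB's Python client (the memories, names, and exact API details here are assumptions, not a reference implementation):
```python
import chromadb

client = chromadb.Client()                     # in-memory instance
memory = client.create_collection("chat_memory")

# Store past exchanges; Chroma embeds the documents automatically.
memory.add(
    documents=["The user's cat is named Miso.", "The user prefers short replies."],
    ids=["mem-1", "mem-2"],
)

# Before building the next prompt, pull the snippets most relevant to the new message.
new_message = "What was my cat called again?"
hits = memory.query(query_texts=[new_message], n_results=1)
retrieved = hits["documents"][0]               # e.g. ["The user's cat is named Miso."]

prompt = "Relevant memories:\n" + "\n".join(retrieved) + f"\n\nUser: {new_message}"
```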
k-quantization, k-quants
A method of quantization using mixtures of different weight types, going from 8-bit Q8_K all the way down to 2-bit Q2_K.
More Info:
Landmark Attention
A QLoRA which compresses LLM context into "landmarks" to, quote: "[make] the process of selecting relevant tokens for answers more efficient, and allowing 2-16x longer context use without memory constraints." Claims to achieve context lengths of up to 32k with some models.
More Info: https://github.com/eugenepentland/landmark-attention-qlora
LLM
Large Language Model
LoRA
Low-Rank Adaptation
A method to expedite the training of LLMs by freezing the pretrained weights and training small low-rank update matrices instead. For GPT-3 175B, this reduces the number of trainable parameters by roughly 10,000x and GPU memory requirements by about 3x. A conceptual sketch follows the links below.
More Info:
- https://github.com/microsoft/LoRA
- https://rentry.org/llm-training#low-rank-adaptation-lora_1
- https://rentry.org/lora_train
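A conceptual sketch of the idea (not Microsoft's implementation; shapes and hyperparameters are just examples): the pretrained weight matrix stays frozen, and only two much smaller matrices are trained.
```python
import numpy as np

d_out, d_in, r = 4096, 4096, 8        # full dimensions vs. low rank r
alpha = 16                            # LoRA scaling hyperparameter

W = np.random.randn(d_out, d_in)      # frozen pretrained weight
A = np.random.randn(r, d_in) * 0.01   # trainable, r x d_in
B = np.zeros((d_out, r))              # trainable, d_out x r (initialized to zero)

def forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B get gradients.
    return W @ x + (alpha / r) * (B @ (A @ x))

print("trainable:", A.size + B.size, "vs. full fine-tune:", W.size)
```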
Model Size, Parameter Count, Parameters
The number of weights a specific model contains, usually measured in billions (e.g., 7B means "7 billion parameters").
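As a back-of-the-envelope example of what parameter count means for memory (weights only, ignoring activations, KV cache, and runtime overhead):
```python
params = 7e9  # a "7B" model
for name, bytes_per_weight in [("f16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{params * bytes_per_weight / 1024**3:.1f} GiB")
# f16: ~13.0 GiB   8-bit: ~6.5 GiB   4-bit: ~3.3 GiB
```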
Perplexity
A common metric for evaluating a model: a measurement of how well the model predicts the contents of a dataset. Higher perplexity is worse; lower perplexity means the model predicts that dataset better, but doesn't always mean better output quality.
More Info: https://huggingface.co/docs/transformers/perplexity
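In essence, perplexity is the exponential of the average negative log-likelihood the model assigns to each token of the dataset; a minimal sketch with made-up probabilities:
```python
import math

# Hypothetical probabilities the model assigned to each correct next token.
token_probs = [0.31, 0.05, 0.62, 0.10, 0.45]

avg_nll = sum(-math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_nll)
print(f"perplexity = {perplexity:.2f}")  # lower means the dataset surprised the model less
```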
Prompt
The input fed to a model. Contains the instructions for the model, or the personality, scenario, and history when used in a roleplay or chat context. The prompt takes up a portion of the max context size.
QLoRA
Quantized Low-Rank Adaptation
A fine-tuning method that trains LoRA adapters on top of a frozen, quantized (4-bit) base model, allowing near-lossless results with massive reductions in memory requirements.
More Info:
- https://github.com/artidoro/qlora
- https://rentry.org/llm-training#qlora
- https://arxiv.org/abs/2305.14314
Quantization
A way of "compressing" models by changing the way the weights are stored. Can take an f16 or f32 full-precision model and bring each weight down to as little as 2 bits, allowing for less storage space and faster inference. There are several different methods of quantization, each with its own upsides and downsides; a toy example follows the links below.
More Info:
- https://huggingface.co/blog/4bit-transformers-bitsandbytes
- https://www.youtube.com/watch?v=mii-xFaPCrA
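A toy example of the simplest case, symmetric 8-bit quantization of a single tensor; real schemes (GPTQ, k-quants, SpQR, etc.) are far more sophisticated, quantizing in groups and compensating for error:
```python
import numpy as np

weights = np.random.randn(4096).astype(np.float32)    # full-precision weights

scale = np.abs(weights).max() / 127.0                  # map the largest |w| to the int8 range
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequantized = q.astype(np.float32) * scale             # what inference actually multiplies with
print("bytes per weight: 4 -> 1, mean abs error:",
      float(np.abs(weights - dequantized).mean()))
```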
RPTQ
Reorder-Based Post-Training Quantization
An experimental method of quantizing models to 3-bit and 4-bit weights using a reorder-based approach, originally laid out in the RPTQ paper. Not currently in common use.
More Info: https://github.com/AlpinDale/RPTQ-for-LLaMA
Soft Prompt
A set of learned embedding vectors prepended to the model's input, trained on a dataset to subtly steer the model toward a particular style or subject without hand-written prompt text.
More Info: https://github.com/KoboldAI/KoboldAI-Client/wiki/Soft-Prompts
SpQR
Sparse-Quantized Representation
A near-lossless quantization scheme for 3-bit/4-bit weights that stores outlier weights separately at higher precision.
More Info:
SqueezeLLM, Dense-and-Sparse Quantization
A method of model quantization using 3-bit and 4-bit weights. It splits weight matrices into two components, "dense" which can be heavily quantized without affecting performance, and "sparse" which preserves sensitive and outlier parts of the weight matrices.
More Info: https://github.com/SqueezeAILab/SqueezeLLM
SuperCOT
A LoRA intended for use with LangChain, trained on Chain-of-Thought (CoT) datasets.
More Info: https://huggingface.co/kaiokendev/SuperCOT-LoRA
TART
Task-Agnostic Reasoning Transformer
A task-, model-, and domain-agnostic method for improving in-context learning performance on classification tasks. Can sit on top of any model, and claims to outperform 65B models.
More Info: https://github.com/HazyResearch/TART
Token, Tokenization
Words and sentences are fed into an LLM by first being tokenized. A token may be a single character, part of a word, or a full word. Different models can tokenize the same sentence in different ways, so tokenization is model-dependent. Models usually also have special tokens for things like marking the end of a generation, signifying the start of a response, etc.
More info: https://vaclavkosar.com/ml/Tokenization-in-Machine-Learning-Explained
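A quick sketch using the Hugging Face transformers tokenizer for GPT-2 (any model's tokenizer is used the same way; the exact IDs and pieces differ per model):
```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
text = "Tokenization is model-dependent."

ids = tok.encode(text)
print(ids)                             # a list of integer token IDs
print(tok.convert_ids_to_tokens(ids))  # sub-word pieces, not whole words
print(tok.decode(ids))                 # round-trips back to the original text
```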
xPos
Extrapolatable Position Embedding
More Info:
Models
This is not meant to be a comprehensive list. For more in-depth info, see: https://rentry.org/lmg-resources#general-resources-guides-for-large-language-models-llm-datasets
Alpaca
Type | Sizes | Context | License |
---|---|---|---|
Fine-Tune (LLaMA) | 7B | 2048 Tokens | Code: Apache License 2.0; dataset and weight diffs: CC BY-NC 4.0 |
Released by Stanford, now the base of many other fine-tunes.
More Info:
- https://crfm.stanford.edu/2023/03/13/alpaca.html
- https://github.com/tatsu-lab/stanford_alpaca
- https://huggingface.co/tatsu-lab/alpaca-7b-wdiff
ChatGLM
Type | Sizes | License |
---|---|---|
Foundational | 6B | Open (Apache License 2.0) |
Chinese-focused model.
More Info: https://huggingface.co/THUDM/chatglm-6b
LLaMA
Large Language Model Meta AI
Type | Sizes | Context | License |
---|---|---|---|
Foundational | 7B, 13B, 33B, 65B | 2048 Tokens | Restrictive |
Meta's foundational model, from which many other models are fine-tuned. Comes in four sizes: 7B, 13B, 33B, and 65B.
More Info: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
OpenLLaMA
Type | Sizes | Context | License |
---|---|---|---|
Foundational | 3B, 7B, 13B (600BT Preview) | 2048 Tokens | Open (Apache License 2.0) |
An open-source reproduction of Meta's more restrictively licensed LLaMA.
More Info: https://github.com/openlm-research/open_llama
StarCoderBase, StarCoder
Type | Sizes | Context | License |
---|---|---|---|
Foundational | 15B | 8192 Tokens | Code: Open (Apache License 2.0), Model: Semi-Open (bigcode-openrail-m) |
A model trained mostly on GitHub code plus some natural language text, geared toward being a coding assistant. Technically, StarCoderBase is the foundational model and StarCoder is a fine-tune focused on Python. The weights are public, but you need a Hugging Face account and must accept the license terms to access them.
More Info:
4chan-specific
/lmg/
Local Models General
A recurring general thread on 4chan's /g/ board for the discussion of local large language models.
More Info: https://4chan.org/g/lmg
/aicg/
AI Chatbot General
A recurring general thread on 4chan's /g/ board for the discussion of AI chatbots and character cards.
More Info: https://4chan.org/g/aicg
Sites
Hugging Face, HF
The primary site where LLMs and datasets are uploaded and distributed.
More Info: https://huggingface.co/
Character Hub, chub.ai
A site for uploading and sharing Character Cards, as well as lore books. Uses GitLab on the backend and supports forking existing characters.
More Info: https://chub.ai/
People
Georgi Gerganov (ggerganov)
Author of open-source LLM-related projects like GGML, llama.cpp, and whisper.cpp.
More Info: https://github.com/ggerganov
LostRuins
Author of koboldcpp.
More Info: https://github.com/LostRuins