/lmg/ Model Links and Torrents
This list is no longer being maintained. Until a replacement is added, you can use the following links:
For a general list of LLaMA models and finetunes, see https://rentry.org/lmg-resources#llama-models-guides-resources
If you're looking for an updated list of the best "creative" models to RP/ERP with, see https://rentry.org/ayumi_erp_rating
- Changelog (MDY)
- 4-bit GPU Model Requirements
- 4-bit CPU/llama.cpp RAM Requirements
- Original Weights
- Models/Finetunes/LoRAs
- Wizard Vicuna 30B Uncensored (05/31/2023)
- Chronos 13B (05/29/2023)
- BluemoonRP 30B 4K (05/26/2023)
- SuperHOT 30B Prototype (05/24/2023)
- WizardLM 30B Uncensored (05/23/2023)
- Manticore 13B (05/20/2023)
- Pygmalion/Metharme 13B (05/19/2023)
- VicUnLocked 30B (05/18/2023)
- Wizard Mega 13B (05/16/2023)
- WizardLM 13B Uncensored (05/10/2023)
- BluemoonRP 13B (05/07/2023)
- Vicuna 13B Cocktail (05/07/2023)
- GPT4-x-AlpacaDente2-30B (05/05/2023)
- Vicuna 13B Free v1.1 (05/01/2023)
- Pygmalion/Metharme 7B (04/30/2023)
- GPT4-X-Alpasta 30B (04/29/2023)
- OpenAssistant LLaMa 30B SFT 7 (04/23/2023)
- SuperCOT (04/22/2023)
- Dataset Formats
- Filtering/Bias Rundown
- Previous Model List
Changelog (MDY)
[05-31-2023] - Added Wizard Vicuna 30B Uncensored
[05-29-2023] - Added Chronos 13B
[05-26-2023] - Added BluemoonRP 30B
[05-24-2023] - Added SuperHOT Prototype
4-bit GPU Model Requirements
VRAM Required takes full context (2048) into account. You may be able to load the model on GPUs with slightly less VRAM, but you will not be able to run at full context. If you do not have enough RAM to load the model, it will spill into swap. Groupsize models increase VRAM usage, as does running a LoRA alongside the model.
Model Parameters | VRAM Required | GPU Examples | RAM to Load |
---|---|---|---|
7B | 8GB | GTX 1660, RTX 2060, AMD RX 5700 XT, RTX 3050, RTX 3060, RTX 3070 | 6 GB |
13B | 12GB | AMD 6900xt, RTX 2060 12GB, 3060 12GB, 3080 12GB, A2000 | 12GB |
30B | 24GB | RTX 3090, RTX 4090, A4500, A5000, 6000, Tesla V100 | 32GB |
65B | 42GB | A100 80GB, NVIDIA Quadro RTX 8000, RTX A6000 | 64GB |
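As a rule of thumb, 4-bit weights occupy roughly half a byte per parameter, with the remainder of the budget going to activations and the KV cache at full context. A back-of-the-envelope sketch (the overhead constant is my assumption, not a measured value; the table's figures also bake in groupsize and allocator slack):

```python
def est_4bit_vram_gb(n_params_billion: float, context: int = 2048,
                     overhead_gb: float = 1.5) -> float:
    """Rough 4-bit VRAM estimate: ~0.5 bytes per parameter for weights,
    plus a flat allowance for activations/KV cache, scaled by context."""
    weights_gb = n_params_billion * 1e9 * 0.5 / 1024**3
    return weights_gb + overhead_gb * context / 2048
```

For 13B this gives roughly 7.5 GB of raw footprint, which is why 12 GB cards are listed as the comfortable tier.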
4-bit CPU/llama.cpp RAM Requirements
5-bit to 8-bit quantized models are becoming more common and will naturally require more RAM. I'll update these with the numbers when I have them.
Model | 4-bit | 5-bit | 8-bit |
---|---|---|---|
7B | 3.9 GB | | |
13B | 7.8 GB | | |
30B | 19.5 GB | | |
65B | 38.5 GB | | |
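The 4-bit column tracks parameter count times an effective bits-per-weight figure; here is a sketch for extrapolating the empty 5-bit and 8-bit columns (the bits-per-weight values are my approximations of llama.cpp's block formats, which carry scale bytes per block, not exact measurements):

```python
# Approximate effective bits per weight for llama.cpp block formats
# (each 32-weight block carries scale bytes, so e.g. Q4_0 costs ~5 bits, not 4).
EFFECTIVE_BITS = {"q4_0": 5.0, "q5_0": 6.0, "q8_0": 9.0, "f16": 16.0}

def est_ram_gb(n_params_billion: float, quant: str = "q4_0") -> float:
    """Estimate model RAM from parameter count and quantization width."""
    return n_params_billion * 1e9 * EFFECTIVE_BITS[quant] / 8 / 1024**3
```

For LLaMA 7B (~6.74B actual parameters) at q4_0 this gives about 3.9 GB, matching the table.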
Original Weights
LLaMA 16-bit Weights
The original LLaMA weights converted to Transformers @ 16bit. A torrent is available as well, but it uses outdated configuration files that will need to be updated. Note that these aren't for general use, as the VRAM requirements are beyond consumer scope.
Type: Foundational
Filtering: None
Model | Type | Download |
---|---|---|
7B 16bit | HF Format | HuggingFace |
13B 16bit | HF Format | HuggingFace |
30B 16bit | HF Format | HuggingFace |
65B 16bit | HF Format | HuggingFace |
All the above | HF Format | Torrent Magnet |
LLaMA 4-bit Weights
The original LLaMA weights quantized to 4-bit. The GPU CUDA versions have outdated tokenizer and configuration files. It is recommended to either update them with this or use the universal LLaMA tokenizer.
The CPU old format version is before the recent quantization format change via pull #1405, and will not work with versions of llama.cpp beyond that pull. The CPU new format links have been converted to work with #1405 and beyond.
Type: Foundational
Filtering: None
Model | Type | Download |
---|---|---|
7B, 13B, 30B, 65B | CPU (old format) | Torrent Magnet |
7B, 13B, 30B, 65B | CPU (new format) | 7B, 13B, 30B |
7B, 13B, 30B, 65B | GPU CUDA (no groupsize) | Torrent Magnet |
7B, 13B, 30B, 65B | GPU CUDA (128gs) | Torrent Magnet |
7B, 13B, 30B, 65B | GPU Triton | Neko Institute of Science HF page |
Models/Finetunes/LoRAs
Wizard Vicuna 30B Uncensored (05/31/2023)
This is wizard-vicuna-30B trained with a subset of the dataset: responses that contained alignment/moralizing were removed. The intent is to train a model without built-in alignment, so that alignment of any sort can be added separately, for example with an RLHF LoRA.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
30B GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
30B | GPU | Q4 |
Chronos 13B (05/29/2023)
This model is primarily focused on chat, roleplay, and storywriting, but can accomplish other tasks such as simple reasoning and coding. Chronos generates very long outputs with coherent text, largely due to the human inputs it was trained on.
Type: Roleplay Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
13B | GPU | Q4 CUDA 128g, Q4 CUDA, Q4 Triton |
BluemoonRP 30B 4K (05/26/2023)
An RP/ERP-focused finetune of LLaMA 30B, trained on BluemoonRP logs. It is designed to simulate a 2-person RP session. This version has a 4K context token size, achieved with ALiBi.
It uses a non-standard format (LEAD/ASSOCIATE), so ensure that you read the model card and use the correct syntax.
Type: Roleplay
Filtering: None
Model | Type | Download |
---|---|---|
30B GGML | CPU | Q5_0 |
30B | GPU | Q4 CUDA 128g |
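ALiBi, used above to reach 4K context, drops positional embeddings and instead subtracts a per-head linear distance penalty from the attention logits. A minimal pure-Python sketch of the idea, following the geometric head-slope sequence from the ALiBi paper:

```python
def alibi_slopes(n_heads: int) -> list[float]:
    # Geometric sequence 2^(-8/n), 2^(-16/n), ..., per the ALiBi paper.
    start = 2 ** (-8 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    # Penalty grows linearly with query-key distance; future keys are masked.
    return [[-slope * (q - k) if k <= q else float("-inf")
             for k in range(seq_len)]
            for q in range(seq_len)]
```

Because the penalty is relative rather than learned per position, models trained this way can extrapolate past the context length seen during training.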
SuperHOT 30B Prototype (05/24/2023)
A LoRA trained on a variety of combined roleplaying datasets. Made by the creator of SuperCOT. This is a prototype at around 0.5 epochs. It uses a special format, so read the LoRA card for instructions.
It's not generally recommended for use yet, given it's an early prototype. This entry will be removed when the final version is posted.
Type: Roleplay Instruct
Filtering: ???
Model | Type | Download |
---|---|---|
30B LoRA | LoRA | HF Link |
30B GGML | CPU | None yet. |
30B | GPU | Q4 CUDA 128g |
WizardLM 30B Uncensored (05/23/2023)
WizardLM 30B trained with a subset of the dataset: responses that contained alignment/moralizing were removed. The intent is to train a WizardLM without built-in alignment, so that alignment of any sort can be added separately, for example with an RLHF LoRA.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
30B 16-bit | Unquantized | HF Link |
30B GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
30B | GPU | Q4 Triton |
Manticore 13B (05/20/2023)
Manticore 13B is a LLaMA 13B model fine-tuned for 3 epochs on a number of merged datasets.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B 16-bit | Unquantized | HF Link |
13B GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
13B | GPU | Q4 CUDA 128g |
Pygmalion/Metharme 13B (05/19/2023)
Pygmalion 13B is a dialogue model that uses LLaMA-13B as a base. The dataset includes RP/ERP content. Metharme 13B is an experimental instruct-tuned variation, which can be guided using natural language like other instruct models.
Type: Roleplay (Pyg), Roleplay Instruct (Meth)
Filtering: None
Model | Type | Download |
---|---|---|
13B Pygmalion/Metharme XOR | XOR | https://huggingface.co/PygmalionAI/ |
13B Pygmalion GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
13B Metharme GGML | CPU | Q4_1, Q5_1, Q8 |
13B Pygmalion | GPU | Q4 CUDA 128g |
13B Metharme | GPU | Q4 CUDA 128g |
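The XOR releases above ship only a bitwise difference against the original LLaMA weights, so distributing them doesn't redistribute Meta's files. The official decode scripts on each repo handle sharded checkpoints and hash checks, but the core recovery step is just a byte-wise XOR (file names here are hypothetical):

```python
def xor_files(base_path: str, diff_path: str, out_path: str,
              chunk: int = 1 << 20) -> None:
    """Recover target weights by XORing the released diff against the base file."""
    with open(base_path, "rb") as base, \
         open(diff_path, "rb") as diff, \
         open(out_path, "wb") as out:
        while True:
            a = base.read(chunk)
            b = diff.read(chunk)
            if not a:
                break
            out.write(bytes(x ^ y for x, y in zip(a, b)))
```

Since XOR is its own inverse, the same operation produces the diff from base and target, and recovers the target from base and diff.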
VicUnLocked 30B (05/18/2023)
A full-context LoRA fine-tuned for 1 epoch on the ShareGPT Vicuna Unfiltered dataset, with filtering mostly removed. There's also a half-context, 3-epoch version that you can get here.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
LoRA | LoRA | HF Link |
30B GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
30B | GPU | Q4 Triton, Q4 CUDA |
Wizard Mega 13B (05/16/2023)
Wizard Mega is a Llama 13B model fine-tuned on the ShareGPT, WizardLM, and Wizard-Vicuna datasets. These datasets have all been filtered to remove responses where the model replies with "As an AI language model...", etc., or refuses to respond.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q4_0, Q5_0, Q5_1, Q8 |
13B | GPU | Q4 CUDA 128g |
WizardLM 13B Uncensored (05/10/2023)
This is WizardLM trained with a subset of the dataset: responses that contained alignment/moralizing were removed. The intent is to train a WizardLM without built-in alignment, so that alignment of any sort can be added separately, for example with an RLHF LoRA.
Note that despite being an "uncensored" model, several tests have demonstrated that the model will still refuse to comply with certain requests.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q4, Q5, Q8 |
13B | GPU | Q4 CUDA 128g |
BluemoonRP 13B (05/07/2023)
An RP/ERP-focused finetune of LLaMA 13B, trained on BluemoonRP logs. It is designed to simulate a 2-person RP session. Two versions are provided: a standard 13B with 2K context and an experimental 13B with 4K context. It has a non-standard format (LEAD/ASSOCIATE), so ensure that you read the model card and use the correct syntax.
Type: Roleplay
Filtering: None
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q5 |
13B | GPU | Q4 CUDA 128g |
Vicuna 13B Cocktail (05/07/2023)
A Vicuna 1.1 13B finetune incorporating various datasets in addition to the unfiltered ShareGPT. This is an experiment attempting to enhance the creativity of Vicuna 1.1 while reducing censorship as much as possible. All datasets have been cleaned, and only the "instruct" portion of GPTeacher has been used. It has a non-standard format (USER/ASSOCIATE), so ensure that you read the model card and use the correct syntax.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q5, Q8 |
13B | GPU | Q4 CUDA 128g, Q4 Triton |
GPT4-x-AlpacaDente2-30B (05/05/2023)
ChanSung's Alpaca-LoRA-30B-elina merged with Open Assistant's second Finetune.
Type: Instruct
Filtering: Medium
Model | Type | Download |
---|---|---|
30B GGML | CPU | Awaiting re-quantization |
30B | GPU | Q4 CUDA (https://huggingface.co/askmyteapot/GPT4-x-AlpacaDente2-30b-4bit) |
Vicuna 13B Free v1.1 (05/01/2023)
A work-in-progress, community driven attempt to make an unfiltered version of Vicuna. It currently has an early stopping bug, and a partial workaround has been posted on the repo's model card.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q5, f16 |
13B | GPU | Q4 CUDA 128g |
Pygmalion/Metharme 7B (04/30/2023)
Pygmalion 7B is a dialogue model that uses LLaMA-7B as a base. The dataset includes RP/ERP content. Metharme 7B is an experimental instruct-tuned variation, which can be guided using natural language like other instruct models.
Type: Roleplay (Pyg), Roleplay Instruct (Meth)
Filtering: None
Model | Type | Download |
---|---|---|
7B Pygmalion/Metharme XOR | XOR | https://huggingface.co/PygmalionAI/ |
7B Pygmalion GGML | CPU | Q4_3, Q5_0, Q5_1, Q8 |
7B Metharme GGML | CPU | Q4_1, Q5_1, Q8, f32 |
7B Pygmalion | GPU | Q4 Triton, Q4 CUDA 128g |
7B Metharme | GPU | Q4 Triton, Q4 CUDA |
GPT4-X-Alpasta 30B (04/29/2023)
An attempt at improving Open Assistant's performance as an instruct model while retaining its excellent prose. The merge consists of ChanSung's GPT4-Alpaca LoRA and Open Assistant's native fine-tune.
It is an extremely coherent model for logic-based instruct outputs. While the prose is generally very good, it does suffer from the "Assistant" personality bleedthrough that plagues the OpenAssistant dataset, which can give you dry dialogue for creative writing/chatbot purposes. However, several accounts claim it's nowhere near as bad as OA's finetunes, and that the prose and coherence gains make up for it.
Type: Instruct
Filtering: Medium
Model | Type | Download |
---|---|---|
30B GGML | CPU | Q4_0 |
30B | GPU CUDA | Q4 CUDA, Q4 CUDA 128g |
OpenAssistant LLaMa 30B SFT 7 (04/23/2023)
An open-source alternative to OpenAI's ChatGPT/GPT-3.5 Turbo. However, it seems to suffer from overfitting and is heavily filtered. Not recommended for creative writing or chatbots, as the "assistant" personality constantly bleeds through, giving you dry dialogue.
Type: Instruct
Filtering: Heavy
Model | Type | Download |
---|---|---|
30B XOR | XOR | https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor |
30B GGML | CPU | Q4_0, Q5_0, Q5_1, Q8 |
30B | GPU | Q4 CUDA, Q4 CUDA 128g |
SuperCOT (04/22/2023)
SuperCOT is a LoRA trained to make LLaMA follow prompts for LangChain better, by infusing chain-of-thought datasets, code explanations and instructions, snippets, logical deductions, and Alpaca GPT-4 prompts.
Though designed to improve LangChain use, it's quite versatile and works very well for other tasks like creative writing and chatbots. The author also pruned a number of filters from the datasets. As of early May 2023, it's the most recommended model on /lmg/.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
Original LoRA | LoRA | https://huggingface.co/kaiokendev/SuperCOT-LoRA |
13B GGML | CPU | Q5_0, f16 |
30B GGML | CPU | Q4_1, Q5_1 |
13B | GPU | Q4 CUDA 128g |
30B | GPU | Q4 CUDA, Q4 CUDA 128g |
Dataset Formats
Alpaca
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Instruction
### Response:
Alpaca (with input)
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Instruction
### Input:
Input
### Response:
Metharme
<|system|>This is a text adventure game. Describe the scenario to the user and give him three options to pick from on each turn.<|user|>Input<|model|>
OpenAssistant
<|prompter|>Input<|endoftext|><|assistant|>
Vicuna 1.1
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: Input
ASSISTANT:
WizardLM
Input
### Response:
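These templates are plain string layouts, so prompt assembly is simple concatenation; here is a minimal sketch for two of them (helper names are mine, not from any library):

```python
def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-format prompt; uses the 'with input' variant when input_text is given."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

def vicuna_prompt(user_message: str) -> str:
    """Build a Vicuna 1.1-format prompt."""
    return (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's "
        "questions.\n\nUSER: " + user_message + "\nASSISTANT:"
    )
```

Matching the finetune's exact template (including newlines and trailing tokens) matters; a mismatched format is a common cause of rambling or ignored instructions.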
Filtering/Bias Rundown
Both bias and filtering are introduced into LLMs by modifying the training/finetuning data. Foundational models, which are the raw and untuned versions (i.e., the "original weights" above), primarily function as text generators/sentence-completion tools and typically lack intentional bias or filtering.
Instruct models take these raw weights and guide them through fine-tuning to adhere to specific instructions, which allows for the intentional manipulation of outputs by whoever is curating the dataset. And the current crop of instruct datasets are largely derived from GPT outputs, which are plagued with OpenAI's bias and filtering.
Filtering occurs when an instruct model outright refuses to generate an output in response to an instruction, because the model has been trained to deem the output as "offensive" or "unsafe". An example of a common filtering output is "I'm sorry but as an AI assistant, I cannot do that". It's usually coupled with moralizing that will tell you why your input was denied, and how it's "important" to be inclusive/non-offensive/etc.
Bias is a more subtle phenomenon that influences the model's outputs in a particular direction. For example, asking GPT-instruct derived models about controversial political, racial and social issues will typically result in outputs that align with left-wing narratives.
This also manifests as a "positivity" or "wholesomeness" bias/weighting. For example, you can remove filtering so that the model will comply with a request to output something it would ordinarily deem "derogatory" or "offensive", but it can, and usually will, skew the output to make it complimentary or "positive" instead. This can affect creative writing and RP in unwanted ways as well, as it tends to favor positive outcomes in stories, events, and conversations.
Removing filtering from a dataset is generally much easier than removing bias, the latter of which is often baked into the training data in ways that are difficult to detect and remove.
Previous Model List
The old rentry, retained for archiving purposes. Contains older and outdated models.