/lmg/ Model Links and Torrents
This list is no longer being maintained. Until a replacement is added, you can use the following links:
For a general list of LLaMA models and finetunes, see https://rentry.org/lmg-resources#llama-models-guides-resources
If you're looking for an updated list of the best "creative" models to RP/ERP with, see https://rentry.org/ayumi_erp_rating
- Changelog (MDY)
- 4-bit GPU Model Requirements
- 4-bit CPU/llama.cpp RAM Requirements
- Original Weights
- Models/Finetunes/LoRAs
- Wizard Vicuna 30B Uncensored (05/31/2023)
- Chronos 13B (05/29/2023)
- BluemoonRP 30B 4K (05/26/2023)
- SuperHOT 30B Prototype (05/24/2023)
- WizardLM 30B Uncensored (05/23/2023)
- Manticore 13B (05/20/2023)
- Pygmalion/Metharme 13B (05/19/2023)
- VicUnLocked 30B (05/18/2023)
- Wizard Mega 13B (05/16/2023)
- WizardLM 13B Uncensored (05/10/2023)
- BluemoonRP 13B (05/07/2023)
- Vicuna 13B Cocktail (05/07/2023)
- GPT4-x-AlpacaDente2-30B (05/05/2023)
- Vicuna 13B Free v1.1 (05/01/2023)
- Pygmalion/Metharme 7B (04/30/2023)
- GPT4-X-Alpasta 30B (04/29/2023)
- OpenAssistant LLaMa 30B SFT 7 (04/23/2023)
- SuperCOT (04/22/2023)
- Dataset Formats
- Filtering/Bias Rundown
- Previous Model List
Changelog (MDY)
[05-31-2023] - Added Wizard Vicuna 30B Uncensored
[05-29-2023] - Added Chronos 13B
[05-26-2023] - Added BluemoonRP 30B
[05-24-2023] - Added SuperHOT Prototype
4-bit GPU Model Requirements
VRAM Required takes full context (2048) into account. You may be able to load the model on GPUs with slightly less VRAM, but you will not be able to run at full context. If you do not have enough RAM to load the model, it will spill into swap. Groupsize models increase VRAM usage, as does running a LoRA alongside the model.
Model Parameters | VRAM Required | GPU Examples | RAM to Load |
---|---|---|---|
7B | 8GB | GTX 1660, RTX 2060, AMD RX 5700 XT, RTX 3050, RTX 3060, RTX 3070 | 6 GB |
13B | 12GB | AMD 6900xt, RTX 2060 12GB, 3060 12GB, 3080 12GB, A2000 | 12GB |
30B | 24GB | RTX 3090, RTX 4090, A4500, A5000, 6000, Tesla V100 | 32GB |
65B | 42GB | A100 80GB, NVIDIA Quadro RTX 8000, RTX A6000 | 64GB |
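As a rule of thumb, 4-bit weights occupy roughly half a byte per parameter, with the remainder of the budget going to activations and the KV cache at full context. A back-of-the-envelope sketch (the overhead constant is my assumption, not a measured value; the table's figures also bake in groupsize and allocator slack):

```python
def est_4bit_vram_gb(n_params_billion: float, context: int = 2048,
                     overhead_gb: float = 1.5) -> float:
    """Rough 4-bit VRAM estimate: ~0.5 bytes per parameter for weights,
    plus a flat allowance for activations/KV cache, scaled by context."""
    weights_gb = n_params_billion * 1e9 * 0.5 / 1024**3
    return weights_gb + overhead_gb * context / 2048
```

For 13B this gives roughly 7.5 GB of raw footprint, which is why 12 GB cards are listed as the comfortable tier.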
4-bit CPU/llama.cpp RAM Requirements
5-bit to 8-bit quantized models are becoming more common and will naturally require more RAM. I'll update these with the numbers when I have them.
Model | 4-bit | 5-bit | 8-bit |
---|---|---|---|
7B | 3.9 GB | | |
13B | 7.8 GB | | |
30B | 19.5 GB | | |
65B | 38.5 GB | | |
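The 4-bit column tracks parameter count times an effective bits-per-weight figure; here is a sketch for extrapolating the empty 5-bit and 8-bit columns (the bits-per-weight values are my approximations of llama.cpp's block formats, which carry scale bytes per block, not exact measurements):

```python
# Approximate effective bits per weight for llama.cpp block formats
# (each 32-weight block carries scale bytes, so e.g. Q4_0 costs ~5 bits, not 4).
EFFECTIVE_BITS = {"q4_0": 5.0, "q5_0": 6.0, "q8_0": 9.0, "f16": 16.0}

def est_ram_gb(n_params_billion: float, quant: str = "q4_0") -> float:
    """Estimate model RAM from parameter count and quantization width."""
    return n_params_billion * 1e9 * EFFECTIVE_BITS[quant] / 8 / 1024**3
```

For LLaMA 7B (~6.74B actual parameters) at q4_0 this gives about 3.9 GB, matching the table.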
Original Weights
LLaMA 16-bit Weights
The original LLaMA weights converted to Transformers @ 16bit. A torrent is available as well, but it uses outdated configuration files that will need to be updated. Note that these aren't for general use, as the VRAM requirements are beyond consumer scope.
Type: Foundational
Filtering: None
Model | Type | Download |
---|---|---|
7B 16bit | HF Format | HuggingFace |
13B 16bit | HF Format | HuggingFace |
30B 16bit | HF Format | HuggingFace |
65B 16bit | HF Format | HuggingFace |
All the above | HF Format | Torrent Magnet |
LLaMA 4-bit Weights
The original LLaMA weights quantized to 4-bit. The GPU CUDA versions have outdated tokenizer and configuration files. It is recommended to either update them with this or use the universal LLaMA tokenizer.
The CPU old format version is before the recent quantization format change via pull #1405, and will not work with versions of llama.cpp beyond that pull. The CPU new format links have been converted to work with #1405 and beyond.
Type: Foundational
Filtering: None
Model | Type | Download |
---|---|---|
7B, 13B, 30B, 65B | CPU (old format) | Torrent Magnet |
7B, 13B, 30B, 65B | CPU (new format) | 7B, 13B, 30B |
7B, 13B, 30B, 65B | GPU CUDA (no groupsize) | Torrent Magnet |
7B, 13B, 30B, 65B | GPU CUDA (128gs) | Torrent Magnet |
7B, 13B, 30B, 65B | GPU Triton | Neko Institute of Science HF page |
Models/Finetunes/LoRAs
Wizard Vicuna 30B Uncensored (05/31/2023)
This is wizard-vicuna-30B trained with a subset of the dataset: responses that contained alignment/moralizing were removed. The intent is to train a model without built-in alignment, so that alignment of any sort can be added separately, for example with an RLHF LoRA.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
30B GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
30B | GPU | Q4 |
Chronos 13B (05/29/2023)
This model is primarily focused on chat, roleplay, and storywriting, but can accomplish other tasks such as simple reasoning and coding. Chronos generates very long outputs with coherent text, largely due to the human inputs it was trained on.
Type: Roleplay Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
13B | GPU | Q4 CUDA 128g, Q4 CUDA, Q4 Triton |
BluemoonRP 30B 4K (05/26/2023)
An RP/ERP-focused finetune of LLaMA 30B, trained on BluemoonRP logs. It is designed to simulate a 2-person RP session. This version has a 4K context token size, achieved with ALiBi.
It uses a non-standard format (LEAD/ASSOCIATE), so ensure that you read the model card and use the correct syntax.
Type: Roleplay
Filtering: None
Model | Type | Download |
---|---|---|
30B GGML | CPU | Q5_0 |
30B | GPU | Q4 CUDA 128g |
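ALiBi, used above to reach 4K context, drops positional embeddings and instead subtracts a per-head linear distance penalty from the attention logits. A minimal pure-Python sketch of the idea, following the geometric head-slope sequence from the ALiBi paper:

```python
def alibi_slopes(n_heads: int) -> list[float]:
    # Geometric sequence 2^(-8/n), 2^(-16/n), ..., per the ALiBi paper.
    start = 2 ** (-8 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    # Penalty grows linearly with query-key distance; future keys are masked.
    return [[-slope * (q - k) if k <= q else float("-inf")
             for k in range(seq_len)]
            for q in range(seq_len)]
```

Because the penalty is relative rather than learned per position, models trained this way can extrapolate past the context length seen during training.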
SuperHOT 30B Prototype (05/24/2023)
A LoRA trained on a variety of combined roleplaying datasets. Made by the creator of SuperCOT. This is a prototype at around 0.5 epochs. It uses a special format, so read the LoRA card for instructions.
It's not generally recommended for use yet, given it's an early prototype. This entry will be removed when the final version is posted.
Type: Roleplay Instruct
Filtering: ???
Model | Type | Download |
---|---|---|
30B LoRA | LoRA | HF Link |
30B GGML | CPU | None yet. |
30B | GPU | Q4 CUDA 128g |
WizardLM 30B Uncensored (05/23/2023)
WizardLM 30B trained with a subset of the dataset: responses that contained alignment/moralizing were removed. The intent is to train a WizardLM without built-in alignment, so that alignment of any sort can be added separately, for example with an RLHF LoRA.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
30B 16-bit | Unquantized | HF Link |
30B GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
30B | GPU | Q4 Triton |
Manticore 13B (05/20/2023)
Manticore 13B is a LLaMA 13B model fine-tuned for 3 epochs on a number of merged datasets.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B 16-bit | Unquantized | HF Link |
13B GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
13B | GPU | Q4 CUDA 128g |
Pygmalion/Metharme 13B (05/19/2023)
Pygmalion 13B is a dialogue model that uses LLaMA-13B as a base. The dataset includes RP/ERP content. Metharme 13B is an experimental instruct-tuned variation, which can be guided using natural language like other instruct models.
Type: Roleplay (Pyg), Roleplay Instruct (Meth)
Filtering: None
Model | Type | Download |
---|---|---|
13B Pygmalion/Metharme XOR | XOR | https://huggingface.co/PygmalionAI/ |
13B Pygmalion GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
13B Metharme GGML | CPU | Q4_1, Q5_1, Q8 |
13B Pygmalion | GPU | Q4 CUDA 128g |
13B Metharme | GPU | Q4 CUDA 128g |
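The XOR releases above ship only a bitwise difference against the original LLaMA weights, so distributing them doesn't redistribute Meta's files. The official decode scripts on each repo handle sharded checkpoints and hash checks, but the core recovery step is just a byte-wise XOR (file names here are hypothetical):

```python
def xor_files(base_path: str, diff_path: str, out_path: str,
              chunk: int = 1 << 20) -> None:
    """Recover target weights by XORing the released diff against the base file."""
    with open(base_path, "rb") as base, \
         open(diff_path, "rb") as diff, \
         open(out_path, "wb") as out:
        while True:
            a = base.read(chunk)
            b = diff.read(chunk)
            if not a:
                break
            out.write(bytes(x ^ y for x, y in zip(a, b)))
```

Since XOR is its own inverse, the same operation produces the diff from base and target, and recovers the target from base and diff.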
VicUnLocked 30B (05/18/2023)
A full-context LoRA fine-tuned for 1 epoch on the ShareGPT Vicuna Unfiltered dataset, with filtering mostly removed. There's also a half-context, 3-epoch version that you can get here.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
LoRA | LoRA | HF Link |
30B GGML | CPU | Q4_0, Q4_1, Q5_0, Q5_1, Q8 |
30B | GPU | Q4 Triton, Q4 CUDA |
Wizard Mega 13B (05/16/2023)
Wizard Mega is a Llama 13B model fine-tuned on the ShareGPT, WizardLM, and Wizard-Vicuna datasets. These datasets have all been filtered to remove responses where the model replies with "As an AI language model...", etc., or refuses to respond.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q4_0, Q5_0, Q5_1, Q8 |
13B | GPU | Q4 CUDA 128g |
WizardLM 13B Uncensored (05/10/2023)
This is WizardLM trained with a subset of the dataset: responses that contained alignment/moralizing were removed. The intent is to train a WizardLM without built-in alignment, so that alignment of any sort can be added separately, for example with an RLHF LoRA.
Note that despite being an "uncensored" model, several tests have demonstrated that the model will still refuse to comply with certain requests.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q4, Q5, Q8 |
13B | GPU | Q4 CUDA 128g |
BluemoonRP 13B (05/07/2023)
An RP/ERP-focused finetune of LLaMA 13B, trained on BluemoonRP logs. It is designed to simulate a 2-person RP session. Two versions are provided: a standard 13B with 2K context and an experimental 13B with 4K context. It has a non-standard format (LEAD/ASSOCIATE), so ensure that you read the model card and use the correct syntax.
Type: Roleplay
Filtering: None
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q5 |
13B | GPU | Q4 CUDA 128g |
Vicuna 13B Cocktail (05/07/2023)
A Vicuna 1.1 13B finetune incorporating various datasets in addition to the unfiltered ShareGPT. This is an experiment attempting to enhance the creativity of Vicuna 1.1 while reducing censorship as much as possible. All datasets have been cleaned, and only the "instruct" portion of GPTeacher has been used. It has a non-standard format (USER/ASSOCIATE), so ensure that you read the model card and use the correct syntax.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q5, Q8 |
13B | GPU | Q4 CUDA 128g, Q4 Triton |
GPT4-x-AlpacaDente2-30B (05/05/2023)
ChanSung's Alpaca-LoRA-30B-elina merged with Open Assistant's second Finetune.
Type: Instruct
Filtering: Medium
Model | Type | Download |
---|---|---|
30B GGML | CPU | Awaiting re-quantization |
30B | GPU | Q4 CUDA (https://huggingface.co/askmyteapot/GPT4-x-AlpacaDente2-30b-4bit) |
Vicuna 13B Free v1.1 (05/01/2023)
A work-in-progress, community driven attempt to make an unfiltered version of Vicuna. It currently has an early stopping bug, and a partial workaround has been posted on the repo's model card.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
13B GGML | CPU | Q5, f16 |
13B | GPU | Q4 CUDA 128g |
Pygmalion/Metharme 7B (04/30/2023)
Pygmalion 7B is a dialogue model that uses LLaMA-7B as a base. The dataset includes RP/ERP content. Metharme 7B is an experimental instruct-tuned variation, which can be guided using natural language like other instruct models.
Type: Roleplay (Pyg), Roleplay Instruct (Meth)
Filtering: None
Model | Type | Download |
---|---|---|
7B Pygmalion/Metharme XOR | XOR | https://huggingface.co/PygmalionAI/ |
7B Pygmalion GGML | CPU | Q4_3, Q5_0, Q5_1, Q8 |
7B Metharme GGML | CPU | Q4_1, Q5_1, Q8, f32 |
7B Pygmalion | GPU | Q4 Triton, Q4 CUDA 128g |
7B Metharme | GPU | Q4 Triton, Q4 CUDA |
GPT4-X-Alpasta 30B (04/29/2023)
An attempt at improving Open Assistant's performance as an instruct model while retaining its excellent prose. The merge consists of ChanSung's GPT4-Alpaca LoRA and Open Assistant's native fine-tune.
It is an extremely coherent model for logic-based instruct outputs. While the prose is generally very good, it does suffer from the "Assistant" personality bleedthrough that plagues the OpenAssistant dataset, which can give you dry dialogue for creative writing/chatbot purposes. However, several accounts claim it's nowhere near as bad as OA's finetunes, and that the prose and coherence gains make up for it.
Type: Instruct
Filtering: Medium
Model | Type | Download |
---|---|---|
30B GGML | CPU | Q4_0 |
30B | GPU CUDA | Q4 CUDA, Q4 CUDA 128g |
OpenAssistant LLaMa 30B SFT 7 (04/23/2023)
An open-source alternative to OpenAI's ChatGPT/GPT-3.5 Turbo. However, it seems to suffer from overfitting and is heavily filtered. Not recommended for creative writing or chatbots, as the "assistant" personality constantly bleeds through, giving you dry dialogue.
Type: Instruct
Filtering: Heavy
Model | Type | Download |
---|---|---|
30B XOR | XOR | https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor |
30B GGML | CPU | Q4_0, Q5_0, Q5_1, Q8 |
30B | GPU | Q4 CUDA, Q4 CUDA 128g |
SuperCOT (04/22/2023)
SuperCOT is a LoRA trained to make LLaMA follow prompts for LangChain better, by infusing chain-of-thought datasets, code explanations and instructions, snippets, logical deductions, and Alpaca GPT-4 prompts.
Though designed to improve LangChain use, it's quite versatile and works very well for other tasks like creative writing and chatbots. The author also pruned a number of filters from the datasets. As of early May 2023, it's the most recommended model on /lmg/.
Type: Instruct
Filtering: Light
Model | Type | Download |
---|---|---|
Original LoRA | LoRA | https://huggingface.co/kaiokendev/SuperCOT-LoRA |
13B GGML | CPU | Q5_0, f16 |
30B GGML | CPU | Q4_1, Q5_1 |
13B | GPU | Q4 CUDA 128g |
30B | GPU | Q4 CUDA, Q4 CUDA 128g |
Dataset Formats
Alpaca
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Instruction
### Response:
Alpaca (with input)
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Instruction
### Input:
Input
### Response:
Metharme
<|system|>This is a text adventure game. Describe the scenario to the user and give him three options to pick from on each turn.<|user|>Input<|model|>
OpenAssistant
<|prompter|>Input<|endoftext|><|assistant|>
Vicuna 1.1
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: Input
ASSISTANT:
WizardLM
Input
### Response:
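These templates are plain string layouts, so prompt assembly is simple concatenation; here is a minimal sketch for two of them (helper names are mine, not from any library):

```python
def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-format prompt; uses the 'with input' variant when input_text is given."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

def vicuna_prompt(user_message: str) -> str:
    """Build a Vicuna 1.1-format prompt."""
    return (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's "
        "questions.\n\nUSER: " + user_message + "\nASSISTANT:"
    )
```

Matching the finetune's exact template (including newlines and trailing tokens) matters; a mismatched format is a common cause of rambling or ignored instructions.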
Filtering/Bias Rundown
Both bias and filtering are introduced into LLMs by modifying the training/finetuning data. Foundational models, which are the raw and untuned versions (i.e., the "original weights" above), primarily function as text generators/sentence-completion tools and typically lack intentional bias or filtering.
Instruct models take these raw weights and guide them through fine-tuning to adhere to specific instructions, which allows for the intentional manipulation of outputs by whoever is curating the dataset. And the current crop of instruct datasets are largely derived from GPT outputs, which are plagued with OpenAI's bias and filtering.
Filtering occurs when an instruct model outright refuses to generate an output in response to an instruction, because the model has been trained to deem the output as "offensive" or "unsafe". An example of a common filtering output is "I'm sorry but as an AI assistant, I cannot do that". It's usually coupled with moralizing that will tell you why your input was denied, and how it's "important" to be inclusive/non-offensive/etc.
Bias is a more subtle phenomenon that influences the model's outputs in a particular direction. For example, asking GPT-instruct derived models about controversial political, racial and social issues will typically result in outputs that align with left-wing narratives.
This also manifests as a "positivity" or "wholesomeness" bias/weighting. For example, you can remove filtering so that the model will comply with a request to output something it would ordinarily deem "derogatory" or "offensive", but it can, and usually will, skew the output to make it complimentary or "positive" instead. This can affect creative writing and RP in unwanted ways as well, as it tends to favor positive outcomes in stories, events, and conversations.
Removing filtering from a dataset is generally much easier than removing bias, the latter of which is often baked into the training data in ways that are difficult to detect and remove.
Previous Model List
The old rentry, retained for archiving purposes. Contains older and outdated models.