Models for llama.cpp (ggml format)

LLaMA quantized 4-bit weights (ggml q4_0)

SHA256 checksums:


2dad53e70ca521fedcf9f9be5c26c15df602487a9c008bdafbb2bf8f946b6bf0  llama-7b-ggml-q4_0/ggml-model-q4_0.bin
9cd4d6c1f5f42d5abf529c51bde3303991fba912ab8ed452adfd7c97a4be77d7  llama-13b-ggml-q4_0/ggml-model-q4_0.bin
daefbc6b1b644a75be0286ef865253ab3786e96a2c1bca8b71216b1751eee63e  llama-33b-ggml-q4_0/ggml-model-q4_0.bin
d58a29c8403ecbd14258bbce07d90894fc5a8be25b9d359463c18f9f2ef96eb6  llama-65b-ggml-q4_0/ggml-model-q4_0.bin
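
A minimal sketch of how a downloaded file could be checked against a list like the one above. It assumes each line has the form "digest, whitespace, relative path" and that the files sit under a local directory; the function names and the `base_dir` parameter are illustrative and not part of any tool mentioned here.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte weights never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_line(line):
    """Split '<hex digest>  <relative path>' into (digest, path)."""
    digest, path = line.strip().split(None, 1)
    return digest.lower(), path


def verify(lines, base_dir="."):
    """Return {path: True/False/None}; None means the file is missing."""
    results = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between entries
        expected, rel_path = parse_checksum_line(line)
        target = Path(base_dir) / rel_path
        results[rel_path] = (
            sha256_of_file(target) == expected if target.is_file() else None
        )
    return results
```

For example, `verify(open("checksums.txt"), base_dir="models")` would report each listed path as matching, mismatching, or missing, where `checksums.txt` is a hypothetical file holding the lines above.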

ggml model file magic: 0x67676a74 (ggjt in hex)
ggml model file version: 1
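
The magic and version listed above can be checked before loading a file. The sketch below assumes the file begins with two little-endian unsigned 32-bit integers (magic, then version); that layout is an assumption here, and the helper names are hypothetical.

```python
import struct

GGJT_MAGIC = 0x67676A74  # value listed above; its bytes spell "ggjt" in ASCII


def read_ggml_header(path):
    """Return (magic, version) from the first 8 bytes of a model file.

    Assumption: both fields are little-endian unsigned 32-bit integers.
    """
    with open(path, "rb") as f:
        raw = f.read(8)
    if len(raw) < 8:
        raise ValueError("file too short to contain a magic and a version")
    return struct.unpack("<II", raw)


def is_ggjt_v1(path):
    """True if the header matches the magic and version stated above."""
    magic, version = read_ggml_header(path)
    return magic == GGJT_MAGIC and version == 1
```

A file that fails this check was probably converted with a different ggml format revision and may need to be reconverted for the llama.cpp build in use.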

Alpaca quantized 4-bit weights (ggml q4_0)

Model	Download
LLaMA 7B fine-tune from chavinlo/alpaca-native	2023-03-31 torrent magnet
LLaMA 7B merged with tloen/alpaca-lora-7b LoRA	2023-03-31 torrent magnet
LLaMA 13B merged with chansung/alpaca-lora-13b LoRA	2023-03-31 torrent magnet
LLaMA 33B merged with chansung/alpaca-lora-30b LoRA	2023-03-31 torrent magnet

Tutorial link for llama.cpp

Example:
./main --model ggml-model-q4_0.bin --file prompts/alpaca.txt --instruct --ctx_size 2048 --keep -1

Tutorial link for koboldcpp

SHA256 checksums:


f5e264b10944c55a84810e8073dfdcd653fa8e47ff50ea043ec071051ac7821d  alpaca-7b-ggml-q4_0-native-finetune/ggml-model-q4_0.bin
d9777baad5cf6a5d196e70867338d8cc3c7af68c7744e68de839a522983860d7  alpaca-7b-ggml-q4_0-lora-merged/ggml-model-q4_0.bin
3838aa32651c65948e289374abd71f6feab1a62a4921a648e30d979df86a4af3  alpaca-13b-ggml-q4_0-lora-merged/ggml-model-q4_0.bin
2267ed1dc0bf0d6d300ba292c25083c7fa5395f3726c7c68a49b2be19a64b349  alpaca-33b-ggml-q4_0-lora-merged/ggml-model-q4_0.bin

ggml model file magic: 0x67676a74 (ggjt in hex)
ggml model file version: 1

GPT4All 7B quantized 4-bit weights (ggml q4_0)

2023-03-31 torrent magnet

Tutorial link for llama.cpp

GPT4All can be used with llama.cpp in the same way as the other ggml models.

Tutorial link for koboldcpp

SHA256 checksums:


9f6cd4830a3c45a86147c80a32888e7be8f8a489284c87cdb882a7cfe40940c1  gpt4all-unfiltered-7b-ggml-q4_0-lora-merged/ggml-model-q4_0.bin
de314c5ee155ac40a03ca3b3be85ba2b02aef9e9f083c411c0b4490689dd047e  gpt4all-7b-ggml-q4_0-lora-merged/ggml-model-q4_0.bin

ggml model file magic: 0x67676a74 (ggjt in hex)
ggml model file version: 1

GPT4 x Alpaca 13B quantized 4-bit weights (ggml q4_0)

2023-04-01 torrent magnet

Tutorial link for llama.cpp

GPT4 x Alpaca can be used with llama.cpp in the same way as the other ggml models.
Text generation with this version is faster than with the GPTQ-quantized one.

Tutorial link for koboldcpp

SHA256 checksum:


e6b77ebf297946949b25b3c4b870f10cdc98fb9fcaa6d19cef4dda9021031580  gpt4-x-alpaca-13b-ggml-q4_0/ggml-model-q4_0.bin

ggml model file magic: 0x67676a74 (ggjt in hex)
ggml model file version: 1

Model source

GPT4 x Alpaca 13B quantized 4-bit weights (ggml q4_1 from GPTQ with groupsize 128)

2023-04-01 torrent magnet

Tutorial link for llama.cpp

GPT4 x Alpaca can be used with llama.cpp in the same way as the other ggml models.

Tutorial link for koboldcpp

SHA256 checksum:


d4a640a1ce33009c244a361c6f87733aacbc2bea90e84d3c304a4c8be2bdf22d  gpt4-x-alpaca-13b-ggml-q4_1-from-gptq-4bit-128g/ggml-model-q4_1.bin

ggml model file magic: 0x67676a74 (ggjt in hex)
ggml model file version: 1

Model source

Vicuna 13B quantized 4-bit weights (ggml q4_0)

2023-04-03 torrent magnet

Tutorial link for llama.cpp

Vicuna can be used with llama.cpp in the same way as the other ggml models.

Tutorial link for koboldcpp

SHA256 checksum:


f96689a13c581f53b616887b2efe82bbfbc5321258dbcfdbe69a22076a7da461  vicuna-13b-ggml-q4_0-delta-merged/ggml-model-q4_0.bin

ggml model file magic: 0x67676a74 (ggjt in hex)
ggml model file version: 1

Model source

OpenAssistant LLaMA 13B WIP fine-tune quantized 4-bit weights (ggml q4_0 & q4_1)

Variant: dvruette/oasst-llama-13b-2-epochs

2023-04-07 torrent magnet | HuggingFace Hub download

Tutorial link for llama.cpp

Tutorial link for koboldcpp

SHA256 checksums:


fe77206c7890ecd0824c7b6b6a6deab92e471366b2e4271c05ece9a686474ef6  ggml-model-q4_0.bin
412da683b6ab0f710ce0adc8bc36db52bb92df96698558c5f2a1399af9bd0a78  ggml-model-q4_1.bin

ggml model file magic: 0x67676a74 (ggjt in hex)
ggml model file version: 1

More details
GPTQ-quantized model source
Torrent source

Alpacino 13B fine-tune 4-bit weights (ggml q4_0)

Variant: digitous/Alpacino13b

HuggingFace Hub download

Tutorial link for llama.cpp

Tutorial link for koboldcpp

SHA256 checksum:


af65956d0c533d5cf1250f8e08a493528c87c9635361f493b2ac5409eb73d73a  Alpacino-13b-q4_0.bin

ggml model file magic: 0x67676a74 (ggjt in hex)
ggml model file version: 1

More information

Alpacino 33B fine-tune 4-bit weights (ggml q4_0)

Variant: digitous/Alpacino30b

2023-04-17 torrent magnet | HuggingFace Hub download

Tutorial link for llama.cpp

Tutorial link for koboldcpp

SHA256 checksum:


ac8487e14714bda9bf6efdbbba983913f207f8f15e62be00dd4f9a926ddfb6f3  ggml-model-q4_0.bin

ggml model file magic: 0x67676a74 (ggjt in hex)
ggml model file version: 1

Torrent source

Models for HuggingFace 🤗

Updated tokenizer and model configuration files can be found here.

LLaMA float16 weights

2023-03-26 torrent magnet | HuggingFace Hub downloads

Tutorial link for Text generation web UI

Torrent source and SHA256 checksums

Vicuna 13B float16 weights

2023-04-03 torrent magnet

Tutorial link for Text generation web UI

Model source

LLaMA quantized 4-bit weights (GPTQ format without groupsize)

2023-03-26 torrent magnet

Tutorial link for Text generation web UI

SHA256 checksums:


09841a1c4895e1da3b05c1bdbfb8271c6d43812661e4348c862ff2ab1e6ff5b3  llama-7b-4bit/llama-7b-4bit.safetensors
edfa0b4060aae392b1e9df21fb60a97d78c9268ac6972e3888f6dc955ba0377b  llama-13b-4bit/llama-13b-4bit.safetensors
4cb560746fe58796233159612d8d3c9dbdebdf6f0443b47be71643f2f91b8541  llama-30b-4bit/llama-30b-4bit.safetensors
886ce814ed54c4bd6850e2216d5f198c49475210f8690f45dc63365d9aff3177  llama-65b-4bit/llama-65b-4bit.safetensors

Torrent source and more information

LLaMA quantized 4-bit weights (GPTQ format with groupsize 128)

2023-03-26 torrent magnet

Tutorial link for Text generation web UI

Groupsize 128 is a better choice for the 13B, 33B and 65B models, according to this.

SHA256 checksums:


ed8ec9c9f0ebb83210157ad0e3c5148760a4e9fd2acfb02cf00f8f2054d2743b  llama-7b-4bit-128g/llama-7b-4bit-128g.safetensors
d3073ef1a2c0b441f95a5d4f8a5aa3b82884eef45d8997270619cb29bcc994b8  llama-13b-4bit-128g/llama-13b-4bit-128g.safetensors
8b7d75d562938823c4503b956cb4b8af6ac0a5afbce2278566cc787da0f8f682  llama-30b-4bit-128g/llama-30b-4bit-128g.safetensors
f1418091e3307611fb0a213e50a0f52c80841b9c4bcba67abc1f6c64c357c850  llama-65b-4bit-128g/llama-65b-4bit-128g.safetensors

Torrent source and more information

Alpaca quantized 4-bit weights (GPTQ format with groupsize 128)

Model	Download
LLaMA 7B fine-tune from ozcur/alpaca-native-4bit as safetensors	2023-03-29 torrent magnet
LLaMA 33B merged with baseten/alpaca-30b LoRA by an anon	2023-03-26 torrent magnet | extra config files

Tutorial link for Text generation web UI

SHA256 checksums:


17d6ba8f83be89f8dfa05cd4720cdd06b4d32c3baed79986e3ba1501b2305530  Alpaca-7B-GPTQ-4bit-128g-native-finetune_2023-03-29/alpaca-7b-4bit-128g-native-finetune.safetensors
a2f8d202ce61b1b612afe08c11f97133c1d56076d65391e738b1ab57c854ee05  Alpaca-30B-4bit-128g/alpaca-30b-hf-4bit.safetensors