17-Apr-2023
tk - token, on average a word is around 2 tokens.
ctx - context, "memory" of the model
S - SOTA (>>>/g/aicg 🦗🦗🦗 or cryostasis 💤💤💤)
G - Goated (Good)
M - Mid (Middling)
L - Common L (Lackluster)
D - Dogshit
MODEL TABLE
https://github.com/underlines/awesome-marketing-datascience/blob/master/awesome-ai.md#llama-models
https://docs.google.com/spreadsheets/d/1JUFgv-iwQKBzTxgCXVXObbj6jqKrvRuzoVEFwZr2LCU/edit?usp=sharing
(once finished, let's convert it to markdown and put it here)
B-Model | Model | Link | Purpose | Local | CPP | Horde | Colab | Quality |
---|---|---|---|---|---|---|---|---|
OPTIONS EXPLAINED
KoboldAI Colab
koboldai.org/colab
Blazing fast generations.
TPUs BORKED AS OF NOW
With a TPU, lets you run models based on OPT-13B/GPT-NeoX-20B (L)
Without a TPU, lets you run models based on GPT-J-6B (D)
ghetto colab (UX bad)
Blazing fast generations.
https://pastebin.com/cYpykqUe - save it as a *.ipynb file, upload it to Google Colab, then paste in the .safetensors 13B-4bit-128g LLaMA (M) link from Hugging Face (see table). Finetunes and LoRAs need fiddling.
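If you'd rather grab the notebook from a terminal, something like this should work (pastebin serves raw pastes at /raw/, the output filename is up to you), then upload the file to Colab via File > Upload notebook:
curl -L -o ghetto-colab.ipynb https://pastebin.com/raw/cYpykqUe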
KoboldAI Horde
lite.koboldai.net
Relies on other people hosting the model, so availability and speeds vary (anywhere from blazing fast to 200s per 100tk).
Since other people host the model, they can read your logs (it takes effort on their part to do it, but they can); they won't have your IP.
Any model can be hosted; queue length is usually proportional to model quality, see table.
For hosting models you earn Kudos; Kudos let you skip the queue and enjoy the models at blazing fast generation speeds. Hosting is the same process as setting up KoboldAI Local on GPU (I think).
github.com/db0/KoboldAI-Horde-Bridge
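Rough sketch for grabbing the bridge (the exact config and startup file names are whatever the repo README says, don't take them from here):
git clone https://github.com/db0/KoboldAI-Horde-Bridge
cd KoboldAI-Horde-Bridge
- then follow the README: fill in the config template with your Horde API key and the URL of your locally running KoboldAI instance, and run the bridge script it points to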
NovelAI Trial
novelai.net/trial
Blazing fast generations.
Good UX. I don't know much about them; some other anon should fill this in.
50 actions without login, 50 with one, all on Euterpe (L)
Runpod
from $0.44/h (30B LLaMA) up to $0.79/h (65B LLaMA)
Blazing fast generations.
Fiddly, and you need some know-how.
Look table.
ghetto colab + runpod (UX bad)
Don't attempt before trying in google colab
arch.b4k.co/vg/thread/425535148/#q425548406
- download raw https://pastebin.com/cYpykqUe save it as an .ipynb filetype
- make an account and put some funds on it
- go to "Secure Cloud" and pick your GPU. This is based on what model you want to run.
- 7B / 13B: Just use the free colab on Google.
- 30B: 3090 (0.44 / hr), 4090 (0.69 / hr) (can sometimes be a little fucky at full 2048 context so might need to lower it slightly).
- 65B: A40 / A6000 (both 0.79 / hr) (no issues at full context).
- pick the Pytorch template, give yourself disk space (go 100 GB each), make sure Jupyter notebook is enabled.
- deploy the pod, hit "Connect", then "Connect to Jupyter Lab", which opens up a notebook window.
- go to the Jupyter notebook, there's not a good way to upload the notebook directly from your file system, so open it in google colab and paste the cells into the new notebook. Make sure the "use_drive" flag is off in the first cell.
- run the cells and you should be good to go.
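Once the pod is running, a quick sanity check in a fresh notebook cell confirms you actually got the GPU you picked (the ! makes Jupyter run it as a shell command):
!nvidia-smi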
KoboldAI + runpod haxx0rz (no clue, no easy way to do it, but theoretically it should be possible)
koboldai.org/runpod only has native support for (L) tier models
KoboldAI Local
Blazing fast generation.
Minimal requirements - 11GB VRAM for 13B (M) or 24GB for 30B (G)
I'm out of the loop on how to set it up, another anon should fill this in
rentry.org/TESFT-LLaMa <- probably not very useful linkdump
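Quick way to check how much VRAM you actually have (NVIDIA cards with drivers installed; AMD users need their own tooling):
nvidia-smi --query-gpu=name,memory.total --format=csv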
KoboldCPP
koboldai.org/cpp
Slow (improving, but still slow)
minimal requirements:
RAM - 13B needs >10GB, 30B needs >18GB
CPU - works on everything, but pre-AVX1 will be extremely sluggish, AVX1 (2011 and onwards) will be very slow, AVX2 (2013 and onwards) will be slow, AVX-512 (2019 and onwards) will run at a mediocre but usable speed (I'm guessing about the older CPUs, kinda making shit up, correct pls)
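To check which AVX level your CPU supports on Linux (Windows users: look up your CPU model or use something like CPU-Z):
grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u
- no output at all means no AVX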
On Windows
- base version: drag the model into the .exe
- development speedup features (~2x context loading speed, ~1.2x inference speed):
- open powershell in the place where the koboldcpp.exe is
koboldcpp.exe [ggml_model.bin] --useclblast [platform_id] [device_id] --smartcontext
You will have to guess the Platform and Device IDs; 0 0 or 0 1 are the most common. [ggml_model.bin] is the name of your model file.
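Concrete example (the .bin filename is made up, use whatever model you actually downloaded):
koboldcpp.exe ggml-model-q4_0.bin --useclblast 0 0 --smartcontext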
On Debian/Ubuntu
git clone https://github.com/LostRuins/koboldcpp
sudo apt install libclblast-dev libopenblas-dev opencl-c-headers
cd koboldcpp
make LLAMA_CLBLAST=1
- [optional]
sudo apt install clinfo
- [optional, you will get Platform and Device ID, but you can just guess]
clinfo -l
python koboldcpp.py [ggml_model.bin] --useclblast [platform_id] [device_id] --smartcontext
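Concrete example again (made-up filename, substitute your own):
python koboldcpp.py ggml-model-q4_0.bin --useclblast 0 0 --smartcontext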
Weights
Weights are the model you want to run; without [ggml_model.bin]
you can't do anything, see table. Everything that has been converted to .bin works (without fiddling); .bin versions of the model are strictly necessary.
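For example, Hugging Face model pages serve files at a /resolve/main/ URL, so a download can look like this (<user>, <repo> and the filename are placeholders, take the real link from the table):
wget https://huggingface.co/<user>/<repo>/resolve/main/ggml-model-q4_0.bin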
NovelAI Subscription
Blazing fast generations.
Good UX. As of now, $25/mo will only give you access to Krake (M), $10-15/mo to Euterpe (L); you can't use your own models.