17-Apr-2023
tk - token; on average a word is around 1-2 tokens.
ctx - context, "memory" of the model
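For scale (rough rule of thumb, not exact): the common 2048 tk ctx works out to very roughly 1,000-2,000 English words of "memory".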

S - SOTA (>>>/g/aicg 🦗🦗🦗 or cryostasis 💤💤💤)
G - Goated (Good)
M - Mid (Middling)
L - Common L (Lackluster)
D - Dogshit

MODEL TABLE

https://github.com/underlines/awesome-marketing-datascience/blob/master/awesome-ai.md#llama-models
https://docs.google.com/spreadsheets/d/1JUFgv-iwQKBzTxgCXVXObbj6jqKrvRuzoVEFwZr2LCU/edit?usp=sharing
(once the table is finished, let's convert it to markdown and put it here)

B-Model | Model Link | Purpose | Local | CPP | Horde | Colab | Quality

OPTIONS EXPLAINED


KoboldAI Colab

koboldai.org/colab
Blazing fast generations.
TPUs BORKED AS OF NOW
With a TPU it lets you run models based on OPT-13B/GPT-NeoX-20B (L)
Without a TPU it lets you run models based on GPT-J-6B (D)

ghetto colab (UX bad)
Blazing fast generations.
https://pastebin.com/cYpykqUe - save it as an *.ipynb, upload it to Google Colab, then paste in the .safetensors 13b-4bit-128g LLaMA (M) link from Hugging Face (see the table); finetunes and LoRAs need fiddling.


KoboldAI Horde

lite.koboldai.net
Relies on other people hosting the model; availability and speed vary (anywhere between blazing fast and 200s per 100tk)
As other people host the model they are able to read your logs (it takes effort on their part, but they can); they won't have your IP.
Any model can be hosted; queue length is often proportional to quality, see the table.

For hosting models you earn Kudos; Kudos let you skip the queue and enjoy the models at blazing fast generation speeds. Setup is the same process as KoboldAI Local on GPU (I think), plus the bridge:

github.com/db0/KoboldAI-Horde-Bridge


NovelAI Trial

novelai.net/trial
Blazing fast generations.
I don't know much about them (some other anon should fill this in); good UX.
50 actions without login, 50 with, all on Euterpe (L)


Runpod

from $0.44/h (30B LLaMA) up to $0.79/h (65B LLaMA)
Blazing fast generations.
Fiddly, and you need some know-how.
See the table.
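Rough cost math (assuming a 4-hour session, pick your own number): 65B at $0.79/h comes to about $3.16, 30B on a 3090 at $0.44/h comes to about $1.76.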

ghetto colab + runpod (UX bad)

Don't attempt this before trying the Google Colab version.

arch.b4k.co/vg/thread/425535148/#q425548406

  • download the raw https://pastebin.com/cYpykqUe and save it as an .ipynb file
  • make an account and put some funds on it
  • go to "Secure Cloud" and pick your GPU. This is based on what model you want to run.
    • 7B / 13B: Just use the free colab on Google.
    • 30B: 3090 (0.44 / hr), 4090 (0.69 / hr) (can sometimes be a little fucky at full 2048 context so might need to lower it slightly).
    • 65B: A40 / A6000 (both 0.79 / hr) (no issues at full context).
  • pick the Pytorch template, give yourself memory (go 100 GB each), make sure Jupyter notebook is enabled.
  • deploy the pod, hit "Connect", then "Connect to Jupyter Lab", which opens up a notebook window.
  • go to the Jupyter notebook; there's no good way to upload the notebook directly from your file system, so open it in Google Colab and paste the cells into the new notebook. Make sure the "use_drive" flag is off in the first cell.
  • run the cells and you should be good to go.

KoboldAI + runpod haxx0rz (no clue, there's no easy way to do it, but theoretically it should be possible)
koboldai.org/runpod only has native support for (L) tier models


KoboldAI Local

Blazing fast generation.
Minimal requirements - 11GB VRAM for 13B (M) or 24GB for 30B (G)
I'm out of the loop on how to set it up
rentry.org/TESFT-LLaMa <- probably not very useful linkdump


KoboldCPP

koboldai.org/cpp
Slow (improving, but still slow)
minimal requirements:
RAM - 13B needs >10GB, 30B needs >18GB
CPU - works on everything, but pre-AVX will be extremely sluggish, AVX (2011 and onwards) will be very slow, AVX2 (2014 and onwards) will be slow, AVX-512 (2019 and onwards) will run at a mediocre but usable speed (I'm guessing about the other CPUs, kinda making shit up, correct pls)
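If you're not sure which instruction sets your CPU has, you can check (Linux command below; on Windows a tool like CPU-Z will list them):
  • grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u
    Prints every AVX variant the CPU reports (avx, avx2, avx512f, ...); no output means no AVX, expect the sluggish tier.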

On Windows

  • base version: drag the model onto the .exe
  • development speedup features (2x context loading speed, 1.2x inference speed):
    • open PowerShell in the folder where koboldcpp.exe is
    • koboldcpp.exe [ggml_model.bin] --useclblast [platform_id] [device_id] --smartcontext
      You will have to guess the Platform and Device ID (0 0 or 0 1 are the most common); ggml_model.bin is the name of the model file. Worked example after this list.
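Worked example (assumptions: the model file is named llama-13b-q4_0.bin and sits next to the exe, and 0 0 happens to point at your GPU - adjust both as needed):
  • .\koboldcpp.exe llama-13b-q4_0.bin --useclblast 0 0 --smartcontext
If it errors out or picks the wrong device, retry with 0 1 or 1 0.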

On Debian/Ubuntu

  • git clone https://github.com/LostRuins/koboldcpp
  • sudo apt install libclblast-dev libopenblas-dev opencl-c-headers
  • cd koboldcpp
  • make LLAMA_CLBLAST=1
  • [optional] sudo apt install clinfo
  • [optional, you will get Platform and Device ID, but you can just guess] clinfo -l
  • python3 koboldcpp.py [ggml_model.bin] --useclblast [platform_id] [device_id] --smartcontext
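Worked example, same placeholders as on Windows (llama-13b-q4_0.bin is whatever .bin you downloaded, 0 0 is a guess for the IDs):
  • python3 koboldcpp.py llama-13b-q4_0.bin --useclblast 0 0 --smartcontext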

Weights
Weights are the model you want to run; without [ggml_model.bin] you can't do anything. See the table - everything that has been converted to .bin works (without fiddling); .bin versions of the model are strictly necessary.


NovelAI Subscription

Blazing fast generations.
Good UX. As of now $25/mo only gives you access to Krake (M), $10-15/mo gets you Euterpe (L); you can't use your own models.
