I'm creating this post to help newbies, so we can send it to anyone new who joins our empire and asks what to do. I tried to outline the most basic stuff and hope I didn't miss anything important; sorry if I did. I made it mostly out of boredom and because "why not". If such a post already exists, then I'm sorry :<
Intelligence / What does "B" stand for?
A model's intelligence is usually estimated by how many parameters it has; the letter B stands for billion, so 7B means 7 billion parameters, 32B means 32 billion parameters, etc. However, keep in mind that training a model requires a large dataset, and if the training data is shitty then the model will be shitty as well; that's why most new 8B models are superior to old ~30B ones. So remember: Trash in -> Trash out.
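To make the "B" number a bit more tangible, here's a tiny Python sketch (my own illustration, not from anywhere official) that turns a parameter count into a rough weights-only size, assuming fp16 (2 bytes per parameter); actual memory use will be higher once context and overhead are included:

```python
# Rough, illustrative arithmetic: what "7B", "32B", etc. mean in raw numbers.
def rough_size_gb(params_in_billions: float, bytes_per_param: int = 2) -> float:
    """Very rough size of just the weights, assuming fp16 (2 bytes/param)."""
    return params_in_billions * 1e9 * bytes_per_param / 1024**3

for b in (7, 13, 32, 70):
    print(f"{b}B model ~ {rough_size_gb(b):.1f} GB of weights at fp16")
```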
Memory / Context
Next, ctx/context/memory: you can basically think of it as the amount of tokens the model can work with at once. Which raises the next question: what is a token?
Large Language Models (LLMs) don't use words and letters the way we do; one token can represent a whole word or just part of one. (Try out a tokenizer here: https://platform.openai.com/tokenizer)
Long words are usually made of up to 3~4 tokens; the exact split differs between models because they use different tokenizers. The point is that the number of tokens is greater than the number of words the model can remember, for example GPT-4's 32k tokens came out to roughly 20k words.
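If you'd rather play with tokenization from code instead of the web page, here's a minimal Python sketch using the tiktoken library (the OpenAI tokenizer family; other models ship their own tokenizers, so their counts will differ):

```python
import tiktoken

# cl100k_base is the encoding used by GPT-4 / GPT-3.5-turbo
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization splits text into subword pieces."
tokens = enc.encode(text)

print(len(text.split()), "words ->", len(tokens), "tokens")
# Decode each token individually to see how words get chopped into pieces
print([enc.decode([t]) for t in tokens])
```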
Now, in reality LLMs have no memory at all; their context size is simply the amount of tokens they can work with at once. That means the LLM has to be fed the whole chat history, up to the max token limit (context size), in order to have its "memories". That's also why generation gets slightly slower as more of the context fills up.
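Here's a minimal, made-up sketch of what that means in practice: the frontend just keeps a list of messages, resends it every turn, and silently drops the oldest ones once they no longer fit into the context budget (the token counter here is a fake placeholder; a real one would use the model's tokenizer):

```python
def count_tokens(text: str) -> int:
    # Placeholder: a real implementation would use the model's tokenizer.
    return len(text.split())

def build_prompt(history: list[str], context_limit: int) -> list[str]:
    """Keep only the most recent messages that still fit the context window."""
    kept, used = [], 0
    for message in reversed(history):       # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > context_limit:
            break                           # anything older is simply forgotten
        kept.append(message)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = ["user: hi", "bot: hello!", "user: tell me a long story", "bot: ..."]
print(build_prompt(history, context_limit=8))  # oldest messages get dropped
```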
Should I run models locally?
If you want your chats to be private, run models locally. We don't know what happens to our chats when we use an API: they can be saved, used for training future models, read by someone, and so on. Maybe nothing happens, maybe something does, but basically forget about privacy if you use third-party APIs.
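For the curious, here's roughly what talking to a local model looks like from Python, assuming you're running Ollama (https://ollama.com) on its default port and have already pulled a model ("llama3" below is just an example name); the request never leaves your machine:

```python
import requests

# Ask the local Ollama server for a single (non-streamed) completion
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",   # whatever model you have pulled locally
        "prompt": "Explain what a context window is in one sentence.",
        "stream": False,     # one JSON reply instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```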