How to quant models yourself
Because maybe, just maybe, you don't trust files from randos on the internet
clone llama.cpp somewhere (/usr/src is traditional)
git clone https://github.com/ggerganov/llama.cpp
create a venv
python3 -m venv .
enter the venv
source bin/activate
ensure pip is working by bootstrapping it
python3 -m ensurepip --upgrade
install pip stuff
pip install -r /usr/src/llama.cpp/requirements/requirements-convert_hf_to_gguf.txt
run script to quant
/usr/src/llama.cpp/convert_hf_to_gguf.py --outfile /path/where/you/want/mistral-large-2411-q8.gguf --outtype q8_0 --verbose /path/to/safetensors/mistral-large-2411/
(Stick with the original weight type like FP16 or BF16 if you want a lossless starting point, though anything above q8_0 is probably pointless)
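for example, a lossless conversion would look something like this (assuming the source weights are bf16; swap in f16 if that's what the original model uses)
/usr/src/llama.cpp/convert_hf_to_gguf.py --outfile /path/where/you/want/mistral-large-2411-bf16.gguf --outtype bf16 --verbose /path/to/safetensors/mistral-large-2411/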
Obviously you need Python and the venv module installed on your OS before you start, and you need to have downloaded the entire model: all the model-00xxx-of-00yyy.safetensors files plus the accompanying json, tokenizer.model, etc. files.
seq -w 1 51 | xargs -I{} wget --header="Authorization: Bearer $HF_ACCESS_TOKEN" "https://huggingface.co/mistralai/Mistral-Large-Instruct-2411/resolve/main/model-000{}-of-00051.safetensors"
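if python3, venv, or pip aren't already on your system, something like this covers the prerequisites on a Debian/Ubuntu-style box (package names will differ on other distros)
sudo apt install python3 python3-venv python3-pip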
Once you've created the initial large gguf, you can further quantize with llama-quantize
/usr/src/llama.cpp/llama-quantize /path/to/mistral-large-2411-q8.gguf /path/to/mistral-large-2411-q6.gguf Q6_K
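note that llama-quantize (and llama-gguf-split below) are compiled binaries, not Python scripts, so build llama.cpp first if you haven't; a typical cmake build (assuming cmake and a C++ toolchain are installed) looks roughly like
cd /usr/src/llama.cpp
cmake -B build
cmake --build build --config Release -j
depending on your llama.cpp version the binaries may land under build/bin/ rather than the repo root, so adjust the paths used here accordingly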
And if you need to split it into chunks for upload to Hugging Face
/usr/src/llama.cpp/llama-gguf-split --split-max-size 43G /path/to/mistral-large-2411-q6.gguf /path/to/mistral-large-2411-q6-split-files
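going the other way, llama-gguf-split can also merge shards back into one file by pointing it at the first shard (the exact -of-0000N suffix depends on how many pieces the split produced), though recent llama.cpp can usually load a sharded model directly from the first shard without merging
/usr/src/llama.cpp/llama-gguf-split --merge /path/to/mistral-large-2411-q6-split-files-00001-of-00002.gguf /path/to/mistral-large-2411-q6-merged.gguf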