Serving
Context Length: 4096
Model: TheBloke/Platypus2-70B-Instruct-GPTQ:gptq-4bit-32g-actorder_True
Blocking API URL: none
Streaming API URL: none
Note that I am hosting this on a spot GPU, which means the API might go down at any time for any reason. I can't promise how long I can host it, but I will try for 2-4 hours a day, as long as it doesn't cost me too much.
I don't log IPs or prompts, but I'd be happy to receive logs by email or on aicg.
If it gets spammed or the queue gets too long, I will take it down.
If you find a better model with a higher context length, send it to my email.
I would recommend using simple-proxy-for-tavern; it gives you more control over prompts and makes responses longer.
Settings
Use freellamas' settings, or experiment with your own.
How to use?
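Once an API URL is posted, you can hit the blocking endpoint directly. The sketch below is an assumption on my part: the URL is a placeholder (the real one goes above when it's up), and the payload follows the KoboldAI-style /api/v1/generate schema that blocking/streaming listings like this usually expose — adjust the field names if the real API differs.

```python
# Minimal sketch of calling the blocking API, assuming a
# KoboldAI-compatible /api/v1/generate endpoint (an assumption).
import json
import urllib.request

# Placeholder only -- replace with the real Blocking API URL once posted.
API_URL = "http://example.invalid/api/v1/generate"

def build_payload(prompt, max_length=300, temperature=0.7):
    """Assemble a generation request within the 4096-token context."""
    return {
        "prompt": prompt,
        "max_context_length": 4096,  # matches the context length listed above
        "max_length": max_length,
        "temperature": temperature,
    }

def generate(prompt):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(build_payload("Hello, llama2!"))
```

If you go through simple-proxy-for-tavern instead, point the proxy at the API URL and let it handle the request formatting for you.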
Contact
You can send proompts, logs, settings, or anything else; I wanna see how far llama2 70b can go.
GJq7w1tZ4Ps2Oi8S@proton.me