Serving
Context Length: 4096
Model: TheBloke/Platypus2-70B-Instruct-GPTQ:gptq-4bit-32g-actorder_True
Blocking API URL: none
Streaming API URL: none
Note that I am hosting this on a spot GPU, which means the API might go down at any time for any reason. I can't promise how long I can host it, but I will try for 2-4 hours a day, as long as it doesn't cost me too much.
I don't log IPs or prompts, but I'd be happy to receive logs by email or on aicg.
If it gets spammed or the queue gets too long, I will take it down.
If you find a better model with a higher context length, send it to my email.
I would recommend using simple-proxy-for-tavern; it gives you more control over prompts and makes responses longer.
Settings
Use freellamas' settings, or experiment with your own.
How to use?
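Once an API URL is posted, you can hit the blocking endpoint directly. The sketch below is an assumption on my part: the URL is a placeholder (the real one goes above when it's up), and the payload follows the KoboldAI-style /api/v1/generate schema that blocking/streaming listings like this usually expose — adjust the field names if the real API differs.

```python
# Minimal sketch of calling the blocking API, assuming a
# KoboldAI-compatible /api/v1/generate endpoint (an assumption).
import json
import urllib.request

# Placeholder only -- replace with the real Blocking API URL once posted.
API_URL = "http://example.invalid/api/v1/generate"

def build_payload(prompt, max_length=300, temperature=0.7):
    """Assemble a generation request within the 4096-token context."""
    return {
        "prompt": prompt,
        "max_context_length": 4096,  # matches the context length listed above
        "max_length": max_length,
        "temperature": temperature,
    }

def generate(prompt):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(build_payload("Hello, llama2!"))
```

If you go through simple-proxy-for-tavern instead, point the proxy at the API URL and let it handle the request formatting for you.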
Contact
You can send proompts, logs, settings, or anything else; I wanna see how far llama2 70b can go.
GJq7w1tZ4Ps2Oi8S@proton.me