Here are some subscription providers, not all, just the ones I know of:
First Party LLM Subscriptions
| Service | Price | Rate Limits | Models | Notes |
|---|---|---|---|---|
| Alibaba Coding Plan | $50/month | 6,000 calls/5hr, 45,000 calls/week, 90,000 calls/month | Qwen 3.6-Plus, Kimi-K2.5, GLM-5, MiniMax-M2.5 | Slow to add new models |
| BytePlus ModelArk Coding Plan | $10/month | 1,900 calls/5hr, 12,000 calls/week, 24,000 calls/month | Dola Seed 2.0, GLM-5.1, Kimi-K2.5, GPT-OSS-120B | Slow to add new models; higher-tier plans available; has a memory API |
| Z.ai Coding Plan | $18/month | 400 calls/5hr, 2,000 calls/week | GLM-5.1 | Offers useful MCPs; highest cost per call; higher-tier plans available |
| Kimi Code | $19/month | 300 calls/5hr | Kimi-K2.6 | Rate limits vary by action type; higher-tier plans available |
| MiniMax Token Plan | $10/month | 1,500 calls/5hr, 15,000 calls/week | MiniMax-M2.7 | Has vision and web search MCPs; model is heavily censored; higher-tier plans available |
| Xiaomi MiMo Token Plan | $6/month | 60M tokens/month | MiMo V2.5 Pro, MiMo V2.5, MiMo V2.5-TTS | 20% off during off-peak hours (16:00–24:00 UTC); higher-tier plans available; has TTS |
| StepFun Step Plan | $7/month | 1,500 calls/5hr, 6,000 calls/week | Step 3.5-flash, Stepaudio 2.5-tts | Higher-tier plans available; has TTS |
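
To make the call-based plans above easier to compare, here is a rough sketch that turns the listed caps into a cost per 1,000 calls. It assumes a flat 30-day month, assumes you actually hit the tightest cap, and treats every call as equal (Kimi Code's limits vary by action type, so its number is only a ceiling). The Xiaomi plan is left out because it is token-based. The names `PLANS`, `max_calls_per_month`, and `cost_per_1k_calls` are made up for this example.

```python
# Prices (USD/month) and call caps copied from the first-party table.
# A missing key means the table lists no cap for that window.
PLANS = {
    "Alibaba Coding Plan": {"price": 50, "per_5hr": 6000, "per_week": 45000, "per_month": 90000},
    "BytePlus ModelArk Coding Plan": {"price": 10, "per_5hr": 1900, "per_week": 12000, "per_month": 24000},
    "Z.ai Coding Plan": {"price": 18, "per_5hr": 400, "per_week": 2000},
    "Kimi Code": {"price": 19, "per_5hr": 300},
    "MiniMax Token Plan": {"price": 10, "per_5hr": 1500, "per_week": 15000},
    "StepFun Step Plan": {"price": 7, "per_5hr": 1500, "per_week": 6000},
}

DAYS_PER_MONTH = 30                            # assumption: flat 30-day month
WINDOWS_PER_MONTH = DAYS_PER_MONTH * 24 / 5    # number of 5-hour windows
WEEKS_PER_MONTH = DAYS_PER_MONTH / 7


def max_calls_per_month(plan):
    """Return the tightest monthly call ceiling implied by the listed caps."""
    ceilings = []
    if "per_5hr" in plan:
        ceilings.append(plan["per_5hr"] * WINDOWS_PER_MONTH)
    if "per_week" in plan:
        ceilings.append(plan["per_week"] * WEEKS_PER_MONTH)
    if "per_month" in plan:
        ceilings.append(plan["per_month"])
    return min(ceilings)


def cost_per_1k_calls(plan):
    """Monthly price divided by the monthly ceiling, scaled to 1,000 calls."""
    return plan["price"] / max_calls_per_month(plan) * 1000


costs = {name: cost_per_1k_calls(plan) for name, plan in PLANS.items()}

# Print the most expensive plan per call first.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: ${cost:.2f} per 1,000 calls")
```

Under these assumptions Z.ai comes out as the most expensive per call, which matches the note in the table. Real usage rarely hits the caps, so treat the numbers as a best case.
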
Corporate LLM Subscriptions
| Service | Price | Rate Limits | Models | Notes |
|---|---|---|---|---|
| Novita Coding Plan | $50/month | 150M tokens/month | GLM-5, Kimi K2.5 | $20 plan offers no discount; $50 plan offers a 17% discount over pay-per-token; slow to add new models; higher-tier plans available |
| Cerebras Code Pro | $50/month | 24M tokens/day | GLM-4.7, GPT-OSS-120B | Fastest inference; currently sold out; higher-tier plans available |
| Fireworks Fire Pass | $49/month | Unlimited | Kimi K2.5 Turbo | New slots open daily; fastest provider after Cerebras |
| Atlas Cloud Subscription Plan | $10/month | $18 worth of tokens | Most OSS models | Currently unavailable |
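
For the two token-quota plans in this table, a quick sketch of the effective price per million tokens. It assumes a 30-day month and that the whole quota gets used, which is the best case; `TOKEN_PLANS` and `usd_per_million` are just names for this example.

```python
DAYS_PER_MONTH = 30  # assumption: flat 30-day month

# Price in USD/month and quota in millions of tokens per month,
# taken from the table (Cerebras lists 24M tokens/day).
TOKEN_PLANS = {
    "Novita Coding Plan": {"price": 50, "quota_m": 150},
    "Cerebras Code Pro": {"price": 50, "quota_m": 24 * DAYS_PER_MONTH},
}

# Effective USD per million tokens if the full quota is consumed.
usd_per_million = {
    name: plan["price"] / plan["quota_m"] for name, plan in TOKEN_PLANS.items()
}

for name, price in usd_per_million.items():
    print(f"{name}: ${price:.3f} per 1M tokens")
```

Anything below full quota use raises the effective price proportionally.
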
SME LLM Subscriptions
| Service | Price | Rate Limits | Models | Notes |
|---|---|---|---|---|
| Featherless | $25/month | Unlimited tokens | Most OSS models | Limited to 32K context; different plans offer different model access; higher and cheaper plans available |
| Synthetic | $30/month | 135 calls/5hr (pack-based) | DeepSeek-V3.2, MiniMax-M2.5, Kimi-K2.5, GLM-5.1 | Mix of self-hosted (Kimi, MiniMax, GLM) and Fireworks/Together; pay double for double calls; 500 free tool calls and calls under 2,048 tokens/day |
| Ollama Cloud | $20/month | No information provided | Most OSS models | Uses Ollama to connect; higher and cheaper plans available; very good web search |
| OpenCode Go | $10/month | $60 worth of tokens | DeepSeek V4 Pro, GLM-5.1, Kimi K2.6, MiniMax M2.7, MiMo V2.5-Pro, Qwen3.6 Plus | Most coverage for the cost; $5 for the first month; adds new models fast (first to add DeepSeek V4 Pro) |
| Chutes | $10/month | $50 worth of tokens | Most OSS models | Bittensor-based; higher and cheaper plans available; unreliable tool calling |
| Blackbox | $20/month | $40 worth of tokens | Most models | Web page is confusing; I am not certain what it offers; higher and cheaper plans available |
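
The credit-based plans in this table can be compared by how many dollars of tokens each subscription dollar buys. A minimal sketch using the regular monthly prices (not the OpenCode Go first-month offer); it says nothing about how each service prices its tokens, so the multipliers are only comparable if per-token rates are similar. `CREDIT_PLANS` and `multipliers` are illustrative names.

```python
# (monthly price in USD, token credit in USD) from the table.
CREDIT_PLANS = {
    "OpenCode Go": (10, 60),
    "Chutes": (10, 50),
    "Blackbox": (20, 40),
}

# Dollars of token credit received per subscription dollar.
multipliers = {
    name: credit / price for name, (price, credit) in CREDIT_PLANS.items()
}

for name, mult in sorted(multipliers.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {mult:.1f}x credit per dollar")
```
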
Amateur Services
| Service | Price | Rate Limits | Models | Notes |
|---|---|---|---|---|
| ArliAI | $15/month | Unlimited tokens and calls | GLM-4.7, GLM 4.6 RP-finetunes, Mistral Medium 3.5 RP-finetunes, Llama-3.3 RP-finetunes | RP-focused; plans with larger context sizes exist; cheaper plans have limited models; higher-tier plans available |
| Infermatic | $20/month | Unlimited tokens and calls | Qwen-3-235B-Thinking | RP-focused; includes embedding and TTS models; cheaper plans have limited models; higher-tier plans available; hard to justify at this price |
Aggregator Services
(No clear information about operators; use at your own risk)
| Service | Price | Rate Limits | Models | Notes |
|---|---|---|---|---|
| NanoGPT | $12/month | 60M tokens/week | Almost all OSS models | Includes image generation; single plan only; sometimes unreliable tool calling; time to first token sometimes takes minutes (possibly because users are queued) |
| Electron Hub | $10/month | $8 weekly credit | Most open and closed models (Anthropic, OpenAI, etc.) | Includes image generation; payment via Patreon; higher-tier plans available |
| Other Notable Services | — | — | Most open and closed models (Anthropic, OpenAI, etc.) | VoidAI, NavyAI, Api.Airforce (established but similarly opaque) |
Below is a comparison of these models by Artificial Analysis. I believe these are the most relevant benchmarks for role-playing. The full comparison is available at Artificial Analysis.
| Model | AA-LCR | AA-Omniscience Non-Hallucination Rate | IFBench |
|---|---|---|---|
| MiMo-V2.5-Pro | 73% | 75% | 80% |
| Kimi K2.6 | 70% | 61% | 76% |
| Qwen3.6 Plus | 70% | 68% | 75% |
| MiniMax-M2.7 | 69% | 66% | 76% |
| DeepSeek V4 Pro | 66% | 6% | 77% |
| GLM-5.1 | 62% | 71% | 76% |
| Step 3.5 Flash 2603 | 54% | 8% | 67% |
Source: Artificial Analysis — Intelligence Evaluations (23 Apr '26). Higher is better across all three benchmarks.
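
If you want a single number per model, one option is an unweighted mean of the three scores. This is only my sketch, not something Artificial Analysis publishes, and equal weighting is an assumption; change `WEIGHTS` if, for example, hallucination matters more to you. All names here are illustrative.

```python
# (AA-LCR, Non-Hallucination Rate, IFBench) in percent, from the table.
SCORES = {
    "MiMo-V2.5-Pro": (73, 75, 80),
    "Kimi K2.6": (70, 61, 76),
    "Qwen3.6 Plus": (70, 68, 75),
    "MiniMax-M2.7": (69, 66, 76),
    "DeepSeek V4 Pro": (66, 6, 77),
    "GLM-5.1": (62, 71, 76),
    "Step 3.5 Flash 2603": (54, 8, 67),
}

WEIGHTS = (1, 1, 1)  # assumption: all three benchmarks count equally


def weighted_mean(values, weights):
    """Weighted average of the benchmark scores."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


ranking = sorted(
    ((name, weighted_mean(vals, WEIGHTS)) for name, vals in SCORES.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, score in ranking:
    print(f"{name}: {score:.1f}")
```

With equal weights the low non-hallucination scores pull DeepSeek V4 Pro and Step 3.5 Flash to the bottom, even though their other two scores are close to the rest.
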
All pricing and model information is as of May 2026. Only flagship models are listed; most services offer additional higher-tier plans.
PS. I will try to keep this updated. If I am missing something, or something changes, you can leave a comment.
PPS. In my fully subjective opinion, the best service right now is OpenCode Go. If it isn't enough for you, the best with larger limits is Ollama Cloud.