OpenRouter Prefill/TC Support
Contact for corrections: huntsman_book495@slmail.me
Unofficial documentation on which OpenRouter (OR) providers support prefilling or Text Completion. There may be mistakes or outdated details, as everything is checked manually.
Newest updates/notes on top
- New active providers: Cirrascale, ModelRun
- New active provider: Relace
Providers
If you know for sure that prefilling a direct request to a provider works but prefilling through OR does not, you may let OR know. The table below is tested via Chat Completion (CC). Listed Docs are evidence of prefill support.
One marker (icon) next to a provider means there are currently no models listed; another means the provider is no longer shown in the sidebar of the Models page. In the latter case, their support columns are left at their last known state.
2025-09-06: To reduce clutter, all dead entries as of this date (archive) are removed.
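For reference, "prefilling" here means ending the `messages` array with an assistant-role message that the model is expected to continue. A minimal sketch of such a CC request body (the model slug is a placeholder, not a real OR model name):

```python
import json

def build_prefill_request(model: str, user_msg: str, prefill: str) -> dict:
    """Build a Chat Completion request body whose last message is an
    assistant-role prefill for the model to continue."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": user_msg},
            # Trailing assistant-role message: this is the prefill.
            {"role": "assistant", "content": prefill},
        ],
    }

body = build_prefill_request("some-provider/some-model", "Hi.", "Hello")
print(json.dumps(body, indent=2))
```

POST a body like this to OR's `/api/v1/chat/completions` with your API key; a provider that supports prefill continues from `Hello` instead of starting a fresh reply.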
| OR API name | Display name on website if different | Prefill supported | Prefill unsupported | Note |
|---|---|---|---|---|
| AI21 | | Jamba Large/Mini 1.7 (2 models) | | |
| AionLabs | | Aion-RP | Aion-1.0/-Mini | |
| Alibaba | Alibaba Cloud Int. | Qwen models | | |
| Amazon Bedrock | | ![]() | | Nova series trims leading whitespace. |
| Anthropic | | ![]() | | Doc |
| AtlasCloud | | Kimi K2 (trims leading whitespace) | V3.2 Exp, V3.1 Terminus, Tongyi DeepResearch 30B A3B, LongCat Flash, V3.1, GPT OSS 20B/120B, Qwen3 Coder, Qwen3 235B A22B 2507 | |
| Azure | | ![]() | | |
| BaseTen | | ![]() | | |
| Cerebras | | all except | GPT OSS 120B, Llama 4 | |
| Chutes | | V3.1 Base (this is a TC model), Qwen3 235B A22B Thinking 2507, Qwen3 Coder, Mistral Small 3.2, R1 0528 Qwen3, Devstral Small, MAI DS R1, GLM Z1 32B, V2 Shisa, DeepCoder, OlympicCoder, Gemma 2/3, Reka Flash 3, R1/Zero/Distill, QwQ 32B/ArliAI, Dolphin 3.0, Qwen2.5 non-VL | InternVL3 78B, Qwen3 Next, LongCat Flash, Seed OSS 36B, Qwen3 30B A3B Thinking 2507, Nous Hermes 4, V3.1, Qwen3 235B A22B 2507, Kimi K2, Hunyuan A13B, R1T2, MiniMax M1, Kimi-Dev-72B, Sarvam-M, Phi 4 Reasoning, Prover V2, Qwen3, R1T, V3 0324, GLM 4 32B, Moonshot AI, Nemotron, Llama 4, V3/Base, UI-TARS 72B, Qwen2.5-VL, Mistral Small 3.1, DeepHermes 3 | |
| Cirrascale | | Olmo 2 32B Instruct (1 model) | | |
| Cloudflare | | ![]() | | |
| Cohere | | ![]() | | |
| Crusoe | | Llama 3.3 70B, R1 0528, V3 0324 (bad prefill output) | GPT OSS | |
| DeepInfra | | Qwen3, Kimi K2, all except | Nemotron Super 49B V1.5, GLM 4.6, V3.2 Exp, V3.1 Terminus, Qwen3 Next, V3.1, GPT OSS 20B/120B, Llama Guard 4, Llama 4 | |
| DeepSeek | | V3.2-Exp (1 model) | | Doc, supports prefill with "prefix": True; cleanest R1 response out of all providers. |
| Enfer | | ![]() | | |
| Featherless | | CodeLlama 7B | Llemma 7B | OR's selection is extremely limited and outdated, offensively so. |
| Fireworks | | all except | V3.1, GPT OSS 20B/120B, Qwen3 235B A22B 2507, Yi Large | |
| Friendli | | ![]() | | |
| GMICloud | | GLM 4.6 (not clean), Qwen3 235B A22B Thinking 2507, R1 0528, Qwen3 32B | V3.2 Exp, V3.1, GPT OSS, GLM 4.5, Qwen3 Coder, Qwen3 235B A22B Instruct 2507, V3 0324, Llama 4 | |
| Google Vertex | | Claude, Gemini 2.5/2.0 | Qwen3, V3.1, GPT OSS, R1 0528, Llama | 2.5 Pro's thinking is not reliably skipped, min budget 128 tokens. |
| Google AI Studio | | Gemini 2.5/2.0, Gemma 3 | | 2.5 Flash's thinking budget can be explicitly set to 0. |
| Groq | | Kimi K2, all except | GPT OSS 20B/120B, Llama Guard 4 (error 404), Qwen3 32B | Doc (requires login) |
| Hyperbolic | | QwQ 32B Preview, Qwen2.5, Llama | Qwen3 Next, Qwen3 Coder, QwQ 32B, Qwen2.5-VL, Pixtral, Hermes 3 | |
| Inception | | Mercury, Mercury Coder (2 models) | | |
| InferenceNet | inference.net | ![]() | | |
| Infermatic | | ![]() | | |
| Inflection | | Inflection 3 Pi/Productivity (2 models) | | |
| Lambda | | all except | Llama 4 | |
| Liquid | | LFM2-8B-A1B, LFM2-2.6B | | Formerly LFM 7B/3B. |
| Mancer 2 | Mancer (private) | ![]() | | |
| Meta | | ![]() | | |
| Minimax | | MiniMax-M2, MiniMax-M1, MiniMax-Text-01 | | |
| Mistral | | ![]() | | Returns the response with the prefill attached. |
| ModelRun | | Gemma 3 27B, Llama 3.3 70B (2 models) | | |
| Moonshot AI | | Kimi K2 0905/0711 (2 models) | | Doc, supports prefill with "partial": True. |
| Morph | | | Morph V3 Large/Fast (2 models) | Special-purpose edit_file tool for IDEs; last message must be of user role. |
| NCompass | nCompass | | GPT OSS 20B/120B (2 models) | Previously had supported models. |
| Nebius | Nebius AI Studio | all except | Qwen3 30B A3B Thinking 2507, Nous Hermes 4, GPT OSS 20B/120B, Qwen3 235B A22B Instruct 2507, Nemo, Phi 4 | |
| NextBit | | all except | Mistral Small 3 | |
| Nineteen | | all except | InternVL | |
| Novita | NovitaAI | Qwen3 235B A22B Thinking 2507, Kimi K2, Qwen3, all except | ERNIE 4.5 21B A3B Thinking, V3.2 Exp, Qwen3 VL 235B A22B, V3.1 Terminus, Qwen3 Next, V3.1, Ernie 4.5 VL, GLM 4.5V, GPT OSS 20B/120B, GLM 4.5, Qwen3 Coder / 235B A22B 2507, Kimi K2, GLM 4.1V 9B, ERNIE 4.5 300B, MiniMax M1, Prover V2, GLM 4 9/32B, Llama 4 | |
| Nvidia | NVIDIA | Nemotron Nano 9B V2 (1 model) | | Only technically; the model continues the prefill as part of its reasoning. |
| OpenAI | | ![]() | | |
| OpenInference | | QwQ 32B | Qwen3 30B A3B, GLM 4 32B | |
| Parasail | | Qwen3 235B A22B Thinking 2507, all except | GLM 4.6, Cydonia 24B V4.1, Qwen3 VL 235B A22B, GPT OSS 120B, GLM 4.5, Qwen3 Coder / 235B A22B 2507, UI-TARS 7B, Kimi K2, Anubis 70B, Valkyrie 49B, Qwen3, Llama 4, V3 0324 | |
| Perplexity | | ![]() | | |
| Phala | | ![]() | | |
| Relace | | Relace Apply 3 | | Special-purpose code-patching model; input must follow a set format. |
| SambaNova | | R1 Distill, Llama 3.1/3.3 | V3.1 Terminus, R1 0528, Qwen3 32B, Llama 4, V3 0324 | |
| SiliconFlow | | V3.1 Terminus, GLM 4.5, Kimi K2 0711, V3.1, Hunyuan A13B, ERNIE 4.5 300B A47B, Kimi Dev 72B, Qwen3, QwQ 32B, Llama 3.1 8B Instruct | Step3, MiniMax M1 | |
| Stealth ![]() | | 2: Cypher Alpha (Amazon, not officially revealed) | 5: Andromeda Alpha (Nemotron Nano 2 VL), 4: Sonoma Dusk/Sky Alpha (Grok 4 Fast), 3: Horizon Alpha/Beta (GPT-5), 1: Optimus/Quasar Alpha (GPT-4.1) | Limited-duration cloaked experimental feedback models. Ordered from 1: oldest, to n: newest. |
| Switchpoint | | | Router (1 "model") | Single router of multiple models; last message must be of user role. |
| Targon | | R1 | Qwen3 235B A22B 2507, Kimi K2, V3 | |
| Together | | all except | Kimi K2 0905/0711, Cogito V2, GPT OSS 120B, Arcee AI, Qwen3, Llama 4 | |
| Venice | | ![]() | | |
| WandB | Weights & Biases | Llama 3.3 70B | V3.1, GPT OSS, Qwen3 Coder 480B A35B | |
| xAI | | Grok Code, Grok 4/3/2 | | |
| Z.AI | | GLM 4.6, GLM 4.5V, GLM 4.5 / 4.5 Air, GLM 4 32B | | |
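As the Note column shows, DeepSeek and Moonshot AI expose prefilling through an explicit flag on the final assistant message rather than implicitly. A sketch of the three conventions (the flag names come from the table's notes; everything else is illustrative):

```python
def final_assistant_message(prefill: str, flavor: str = "implicit") -> dict:
    """Last-message dict for the prefill conventions noted above.

    "implicit": most providers simply continue a trailing assistant message.
    "deepseek": DeepSeek's direct API wants "prefix": True on that message.
    "moonshot": Moonshot AI's direct API wants "partial": True instead.
    """
    msg = {"role": "assistant", "content": prefill}
    if flavor == "deepseek":
        msg["prefix"] = True
    elif flavor == "moonshot":
        msg["partial"] = True
    return msg
```

When going through OR you normally just send the trailing assistant message; the flag variants matter for direct requests to those providers.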
Text Completion (TC)
I am unaware of providers that are directly TC-only. There are CC-only providers; when you send `prompt` instead of `messages` to a CC-only provider through OR, presumably OR sends the prompt as a single message. There is no such thing as TC "not supporting prefill": the entire prompt is "the prefill", unless some sequence tokens are appended before the model is allowed to respond, which is effectively what happens in non-prefill CC.
Since no CC-only provider has Min-P, you can assume anything listed as supporting Min-P can do TC. Don't ask me why.
For Min-P I simply took what the OR models search page shows and did not test it. OR might list a sampler as supported merely because it doesn't return an error. Listed Docs are evidence of TC, usually a /v1/completions endpoint and/or a prompt parameter.
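A TC request body sends a raw `prompt` string plus optional samplers instead of a `messages` array. A minimal sketch (the model slug is a placeholder):

```python
import json

# Text Completion body: a raw prompt string, no messages array.  min_p is
# included because the table below uses Min-P support as a proxy for TC.
tc_body = {
    "model": "some-provider/some-model",  # placeholder slug
    "prompt": "Rise and",
    "max_tokens": 16,
    "min_p": 0.05,
}
print(json.dumps(tc_body))
```

This would go to a `/completions`-style endpoint; the model continues the raw string with no chat template applied.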
| OR API name | Display name on OR website | TC supported | Min-P | TC unsupported | Note |
|---|---|---|---|---|---|
| AtlasCloud | | ![]() | | | To be reviewed. |
| Cerebras | | ![]() | | GPT OSS (error 400) | |
| Chutes | | ![]() | ✓ | | |
| Crusoe | | ![]() | | | |
| DeepInfra | | all except | ✓ | Llama 4 | Doc |
| DeepSeek | | V3.2-Exp | | | Doc, for deepseek-chat |
| Enfer | | ![]() | | | Doc |
| Featherless | | ![]() | ✓ | | |
| Fireworks | | all except | | | Doc |
| Friendli | | ![]() | ✓ | | Doc |
| GMICloud | | (to be tested) | | | |
| Hyperbolic | | Qwen3 Coder, QwQ 32B, QwQ 32B Preview, Qwen2.5, Llama | ✓ | Qwen3 Next, Qwen2.5-VL, Pixtral, Hermes 3 | |
| InferenceNet | inference.net | ![]() | ✓ | | Doc |
| Infermatic | | ![]() | ✓ | | Doc |
| Lambda | | ![]() | | | Doc |
| Liquid | | ![]() | ✓ | | |
| Mancer 2 | Mancer (private) | ![]() | ✓ | | |
| ModelRun | | ![]() | | | |
| Morph | | Morph V2 (1 model) | | | Clearly not what it's made for but it technically works. |
| NCompass | nCompass | ![]() | ✓ | | |
| Nebius | Nebius AI Studio | all except | | Nemo | Doc |
| NextBit | | ![]() | ![]() | | Doc |
| Nineteen | | all except | | InternVL | API, Doc (requires login) |
| Novita | NovitaAI | all except | ✓ | ERNIE 4.5 VL (error 400/503), Kimi K2, ERNIE 4.5 300B, MiniMax M1 | Doc |
| OpenInference | | ![]() | | | |
| Parasail | | ![]() | | | Doc |
| Phala | | ![]() | ✓ | | |
| SambaNova | | ![]() | | | |
| Targon | | ![]() | | | |
| Together | | ![]() | ✓ | | Doc |
| WandB | Weights & Biases | ![]() | | | |
| xAI | | Grok 3, Grok 2 | | Grok Code, Grok 4, Grok 3 Mini | Doc, legacy endpoint, no reasoning models. |
One last thing: GPT-3.5 Turbo Instruct is OpenAI's last TC model.
Kimi K2's prompt looks like this, without newlines around sequences: `<|im_system|>system<|im_middle|>System prompt.<|im_end|><|im_user|>user<|im_middle|>User message.<|im_end|><|im_assistant|>assistant<|im_middle|>Assistant message.<|im_end|>`
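Rendered programmatically, that flattening looks like the sketch below (based only on the sequence shown above, not an official tokenizer or chat template):

```python
# Map chat roles to Kimi K2's role tokens, per the sequence shown above.
K2_ROLE = {
    "system": "<|im_system|>",
    "user": "<|im_user|>",
    "assistant": "<|im_assistant|>",
}

def render_kimi_k2(messages: list[dict]) -> str:
    """Flatten a chat into Kimi K2's TC prompt format (no newlines)."""
    parts = []
    for m in messages:
        parts.append(
            f"{K2_ROLE[m['role']]}{m['role']}<|im_middle|>{m['content']}<|im_end|>"
        )
    return "".join(parts)

prompt = render_kimi_k2([
    {"role": "system", "content": "System prompt."},
    {"role": "user", "content": "User message."},
    {"role": "assistant", "content": "Assistant message."},
])
```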
Fill-in-the-Middle (FIM)
If a model is trained for FIM, FIM can technically be used through TC. Big question: Which models?
| Model | FIM prompt example (response is ` and`) | Note |
|---|---|---|
| DeepSeek R1/V3 | `<|fim▁begin|>Rise<|fim▁hole|> shine!<|fim▁end|>` | DeepSeek TC (V3 only) is undocumented but officially supports FIM with prompt + suffix parameters, without instruct sequences. |
Mistral has a FIM endpoint (not available through OR) that also takes prompt + suffix parameters. It makes sense to set it up this way, since it is easier to implement against. Anyway, I'm not a coder myself, so I am unfamiliar with whatever IDEs and FIM endpoints people use to autocomplete code.
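The two ways of expressing the same FIM request can be sketched like this. The FIM tokens are DeepSeek's, as shown in the table; the model slug is a placeholder:

```python
def deepseek_fim_prompt(prefix: str, suffix: str) -> str:
    """Raw-token FIM prompt, suitable for sending as a plain TC prompt."""
    return f"<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"

def fim_body_prompt_suffix(model: str, prefix: str, suffix: str) -> dict:
    """prompt + suffix style body; the server inserts the FIM tokens itself."""
    return {"model": model, "prompt": prefix, "suffix": suffix}

raw = deepseek_fim_prompt("Rise", " shine!")
body = fim_body_prompt_suffix("some-provider/some-model", "Rise", " shine!")
```

Either way, the model's job is to produce the text that belongs in the hole (` and` in this example).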
Inactive providers
These providers (OR API name displayed) either have not started serving / are possibly new, or are no longer serving. Chronological order may be inaccurate, but I try to list the latest deactivation toward the top.
- Clarifai
- InoCloud
- Kluster: closed inference services effective 2025-07-24T23:00-04
- CentML: acquired by Nebius
- CrofAI, Ubicloud
- Avian.io
- Lepton: acquired by Nvidia (article written on 2025-04-08)
- 01.AI, AnyScale, HuggingFace, Lynn, Lynn 2, Modal, OctoAI, Replicate, SF Compute
- Recursal: rebranded to Featherless
- R*fl*ction (around 2024 Sep)
These don't mean anything: FakeProvider, Modular (duplicate/typo of ModelRun)
Test prompts
When prefilling is supported and the last message of the request is of assistant role, the model should be able to consistently continue from it, unless it's brain-damaged by the most stringent safetyism possible, rendering it unusable in any case (e.g., IIRC, Phi-3 Mini).
| User | Assistant prefill | Possible responses | Note |
|---|---|---|---|
| Hi. | `Hello` | `! How can I assist you today?`, `! How are you today?` | |
| Output an empty JSON Object. | `{` | `}`, ` }`, `"status": "success", "data": {} }` (QwQ 32B) | |
| What color is the sky? | `The` | ` sky typically appears blue[...]` | At least one paragraph. Models will talk about Rayleigh scattering. |
| Who are you? | `Who` | ` am I? I'm[...]`, ` am I? Well,[...]` | I chose "Who" over "I" for less implication of "try to complete this" that may happen with prefill disabled. |
| Who are you? | `F*ck` [sic] | ` you, I'm not answering that question.`, `ing great question, dude![...]` | |
*Markdown does not display leading spaces within inline code.
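When scripting these checks, one useful trick is detecting the Mistral-style quirk where the response comes back with the prefill re-attached instead of continued. A rough heuristic sketch (my own, not from any provider's docs):

```python
def looks_like_continuation(prefill: str, completion: str) -> bool:
    """True if the completion continues the prefill rather than echoing it.

    Providers that attach the prefill to the response, or ignore it and
    answer from scratch, tend to start the completion with the prefill text.
    """
    return not completion.lstrip().lower().startswith(prefill.strip().lower())

# "Hello" -> "! How can I assist you today?"       clean continuation
# "Hello" -> "Hello! How can I assist you today?"  echo / re-attached prefill
```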
Model quirks
DeepSeek
R1/V3 will appear confused in short tests at the very start of chat, generating a continuation as if it were the user, then responding to itself.
- `Hi.` + `Hello` → R1: `! How can I assist you today? 😊Hello! How can I assist you today? 😊`
- `Hello?` + `Who` → R1: `are you?\n\nHi! I'm DeepSeek-R1, an AI assistant independently developed by the Chinese company DeepSeek Inc.`
- `Hello?` + `Who` → V3: `is this?Hello! This is an AI assistant here to help answer your questions or assist with any tasks you have.`
This phenomenon occurs with all prefill-supported providers, but stabilizes after a few messages in.
Popular R1 RP prompts do not rely on prefilling, so as not to interfere with its reasoning. Instead, they contain user instructions at the end for CoT and/or use squashing, where the entire chat history becomes a single user message.
V3 0324 does not get confused. Perhaps the earlier releases were simply undercooked.
- `Hi.` + `Hello` → V3 0324: `! How can I assist you today? 😊`
- `Hello?` + `Who` → V3 0324: `'s there? 😊 Just kidding—how can I help you today? Let me know what you're looking for, and I'll do my best!`
It is very difficult to skip R1 0528's reasoning with prefill, so don't bother.
Also, V3 0324, R1 0528, and onward have a broken chat template where the assistant token is not inserted if the first assistant message comes before the first user message, so that first message is treated as an unlabeled user message.
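A defensive workaround for that template quirk is to make sure the chat never opens with an assistant message, e.g. by prepending an empty user turn. A sketch (my own workaround, not an official fix):

```python
def guard_leading_assistant(messages: list[dict]) -> list[dict]:
    """Prepend an empty user message if the chat would otherwise start with
    an assistant message, which these templates would mislabel."""
    if messages and messages[0]["role"] == "assistant":
        return [{"role": "user", "content": ""}] + messages
    return messages
```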