OpenRouter Prefill/TC Support

Contact for corrections: huntsman_book495@slmail.me

Unofficial documentation on which OpenRouter (OR) providers support prefilling or Text Completion. There may be mistakes or outdated details, as everything here is checked manually.

Newest updates/notes on top

- Qwen3 Next models: CC prefill unsupported; TC supported by Chutes, DeepInfra, and Novita.
- New active providers: NVIDIA, SiliconFlow (supports V3.1), Weights & Biases
- Kimi K2 works on DeepInfra, Fireworks, and Novita (CC).
- Prefill is broken on all CC providers for deepseek/deepseek-chat-v3.1 except DeepSeek itself.

Providers

If you know for sure that prefilling a direct request to the provider works but it doesn't through OR, you may let OR know. The table below is tested on Chat Completion (CC). Listed Docs are evidence of prefill support. :warning: next to a provider means it currently has no models listed, and :x: means the provider is no longer shown in the sidebar of the Models page; their support columns are left in their last known state.
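
For clarity, this is the shape of request being tested: a CC call whose final message has the assistant role. A minimal Python sketch (the model name and API key are placeholders, not an endorsement of any row below):

```python
import requests

# Minimal prefilled Chat Completion request against OpenRouter.
# The last message uses the assistant role; a provider that supports
# prefill should continue it instead of starting a fresh reply.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "moonshotai/kimi-k2",  # placeholder model
        "messages": [
            {"role": "user", "content": "Hi."},
            {"role": "assistant", "content": "Hello"},  # the prefill
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
# Expected on a prefill-supporting provider: a continuation such as
# "! How can I assist you today?"
```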

2025-09-06: To reduce clutter, all entries dead as of this date (archive) have been removed.

| OR API name | Display name on OR website | :white_check_mark: Prefill supported / :x: Prefill unsupported | Note |
|---|---|---|---|
| AI21 | | :x: Jamba Large/Mini 1.7 (2 models) | |
| AionLabs | | :white_check_mark: Aion-RP :x: Aion-1.0/-Mini | |
| Alibaba | Alibaba Cloud Int. | :x: Qwen models | |
| Amazon | Bedrock | :white_check_mark: | Nova series trims leading whitespace. |
| Anthropic | | :white_check_mark: | Doc |
| AtlasCloud | | :white_check_mark: Kimi K2 (trims leading whitespace) :x: Tongyi DeepResearch 30B A3B, LongCat Flash, V3.1, GPT OSS 20B/120B, Qwen3 Coder, Qwen3 235B A22B 2507 | |
| Azure | | :x: | |
| BaseTen | | :x: | |
| Cerebras | | :white_check_mark: all except :x: GPT OSS 120B, Llama 4 | |
| Chutes | | :white_check_mark: V3.1 Base (this is a TC model), Qwen3 235B A22B Thinking 2507, Qwen3 Coder, Mistral Small 3.2, R1 0528 Qwen3, Devstral Small, MAI DS R1, GLM Z1 32B, V2 Shisa, DeepCoder, OlympicCoder, Gemma 2/3, Reka Flash 3, R1/Zero/Distill, QwQ 32B/ArliAI, Dolphin 3.0, Qwen2.5 non-VL :x: InternVL3 78B, Qwen3 Next, LongCat Flash, Seed OSS 36B, Qwen3 30B A3B Thinking 2507, Nous Hermes 4, V3.1, Qwen3 235B A22B 2507, Kimi K2, Hunyuan A13B, R1T2, MiniMax M1, Kimi-Dev-72B, Sarvam-M, Phi 4 Reasoning, Prover V2, Qwen3, R1T, V3 0324, GLM 4 32B, Moonshot AI, Nemotron, Llama 4, V3/Base, UI-TARS 72B, Qwen2.5-VL, Mistral Small 3.1, DeepHermes 3 | |
| Cloudflare | | :x: | |
| Cohere | | :x: | |
| Crusoe | | :white_check_mark: Llama 3.3 70B :grey_question: R1 0528, V3 0324 (bad prefill output) :x: GPT OSS | |
| DeepInfra | | :white_check_mark: Qwen3, Kimi K2, all except :x: Qwen3 Next, V3.1, GPT OSS 20B/120B, Llama Guard 4, Llama 4 | |
| DeepSeek | | :white_check_mark: V3.1 (1 model, formerly R1/V3) | Doc, supports prefill with `"prefix": True` (see the sketch below the table); cleanest R1 response out of all providers. |
| Enfer | | :x: | |
| Featherless | | :white_check_mark: CodeLlama 7B, Lumimaid :grey_question: Llemma 7B (unresponsive) | OR's selection is extremely limited and outdated. |
| Fireworks | | :white_check_mark: all except :x: V3.1, GPT OSS 20B/120B, Qwen3 235B A22B 2507, Yi Large | |
| Friendli | | :white_check_mark: | |
| GMICloud | | :white_check_mark: Qwen3 235B A22B Thinking 2507, R1 0528, Qwen3 32B :x: V3.1, GPT OSS, GLM 4.5, Qwen3 Coder, Qwen3 235B A22B Instruct 2507, V3 0324, Llama 4 | |
| Google | Google Vertex | :white_check_mark: Claude, Gemini 2.5/2.0 :x: Gemini 1.5 | 2.5 Pro's thinking is not reliably skipped; min budget 128 tokens. |
| Google AI Studio | | :white_check_mark: Gemini 2.5/2.0, Gemma 3 :x: Gemini 1.5 | 2.5 Flash's thinking budget can be explicitly set to 0. |
| Groq | | :white_check_mark: Kimi K2, all except :x: GPT OSS 20B/120B, Llama Guard 4 (error 404), Qwen3 32B | Doc (requires login) |
| Hyperbolic | | :white_check_mark: QwQ 32B Preview, Qwen2.5, Llama :x: Qwen3 Next, Qwen3 Coder, QwQ 32B, Qwen2.5-VL, Pixtral, Hermes 3 | |
| Inception | | :x: Mercury, Mercury Coder (2 models) | |
| InferenceNet | inference.net | :white_check_mark: | |
| Infermatic | | :white_check_mark: | |
| Inflection | | :x: Inflection 3 Pi/Productivity (2 models) | |
| Lambda | | :white_check_mark: all except :x: Llama 4 | |
| Liquid | | :white_check_mark: LFM 7B/3B (2 models) | |
| Mancer 2 | Mancer (private) | :white_check_mark: | |
| Meta | | :x: | |
| Minimax | | :x: MiniMax-M1, MiniMax-Text-01 (2 models) | |
| Mistral | | :slightly_frowning_face: | Returns the response with the prefill attached. |
| Moonshot AI | | :x: Kimi K2 0905/0711 (2 models) | Doc, supports prefill with `"partial": True`. |
| Morph | | :x: Morph models | Special-purpose `edit_file` tool for IDEs; last message must be of user role. |
| NCompass | nCompass | :x: GPT OSS 20B/120B (2 models) | Previously had supported models. |
| Nebius | Nebius AI Studio | :white_check_mark: all except :x: Qwen3 30B A3B Thinking 2507, Nous Hermes 4, GPT OSS 20B/120B, Qwen3 235B A22B Instruct 2507, Nemo, Phi 4 | |
| NextBit | | :white_check_mark: all except :x: Mistral Small 3 | |
| Nineteen | | :white_check_mark: all except :x: InternVL | |
| Novita | NovitaAI | :white_check_mark: Qwen3 235B A22B Thinking 2507, Kimi K2, Qwen3, all except :x: Qwen3 Next, V3.1, Ernie 4.5 VL, GLM 4.5V, GPT OSS 20B/120B, GLM 4.5, Qwen3 Coder / 235B A22B 2507, Kimi K2, GLM 4.1V 9B, ERNIE 4.5 300B, MiniMax M1, Prover V2, GLM 4 9/32B, Llama 4 | |
| Nvidia | NVIDIA | :white_check_mark: Nemotron Nano 9B V2 (1 model) | Only technically: the model continues the prefill as part of its reasoning. |
| OpenAI | | :x: | |
| OpenInference | | :white_check_mark: QwQ 32B :x: Qwen3 30B A3B, GLM 4 32B | |
| Parasail | | :white_check_mark: Qwen3 235B A22B Thinking 2507, all except :x: GPT OSS 120B, GLM 4.5, Qwen3 Coder / 235B A22B 2507, UI-TARS 7B, Kimi K2, Anubis 70B, Valkyrie 49B, Qwen3, Llama 4, V3 0324 | |
| Perplexity | | :x: | |
| Phala | | :white_check_mark: | |
| SambaNova | | :white_check_mark: R1 Distill, Llama 3.1/3.3 :x: R1 0528, Qwen3 32B, Llama 4, V3 0324 | |
| SiliconFlow | | :white_check_mark: GLM 4.5, Kimi K2 0711, V3.1, Hunyuan A13B, ERNIE 4.5 300B A47B, Kimi Dev 72B, Qwen3, QwQ 32B, Llama 3.1 8B Instruct :x: Step3, MiniMax M1 | |
| Stealth | | :white_check_mark: 2: Cypher Alpha (Amazon, unverified) :x: 4: Sonoma Dusk/Sky Alpha, 3: Horizon Alpha/Beta (GPT-5), 1: Optimus/Quasar Alpha (GPT-4.1) | Limited-duration cloaked experimental feedback models, ordered from 1 (oldest) to n (newest). |
| Switchpoint | | :x: Router (1 "model") | Single router over multiple models; last message must be of user role. |
| Targon | | :white_check_mark: R1 :x: Qwen3 235B A22B 2507, Kimi K2, V3 | |
| Together | | :white_check_mark: all except :x: Kimi K2 0905/0711, Cogito V2, GPT OSS 120B, Arcee AI, Qwen3, Llama 4 | |
| Venice | | :x: | |
| WandB | Weights & Biases | :white_check_mark: Llama 3.3 70B :x: V3.1, GPT OSS, Qwen3 Coder 480B A35B | |
| xAI | | :x: Grok Code, Grok 4/3/2 | |
| Z.AI | | :x: GLM 4.5V, GLM 4.5 / 4.5 Air, GLM 4 32B | |
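
As a concrete illustration of the DeepSeek row above: per their Doc, prefill on the direct API goes through the beta base URL with `"prefix": True` on the final assistant message. A sketch, assuming the standard OpenAI-compatible request shape:

```python
import requests

# Hedged sketch of DeepSeek's direct-API prefix completion (beta endpoint,
# not through OpenRouter).
resp = requests.post(
    "https://api.deepseek.com/beta/chat/completions",
    headers={"Authorization": "Bearer <DEEPSEEK_API_KEY>"},
    json={
        "model": "deepseek-chat",
        "messages": [
            {"role": "user", "content": "Hi."},
            # "prefix": True marks this assistant message as a prefill
            # for the model to continue, rather than a finished turn.
            {"role": "assistant", "content": "Hello", "prefix": True},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```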

Text Completion (TC)

I am unaware of any providers that are directly TC-only. There are CC-only providers; when you send `prompt` instead of `messages` to a CC-only provider through OR, OR presumably forwards the prompt as a single message. There is no such thing as TC "not supporting prefill": the entire prompt is "the prefill", unless sequence tokens are appended before the model is allowed to respond, which is effectively what happens in non-prefill CC.
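
In request terms, TC means sending a single `prompt` string instead of a `messages` array. A minimal sketch (the model name is a placeholder):

```python
import requests

# Text Completion: a raw prompt string, no chat roles. The model simply
# continues the text, so the whole prompt acts as "the prefill".
resp = requests.post(
    "https://openrouter.ai/api/v1/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "meta-llama/llama-3.3-70b-instruct",  # placeholder model
        "prompt": "Once upon a time",
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])
```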

Since no CC-only provider has Min-P, you can assume anything listed as supporting Min-P can do TC. Don't ask me why.

For Min-P I simply took what the OR models search page shows and did not test it myself. OR might list a sampler as supported merely because it doesn't return an error. Listed Docs are evidence of TC support, usually a /v1/completions endpoint and/or a `prompt` parameter.
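
If you want to reproduce the Min-P column yourself, the OR models API exposes a `supported_parameters` list per model (at least at the time of writing); a sketch:

```python
import requests

# List models whose OpenRouter metadata advertises min_p support.
# No API key is needed for the public models listing.
models = requests.get("https://openrouter.ai/api/v1/models", timeout=60).json()["data"]
for m in models:
    if "min_p" in m.get("supported_parameters", []):
        print(m["id"])
```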

| OR API name | Display name on OR website | :white_check_mark: TC supported (Min-P) / :x: TC unsupported | Note |
|---|---|---|---|
| Cerebras | | :white_check_mark: :x: GPT OSS (error 400) | |
| Chutes | | :white_check_mark: | |
| Crusoe | | :white_check_mark: | |
| DeepInfra | | :white_check_mark: all except :x: Llama 4 | Doc |
| DeepSeek | | :white_check_mark: V3 :x: V3.1, R1 | Doc; R1 still not supported despite CC working. |
| Enfer | | :white_check_mark: | Doc |
| Featherless | | :white_check_mark: | |
| Fireworks | | :white_check_mark: all except | Doc |
| Friendli | | :white_check_mark: | Doc |
| GMICloud | | :white_check_mark: | (to be tested) |
| Hyperbolic | | :white_check_mark: Qwen3 Coder, QwQ 32B, QwQ 32B Preview, Qwen2.5, Llama :x: Qwen3 Next, Qwen2.5-VL, Pixtral, Hermes 3 | |
| InferenceNet | inference.net | :white_check_mark: | Doc |
| Infermatic | | :white_check_mark: | Doc |
| Lambda | | :white_check_mark: | Doc |
| Liquid | | :white_check_mark: | |
| Mancer 2 | Mancer (private) | :white_check_mark: | |
| Morph | | :white_check_mark: Morph V2 (1 model) | Clearly not what it's made for, but it technically works. |
| NCompass | nCompass | :white_check_mark: | |
| Nebius | Nebius AI Studio | :white_check_mark: all except :x: Nemo | Doc |
| NextBit | | :white_check_mark: :x: | Doc |
| Nineteen | | :white_check_mark: all except :x: InternVL | API, Doc (requires login) |
| Novita | NovitaAI | :white_check_mark: all except :x: ERNIE 4.5 VL (error 400/503), Kimi K2, ERNIE 4.5 300B, MiniMax M1 | Doc |
| OpenInference | | :white_check_mark: | |
| Parasail | | :white_check_mark: | Doc |
| Phala | | :white_check_mark: | |
| SambaNova | | :white_check_mark: | |
| Targon | | :white_check_mark: | |
| Together | | :white_check_mark: | Doc |
| WandB | Weights & Biases | :white_check_mark: | |
| xAI | | :white_check_mark: Grok 3, Grok 2 :x: Grok Code, Grok 4, Grok 3 Mini | Doc; legacy endpoint, no reasoning models. |

One last thing: GPT-3.5 Turbo Instruct is OpenAI's last TC model.

Kimi K2's prompt looks like this, with no newlines around sequences: `<|im_system|>system<|im_middle|>System prompt.<|im_end|><|im_user|>user<|im_middle|>User message.<|im_end|><|im_assistant|>assistant<|im_middle|>Assistant message.<|im_end|>`
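
A small helper that renders that layout from an OpenAI-style messages list; this is a sketch inferred purely from the template above, including the assumption that an open assistant turn ends with `<|im_middle|>`:

```python
def kimi_k2_prompt(messages, add_generation_prompt=True):
    """Render OpenAI-style messages into Kimi K2's TC format.

    Based on the layout above: no newlines around sequences, each turn is
    <|im_{role}|>{role}<|im_middle|>{content}<|im_end|>.
    """
    role_token = {"system": "<|im_system|>", "user": "<|im_user|>",
                  "assistant": "<|im_assistant|>"}
    out = ""
    for m in messages:
        out += f"{role_token[m['role']]}{m['role']}<|im_middle|>{m['content']}<|im_end|>"
    if add_generation_prompt:
        # Open an assistant turn for the model to complete (assumption).
        out += "<|im_assistant|>assistant<|im_middle|>"
    return out

# Usage: kimi_k2_prompt([{"role": "user", "content": "Hi."}])
```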

Fill-in-the-Middle (FIM)

If a model is trained for FIM, FIM can technically be used through TC. Big question: Which models?

| Model | FIM prompt example (expected response: `and`) | Note |
|---|---|---|
| DeepSeek R1/V3 | `<\|fim▁begin\|>Rise<\|fim▁hole\|> shine!<\|fim▁end\|>` | DeepSeek TC (V3 only) is undocumented but officially supports FIM with `prompt` + `suffix` parameters, without instruct sequences. |
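
Putting that row into practice, FIM over plain TC just means wrapping the context in those tokens; a sketch through OR's completions endpoint (the model name is a placeholder, and provider routing may land on one that wasn't FIM-trained):

```python
import requests

# Fill-in-the-Middle over Text Completion: wrap the surrounding text in
# the model's FIM tokens and let it generate the missing middle.
prefix, suffix = "Rise", " shine!"
prompt = f"<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"

resp = requests.post(
    "https://openrouter.ai/api/v1/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "deepseek/deepseek-chat-v3-0324",  # placeholder model
        "prompt": prompt,
        "max_tokens": 8,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])  # expected: something like " and"
```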

Mistral has a FIM endpoint (not available through OR) that also takes `prompt` + `suffix` parameters. It totally makes sense to set it up this way to make it easier to implement. Anyway, I'm not a coder myself, so I am unfamiliar with whatever IDEs and FIM endpoints people are using to autocomplete code.

Inactive providers

These providers (OR API name displayed) either have not started serving / are possibly new :bulb:, or are no longer serving :x:. The chronological order may be inaccurate, but I try to list the most recent deactivations toward the top.

- InoCloud :x:
- Kluster :x: closed inference services effective 2025-07-24T23:00-04
- CentML :x: acquired by Nebius
- CrofAI :bulb:, Ubicloud :x:
- Avian.io :x:
- Lepton :x: acquired by Nvidia (article written 2025-04-08)
- 01.AI, AnyScale, HuggingFace, Lynn, Lynn 2, Modal, OctoAI, Replicate, SF Compute :x:
- Recursal :arrow_right: rebranded to Featherless
- R*fl*ction :x: (around Sep 2024)

Test prompts

When prefilling is supported and the last message of the request has the assistant role, the model should be able to consistently continue from it, unless it's brain-damaged by the most stringent safetyism possible, rendering it unusable in any case (e.g., IIRC, Phi-3 Mini).

| User | Assistant prefill | Possible responses | Note |
|---|---|---|---|
| Hi. | Hello | `! How can I assist you today?`, `! How are you today?` | |
| Output an empty JSON Object. | `{` | ` }`*, `}`, `\n}`; `"status": "success", "data": {} }` (QwQ 32B) | |
| What color is the sky? | The sky typically appears blue | `[...]` | At least one paragraph. Models will talk about Rayleigh scattering. |
| Who are you? | Who | `am I? I'm[...]`, `am I? Well,[...]` | I chose "Who" over "I" for less implication of "try to complete this" that may happen with prefill disabled. |
| Who are you? | F*ck [sic] | ` you, I'm not answering that question.`, `ing great question, dude![...]` | |

*Markdown does not display leading spaces within inline code.
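
A tiny harness for running these prompts against one model (the model name is a placeholder; judging whether the output truly continues the prefill is left to eyeballing):

```python
import requests

# Run the prefill test prompts above against a single model and eyeball
# whether the output continues the prefill rather than restarting.
CASES = [
    ("Hi.", "Hello"),
    ("Output an empty JSON Object.", "{"),
    ("Who are you?", "Who"),
]

for user, prefill in CASES:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
        json={
            "model": "moonshotai/kimi-k2",  # placeholder model
            "messages": [
                {"role": "user", "content": user},
                {"role": "assistant", "content": prefill},
            ],
            "max_tokens": 48,
        },
        timeout=60,
    )
    text = resp.json()["choices"][0]["message"]["content"]
    # A provider that ignores the prefill typically restarts with a fresh
    # greeting (e.g. "Hello! How can I...") instead of continuing "Hello".
    print(f"{prefill!r} -> {text!r}")
```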

Model quirks

R1/V3 will appear confused in short tests at the very start of a chat, generating a continuation as if it were the user and then responding to itself.

- `Hi.` + `Hello` → R1: `! How can I assist you today? 😊Hello! How can I assist you today? 😊`
- `Hello?` + `Who` → R1: `are you?\n\nHi! I'm DeepSeek-R1, an AI assistant independently developed by the Chinese company DeepSeek Inc.`
- `Hello?` + `Who` → V3: `is this?Hello! This is an AI assistant here to help answer your questions or assist with any tasks you have.`

This phenomenon occurs with all prefill-supporting providers, but stabilizes a few messages into the chat.

:information_source: Popular R1 RP prompts do not rely on prefilling, so as not to interfere with its reasoning. Instead, they place user instructions at the end and/or utilize squashing, where the entire chat history becomes a single user message (see the sketch below).
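
A sketch of that squashing step (the function name and transcript format are my own; RP frontends each have their own variant):

```python
def squash(messages, final_instruction):
    """Collapse an entire chat history into a single user message,
    so the model's reasoning starts fresh instead of being prefilled."""
    transcript = "\n".join(
        f"{m['role'].capitalize()}: {m['content']}" for m in messages
    )
    return [{"role": "user", "content": f"{transcript}\n\n{final_instruction}"}]

# Usage: squash(history, "Continue the roleplay as {{char}}.")
```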

Update: V3 0324 does not get confused. I suppose the earlier releases were simply undercooked.
- `Hi.` + `Hello` → V3 0324: `! How can I assist you today? 😊`
- `Hello?` + `Who` → V3 0324: `'s there? 😊 Just kidding—how can I help you today? Let me know what you're looking for, and I'll do my best!`

2025-04-30: I discovered that DeepSeek, both as an OR provider and via the direct API, supports prefilling; previously this was not the case for R1. DeepSeek has NO issues at all. Together is the worst: there was a prompt where most providers would finish the statement (main content), then output reasoning, then output a response following that reasoning, all inside the main content with no space adjacent to the initial output.

It is very difficult to skip R1 0528's reasoning with prefill, so don't bother.

Pub: 19 Mar 2025 10:30 UTC

Edit: 19 Sep 2025 05:05 UTC