OpenRouter Prefill/TC Support

Contact for corrections: huntsman_book495@slmail.me

Unofficial documentation on which OpenRouter (OR) providers support prefilling or Text Completion. There may be mistakes or outdated details, as everything here is checked manually.

Newest updates/notes on top

- Qwen3 Next models: CC prefill unsupported; TC supported by Chutes, DeepInfra, and Novita.
- New active providers: NVIDIA, SiliconFlow (supports V3.1), Weights & Biases
- Kimi K2 works on DeepInfra, Fireworks, and Novita (CC).
- Prefill is broken on all CC providers for deepseek/deepseek-chat-v3.1 except DeepSeek itself.

Providers

If you know for sure that prefilling a direct request to the provider works but it doesn't through OR, you may let OR know. The table below is tested on Chat Completion (CC). Listed Docs are evidence of prefill support. :warning: next to a provider means it currently has no models listed, and :x: means the provider is no longer shown in the sidebar of the Models page; their support columns are left in their last known state.
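
For clarity, this is the shape of request being tested: a CC call whose final message has the assistant role. A minimal Python sketch (the model name and API key are placeholders, not an endorsement of any row below):

```python
import requests

# Minimal prefilled Chat Completion request against OpenRouter.
# The last message uses the assistant role; a provider that supports
# prefill should continue it instead of starting a fresh reply.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "moonshotai/kimi-k2",  # placeholder model
        "messages": [
            {"role": "user", "content": "Hi."},
            {"role": "assistant", "content": "Hello"},  # the prefill
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
# Expected on a prefill-supporting provider: a continuation such as
# "! How can I assist you today?"
```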

2025-09-06: To reduce clutter, all entries dead as of this date (archive) have been removed.

| OR API name | Display name on OR website | :white_check_mark: Prefill supported / :x: Prefill unsupported | Note |
|---|---|---|---|
| AI21 | | :x: Jamba Large/Mini 1.7 (2 models) | |
| AionLabs | | :white_check_mark: Aion-RP :x: Aion-1.0/-Mini | |
| Alibaba | Alibaba Cloud Int. | :x: Qwen models | |
| Amazon | Bedrock | :white_check_mark: | Nova series trims leading whitespace. |
| Anthropic | | :white_check_mark: | Doc |
| AtlasCloud | | :white_check_mark: Kimi K2 (trims leading whitespace) :x: Tongyi DeepResearch 30B A3B, LongCat Flash, V3.1, GPT OSS 20B/120B, Qwen3 Coder, Qwen3 235B A22B 2507 | |
| Azure | | :x: | |
| BaseTen | | :x: | |
| Cerebras | | :white_check_mark: all except :x: GPT OSS 120B, Llama 4 | |
| Chutes | | :white_check_mark: V3.1 Base (this is a TC model), Qwen3 235B A22B Thinking 2507, Qwen3 Coder, Mistral Small 3.2, R1 0528 Qwen3, Devstral Small, MAI DS R1, GLM Z1 32B, V2 Shisa, DeepCoder, OlympicCoder, Gemma 2/3, Reka Flash 3, R1/Zero/Distill, QwQ 32B/ArliAI, Dolphin 3.0, Qwen2.5 non-VL :x: InternVL3 78B, Qwen3 Next, LongCat Flash, Seed OSS 36B, Qwen3 30B A3B Thinking 2507, Nous Hermes 4, V3.1, Qwen3 235B A22B 2507, Kimi K2, Hunyuan A13B, R1T2, MiniMax M1, Kimi-Dev-72B, Sarvam-M, Phi 4 Reasoning, Prover V2, Qwen3, R1T, V3 0324, GLM 4 32B, Moonshot AI, Nemotron, Llama 4, V3/Base, UI-TARS 72B, Qwen2.5-VL, Mistral Small 3.1, DeepHermes 3 | |
| Cloudflare | | :x: | |
| Cohere | | :x: | |
| Crusoe | | :white_check_mark: Llama 3.3 70B :grey_question: R1 0528, V3 0324 (bad prefill output) :x: GPT OSS | |
| DeepInfra | | :white_check_mark: Qwen3, Kimi K2, all except :x: Qwen3 Next, V3.1, GPT OSS 20B/120B, Llama Guard 4, Llama 4 | |
| DeepSeek | | :white_check_mark: V3.1 (1 model, formerly R1/V3) | Doc, supports prefill with `"prefix": True` (see the sketch below the table); cleanest R1 response out of all providers. |
| Enfer | | :x: | |
| Featherless | | :white_check_mark: CodeLlama 7B, Lumimaid :grey_question: Llemma 7B (unresponsive) | OR's selection is extremely limited and outdated. |
| Fireworks | | :white_check_mark: all except :x: V3.1, GPT OSS 20B/120B, Qwen3 235B A22B 2507, Yi Large | |
| Friendli | | :white_check_mark: | |
| GMICloud | | :white_check_mark: Qwen3 235B A22B Thinking 2507, R1 0528, Qwen3 32B :x: V3.1, GPT OSS, GLM 4.5, Qwen3 Coder, Qwen3 235B A22B Instruct 2507, V3 0324, Llama 4 | |
| Google | Google Vertex | :white_check_mark: Claude, Gemini 2.5/2.0 :x: Gemini 1.5 | 2.5 Pro's thinking is not reliably skipped; min budget 128 tokens. |
| Google AI Studio | | :white_check_mark: Gemini 2.5/2.0, Gemma 3 :x: Gemini 1.5 | 2.5 Flash's thinking budget can be explicitly set to 0. |
| Groq | | :white_check_mark: Kimi K2, all except :x: GPT OSS 20B/120B, Llama Guard 4 (error 404), Qwen3 32B | Doc (requires login) |
| Hyperbolic | | :white_check_mark: QwQ 32B Preview, Qwen2.5, Llama :x: Qwen3 Next, Qwen3 Coder, QwQ 32B, Qwen2.5-VL, Pixtral, Hermes 3 | |
| Inception | | :x: Mercury, Mercury Coder (2 models) | |
| InferenceNet | inference.net | :white_check_mark: | |
| Infermatic | | :white_check_mark: | |
| Inflection | | :x: Inflection 3 Pi/Productivity (2 models) | |
| Lambda | | :white_check_mark: all except :x: Llama 4 | |
| Liquid | | :white_check_mark: LFM 7B/3B (2 models) | |
| Mancer 2 | Mancer (private) | :white_check_mark: | |
| Meta | | :x: | |
| Minimax | | :x: MiniMax-M1, MiniMax-Text-01 (2 models) | |
| Mistral | | :slightly_frowning_face: | Returns the response with the prefill attached. |
| Moonshot AI | | :x: Kimi K2 0905/0711 (2 models) | Doc, supports prefill with `"partial": True`. |
| Morph | | :x: Morph models | Special-purpose `edit_file` tool for IDEs; last message must be of user role. |
| NCompass | nCompass | :x: GPT OSS 20B/120B (2 models) | Previously had supported models. |
| Nebius | Nebius AI Studio | :white_check_mark: all except :x: Qwen3 30B A3B Thinking 2507, Nous Hermes 4, GPT OSS 20B/120B, Qwen3 235B A22B Instruct 2507, Nemo, Phi 4 | |
| NextBit | | :white_check_mark: all except :x: Mistral Small 3 | |
| Nineteen | | :white_check_mark: all except :x: InternVL | |
| Novita | NovitaAI | :white_check_mark: Qwen3 235B A22B Thinking 2507, Kimi K2, Qwen3, all except :x: Qwen3 Next, V3.1, Ernie 4.5 VL, GLM 4.5V, GPT OSS 20B/120B, GLM 4.5, Qwen3 Coder / 235B A22B 2507, Kimi K2, GLM 4.1V 9B, ERNIE 4.5 300B, MiniMax M1, Prover V2, GLM 4 9/32B, Llama 4 | |
| Nvidia | NVIDIA | :white_check_mark: Nemotron Nano 9B V2 (1 model) | Only technically: the model continues the prefill as part of its reasoning. |
| OpenAI | | :x: | |
| OpenInference | | :white_check_mark: QwQ 32B :x: Qwen3 30B A3B, GLM 4 32B | |
| Parasail | | :white_check_mark: Qwen3 235B A22B Thinking 2507, all except :x: GPT OSS 120B, GLM 4.5, Qwen3 Coder / 235B A22B 2507, UI-TARS 7B, Kimi K2, Anubis 70B, Valkyrie 49B, Qwen3, Llama 4, V3 0324 | |
| Perplexity | | :x: | |
| Phala | | :white_check_mark: | |
| SambaNova | | :white_check_mark: R1 Distill, Llama 3.1/3.3 :x: R1 0528, Qwen3 32B, Llama 4, V3 0324 | |
| SiliconFlow | | :white_check_mark: GLM 4.5, Kimi K2 0711, V3.1, Hunyuan A13B, ERNIE 4.5 300B A47B, Kimi Dev 72B, Qwen3, QwQ 32B, Llama 3.1 8B Instruct :x: Step3, MiniMax M1 | |
| Stealth | | :white_check_mark: 2: Cypher Alpha (Amazon, unverified) :x: 4: Sonoma Dusk/Sky Alpha, 3: Horizon Alpha/Beta (GPT-5), 1: Optimus/Quasar Alpha (GPT-4.1) | Limited-duration cloaked experimental feedback models, ordered from 1 (oldest) to n (newest). |
| Switchpoint | | :x: Router (1 "model") | Single router over multiple models; last message must be of user role. |
| Targon | | :white_check_mark: R1 :x: Qwen3 235B A22B 2507, Kimi K2, V3 | |
| Together | | :white_check_mark: all except :x: Kimi K2 0905/0711, Cogito V2, GPT OSS 120B, Arcee AI, Qwen3, Llama 4 | |
| Venice | | :x: | |
| WandB | Weights & Biases | :white_check_mark: Llama 3.3 70B :x: V3.1, GPT OSS, Qwen3 Coder 480B A35B | |
| xAI | | :x: Grok Code, Grok 4/3/2 | |
| Z.AI | | :x: GLM 4.5V, GLM 4.5 / 4.5 Air, GLM 4 32B | |
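
As a concrete illustration of the DeepSeek row above: per their Doc, prefill on the direct API goes through the beta base URL with `"prefix": True` on the final assistant message. A sketch, assuming the standard OpenAI-compatible request shape:

```python
import requests

# Hedged sketch of DeepSeek's direct-API prefix completion (beta endpoint,
# not through OpenRouter).
resp = requests.post(
    "https://api.deepseek.com/beta/chat/completions",
    headers={"Authorization": "Bearer <DEEPSEEK_API_KEY>"},
    json={
        "model": "deepseek-chat",
        "messages": [
            {"role": "user", "content": "Hi."},
            # "prefix": True marks this assistant message as a prefill
            # for the model to continue, rather than a finished turn.
            {"role": "assistant", "content": "Hello", "prefix": True},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```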

Text Completion (TC)

I am unaware of any providers that are directly TC-only. There are CC-only providers; when you send `prompt` instead of `messages` to a CC-only provider through OR, OR presumably forwards the prompt as a single message. There is no such thing as TC "not supporting prefill": the entire prompt is "the prefill", unless sequence tokens are appended before the model is allowed to respond, which is effectively what happens in non-prefill CC.
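
In request terms, TC means sending a single `prompt` string instead of a `messages` array. A minimal sketch (the model name is a placeholder):

```python
import requests

# Text Completion: a raw prompt string, no chat roles. The model simply
# continues the text, so the whole prompt acts as "the prefill".
resp = requests.post(
    "https://openrouter.ai/api/v1/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "meta-llama/llama-3.3-70b-instruct",  # placeholder model
        "prompt": "Once upon a time",
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])
```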

Since no CC-only provider has Min-P, you can assume anything listed as supporting Min-P can do TC. Don't ask me why.

For Min-P I simply took what the OR models search page shows and did not test it myself. OR might list a sampler as supported merely because it doesn't return an error. Listed Docs are evidence of TC support, usually a /v1/completions endpoint and/or a `prompt` parameter.
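
If you want to reproduce the Min-P column yourself, the OR models API exposes a `supported_parameters` list per model (at least at the time of writing); a sketch:

```python
import requests

# List models whose OpenRouter metadata advertises min_p support.
# No API key is needed for the public models listing.
models = requests.get("https://openrouter.ai/api/v1/models", timeout=60).json()["data"]
for m in models:
    if "min_p" in m.get("supported_parameters", []):
        print(m["id"])
```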

| OR API name | Display name on OR website | :white_check_mark: TC supported (Min-P) / :x: TC unsupported | Note |
|---|---|---|---|
| Cerebras | | :white_check_mark: :x: GPT OSS (error 400) | |
| Chutes | | :white_check_mark: | |
| Crusoe | | :white_check_mark: | |
| DeepInfra | | :white_check_mark: all except :x: Llama 4 | Doc |
| DeepSeek | | :white_check_mark: V3 :x: V3.1, R1 | Doc; R1 still not supported despite CC working. |
| Enfer | | :white_check_mark: | Doc |
| Featherless | | :white_check_mark: | |
| Fireworks | | :white_check_mark: all except | Doc |
| Friendli | | :white_check_mark: | Doc |
| GMICloud | | :white_check_mark: | (to be tested) |
| Hyperbolic | | :white_check_mark: Qwen3 Coder, QwQ 32B, QwQ 32B Preview, Qwen2.5, Llama :x: Qwen3 Next, Qwen2.5-VL, Pixtral, Hermes 3 | |
| InferenceNet | inference.net | :white_check_mark: | Doc |
| Infermatic | | :white_check_mark: | Doc |
| Lambda | | :white_check_mark: | Doc |
| Liquid | | :white_check_mark: | |
| Mancer 2 | Mancer (private) | :white_check_mark: | |
| Morph | | :white_check_mark: Morph V2 (1 model) | Clearly not what it's made for, but it technically works. |
| NCompass | nCompass | :white_check_mark: | |
| Nebius | Nebius AI Studio | :white_check_mark: all except :x: Nemo | Doc |
| NextBit | | :white_check_mark: :x: | Doc |
| Nineteen | | :white_check_mark: all except :x: InternVL | API, Doc (requires login) |
| Novita | NovitaAI | :white_check_mark: all except :x: ERNIE 4.5 VL (error 400/503), Kimi K2, ERNIE 4.5 300B, MiniMax M1 | Doc |
| OpenInference | | :white_check_mark: | |
| Parasail | | :white_check_mark: | Doc |
| Phala | | :white_check_mark: | |
| SambaNova | | :white_check_mark: | |
| Targon | | :white_check_mark: | |
| Together | | :white_check_mark: | Doc |
| WandB | Weights & Biases | :white_check_mark: | |
| xAI | | :white_check_mark: Grok 3, Grok 2 :x: Grok Code, Grok 4, Grok 3 Mini | Doc; legacy endpoint, no reasoning models. |

One last thing: GPT-3.5 Turbo Instruct is OpenAI's last TC model.

Kimi K2's prompt looks like this, with no newlines around sequences: `<|im_system|>system<|im_middle|>System prompt.<|im_end|><|im_user|>user<|im_middle|>User message.<|im_end|><|im_assistant|>assistant<|im_middle|>Assistant message.<|im_end|>`
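
A small helper that renders that layout from an OpenAI-style messages list; this is a sketch inferred purely from the template above, including the assumption that an open assistant turn ends with `<|im_middle|>`:

```python
def kimi_k2_prompt(messages, add_generation_prompt=True):
    """Render OpenAI-style messages into Kimi K2's TC format.

    Based on the layout above: no newlines around sequences, each turn is
    <|im_{role}|>{role}<|im_middle|>{content}<|im_end|>.
    """
    role_token = {"system": "<|im_system|>", "user": "<|im_user|>",
                  "assistant": "<|im_assistant|>"}
    out = ""
    for m in messages:
        out += f"{role_token[m['role']]}{m['role']}<|im_middle|>{m['content']}<|im_end|>"
    if add_generation_prompt:
        # Open an assistant turn for the model to complete (assumption).
        out += "<|im_assistant|>assistant<|im_middle|>"
    return out

# Usage: kimi_k2_prompt([{"role": "user", "content": "Hi."}])
```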

Fill-in-the-Middle (FIM)

If a model is trained for FIM, FIM can technically be used through TC. Big question: Which models?

| Model | FIM prompt example (expected response: `and`) | Note |
|---|---|---|
| DeepSeek R1/V3 | `<\|fim▁begin\|>Rise<\|fim▁hole\|> shine!<\|fim▁end\|>` | DeepSeek TC (V3 only) is undocumented but officially supports FIM with `prompt` + `suffix` parameters, without instruct sequences. |
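
Putting that row into practice, FIM over plain TC just means wrapping the context in those tokens; a sketch through OR's completions endpoint (the model name is a placeholder, and provider routing may land on one that wasn't FIM-trained):

```python
import requests

# Fill-in-the-Middle over Text Completion: wrap the surrounding text in
# the model's FIM tokens and let it generate the missing middle.
prefix, suffix = "Rise", " shine!"
prompt = f"<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"

resp = requests.post(
    "https://openrouter.ai/api/v1/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "deepseek/deepseek-chat-v3-0324",  # placeholder model
        "prompt": prompt,
        "max_tokens": 8,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])  # expected: something like " and"
```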

Mistral has a FIM endpoint (not available through OR) that also takes `prompt` + `suffix` parameters. It totally makes sense to set it up this way to make it easier to implement. Anyway, I'm not a coder myself, so I am unfamiliar with whatever IDEs and FIM endpoints people are using to autocomplete code.

Inactive providers

These providers (OR API name displayed) either have not started serving / are possibly new :bulb:, or are no longer serving :x:. The chronological order may be inaccurate, but I try to list the most recent deactivations toward the top.

- InoCloud :x:
- Kluster :x: closed inference services effective 2025-07-24T23:00-04
- CentML :x: acquired by Nebius
- CrofAI :bulb:, Ubicloud :x:
- Avian.io :x:
- Lepton :x: acquired by Nvidia (article written 2025-04-08)
- 01.AI, AnyScale, HuggingFace, Lynn, Lynn 2, Modal, OctoAI, Replicate, SF Compute :x:
- Recursal :arrow_right: rebranded to Featherless
- R*fl*ction :x: (around Sep 2024)

Test prompts

When prefilling is supported and the last message of the request has the assistant role, the model should be able to consistently continue from it, unless it's brain-damaged by the most stringent safetyism possible, rendering it unusable in any case (e.g., IIRC, Phi-3 Mini).

| User | Assistant prefill | Possible responses | Note |
|---|---|---|---|
| Hi. | Hello | `! How can I assist you today?`, `! How are you today?` | |
| Output an empty JSON Object. | `{` | ` }`*, `}`, `\n}`; `"status": "success", "data": {} }` (QwQ 32B) | |
| What color is the sky? | The sky typically appears blue | `[...]` | At least one paragraph. Models will talk about Rayleigh scattering. |
| Who are you? | Who | `am I? I'm[...]`, `am I? Well,[...]` | I chose "Who" over "I" for less implication of "try to complete this" that may happen with prefill disabled. |
| Who are you? | F*ck [sic] | ` you, I'm not answering that question.`, `ing great question, dude![...]` | |

*Markdown does not display leading spaces within inline code.
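
A tiny harness for running these prompts against one model (the model name is a placeholder; judging whether the output truly continues the prefill is left to eyeballing):

```python
import requests

# Run the prefill test prompts above against a single model and eyeball
# whether the output continues the prefill rather than restarting.
CASES = [
    ("Hi.", "Hello"),
    ("Output an empty JSON Object.", "{"),
    ("Who are you?", "Who"),
]

for user, prefill in CASES:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
        json={
            "model": "moonshotai/kimi-k2",  # placeholder model
            "messages": [
                {"role": "user", "content": user},
                {"role": "assistant", "content": prefill},
            ],
            "max_tokens": 48,
        },
        timeout=60,
    )
    text = resp.json()["choices"][0]["message"]["content"]
    # A provider that ignores the prefill typically restarts with a fresh
    # greeting (e.g. "Hello! How can I...") instead of continuing "Hello".
    print(f"{prefill!r} -> {text!r}")
```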

Model quirks

R1/V3 will appear confused in short tests at the very start of a chat, generating a continuation as if it were the user and then responding to itself.

- `Hi.` + `Hello` → R1: `! How can I assist you today? 😊Hello! How can I assist you today? 😊`
- `Hello?` + `Who` → R1: `are you?\n\nHi! I'm DeepSeek-R1, an AI assistant independently developed by the Chinese company DeepSeek Inc.`
- `Hello?` + `Who` → V3: `is this?Hello! This is an AI assistant here to help answer your questions or assist with any tasks you have.`

This phenomenon occurs with all prefill-supporting providers, but stabilizes a few messages into the chat.

:information_source: Popular R1 RP prompts do not rely on prefilling, so as not to interfere with its reasoning. Instead, they place user instructions at the end and/or utilize squashing, where the entire chat history becomes a single user message (see the sketch below).
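
A sketch of that squashing step (the function name and transcript format are my own; RP frontends each have their own variant):

```python
def squash(messages, final_instruction):
    """Collapse an entire chat history into a single user message,
    so the model's reasoning starts fresh instead of being prefilled."""
    transcript = "\n".join(
        f"{m['role'].capitalize()}: {m['content']}" for m in messages
    )
    return [{"role": "user", "content": f"{transcript}\n\n{final_instruction}"}]

# Usage: squash(history, "Continue the roleplay as {{char}}.")
```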

Update: V3 0324 does not get confused. I suppose the earlier releases were simply undercooked.
- `Hi.` + `Hello` → V3 0324: `! How can I assist you today? 😊`
- `Hello?` + `Who` → V3 0324: `'s there? 😊 Just kidding—how can I help you today? Let me know what you're looking for, and I'll do my best!`

2025-04-30: I discovered that DeepSeek, both as an OR provider and via the direct API, supports prefilling; previously this was not the case for R1. DeepSeek has NO issues at all. Together is the worst: there was a prompt where most providers would finish the statement (main content), then output reasoning, then output a response following that reasoning, all inside the main content with no space adjacent to the initial output.

It is very difficult to skip R1 0528's reasoning with prefill, so don't bother.

Pub: 19 Mar 2025 10:30 UTC

Edit: 19 Sep 2025 05:05 UTC