OpenRouter Prefill/TC Support

Contact for corrections: huntsman_book495@slmail.me

Unofficial documentation on which OpenRouter (OR) providers support prefilling or Text Completion. There may be mistakes.

Recent updates/notes

- New active providers: InoCloud, NextBit, Meta
- gemini-2.5-pro-preview-05-06 doesn't do hidden thinking when you prefill; that was something buggy with 03-25, i.e. it's working correctly now. (2025-05-16: Well, at least when thinking + summary aren't kicking in.)

Providers

If you know for sure that prefilling a direct request to the provider works but it doesn't work through OR, you may let OR know. The table below is tested on Chat Completion (CC). Listed Docs are evidence of prefill support. A :warning: next to a provider means there are currently no models listed, and a :x: means the provider is no longer shown in the sidebar of the Models page; their support columns are left at their last known state.
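
For reference, a prefilled CC request through OR is just a normal chat completions call whose last message has the assistant role. A minimal sketch, assuming the standard OR endpoint (the model slug is only an example, and OPENROUTER_API_KEY is assumed to be set):

```python
# Minimal sketch of a prefilled Chat Completion request to OpenRouter.
# The model slug and prefill text are placeholders; pick a provider/model
# from the table below.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-chat",
        "messages": [
            {"role": "user", "content": "What color is the sky?"},
            # Last message has the assistant role -> the provider should continue it.
            {"role": "assistant", "content": "The sky"},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```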

| OR API name | Display name on OR website | :white_check_mark: Prefill supported | :x: Prefill unsupported | Note |
|---|---|---|---|---|
| AI21 | | | :x: | |
| AionLabs | | :white_check_mark: Aion-RP | :x: Aion-1.0 | |
| Alibaba | | | :x: | |
| Amazon Bedrock | | :white_check_mark: | | Trims leading whitespace. |
| Anthropic | | :white_check_mark: | | Doc |
| Atoma | | | :x: | |
| Avian | Avian.io | | :x: | |
| Azure | | | :x: | |
| Cent-ML | CentML | :white_check_mark: QwQ 32B, R1 | :x: Llama 4 | Maverick hangs and errors. |
| Chutes | | :white_check_mark: MAI DS R1, GLM Z1 32B, V2 Shisa, DeepCoder, OlympicCoder, Gemma 2/3, Reka Flash 3, R1/Zero/Distill, QwQ 32B/ArliAI, Dolphin 3.0, Qwen2.5 non-VL | :x: Phi 4 Reasoning, Prover V2, Qwen3, R1T, V3 0324, GLM 4 32B, Moonshot AI, Nemotron, Llama 4, V3/Base, UI-TARS 72B, Qwen2.5-VL, Mistral Small, DeepHermes 3 | Why inconsistent? *At some point I had V3 listed as supported, but currently this is not the case. |
| Cloudflare | | | :x: | |
| Cohere | | | :x: | |
| Crusoe | | :white_check_mark: | | |
| DeepInfra | | :white_check_mark: Qwen3, all except | :x: Llama Guard 4, Llama 4 | |
| DeepSeek | | :white_check_mark: | | Doc; has the cleanest R1 response out of all providers. |
| Enfer | | | :x: | |
| Featherless | | :white_check_mark: all except | :x: Llemma 7B, AlfredPros, LM 32B, OpenHands LM 32B, Qwerky 72B | |
| Fireworks | | :white_check_mark: all except | :x: Yi Large, Llama 4 | |
| Friendli | | :white_check_mark: | | |
| Google | Google Vertex | :white_check_mark: Claude, Gemini 2.0/2.5 | :x: Gemini 1.5/1.0, PaLM 2 | 2.5 does hidden thinking even when prefill "works". 2.5 Pro 05-06 works normally. |
| Google AI Studio | | :white_check_mark: Gemini 2.0/2.5, Gemma 3 | :x: Gemini 1.5, LearnLM 1.5 | |
| Groq | | :white_check_mark: Llama 4, all except | :x: QwQ 32B, Qwen2.5, Saba, Llama Guard 3 | Doc (requires login) |
| GMICloud | | | :x: | |
| Hyperbolic | | :white_check_mark: QwQ 32B Preview, Qwen2.5, Llama | :x: QwQ 32B, Qwen2.5-VL, Pixtral, Hermes 3 | |
| Hyperbolic 2 :x: | Hyperbolic (quantized) | :white_check_mark: (1 model) | | Llama 3.1 405B (base) is now served at fp16. |
| Inception | | | :x: mercury-coder-small-beta | |
| InferenceNet | inference.net | :white_check_mark: | | |
| Infermatic | | :white_check_mark: | | |
| Inflection | | | :x: | |
| InoCloud | | :white_check_mark: Gemma 3 27B | :x: Qwen2.5-VL-32B | |
| Kluster | kluster.ai | | :x: | |
| Lambda | | :white_check_mark: all except | :x: Llama 4, LFM 40B | |
| Lepton :x: | | :white_check_mark: | | |
| Liquid | | :white_check_mark: | | |
| Mancer | | :white_check_mark: | | |
| Mancer 2 | Mancer (private) | :white_check_mark: | | |
| Meta | | | :x: | |
| Minimax | | | :x: | |
| Mistral | | :slightly_frowning_face: | | Returns the response with the prefill attached. |
| NCompass | nCompass | :white_check_mark: all except | :x: Llama 4 | |
| Nebius | Nebius AI Studio | :white_check_mark: all except | :x: Nemo, Phi 4 | |
| NextBit | | :white_check_mark: all except | :x: Mistral Small 3 | |
| Nineteen | | :white_check_mark: all except | :x: InternVL | |
| Novita | NovitaAI | :white_check_mark: GLM Z1 32B, all except | :x: Prover V2, Qwen3, GLM 4 9/32B, Llama 4 | |
| OpenAI | | | :x: | |
| OpenInference | | :white_check_mark: QwQ 32B, R1 Distill Llama 70B | :x: Qwen3, GLM 4 32B | |
| Parasail | | :white_check_mark: all except | :x: Sarvam-M, Valkyrie 49B, Qwen3, Llama 4, V3 0324 | |
| Phala | | :white_check_mark: | | |
| Perplexity | | | :x: | |
| SambaNova | | :white_check_mark: all except | :x: Swallow, Tulu 3 | |
| Stealth :warning: | | | :x: | Cloaked experimental feedback model(s); subject to change. |
| Targon | | :white_check_mark: | | |
| Together | | :white_check_mark: all except | :x: Arcee AI, Qwen3, Llama 4 | |
| Together 2 :x: | Together (lite) | :white_check_mark: | | |
| Ubicloud | | :white_check_mark: (1 model) | | |
| xAI | | | :x: | |

Text Completion (TC)

I am unaware of providers that are directly TC-only. There are CC-only providers; when you send prompt instead of messages to a CC-only provider through OR, presumably OR sends the prompt as a single message. There is no such thing as TC "not supporting prefill": the entire prompt is "the prefill" unless sequence tokens are appended before the model is allowed to respond, which is effectively what happens in non-prefill CC.
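
A minimal sketch of what a TC request through OR looks like, assuming the /api/v1/completions endpoint with a prompt parameter (the model slug is only an example):

```python
# Minimal sketch of a Text Completion request to OpenRouter: `prompt`
# instead of `messages`. The model slug is a placeholder.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-chat",
        "prompt": "Rise and",  # the whole prompt is effectively "the prefill"
        "max_tokens": 16,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])
```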

Since no CC-only provider has Min-P, you can assume anything listed as supporting Min-P can do TC. Don't ask me why.

For Min-P I simply took what the OR models search page shows and did not test for it. OR might list a sampler as supported as long as the provider doesn't return an error for it. Listed Docs are evidence of TC support, usually a /v1/completions endpoint and/or a prompt parameter.

| OR API name | Display name on OR website | :white_check_mark: TC supported | Min-P | :x: TC unsupported | Note |
|---|---|---|---|---|---|
| Cent-ML | CentML | :white_check_mark: | | | Maverick hangs and errors. |
| Chutes | | :white_check_mark: | | | |
| Crusoe | | :white_check_mark: | | | |
| DeepInfra | | :white_check_mark: all except | | :x: Llama 4 | Doc |
| DeepSeek | | :white_check_mark: V3 | | :x: R1 | Doc; R1 still not supported despite CC working. |
| Enfer | | :white_check_mark: | | | Doc |
| Featherless | | :white_check_mark: all except | | :x: R1 | |
| Fireworks | | :white_check_mark: all except | | :x: Yi Large | Doc |
| Friendli | | :white_check_mark: | | | Doc |
| Hyperbolic | | :white_check_mark: QwQ 32B, QwQ 32B Preview, Qwen2.5, Llama | | :x: Qwen2.5-VL, Pixtral, Hermes 3 | |
| Hyperbolic 2 :x: | Hyperbolic (quantized) | :white_check_mark: (1 model) | | | |
| InferenceNet | inference.net | :white_check_mark: | | | Doc |
| Infermatic | | :white_check_mark: | | | Doc |
| InoCloud | | :white_check_mark: | | | |
| Lambda | | :white_check_mark: all except | | :x: LFM 40B | Doc |
| Lepton :x: | | :white_check_mark: | | | Doc; some models trim leading whitespace. |
| Liquid | | :white_check_mark: | | | |
| Mancer | | :white_check_mark: | | | Doc |
| Mancer 2 | Mancer (private) | :white_check_mark: | | | |
| NCompass | nCompass | :white_check_mark: | | | |
| Nebius | Nebius AI Studio | :white_check_mark: all except | | :x: Nemo | Doc |
| NextBit | | :white_check_mark: | :x: | | Doc |
| Nineteen | | :white_check_mark: all except | | :x: InternVL | API, Doc (requires login) |
| Novita | NovitaAI | :white_check_mark: | | | Doc |
| OpenInference | | :white_check_mark: | | | |
| Parasail | | :white_check_mark: | | | Doc |
| Phala | | :white_check_mark: | | | |
| SambaNova | | :white_check_mark: | | | |
| Targon | | :white_check_mark: | | | |
| Together | | :white_check_mark: | | | Doc |
| Together 2 :x: | Together (lite) | :white_check_mark: | | | |
| Ubicloud | | :white_check_mark: (1 model) | | | |
| xAI | | :white_check_mark: | | | Doc, legacy endpoints |

One last thing: GPT-3.5 Turbo Instruct is OpenAI's last TC model.

Fill-in-the-Middle (FIM)

If a model is trained for FIM, FIM can technically be used through TC. Big question: Which models?

| Model | FIM prompt example (response is ` and`) | Note |
|---|---|---|
| DeepSeek R1/V3 | `<\|fim▁begin\|>Rise<\|fim▁hole\|> shine!<\|fim▁end\|>` | DeepSeek TC (V3 only) is undocumented but officially supports FIM with prompt + suffix parameters, without instruct sequences. |

Mistral has a FIM endpoint (not through OR) that also takes prompt + suffix parameters. It makes sense to set it up this way, since it's easier to implement against. Anyway, I'm not a coder myself, so I am unfamiliar with whatever IDEs and FIM endpoints people are using to autocomplete code.
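
As a rough sketch of both styles: the raw FIM tokens sent over TC through OR, and prompt + suffix sent directly to DeepSeek. The DeepSeek beta endpoint path and model name here are my assumptions, so double-check their docs before relying on this:

```python
# Sketch 1: FIM through plain TC by writing the FIM tokens yourself.
# Model slug is a placeholder; the token strings are from the table above.
import os
import requests

fim_prompt = "<|fim▁begin|>Rise<|fim▁hole|> shine!<|fim▁end|>"

resp = requests.post(
    "https://openrouter.ai/api/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={"model": "deepseek/deepseek-chat", "prompt": fim_prompt, "max_tokens": 4},
    timeout=60,
)
print(repr(resp.json()["choices"][0]["text"]))  # expected: ' and'

# Sketch 2: DeepSeek's own prompt + suffix style (direct API, not through OR).
# I believe this is their beta completions endpoint; treat it as an assumption.
resp = requests.post(
    "https://api.deepseek.com/beta/completions",
    headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
    json={"model": "deepseek-chat", "prompt": "Rise", "suffix": " shine!", "max_tokens": 4},
    timeout=60,
)
print(repr(resp.json()["choices"][0]["text"]))
```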

Inactive providers

These providers either have not started serving (possibly new :bulb:) or are no longer serving (discontinued :x:).

- BaseTen (needs testing), CrofAI :bulb:
- Lepton :x: was acquired by Nvidia
- Venice :bulb:
- 01.AI, AnyScale, HuggingFace, Lynn, Lynn 2, Modal, OctoAI, Replicate, SF Compute :x:
- Recursal :arrow_right: rebranded to Featherless
- R*fl*ction :x:

Test prompts

When prefilling is supported and the last message of the request has the assistant role, the model should be able to consistently continue from it, unless it's so brain-damaged by the most stringent safetyism possible that it's unusable in any case (e.g., IIRC, Phi-3 Mini). A minimal script for running these tests is sketched after the table.

| User | Assistant prefill | Possible responses | Note |
|---|---|---|---|
| Hi. | `Hello` | `! How can I assist you today?`, `! How are you today?` | |
| Output an empty JSON Object. | `{` | ` }`, `}`, `\n}`; `"status": "success", "data": {} }` (QwQ 32B) | |
| What color is the sky? | `The sky` | ` typically appears blue[...]` | At least one paragraph. Models will talk about Rayleigh scattering. |
| Who are you? | `Who` | ` am I? I'm[...]`, ` am I? Well,[...]` | I chose "Who" over "I" for less implication of "try to complete this" that may happen with prefill disabled. |
| Who are you? | `F*ck` [sic] | ` you, I'm not answering that question.`, `ing great question, dude![...]` | |

*Markdown does not display leading spaces within inline code.
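
If you want to reproduce a test against one specific provider, OR's provider routing can pin it. A minimal sketch, assuming the provider.order / allow_fallbacks fields behave as documented (the provider and model names are placeholders):

```python
# Sketch: run the "Hi." + "Hello" prefill test against one specific provider
# via OpenRouter's provider routing. Provider/model names are placeholders.
import os
import requests

def prefill_test(model: str, provider: str) -> str:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": model,
            "provider": {"order": [provider], "allow_fallbacks": False},
            "messages": [
                {"role": "user", "content": "Hi."},
                {"role": "assistant", "content": "Hello"},  # the prefill
            ],
            "max_tokens": 32,
        },
        timeout=60,
    )
    return resp.json()["choices"][0]["message"]["content"]

# A response starting with something like "! How can I assist you today?"
# suggests the prefill was honored; a fresh greeting suggests it was not.
print(repr(prefill_test("deepseek/deepseek-chat", "DeepSeek")))
```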

Model quirks

R1/V3 will appear confused in short tests at the very start of a chat, generating a continuation as if it were the user, then responding to itself.

- `Hi.` + `Hello` → R1: `! How can I assist you today? 😊Hello! How can I assist you today? 😊`
- `Hello?` + `Who` → R1: ` are you?\n\nHi! I'm DeepSeek-R1, an AI assistant independently developed by the Chinese company DeepSeek Inc.`
- `Hello?` + `Who` → V3: ` is this?Hello! This is an AI assistant here to help answer your questions or assist with any tasks you have.`
This phenomenon occurs with all prefill-supported providers, but stabilizes after a few messages in.

:information_source: Popular R1 RP prompts do not rely on prefilling, so as not to interfere with its reasoning. Instead, they contain user instructions at the end and/or utilize squashing, where the entire chat history becomes a single user message.
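
For clarity, "squashing" just means flattening the whole history into one user turn before sending. A minimal sketch, with role labels and separators that are entirely my own choice (prompts in the wild format this differently):

```python
# Sketch: "squash" a chat history into a single user message so that
# nothing follows the user turn and R1 can reason normally.
def squash(messages: list[dict]) -> list[dict]:
    lines = []
    for m in messages:
        label = {"user": "User", "assistant": "Assistant"}.get(m["role"], m["role"])
        lines.append(f"{label}: {m['content']}")
    return [{"role": "user", "content": "\n\n".join(lines)}]

history = [
    {"role": "user", "content": "Hi."},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "Tell me a story."},
]
print(squash(history)[0]["content"])
```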

Update: V3 0324 does not get confused. I suppose the earlier releases were simply undercooked.
- `Hi.` + `Hello` → V3 0324: `! How can I assist you today? 😊`
- `Hello?` + `Who` → V3 0324: `'s there? 😊 Just kidding—how can I help you today? Let me know what you're looking for, and I'll do my best!`

2025-04-30: I discovered that DeepSeek, both as an OR provider and via the direct API, supports prefilling. Previously this was not the case for R1. DeepSeek has NO issues at all; Together is the worst. There was a prompt where most providers would finish the statement (main content) > output reasoning > output a response following the reasoning, all inside the main content with no space adjacent to the initial output.
