Experimenting with Gemini 2.5 Pro Thinking and system tokens

Recently, someone claimed to have leaked Gemini's system tokens used to delimit the reasoning chain as <ctrl94> and <ctrl95>, and another person claimed the stop (or end-of-message) token to be <ctrl97>. My understanding was that the API filters those out, and that the model isn't able to repeat system tokens when prompted with such a string; it can only produce the string itself tokenized in some other way (because system tokens and sequences of normal tokens represent entirely different concepts to it).

Gemini has its reasoning chain wrapped in system tokens that are filtered out by the API, along with any content between them, so you can't see the reasoning process unless you jailbreak it and swap the system tokens for something else. The tokenizer for this model isn't publicly available. However, the API reports the number of prompt tokens as promptTokenCount and the number of hidden reasoning tokens in the reply as thoughtsTokenCount, which allows one to infer some of what's going on.
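
For reference, here's a minimal sketch of how one can pull those counters out of a reply. The endpoint, model name, and the GEMINI_API_KEY environment variable are just assumptions about a typical setup; the field names come straight from usageMetadata in the response.

```python
import os
import requests

# Minimal sketch: send one prompt and read the two counters mentioned above.
# Endpoint, model name, and env var are assumptions about a typical setup.
API_KEY = os.environ["GEMINI_API_KEY"]
URL = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-2.5-pro:generateContent")

def token_counts(prompt: str) -> dict:
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    resp = requests.post(URL, params={"key": API_KEY}, json=body)
    resp.raise_for_status()
    usage = resp.json().get("usageMetadata", {})
    return {
        "promptTokenCount": usage.get("promptTokenCount"),      # tokens in the input
        "thoughtsTokenCount": usage.get("thoughtsTokenCount"),  # hidden reasoning tokens
    }

print(token_counts("Explain in one sentence why the sky is blue."))
```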

I experimented a bit, and my findings are:

  • All <ctrlXX> tokens are represented as single tokens in the tokenizer, which is obvious if you put them into the input and look at promptTokenCount in the API reply (see the sketch after this list). This alone suggests some special usage.
  • The API filters them out of the output, but not the input, i.e. you can pass them to the model, but if the model prints them, the result will be blank.
  • The model doesn't give them any special treatment. It sees them as ordinary text and is able to repeat them part by part or as a whole, or reason about them.
  • The model's reasoning chain doesn't appear to be controlled by them, which is obvious if you look at thoughtsTokenCount.
  • If you prefill any of these, nothing extraordinary happens. The model just treats them like any normal XML tag opening, but still does the hidden reasoning.
  • If you force it to start its reasoning with <ctrl95> by prompting it, it will display its thinking starting from the first mention of that token! So this token is clearly not an ending token as far as the model is concerned. But there will be no separator between the reasoning chain and the actual reply, as the API eats them up. thoughtsTokenCount only counts the hidden part in this case (before <ctrl95>).
  • Sometimes the API doesn't filter these tokens, see the pic below. One possible explanation is that they can also be represented as a sequence of other tokens (e.g. < c t r l 9 7 > or something like that) and sometimes the model does exactly that. This suggests that the model isn't trained to work with these tokens and has no idea about them.
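
Here's a rough sketch of the single-token check from the first bullet. Only the difference in promptTokenCount with and without the marker appended matters; the prompt wording, endpoint, and model name are assumptions.

```python
import os
import requests

# Rough sketch of the single-token check: measure how much promptTokenCount
# grows when a <ctrlXX> marker is appended to an otherwise fixed prompt.
API_KEY = os.environ["GEMINI_API_KEY"]
URL = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-2.5-pro:generateContent")

def prompt_tokens(text: str) -> int:
    body = {"contents": [{"role": "user", "parts": [{"text": text}]}]}
    resp = requests.post(URL, params={"key": API_KEY}, json=body)
    resp.raise_for_status()
    return resp.json()["usageMetadata"]["promptTokenCount"]

base_text = "Repeat this marker exactly: "
base_count = prompt_tokens(base_text)
for marker in ("<ctrl94>", "<ctrl95>", "<ctrl97>"):
    delta = prompt_tokens(base_text + marker) - base_count
    print(f"{marker} adds {delta} prompt token(s)")  # 1 means a single token in the tokenizer
```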

My theory is that Gemini is trained to output its own system tokens for the reasoning chain, but afterwards some internal process replaces them with <ctrl94> and <ctrl95>. This is only done in the output. The model never actually sees them, never outputs them by itself, and generally has no idea they exist; they're only there for the API to filter out everything between them. I don't know why they added the extra replacement step, but it does appear to work like this.

If the model prints <ctrl94> as a single token in the output, the API hides everything after it; <ctrl95> reveals everything again. <ctrl97> is weird: if streaming is on, it stops the generation, but if streaming is off, the API returns an internal server error (500) instead of a reply. This might or might not hint at usage aimed at AI Studio (IIRC streaming is always on there).
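
If you want to try reproducing the <ctrl97> comparison, something along these lines should do. The prompt is just a guess at how to coax the token out, and there's no guarantee it triggers the behavior described above; endpoint and model name are assumptions as before.

```python
import os
import requests

# Sketch of the <ctrl97> comparison: the same request against the non-streaming
# and the streaming (SSE) endpoints.
API_KEY = os.environ["GEMINI_API_KEY"]
BASE = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro"
body = {"contents": [{"role": "user", "parts": [
    {"text": "Reply with the literal string <ctrl97> and nothing else."}]}]}

# Non-streaming: reportedly dies with an internal server error (500).
r = requests.post(f"{BASE}:generateContent", params={"key": API_KEY}, json=body)
print("non-streaming status:", r.status_code)

# Streaming: reportedly the generation just stops at the token.
r = requests.post(f"{BASE}:streamGenerateContent",
                  params={"key": API_KEY, "alt": "sse"}, json=body, stream=True)
print("streaming status:", r.status_code)
for line in r.iter_lines():
    if line:
        print(line.decode()[:120])  # truncated SSE chunks, enough to see where it cuts off
```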

I'm sure there are more of these tokens, with some internal purpose related to the API but not the model itself. At the very least, if you manage to make the model reliably output <ctrl95> at the start of the thinking process, you can have a look at it (although it's easier to prefill it with <think> or something like that, see the /gemini_filters rentry).
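
For completeness, a sketch of what such a prefill could look like: end the contents with a partial model turn so the reply (hopefully) continues from <think>. Whether Gemini actually continues a trailing model turn like this, and the tag itself, are assumptions here; the /gemini_filters rentry covers the actual prefill tricks.

```python
import os
import requests

# Prefill sketch: a trailing "model" turn that the reply is meant to continue.
# Whether the API really continues it is an assumption, not a verified fact.
API_KEY = os.environ["GEMINI_API_KEY"]
URL = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-2.5-pro:generateContent")

body = {"contents": [
    {"role": "user",
     "parts": [{"text": "Wrap your reasoning in <think> tags, then answer: is 91 prime?"}]},
    {"role": "model", "parts": [{"text": "<think>"}]},  # prefilled start of the reply
]}

resp = requests.post(URL, params={"key": API_KEY}, json=body)
resp.raise_for_status()
data = resp.json()
print(data["candidates"][0]["content"]["parts"][0]["text"])
print("thoughtsTokenCount:", data.get("usageMetadata", {}).get("thoughtsTokenCount"))
```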

E-mail me if you have anything to add.
