R1 JUST WANTS TO <THINK>

So you've booted up DeepSeek's sexy new mammoth model, hit Send, and it comes back talking to itself. For paragraphs -- sometimes not even responding to you at all. What gives? You thought this model was supposed to be super smart and controllable.

And it is that! But it's also Weird.
This "self-talk" process is intentional; but you're not supposed to see it unless you ask. Let me explain, and then I'll tell you what to do about it.

Reasoning

R1 is what's known as a "reasoning model", which basically means it takes some time to "think" before outputting its response. If you use R1 through DeepSeek's web app, you'll notice that when you hit "send" on your prompt, a thinking panel pops up and the model does exactly this -- it just talks to itself, sometimes for a couple of minutes.

This reasoning process is essential to R1's smarts. It takes in the prompt and the details of the context, writes itself small examples, and evaluates how well they fit the task at hand.

The problem comes when serving R1 over an API, which is what you connect JanitorAI to (or whatever other frontend you're using if this Rentry breaks containment) in order to play with it.

Wild West

Internally, the reasoning process happens between two tokens, <think> and </think>. This way, the model knows when to stop thinking and start actually performing the task. DeepSeek's official API endpoint never mixes these "reasoning tokens" into the message itself; if you want them, they come back separately. (If you're familiar enough with HTTP to understand this sentence: the reasoning arrives as its own field in the response, outside of the message content, so a client that doesn't care about it simply never shows it.)
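
If you're a nerd and want to see what that separation looks like, here's a rough Python sketch of hitting an OpenAI-style Chat Completions endpoint and reading the reasoning separately from the reply. The endpoint path and the `reasoning_content` field name are what I believe DeepSeek documents as of this writing -- treat them as assumptions and check the docs before copying anything.

```python
# Rough sketch: the reasoning comes back as its own field, outside the
# message content, so a frontend that doesn't know about it just ignores it.
# The field name ("reasoning_content") is an assumption -- check your API's docs.
import os
import requests

resp = requests.post(
    "https://api.deepseek.com/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
    json={
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": "Hi there."}],
    },
    timeout=600,  # reasoning can take a couple of minutes
)
message = resp.json()["choices"][0]["message"]

reasoning = message.get("reasoning_content")  # the <think> part, if present
reply = message["content"]                    # what the user actually sees

if reasoning:
    print("--- reasoning (hidden by most frontends) ---")
    print(reasoning)
print("--- reply ---")
print(reply)
```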

The reasoning tokens are still generated, and they still shape the final response; they're just never displayed, because by default you don't need to see or mess with them. This is, in my opinion, the correct way to do it. When done this way, R1 can be used in any frontend that supports OpenAI-style Chat Completions, without any further modification. And if a frontend wants to add support for displaying the reasoning tokens, it can ask for them and show them however it likes.

The problem is that we don't always use DeepSeek's official API! It's expensive, and right now you can't buy credit for it, because of the overwhelming amount of use it's been getting. So instead, we use third party endpoints, usually with a service like OpenRouter.

Now, OpenRouter does support the "request model reasoning" parameter. But the providers that OpenRouter partners with don't always support it.
If a provider doesn't have proper support for reasoning, this usually presents as the reasoning process being included with the response, as a bunch of nonsense that the user doesn't want to put up with.

That can be okay. Some frontends can parse the reasoning tokens, and put the reasoning process into a collapsible window like on the DeepSeek app. But as of this writing (February 18th, 2025) JanitorAI does not do this. So whatever the OpenRouter provider sends is what you get.
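
For the curious, "parsing the reasoning tokens" amounts to something like the sketch below: split the raw completion on the <think> tags, tuck the first part into a collapsible box, and show only the rest. This is a hypothetical illustration, not JanitorAI's or any other frontend's actual code.

```python
# Hypothetical sketch of what a frontend could do with a provider that dumps
# the reasoning inline: split on the <think> tags so the reasoning can go in
# a collapsible box and only the reply gets shown in the chat.
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, reply) from a completion with inline <think> tags."""
    match = THINK_RE.search(raw)
    if not match:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    reply = THINK_RE.sub("", raw, count=1).strip()
    return reasoning, reply

raw = "<think>The user said hi, so I should greet them back.</think>Hello! How can I help?"
reasoning, reply = split_reasoning(raw)
print(reasoning)  # The user said hi, so I should greet them back.
print(reply)      # Hello! How can I help?
```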

It's kind of the Wild West for reasoning models right now; it's being constantly iterated on, and I'd imagine things will improve with time.

What Do I Do About It?

The easiest thing to do is block OpenRouter providers which have this behavior. To do this, log into OpenRouter's website, go to Settings, and then pick the providers to ignore under "Ignore Providers". Once you're done, click "Save".

It's important to remember that this does not disable reasoning, and *won't make R1 "worse".* It'll just stop the reasoning tokens from crowding up your chat.
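
Side note: if you're on a frontend that lets you edit the request body (JanitorAI doesn't), OpenRouter also lets you express the same preference per request instead of account-wide. The `provider`/`ignore` fields below are my reading of OpenRouter's provider-routing docs at the time of writing -- assumptions, so verify before relying on them.

```python
# Rough per-request version of "Ignore Providers", for frontends that let you
# send a custom body. The "provider"/"ignore" fields are assumptions based on
# OpenRouter's provider-routing docs as of this writing -- double-check them.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-r1:free",
        "messages": [{"role": "user", "content": "Hi there."}],
        # Skip providers that dump reasoning into the message text.
        "provider": {"ignore": ["Targon"]},
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])
```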

On OpenRouter's free endpoint, the only provider with this problem is Targon. (But you should probably block Azure too, because their output tends to be weirdly low-quality, and they apply extra filtering to your inputs that can trigger API-level refusals. Classic Microsoft.)

I went and tested the providers on the paid endpoint too, both with the full R1 model and its distills.
Only Targon and Cloudflare had this problem on both R1 and whatever distills they host.
Novita had this issue on the distills, but not on full R1.

All the other providers in my testing (Together, DeepInfra, Fireworks, Nebius, Avian, Kluster, Featherless) worked as intended.

If responses from these properly-functioning providers seem slow to start writing, that's because the model is generating its reasoning tokens behind the scenes first. That delay is just the time the model spends thinking; it would spend that time either way -- you'd just be watching the self-talk scroll by instead of waiting on a blank reply.

Thanks to this post on the JanitorAI_Official subreddit for pushing me to write this. A common misreading of that post is that blocking these providers will make R1 skip its reasoning process. It won't; it'll just make it play nicer with JanitorAI.

That's all! If I become aware of any other common misconceptions, I'll either put them here or make a new rentry. Have a nice day.

~ Bot

Pub: 18 Feb 2025 10:07 UTC
Edit: 20 Feb 2025 14:45 UTC