Reasoning models
Reasoning-capable models (DeepSeek R1, Kimi K2.6, OpenAI o-series, Claude Opus 4.8 with adaptive thinking, Gemini 3 Pro with thought summaries) return their chain-of-thought separately from the final answer. We normalize every provider's convention into a single field — reasoning_content on the assistant message — so your code stays portable across models.
Response shape
// non-streaming
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The answer is 42.",
      "reasoning_content": "Let me think step by step..."
    }
  }]
}

// streaming: reasoning and content flow as separate deltas
{ "choices": [{ "delta": { "reasoning_content": "First, " } }] }
{ "choices": [{ "delta": { "reasoning_content": "consider..." } }] }
{ "choices": [{ "delta": { "content": "The answer" } }] }
{ "choices": [{ "delta": { "content": " is 42." } }] }
Render the reasoning trace in a collapsible block (the playground ships a reference UI), or drop it entirely if you don't want it surfaced. Non-reasoning models simply omit the field.
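As a rough illustration of consuming the streaming shape above, the sketch below accumulates the two delta streams into separate strings. It assumes the chunks have already been parsed from JSON into dicts and that only the first choice matters; collect_stream is a hypothetical helper name, not part of any SDK.

```python
from typing import Iterable, Tuple


def collect_stream(chunks: Iterable[dict]) -> Tuple[str, str]:
    """Accumulate (reasoning, answer) text from already-parsed stream chunks.

    Hypothetical helper: assumes a single choice per chunk.
    """
    reasoning_parts = []
    content_parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        # A delta normally carries one of the two keys; the other is absent.
        reasoning_parts.append(delta.get("reasoning_content") or "")
        content_parts.append(delta.get("content") or "")
    return "".join(reasoning_parts), "".join(content_parts)


# The four streaming chunks from the example above.
chunks = [
    {"choices": [{"delta": {"reasoning_content": "First, "}}]},
    {"choices": [{"delta": {"reasoning_content": "consider..."}}]},
    {"choices": [{"delta": {"content": "The answer"}}]},
    {"choices": [{"delta": {"content": " is 42."}}]},
]

reasoning, answer = collect_stream(chunks)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the two buffers separate makes it easy to render the reasoning in a collapsible block or discard it without touching the answer text.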
Controlling reasoning effort
Thinking-capable models accept a single reasoning_effort parameter that controls how deeply the model reasons and, on models that support it, how many tokens it spends overall. One knob, same field, every provider:
- "none": skip the thinking pass entirely; fastest and cheapest.
- "low" / "medium" / "high": progressively deeper reasoning. "high" is the default when you omit the field.
- "xhigh" / "max": extended depth for long-horizon agentic and coding work, on the models that support them (Claude Opus). On models that don't, they clamp down to "high", so the same request stays portable.
Higher effort produces more carefully-reasoned output at the cost of more tokens and latency. On the latest Claude models reasoning depth is calibrated adaptively per request — reasoning_effort sets the ceiling, and the model thinks only as much as the task needs.
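The default and clamping rules described above can be pictured with a small sketch. The platform applies these rules itself, so client code does not need this; effective_effort and its supports_extended flag are hypothetical names used only to show the documented behavior.

```python
from typing import Optional

# The levels listed in this section, in increasing order of depth.
EFFORT_LEVELS = ("none", "low", "medium", "high", "xhigh", "max")
EXTENDED_LEVELS = {"xhigh", "max"}


def effective_effort(requested: Optional[str], supports_extended: bool) -> str:
    """Return the effort level a request would resolve to.

    Illustrative only: mirrors the documented rules, not the real server code.
    """
    if requested is None:
        # Omitting the field defaults to "high".
        return "high"
    if requested not in EFFORT_LEVELS:
        raise ValueError(f"unknown reasoning_effort: {requested!r}")
    if requested in EXTENDED_LEVELS and not supports_extended:
        # Extended levels clamp down to "high" on models without them.
        return "high"
    return requested


print(effective_effort("max", supports_extended=False))
print(effective_effort("max", supports_extended=True))
print(effective_effort(None, supports_extended=False))
```

Because unsupported extended levels resolve to "high" rather than failing, the same request body can be sent to any thinking-capable model.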
{
  "model": "anthropic/claude-opus-4.8",
  "messages": [{ "role": "user", "content": "explain RSA in 3 steps" }],
  "reasoning_effort": "medium"
}
Billing
Reasoning tokens are billed at the same completion_tokens rate as the final answer. They show up in usage.completion_tokens_details so you can separate them in your accounting. DeepSeek and Kimi are materially cheaper for long-reasoning workloads.
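A minimal accounting sketch follows. It assumes the breakdown exposes a reasoning_tokens subfield and that completion_tokens already includes the reasoning tokens, as in the OpenAI usage convention; neither detail is stated here, so check an actual response first. The usage numbers are invented for illustration.

```python
def split_completion_tokens(usage: dict) -> dict:
    """Split completion tokens into reasoning and answer portions.

    Assumptions (not confirmed by this page): the subfield is named
    "reasoning_tokens" and is counted inside "completion_tokens".
    """
    completion = usage.get("completion_tokens", 0)
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return {
        "reasoning": reasoning,
        "answer": completion - reasoning,
        "total_completion": completion,
    }


# Made-up usage block for illustration only.
usage = {
    "prompt_tokens": 20,
    "completion_tokens": 500,
    "completion_tokens_details": {"reasoning_tokens": 380},
}

print(split_completion_tokens(usage))
```

Non-reasoning models may omit the details object entirely, which is why the sketch falls back to zero reasoning tokens.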
Don't feed reasoning back in
Reasoning traces are not meant to be carried into later turns. For multi-turn conversations, only pass back message.content: dropping reasoning_content keeps your context cleaner and your prompt costs lower. OpenAI specifically voids model safety guarantees if you feed o-series its own reasoning back.
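One way to follow this advice is to strip the field before appending the assistant turn to your history. The sketch below works on plain dict messages; to_history_message is a hypothetical helper name.

```python
def to_history_message(message: dict) -> dict:
    """Return a copy of an assistant message without its reasoning trace."""
    return {k: v for k, v in message.items() if k != "reasoning_content"}


# Assistant message as returned in the non-streaming response shape.
assistant_message = {
    "role": "assistant",
    "content": "The answer is 42.",
    "reasoning_content": "Let me think step by step...",
}

history = [{"role": "user", "content": "What is 6 times 7?"}]
history.append(to_history_message(assistant_message))
history.append({"role": "user", "content": "And doubled?"})

print(history)
```

The original message is left untouched, so the trace can still be logged or rendered separately from what is sent back to the model.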