Inference

Reasoning models

Reasoning-capable models (DeepSeek R1, Kimi K2.6, OpenAI o-series, Claude Opus 4.7 with extended thinking, Gemini 2.5 Pro with thought summaries) return their chain-of-thought separately from the final answer. We normalize every provider's convention into a single field — reasoning_content on the assistant message — so your code stays portable across models.

Response shape

// non-streaming
{ "choices": [{
    "message": {
      "role": "assistant",
      "content": "The answer is 42.",
      "reasoning_content": "Let me think step by step..."
    }
  }] }

// streaming — reasoning and content flow as separate deltas
{ "choices": [{ "delta": { "reasoning_content": "First, " } }] }
{ "choices": [{ "delta": { "reasoning_content": "consider..." } }] }
{ "choices": [{ "delta": { "content": "The answer" } }] }
{ "choices": [{ "delta": { "content": " is 42." } }] }

Render the reasoning trace in a collapsible block (the playground ships a reference UI), or drop it entirely if you don't want it surfaced. Non-reasoning models simply omit the field.
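
As a rough illustration of consuming the streaming shape above, the sketch below joins the two delta streams into separate strings. It assumes each chunk has already been parsed into a dict and that only the first choice matters; accumulate_stream is a hypothetical helper name, not part of any SDK.

```python
from typing import Iterable, Tuple


def accumulate_stream(chunks: Iterable[dict]) -> Tuple[str, str]:
    """Collect streamed deltas into (reasoning, content) strings.

    Hypothetical helper: assumes already-parsed chunks and a single choice.
    """
    reasoning_parts = []
    content_parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        # Either key may be missing on a given delta; non-reasoning
        # models never send reasoning_content at all.
        reasoning_parts.append(delta.get("reasoning_content") or "")
        content_parts.append(delta.get("content") or "")
    return "".join(reasoning_parts), "".join(content_parts)


chunks = [
    {"choices": [{"delta": {"reasoning_content": "First, "}}]},
    {"choices": [{"delta": {"reasoning_content": "consider..."}}]},
    {"choices": [{"delta": {"content": "The answer"}}]},
    {"choices": [{"delta": {"content": " is 42."}}]},
]

reasoning, content = accumulate_stream(chunks)
print(reasoning)  # First, consider...
print(content)    # The answer is 42.
```

Keeping the two buffers separate means the reasoning string can be shown in a collapsible block or discarded without touching the answer.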

Controlling reasoning effort

OpenAI o-series and Claude extended thinking accept a reasoning_effort parameter with one of three values: "low", "medium", or "high". Higher effort uses more tokens and adds latency, but produces more carefully reasoned output. Claude additionally accepts thinking.budget_tokens; we map both onto the same parameter.

{ "model": "openai/o4-mini",
  "messages": [{ "role": "user", "content": "explain RSA in 3 steps" }],
  "reasoning_effort": "medium" }

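A request body like the one above could be assembled with a small validating helper. This is only a sketch: build_request and ALLOWED_EFFORTS are hypothetical names, and rejecting unknown values on the client is an assumption about what is convenient, not documented API behavior.

```python
import json
from typing import Optional

# The three effort levels named in this section.
ALLOWED_EFFORTS = ("low", "medium", "high")


def build_request(model: str, prompt: str,
                  reasoning_effort: Optional[str] = None) -> dict:
    """Build a chat request body, optionally setting reasoning_effort.

    Hypothetical helper; it only builds the dict and sends nothing.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_effort is not None:
        if reasoning_effort not in ALLOWED_EFFORTS:
            raise ValueError(
                f"reasoning_effort must be one of {ALLOWED_EFFORTS}, "
                f"got {reasoning_effort!r}"
            )
        body["reasoning_effort"] = reasoning_effort
    return body


request = build_request("openai/o4-mini", "explain RSA in 3 steps", "medium")
print(json.dumps(request, indent=2))
```

Leaving reasoning_effort unset omits the key entirely, so the same helper also works for models that do not take the parameter.
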
Billing

Reasoning tokens are billed at the same completion_tokens rate as the final answer. They show up in usage.completion_tokens_details so you can separate them in your accounting. DeepSeek and Kimi are materially cheaper for long-reasoning workloads.
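
To separate the two in your own accounting, something like the sketch below may help. It rests on two assumptions not stated in this section: that the detail object uses a reasoning_tokens key (the OpenAI convention), and that usage.completion_tokens already includes the reasoning tokens. The numbers are made up for illustration.

```python
def split_completion_tokens(usage: dict) -> dict:
    """Split completion tokens into reasoning and final-answer counts.

    Assumptions: the key is named reasoning_tokens, and
    completion_tokens is the total including reasoning.
    """
    total = usage.get("completion_tokens", 0)
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)  # assumed key name
    return {"reasoning": reasoning, "answer": total - reasoning, "total": total}


# Illustrative usage object, not real data.
usage = {
    "completion_tokens": 1200,
    "completion_tokens_details": {"reasoning_tokens": 900},
}
print(split_completion_tokens(usage))
# {'reasoning': 900, 'answer': 300, 'total': 1200}
```

When the details object is missing, as for a non-reasoning model, the whole completion count falls under "answer".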

Don't feed reasoning back in

Reasoning traces are not meant to be carried into later turns. For multi-turn conversations, pass back only message.content; dropping reasoning_content keeps your context cleaner and your prompt costs lower. OpenAI specifically voids model safety guarantees if you feed o-series its own reasoning back.
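
One way to apply this is to strip the field from the history before each new request. A minimal sketch, assuming messages are plain dicts shaped like the response above; strip_reasoning is a hypothetical helper name.

```python
from typing import Dict, List


def strip_reasoning(messages: List[Dict]) -> List[Dict]:
    """Return a copy of the history with reasoning_content removed.

    Hypothetical helper; the input list and its dicts are not mutated.
    """
    return [
        {key: value for key, value in message.items()
         if key != "reasoning_content"}
        for message in messages
    ]


history = [
    {"role": "user", "content": "What is 6 * 7?"},
    {
        "role": "assistant",
        "content": "The answer is 42.",
        "reasoning_content": "Let me think step by step...",
    },
    {"role": "user", "content": "And doubled?"},
]

next_turn_messages = strip_reasoning(history)
print(next_turn_messages[1])
# {'role': 'assistant', 'content': 'The answer is 42.'}
```

Because the helper returns new dicts, the original history still holds the trace if you want to log or display it.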