OpenAI's fast, lightweight reasoning model optimized for multi-step problem solving at lower cost. It is a text model from OpenAI, accessible via AIgateway's OpenAI-compatible API under the slug openai/o4-mini.

How much does o4-mini cost via AIgateway?

Input costs $1.10 per 1M tokens; output costs $4.40 per 1M tokens. Pricing is pass-through, plus a 5% platform fee that is applied at top-up, not per call.
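
As a rough illustration of the rates above, the sketch below estimates the pass-through cost of a single call. The function name and the token counts are hypothetical, and the 5% platform fee is left out because it is charged at top-up rather than per call. It also assumes that any reasoning tokens are already included in the completion token count.

```python
# Rates quoted above, in USD per 1M tokens.
INPUT_USD_PER_M = 1.10
OUTPUT_USD_PER_M = 4.40


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Pass-through cost of one call; excludes the 5% top-up fee."""
    return (prompt_tokens * INPUT_USD_PER_M
            + completion_tokens * OUTPUT_USD_PER_M) / 1_000_000


# Hypothetical call: 10,000 prompt tokens and 2,000 completion tokens.
print(f"${estimate_cost_usd(10_000, 2_000):.4f}")  # $0.0198
```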

What is the context window of o4-mini?

200,000 tokens. Maximum output is 100,000 tokens.
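
One way to apply these limits is to cap the requested output before sending a call. This is a minimal sketch that assumes the prompt and the output share the 200,000-token window, as is usual for OpenAI models; the helper name is hypothetical, and counting prompt tokens is left to the caller.

```python
CONTEXT_WINDOW = 200_000  # tokens, from the figures above
MAX_OUTPUT = 100_000      # tokens, from the figures above


def output_budget(prompt_tokens: int, requested: int) -> int:
    """Largest output limit that fits, assuming prompt and output share the window."""
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError("prompt alone fills the context window")
    return min(requested, MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens)


# A 150,000-token prompt leaves room for at most 50,000 output tokens.
print(output_budget(150_000, 100_000))  # 50000
```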

How do I call o4-mini from my code?

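Point the OpenAI SDK at https://api.aigateway.sh/v1 with your AIgateway key and set model to "openai/o4-mini". The request and response shapes match OpenAI's Chat Completions API; the one addition is the reasoning_content field described under Model-specific notes.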

Does o4-mini support streaming, tool calling, vision, and JSON mode?

Streaming — yes. Tool calling — no. Vision — no. JSON mode — no. Prompt caching — no.

What are the best use cases for o4-mini?

Chat, content generation, and summarization. Key strengths: general-purpose chat, streaming, and step-by-step reasoning.

Can I bring my own OpenAI API key (BYOK)?

Yes. Attach an OpenAI key in your AIgateway dashboard and this model switches to pass-through: you pay OpenAI directly, and AIgateway waives the 5% platform fee on those calls.

o4-mini

text

slug · openai/o4-mini
provider · openai
family · openai
released · 2026-04-13

Quickstart

curl https://api.aigateway.sh/v1/chat/completions \
  -H "Authorization: Bearer $AIGATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/o4-mini",
    "messages": [{"role":"user","content":"hello"}],
    "stream": true
  }'

Capabilities

Streaming
Reasoning

CONTEXT

200,000 tok

MAX OUTPUT

100,000 tok

Strengths

General-purpose chat
Streaming
Step-by-step reasoning
Low-latency inference

Use cases

Chat
Content generation
Summarization

Pricing

Input · $1.10 / 1M tokens
Output · $4.40 / 1M tokens

You pay pass-through rates; the 5% platform fee is applied at credit top-up, not per call.

API schema

Call o4-mini from any OpenAI SDK

POST https://api.aigateway.sh/v1/chat/completions · Content-Type: application/json · Auth: Bearer sk-aig-...

Model-specific notes

Returns chain-of-thought in message.reasoning_content (non-streaming) and delta.reasoning_content (streaming). Safe to display or ignore — it's separate from content.
Use max_completion_tokens instead of max_tokens. Our gateway accepts either and translates.
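
The notes above can be sketched as a non-streaming call that sets max_completion_tokens and reads the chain-of-thought separately from the answer. The helper names build_request and ask are hypothetical, and the key is read from the AIGATEWAY_API_KEY environment variable used in the curl quickstart. Because reasoning_content is not a standard OpenAI SDK field, the sketch reads it with getattr and tolerates its absence.

```python
import os
from typing import Optional, Tuple


def build_request(prompt: str, max_completion_tokens: int = 1024) -> dict:
    """Keyword arguments for client.chat.completions.create (non-streaming)."""
    return {
        "model": "openai/o4-mini",
        "messages": [{"role": "user", "content": prompt}],
        "max_completion_tokens": max_completion_tokens,
        "stream": False,
    }


def ask(prompt: str) -> Tuple[str, Optional[str]]:
    """Return (answer, reasoning). Needs the openai package and network access."""
    from openai import OpenAI  # imported lazily so build_request works without it

    client = OpenAI(
        base_url="https://api.aigateway.sh/v1",
        api_key=os.environ["AIGATEWAY_API_KEY"],
    )
    response = client.chat.completions.create(**build_request(prompt))
    message = response.choices[0].message
    # reasoning_content is a gateway-specific extra field; read it defensively.
    return message.content or "", getattr(message, "reasoning_content", None)
```

Calling ask("What is 17 * 24?") would return the answer text together with the reasoning string, or None if the field is missing.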

Request body

json

{
  "model": "openai/o4-mini",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "Hello!" }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_completion_tokens": 1024,
  "stream": false
}

Response

json

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1776947082,
  "model": "openai/o4-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?",
        "reasoning_content": "The user asked..."  // o4-mini chain-of-thought
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 12,
    "total_tokens": 36
  }
}
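
For callers working with the raw JSON rather than an SDK, a response of this shape could be unpacked roughly as follows. The function name is hypothetical. Treating a finish_reason of "length" as truncation is an assumption carried over from OpenAI's Chat Completions API; only "stop" appears in the example above.

```python
def summarize_response(body: dict) -> dict:
    """Pull the answer, optional reasoning, and basic diagnostics from a parsed response."""
    choice = body["choices"][0]
    message = choice["message"]
    usage = body.get("usage", {})
    return {
        "answer": message.get("content") or "",
        # reasoning_content may be absent, so this can be None.
        "reasoning": message.get("reasoning_content"),
        # Assumption: "length" means the token limit cut the output short.
        "truncated": choice.get("finish_reason") == "length",
        "total_tokens": usage.get("total_tokens"),
    }


# The example response above, as a parsed dict.
sample = {
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today?",
            "reasoning_content": "The user asked...",
        },
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 24, "completion_tokens": 12, "total_tokens": 36},
}
print(summarize_response(sample)["answer"])  # Hello! How can I help you today?
```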

Streaming (SSE) — set `"stream": true`

// 1. Role announcement (first chunk):
data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

// 2. Reasoning chunks (o4-mini thinks first):
data: {"choices":[{"index":0,"delta":{"reasoning_content":"The user "},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"reasoning_content":"wants..."},"finish_reason":null}]}

// 3. Content chunks (final answer):
data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

// 4. Finish chunk:
data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

// 5. Terminator:
data: [DONE]
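
The chunk sequence above can be consumed without an SDK by reading the data: lines and routing each delta to either the reasoning or the answer. This is a minimal sketch using only the standard library; the function name is hypothetical, and it assumes one complete JSON object per data: line, as in the sample.

```python
import json
from typing import Iterable, Tuple


def split_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Return (reasoning, answer) accumulated from SSE data: lines."""
    reasoning, answer = [], []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminator: nothing more to read
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            reasoning.append(delta.get("reasoning_content") or "")
            answer.append(delta.get("content") or "")
    return "".join(reasoning), "".join(answer)


# The sample stream shown above.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"The user "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"wants..."},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: [DONE]',
]
print(split_stream(sample))  # ('The user wants...', 'Hello!')
```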

Quickstart (Python)

# pip install aigateway-py openai
# aigateway-py adds sub-accounts, evals, replays, jobs, and webhook verification.
# The openai SDK handles chat as a drop-in client, as our SDK's own guidance recommends.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

stream = client.chat.completions.create(
    model="openai/o4-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
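for chunk in stream:
    if not chunk.choices:  # defensive: skip any chunk that carries no choices
        continue
    print(chunk.choices[0].delta.content or "", end="", flush=True)

# While streaming, o4-mini's chain-of-thought arrives in delta.reasoning_content
# (message.reasoning_content when not streaming). Display it in a collapsed
# "show thinking" UI or just ignore it.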

Errors

401 · authentication_error · Invalid or missing API key
402 · insufficient_credits · Wallet empty (PAYG only)
404 · not_found · Unknown model or endpoint
429 · rate_limit_error · Over per-minute limit; see Retry-After header
500 · server_error · Upstream provider failed (retryable)
503 · service_unavailable · Upstream saturated (retryable)
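
A client might turn the error codes above into a retry policy along these lines. This is a sketch, not the gateway's prescribed behavior: the function name, the backoff parameters, and the choice to retry 429 after the Retry-After interval are assumptions. It retries 500 and 503 with exponential backoff and jitter and treats 401, 402, and 404 as final.

```python
import random
from typing import Optional

# 500 and 503 are marked retryable above; retrying 429 is an assumption.
RETRYABLE = {429, 500, 503}


def retry_delay(status: int, attempt: int, retry_after: Optional[str] = None,
                base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error should not be retried."""
    if status not in RETRYABLE:
        return None
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0, min(cap, base * 2 ** attempt))


print(retry_delay(401, 0))       # None
print(retry_delay(429, 0, "7"))  # 7.0
```

A caller would sleep for the returned number of seconds and retry, stopping after a fixed number of attempts or as soon as the function returns None.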
