
GPT-5.4

OpenAI's frontier model: 400K-token context window, 128K-token max output, native JSON mode, tight tool calling, and strong math and code performance. It is a reasoning-grade model, so some of its request knobs differ from older GPT-4-era models.

Quickstart

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

r = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Solve: if f(x) = x^3 - 2x + 1, find f'(2)."}],
    max_completion_tokens=2048,   # NOTE: not max_tokens
)
print(r.choices[0].message.content)

Model card

Two knobs differ from GPT-4-era models: the output cap is max_completion_tokens (max_tokens is not accepted), and responses may carry a reasoning_content field alongside content.

Request

{
  "model": "openai/gpt-5.4",
  "messages": [
    { "role": "system", "content": "You are a careful analyst." },
    { "role": "user",   "content": "..." }
  ],
  "max_completion_tokens": 4096,
  "stream": false,

  "tools": [ /* OpenAI function spec */ ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,

  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "invoice",
      "schema": {
        "type": "object",
        "properties": {
          "total_cents": { "type": "integer" },
          "line_items": {
            "type": "array",
            "items": { "type": "object" }
          }
        },
        "required": ["total_cents", "line_items"]
      },
      "strict": true
    }
  }
}

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "openai/gpt-5.4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"total_cents\": 12345, \"line_items\": [...]}",
        "reasoning_content": "Parsing the invoice..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 240,
    "completion_tokens": 180,
    "total_tokens": 420
  }
}

Structured outputs (strict mode)

With strict: true, GPT-5.4 enforces your JSON schema during decoding, so the response is guaranteed to parse against it and no post-hoc validation is needed. (Strict mode does require every object in the schema to set additionalProperties: false and to list all of its properties in required; you should also still handle refusals, which arrive in message.refusal rather than content.) This is the killer feature vs older GPT models:

import json

# Schema-valid payload is guaranteed to parse — no try/except needed
data = json.loads(r.choices[0].message.content)
assert isinstance(data["total_cents"], int)
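Writing strict-compatible schemas by hand gets repetitive, since every object must list all of its properties in required and set additionalProperties: false. A small helper sketches one way to generate that boilerplate (strict_object is a hypothetical name, not part of any SDK):

```python
def strict_object(properties: dict) -> dict:
    """Build an object schema that satisfies strict-mode rules:
    every property required, no additional properties allowed."""
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
        "additionalProperties": False,
    }

# The invoice schema from the request example above:
invoice_schema = strict_object({
    "total_cents": {"type": "integer"},
    "line_items": {
        "type": "array",
        "items": strict_object({
            "description": {"type": "string"},
            "amount_cents": {"type": "integer"},
        }),
    },
})
```

Pass the result as response_format.json_schema.schema; because dicts preserve insertion order, the required list stays in a stable order.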

Tool calling + parallel calls

GPT-5.4 excels at emitting multiple tool calls in a single turn. Leave parallel_tool_calls: true (the default) and execute the returned calls concurrently client-side.
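A minimal sketch of the client-side loop, assuming two locally defined tools (lookup_price and lookup_stock are illustrative names): dispatch each call to a thread pool, then return the role: "tool" messages to append before the next model turn.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Illustrative local tools — replace with your own implementations
def lookup_price(sku: str) -> float:
    return {"A1": 9.99, "B2": 4.50}[sku]

def lookup_stock(sku: str) -> int:
    return {"A1": 12, "B2": 0}[sku]

TOOLS = {"lookup_price": lookup_price, "lookup_stock": lookup_stock}

def run_tool_calls(tool_calls):
    """Execute a batch of tool calls concurrently; return the
    role="tool" messages to feed back to the model."""
    def run_one(call):
        fn = TOOLS[call.function.name]
        args = json.loads(call.function.arguments)
        return {
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(fn(**args)),
        }
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_one, tool_calls))
```

When finish_reason is "tool_calls", pass r.choices[0].message.tool_calls to run_tool_calls, append the assistant message plus these tool messages, and call the API again.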

Use GPT-5.4 in Cursor

# Cursor → Settings → Models → Override OpenAI Base URL
Base URL:  https://api.aigateway.sh/v1
API key:   sk-aig-...
Model ID:  openai/gpt-5.4

Use GPT-5.4 in LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="openai/gpt-5.4",
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
    max_completion_tokens=4096,   # not max_tokens
)

Batch API (50% discount)

GPT-5.4 supports OpenAI's batch endpoint: submit up to 50,000 requests in a single JSONL file, and results come back within 24 hours at half price. Great for overnight data-extraction jobs.

See the batch docs for the workflow.
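A sketch of building the batch input file, assuming the gateway follows OpenAI's batch line format (custom_id, method, url, body per line):

```python
import json

def batch_line(custom_id: str, user_content: str) -> str:
    """One JSONL line in OpenAI's batch input format."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "openai/gpt-5.4",
            "messages": [{"role": "user", "content": user_content}],
            "max_completion_tokens": 1024,
        },
    })

lines = [batch_line(f"doc-{i}", f"Extract invoice #{i} as JSON.") for i in range(3)]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines))
```

Upload the file with purpose="batch", then create the batch job; each result line echoes your custom_id so you can join outputs back to inputs.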

Benchmarks

When to use GPT-5.4

For pure agentic coding (SWE-bench-style tasks), Claude Opus 4.7 still edges it out; see the Opus guide.

Pricing worked example

An extraction task: a 4K-token PDF converted to JSON with ~600 tokens of output.
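The arithmetic, using placeholder rates (the $1.25/M input and $10/M output figures below are illustrative — substitute the actual prices from the pricing page):

```python
# Hypothetical per-million-token rates — replace with real prices
INPUT_PER_M = 1.25    # USD per 1M input tokens (placeholder)
OUTPUT_PER_M = 10.00  # USD per 1M output tokens (placeholder)

def request_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Cost of one request; the batch endpoint is half price."""
    cost = input_tokens * INPUT_PER_M / 1e6 + output_tokens * OUTPUT_PER_M / 1e6
    return cost / 2 if batch else cost

print(round(request_cost(4000, 600), 6))              # per-document, live
print(round(request_cost(4000, 600, batch=True), 6))  # per-document, batch
```

At these illustrative rates, the 4K-in / 600-out document costs about a penny live and half that via the batch endpoint; multiply by document count for the job total.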
