GPT-5.4
OpenAI's frontier model. 400K context, 128K output, native JSON mode, tight tool calling, strong math and code. Reasoning-grade — so some of its knobs differ from older GPT-4-era models.
Quickstart
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)
r = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Solve: if f(x) = x^3 - 2x + 1, find f'(2)."}],
    max_completion_tokens=2048,  # NOTE: not max_tokens
)
print(r.choices[0].message.content)
```

Model card
- Slug: openai/gpt-5.4
- Provider: OpenAI
- Released: 2026-03-05
- Context window: 400,000 tokens
- Max output: 128,000 tokens
- Modality: Text + vision
- Capabilities: Streaming, tool calling, JSON mode, structured outputs, batch, caching, reasoning
- Pricing: $2.50 / 1M input, $15.00 / 1M output, $0.25 / 1M cache reads. Pass-through — 5% fee at credit top-up.
Two knobs that are different on GPT-5.x
- Use `max_completion_tokens`, not `max_tokens`. GPT-5.x and the o-series reasoning models renamed this parameter. Our gateway accepts either and translates, but sending `max_completion_tokens` directly avoids any ambiguity.
- Sampling controls are limited. `temperature` and `top_p` are accepted but have reduced effect vs GPT-4-class models, since the model reasons internally. For deterministic output, rely on `response_format` and tool schemas instead of `temperature=0`.
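To make the parameter translation concrete, here is a minimal sketch of the kind of normalization the gateway performs on your behalf. The helper name and logic are illustrative, not the gateway's actual code:

```python
def normalize_params(payload: dict) -> dict:
    """Hypothetical sketch: rename legacy max_tokens to the
    GPT-5.x parameter name if the caller hasn't set it already."""
    out = dict(payload)
    if "max_tokens" in out and "max_completion_tokens" not in out:
        out["max_completion_tokens"] = out.pop("max_tokens")
    return out

req = {"model": "openai/gpt-5.4", "max_tokens": 2048}
print(normalize_params(req))
# {'model': 'openai/gpt-5.4', 'max_completion_tokens': 2048}
```

Sending `max_completion_tokens` yourself skips this translation step entirely.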
Request
```json
{
  "model": "openai/gpt-5.4",
  "messages": [
    { "role": "system", "content": "You are a careful analyst." },
    { "role": "user", "content": "..." }
  ],
  "max_completion_tokens": 4096,
  "stream": false,
  "tools": [ /* OpenAI function spec */ ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "invoice",
      "schema": {
        "type": "object",
        "properties": {
          "total_cents": { "type": "integer" },
          "line_items": {
            "type": "array",
            "items": { "type": "object" }
          }
        },
        "required": ["total_cents", "line_items"]
      },
      "strict": true
    }
  }
}
```

Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "openai/gpt-5.4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"total_cents\": 12345, \"line_items\": [...]}",
        "reasoning_content": "Parsing the invoice..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 240,
    "completion_tokens": 180,
    "total_tokens": 420
  }
}
```

Structured outputs (strict mode)
GPT-5.4 enforces JSON schemas at the decoder level when strict: true. The response is guaranteed to parse against your schema — no post-hoc validation needed. This is the killer feature vs older GPT models:
```python
import json

# Payload guaranteed to parse — no try/except around json.loads
data = json.loads(r.choices[0].message.content)
assert isinstance(data["total_cents"], int)
```
Tool calling + parallel calls
GPT-5.4 excels at emitting multiple parallel tool calls in a single turn. Set parallel_tool_calls: true (default) and execute them concurrently client-side.
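A minimal sketch of client-side concurrent execution, using a thread pool. The tool names and functions are hypothetical, and tool calls are shown as plain dicts; the SDK returns objects with matching attributes (`call.id`, `call.function.name`, `call.function.arguments`):

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local tools — illustrative, not part of the API.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def get_time(city: str) -> str:
    return f"12:00 in {city}"

TOOLS = {"get_weather": get_weather, "get_time": get_time}

def run_tool_calls(tool_calls):
    """Execute the model's parallel tool calls concurrently and
    return one {"role": "tool", ...} message per call, in order."""
    def run_one(call):
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        return {
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        }
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_one, tool_calls))
```

Append the returned tool messages to the conversation and call `chat.completions.create` again to let the model incorporate the results.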
Use GPT-5.4 in Cursor
```
# Cursor → Settings → Models → Override OpenAI Base URL
Base URL: https://api.aigateway.sh/v1
API key:  sk-aig-...
Model ID: openai/gpt-5.4
```
Use GPT-5.4 in LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="openai/gpt-5.4",
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
    max_completion_tokens=4096,  # not max_tokens
)
```

Batch API (50% discount)
GPT-5.4 supports OpenAI's batch endpoint — submit up to 50,000 requests in a file, results come back within 24h at half price. Great for overnight data-extraction jobs.
See the batch docs for the workflow.
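As a sketch of the first step, this builds a batch input file in OpenAI's JSONL format (one request object per line, each with a `custom_id`, `method`, `url`, and `body`). The document texts and output filename are illustrative:

```python
import json

docs = ["Invoice #1 text...", "Invoice #2 text..."]  # illustrative inputs

# One JSON object per line, per OpenAI's batch input format.
lines = []
for i, doc in enumerate(docs):
    lines.append(json.dumps({
        "custom_id": f"doc-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "openai/gpt-5.4",
            "messages": [{"role": "user", "content": doc}],
            "max_completion_tokens": 1024,
        },
    }))

with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines))
```

Upload the file, create the batch job, then poll for the output file; results map back to your requests via `custom_id`.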
Benchmarks
- MMLU: 91.8%
- HumanEval: 94.0%
- GSM8K: 98.2% — best-in-class on math
When to use GPT-5.4
- You need guaranteed-parseable JSON from an LLM (strict mode). GPT-5.4 is the most reliable model for this today.
- Math-heavy or structured-extraction workloads.
- Very long context ingestion (400K window) with disciplined output caps.
For pure agentic coding in the SWE-bench style, Claude Opus 4.7 still edges it out; see the Opus guide.
Pricing worked example
An extraction task — 4K-token PDF converted to JSON with ~600 tokens of output:
- Input: 4,000 × $2.50 / 1M = $0.010
- Output: 600 × $15.00 / 1M = $0.009
- ~$0.019 per document. 50 docs per dollar.
- Via Batch API (50% discount): ~$0.0095 per document.
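The arithmetic above can be wrapped in a small helper for estimating your own workloads, using the rates from the model card (cache reads ignored for simplicity):

```python
# Rates from the model card, in dollars per 1M tokens.
INPUT_RATE, OUTPUT_RATE = 2.50, 15.00

def doc_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Dollar cost of one request; batch=True applies the 50% discount."""
    cost = (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000
    return cost / 2 if batch else cost

print(doc_cost(4_000, 600))              # ~0.019
print(doc_cost(4_000, 600, batch=True))  # ~0.0095
```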