Claude Opus 4.7
Anthropic's flagship reasoning model. 1M-token context, 128K-token output, extended thinking, strong tool use, vision. Best pick when you need the smartest answer and can pay frontier pricing.
Quickstart
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)
r = client.chat.completions.create(
    model="anthropic/claude-opus-4.7",
    messages=[{"role": "user", "content": "Architect a durable message queue."}],
    max_tokens=4096,
    # NOTE: Opus 4.7 rejects temperature + top_p — omit these fields.
)
print(r.choices[0].message.content)
```
Model card
- Slug: `anthropic/claude-opus-4.7`
- Provider: Anthropic
- Released: 2026-04-16
- Context window: 1,000,000 tokens
- Max output: 128,000 tokens
- Modality: Text + vision
- Capabilities: Streaming, tool calling, JSON mode, extended thinking, caching, vision
- Pricing: $5.00 / 1M input, $25.00 / 1M output. Cache reads $0.50 / 1M, cache writes $6.25 / 1M. Pass-through — 5% platform fee at credit top-up.
Two gotchas specific to Opus 4.7
The frontier 4.7 release deprecated explicit sampling controls and reasons about depth internally:
- Do not send `temperature` or `top_p`. Opus 4.7 returns a `User Input Error` if either is present. Most SDK wrappers still default-send these; drop them explicitly or use our gateway's canonical body shape (we strip them automatically when routing through this slug).
- Extended thinking is on by default. Responses include `message.reasoning_content` alongside `message.content`. If you don't want CoT, just ignore the field — it's additive.
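The first gotcha can be handled defensively before a request ever leaves your process. A minimal sketch (the helper name and key set are ours, not part of any SDK):

```python
UNSUPPORTED_SAMPLING_KEYS = {"temperature", "top_p"}

def sanitize_opus_payload(payload: dict) -> dict:
    """Drop sampling fields that Opus 4.7 rejects with a User Input Error."""
    return {k: v for k, v in payload.items() if k not in UNSUPPORTED_SAMPLING_KEYS}

payload = {
    "model": "anthropic/claude-opus-4.7",
    "messages": [{"role": "user", "content": "..."}],
    "max_tokens": 4096,
    "temperature": 0.7,  # default-sent by many wrappers; must be stripped
}
clean = sanitize_opus_payload(payload)
# clean no longer contains temperature or top_p
```

Routing through the gateway slug does this for you; the helper is only useful when you build request bodies by hand.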
Request
```json
{
  "model": "anthropic/claude-opus-4.7",
  "messages": [
    { "role": "system", "content": "You are a careful senior engineer." },
    { "role": "user", "content": "..." }
  ],
  "max_tokens": 8192,
  "stream": false,
  "tools": [ /* OpenAI function spec */ ],
  "tool_choice": "auto",
  "response_format": { "type": "json_object" }
  // temperature / top_p: NOT supported on Opus 4.7
}
```
Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "anthropic/claude-opus-4.7",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a design that handles exactly-once delivery...",
        "reasoning_content": "The user is asking for a durable queue. Key constraints..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 420,
    "completion_tokens": 1840,
    "total_tokens": 2260
  }
}
```
Streaming
SSE in the standard OpenAI chunk format. Extended thinking streams in delta.reasoning_content before / interleaved with delta.content.
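A sketch of folding such a stream into separate reasoning and answer strings. Chunks are shown as plain dicts in the shape described above, not live SDK objects:

```python
def fold_stream(chunks):
    """Accumulate streamed deltas into (reasoning, answer) strings.

    Extended thinking arrives in delta.reasoning_content; the final
    answer arrives in delta.content. Either may appear in any chunk.
    """
    reasoning, answer = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if rc := delta.get("reasoning_content"):
            reasoning.append(rc)
        if c := delta.get("content"):
            answer.append(c)
    return "".join(reasoning), "".join(answer)

# Illustrative chunks only (not captured from a live stream):
chunks = [
    {"choices": [{"delta": {"reasoning_content": "Think first. "}}]},
    {"choices": [{"delta": {"content": "Here is the design."}}]},
]
thinking, text = fold_stream(chunks)
```

With the real SDK the same logic applies per chunk; only the attribute access differs (`chunk.choices[0].delta`).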
Prompt caching (huge on Opus)
At $0.50 / 1M cache reads (10% of input cost), prefix caching is load-bearing for Opus economics. Keep your system prompt and tool definitions stable turn-to-turn; our gateway caches automatically and bills cache hits at the lower rate.
See the caching guide for how to maximize hit rate.
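To stay cacheable, the prefix (system prompt and tool definitions) must be byte-identical turn to turn, with only the conversation tail varying. A minimal sketch, with names of our own invention:

```python
# Identical every turn → a stable, cacheable prefix.
STABLE_SYSTEM = "You are a careful senior engineer."

def build_messages(history: list[dict], user_turn: str) -> list[dict]:
    """Keep the cacheable prefix byte-identical; only the tail varies."""
    return [
        {"role": "system", "content": STABLE_SYSTEM},
        *history,
        {"role": "user", "content": user_turn},
    ]
```

Anything that churns per request (timestamps, request IDs, reordered tool lists) belongs after the stable prefix, or it breaks the cache hit.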
Use Opus 4.7 in Cursor
```
# Cursor → Settings → Models → Override OpenAI Base URL
Base URL: https://api.aigateway.sh/v1
API key:  sk-aig-...
Model ID: anthropic/claude-opus-4.7
```
Use Opus 4.7 in LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="anthropic/claude-opus-4.7",
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
    max_tokens=8192,
    # Don't pass temperature / top_p.
)
```
Benchmarks
- MMLU: 90.4%
- HumanEval: 95.1%
- SWE-Bench: 72.5% — state-of-the-art on agentic coding
When to reach for Opus vs Sonnet vs Kimi
- Opus 4.7: multi-file refactors, novel algorithm design, long-document synthesis where a single mistake costs hours of cleanup. Expensive. Use sparingly and cache aggressively.
- Sonnet 4.6: production agents, RAG, data extraction — the default workhorse. See Sonnet 4.6.
- Kimi K2.6: anywhere open-weight economics win, or during the current free trial. Kimi guide.
Pricing worked example
An Opus turn on a 50K-token codebase RAG prompt with a 2K-token response:
- Input (first call, no cache): 50,000 × $5.00 / 1M = $0.25
- Output: 2,000 × $25.00 / 1M = $0.05
- First turn: ~$0.30
- Cached reruns of same prefix: 50,000 × $0.50 / 1M = $0.025 input
- Subsequent cached turns: ~$0.075 — 4× cheaper
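The arithmetic above as a small sketch, with rates taken from the pricing table (the function is illustrative, not a gateway API):

```python
# $ per 1M tokens, from the pricing table above.
INPUT_RATE, OUTPUT_RATE, CACHE_READ_RATE = 5.00, 25.00, 0.50

def turn_cost(prompt_tokens: int, completion_tokens: int, cached: bool = False) -> float:
    """Cost in dollars for one turn; cached=True bills input at the cache-read rate."""
    in_rate = CACHE_READ_RATE if cached else INPUT_RATE
    return prompt_tokens / 1e6 * in_rate + completion_tokens / 1e6 * OUTPUT_RATE

first = turn_cost(50_000, 2_000)                # ≈ $0.30
rerun = turn_cost(50_000, 2_000, cached=True)   # ≈ $0.075, 4× cheaper
```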