Claude Opus 4.7

Anthropic's flagship reasoning model. 1M-token context, 128K-token output, extended thinking, strong tool use, vision. Best pick when you need the smartest answer and can pay frontier pricing.

Quickstart

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

r = client.chat.completions.create(
    model="anthropic/claude-opus-4.7",
    messages=[{"role": "user", "content": "Architect a durable message queue."}],
    max_tokens=4096,
    # NOTE: Opus 4.7 rejects temperature + top_p — omit these fields.
)
print(r.choices[0].message.content)
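If you are migrating code that already sets sampling parameters, strip them before the call rather than handling the rejection. A minimal sketch (the helper name is ours, not part of any SDK):

```python
UNSUPPORTED = ("temperature", "top_p")  # rejected by Opus 4.7

def strip_unsupported(params: dict) -> dict:
    """Drop sampling fields before sending, so migrated code doesn't 400."""
    return {k: v for k, v in params.items() if k not in UNSUPPORTED}

req = {"model": "anthropic/claude-opus-4.7", "temperature": 0.2, "max_tokens": 4096}
print(strip_unsupported(req))
# → {'model': 'anthropic/claude-opus-4.7', 'max_tokens': 4096}
```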

Model card

Two gotchas specific to Opus 4.7

The frontier 4.7 release departs from earlier models in two ways:

1. It deprecates explicit sampling controls and reasons about depth internally; requests that set temperature or top_p are rejected.
2. Extended thinking comes back in a separate reasoning_content field on the message, alongside the normal content.

Request

{
  "model": "anthropic/claude-opus-4.7",
  "messages": [
    { "role": "system", "content": "You are a careful senior engineer." },
    { "role": "user",   "content": "..." }
  ],
  "max_tokens": 8192,
  "stream": false,

  "tools": [ /* OpenAI function spec */ ],
  "tool_choice": "auto",

  "response_format": { "type": "json_object" }
  // temperature / top_p: NOT supported on Opus 4.7
}

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "anthropic/claude-opus-4.7",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a design that handles exactly-once delivery...",
        "reasoning_content": "The user is asking for a durable queue. Key constraints..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 420,
    "completion_tokens": 1840,
    "total_tokens": 2260
  }
}
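When a response finishes with tool calls instead of text, each call must be executed and its result sent back as a tool message. A hedged sketch against the standard OpenAI tool_calls shape, working on plain dicts (the dispatcher and registry names are illustrative, not part of any SDK):

```python
import json

def dispatch_tool_calls(message: dict, registry: dict) -> list[dict]:
    """Run each requested tool and return tool-result messages.

    `message` follows the OpenAI tool_calls shape; `registry` maps
    tool name -> callable taking the parsed arguments as kwargs.
    """
    results = []
    for call in message.get("tool_calls", []):
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results

# Illustrative tool and a fake assistant message requesting it:
registry = {"add": lambda a, b: {"sum": a + b}}
msg = {"tool_calls": [
    {"id": "call_1", "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'}}
]}
print(dispatch_tool_calls(msg, registry))
```

Append the returned tool messages to the conversation and call the model again to get the final answer.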

Streaming

SSE in the standard OpenAI chunk format. Extended thinking streams as delta.reasoning_content, which arrives before (and may interleave with) delta.content.
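A minimal accumulator for that chunk shape, written against plain dicts so it runs without a client; real chunks arrive as SDK objects with the same fields:

```python
def accumulate(chunks) -> tuple[str, str]:
    """Fold streamed deltas into (thinking, answer) strings.

    delta.reasoning_content carries extended thinking,
    delta.content carries the visible reply.
    """
    thinking, answer = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if delta.get("reasoning_content"):
            thinking.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(thinking), "".join(answer)

# Fake stream for illustration:
stream = [
    {"choices": [{"delta": {"reasoning_content": "Think... "}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world."}}]},
]
print(accumulate(stream))
# → ('Think... ', 'Hello, world.')
```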

Prompt caching (huge on Opus)

At $0.50 / 1M cache reads (10% of input cost), prefix caching is load-bearing for Opus economics. Keep your system prompt and tool definitions stable turn-to-turn; our gateway caches automatically and bills cache hits at the lower rate.

See the caching guide for how to maximize hit rate.
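The turn-to-turn discipline above (a byte-identical prefix, with only the trailing messages changing) can be sketched as follows; the helper name is illustrative:

```python
# Keep the prefix (system prompt + tool definitions) identical across
# turns so the gateway can serve it from cache at the lower rate.
SYSTEM = {"role": "system", "content": "You are a careful senior engineer."}

def build_messages(history: list[dict], user_text: str) -> list[dict]:
    """Stable prefix first, then the growing conversation tail."""
    return [SYSTEM, *history, {"role": "user", "content": user_text}]

turn1 = build_messages([], "Design a queue.")
turn2 = build_messages(turn1[1:], "Now add retries.")
assert turn1[0] is SYSTEM and turn2[0] is SYSTEM  # same prefix, cache-friendly
```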

Use Opus 4.7 in Cursor

# Cursor → Settings → Models → Override OpenAI Base URL
Base URL:  https://api.aigateway.sh/v1
API key:   sk-aig-...
Model ID:  anthropic/claude-opus-4.7

Use Opus 4.7 in LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="anthropic/claude-opus-4.7",
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
    max_tokens=8192,
    # Don't pass temperature / top_p.
)

Benchmarks

When to reach for Opus vs Sonnet vs Kimi

Pricing worked example

An Opus turn on a 50K-token codebase RAG prompt with a 2K-token response:
