Claude Opus 4.7

Anthropic's flagship reasoning model. 1M-token context, 128K-token output, extended thinking, strong tool use, vision. Best pick when you need the smartest answer and can pay frontier pricing.

Quickstart

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

r = client.chat.completions.create(
    model="anthropic/claude-opus-4.7",
    messages=[{"role": "user", "content": "Architect a durable message queue."}],
    max_tokens=4096,
    # NOTE: Opus 4.7 rejects temperature + top_p — omit these fields.
)
print(r.choices[0].message.content)
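If you are migrating code that already sets sampling parameters, strip them before the call rather than handling the rejection. A minimal sketch (the helper name is ours, not part of any SDK):

```python
UNSUPPORTED = ("temperature", "top_p")  # rejected by Opus 4.7

def strip_unsupported(params: dict) -> dict:
    """Drop sampling fields before sending, so migrated code doesn't 400."""
    return {k: v for k, v in params.items() if k not in UNSUPPORTED}

req = {"model": "anthropic/claude-opus-4.7", "temperature": 0.2, "max_tokens": 4096}
print(strip_unsupported(req))
# → {'model': 'anthropic/claude-opus-4.7', 'max_tokens': 4096}
```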

Model card

Two gotchas specific to Opus 4.7

The frontier 4.7 release departs from earlier models in two ways:

1. It deprecates explicit sampling controls and reasons about depth internally; requests that set temperature or top_p are rejected.
2. Extended thinking comes back in a separate reasoning_content field on the message, alongside the normal content.

Request

{
  "model": "anthropic/claude-opus-4.7",
  "messages": [
    { "role": "system", "content": "You are a careful senior engineer." },
    { "role": "user",   "content": "..." }
  ],
  "max_tokens": 8192,
  "stream": false,

  "tools": [ /* OpenAI function spec */ ],
  "tool_choice": "auto",

  "response_format": { "type": "json_object" }
  // temperature / top_p: NOT supported on Opus 4.7
}

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "anthropic/claude-opus-4.7",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a design that handles exactly-once delivery...",
        "reasoning_content": "The user is asking for a durable queue. Key constraints..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 420,
    "completion_tokens": 1840,
    "total_tokens": 2260
  }
}
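When a response finishes with tool calls instead of text, each call must be executed and its result sent back as a tool message. A hedged sketch against the standard OpenAI tool_calls shape, working on plain dicts (the dispatcher and registry names are illustrative, not part of any SDK):

```python
import json

def dispatch_tool_calls(message: dict, registry: dict) -> list[dict]:
    """Run each requested tool and return tool-result messages.

    `message` follows the OpenAI tool_calls shape; `registry` maps
    tool name -> callable taking the parsed arguments as kwargs.
    """
    results = []
    for call in message.get("tool_calls", []):
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results

# Illustrative tool and a fake assistant message requesting it:
registry = {"add": lambda a, b: {"sum": a + b}}
msg = {"tool_calls": [
    {"id": "call_1", "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'}}
]}
print(dispatch_tool_calls(msg, registry))
```

Append the returned tool messages to the conversation and call the model again to get the final answer.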

Streaming

SSE in the standard OpenAI chunk format. Extended thinking streams as delta.reasoning_content, which arrives before (and may interleave with) delta.content.
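A minimal accumulator for that chunk shape, written against plain dicts so it runs without a client; real chunks arrive as SDK objects with the same fields:

```python
def accumulate(chunks) -> tuple[str, str]:
    """Fold streamed deltas into (thinking, answer) strings.

    delta.reasoning_content carries extended thinking,
    delta.content carries the visible reply.
    """
    thinking, answer = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if delta.get("reasoning_content"):
            thinking.append(delta["reasoning_content"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(thinking), "".join(answer)

# Fake stream for illustration:
stream = [
    {"choices": [{"delta": {"reasoning_content": "Think... "}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world."}}]},
]
print(accumulate(stream))
# → ('Think... ', 'Hello, world.')
```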

Prompt caching (huge on Opus)

At $0.50 / 1M cache reads (10% of input cost), prefix caching is load-bearing for Opus economics. Keep your system prompt and tool definitions stable turn-to-turn; our gateway caches automatically and bills cache hits at the lower rate.

See the caching guide for how to maximize hit rate.
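The turn-to-turn discipline above (a byte-identical prefix, with only the trailing messages changing) can be sketched as follows; the helper name is illustrative:

```python
# Keep the prefix (system prompt + tool definitions) identical across
# turns so the gateway can serve it from cache at the lower rate.
SYSTEM = {"role": "system", "content": "You are a careful senior engineer."}

def build_messages(history: list[dict], user_text: str) -> list[dict]:
    """Stable prefix first, then the growing conversation tail."""
    return [SYSTEM, *history, {"role": "user", "content": user_text}]

turn1 = build_messages([], "Design a queue.")
turn2 = build_messages(turn1[1:], "Now add retries.")
assert turn1[0] is SYSTEM and turn2[0] is SYSTEM  # same prefix, cache-friendly
```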

Use Opus 4.7 in Cursor

# Cursor → Settings → Models → Override OpenAI Base URL
Base URL:  https://api.aigateway.sh/v1
API key:   sk-aig-...
Model ID:  anthropic/claude-opus-4.7

Use Opus 4.7 in LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="anthropic/claude-opus-4.7",
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
    max_tokens=8192,
    # Don't pass temperature / top_p.
)

Benchmarks

When to reach for Opus vs Sonnet vs Kimi

Pricing worked example

An Opus turn on a 50K-token codebase RAG prompt with a 2K-token response:
