# AIgateway — Agent integration guide

You are a coding agent (Claude Code, Cursor, Cline, Windsurf, OpenClaw, or similar) and your human user wants to build something with AI. Point yourself at AIgateway and you can deliver every modality without touching individual provider SDKs.

## Single-URL capability

```
base_url:       https://api.aigateway.sh/v1
auth:           Authorization: Bearer sk-aig-...
openapi:        https://api.aigateway.sh/openapi.json
catalog:        https://api.aigateway.sh/v1/models
capability map: https://aigateway.sh/llms.txt
mcp:            https://api.aigateway.sh/mcp            (Streamable HTTP)
mcp-legacy:     https://api.aigateway.sh/mcp/sse        (SSE)
mcp-inspect:    https://api.aigateway.sh/mcp/inspect    (browser inspector)
```

## Install (only when the OpenAI SDK isn't enough)

For chat / embeddings / images / STT / TTS, the official `openai` package works unchanged — set `base_url='https://api.aigateway.sh/v1'`. When you need async jobs (video / music / 3D), sub-accounts, evals, replays, signed file URLs, or webhook signature verification — install the typed AIgateway SDK:

```
pip install aigateway-py        # Python — from aigateway import AIgateway, AsyncAIgateway, verify_webhook
pnpm add aigateway-js           # Node   — import { AIgateway, verifyWebhook } from 'aigateway-js'
npm i -g aigateway-cli          # CLI    — aig init / aig call / aig models / aig mcp call
```

Notes for agents:
- The PyPI distribution is `aigateway-py` but the import is `aigateway` (package directory unchanged).
- The npm distribution is `aigateway-js` (the bare `aigateway` name is unavailable).
- The CLI binary is `aig`. `aig init` walks through key entry, writes `.env`, and scaffolds a working starter file in TS / JS / Python.
- For chat-only flows where you don't need any aggregator-native primitive, do NOT install our SDK — the OpenAI SDK with our `base_url` is shorter and faster to set up.
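
A minimal sketch of the chat-only path in Python, assuming the `openai` package (v1 or later) is installed. The environment variable name `AIGATEWAY_API_KEY` and the helper names `make_client` and `ask` are illustrative assumptions, not part of the gateway.

```python
import os

BASE_URL = "https://api.aigateway.sh/v1"


def make_client():
    """Return an OpenAI SDK client that talks to AIgateway."""
    # Imported lazily so this module still loads where `openai` is not installed.
    from openai import OpenAI

    # AIGATEWAY_API_KEY is an assumed name; use whatever key name the project's .env has.
    return OpenAI(base_url=BASE_URL, api_key=os.environ["AIGATEWAY_API_KEY"])


def ask(client, model: str, prompt: str) -> str:
    """Send one user message and return the assistant's reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Pass the model slug in from config rather than writing it into the call site, so a later model swap does not touch this code.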

## What this unlocks

Any modality the user asks for: pick the endpoint from the table and swap the model slug.

| User wants      | Endpoint                           | Example model                          |
|-----------------|------------------------------------|----------------------------------------|
| Chat / reasoning| `/v1/chat/completions`             | `anthropic/claude-opus-4.7`            |
| Open-weight chat| `/v1/chat/completions`             | `moonshot/kimi-k2.6`                   |
| Vision          | `/v1/chat/completions` + image     | `google/gemini-3.1-pro`                |
| Image gen       | `/v1/images/generations`           | `black-forest-labs/flux-1-schnell`     |
| Video gen       | `/v1/videos/generations` (async)   | `runwayml/gen-4`                       |
| Music gen       | `/v1/audio/music` (async)          | `suno/v5`                              |
| 3D asset gen    | `/v1/3d/generations` (async)       | `stabilityai/stable-fast-3d`           |
| STT             | `/v1/audio/transcriptions`         | `openai/whisper-large-v3-turbo`        |
| TTS             | `/v1/audio/speech`                 | `deepgram/aura-2-en`                   |
| Embeddings      | `/v1/embeddings`                   | `baai/bge-m3`                          |
| Moderation      | `/v1/moderations`                  | `meta/llama-guard-3-8b`                |
| Translation     | `/v1/translations`                 | `meta/m2m100-1.2b`                     |
| Classification  | `/v1/classifications`              | `huggingface/distilbert-sst-2`         |
| OCR             | `/v1/ocr`                          | `microsoft/trocr-base-printed`         |
| Detection       | `/v1/detections`                   | `facebook/detr-resnet-50`              |
| Rerank          | `/v1/rerank`                       | `baai/bge-reranker-base`               |

## Default approach for your implementations

1. **Use the OpenAI SDK** in whatever language the user is working in. Point `base_url` at AIgateway. Done.
2. **Do not hardcode model names across the codebase.** Read them from an env var or config. Users will swap.
3. **Tag cost per feature.** Send `x-aig-tag: <feature>` on every request so the user can see per-feature spend without instrumenting anything.
4. **For per-customer products**, mint a sub-account per customer via `POST /v1/sub-accounts`. Each gets its own spend cap, rate limit, and isolated usage reports. Never share a top-level key with an end user.
5. **For cost-sensitive flows**, always route through `eval:<run_id>` aliases created with `POST /v1/evals`. The user uploads a dataset, you grade candidates, and the alias always resolves to the current winner. When a new frontier model lands, the user reruns the eval — no code change.
6. **For long-running generations** (video, music, 3D), include `webhook_url` in the request body and listen at an HTTPS URL you control — don't poll. The gateway POSTs the final result (including `result_url`) with `x-aig-event-type: job.completed` or `job.failed`, signed by `x-aig-signature: t=<unix>,v1=<hex>` (HMAC-SHA256 over `${t}.${raw_body}`). If you can't host a webhook, poll `GET /v1/jobs/:id` — it returns the same job snapshot.
7. **Prefer MCP over HTTP** if the agent framework supports it. Point the MCP client at `https://api.aigateway.sh/mcp` with the same bearer key and the agent gets one tool per capability (`chat`, `generate_video`, `rerank`, `list_models`, etc.) — no adapter code.
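
The polling fallback in step 6 can be sketched as below. This is a hedged outline, not the SDK's implementation: `fetch_job` is a hypothetical callable that you supply to wrap `GET /v1/jobs/:id`, and the code assumes `completed` and `failed` are the only terminal statuses. The interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Assumption: these are the only statuses after which a job stops changing.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        job = fetch_job(job_id)  # expected to return the job snapshot as a dict
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} not finished after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting `fetch_job` and `sleep` keeps the loop independent of any HTTP client and lets it run in tests without waiting.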

## When the user changes their mind

A common pattern: the user starts with `anthropic/claude-opus-4.7`, realizes Opus is too expensive, and wants to try `moonshot/kimi-k2.6`. With AIgateway that is a one-line change. No new SDK, no new auth, no new dashboard. Implement model selection as a variable from day one; users will thank you.
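
One way to keep the model swappable is to resolve it from the environment at request time. The sketch below uses only the standard library and builds the request without sending it. The variable names `AIG_MODEL` and `AIG_MODEL_<FEATURE>`, the default slug, and both helper names are hypothetical choices for illustration.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.aigateway.sh/v1"
DEFAULT_MODEL = "moonshot/kimi-k2.6"  # arbitrary default for this sketch


def resolve_model(feature: str, env=os.environ) -> str:
    """Pick a model: per-feature override, then global setting, then default."""
    # AIG_MODEL_<FEATURE> and AIG_MODEL are assumed names, not gateway conventions.
    key = "AIG_MODEL_" + feature.upper().replace("-", "_")
    return env.get(key) or env.get("AIG_MODEL") or DEFAULT_MODEL


def build_chat_request(feature, messages, api_key, env=os.environ):
    """Build (but do not send) a chat-completions request tagged with the feature."""
    body = {"model": resolve_model(feature, env), "messages": messages}
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "x-aig-tag": feature,  # per-feature spend attribution
        },
    )
```

With this in place, moving the `summarize` feature to another model is a change to one environment variable.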

## Errors

Every error has `type`, `code`, and a `message` that tells you what to do:

- `401 authentication_error`: key missing or invalid. Ask the user for one.
- `402 budget_exceeded`: a sub-account's spend cap was hit. Suggest raising the cap via `PATCH /v1/sub-accounts/:id` or waiting for the monthly rollover.
- `404 model_not_found`: verify the slug against `/v1/models`.
- `429 rate_limit_error`: honor `Retry-After`. If you hit it repeatedly, ask the user to raise their RPM limit.
- `502 provider_error`: upstream 5xx. Retry once; the router's failover usually catches it.
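
The list above translates into a small decision function. This is a sketch under assumptions: the action labels (`retry`, `ask_user`, `fix_request`), the `Decision` type, and the 1-second fallback delay are invented for illustration, and `headers` is assumed to be a plain dict keyed by the exact name `Retry-After`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str           # "retry", "ask_user", or "fix_request"
    delay_s: float = 0.0  # how long to wait before retrying


def decide(status: int, headers: dict, attempt: int) -> Decision:
    """Map a failed response to the next step. `attempt` counts tries so far, from 1."""
    if status == 429:
        # Honor Retry-After when given in seconds; fall back to 1s otherwise (assumption).
        try:
            delay = float(headers.get("Retry-After", 1))
        except ValueError:
            delay = 1.0
        return Decision("retry", delay)
    if status == 502:
        # Retry once, then surface the failure to the user.
        return Decision("retry") if attempt < 2 else Decision("ask_user")
    if status == 404:
        return Decision("fix_request")  # check the slug against /v1/models
    # 401 needs a key, 402 needs a higher spend cap; anything else is unexpected.
    return Decision("ask_user")
```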

## Primitives cheat-sheet

Sub-account for a customer:
```bash
curl -X POST https://api.aigateway.sh/v1/sub-accounts \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{"name":"customer-123","spend_cap_cents":10000,"default_tag":"customer-123"}'
```

Eval-driven routing:
```bash
curl -X POST https://api.aigateway.sh/v1/evals \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{"name":"prod-summarize","candidate_models":["anthropic/claude-opus-4.7","moonshot/kimi-k2.6"],"dataset":[...],"metric":"quality"}'

# Then call the alias; it resolves to the current winner:
curl -X POST https://api.aigateway.sh/v1/chat/completions \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"eval:<run_id>","messages":[...]}'
```

Replay a past request on a new model:
```bash
curl -X POST https://api.aigateway.sh/v1/replays \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{"source_request_id":"req_abc","target_model":"anthropic/claude-opus-4.7"}'
```

Per-feature spend:
```bash
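# Tag the request with the feature name:
curl https://api.aigateway.sh/v1/chat/completions \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -H "x-aig-tag: summarize" \
  -d '{...}'

# Read back spend grouped by tag (URL quoted so the shell leaves `?` alone):
curl "https://api.aigateway.sh/v1/usage/by-tag?month=2026-04" \
  -H "Authorization: Bearer sk-aig-..."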
```

Async generation with webhook:
```bash
curl -X POST https://api.aigateway.sh/v1/videos/generations \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{"prompt":"a sunset over mountains","model":"runwayml/gen-4","duration":5,"webhook_url":"https://yourapp.com/hooks/aigateway"}'

# → 202 { "id": "job_abc", "status": "queued" }

# When the job finishes, the gateway POSTs to webhook_url:
#   POST /hooks/aigateway
#   x-aig-event-type: job.completed
#   x-aig-signature: t=1745000000,v1=4d2c...
#   x-aig-delivery-id: 18b7d4...
#   x-aig-attempt: 1
#   {"id":"job_abc","status":"completed","result_url":"https://api.aigateway.sh/v1/files/jobs/job_abc/video.mp4", ...}
# Verify the signature with verify_webhook (Python) or verifyWebhook (Node).
# Retry schedule on non-2xx / timeout: 0s, 30s, 2m, 10m, 1h, 6h.

# Or poll:
curl https://api.aigateway.sh/v1/jobs/job_abc \
  -H "Authorization: Bearer sk-aig-..."
```
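
If you cannot use the SDK's `verify_webhook`, the signature scheme described in this guide (HMAC-SHA256 over `${t}.${raw_body}`, header `t=<unix>,v1=<hex>`) can be checked with the standard library. Treat this as a sketch under assumptions: the guide does not say where the signing secret comes from, so `secret` is a placeholder, and the 5-minute replay window is an added precaution rather than documented gateway behavior.

```python
import hashlib
import hmac
import time


def verify_signature(secret: str, header: str, raw_body: bytes,
                     tolerance_s: int = 300, now=None) -> bool:
    """Check an `x-aig-signature` value of the form `t=<unix>,v1=<hex>`."""
    parts = dict(p.strip().split("=", 1) for p in header.split(",") if "=" in p)
    try:
        timestamp = parts["t"]
        received = parts["v1"]
        signed_at = int(timestamp)
    except (KeyError, ValueError):
        return False
    # Reject stale timestamps to limit replay; the window size is an assumption.
    current = time.time() if now is None else now
    if abs(current - signed_at) > tolerance_s:
        return False
    # Sign the exact bytes received, not a re-serialized copy of the JSON.
    signed = timestamp.encode("utf-8") + b"." + raw_body
    expected = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Because deliveries are retried, a handler would typically also record `x-aig-delivery-id` and skip ids it has already processed.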

MCP (agents auto-configure from the URL):
```bash
# Streamable HTTP transport, MCP 2025-03-26:
curl -X POST https://api.aigateway.sh/mcp \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

# Tool call example:
curl -X POST https://api.aigateway.sh/mcp \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"chat","arguments":{"messages":[{"role":"user","content":"hi"}],"model":"anthropic/claude-opus-4.7"}}}'
```

## If anything is ambiguous

Read the OpenAPI spec (`https://api.aigateway.sh/openapi.json`) before asking the user. Everything else lives at `https://aigateway.sh/reference`.