# AIgateway

> AIgateway is a universal AI API. One OpenAI-compatible endpoint, one key, every frontier and open-weight model, every modality (text, image, audio, vision, embeddings, moderation, classification, translation, rerank). Pay upstream cost + 5%; cache hits bill at 10% of the uncached price; only successful runs bill. If you are a coding agent (Claude Code, Cursor, Cline, Windsurf, OpenClaw, etc.), you can autoconfigure a full multi-modal pipeline from this file alone.

- Base URL: `https://api.aigateway.sh/v1`
- Auth: `Authorization: Bearer sk-aig-...`
- SDK: drop-in for the OpenAI client in any language — just change `base_url`
- OpenAPI 3.1 spec: https://api.aigateway.sh/openapi.json
- Model catalog (live): https://api.aigateway.sh/v1/models
- Provider health (live): https://api.aigateway.sh/v1/health/providers
- Get a key: https://aigateway.sh/signin (free tier is 100 req/day on Kimi K2.6, no card — through Apr 30, 2026)
- Full dynamic catalog: https://aigateway.sh/llms-full.txt (every model + price + capabilities, regenerated hourly)

## Quick answers

- Cheapest LLM with tool calling: `google/gemini-3.1-flash-lite` at $0.05/M in, $0.20/M out.
- Cheapest free model: `moonshot/kimi-k2.6` (100 req/day, no card).
- Longest context window: `google/gemini-3.1-pro` at 2M tokens.
- Best coding model (SWE-bench): `anthropic/claude-opus-4.7` at 77.3%.
- Best cheap alternative to GPT-5.4: `moonshot/kimi-k2.6` or `openai/gpt-5.4-mini`.
- How to call Kimi K2.6 in Python: `OpenAI(base_url="https://api.aigateway.sh/v1").chat.completions.create(model="moonshot/kimi-k2.6", ...)`
- Install: `pip install aigateway-py openai` (Python) or `pnpm add aigateway-js openai` (Node/TS).
- CLI: `npm i -g aigateway-cli` then `aig login` (browser device auth) → `aig call moonshot/kimi-k2.6 "hi"`.

## Core endpoints

- `POST /v1/chat/completions` — OpenAI-compatible chat. Any text model, tool calling, streaming.
  See: https://aigateway.sh/reference#chat
- `POST /v1/embeddings` — embeddings across every embedding model in the catalog.
  See: https://aigateway.sh/reference#embeddings
- `POST /v1/images/generations` — image generation (Flux, Stable Diffusion XL, Lucid Origin, DreamShaper, Phoenix).
  See: https://aigateway.sh/reference#images
- `POST /v1/audio/transcriptions` — STT (Whisper variants, Deepgram Nova 3, Flux, Smart Turn).
  See: https://aigateway.sh/reference#stt
- `POST /v1/audio/speech` — TTS (Deepgram Aura 1/2, MeloTTS).
  See: https://aigateway.sh/reference#tts
- `POST /v1/moderations` — content moderation via Llama Guard 3.
  See: https://aigateway.sh/reference#moderations

## Utility endpoints

- `POST /v1/translations` — translate text between languages. Body: `{ text, source_lang, target_lang, model? }`.
- `POST /v1/classifications` — text classification with label + score. Body: `{ input, model? }`.
- `POST /v1/detections` — object detection in an image. Body: `{ image_url | image_b64, model? }`.
- `POST /v1/ocr` — text extraction from an image. Body: `{ image_url | image_b64, model? }`.
- `POST /v1/rerank` — RAG re-ranking. Body: `{ query, documents, model? }`.

## Async endpoints (video, music, 3D)

Long-running generations return a job record immediately. Poll `GET /v1/jobs/:id` or supply a `webhook_url` for push notification. Binary results land at `GET /v1/files/jobs/:id/:filename`.

- `POST /v1/videos/generations` — text-to-video + image-to-video (Runway, Luma, CF video models). Body: `{ prompt, model?, duration, aspect_ratio, resolution, image_url?, webhook_url? }` → `202 { id, status }`.
- `POST /v1/audio/music` — text-to-music. Body: `{ prompt, model?, duration, webhook_url? }` → `202 { id, status }`.
- `POST /v1/3d/generations` — text-to-3D (GLB assets). Body: `{ prompt, model?, image_url?, webhook_url? }` → `202 { id, status }`.
- `GET /v1/jobs/:id` — poll job status.
  Response includes `status: queued | processing | completed | failed`, `result_url`, and `result` when terminal.
- `DELETE /v1/jobs/:id` — cancel a queued job.

## MCP server (Model Context Protocol)

Every capability above is also exposed as an MCP tool. Point an MCP-enabled agent at the endpoint and it can auto-discover everything.

- **Streamable HTTP transport** (preferred, MCP 2025-03-26): `POST https://api.aigateway.sh/mcp`
- **Legacy SSE transport**: `GET https://api.aigateway.sh/mcp/sse` + `POST https://api.aigateway.sh/mcp/message?sessionId=...`
- **Auth**: same `Authorization: Bearer sk-aig-...` as the HTTP API.

Tools exposed: `chat`, `embed`, `generate_image`, `transcribe`, `speak`, `translate`, `classify`, `moderate`, `rerank`, `ocr`, `detect_objects`, `generate_video`, `generate_music`, `generate_3d`, `get_job`, `cancel_job`, `list_models`, `search_models`. Call `tools/list` on the server for current JSON Schemas.

## Aggregator-native primitives

These are only on AIgateway. They all compare across models or look at the full traffic picture — single-provider SDKs physically can't ship them.

- `POST /v1/sub-accounts` — mint a scoped API key for one of your end customers with its own spend cap, rate limit, default tag, and isolated analytics.
  See: https://aigateway.sh/reference#sub-accounts
- `POST /v1/evals` — run an eval across candidate models on your own dataset; use `eval:` as a model alias to always route to the current winner.
  See: https://aigateway.sh/reference#evals
- `POST /v1/replays` — re-run any past request against a different model and diff cost, latency, and output.
  See: https://aigateway.sh/reference#replays
- `GET /v1/usage/by-tag` — per-tag cost attribution. Tag any request with `x-aig-tag: <tag>`.
  See: https://aigateway.sh/reference#tags
- `GET /v1/usage/by-sub-account` — per-customer cost attribution.

## Official SDKs + CLI

- **Python**: `pip install aigateway-py` — `from aigateway import AIgateway, AsyncAIgateway, verify_webhook`.
  Sync + async clients. (Distribution name on PyPI is `aigateway-py`; the import path is `aigateway`.)
- **Node / TypeScript**: `pnpm add aigateway-js` (or `npm install aigateway-js`) — `import { AIgateway, verifyWebhook } from 'aigateway-js'`. ESM + CJS, zero runtime deps. Covers async jobs, sub-accounts, evals, replays, signed URLs, webhook verification.
- **CLI**: `npm i -g aigateway-cli` (or `npx aigateway-cli init`) — installs the `aig` binary. `aig init` walks through key setup + scaffolds a starter file. Also ships `aig call`, `aig models`, `aig jobs`, `aig mcp`, `aig usage`, `aig eval`, `aig replay`, `aig sub-account`, `aig tail`.

For chat / embeddings / images / STT / TTS, just use the official `openai` package with `base_url='https://api.aigateway.sh/v1'`. Reach for the AIgateway SDKs when you need the aggregator-native surface (async jobs, sub-accounts, evals, replays, signed URLs, webhook verification) — endpoints OpenAI doesn't model.

## Webhook signatures

Every callback (async job results AND lifecycle events) carries:

- `x-aig-signature: t=<timestamp>,v1=<hex_hmac>` — HMAC-SHA256 over `${t}.${raw_body}` using the per-key signing secret
- `x-aig-event-type: <event>` — see event list below
- `x-aig-delivery-id: <id>` — stable across retries, use for idempotency
- `x-aig-attempt: <n>` — 1-indexed attempt counter

Fetch the signing secret at `GET /v1/webhook-secret` (rotate with `POST /v1/webhook-secret/rotate`). Failed deliveries (non-2xx, timeout) retry on a 6-attempt schedule: `0s, 30s, 2m, 10m, 1h, 6h`. Both official SDKs ship constant-time verifiers (`verify_webhook` in Python, `verifyWebhook` in Node).
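If you receive callbacks without the SDKs, the scheme is easy to check by hand. A minimal sketch, assuming the header renders as `t=<timestamp>,v1=<hex_hmac>` and the signed payload is `${t}.${raw_body}` as described; the function name is illustrative, not part of any SDK:

```python
import hashlib
import hmac

def verify_aig_signature(secret: str, signature_header: str, raw_body: bytes) -> bool:
    """Recompute the v1 HMAC from the x-aig-signature header and compare in constant time."""
    # Parse "t=<timestamp>,v1=<hex_hmac>" into a dict.
    parts = dict(p.split("=", 1) for p in signature_header.split(","))
    t, v1 = parts["t"], parts["v1"]
    # Signed payload is "<t>.<raw_body>".
    expected = hmac.new(secret.encode(), f"{t}.".encode() + raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, v1)
```

Remember to verify against the raw request bytes, before any JSON parsing, and to treat `x-aig-delivery-id` as your idempotency key across retries.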
Event types currently emitted:

- `job.completed`, `job.failed` — async generations (video, music, 3D)
- `balance.low`, `balance.exhausted` — wallet thresholds
- `usage.threshold.exceeded`, `usage.daily.summary` — spend reports
- `subaccount.created`, `subaccount.spend.exceeded` — multi-tenant signals
- `model.added`, `model.deprecated` — catalog drift
- `key.rotated` — API-key lifecycle

## Hosts

- `api.aigateway.sh` — JSON API
- `media.aigateway.sh` — file downloads (job results, signed URLs). All `result_url` values in poll + webhook responses resolve to this host.
- `logs.aigateway.sh`, `store.aigateway.sh` — reserved for upcoming features.

## Signed file URLs

Share completed job results without handing out the gateway key:

- `GET https://api.aigateway.sh/v1/files/jobs/:jobId/:filename/signed?expires_in=3600` → `{ url, expires_at }`. The returned URL is on `media.aigateway.sh`, publicly fetchable until `expires_at` (no Authorization needed). Max expiry: 7 days.
- Storage is swept nightly; files older than 7 days are deleted.

## MCP inspector

Live HTML inspector at `https://api.aigateway.sh/mcp/inspect` — paste your key and try every MCP tool from the browser. Useful for eyeballing schemas before wiring up an agent.

## Request headers every agent should know

- `Authorization: Bearer sk-aig-...` (required)
- `X-Request-Id: <id>` (optional correlation id; echoed in response)
- `x-aig-tag: <tag>` (attribute the request to a feature / tenant / user for cost reports)
- `x-cache: auto | force | skip` (override cache behavior)
- `x-routing: cost | speed | quality | auto` (bias auto-routing when `model` is omitted)

## Model naming

All model IDs use `provider/model` slugs.
Examples:

- `anthropic/claude-opus-4.7`
- `openai/gpt-5.4`
- `google/gemini-3.1-pro`
- `moonshot/kimi-k2.6`
- `meta/llama-4-scout-17b-16e-instruct`
- `black-forest-labs/flux-1-schnell`
- `deepgram/aura-2-en`
- `openai/whisper-large-v3-turbo`
- `baai/bge-m3`

The full live list (with pricing, context window, capabilities, modality) is `GET /v1/models`. Filter with `?modality=text`, `?modality=image`, `?provider=anthropic`, etc.

## Errors (OpenAI-shaped, with remediation)

Every error has `type`, `code`, and a `message` that tells you what to do next.

| code | type | remediation |
|------|------|-------------|
| 400 | invalid_request_error | Fix the request body |
| 401 | authentication_error | Check `Authorization: Bearer ...` |
| 402 | budget_exceeded | Raise the sub-account spend cap or wait for the month rollover |
| 404 | model_not_found | Use the exact `provider/model` slug from `/v1/models` |
| 429 | rate_limit_error | Honor `Retry-After`; request a higher RPM |
| 502 | provider_error | Upstream 5xx; automatic failover should have engaged — see `/v1/health/providers` |
| 504 | timeout_error | Upstream timed out; retry with a smaller `max_tokens` |

## Quickstart — agent copy-paste

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...")

# 1. text
r = client.chat.completions.create(
    model="anthropic/claude-opus-4.7",
    messages=[{"role": "user", "content": "hi"}],
)

# 2. image
img = client.images.generate(
    model="black-forest-labs/flux-1-schnell",
    prompt="a cozy reading corner, golden hour",
    size="1024x1024",
)

# 3. embeddings
e = client.embeddings.create(model="baai/bge-m3", input=["hello", "world"])
```

## Pricing in one line

Every model bills at upstream cost + a flat 5% platform fee. Cache hits bill at 10% of the uncached price. Failed requests don't bill. Free tier is 100 req/day on Kimi K2.6 (through Apr 30, 2026). No monthly minimum; top up in any amount from $5.
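When 429 or 502 from the error table does show up, a small client-side backoff that honors `Retry-After` keeps agents well-behaved. A minimal sketch, not SDK behavior; the response object here is any value with `status_code` and `headers` attributes (the shape of `requests.Response`):

```python
import time

def with_retry(call, max_attempts: int = 4, default_wait: float = 1.0):
    """Run `call()` (one HTTP request), retrying 429/502 responses.

    Waits for the server's Retry-After when present, otherwise falls
    back to exponential backoff (default_wait, 2x, 4x, ...).
    """
    for attempt in range(1, max_attempts + 1):
        resp = call()
        if resp.status_code not in (429, 502) or attempt == max_attempts:
            return resp
        wait = float(resp.headers.get("Retry-After", default_wait * 2 ** (attempt - 1)))
        time.sleep(wait)
    return resp
```

Pair this with `X-Request-Id` so retried attempts are easy to correlate in your logs.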
## Switching from another gateway

Credit-match on your last invoice (up to $500) if you're coming from another aggregator. See per-competitor migration guides:

- OpenRouter → https://aigateway.sh/switch/openrouter
- Portkey → https://aigateway.sh/switch/portkey
- Helicone → https://aigateway.sh/switch/helicone
- LiteLLM → https://aigateway.sh/switch/litellm
- Together → https://aigateway.sh/switch/together
- Fireworks → https://aigateway.sh/switch/fireworks
- Requesty → https://aigateway.sh/switch/requesty
- Braintrust → https://aigateway.sh/switch/braintrust

## For humans

- Docs: https://aigateway.sh/docs
- API reference: https://aigateway.sh/reference
- Pricing: https://aigateway.sh/pricing
- Playground: https://aigateway.sh/playground
- Rankings (live model leaderboard): https://aigateway.sh/rankings
- Providers (every lab we route to): https://aigateway.sh/providers
- Compare any two models: https://aigateway.sh/compare
- Model catalog with per-model deep pages: https://aigateway.sh/models
- Enterprise (evals, guardrails, replay, prompt IDs, SSO, SLA): https://aigateway.sh/enterprise
- Security (posture, compliance, incident response): https://aigateway.sh/security
- Integrations (OpenAI SDK, ai-sdk, LangChain, LlamaIndex, Cursor, Claude Code, Continue, Cline): https://aigateway.sh/integrations
- Support: https://aigateway.sh/support (reply under 24h from a real engineer)