# AIgateway

> AIgateway is a universal AI API. One OpenAI-compatible endpoint, one key, every frontier and open-weight model, every modality (text, image, audio, vision, embeddings, moderation, classification, translation, rerank). Pay upstream cost + 5%; cache hits bill at 10% of the uncached price; only successful runs bill. If you are a coding agent (Claude Code, Cursor, Cline, Windsurf, OpenClaw, etc.), you can autoconfigure a full multi-modal pipeline from this file alone.

- Base URL: `https://api.aigateway.sh/v1`
- Auth: `Authorization: Bearer sk-aig-...`
- SDK: drop-in for the OpenAI client in any language — just change `base_url`
- OpenAPI 3.1 spec: https://api.aigateway.sh/openapi.json
- Model catalog (live): https://api.aigateway.sh/v1/models
- Provider health (live): https://api.aigateway.sh/v1/health/providers
- Get a key: https://aigateway.sh/signin (free tier is 100 req/day on Kimi K2.6, no card — through Apr 30, 2026)
- Full dynamic catalog: https://aigateway.sh/llms-full.txt (every model + price + capabilities, regenerated hourly)

## Quick answers

- Cheapest LLM with tool calling: `google/gemini-3.1-flash-lite` at $0.05/M in, $0.20/M out.
- Cheapest free model: `moonshot/kimi-k2.6` (100 req/day, no card).
- Longest context window: `google/gemini-3.1-pro` at 2M tokens.
- Best coding model (SWE-bench): `anthropic/claude-opus-4.7` at 77.3%.
- Best cheap alternative to GPT-5.4: `moonshot/kimi-k2.6` or `openai/gpt-5.4-mini`.
- How to call Kimi K2.6 in Python: `OpenAI(base_url="https://api.aigateway.sh/v1").chat.completions.create(model="moonshot/kimi-k2.6", ...)`
- Install: `pip install aigateway-py openai` (Python) or `pnpm add aigateway-js openai` (Node/TS).
- CLI: `npm i -g aigateway-cli` then `aig login` (browser device auth) → `aig call moonshot/kimi-k2.6 "hi"`.

## Core endpoints

- `POST /v1/chat/completions` — OpenAI-compatible chat. Any text model, tool calling, streaming.
  See: https://aigateway.sh/reference#chat
- `POST /v1/embeddings` — embeddings across every embedding model in the catalog.
  See: https://aigateway.sh/reference#embeddings
- `POST /v1/images/generations` — image generation (Flux, Stable Diffusion XL, Lucid Origin, DreamShaper, Phoenix).
  See: https://aigateway.sh/reference#images
- `POST /v1/audio/transcriptions` — STT (Whisper variants, Deepgram Nova 3, Flux, Smart Turn).
  See: https://aigateway.sh/reference#stt
- `POST /v1/audio/speech` — TTS (Deepgram Aura 1/2, MeloTTS).
  See: https://aigateway.sh/reference#tts
- `POST /v1/moderations` — content moderation via Llama Guard 3.
  See: https://aigateway.sh/reference#moderations

## Utility endpoints

- `POST /v1/translations` — translate text between languages. Body: `{ text, source_lang, target_lang, model? }`.
- `POST /v1/classifications` — text classification with label + score. Body: `{ input, model? }`.
- `POST /v1/detections` — object detection in an image. Body: `{ image_url | image_b64, model? }`.
- `POST /v1/ocr` — text extraction from an image. Body: `{ image_url | image_b64, model? }`.
- `POST /v1/rerank` — RAG re-ranking. Body: `{ query, documents, model? }`.

## Async endpoints (video, music, 3D)

Long-running generations return a job record immediately. Poll `GET /v1/jobs/:id` or supply a `webhook_url` for push notification. Binary results land at `GET /v1/files/jobs/:id/:filename`.

- `POST /v1/videos/generations` — text-to-video + image-to-video (Runway, Luma, CF video models). Body: `{ prompt, model?, duration, aspect_ratio, resolution, image_url?, webhook_url? }` → `202 { id, status }`.
- `POST /v1/audio/music` — text-to-music. Body: `{ prompt, model?, duration, webhook_url? }` → `202 { id, status }`.
- `POST /v1/3d/generations` — text-to-3D (GLB assets). Body: `{ prompt, model?, image_url?, webhook_url? }` → `202 { id, status }`.
- `GET /v1/jobs/:id` — poll job status.
  Response includes `status: queued | processing | completed | failed`, `result_url`, and `result` when terminal.
- `DELETE /v1/jobs/:id` — cancel a queued job.

## MCP server (Model Context Protocol)

Every capability above is also exposed as an MCP tool. Point an MCP-enabled agent at the endpoint and it can auto-discover everything.

- **Streamable HTTP transport** (preferred, MCP 2025-03-26): `POST https://api.aigateway.sh/mcp`
- **Legacy SSE transport**: `GET https://api.aigateway.sh/mcp/sse` + `POST https://api.aigateway.sh/mcp/message?sessionId=...`
- **Auth**: same `Authorization: Bearer sk-aig-...` as the HTTP API.

Tools exposed: `chat`, `embed`, `generate_image`, `transcribe`, `speak`, `translate`, `classify`, `moderate`, `rerank`, `ocr`, `detect_objects`, `generate_video`, `generate_music`, `generate_3d`, `get_job`, `cancel_job`, `list_models`, `search_models`. Call `tools/list` on the server for current JSON Schemas.

## Aggregator-native primitives

These are only on AIgateway. They all compare across models or look at the full traffic picture — single-provider SDKs physically can't ship them.

- `POST /v1/sub-accounts` — mint a scoped API key for one of your end customers with its own spend cap, rate limit, default tag, and isolated analytics.
  See: https://aigateway.sh/reference#sub-accounts
- `POST /v1/evals` — run an eval across candidate models on your own dataset; use `eval:` as a model alias to always route to the current winner.
  See: https://aigateway.sh/reference#evals
- `POST /v1/replays` — re-run any past request against a different model and diff cost, latency, and output.
  See: https://aigateway.sh/reference#replays
- `GET /v1/usage/by-tag` — per-tag cost attribution. Tag any request with `x-aig-tag: <tag>`.
  See: https://aigateway.sh/reference#tags
- `GET /v1/usage/by-sub-account` — per-customer cost attribution.

## Official SDKs + CLI

- **Python**: `pip install aigateway-py` — `from aigateway import AIgateway, AsyncAIgateway, verify_webhook`.
  Sync + async clients. (Distribution name on PyPI is `aigateway-py`; the import path is `aigateway`.)
- **Node / TypeScript**: `pnpm add aigateway-js` (or `npm install aigateway-js`) — `import { AIgateway, verifyWebhook } from 'aigateway-js'`. ESM + CJS, zero runtime deps. Covers async jobs, sub-accounts, evals, replays, signed URLs, webhook verification.
- **CLI**: `npm i -g aigateway-cli` (or `npx aigateway-cli init`) — installs the `aig` binary. `aig init` walks through key setup + scaffolds a starter file. Also ships `aig call`, `aig models`, `aig jobs`, `aig mcp`, `aig usage`, `aig eval`, `aig replay`, `aig sub-account`, `aig tail`.

For chat / embeddings / images / STT / TTS, just use the official `openai` package with `base_url='https://api.aigateway.sh/v1'`. Reach for the AIgateway SDKs when you need the aggregator-native surface (async jobs, sub-accounts, evals, replays, signed URLs, webhook verification) — endpoints OpenAI doesn't model.

## Webhook signatures

Every callback (async job results AND lifecycle events) carries:

- `x-aig-signature: t=<timestamp>,v1=<hex_hmac>` — HMAC-SHA256 over `${t}.${raw_body}` using the per-key signing secret
- `x-aig-event-type: <event>` — see event list below
- `x-aig-delivery-id: <id>` — stable across retries, use for idempotency
- `x-aig-attempt: <n>` — 1-indexed attempt counter

Fetch the signing secret at `GET /v1/webhook-secret` (rotate with `POST /v1/webhook-secret/rotate`). Failed deliveries (non-2xx, timeout) retry on a 6-attempt schedule: `0s, 30s, 2m, 10m, 1h, 6h`. Both official SDKs ship constant-time verifiers (`verify_webhook` in Python, `verifyWebhook` in Node).
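If you receive callbacks without the SDKs, the scheme is easy to check by hand. A minimal sketch, assuming the header renders as `t=<timestamp>,v1=<hex_hmac>` and the signed payload is `${t}.${raw_body}` as described; the function name is illustrative, not part of any SDK:

```python
import hashlib
import hmac

def verify_aig_signature(secret: str, signature_header: str, raw_body: bytes) -> bool:
    """Recompute the v1 HMAC from the x-aig-signature header and compare in constant time."""
    # Parse "t=<timestamp>,v1=<hex_hmac>" into a dict.
    parts = dict(p.split("=", 1) for p in signature_header.split(","))
    t, v1 = parts["t"], parts["v1"]
    # Signed payload is "<t>.<raw_body>".
    expected = hmac.new(secret.encode(), f"{t}.".encode() + raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, v1)
```

Remember to verify against the raw request bytes, before any JSON parsing, and to treat `x-aig-delivery-id` as your idempotency key across retries.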
Event types currently emitted:

- `job.completed`, `job.failed` — async generations (video, music, 3D)
- `balance.low`, `balance.exhausted` — wallet thresholds
- `usage.threshold.exceeded`, `usage.daily.summary` — spend reports
- `subaccount.created`, `subaccount.spend.exceeded` — multi-tenant signals
- `model.added`, `model.deprecated` — catalog drift
- `key.rotated` — API-key lifecycle

## Hosts

- `api.aigateway.sh` — JSON API
- `media.aigateway.sh` — file downloads (job results, signed URLs). All `result_url` values in poll + webhook responses resolve to this host.
- `logs.aigateway.sh`, `store.aigateway.sh` — reserved for upcoming features.

## Signed file URLs

Share completed job results without handing out the gateway key:

- `GET https://api.aigateway.sh/v1/files/jobs/:jobId/:filename/signed?expires_in=3600` → `{ url, expires_at }`. The returned URL is on `media.aigateway.sh`, publicly fetchable until `expires_at` (no Authorization needed). Max expiry: 7 days.
- Storage is swept nightly; files older than 7 days are deleted.

## MCP inspector

Live HTML inspector at `https://api.aigateway.sh/mcp/inspect` — paste your key and try every MCP tool from the browser. Useful for eyeballing schemas before wiring up an agent.

## Request headers every agent should know

- `Authorization: Bearer sk-aig-...` (required)
- `X-Request-Id: <id>` (optional correlation id; echoed in response)
- `x-aig-tag: <tag>` (attribute the request to a feature / tenant / user for cost reports)
- `x-cache: auto | force | skip` (override cache behavior)
- `x-routing: cost | speed | quality | auto` (bias auto-routing when `model` is omitted)

## Model naming

All model IDs use `provider/model` slugs.
Examples:

- `anthropic/claude-opus-4.7`
- `openai/gpt-5.4`
- `google/gemini-3.1-pro`
- `moonshot/kimi-k2.6`
- `meta/llama-4-scout-17b-16e-instruct`
- `black-forest-labs/flux-1-schnell`
- `deepgram/aura-2-en`
- `openai/whisper-large-v3-turbo`
- `baai/bge-m3`

The full live list (with pricing, context window, capabilities, modality) is `GET /v1/models`. Filter with `?modality=text`, `?modality=image`, `?provider=anthropic`, etc.

## Errors (OpenAI-shaped, with remediation)

Every error has `type`, `code`, and a `message` that tells you what to do next.

| code | type | remediation |
|------|------|-------------|
| 400 | invalid_request_error | Fix the request body |
| 401 | authentication_error | Check `Authorization: Bearer ...` |
| 402 | budget_exceeded | Raise the sub-account spend cap or wait for the month rollover |
| 404 | model_not_found | Use the exact `provider/model` slug from `/v1/models` |
| 429 | rate_limit_error | Honor `Retry-After`; request a higher RPM |
| 502 | provider_error | Upstream 5xx; automatic failover should have engaged — see `/v1/health/providers` |
| 504 | timeout_error | Upstream timed out; retry with a smaller `max_tokens` |

## Quickstart — agent copy-paste

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...")

# 1. text
r = client.chat.completions.create(
    model="anthropic/claude-opus-4.7",
    messages=[{"role": "user", "content": "hi"}],
)

# 2. image
img = client.images.generate(
    model="black-forest-labs/flux-1-schnell",
    prompt="a cozy reading corner, golden hour",
    size="1024x1024",
)

# 3. embeddings
e = client.embeddings.create(model="baai/bge-m3", input=["hello", "world"])
```

## Pricing in one line

Every model bills at upstream cost + a flat 5% platform fee. Cache hits bill at 10% of the uncached price. Failed requests don't bill. Free tier is 100 req/day on Kimi K2.6 (through Apr 30, 2026). No monthly minimum; top up in any amount from $5.
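When 429 or 502 from the error table does show up, a small client-side backoff that honors `Retry-After` keeps agents well-behaved. A minimal sketch, not SDK behavior; the response object here is any value with `status_code` and `headers` attributes (the shape of `requests.Response`):

```python
import time

def with_retry(call, max_attempts: int = 4, default_wait: float = 1.0):
    """Run `call()` (one HTTP request), retrying 429/502 responses.

    Waits for the server's Retry-After when present, otherwise falls
    back to exponential backoff (default_wait, 2x, 4x, ...).
    """
    for attempt in range(1, max_attempts + 1):
        resp = call()
        if resp.status_code not in (429, 502) or attempt == max_attempts:
            return resp
        wait = float(resp.headers.get("Retry-After", default_wait * 2 ** (attempt - 1)))
        time.sleep(wait)
    return resp
```

Pair this with `X-Request-Id` so retried attempts are easy to correlate in your logs.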
## Switching from another gateway

Credit-match on your last invoice (up to $500) if you're coming from another aggregator. See per-competitor migration guides:

- OpenRouter → https://aigateway.sh/switch/openrouter
- Portkey → https://aigateway.sh/switch/portkey
- Helicone → https://aigateway.sh/switch/helicone
- LiteLLM → https://aigateway.sh/switch/litellm
- Together → https://aigateway.sh/switch/together
- Fireworks → https://aigateway.sh/switch/fireworks
- Requesty → https://aigateway.sh/switch/requesty
- Braintrust → https://aigateway.sh/switch/braintrust

## For humans

- Docs: https://aigateway.sh/docs
- API reference: https://aigateway.sh/reference
- Pricing: https://aigateway.sh/pricing
- Playground: https://aigateway.sh/playground
- Rankings (live model leaderboard): https://aigateway.sh/rankings
- Providers (every lab we route to): https://aigateway.sh/providers
- Compare any two models: https://aigateway.sh/compare
- Model catalog with per-model deep pages: https://aigateway.sh/models
- Enterprise (evals, guardrails, replay, prompt IDs, SSO, SLA): https://aigateway.sh/enterprise
- Security (posture, compliance, incident response): https://aigateway.sh/security
- Integrations (OpenAI SDK, ai-sdk, LangChain, LlamaIndex, Cursor, Claude Code, Continue, Cline): https://aigateway.sh/integrations
- Support: https://aigateway.sh/support (reply under 24h from a real engineer)