# AIgateway — API reference

> One OpenAI-compatible API to every frontier and open-weight AI model.

OpenAI-compatible endpoints for chat, embeddings, images, audio, video, translation, classification, OCR, rerank, moderation, plus aggregator-native primitives (sub-accounts, evals, replays, batches, files, signed URLs, webhooks). Catalog: 150+ models across 44 labs. Change `base_url` on the OpenAI SDK to https://api.aigateway.sh/v1 and every existing integration works.

- **Base URL**: `https://api.aigateway.sh/v1`
- **Auth**: `Authorization: Bearer sk-aig-...`
- **OpenAPI JSON**: https://aigateway.sh/openapi.json
- **Live catalog**: `GET /v1/models`  ·  150+ models
- **MCP server**: https://api.aigateway.sh/mcp

## chat
OpenAI-compatible chat completions (text, vision, tool use, streaming, reasoning)

### `POST /chat/completions`
**Create chat completion**

Drop-in replacement for OpenAI's /v1/chat/completions. Supports streaming via SSE, tool calling, vision, JSON mode, and reasoning models with normalized `reasoning_content` deltas.

_Body_ (`application/json`): see `ChatCompletionRequest` schema below

_Responses_: `200`, `400`, `401`, `402`, `404`, `429`

## embeddings
Text embeddings

### `POST /embeddings`
**Create embeddings**

_Body_ (`application/json`)
- `model` (string, required) — e.g. `baai/bge-m3`
- `input` (string, required)

_Responses_: `200`

## images
Image generation

### `POST /images/generations`
**Generate image**

_Body_ (`application/json`)
- `model` (string, required) — e.g. `black-forest-labs/flux-2-klein-9b`
- `prompt` (string, required)
- `size` (string) — e.g. `1024x1024`
- `quality` (string)
- `style` (string)
- `response_format` (string)
- `n` (integer)

_Responses_: `200`

## audio
STT, TTS, music

### `POST /audio/transcriptions`
**Transcribe audio (STT)**

_Body_ (`multipart/form-data`)
- `model` (string, required) — e.g. `deepgram/nova-3`
- `file` (string, required)
- `language` (string)

_Body_ (`application/json`)
- `model` (string, required) — e.g. `deepgram/nova-3`
- `audio_url` (string, required)
- `language` (string)

_Responses_: `200`

### `POST /audio/speech`
**Synthesize speech (TTS)**

_Body_ (`application/json`)
- `model` (string, required) — e.g. `deepgram/aura-2-en`
- `input` (string, required)
- `voice` (string)
- `response_format` (string)

_Responses_: `200`

### `POST /audio/music`
**Generate music (async)**

_Body_ (`application/json`): see `AsyncJobRequest` schema below

_Responses_: `202`

## video
Async video + 3D generation

### `POST /videos/generations`
**Generate video (async)**

Returns 202 with a job id. Poll GET /jobs/{id} or pass webhook_url.

_Body_ (`application/json`): see `AsyncJobRequest` schema below

_Responses_: `202`

### `POST /3d/generations`
**Generate 3D asset (async)**

_Body_ (`application/json`): see `AsyncJobRequest` schema below

_Responses_: `202`

## moderation
Content safety

### `POST /moderations`
**Moderate content**

_Body_ (`application/json`)
- `model` (string) — e.g. `meta/llama-guard-3-8b`
- `input` (string, required)

_Responses_: `200`

## modality-extra
Translation, classification, detection, OCR, rerank

### `POST /translations`
**Translate text**

_Body_ (`application/json`)
- `text` (string, required)
- `source_lang` (string, required) — e.g. `en`
- `target_lang` (string, required) — e.g. `fr`
- `model` (string) — e.g. `meta/m2m100-1.2b`

_Responses_: `200`

### `POST /classifications`
**Classify text**

_Body_ (`application/json`)
- `input` (string, required)
- `model` (string) — e.g. `huggingface/distilbert-sst-2`

_Responses_: `200`

### `POST /detections`
**Detect objects in image**

_Body_ (`application/json`)
- `image_url` (string)
- `image_b64` (string)
- `model` (string) — e.g. `facebook/detr-resnet-50`

_Responses_: `200`

### `POST /ocr`
**Extract text from image**

_Body_ (`application/json`)
- `image_url` (string)
- `image_b64` (string)
- `model` (string) — e.g. `microsoft/trocr-base-printed`

_Responses_: `200`

### `POST /rerank`
**Rerank documents**

_Body_ (`application/json`)
- `query` (string, required)
- `documents` (array, required)
- `model` (string) — e.g. `baai/bge-reranker-base`

_Responses_: `200`

## jobs
Async job lifecycle

### `GET /jobs/{id}`
**Get async job**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

### `DELETE /jobs/{id}`
**Cancel async job**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

## files
File upload, download, signed URLs

### `GET /files`
**List files**

_Parameters_
- `purpose` (query, string)

_Responses_: `200`

### `POST /files`
**Upload file**

_Body_ (`multipart/form-data`)
- `file` (string, required)
- `purpose` (string, required)

_Responses_: `200`

### `GET /files/{id}`
**Get file metadata**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

### `DELETE /files/{id}`
**Delete file**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

### `GET /files/{id}/content`
**Download file content**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

### `GET /files/jobs/{jobId}/{filename}/signed`
**Mint signed URL for a job result**

Returns a publicly fetchable URL on media.aigateway.sh, valid until expires_at. Max expiry 7 days.

_Parameters_
- `jobId` (path, string, required)
- `filename` (path, string, required)
- `expires_in` (query, integer)

_Responses_: `200`

## batches
OpenAI-style batch API at 50% off

### `GET /batches`
**List batches**

_Parameters_
- `limit` (query, integer)

_Responses_: `200`

### `POST /batches`
**Create batch**

Submits a JSONL file (uploaded via /files with purpose=batch) for batched inference at 50% off. SLA 24h.

_Body_ (`application/json`)
- `input_file_id` (string, required)
- `endpoint` (string, required) — e.g. `/v1/chat/completions`

_Responses_: `200`

### `GET /batches/{id}`
**Get batch**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

### `POST /batches/{id}/cancel`
**Cancel batch**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

## sub-accounts
Per-customer scoped keys with spend caps

### `GET /sub-accounts`
**List sub-accounts**

_Responses_: `200`

### `POST /sub-accounts`
**Create sub-account**

Mints a scoped key for one of your end customers. Spend cap, RPM, default tag, isolated analytics.

_Body_ (`application/json`)
- `name` (string, required)
- `external_ref` (string)
- `spend_cap_cents` (integer, required)
- `rate_limit_rpm` (integer)
- `default_tag` (string)

_Responses_: `200`

### `GET /sub-accounts/{id}`
**Get sub-account**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

### `PATCH /sub-accounts/{id}`
**Update sub-account**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

### `DELETE /sub-accounts/{id}`
**Delete sub-account**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

### `GET /sub-accounts/{id}/usage`
**Per-customer usage**

_Parameters_
- `id` (path, string, required)
- `month` (query, string) — e.g. `2026-04`

_Responses_: `200`

## evals
Eval-driven model routing

### `GET /evals`
**List eval runs**

_Responses_: `200`

### `POST /evals`
**Create eval run**

Run an eval across candidate models on your dataset. Returns an id you can use as `model: 'eval:<id>'` to always route to the current winner.

_Body_ (`application/json`)
- `name` (string, required)
- `candidate_models` (array, required)
- `dataset` (array, required)
- `metric` (string, required)

_Responses_: `200`

### `GET /evals/{id}`
**Get eval**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

## replays
Replay any past request on a new model

### `GET /replays`
**List replays**

_Responses_: `200`

### `POST /replays`
**Replay a past request on a new model**

_Body_ (`application/json`)
- `source_request_id` (string, required)
- `target_model` (string, required)
- `shadow` (boolean)

_Responses_: `200`

### `GET /replays/{id}`
**Get replay**

_Parameters_
- `id` (path, string, required)

_Responses_: `200`

## usage
Per-tag and per-customer cost attribution

### `GET /usage/by-tag`
**Usage by tag**

_Parameters_
- `month` (query, string) — e.g. `2026-04`

_Responses_: `200`

### `GET /usage/by-sub-account`
**Usage by sub-account**

_Parameters_
- `month` (query, string) — e.g. `2026-04`

_Responses_: `200`

### `POST /budgets`
**Set monthly budget for a tag**

_Body_ (`application/json`)
- `tag` (string, required)
- `cap_cents` (integer, required)

_Responses_: `200`

## webhooks
Signed callbacks for jobs + lifecycle events

### `GET /webhook-secret`
**Get signing secret**

_Responses_: `200`

### `POST /webhook-secret/rotate`
**Rotate signing secret**

_Responses_: `200`

## account
Balance + cache management

### `GET /balance`
**Wallet balance**

_Responses_: `200`

### `DELETE /cache`
**Purge cache**

Drops every entry in the exact-match KV cache and the semantic Vectorize cache for the authenticated account.

_Responses_: `200`

## models
Catalog discovery

### `GET /models`
**List models**

_Parameters_
- `modality` (query, string) — enum: `text|image|audio-stt|audio-tts|video|embedding|rerank|moderation`
- `provider` (query, string) — e.g. `anthropic`

_Responses_: `200`

### `GET /models/{id}`
**Get model detail**

_Parameters_
- `id` (path, string, required) — e.g. `anthropic/claude-opus-4.7`

_Responses_: `200`

## health
Provider health

### `GET /health/providers`
**Provider health**

_Responses_: `200`

## webhooks
Signed callbacks for async job results AND lifecycle events. Verify `x-aig-signature` (HMAC-SHA256 over `${t}.${raw_body}`) using the secret from `GET /v1/webhook-secret`. Failed deliveries retry on a 6-attempt schedule: 0s, 30s, 2m, 10m, 1h, 6h.

Headers on every delivery:
- `x-aig-signature` — `t=<unix>,v1=<hex>`
- `x-aig-event-type` — event name (see WebhookEvent schema)
- `x-aig-delivery-id` — stable across retries
- `x-aig-attempt` — 1-indexed attempt counter

## schemas
### ChatCompletionRequest
- `model` (string, required)
- `messages` (array, required)
- `stream` (boolean)
- `temperature` (number)
- `max_tokens` (integer)
- `tools` (array)
- `tool_choice` (string)
- `response_format` (object)
- `webhook_url` (string)

### AsyncJobRequest
- `prompt` (string, required)
- `model` (string)
- `duration` (number)
- `aspect_ratio` (string)
- `resolution` (string)
- `image_url` (string)
- `webhook_url` (string)

### JobAccepted
- `id` (string, required)
- `status` (string, required) — enum: `queued`
- `object` (string, required) — enum: `job`

### Job
- `id` (string, required)
- `status` (string, required) — enum: `queued|processing|completed|failed`
- `modality` (string)
- `model` (string)
- `created_at` (integer)
- `updated_at` (integer)
- `error` (object)
- `result_file_id` (string)
- `result_url` (string)
- `webhook_url` (string)
- `webhook_delivered` (boolean)
- `attempts` (integer)

### File
- `id` (string)
- `object` (string) — enum: `file`
- `bytes` (integer)
- `created_at` (integer)
- `filename` (string)
- `purpose` (string)

### Batch
- `id` (string)
- `object` (string) — enum: `batch`
- `endpoint` (string)
- `input_file_id` (string)
- `completion_window` (string)
- `status` (string) — enum: `validating|queued|in_progress|finalizing|completed|failed|expired`
- `output_file_id` (string)
- `error_file_id` (string)
- `request_counts` (object)
- `created_at` (integer)
- `expires_at` (integer)
- `completed_at` (integer)

### WebhookEvent
- `id` (string, required)
- `type` (string, required) — enum: `job.completed|job.failed|balance.low|balance.exhausted|usage.threshold.exceeded|usage.daily.summary|subaccount.created|subaccount.spend.exceeded|model.added|model.deprecated|key.rotated`
- `occurred_at` (integer, required)
- `data` (object, required)

---

Pipe this URL into Claude Code, Cursor, or any agent that consumes Markdown context.

- llms.txt (capability map): https://aigateway.sh/llms.txt
- llms-full.txt (every model with prices): https://aigateway.sh/llms-full.txt
- agents.md (integration playbook): https://aigateway.sh/agents.md