API reference


One OpenAI-compatible base URL. Every modality, every primitive, one API key. The main endpoints are drop-in compatible with the OpenAI SDK; the aggregator primitives (sub-accounts, replay, evals, cost tags) live under /v1/ too.

Base URL: https://api.aigateway.sh/v1
SDK alternative

Most endpoints documented below are also wrapped in our typed SDKs. Useful when you want async-job helpers, sub-account convenience methods, or webhook signature verification without writing the boilerplate yourself.

pip install aigateway-py · pnpm add aigateway-js · npm i -g aigateway-cli
Python · Node · CLI · MCP server

Authentication

All requests use a Bearer token. Create keys in the dashboard.

bash
curl https://api.aigateway.sh/v1/models \
  -H "Authorization: Bearer sk-aig-..."

Errors

Errors follow the OpenAI shape.

json
{
  "error": {
    "message": "Monthly spend cap of $500.00 reached (spent $503.12)",
    "type": "budget_exceeded",
    "code": 402
  }
}
Code  Type                   When
400   invalid_request_error  Malformed body / missing field
401   authentication_error   Missing or invalid API key
402   budget_exceeded        Sub-account spend cap hit
404   model_not_found        Unknown model ID
429   rate_limit_error       RPM limit hit; honor Retry-After
502   provider_error         Upstream model returned a 5xx
504   timeout_error          Upstream model timed out
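
The 429 row is the one worth automating around. A minimal retry-delay helper (ours, not part of any SDK) that honors Retry-After and otherwise falls back to capped exponential backoff might look like:

```python
import random

# Pick how long to sleep before retrying a 429: honor the gateway's
# Retry-After header when present, otherwise use capped exponential
# backoff with jitter. This helper is a sketch, not SDK code.
def backoff_seconds(headers: dict, attempt: int, cap: float = 30.0) -> float:
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)
    return min(cap, (2 ** attempt) + random.random())

print(backoff_seconds({"Retry-After": "5"}, 0))  # 5.0
```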

Request headers

Authorization: Bearer sk-aig-...
  Required on every request.
X-Request-Id
  Optional idempotency-friendly correlation id; echoed back in the response.
x-aig-tag: <string>
  Attribute this request to a tag (feature / tenant / user). Shows up in usage analytics and powers per-tag budgets.
x-cache: auto | force | skip
  Cache behavior override.
x-routing: cost | speed | quality | auto
  Bias auto-routing when model is omitted.
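
With the OpenAI SDKs these can be passed per request via extra_headers. A small sketch that assembles them; the header names come from the table above, but the helper itself is hypothetical:

```python
import uuid

# Hypothetical helper: assemble the optional gateway headers for one
# request. Header names are the real ones documented above.
def gateway_headers(tag=None, cache=None, routing=None, request_id=None):
    h = {"X-Request-Id": request_id or str(uuid.uuid4())}
    if tag:
        h["x-aig-tag"] = tag
    if cache:
        h["x-cache"] = cache       # auto | force | skip
    if routing:
        h["x-routing"] = routing   # cost | speed | quality | auto
    return h

print(gateway_headers(tag="summarize", cache="skip"))
```

Pass the result as `extra_headers=` to any `client.chat.completions.create(...)` call.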

Chat completions

POST /v1/chat/completions

OpenAI-compatible. Use any model from the catalog.

python
from openai import OpenAI
client = OpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...")

r = client.chat.completions.create(
    model="anthropic/claude-opus-4.7",
    messages=[{"role": "user", "content": "Explain mixture-of-experts in 3 lines."}],
)
print(r.choices[0].message.content)
bash
curl https://api.aigateway.sh/v1/chat/completions \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [{"role": "user", "content": "hello"}]
  }'

Streaming

Add stream: true. The response is a server-sent event (SSE) stream of OpenAI-format data: chunks.

javascript
import OpenAI from "openai";
const client = new OpenAI({ baseURL: "https://api.aigateway.sh/v1", apiKey: process.env.AIG_KEY });

const stream = await client.chat.completions.create({
  model: "moonshot/kimi-k2.6",
  messages: [{ role: "user", content: "Stream a haiku about caching." }],
  stream: true,
});
for await (const part of stream) process.stdout.write(part.choices[0]?.delta?.content ?? "");

Tool calling

Pass tools exactly as you would with OpenAI. Tool schemas are normalized for every supported provider.

json
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{"role": "user", "content": "weather in Tokyo?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up live weather for a city.",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}
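
When the model responds with tool_calls, parse the JSON-encoded arguments, run your function, and append the result as a tool-role message before the follow-up request. A minimal dispatcher sketch (get_weather here is a local stand-in, not a real weather lookup):

```python
import json

def get_weather(city: str) -> str:
    # Stand-in implementation; a real handler would call a weather API.
    return f"22°C and clear in {city}"

TOOLS = {"get_weather": get_weather}

# Dispatch one tool_call (OpenAI response shape: id, function.name,
# function.arguments as a JSON string) and build the tool message.
def run_tool_call(tool_call) -> dict:
    fn = TOOLS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": fn(**args),
    }

msg = run_tool_call({
    "id": "call_1",
    "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
})
print(msg["content"])  # 22°C and clear in Tokyo
```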

Embeddings

POST /v1/embeddings
python
r = client.embeddings.create(
    model="baai/bge-m3",
    input=["cache hit", "cache miss", "stale read"],
)
vectors = [e.embedding for e in r.data]
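
The returned vectors can be compared with cosine similarity; a dependency-free sketch:

```python
import math

# Cosine similarity between two embedding vectors of equal length.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0
```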

Image generation

POST /v1/images/generations
bash
curl https://api.aigateway.sh/v1/images/generations \
  -H "Authorization: Bearer sk-aig-..." \
  -d '{
    "model": "black-forest-labs/flux-2-klein-9b",
    "prompt": "a cozy reading corner, golden hour, 35mm film",
    "size": "1024x1024",
    "n": 1
  }'

Audio transcriptions (STT)

POST /v1/audio/transcriptions
python
with open("call.mp3", "rb") as f:
    r = client.audio.transcriptions.create(
        model="deepgram/nova-3",
        file=f,
    )
print(r.text)

Text-to-speech (TTS)

POST /v1/audio/speech
bash
curl https://api.aigateway.sh/v1/audio/speech \
  -H "Authorization: Bearer sk-aig-..." \
  -d '{
    "model": "deepgram/aura-2-en",
    "input": "Hello from the aggregator.",
    "voice": "aura-2-en-angus",
    "response_format": "mp3"
  }' --output speech.mp3

Moderations

POST /v1/moderations
bash
curl https://api.aigateway.sh/v1/moderations \
  -H "Authorization: Bearer sk-aig-..." \
  -d '{"model":"meta/llama-guard-3-8b","input":"some text"}'

Sub-accounts

Mint a scoped key for one of your customers. Each has its own spend cap, rate limit, default tag, and isolated analytics.

POST /v1/sub-accounts
bash
curl -X POST https://api.aigateway.sh/v1/sub-accounts \
  -H "Authorization: Bearer sk-aig-..." \
  -d '{
    "name": "acme-corp",
    "external_ref": "acme-123",
    "spend_cap_cents": 50000,
    "rate_limit_rpm": 300,
    "default_tag": "acme"
  }'

# => { "id": "sa_9f...", "key": "sk-aig-...", "spend_cap_cents": 50000, ... }
GET /v1/sub-accounts
GET /v1/sub-accounts/:id
DELETE /v1/sub-accounts/:id
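
Since the API takes spend caps in cents, a small convenience (ours, not an SDK method) for building the create body from a per-customer dollar budget:

```python
# Hypothetical convenience: build the create-sub-account body from a
# per-customer dollar budget (the API takes spend caps in cents).
def sub_account_payload(name, monthly_budget_usd, rpm=300, tag=None):
    body = {
        "name": name,
        "spend_cap_cents": int(round(monthly_budget_usd * 100)),
        "rate_limit_rpm": rpm,
    }
    if tag:
        body["default_tag"] = tag
    return body

print(sub_account_payload("acme-corp", 500.0, tag="acme"))
```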

Cost tags + budgets

Set x-aig-tag on any request. Tags surface in attribution reports; sub-account spend caps are enforced server-side before dispatch.

bash
curl https://api.aigateway.sh/v1/chat/completions \
  -H "Authorization: Bearer sk-aig-..." \
  -H "x-aig-tag: summarize" \
  -d '{"model":"moonshot/kimi-k2.6","messages":[...]}'

Replay + shadow A/B

Re-run any past request against a different model and see cost, latency, and output diffs.

POST /v1/replays
bash
curl -X POST https://api.aigateway.sh/v1/replays \
  -H "Authorization: Bearer sk-aig-..." \
  -d '{
    "source_request_id": "req_abc123",
    "target_model": "anthropic/claude-opus-4.7",
    "shadow": false
  }'

# =>
# { "source_output": "...",
#   "target_output": "...",
#   "cost_source_cents": 1.2,
#   "cost_target_cents": 4.7,
#   "latency_source_ms": 810,
#   "latency_target_ms": 2240,
#   "score_delta": 0.82 }
GET /v1/replays
GET /v1/replays/:id
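
One way to act on a replay result: a toy decision rule (thresholds are ours) over the response shape shown above, switching only when the quality gain outweighs the cost increase:

```python
# Toy decision rule: switch to the target model only when the score
# improvement clears a floor and the cost ratio stays within budget.
# Both thresholds are illustrative, not gateway defaults.
def should_switch(replay: dict, min_score_delta=0.1, max_cost_ratio=5.0) -> bool:
    cost_ratio = replay["cost_target_cents"] / replay["cost_source_cents"]
    return replay["score_delta"] >= min_score_delta and cost_ratio <= max_cost_ratio

replay = {"cost_source_cents": 1.2, "cost_target_cents": 4.7, "score_delta": 0.82}
print(should_switch(replay))  # True
```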

Eval-driven routing

Upload a dataset of inputs (and optional expected outputs) plus candidate models. Get a winner, then use eval:<run_id> as a model alias from then on — rerun whenever a new frontier model lands.

POST /v1/evals
json
{
  "name": "prod-summarize",
  "metric": "quality",
  "candidate_models": [
    "anthropic/claude-opus-4.7",
    "openai/gpt-5.4",
    "moonshot/kimi-k2.6"
  ],
  "dataset": [
    { "input": "Summarize: ...", "expected": "..." },
    { "input": "Summarize: ...", "expected": "..." }
  ]
}

Then call chat with the alias:

python
r = client.chat.completions.create(
    model="eval:ev_7h3k...",            # alias of the winning model
    messages=[{"role": "user", "content": "..."}],
)
GET /v1/evals
GET /v1/evals/:id

Usage + attribution

GET /v1/usage/by-tag?month=2026-04
json
{
  "month": "2026-04",
  "data": [
    { "tag": "summarize", "requests": 12012, "cost_cents": 4210, "units": 17482000 },
    { "tag": "chat",      "requests":  8391, "cost_cents": 9830, "units": 19740000 }
  ]
}
GET /v1/usage/by-sub-account?month=2026-04
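
A quick rollup of the by-tag response into dollars (cost_cents is in cents):

```python
# Sum the per-tag cost_cents fields from the by-tag usage response
# (shape shown above) and convert to dollars.
def total_cost_usd(report: dict) -> float:
    return sum(row["cost_cents"] for row in report["data"]) / 100

report = {
    "month": "2026-04",
    "data": [
        {"tag": "summarize", "requests": 12012, "cost_cents": 4210, "units": 17482000},
        {"tag": "chat", "requests": 8391, "cost_cents": 9830, "units": 19740000},
    ],
}
print(total_cost_usd(report))  # 140.4
```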

List models

GET /v1/models

Query with ?modality=text or ?provider=anthropic.

Get model detail

GET /v1/models/:id

Single-model lookup. Returns pricing, context window, modality, capability flags. Use the same provider/slug form as the chat model field.

Provider health

GET /v1/health/providers

p50, p95 and error rate per upstream. The router uses these internally.

Wallet balance

GET /v1/balance

Returns { cents, usd } for the authenticated key. Useful for in-app low-balance prompts before a call lands a 402.
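
A minimal low-balance guard sketch using that response shape (the threshold is ours):

```python
# Warn before a request lands a 402: true when the wallet balance from
# GET /v1/balance is under the given dollar threshold.
def low_balance(balance: dict, threshold_usd: float = 5.0) -> bool:
    return balance["cents"] < threshold_usd * 100

print(low_balance({"cents": 312, "usd": 3.12}))  # True
```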

Cache management

DELETE /v1/cache

Purges every entry in the exact-match KV cache and the semantic Vectorize cache for your account. Useful after a deploy that changes prompt templates.

Webhook secret

GET /v1/webhook-secret
POST /v1/webhook-secret/rotate

Returns the per-key signing secret used to verify x-aig-signature on inbound callbacks. Both SDKs ship a constant-time verifier; rotation invalidates the previous secret immediately, so coordinate with your endpoint before calling rotate.
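
If you verify by hand instead of using the SDK helper, a sketch assuming x-aig-signature carries an HMAC-SHA256 hex digest of the raw request body (confirm the actual scheme against the SDK's verifier):

```python
import hashlib
import hmac

# Constant-time signature check. Assumes the signature is an
# HMAC-SHA256 hex digest of the raw body keyed by the signing secret.
def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

secret = "whsec_example"
body = b'{"event": "job.completed"}'
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(secret, body, sig))  # True
```

Always compare with hmac.compare_digest rather than ==, so timing differences don't leak the secret.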

Not supported (yet)

The router accepts the paths below but returns a deliberate non-200. Don't probe — these are tracked, and you'll get the expected enable-date in the response when they ship.

All endpoints

Auto-generated from /openapi.json. The curated sections above include examples and prose; this list is the exhaustive index: if a path appears here, the gateway honors it.

chat · OpenAI-compatible chat completions (text, vision, tool use, streaming, reasoning)
  • POST /v1/chat/completions · Create chat completion
embeddings · Text embeddings
  • POST /v1/embeddings · Create embeddings
images · Image generation
  • POST /v1/images/generations · Generate image
audio · STT, TTS, music
  • POST /v1/audio/transcriptions · Transcribe audio (STT)
  • POST /v1/audio/speech · Synthesize speech (TTS)
  • POST /v1/audio/music · Generate music (async)
video · Async video + 3D generation
  • POST /v1/videos/generations · Generate video (async)
  • POST /v1/3d/generations · Generate 3D asset (async)
moderation · Content safety
  • POST /v1/moderations · Moderate content
modality-extra · Translation, classification, detection, OCR, rerank
  • POST /v1/translations · Translate text
  • POST /v1/classifications · Classify text
  • POST /v1/detections · Detect objects in image
  • POST /v1/ocr · Extract text from image
  • POST /v1/rerank · Rerank documents
jobs · Async job lifecycle
  • GET /v1/jobs/{id} · Get async job
  • DELETE /v1/jobs/{id} · Cancel async job
files · File upload, download, signed URLs
  • GET /v1/files · List files
  • POST /v1/files · Upload file
  • GET /v1/files/{id} · Get file metadata
  • DELETE /v1/files/{id} · Delete file
  • GET /v1/files/{id}/content · Download file content
  • GET /v1/files/jobs/{jobId}/(unknown)/signed · Mint signed URL for a job result
batches · OpenAI-style batch API at 50% off
  • GET /v1/batches · List batches
  • POST /v1/batches · Create batch
  • GET /v1/batches/{id} · Get batch
  • POST /v1/batches/{id}/cancel · Cancel batch
sub-accounts · Per-customer scoped keys with spend caps
  • GET /v1/sub-accounts · List sub-accounts
  • POST /v1/sub-accounts · Create sub-account
  • GET /v1/sub-accounts/{id} · Get sub-account
  • PATCH /v1/sub-accounts/{id} · Update sub-account
  • DELETE /v1/sub-accounts/{id} · Delete sub-account
  • GET /v1/sub-accounts/{id}/usage · Per-customer usage
evals · Eval-driven model routing
  • GET /v1/evals · List eval runs
  • POST /v1/evals · Create eval run
  • GET /v1/evals/{id} · Get eval
replays · Replay any past request on a new model
  • GET /v1/replays · List replays
  • POST /v1/replays · Replay a past request on a new model
  • GET /v1/replays/{id} · Get replay
usage · Per-tag and per-customer cost attribution
  • GET /v1/usage/by-tag · Usage by tag
  • GET /v1/usage/by-sub-account · Usage by sub-account
  • POST /v1/budgets · Set monthly budget for a tag
webhooks · Signed callbacks for jobs + lifecycle events
  • GET /v1/webhook-secret · Get signing secret
  • POST /v1/webhook-secret/rotate · Rotate signing secret
account · Balance + cache management
  • GET /v1/balance · Wallet balance
  • DELETE /v1/cache · Purge cache
models · Catalog discovery
  • GET /v1/models · List models
  • GET /v1/models/{id} · Get model detail
health · Provider health
  • GET /v1/health/providers · Provider health

Need help? Contact support — we'll reply in under 24h.