API reference
One OpenAI-compatible base URL. Every modality, every primitive, one API key. The main endpoints are drop-in with the OpenAI SDK; the aggregator primitives (sub-accounts, replay, evals, cost tags) live under /v1/ too.
Most endpoints documented below are also wrapped in our typed SDKs. Useful when you want async-job helpers, sub-account convenience methods, or webhook signature verification without writing the boilerplate yourself.
pip install aigateway-py · pnpm add aigateway-js · npm i -g aigateway-cli

Authentication
All requests use a Bearer token. Create keys in the dashboard.
curl https://api.aigateway.sh/v1/models \
  -H "Authorization: Bearer sk-aig-..."
Errors
Errors follow the OpenAI shape.
{
"error": {
"message": "Monthly spend cap of $500.00 reached (spent $503.12)",
"type": "budget_exceeded",
"code": 402
}
}

| Code | Type | When |
|---|---|---|
| 400 | invalid_request_error | Malformed body / missing field |
| 401 | authentication_error | Missing or invalid API key |
| 402 | budget_exceeded | Sub-account spend cap hit |
| 404 | model_not_found | Unknown model ID |
| 429 | rate_limit_error | RPM limit hit; honor Retry-After |
| 502 | provider_error | Upstream model returned a 5xx |
| 504 | timeout_error | Upstream model timed out |
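A minimal retry sketch for 429s that honors the Retry-After hint. The function names and the `(status, retry_after, body)` shape are illustrative, not part of the SDK:

```python
import time

def with_retries(send, max_attempts=4):
    """Call `send()` (any function returning (status, retry_after_s, body));
    retry on 429, honoring the server's Retry-After hint when present."""
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status != 429:
            return status, body
        # Sleep for the hinted interval, falling back to exponential backoff.
        time.sleep(retry_after if retry_after is not None else 2 ** attempt)
    return status, body
```

502/504 responses are also reasonable retry candidates, since they signal upstream rather than client problems.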
Request headers
| Header | Purpose |
|---|---|
| Authorization: Bearer sk-aig-... | Required on every request |
| X-Request-Id | Optional idempotency-friendly correlation id; echoed back in the response |
| x-aig-tag: <string> | Attribute this request to a tag (feature / tenant / user). Shows up in usage analytics and powers per-tag budgets. |
| x-cache: auto \| force \| skip | Cache behavior override |
| x-routing: cost \| speed \| quality \| auto | Bias auto-routing when model is omitted |
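With the OpenAI Python SDK these can be attached per call via its `extra_headers` parameter; a sketch, assuming a `client` constructed as in the chat example below:

```python
# Per-request gateway headers; the OpenAI Python SDK forwards any
# `extra_headers` dict verbatim on a single call.
gateway_headers = {
    "x-aig-tag": "summarize",  # attribute spend to the "summarize" feature
    "x-routing": "cost",       # bias auto-routing toward cheaper models
    "x-cache": "skip",         # bypass the cache for this call
}

# r = client.chat.completions.create(
#     model="moonshot/kimi-k2.6",
#     messages=[{"role": "user", "content": "..."}],
#     extra_headers=gateway_headers,
# )
```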
Chat completions
OpenAI-compatible. Use any model from the catalog.
from openai import OpenAI
client = OpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...")
r = client.chat.completions.create(
model="anthropic/claude-opus-4.7",
messages=[{"role": "user", "content": "Explain mixture-of-experts in 3 lines."}],
)
print(r.choices[0].message.content)

curl https://api.aigateway.sh/v1/chat/completions \
-H "Authorization: Bearer sk-aig-..." \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-5.4",
"messages": [{"role": "user", "content": "hello"}]
}'

Streaming
Add stream: true. SSE frames are OpenAI-format data: chunks.
import OpenAI from "openai";
const client = new OpenAI({ baseURL: "https://api.aigateway.sh/v1", apiKey: process.env.AIG_KEY });
const stream = await client.chat.completions.create({
model: "moonshot/kimi-k2.6",
messages: [{ role: "user", content: "Stream a haiku about caching." }],
stream: true,
});
for await (const part of stream) process.stdout.write(part.choices[0]?.delta?.content ?? "");

Tool calling
Pass tools exactly as you would with the OpenAI API. Tool schemas are normalized for every supported provider.
{
"model": "anthropic/claude-sonnet-4.6",
"messages": [{"role": "user", "content": "weather in Tokyo?"}],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Look up live weather for a city.",
"parameters": {
"type": "object",
"properties": { "city": { "type": "string" } },
"required": ["city"]
}
}
}]
}

Embeddings
r = client.embeddings.create(
model="baai/bge-m3",
input=["cache hit", "cache miss", "stale read"],
)
vectors = [e.embedding for e in r.data]

Image generation
curl https://api.aigateway.sh/v1/images/generations \
-H "Authorization: Bearer sk-aig-..." \
-d '{
"model": "black-forest-labs/flux-2-klein-9b",
"prompt": "a cozy reading corner, golden hour, 35mm film",
"size": "1024x1024",
"n": 1
}'

Audio transcriptions (STT)
with open("call.mp3", "rb") as f:
r = client.audio.transcriptions.create(
model="deepgram/nova-3",
file=f,
)
print(r.text)

Text-to-speech (TTS)
curl https://api.aigateway.sh/v1/audio/speech \
-H "Authorization: Bearer sk-aig-..." \
-d '{
"model": "deepgram/aura-2-en",
"input": "Hello from the aggregator.",
"voice": "aura-2-en-angus",
"response_format": "mp3"
}' --output speech.mp3

Moderations
curl https://api.aigateway.sh/v1/moderations \
-H "Authorization: Bearer sk-aig-..." \
-d '{"model":"meta/llama-guard-3-8b","input":"some text"}'

Sub-accounts
Mint a scoped key for one of your customers. Each has its own spend cap, rate limit, default tag, and isolated analytics.
curl -X POST https://api.aigateway.sh/v1/sub-accounts \
-H "Authorization: Bearer sk-aig-..." \
-d '{
"name": "acme-corp",
"external_ref": "acme-123",
"spend_cap_cents": 50000,
"rate_limit_rpm": 300,
"default_tag": "acme"
}'
# => { "id": "sa_9f...", "key": "sk-aig-...", "spend_cap_cents": 50000, ... }

Cost tags + budgets
Set x-aig-tag on any request. Tags surface in attribution reports; sub-account spend caps are enforced server-side before dispatch.
curl https://api.aigateway.sh/v1/chat/completions \
-H "Authorization: Bearer sk-aig-..." \
-H "x-aig-tag: summarize" \
-d '{"model":"moonshot/kimi-k2.6","messages":[...]}'

Replay + shadow A/B
Re-run any past request against a different model and see cost, latency, and output diffs.
curl -X POST https://api.aigateway.sh/v1/replays \
-H "Authorization: Bearer sk-aig-..." \
-d '{
"source_request_id": "req_abc123",
"target_model": "anthropic/claude-opus-4.7",
"shadow": false
}'
# =>
# { "source_output": "...",
# "target_output": "...",
# "cost_source_cents": 1.2,
# "cost_target_cents": 4.7,
# "latency_source_ms": 810,
# "latency_target_ms": 2240,
#   "score_delta": 0.82 }

Eval-driven routing
Upload a dataset of inputs (and optional expected outputs) plus candidate models. The run picks a winner; use eval:<run_id> as a model alias from then on, and rerun whenever a new frontier model lands.
{
"name": "prod-summarize",
"metric": "quality",
"candidate_models": [
"anthropic/claude-opus-4.7",
"openai/gpt-5.4",
"moonshot/kimi-k2.6"
],
"dataset": [
{ "input": "Summarize: ...", "expected": "..." },
{ "input": "Summarize: ...", "expected": "..." }
]
}

Then call chat with the alias:
r = client.chat.completions.create(
model="eval:ev_7h3k...", # alias of the winning model
messages=[{"role": "user", "content": "..."}],
)

Usage + attribution
Query /v1/usage/by-tag (or /v1/usage/by-sub-account) for attribution reports. A monthly by-tag report:
{
"month": "2026-04",
"data": [
{ "tag": "summarize", "requests": 12012, "cost_cents": 4210, "units": 17482000 },
{ "tag": "chat", "requests": 8391, "cost_cents": 9830, "units": 19740000 }
]
}

List models
GET /v1/models. Filter with ?modality=text or ?provider=anthropic.
Get model detail
GET /v1/models/{id} returns pricing, context window, modality, and capability flags for a single model. Use the same provider/slug form as the chat model field.
Provider health
GET /v1/health/providers returns p50 and p95 latency plus error rate per upstream. The router uses these signals internally.
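The same idea can be applied client-side when picking a fallback upstream. The response shape below (`p95_ms`, `error_rate` per provider) is an assumption for illustration, not the documented schema:

```python
def pick_provider(health, max_error_rate=0.05):
    """Pick the healthy upstream with the lowest p95 latency.
    `health` maps provider name -> {"p95_ms": ..., "error_rate": ...}
    (an assumed shape for illustration)."""
    healthy = {name: h for name, h in health.items()
               if h["error_rate"] <= max_error_rate}
    if not healthy:
        return None  # no upstream within the error budget
    return min(healthy, key=lambda name: healthy[name]["p95_ms"])
```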
Wallet balance
GET /v1/balance returns { cents, usd } for the authenticated key. Useful for in-app low-balance prompts before a call lands a 402.
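A minimal sketch of that low-balance check; the $5 threshold is app-specific, not a gateway default:

```python
LOW_BALANCE_CENTS = 500  # warn below $5 (threshold is app-specific)

def low_balance(balance, threshold_cents=LOW_BALANCE_CENTS):
    """Given a /v1/balance payload like {"cents": 1234, "usd": 12.34},
    decide whether to surface a top-up prompt."""
    return balance["cents"] < threshold_cents
```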
Cache management
DELETE /v1/cache purges every entry in the exact-match KV cache and the semantic Vectorize cache for your account. Useful after a deploy that changes prompt templates.
Webhook secret
GET /v1/webhook-secret returns the per-key signing secret used to verify x-aig-signature on inbound callbacks. Both SDKs ship a constant-time verifier. Rotation invalidates the previous secret immediately, so coordinate with your endpoint before calling rotate.
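Prefer the SDK verifiers. If you need one by hand, here is a sketch assuming the common scheme of a hex-encoded HMAC-SHA256 over the raw request body; check it against your SDK's implementation before relying on it:

```python
import hmac
import hashlib

def verify_signature(secret: str, raw_body: bytes, signature: str) -> bool:
    """Constant-time check of an x-aig-signature header.
    Assumes hex-encoded HMAC-SHA256 of the raw body; the exact scheme
    is defined by the SDK verifiers, so treat this as a sketch."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Always verify against the raw bytes of the request body, before any JSON parsing or re-serialization.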
Not supported (yet)
The router accepts the paths below but returns a deliberate non-200. There's no need to probe: these are tracked, and the response will include the expected enable date when they ship.
- POST /v1/completions → 400 · legacy completions; use /v1/chat/completions instead
- GET /v1/realtime → 501 · WebSocket realtime not yet wired
- POST /v1/images/edits → 501 · image editing API not yet wired
- Assistants / Threads / Vector stores / Fine-tuning · intentionally not implemented; aggregator scope
All endpoints
Auto-generated from /openapi.json. The curated sections above include examples and prose; this list is the exhaustive index: if it's here, the gateway honours it.
- POST /v1/chat/completions · Create chat completion
- POST /v1/embeddings · Create embeddings
- POST /v1/images/generations · Generate image
- POST /v1/audio/transcriptions · Transcribe audio (STT)
- POST /v1/audio/speech · Synthesize speech (TTS)
- POST /v1/audio/music · Generate music (async)
- POST /v1/videos/generations · Generate video (async)
- POST /v1/3d/generations · Generate 3D asset (async)
- POST /v1/moderations · Moderate content
- POST /v1/translations · Translate text
- POST /v1/classifications · Classify text
- POST /v1/detections · Detect objects in image
- POST /v1/ocr · Extract text from image
- POST /v1/rerank · Rerank documents
- GET /v1/jobs/{id} · Get async job
- DELETE /v1/jobs/{id} · Cancel async job
- GET /v1/files · List files
- POST /v1/files · Upload file
- GET /v1/files/{id} · Get file metadata
- DELETE /v1/files/{id} · Delete file
- GET /v1/files/{id}/content · Download file content
- GET /v1/files/jobs/{jobId}/(unknown)/signed · Mint signed URL for a job result
- GET /v1/batches · List batches
- POST /v1/batches · Create batch
- GET /v1/batches/{id} · Get batch
- POST /v1/batches/{id}/cancel · Cancel batch
- GET /v1/sub-accounts · List sub-accounts
- POST /v1/sub-accounts · Create sub-account
- GET /v1/sub-accounts/{id} · Get sub-account
- PATCH /v1/sub-accounts/{id} · Update sub-account
- DELETE /v1/sub-accounts/{id} · Delete sub-account
- GET /v1/sub-accounts/{id}/usage · Per-customer usage
- GET /v1/evals · List eval runs
- POST /v1/evals · Create eval run
- GET /v1/evals/{id} · Get eval
- GET /v1/replays · List replays
- POST /v1/replays · Replay a past request on a new model
- GET /v1/replays/{id} · Get replay
- GET /v1/usage/by-tag · Usage by tag
- GET /v1/usage/by-sub-account · Usage by sub-account
- POST /v1/budgets · Set monthly budget for a tag
- GET /v1/webhook-secret · Get signing secret
- POST /v1/webhook-secret/rotate · Rotate signing secret
- GET /v1/balance · Wallet balance
- DELETE /v1/cache · Purge cache
- GET /v1/models · List models
- GET /v1/models/{id} · Get model detail
- GET /v1/health/providers · Provider health
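For the async endpoints (music, video, 3D), results come back through GET /v1/jobs/{id}. A polling sketch with the fetch function injected so the loop stays transport-agnostic; the "queued"/"running" status names are assumptions for illustration:

```python
import time

def wait_for_job(fetch_job, poll_interval_s=2.0, timeout_s=600):
    """Poll an async job until it leaves a pending state.
    `fetch_job()` should GET /v1/jobs/{id} and return the parsed JSON;
    the status values here are assumed for illustration."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = fetch_job()
        if job["status"] not in ("queued", "running"):
            return job
        time.sleep(poll_interval_s)
    raise TimeoutError("job did not finish in time")
```

In production prefer webhooks over tight polling, and verify callbacks with the signing secret described above.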
Need help? Contact support — we'll reply in under 24h.