API reference
One OpenAI-compatible base URL. Every modality, every primitive, one API key. The main endpoints are drop-in compatible with the OpenAI SDK; the aggregator primitives (sub-accounts, replay, evals, cost tags) live under /v1/ too.
Most endpoints documented below are also wrapped in our typed SDKs, which are useful when you want async-job helpers, sub-account convenience methods, or webhook signature verification without writing the boilerplate yourself.
pip install aigateway-py · pnpm add aigateway-js · npm i -g aigateway-cli
Authentication
All requests use a Bearer token. Create keys in the dashboard.
curl https://api.aigateway.sh/v1/models \
  -H "Authorization: Bearer sk-aig-..."
Errors
Errors follow the OpenAI shape.
{
  "error": {
    "message": "Monthly spend cap of $500.00 reached (spent $503.12)",
    "type": "budget_exceeded",
    "code": 402
  }
}
| Code | Type | When |
|---|---|---|
| 400 | invalid_request_error | Malformed body / missing field |
| 401 | authentication_error | Missing or invalid API key |
| 402 | budget_exceeded | Sub-account spend cap hit |
| 404 | model_not_found | Unknown model ID |
| 429 | rate_limit_error | RPM limit hit; honor Retry-After |
| 502 | provider_error | Upstream model returned a 5xx |
| 504 | timeout_error | Upstream model timed out |
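The table suggests a simple client-side split between errors worth retrying and errors that are final. A minimal Python sketch, assuming 429, 502, and 504 are transient (only the 429 `Retry-After` hint is documented above) and that `Retry-After` carries a number of seconds; `retry_delay` and its parameters are illustrative names, not part of either SDK.

```python
import random

# Status codes from the table that are plausibly transient.
# Treating 502 and 504 as retryable is an assumption, not documented behavior.
RETRYABLE = {429, 502, 504}


def retry_delay(status, headers, attempt, base=0.5, cap=30.0):
    """Return seconds to wait before retrying, or None if the error is final."""
    if status not in RETRYABLE:
        return None  # 400, 401, 402, 404: retrying will not help
    if status == 429:
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            try:
                return max(0.0, float(retry_after))
            except ValueError:
                pass  # not a plain number of seconds; fall back to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A 402 is deliberately final here: the spend cap has to be raised before the request can succeed.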
Request headers
| Header | Purpose |
|---|---|
| Authorization: Bearer sk-aig-... | Required on every request |
| X-Request-Id | Optional idempotency-friendly correlation ID; echoed back in the response |
| x-aig-tag: <string> | Attribute this request to a tag (feature / tenant / user). Shows up in usage analytics and powers per-tag budgets. |
| x-cache: auto \| force \| skip | Cache behavior override |
| x-routing: cost \| speed \| quality \| auto | Bias auto-routing when model is omitted |
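Building these headers in one place keeps typos in `x-cache` or `x-routing` from reaching the gateway. A small Python sketch; `build_headers` is a hypothetical helper, and the client-side validation is an assumption about what is convenient, not something the gateway requires.

```python
CACHE_MODES = {"auto", "force", "skip"}
ROUTING_MODES = {"cost", "speed", "quality", "auto"}


def build_headers(api_key, tag=None, cache=None, routing=None, request_id=None):
    """Assemble request headers, rejecting values the table does not list."""
    if cache is not None and cache not in CACHE_MODES:
        raise ValueError(f"x-cache must be one of {sorted(CACHE_MODES)}")
    if routing is not None and routing not in ROUTING_MODES:
        raise ValueError(f"x-routing must be one of {sorted(ROUTING_MODES)}")
    headers = {"Authorization": f"Bearer {api_key}"}
    optional = {
        "X-Request-Id": request_id,
        "x-aig-tag": tag,
        "x-cache": cache,
        "x-routing": routing,
    }
    # Only send the optional headers the caller actually set.
    headers.update({name: value for name, value in optional.items() if value is not None})
    return headers
```

With the OpenAI Python SDK, the optional headers can be passed per request through `extra_headers=`; the SDK already sets `Authorization` from `api_key`.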
Chat completions
OpenAI-compatible. Use any model from the catalog.
# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verify.
# openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
from openai import OpenAI
client = OpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...")
r = client.chat.completions.create(
    model="anthropic/claude-opus-4.7",
    messages=[{"role": "user", "content": "Explain mixture-of-experts in 3 lines."}],
)
print(r.choices[0].message.content)
curl https://api.aigateway.sh/v1/chat/completions \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [{"role": "user", "content": "hello"}]
  }'
Streaming
Add stream: true. SSE frames are OpenAI-format data: chunks.
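If you read the stream without an SDK, each frame can be decoded by hand. A minimal Python sketch, assuming frames arrive as `data:` lines carrying JSON chunks and that the stream ends with OpenAI's `data: [DONE]` sentinel (the sentinel is not confirmed on this page); `iter_content` is an illustrative helper, not part of either SDK.

```python
import json


def iter_content(lines):
    """Yield text deltas from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text
```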
// pnpm add aigateway-js openai (or npm / yarn)
// aigateway-js: sub-accounts, evals, replays, jobs, webhook verify.
// openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
import OpenAI from "openai";
const client = new OpenAI({ baseURL: "https://api.aigateway.sh/v1", apiKey: process.env.AIG_KEY });
const stream = await client.chat.completions.create({
  model: "moonshot/kimi-k2.6",
  messages: [{ role: "user", content: "Stream a haiku about caching." }],
  stream: true,
});
for await (const part of stream) process.stdout.write(part.choices[0]?.delta?.content ?? "");
Tool calling
Pass tools exactly as OpenAI does. Tool schemas are normalized to every supported provider.
{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{"role": "user", "content": "weather in Tokyo?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up live weather for a city.",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}
Embeddings
r = client.embeddings.create(
    model="baai/bge-m3",
    input=["cache hit", "cache miss", "stale read"],
)
vectors = [e.embedding for e in r.data]
Image generation
curl https://api.aigateway.sh/v1/images/generations \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "black-forest-labs/flux-2-klein-9b",
    "prompt": "a cozy reading corner, golden hour, 35mm film",
    "size": "1024x1024",
    "n": 1
  }'
Image edits
Edit an existing image with a prompt. Send multipart/form-data with an image file (and optional mask), or JSON with an image_url. Default model is bria/fibo-edit/edit; uploads cap at 25 MB.
curl https://api.aigateway.sh/v1/images/edits \
  -H "Authorization: Bearer sk-aig-..." \
  -F "image=@cat.png" \
  -F "prompt=make the cat wear sunglasses" \
  -F "model=bria/fibo-edit/edit"
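The JSON variant can be sketched in Python as well. This example only builds the request; it assumes the JSON body reuses the `prompt` and `model` field names from the multipart form alongside `image_url`, and it reads the 25 MB cap conservatively as 25,000,000 bytes. `build_edit_request` and `fits_upload_cap` are hypothetical helpers.

```python
import json
import urllib.request

# Conservative reading of the 25 MB upload cap (the exact byte count is not stated).
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def fits_upload_cap(n_bytes):
    """True when a file of n_bytes is within the assumed upload cap."""
    return n_bytes <= MAX_UPLOAD_BYTES


def build_edit_request(api_key, image_url, prompt, model=None):
    """Build (but do not send) a JSON image-edit request."""
    body = {"image_url": image_url, "prompt": prompt}
    if model is not None:  # omit to let the gateway pick its default model
        body["model"] = model
    return urllib.request.Request(
        "https://api.aigateway.sh/v1/images/edits",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send the result with `urllib.request.urlopen(...)`, or pass the same body to any HTTP client you already use.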
Audio transcriptions (STT)
with open("call.mp3", "rb") as f:
    r = client.audio.transcriptions.create(
        model="deepgram/nova-3",
        file=f,
    )
print(r.text)
For long recordings, transcribe asynchronously: pass async: true (then poll /v1/jobs/<id>) or supply a webhook_url. Available on deepgram/nova-3 and deepgram/flux.
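Polling can be wrapped in a small loop. A Python sketch, assuming a job stays in the "processing" status until it finishes (the two status values used here are the ones shown in this section's example; failure statuses are not documented on this page). `wait_for_job` is a hypothetical helper, and `fetch_job` is any callable you supply that performs GET /v1/jobs/<id> and returns the decoded JSON.

```python
import time


def wait_for_job(fetch_job, job_id, interval=2.0, max_polls=150, sleep=time.sleep):
    """Poll until the job leaves "processing"; return the final job dict."""
    for _ in range(max_polls):
        job = fetch_job(job_id)
        if job.get("status") != "processing":
            return job  # "completed", or some other terminal status
        sleep(interval)
    raise TimeoutError(f"job {job_id} still processing after {max_polls} polls")
```

Injecting `fetch_job` and `sleep` keeps the loop independent of any HTTP client and easy to test. For very long recordings a `webhook_url` avoids polling altogether.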
curl -X POST https://api.aigateway.sh/v1/audio/transcriptions \
-H "Authorization: Bearer sk-aig-..." \
-H "Content-Type: application/json" \
-d '{"model":"deepgram/nova-3","audio_url":"https://example.com/call.wav","async":true}'
# → { "id": "<job_id>", "status": "processing" }, then poll GET /v1/jobs/<job_id>
# → { "status": "completed", "result": { "transcript": { "text": "...", "segments": [...] } } }
Realtime transcription (WebSocket)
Stream audio over a WebSocket for live interim + final transcripts. Browsers pass the key as ?api_key=; servers can use the Authorization header. End the stream with { "type": "CloseStream" }. Billed per audio-minute at the realtime (websocket) rate, which is higher than batch. Models: deepgram/nova-3, deepgram/flux.
const ws = new WebSocket(
  "wss://api.aigateway.sh/v1/realtime?model=deepgram/nova-3&encoding=linear16&sample_rate=16000&interim_results=true&api_key=" + KEY,
);
ws.onmessage = (e) => {
  const m = JSON.parse(e.data);
  if (m.type === "Results") console.log(m.channel.alternatives[0].transcript, m.is_final);
};
// stream raw linear16 PCM frames, then end:
ws.send(JSON.stringify({ type: "CloseStream" }));
Text-to-speech (TTS)
curl https://api.aigateway.sh/v1/audio/speech \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepgram/aura-2-en",
    "input": "Hello from the aggregator.",
    "voice": "aura-2-en-angus",
    "response_format": "mp3"
  }' --output speech.mp3
Moderations
curl https://api.aigateway.sh/v1/moderations \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"meta/llama-guard-3-8b","input":"some text"}'
Sub-accounts
Mint a scoped key for one of your customers. Each has its own spend cap, rate limit, default tag, and isolated analytics.
# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verify.
# openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
from aigateway import AIgateway
aig = AIgateway(api_key="sk-aig-...")
sub = aig.sub_accounts.create(
    name="acme-corp",
    external_ref="acme-123",
    spend_cap_cents=50_000,
    rate_limit_rpm=300,
    default_tag="acme",
)
# sub.id, sub.key: hand sub.key to Acme; it's scoped and capped.
// pnpm add aigateway-js openai (or npm / yarn)
// aigateway-js: sub-accounts, evals, replays, jobs, webhook verify.
// openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
import { AIgateway } from "aigateway-js";
const aig = new AIgateway({ apiKey: process.env.AIG_KEY! });
const sub = await aig.subAccounts.create({
  name: "acme-corp",
  externalRef: "acme-123",
  spendCapCents: 50_000,
  rateLimitRpm: 300,
  defaultTag: "acme",
});
// sub.id, sub.key: hand sub.key to Acme; it's scoped and capped.
curl -X POST https://api.aigateway.sh/v1/sub-accounts \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "acme-corp",
    "external_ref": "acme-123",
    "spend_cap_cents": 50000,
    "rate_limit_rpm": 300,
    "default_tag": "acme"
  }'
# => { "id": "sa_9f...", "key": "sk-aig-...", "spend_cap_cents": 50000, ... }
Cost tags + budgets
Set x-aig-tag on any request. Tags surface in attribution reports; sub-account spend caps are enforced server-side before dispatch.
curl https://api.aigateway.sh/v1/chat/completions \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -H "x-aig-tag: summarize" \
  -d '{"model":"moonshot/kimi-k2.6","messages":[...]}'
Replay + shadow A/B
Re-run any past request against a different model and see cost, latency, and output diffs.
curl -X POST https://api.aigateway.sh/v1/replays \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "source_request_id": "req_abc123",
    "target_model": "anthropic/claude-opus-4.7",
    "shadow": false
  }'
# =>
# { "source_output": "...",
#   "target_output": "...",
#   "cost_source_cents": 1.2,
#   "cost_target_cents": 4.7,
#   "latency_source_ms": 810,
#   "latency_target_ms": 2240,
#   "score_delta": 0.82 }
Eval-driven routing
Upload a dataset of inputs (and optional expected outputs) plus candidate models. Get a winner, then use eval:<run_id> as a model alias from then on — rerun whenever a new frontier model lands.
{
  "name": "prod-summarize",
  "metric": "quality",
  "candidate_models": [
    "anthropic/claude-opus-4.7",
    "openai/gpt-5.4",
    "moonshot/kimi-k2.6"
  ],
  "dataset": [
    { "input": "Summarize: ...", "expected": "..." },
    { "input": "Summarize: ...", "expected": "..." }
  ]
}
Then call chat with the alias:
r = client.chat.completions.create(
    model="eval:ev_7h3k...",  # alias of the winning model
    messages=[{"role": "user", "content": "..."}],
)
Usage + attribution
{
  "month": "2026-04",
  "data": [
    { "tag": "summarize", "requests": 12012, "cost_cents": 4210, "units": 17482000 },
    { "tag": "chat", "requests": 8391, "cost_cents": 9830, "units": 19740000 }
  ]
}
List models
Query with ?modality=text or ?provider=anthropic.
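A short Python sketch of building the filtered URL. It assumes both filters can be combined in one request, which this page does not state; `models_url` is an illustrative helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.aigateway.sh/v1"


def models_url(modality=None, provider=None):
    """Return the /v1/models URL with any non-empty filters applied."""
    params = {k: v for k, v in (("modality", modality), ("provider", provider)) if v}
    query = urlencode(params)
    return f"{BASE_URL}/models" + (f"?{query}" if query else "")
```

With the OpenAI Python SDK the same filters can be passed as `client.models.list(extra_query={"modality": "text"})`.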
Get model detail
Single-model lookup. Returns pricing, context window, modality, capability flags — plus a schema block: the exact endpoint, a request example, response and streaming shapes, model-specific quirks, and runnable curl/Python/TypeScript snippets. Use the same provider/slug form as the chat model field.
Model schema
The invocation contract for one model, on its own. This is the canonical "how do I call this model" call — an agent fetches it instead of guessing the request shape. Quirks flag the gotchas: reasoning models that stream reasoning_content, models that reject temperature, and o-series/GPT-5 that use max_completion_tokens.
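One way an agent might act on the quirks block is to adjust a request before sending it. The Python sketch below assumes the `noSampling` flag in the sample response marks models that reject sampling parameters such as `temperature`; that mapping, the choice of `top_p` as a second sampling parameter, and the helper name `apply_quirks` are all assumptions.

```python
# Which parameters count as "sampling" is an assumption for illustration.
SAMPLING_PARAMS = ("temperature", "top_p")


def apply_quirks(request, quirks):
    """Return a copy of a chat request adjusted for a model's quirk flags."""
    adjusted = dict(request)  # shallow copy; the caller's dict is left untouched
    if quirks.get("noSampling"):
        for name in SAMPLING_PARAMS:
            adjusted.pop(name, None)
    return adjusted
```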
{
  "endpoint": { "method": "POST", "path": "/v1/chat/completions", "url": "https://api.aigateway.sh/v1/chat/completions" },
  "quirks": { "reasoning": true, "noSampling": true, "anthropicNative": true },
  "request_example": "{ \"model\": \"anthropic/claude-opus-4.7\", \"messages\": [ ... ] }",
  "response_example": "{ \"id\": \"chatcmpl-...\", \"choices\": [ ... ] }",
  "streaming_example": "data: {\"choices\":[{\"delta\":{\"content\":\"Hello\"}}]}",
  "sdk": { "curl": "...", "python": "...", "typescript": "..." }
}
Capabilities
The fixed capability vocabulary used across the catalog — each id with a one-line meaning and the endpoint it maps to (e.g. vision, function_calling, reasoning, text-to-video, async). Agents reason over this fixed set instead of guessing what a free-text tag means; the capabilities array on each model entry draws from it.
Provider health
p50, p95 and error rate per upstream. The router uses these internally.
Wallet balance
Returns { cents, usd } for the authenticated key. Useful for in-app low-balance prompts before a call lands a 402.
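A low-balance check can be as small as one comparison. A Python sketch that uses only the `cents` field of the response; the cost estimate, the reserve, and the helper name `needs_top_up` are illustrative assumptions.

```python
def needs_top_up(balance, estimated_cost_cents, reserve_cents=100):
    """True when the wallet cannot cover the next call plus a safety reserve."""
    return balance["cents"] < estimated_cost_cents + reserve_cents
```

Calling this with the decoded balance response before an expensive request lets the app prompt the user instead of surfacing a 402.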
Cache management
Purges every entry in the exact-match KV cache and the semantic Vectorize cache for your account. Useful after a deploy that changes prompt templates.
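A deploy hook might purge only when the prompt templates actually changed. The Python sketch below fingerprints the templates and builds the DELETE /v1/cache request without sending it; storing the previous fingerprint between deploys is left to you, and both helper names are hypothetical.

```python
import hashlib
import urllib.request


def templates_fingerprint(templates):
    """Stable SHA-256 over template names and bodies (order-independent)."""
    digest = hashlib.sha256()
    for name in sorted(templates):
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(templates[name].encode("utf-8") + b"\0")
    return digest.hexdigest()


def purge_request(api_key):
    """Build (but do not send) the cache purge request."""
    return urllib.request.Request(
        "https://api.aigateway.sh/v1/cache",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )
```

If the new fingerprint differs from the one recorded at the last deploy, send the request with `urllib.request.urlopen(purge_request(key))`.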
Webhook secret
Returns the per-key signing secret used to verify x-aig-signature on inbound callbacks. Both SDKs ship a constant-time verifier; rotation invalidates the previous secret immediately, so coordinate with your endpoint before calling rotate.
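For endpoints that cannot use an SDK, verification can be sketched by hand. This Python example assumes `x-aig-signature` carries a hex-encoded HMAC-SHA256 of the raw request body; the actual scheme is not specified on this page and may differ (prefixes, timestamps, base64), so treat the SDK verifiers as the reference. `verify_signature` is a hypothetical helper.

```python
import hashlib
import hmac


def verify_signature(secret, raw_body, signature_header):
    """Check a signature header against the raw request body.

    Assumes a hex-encoded HMAC-SHA256 of the body, keyed by the signing secret.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing side channels.
    return hmac.compare_digest(expected.encode("ascii"), signature_header.encode("utf-8"))
```

Verify against the raw bytes as received; re-serializing parsed JSON can change the bytes and break the comparison.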
Not supported (yet)
The router accepts the paths below but returns a deliberate non-200. Don't probe — these are tracked, and you'll get the expected enable-date in the response when they ship.
- POST /v1/completions → 400 · legacy completions; use /v1/chat/completions instead
- Assistants / Threads / Vector stores / Fine-tuning: intentionally not implemented; aggregator scope
All endpoints
Auto-generated from /openapi.json. The curated sections above include examples and prose; this list is the exhaustive index — if it's here, the gateway honours it.
- POST /v1/chat/completions: Create chat completion

- POST /v1/embeddings: Create embeddings

- POST /v1/images/generations: Generate image
- POST /v1/images/edits: Edit image (image-to-image)

- POST /v1/audio/transcriptions: Transcribe audio (STT), sync or async via async:true / webhook_url
- GET /v1/realtime: Realtime streaming transcription (WebSocket)
- POST /v1/audio/speech: Synthesize speech (TTS)
- POST /v1/audio/music: Generate music (async)

- POST /v1/videos/generations: Generate video (async)
- POST /v1/3d/generations: Generate 3D asset (async)

- POST /v1/moderations: Moderate content

- POST /v1/translations: Translate text
- POST /v1/classifications: Classify text
- POST /v1/detections: Detect objects in image
- POST /v1/ocr: Extract text from image
- POST /v1/rerank: Rerank documents

- GET /v1/jobs/{id}: Get async job
- DELETE /v1/jobs/{id}: Cancel async job

- GET /v1/files: List files
- POST /v1/files: Upload file
- GET /v1/files/{id}: Get file metadata
- DELETE /v1/files/{id}: Delete file
- GET /v1/files/{id}/content: Download file content
- GET /v1/files/jobs/{jobId}/{filename}/signed: Mint signed URL for a job result

- GET /v1/batches: List batches
- POST /v1/batches: Create batch
- GET /v1/batches/{id}: Get batch
- POST /v1/batches/{id}/cancel: Cancel batch

- GET /v1/sub-accounts: List sub-accounts
- POST /v1/sub-accounts: Create sub-account
- GET /v1/sub-accounts/{id}: Get sub-account
- PATCH /v1/sub-accounts/{id}: Update sub-account
- DELETE /v1/sub-accounts/{id}: Delete sub-account

- GET /v1/evals: List eval runs
- POST /v1/evals: Create eval run
- GET /v1/evals/{id}: Get eval

- GET /v1/replays: List replays
- POST /v1/replays: Replay a past request on a new model
- GET /v1/replays/{id}: Get replay

- GET /v1/usage/by-tag: Usage by tag
- GET /v1/usage/by-sub-account: Usage by sub-account

- GET /v1/webhook-secret: Get signing secret
- POST /v1/webhook-secret/rotate: Rotate signing secret

- GET /v1/balance: Wallet balance
- DELETE /v1/cache: Purge cache

- GET /v1/models: List models
- GET /v1/models/{id}: Get model detail + invocation schema
- GET /v1/models/{id}/schema: Get a model's invocation schema
- GET /v1/capabilities: List the capability vocabulary

- GET /v1/health/providers: Provider health
Need help? Contact support — we'll reply in under 24h.