API reference

One OpenAI-compatible base URL and one API key cover every modality and every primitive. The main endpoints are drop-in compatible with the OpenAI SDK; the aggregator primitives (sub-accounts, replay, evals, cost tags) also live under /v1/.

Base URL: https://api.aigateway.sh/v1

SDK alternative

Most endpoints documented below are also wrapped in our typed SDKs. They are useful when you want async-job helpers, sub-account convenience methods, or webhook signature verification without writing the boilerplate yourself.

pip install aigateway-py · pnpm add aigateway-js · npm i -g aigateway-cli

Python · Node · CLI · MCP server

Authentication

All requests use a Bearer token. Create keys in the dashboard.

bash

curl https://api.aigateway.sh/v1/models \
  -H "Authorization: Bearer sk-aig-..."

Errors

Errors follow the OpenAI shape: a top-level error object with message, type, and code fields.

json

{
  "error": {
    "message": "Monthly spend cap of $500.00 reached (spent $503.12)",
    "type": "budget_exceeded",
    "code": 402
  }
}

Code	Type	When
400	invalid_request_error	Malformed body / missing field
401	authentication_error	Missing or invalid API key
402	budget_exceeded	Sub-account spend cap hit
404	model_not_found	Unknown model ID
429	rate_limit_error	RPM limit hit; honor `Retry-After`
502	provider_error	Upstream model returned a 5xx
504	timeout_error	Upstream model timed out
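
The table above can be turned into a small client-side policy. The following is a minimal Python sketch, not documented gateway guidance: the helper names `parse_error` and `retry_delay` are hypothetical, and retrying 429, 502, and 504 with exponential backoff is an assumption. Only the error shape, the status codes, and the `Retry-After` header come from this page.

```python
import json
from typing import Dict, Optional, Tuple

# Assumption: these statuses are worth retrying. The reference only says to
# honor Retry-After on 429.
RETRYABLE = {429, 502, 504}


def parse_error(body: str) -> Tuple[int, str, str]:
    """Return (code, type, message) from an OpenAI-shaped error body."""
    err = json.loads(body)["error"]
    return err["code"], err["type"], err["message"]


def retry_delay(status: int, headers: Dict[str, str], attempt: int) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error is not retryable."""
    if status not in RETRYABLE:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)  # the server's hint wins over local backoff
    # Hypothetical fallback: 1s, 2s, 4s, ... capped at 30s.
    return min(2.0 ** attempt, 30.0)
```

A 402 is deliberately not retried here: the spend cap has to be raised before the request can succeed.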

Request headers

Header	Purpose
`Authorization: Bearer sk-aig-...`	Required on every request
`X-Request-Id`	Optional idempotency-friendly correlation id; echoed back in the response
`x-aig-tag: <string>`	Attribute this request to a tag (feature / tenant / user). Shows up in usage analytics and powers per-tag budgets.
`x-cache: auto | force | skip`	Cache behavior override
`x-routing: cost | speed | quality | auto`	Bias auto-routing when `model` is omitted
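
As an illustration of how these headers combine, here is a minimal Python sketch that builds a header dictionary and rejects values outside the sets listed in the table. The function name `build_headers` and the client-side validation are assumptions made for the example; the gateway may validate differently.

```python
from typing import Dict, Optional

CACHE_MODES = {"auto", "force", "skip"}
ROUTING_MODES = {"cost", "speed", "quality", "auto"}


def build_headers(
    api_key: str,
    tag: Optional[str] = None,
    cache: Optional[str] = None,
    routing: Optional[str] = None,
    request_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build request headers, validating the enumerated values locally."""
    if cache is not None and cache not in CACHE_MODES:
        raise ValueError(f"x-cache must be one of {sorted(CACHE_MODES)}")
    if routing is not None and routing not in ROUTING_MODES:
        raise ValueError(f"x-routing must be one of {sorted(ROUTING_MODES)}")

    headers = {"Authorization": f"Bearer {api_key}"}  # required on every request
    if request_id:
        headers["X-Request-Id"] = request_id
    if tag:
        headers["x-aig-tag"] = tag
    if cache:
        headers["x-cache"] = cache
    if routing:
        headers["x-routing"] = routing
    return headers
```

With the OpenAI Python SDK, a dictionary like this (minus `Authorization`, which the SDK sets from `api_key`) can be passed per request through the `extra_headers` argument.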

Chat completions

POST /v1/chat/completions

OpenAI-compatible. Use any model from the catalog.

python

# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verify.
# openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
from openai import OpenAI
client = OpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...")

r = client.chat.completions.create(
    model="anthropic/claude-opus-4.7",
    messages=[{"role": "user", "content": "Explain mixture-of-experts in 3 lines."}],
)
print(r.choices[0].message.content)

bash

curl https://api.aigateway.sh/v1/chat/completions \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [{"role": "user", "content": "hello"}]
  }'

Streaming

Set stream: true in the request body. The response arrives as server-sent events (SSE), each frame an OpenAI-format data: chunk.

javascript

// pnpm add aigateway-js openai   (or npm / yarn)
// aigateway-js: sub-accounts, evals, replays, jobs, webhook verify.
// openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
import OpenAI from "openai";
const client = new OpenAI({ baseURL: "https://api.aigateway.sh/v1", apiKey: process.env.AIG_KEY });

const stream = await client.chat.completions.create({
  model: "moonshot/kimi-k2.6",
  messages: [{ role: "user", content: "Stream a haiku about caching." }],
  stream: true,
});
for await (const part of stream) process.stdout.write(part.choices[0]?.delta?.content ?? "");
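
The OpenAI SDK parses the stream for you. If you read the raw response instead, the frames can be decoded roughly as follows. This is a sketch under two assumptions that this page does not state: that the stream ends with the OpenAI `data: [DONE]` sentinel, and that text arrives in `choices[0].delta.content`. The function name `iter_deltas` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-format SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data lines carry no content
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text
```

Feeding it the lines of a streamed response body and joining the results reconstructs the full completion text.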

Tool calling

Pass tools exactly as you would to OpenAI. The gateway normalizes tool schemas for every supported provider.

json

{
  "model": "anthropic/claude-sonnet-4.6",
  "messages": [{"role": "user", "content": "weather in Tokyo?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up live weather for a city.",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}
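
Once the model answers with a tool call, the client has to run the function and send the result back. The sketch below assumes the response uses the OpenAI tool-call shape (`tool_calls[].function.name` plus a JSON-string `arguments`), which this page implies but does not show. The `get_weather` body is a stub and `run_tool_calls` is a hypothetical helper name.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> Dict[str, Any]:
    # Stub for illustration; a real implementation would call a weather service.
    return {"city": city, "forecast": "unknown"}


# Map tool names from the request's `tools` array to local callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(message: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Execute each tool call on an assistant message and build `tool` replies."""
    replies = []
    for call in message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return replies
```

Append the assistant message and the returned replies to `messages`, then call chat completions again to get the final answer.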

Embeddings

POST /v1/embeddings

python

r = client.embeddings.create(
    model="baai/bge-m3",
    input=["cache hit", "cache miss", "stale read"],
)
vectors = [e.embedding for e in r.data]
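
A typical next step is comparing the returned vectors. The following is a small, dependency-free sketch of cosine similarity; the function name `cosine` is ours, and nothing here is specific to the gateway or to a particular embedding model.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

For example, `cosine(vectors[0], vectors[1])` scores how close "cache hit" and "cache miss" are in the embedding space.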

Image generation

POST /v1/images/generations

bash

curl https://api.aigateway.sh/v1/images/generations \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "black-forest-labs/flux-2-klein-9b",
    "prompt": "a cozy reading corner, golden hour, 35mm film",
    "size": "1024x1024",
    "n": 1
  }'

Image edits

POST /v1/images/edits

Edit an existing image with a prompt. Send multipart/form-data with an image file (and an optional mask), or JSON with an image_url. The default model is bria/fibo-edit/edit; uploads are capped at 25 MB.

bash

curl https://api.aigateway.sh/v1/images/edits \
  -H "Authorization: Bearer sk-aig-..." \
  -F "image=@cat.png" \
  -F "prompt=make the cat wear sunglasses" \
  -F "model=bria/fibo-edit/edit"

Audio transcriptions (STT)

POST /v1/audio/transcriptions

python

with open("call.mp3", "rb") as f:
    r = client.audio.transcriptions.create(
        model="deepgram/nova-3",
        file=f,
    )
print(r.text)

For long recordings, transcribe asynchronously — pass async: true (then poll /v1/jobs/<id>) or a webhook_url. Available on deepgram/nova-3 and deepgram/flux.

bash

curl -X POST https://api.aigateway.sh/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"deepgram/nova-3","audio_url":"https://example.com/call.wav","async":true}'
# → { "id": "<job_id>", "status": "processing" } — then poll GET /v1/jobs/<job_id>
# → { "status": "completed", "result": { "transcript": { "text": "...", "segments": [...] } } }
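
The polling step can be sketched in Python as below. The function `wait_for_job` and its `fetch_job` callback are hypothetical; the callback stands in for whatever issues `GET /v1/jobs/<job_id>` and returns the parsed JSON. The only status values taken from this page are "processing" and "completed", so the sketch treats anything other than "processing" as terminal, which is an assumption.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
) -> Dict[str, Any]:
    """Poll until the job leaves the "processing" state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job.get("status") != "processing":
            return job  # completed, or some other terminal state (assumed)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still processing after {timeout_s}s")
        time.sleep(interval_s)
```

Injecting `fetch_job` keeps the loop independent of the HTTP client; for long recordings a `webhook_url` avoids polling altogether.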

Realtime transcription (WebSocket)

GET /v1/realtime

Stream audio over a WebSocket for live interim + final transcripts. Browsers pass the key as ?api_key=; servers can use the Authorization header. End the stream with { "type": "CloseStream" }. Billed per audio-minute at the realtime (websocket) rate, which is higher than batch. Models: deepgram/nova-3, deepgram/flux.

javascript

const ws = new WebSocket(
  "wss://api.aigateway.sh/v1/realtime?model=deepgram/nova-3&encoding=linear16&sample_rate=16000&interim_results=true&api_key=" + KEY,
);
ws.onmessage = (e) => {
  const m = JSON.parse(e.data);
  if (m.type === "Results") console.log(m.channel.alternatives[0].transcript, m.is_final);
};
// Once the socket is open, stream raw linear16 PCM frames with ws.send(frame).
// When the audio is finished, end the stream:
ws.send(JSON.stringify({ type: "CloseStream" }));

Text-to-speech (TTS)

POST /v1/audio/speech

bash

curl https://api.aigateway.sh/v1/audio/speech \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepgram/aura-2-en",
    "input": "Hello from the aggregator.",
    "voice": "aura-2-en-angus",
    "response_format": "mp3"
  }' --output speech.mp3

Moderations

POST /v1/moderations

bash

curl https://api.aigateway.sh/v1/moderations \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"meta/llama-guard-3-8b","input":"some text"}'

Sub-accounts

Mint a scoped key for one of your customers. Each sub-account has its own spend cap, rate limit, default tag, and isolated analytics.

POST /v1/sub-accounts

python

# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verify.
# openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
from aigateway import AIgateway

aig = AIgateway(api_key="sk-aig-...")

sub = aig.sub_accounts.create(
    name="acme-corp",
    external_ref="acme-123",
    spend_cap_cents=50_000,
    rate_limit_rpm=300,
    default_tag="acme",
)
# sub.id, sub.key — hand sub.key to Acme; it's scoped + capped.

typescript

// pnpm add aigateway-js openai   (or npm / yarn)
// aigateway-js: sub-accounts, evals, replays, jobs, webhook verify.
// openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
import { AIgateway } from "aigateway-js";

const aig = new AIgateway({ apiKey: process.env.AIG_KEY! });

const sub = await aig.subAccounts.create({
  name: "acme-corp",
  externalRef: "acme-123",
  spendCapCents: 50_000,
  rateLimitRpm: 300,
  defaultTag: "acme",
});
// sub.id, sub.key — hand sub.key to Acme; it's scoped + capped.

bash

curl -X POST https://api.aigateway.sh/v1/sub-accounts \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "acme-corp",
    "external_ref": "acme-123",
    "spend_cap_cents": 50000,
    "rate_limit_rpm": 300,
    "default_tag": "acme"
  }'

# => { "id": "sa_9f...", "key": "sk-aig-...", "spend_cap_cents": 50000, ... }

GET /v1/sub-accounts

GET /v1/sub-accounts/:id

DELETE /v1/sub-accounts/:id

Cost tags + budgets

Set x-aig-tag on any request. Tags surface in attribution reports; sub-account spend caps are enforced server-side before dispatch.

bash

curl https://api.aigateway.sh/v1/chat/completions \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -H "x-aig-tag: summarize" \
  -d '{"model":"moonshot/kimi-k2.6","messages":[...]}'

Replay + shadow A/B

Re-run any past request against a different model and see cost, latency, and output diffs.

POST /v1/replays

bash

curl -X POST https://api.aigateway.sh/v1/replays \
  -H "Authorization: Bearer sk-aig-..." \
  -H "Content-Type: application/json" \
  -d '{
    "source_request_id": "req_abc123",
    "target_model": "anthropic/claude-opus-4.7",
    "shadow": false
  }'

# =>
# { "source_output": "...",
#   "target_output": "...",
#   "cost_source_cents": 1.2,
#   "cost_target_cents": 4.7,
#   "latency_source_ms": 810,
#   "latency_target_ms": 2240,
#   "score_delta": 0.82 }
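
To make the comparison concrete, the sketch below reduces a replay response to a cost ratio and a latency difference. The helper name `summarize_replay` is hypothetical, the field names are taken from the sample response above, and `score_delta` is left alone because its scale is not documented here.

```python
from typing import Any, Dict, Optional


def summarize_replay(replay: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Compare target against source using the cost and latency fields."""
    source_cost = replay["cost_source_cents"]
    # A ratio above 1 means the target model was more expensive for this request.
    cost_ratio = round(replay["cost_target_cents"] / source_cost, 2) if source_cost else None
    latency_delta_ms = replay["latency_target_ms"] - replay["latency_source_ms"]
    return {"cost_ratio": cost_ratio, "latency_delta_ms": latency_delta_ms}
```

For the sample response, the target costs about 3.92 times as much and takes 1430 ms longer.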

GET /v1/replays

GET /v1/replays/:id

Eval-driven routing

Upload a dataset of inputs (and optional expected outputs) plus a list of candidate models. The run picks a winner, which you can then reference as the model alias eval:<run_id>. Rerun the eval whenever a new frontier model lands.

POST /v1/evals

json

{
  "name": "prod-summarize",
  "metric": "quality",
  "candidate_models": [
    "anthropic/claude-opus-4.7",
    "openai/gpt-5.4",
    "moonshot/kimi-k2.6"
  ],
  "dataset": [
    { "input": "Summarize: ...", "expected": "..." },
    { "input": "Summarize: ...", "expected": "..." }
  ]
}

Then call chat with the alias:

python

r = client.chat.completions.create(
    model="eval:ev_7h3k...",            # alias of the winning model
    messages=[{"role": "user", "content": "..."}],
)

GET /v1/evals

GET /v1/evals/:id

Usage + attribution

GET /v1/usage/by-tag?month=2026-04

json

{
  "month": "2026-04",
  "data": [
    { "tag": "summarize", "requests": 12012, "cost_cents": 4210, "units": 17482000 },
    { "tag": "chat",      "requests":  8391, "cost_cents": 9830, "units": 19740000 }
  ]
}
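
A report like this is usually rolled up before it reaches a dashboard. The sketch below computes the monthly total in dollars and the average cost per request for each tag. The helper name `summarize_usage` is hypothetical, and the `units` field is ignored because this page does not define it.

```python
from typing import Any, Dict


def summarize_usage(report: Dict[str, Any]) -> Dict[str, Any]:
    """Roll a by-tag usage report up into a total and per-request averages."""
    rows = report["data"]
    cents_per_request = {
        row["tag"]: round(row["cost_cents"] / row["requests"], 3) if row["requests"] else 0.0
        for row in rows
    }
    total_usd = sum(row["cost_cents"] for row in rows) / 100
    return {
        "month": report["month"],
        "total_usd": total_usd,
        "cents_per_request": cents_per_request,
    }
```

For the sample above this gives a total of $140.40, with "summarize" averaging about 0.35 cents per request and "chat" about 1.171 cents.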

GET /v1/usage/by-sub-account?month=2026-04

List models

GET /v1/models

Query with ?modality=text or ?provider=anthropic.
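
A minimal sketch of building the filtered URL in Python follows. The helper name `models_url` is hypothetical, and whether the two filters can be combined in one request is an assumption; this page only shows them separately.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.aigateway.sh/v1"


def models_url(modality: Optional[str] = None, provider: Optional[str] = None) -> str:
    """Return the /models URL with any supplied filters as query parameters."""
    params = {k: v for k, v in (("modality", modality), ("provider", provider)) if v}
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}/models{query}"
```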

Get model detail

GET /v1/models/:id

Single-model lookup. Returns pricing, context window, modality, capability flags — plus a schema block: the exact endpoint, a request example, response and streaming shapes, model-specific quirks, and runnable curl/Python/TypeScript snippets. Use the same provider/slug form as the chat model field.

Model schema

GET /v1/models/:id/schema

The invocation contract for one model, on its own. This is the canonical "how do I call this model" call — an agent fetches it instead of guessing the request shape. Quirks flag the gotchas: reasoning models that stream reasoning_content, models that reject temperature, and o-series/GPT-5 that use max_completion_tokens.

json

{
  "endpoint": { "method": "POST", "path": "/v1/chat/completions", "url": "https://api.aigateway.sh/v1/chat/completions" },
  "quirks": { "reasoning": true, "noSampling": true, "anthropicNative": true },
  "request_example": "{ \"model\": \"anthropic/claude-opus-4.7\", \"messages\": [ ... ] }",
  "response_example": "{ \"id\": \"chatcmpl-...\", \"choices\": [ ... ] }",
  "streaming_example": "data: {\"choices\":[{\"delta\":{\"content\":\"Hello\"}}]}",
  "sdk": { "curl": "...", "python": "...", "typescript": "..." }
}
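
One way a client might act on the quirks block is to adjust request parameters before sending. The sketch below is illustrative only: it assumes that `noSampling` marks models that reject sampling parameters, and that `temperature` and `top_p` are the parameters to drop. Neither mapping is spelled out on this page, and `apply_quirks` is a hypothetical helper.

```python
from typing import Any, Dict

# Assumption: these are the sampling parameters a `noSampling` model rejects.
SAMPLING_KEYS = ("temperature", "top_p")


def apply_quirks(params: Dict[str, Any], quirks: Dict[str, bool]) -> Dict[str, Any]:
    """Return a copy of `params` adjusted for the model's quirks."""
    adjusted = dict(params)  # leave the caller's dictionary untouched
    if quirks.get("noSampling"):
        for key in SAMPLING_KEYS:
            adjusted.pop(key, None)
    return adjusted
```

Fetching the schema once per model and caching the quirks keeps this check off the hot path.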

Capabilities

GET /v1/capabilities

The fixed capability vocabulary used across the catalog — each id with a one-line meaning and the endpoint it maps to (e.g. vision, function_calling, reasoning, text-to-video, async). Agents reason over this fixed set instead of guessing what a free-text tag means; the capabilities array on each model entry draws from it.

Provider health

GET /v1/health/providers

Returns p50 and p95 latency and the error rate for each upstream provider. The router uses these figures internally.

Wallet balance

GET /v1/balance

Returns { cents, usd } for the authenticated key. Useful for in-app low-balance prompts before a call lands a 402.

Cache management

DELETE /v1/cache

Purges every entry in the exact-match KV cache and the semantic Vectorize cache for your account. Useful after a deploy that changes prompt templates.

Webhook secret

GET /v1/webhook-secret

POST /v1/webhook-secret/rotate

Returns the per-key signing secret used to verify x-aig-signature on inbound callbacks. Both SDKs ship a constant-time verifier; rotation invalidates the previous secret immediately, so coordinate with your endpoint before calling rotate.
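
If you verify callbacks without the SDKs, the check might look like the sketch below. The signature scheme is an assumption: this page does not say how `x-aig-signature` is computed, so the example supposes a hex-encoded HMAC-SHA256 of the raw request body. Confirm the real scheme against the SDK verifier before relying on it; `verify_signature` is a hypothetical name.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Constant-time check of a callback signature (assumed HMAC-SHA256, hex)."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))
```

Verify against the raw bytes of the request body, before any JSON parsing or re-serialization changes them.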

Not supported (yet)

The router accepts the paths below but returns a deliberate non-200. Don't probe — these are tracked, and you'll get the expected enable-date in the response when they ship.

POST /v1/completions → 400 · legacy completions; use /v1/chat/completions instead
Assistants / Threads / Vector stores / Fine-tuning — intentionally not implemented; aggregator scope

All endpoints

Auto-generated from /openapi.json. The curated sections above include examples and prose; this list is the exhaustive index — if it's here, the gateway honours it.

chat: OpenAI-compatible chat completions (text, vision, tool use, streaming, reasoning)

POST /v1/chat/completions	Create chat completion

embeddings: Text embeddings

POST /v1/embeddings	Create embeddings

images: Image generation

POST /v1/images/generations	Generate image
POST /v1/images/edits	Edit image (image-to-image)

audio: STT, TTS, music

POST /v1/audio/transcriptions	Transcribe audio (STT): sync, or async via async:true / webhook_url
GET /v1/realtime	Realtime streaming transcription (WebSocket)
POST /v1/audio/speech	Synthesize speech (TTS)
POST /v1/audio/music	Generate music (async)

video: Async video + 3D generation

POST /v1/videos/generations	Generate video (async)
POST /v1/3d/generations	Generate 3D asset (async)

moderation: Content safety

POST /v1/moderations	Moderate content

modality-extra: Translation, classification, detection, OCR, rerank

POST /v1/translations	Translate text
POST /v1/classifications	Classify text
POST /v1/detections	Detect objects in image
POST /v1/ocr	Extract text from image
POST /v1/rerank	Rerank documents

jobs: Async job lifecycle

GET /v1/jobs/{id}	Get async job
DELETE /v1/jobs/{id}	Cancel async job

files: File upload, download, signed URLs

GET /v1/files	List files
POST /v1/files	Upload file
GET /v1/files/{id}	Get file metadata
DELETE /v1/files/{id}	Delete file
GET /v1/files/{id}/content	Download file content
GET /v1/files/jobs/{jobId}/{filename}/signed	Mint signed URL for a job result

batches: OpenAI-style batch API at 50% off

GET /v1/batches	List batches
POST /v1/batches	Create batch
GET /v1/batches/{id}	Get batch
POST /v1/batches/{id}/cancel	Cancel batch

sub-accounts: Per-customer scoped keys with spend caps

GET /v1/sub-accounts	List sub-accounts
POST /v1/sub-accounts	Create sub-account
GET /v1/sub-accounts/{id}	Get sub-account
PATCH /v1/sub-accounts/{id}	Update sub-account
DELETE /v1/sub-accounts/{id}	Delete sub-account

evals: Eval-driven model routing

GET /v1/evals	List eval runs
POST /v1/evals	Create eval run
GET /v1/evals/{id}	Get eval

replays: Replay any past request on a new model

GET /v1/replays	List replays
POST /v1/replays	Replay a past request on a new model
GET /v1/replays/{id}	Get replay

usage: Per-tag and per-customer cost attribution

GET /v1/usage/by-tag	Usage by tag
GET /v1/usage/by-sub-account	Usage by sub-account

webhooks: Signed callbacks for jobs + lifecycle events

GET /v1/webhook-secret	Get signing secret
POST /v1/webhook-secret/rotate	Rotate signing secret

account: Balance + cache management

GET /v1/balance	Wallet balance
DELETE /v1/cache	Purge cache

models: Catalog discovery

GET /v1/models	List models
GET /v1/models/{id}	Get model detail + invocation schema
GET /v1/models/{id}/schema	Get a model's invocation schema
GET /v1/capabilities	List the capability vocabulary

health: Provider health

GET /v1/health/providers	Provider health

Need help? Contact support — we'll reply in under 24h.