Batch API
OpenAI-compatible batch endpoint: 50% off every batched request, 24h SLA, up to 100 requests per batch (MVP cap — raising soon). Works for /v1/chat/completions and /v1/embeddings.
End-to-end flow
- Upload a JSONL file with one request per line to `/v1/files` (`purpose=batch`).
- POST to `/v1/batches` with the `input_file_id`.
- Poll `/v1/batches/{id}` until `status === "completed"`.
- Download the results JSONL from `/v1/files/{output_file_id}/content`.
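The polling step can be wrapped in a small helper. A minimal sketch: `wait_for_batch` and its injected `get_status` fetcher are illustrative names, not part of the gateway's API, and the terminal statuses assume this endpoint mirrors the OpenAI batch convention (`completed` / `failed` / `expired` / `cancelled`).

```python
import time


def wait_for_batch(get_status, batch_id, poll_interval=30, timeout=24 * 3600):
    """Poll until the batch reaches a terminal status.

    get_status: callable taking a batch id and returning the batch object
    as a dict — injected so this sketch stays transport-agnostic.
    Terminal statuses assumed from the OpenAI batch convention.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        batch = get_status(batch_id)
        if batch["status"] in ("completed", "failed", "expired", "cancelled"):
            return batch
        time.sleep(poll_interval)
    raise TimeoutError(f"batch {batch_id} not done within {timeout}s")
```

In production you would pass a `get_status` that issues the authenticated GET to `/v1/batches/{id}`; injecting it keeps the retry logic unit-testable.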
Worked example
```jsonl
// input.jsonl — one JSON object per line
{"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"moonshot/kimi-k2.6","messages":[{"role":"user","content":"hi"}]}}
{"custom_id":"req-2","method":"POST","url":"/v1/chat/completions","body":{"model":"anthropic/claude-haiku-4.5","messages":[{"role":"user","content":"hello"}]}}
```

```shell
# 1) upload
curl https://api.aigateway.sh/v1/files \
  -H "Authorization: Bearer $AIG_KEY" \
  -F "purpose=batch" \
  -F "file=@input.jsonl"
# → { "id": "file-abc...", ... }

# 2) create batch
curl https://api.aigateway.sh/v1/batches \
  -H "Authorization: Bearer $AIG_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input_file_id":"file-abc...","endpoint":"/v1/chat/completions"}'
# → { "id": "batch_...", "status": "validating", ... }

# 3) poll
curl https://api.aigateway.sh/v1/batches/batch_... \
  -H "Authorization: Bearer $AIG_KEY"
# → { ..., "status": "completed", "output_file_id": "file-xyz...", "error_file_id": null }

# 4) download results
curl https://api.aigateway.sh/v1/files/file-xyz.../content \
  -H "Authorization: Bearer $AIG_KEY"
```
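Hand-writing JSONL is error-prone; generating it programmatically keeps the `custom_id`s unique and the structure valid. A minimal sketch — `batch_line` is a hypothetical helper, not a gateway SDK function:

```python
import json


def batch_line(custom_id, model, user_content):
    """Build one batch-input JSONL line for /v1/chat/completions."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": user_content}],
        },
    })


prompts = ["hi", "hello"]
lines = [batch_line(f"req-{i}", "moonshot/kimi-k2.6", p)
         for i, p in enumerate(prompts, start=1)]
# one JSON object per line, as the batch endpoint expects
jsonl = "\n".join(lines)
```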
Output format
Each line of the output JSONL mirrors the input: the `custom_id` you sent, a `response.status_code`, and a `response.body` that is the full model response. Failures land in a separate `error_file_id` with the same structure but a non-2xx status.
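Consuming the downloaded file then reduces to splitting lines and bucketing by status code. A sketch: `index_results` is an illustrative helper, and it defensively keys results by `custom_id` rather than assuming output order matches input order.

```python
import json


def index_results(results_jsonl):
    """Bucket batch results by custom_id.

    Works on the output file or the error file, since both share the
    same per-line structure (custom_id + response). Returns two dicts:
    successes (2xx) and failures (everything else), each mapping
    custom_id -> response body.
    """
    ok, failed = {}, {}
    for line in results_jsonl.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        resp = rec["response"]
        bucket = ok if 200 <= resp["status_code"] < 300 else failed
        bucket[rec["custom_id"]] = resp["body"]
    return ok, failed
```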
When to use Batch
- Eval runs — thousands of evaluations against one prompt.
- Bulk classification, extraction, or embedding.
- Any async workload where 24h latency is fine and 50% savings matter.
Batches are billed at the 50% discounted rate the moment the last request completes, not when you download the output file.