Batch API
OpenAI-compatible batch endpoint: 50% off every batched request, 24h SLA, up to 100 requests per batch (MVP cap — raising soon). Works for /v1/chat/completions and /v1/embeddings.
End-to-end flow
- Upload a JSONL file with one request per line to `/v1/files` (`purpose=batch`).
- POST to `/v1/batches` with the `input_file_id`.
- Poll `/v1/batches/{id}` until `status === "completed"`.
- Download the results JSONL from `/v1/files/{output_file_id}/content`.
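The polling step can be wrapped in a small helper. A minimal sketch: `wait_for_batch` and its injected `get_status` fetcher are illustrative names, not part of the gateway's API, and the terminal statuses assume this endpoint mirrors the OpenAI batch convention (`completed` / `failed` / `expired` / `cancelled`).

```python
import time


def wait_for_batch(get_status, batch_id, poll_interval=30, timeout=24 * 3600):
    """Poll until the batch reaches a terminal status.

    get_status: callable taking a batch id and returning the batch object
    as a dict — injected so this sketch stays transport-agnostic.
    Terminal statuses assumed from the OpenAI batch convention.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        batch = get_status(batch_id)
        if batch["status"] in ("completed", "failed", "expired", "cancelled"):
            return batch
        time.sleep(poll_interval)
    raise TimeoutError(f"batch {batch_id} not done within {timeout}s")
```

In production you would pass a `get_status` that issues the authenticated GET to `/v1/batches/{id}`; injecting it keeps the retry logic unit-testable.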
Worked example
```jsonl
// input.jsonl — one JSON object per line
{"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"moonshot/kimi-k2.6","messages":[{"role":"user","content":"hi"}]}}
{"custom_id":"req-2","method":"POST","url":"/v1/chat/completions","body":{"model":"anthropic/claude-haiku-4.5","messages":[{"role":"user","content":"hello"}]}}
```

```shell
# 1) upload
curl https://api.aigateway.sh/v1/files \
  -H "Authorization: Bearer $AIG_KEY" \
  -F "purpose=batch" \
  -F "file=@input.jsonl"
# → { "id": "file-abc...", ... }

# 2) create batch
curl https://api.aigateway.sh/v1/batches \
  -H "Authorization: Bearer $AIG_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input_file_id":"file-abc...","endpoint":"/v1/chat/completions"}'
# → { "id": "batch_...", "status": "validating", ... }

# 3) poll
curl https://api.aigateway.sh/v1/batches/batch_... \
  -H "Authorization: Bearer $AIG_KEY"
# → { ..., "status": "completed", "output_file_id": "file-xyz...", "error_file_id": null }

# 4) download results
curl https://api.aigateway.sh/v1/files/file-xyz.../content \
  -H "Authorization: Bearer $AIG_KEY"
```
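Hand-writing JSONL is error-prone; generating it programmatically keeps the `custom_id`s unique and the structure valid. A minimal sketch — `batch_line` is a hypothetical helper, not a gateway SDK function:

```python
import json


def batch_line(custom_id, model, user_content):
    """Build one batch-input JSONL line for /v1/chat/completions."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": user_content}],
        },
    })


prompts = ["hi", "hello"]
lines = [batch_line(f"req-{i}", "moonshot/kimi-k2.6", p)
         for i, p in enumerate(prompts, start=1)]
# one JSON object per line, as the batch endpoint expects
jsonl = "\n".join(lines)
```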
Output format
Each line of the output JSONL mirrors the input: the `custom_id` you sent, a `response.status_code`, and a `response.body` that is the full model response. Failures land in a separate `error_file_id` with the same structure but a non-2xx status.
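Consuming the downloaded file then reduces to splitting lines and bucketing by status code. A sketch: `index_results` is an illustrative helper, and it defensively keys results by `custom_id` rather than assuming output order matches input order.

```python
import json


def index_results(results_jsonl):
    """Bucket batch results by custom_id.

    Works on the output file or the error file, since both share the
    same per-line structure (custom_id + response). Returns two dicts:
    successes (2xx) and failures (everything else), each mapping
    custom_id -> response body.
    """
    ok, failed = {}, {}
    for line in results_jsonl.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        resp = rec["response"]
        bucket = ok if 200 <= resp["status_code"] < 300 else failed
        bucket[rec["custom_id"]] = resp["body"]
    return ok, failed
```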
When to use Batch
- Eval runs — thousands of evaluations against one prompt.
- Bulk classification, extraction, or embedding.
- Any async workload where 24h latency is fine and 50% savings matter.
Batches are billed at the 50% discounted rate the moment the last request completes, not when you download the output file.