
# Streaming

Pass `stream: true` on any text endpoint. You receive an SSE stream of `chat.completion.chunk` events terminated by `data: [DONE]`. The stream starts the moment the provider emits its first token — typically under 500ms via our edge runtime.

## Wire format

```
data: { "id": "chatcmpl-...", "choices": [{ "delta": { "role": "assistant" } }] }
data: { "choices": [{ "delta": { "content": "Hello" } }] }
data: { "choices": [{ "delta": { "content": " there" }, "finish_reason": "stop" }] }
data: { "usage": { "prompt_tokens": 12, "completion_tokens": 37 } }
data: [DONE]
```
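If you consume the stream without an SDK, the framing above takes only a few lines to parse. A minimal sketch — the helper name `iter_sse_events` is ours, not part of any SDK:

```python
import json

def iter_sse_events(lines):
    """Yield the JSON payload of each SSE 'data:' line, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # stream terminator
        yield json.loads(payload)
```

Feed it the decoded lines of the response body; each yielded dict is one `chat.completion.chunk`.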

## SDK usage

The OpenAI SDKs handle streaming for you. The Python snippet below prints tokens as they arrive:

```python
import os

from openai import OpenAI

client = OpenAI(base_url="https://api.aigateway.sh/v1", api_key=os.environ["AIG_KEY"])

stream = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[{"role": "user", "content": "write a haiku"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

## Reasoning deltas

When the model is reasoning-capable, reasoning tokens arrive first as delta.reasoning_content, then the final answer arrives as delta.content. Render them in separate UI regions — see Reasoning models for the exact shape.
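One way to keep the two regions separate is to accumulate the streams independently. A sketch over parsed chunk dicts — the helper name `split_deltas` is ours, and the exact reasoning field shape is documented under Reasoning models:

```python
def split_deltas(chunks):
    """Collect reasoning and answer tokens from a stream of parsed chunk dicts."""
    reasoning, answer = [], []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])  # thinking tokens arrive first
        if delta.get("content"):
            answer.append(delta["content"])  # then the final answer
    return "".join(reasoning), "".join(answer)
```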

## Tool call deltas

When a tool call is emitted mid-stream, you'll see a delta.tool_calls[].function.arguments string that accumulates chunk by chunk. Parse it only after finish_reason === "tool_calls".
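The accumulation can be sketched as follows, assuming the delta shape above; `collect_tool_calls` is an illustrative helper of ours, not an SDK function:

```python
import json

def collect_tool_calls(chunks):
    """Accumulate tool-call argument fragments; parse the JSON only once complete."""
    calls = {}  # tool_calls[].index -> {"name": str, "arguments": str}
    done = False
    for chunk in chunks:
        choice = chunk["choices"][0]
        for tc in choice.get("delta", {}).get("tool_calls", []) or []:
            slot = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
            fn = tc.get("function", {})
            if fn.get("name"):
                slot["name"] = fn["name"]
            slot["arguments"] += fn.get("arguments", "")  # fragments concatenate in order
        if choice.get("finish_reason") == "tool_calls":
            done = True  # arguments are now complete JSON
    if not done:
        raise ValueError("stream ended before finish_reason == 'tool_calls'")
    return [(c["name"], json.loads(c["arguments"])) for _, c in sorted(calls.items())]
```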

## Cancel a stream

Just close the connection. The edge propagates the cancellation upstream within ~50ms; billing stops at the last fully-delivered chunk. For AbortController in Node / fetch, call controller.abort().
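In Python, breaking out of the iteration and closing the stream object has the same effect. A sketch — the helper `take_first_n` is ours, and we assume the SDK's stream object exposes a `close()` method (check your SDK version):

```python
def take_first_n(stream, n):
    """Consume at most n content deltas, then cancel by closing the stream."""
    parts = []
    try:
        for chunk in stream:
            delta = chunk.choices[0].delta
            if delta.content:
                parts.append(delta.content)
            if len(parts) >= n:
                break  # stop consuming; generation is still running upstream
    finally:
        stream.close()  # dropping the connection cancels upstream generation
    return "".join(parts)
```

Call it with the stream returned by `client.chat.completions.create(..., stream=True)`.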

## Edge cases