Streaming
Pass stream: true on any text endpoint. You receive an SSE stream of chat.completion.chunk events terminated by data: [DONE]. The stream starts the moment the provider emits its first token — typically under 500ms via our edge runtime.
Wire format
data: { "id": "chatcmpl-...", "choices": [{ "delta": { "role": "assistant" } }] }
data: { "choices": [{ "delta": { "content": "Hello" } }] }
data: { "choices": [{ "delta": { "content": " there" }, "finish_reason": "stop" }] }
data: { "usage": { "prompt_tokens": 12, "completion_tokens": 37 } }
data: [DONE]
SDK usage
The OpenAI SDKs handle streaming for you. The Python snippet below prints tokens as they arrive:
```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key=os.environ["AIG_KEY"],
)

stream = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[{"role": "user", "content": "write a haiku"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
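If you are not using an SDK, the frames are easy to parse by hand. A minimal sketch of an SSE line parser, assuming chunks arrive as decoded text lines (the helper name and the use of `requests` are illustrative, not part of the gateway's API):

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed JSON payloads from SSE 'data:' lines, stopping at [DONE]."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # terminal sentinel: stop iterating
        yield json.loads(payload)

# With requests, stream=True yields decoded lines to feed in directly:
# resp = requests.post(url, json=body, stream=True, headers=headers)
# for chunk in iter_sse_chunks(resp.iter_lines(decode_unicode=True)):
#     delta = chunk["choices"][0]["delta"]
#     print(delta.get("content", ""), end="", flush=True)
```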
Reasoning deltas
When the model is reasoning-capable, reasoning tokens arrive first as delta.reasoning_content, then the final answer arrives as delta.content. Render them in separate UI regions — see Reasoning models for the exact shape.
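A minimal sketch of routing the two fields into separate buffers, treating each delta as a plain dict in the wire-format shape above (the helper name is illustrative):

```python
def route_delta(delta, reasoning_buf, answer_buf):
    """Append reasoning tokens and answer tokens to separate buffers."""
    if delta.get("reasoning_content"):
        reasoning_buf.append(delta["reasoning_content"])  # thinking region
    if delta.get("content"):
        answer_buf.append(delta["content"])  # final-answer region
```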
Tool call deltas
When a tool call is emitted mid-stream, you'll see a delta.tool_calls[].function.arguments string that accumulates chunk by chunk. Parse it only after finish_reason === "tool_calls".
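A sketch of accumulating the argument fragments until the finish signal, assuming chunks are plain dicts in the wire-format shape (the helper name is an assumption):

```python
import json

def accumulate_tool_calls(chunks):
    """Concatenate streamed tool-call argument fragments, keyed by index.

    The arguments string is only valid JSON once finish_reason is
    "tool_calls", so parsing is deferred until then.
    """
    calls = {}
    for chunk in chunks:
        choice = chunk["choices"][0]
        for tc in choice["delta"].get("tool_calls", []):
            entry = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
            fn = tc.get("function", {})
            if fn.get("name"):
                entry["name"] = fn["name"]  # name arrives in the first fragment
            entry["arguments"] += fn.get("arguments", "")
        if choice.get("finish_reason") == "tool_calls":
            return {i: {"name": c["name"], "arguments": json.loads(c["arguments"])}
                    for i, c in calls.items()}
    return {}
```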
Cancel a stream
Just close the connection. The edge propagates the cancellation upstream within ~50ms; billing stops at the last fully-delivered chunk. For AbortController in Node / fetch, call controller.abort().
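In Python, stopping early means breaking out of the loop and closing the stream object, which drops the connection. A sketch, assuming the stream is any iterable of chunk dicts with a close() method (the SDK's stream object fits this shape; the helper name is illustrative):

```python
def take_tokens(stream, limit):
    """Consume at most `limit` content tokens, then close the stream.

    Closing the underlying connection is what signals cancellation;
    billing stops at the last fully-delivered chunk.
    """
    tokens = []
    try:
        for chunk in stream:
            content = chunk["choices"][0]["delta"].get("content")
            if content:
                tokens.append(content)
            if len(tokens) >= limit:
                break  # stop early: we have enough output
    finally:
        stream.close()  # drops the connection -> upstream cancel
    return tokens
```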
Edge cases
- Reconnects are not supported — SSE streams are single-shot. On transient disconnects, retry the request.
- Behind certain proxies, streams are buffered. Set X-Accel-Buffering: no on your own reverse proxy if you see delayed tokens.
- usage is emitted as a separate chunk just before [DONE]; you don't need stream_options to enable it, as it's always on.
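Since streams are single-shot, a transient disconnect means re-issuing the whole request. A simple retry sketch with exponential backoff (the helper names and the choice of ConnectionError as the retryable exception are assumptions):

```python
import time

def stream_with_retry(start_stream, consume, attempts=3, backoff=0.5):
    """Restart a single-shot SSE stream on transient failure.

    Streams cannot be resumed mid-way, so each retry re-issues the
    full request: start_stream opens a new stream, consume reads it
    to completion and returns the result.
    """
    for attempt in range(attempts):
        try:
            return consume(start_stream())
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
```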