Question 1

How do I stream responses from AIgateway?

Accepted Answer

Set stream: true on the chat completion request. The response is OpenAI-compatible SSE — every chunk starts with 'data: ' and ends with 'data: [DONE]'. The OpenAI SDK iterates over chunks natively.

Question 2

Do all models support streaming?

Accepted Answer

Every text/reasoning model in the catalog streams. Image, video, and music generations are async (submit → poll or webhook) because the upstream providers don't stream those.

Question 3

What's the first-token latency on AIgateway?

Accepted Answer

Under 500ms p50 globally. The gateway runs on 300+ edge PoPs so the TLS + auth + routing work finishes before the upstream provider produces a token.

Question 4

Can I stream tool calls and reasoning traces?

Accepted Answer

Yes. Tool-call arguments arrive as delta fragments. Reasoning traces are normalized into a 'thinking' event type across Claude, OpenAI o-series, and DeepSeek R1.

Streaming

Wire format

SDK usage

Reasoning deltas

Tool call deltas

Cancel a stream

Edge cases