Alibaba Qwen

Qwq-32b

reasoning

QwQ is the reasoning model of the Qwen series. Unlike conventional instruction-tuned models, QwQ thinks through a problem before answering, which significantly improves performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model in the series and is competitive with state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

MODALITIES

text

INPUT

$0.200 /1M

OUTPUT

$0.400 /1M

CONTEXT

24K tok

MAX OUTPUT

24K tok

RELEASED

2025-03-05

Qwq-32b (qwen/qwq-32b) is a reasoning model from Alibaba Qwen, released 2025-03-05. Context window: 24,000 tokens; max output: 24,000 tokens. Pricing via AIgateway: input $0.200 per 1M tokens, output $0.400 per 1M tokens. Capabilities: streaming, JSON mode, reasoning. Call it at https://api.aigateway.sh/v1/chat/completions with the OpenAI SDK by setting model="qwen/qwq-32b". Best for: math, code review, planning.

model · qwen/qwq-32b
family · Qwen

Use this model

model: qwen/qwq-32b

curl https://api.aigateway.sh/v1/chat/completions \
  -H "Authorization: Bearer $AIGATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen/qwq-32b","messages":[{"role":"user","content":"hello"}],"stream":true}'

Capabilities

Streaming
JSON mode
Reasoning

Strengths

Strong reasoning at 32B scale
Open-weight
Fast

Use cases

Math
Code review
Planning

Pricing

Input: $0.200 / 1M tokens
Output: $0.400 / 1M tokens

Open-weight

You pay pass-through pricing.
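
The per-token prices above translate into a per-call cost with simple arithmetic. The sketch below is only an illustration: the function name `estimate_cost_usd` is hypothetical, and it assumes the token counts come from the `usage` object of a response and that billing is exactly linear in those counts.

```python
from decimal import Decimal

# Prices from the pricing table above, in USD per 1M tokens.
INPUT_USD_PER_M = Decimal("0.200")
OUTPUT_USD_PER_M = Decimal("0.400")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the USD cost of one call from its token counts."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = INPUT_USD_PER_M * prompt_tokens / TOKENS_PER_M
    output_cost = OUTPUT_USD_PER_M * completion_tokens / TOKENS_PER_M
    return input_cost + output_cost
```

`Decimal` is used instead of `float` so that small per-call amounts add up without rounding drift when summed over many calls.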

API schema

Call Qwq-32b from any OpenAI SDK

POST https://api.aigateway.sh/v1/chat/completions · Content-Type: application/json · Auth: Bearer sk-aig-...

Request body

json

{
  "model": "qwen/qwq-32b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "Hello!" }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 1024,
  "stream": false,
  "response_format": { "type": "json_object" }
}
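
As a rough illustration of the request body above, the sketch below sends the same fields through an OpenAI-style client and decodes the JSON reply. The helper name `ask_json` and the system prompt are hypothetical, and it assumes the model returns a bare JSON object in `message.content`; if a reasoning model adds other text around the object, `json.loads` will raise.

```python
import json
from typing import Any


def ask_json(client: Any, question: str) -> Any:
    """Send one JSON-mode request and return the decoded reply.

    `client` is any object exposing the OpenAI SDK's
    chat.completions.create(...) method, e.g. openai.OpenAI(...).
    """
    response = client.chat.completions.create(
        model="qwen/qwq-32b",
        messages=[
            # Assumption: asking for JSON in the prompt helps JSON mode behave.
            {"role": "system", "content": "Reply with a single JSON object."},
            {"role": "user", "content": question},
        ],
        temperature=0.7,
        top_p=0.95,
        max_tokens=1024,
        stream=False,
        response_format={"type": "json_object"},
    )
    # Raises json.JSONDecodeError if the content is not valid JSON.
    return json.loads(response.choices[0].message.content)
```

With a client pointed at https://api.aigateway.sh/v1, `ask_json(client, "What is 2 + 2?")` would return whatever object the model produces; the keys of that object are not fixed by the API.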

Response

json

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1776947082,
  "model": "qwen/qwq-32b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 12,
    "total_tokens": 36
  }
}
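
To show how the fields of this response are typically read, here is a minimal sketch that works on the raw JSON as a Python dict. The function name `summarize_completion` is hypothetical, and treating `finish_reason == "length"` as truncation is an assumption carried over from the OpenAI convention.

```python
from typing import Any


def summarize_completion(response: dict[str, Any]) -> dict[str, Any]:
    """Pull the reply text, finish reason, and token usage out of a response."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # Assumption: "length" means the reply hit max_tokens and was cut off.
        "truncated": choice["finish_reason"] == "length",
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
    }


# The sample response shown above, as a Python dict.
sample_response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1776947082,
    "model": "qwen/qwq-32b",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I help you today?",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 12, "total_tokens": 36},
}

summary = summarize_completion(sample_response)
```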

Streaming (SSE) — set `"stream": true`

// 1. Role announcement (first chunk):
data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

// 2. Content chunks (final answer):
data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

// 3. Finish chunk:
data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

// 4. Terminator:
data: [DONE]
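
If you consume the stream without an SDK, the chunk sequence above can be reassembled by hand. The following is a minimal sketch using only the standard library; the function name `iter_content` is hypothetical, and it assumes each event arrives as a single `data:` line, as in the sample.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE lines until the [DONE] terminator."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The role and finish chunks carry no "content" key.
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# The sample stream shown above, one string per line.
SAMPLE = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

reply = "".join(iter_content(SAMPLE))
```

In a real client, `lines` would be the decoded lines of the HTTP response body rather than a list.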

Quickstart

# pip install aigateway-py openai
# aigateway-py adds sub-accounts, evals, replays, jobs, webhook verify.
# openai SDK covers chat — drop-in per our SDK's own guidance.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

stream = client.chat.completions.create(
    model="qwen/qwq-32b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Errors

401 · authentication_error · Invalid or missing API key
402 · insufficient_credits · Wallet empty (PAYG only)
404 · not_found · Unknown model or endpoint
429 · rate_limit_error · Over per-minute limit; see Retry-After header
500 · server_error · Upstream provider failed (retryable)
503 · service_unavailable · Upstream saturated (retryable)
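
One way to act on the error list above is to retry only the statuses that can succeed on a second attempt. The sketch below is an illustration, not AIgateway's recommended policy: the function name `retry_delay`, the backoff constants, and the choice to treat 429 as retryable are assumptions, and it only understands a `Retry-After` value given in seconds.

```python
import random
from typing import Optional

# 500 and 503 are marked retryable above; 429 is retried after waiting.
RETRYABLE_STATUS = {429, 500, 503}


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if retrying is pointless."""
    if status not in RETRYABLE_STATUS:
        return None  # 401, 402, 404: fix the request or account instead
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # header was not a number of seconds; fall back to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would loop over attempts, sleep for the returned delay, and stop when the function returns `None` or a maximum attempt count is reached.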

Frequently asked questions

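What is Qwq-32b?

Qwq-32b (qwen/qwq-32b) is a reasoning model from Alibaba Qwen, released 2025-03-05. It is the medium-sized model in the QwQ line and is competitive with reasoning models such as DeepSeek-R1 and o1-mini.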

How much does Qwq-32b cost via AIgateway?

Input costs $0.200 per 1M tokens; output costs $0.400 per 1M tokens, billed pass-through.

What is the context window of Qwq-32b?

24,000 tokens. Maximum output is 24,000 tokens.

How do I call Qwq-32b from my code?

Point the OpenAI SDK at https://api.aigateway.sh/v1 with your AIgateway key and set model to "qwen/qwq-32b". The request and response shapes match OpenAI exactly.

Does Qwq-32b support streaming, tool calling, vision, and JSON mode?

Streaming — yes. Tool calling — no. Vision — no. JSON mode — yes. Prompt caching — no.

What are the best use cases for Qwq-32b?

Math, Code review, Planning. Key strengths: Strong reasoning at 32B scale; Open-weight; Fast.

Can I bring my own Alibaba Qwen API key (BYOK)?

Yes. Attach an Alibaba Qwen key in your AIgateway dashboard and this model switches to pass-through: you pay Alibaba Qwen directly and AIgateway adds no platform fee on those calls.