Kimi K2.6
Moonshot's frontier open-weight agent model. 262K context, native tool calling, vision, and extended reasoning. Served through AIgateway with an OpenAI-compatible API — drop-in compatible with the OpenAI SDK, Cursor, Cline, LangChain, Vercel AI SDK, and anything else that speaks OpenAI.
Quickstart (60 seconds)
Every OpenAI SDK works — just change base_url. Here's the Python version:
from openai import OpenAI
client = OpenAI(
base_url="https://api.aigateway.sh/v1",
api_key="sk-aig-...",
)
r = client.chat.completions.create(
model="moonshot/kimi-k2.6",
messages=[{"role": "user", "content": "Write a Python one-liner to reverse a string."}],
)
print(r.choices[0].message.content)
Model card
- Slug: moonshot/kimi-k2.6
- Provider: Moonshot (served on the edge via AIgateway)
- Context window: 262,144 tokens (~700 pages of text, most mid-sized repos fit whole)
- Max output: 16,384 tokens
- Modality: Text + vision
- Capabilities: Streaming, tool calling, JSON mode, extended reasoning, vision
- Pricing (after trial): $0.95 / 1M input tokens, $4.00 / 1M output tokens. Cache hits bill at 10% of the uncached rate. Pricing is pass-through: our 5% fee applies at credit top-up, not per call.
Request
Full OpenAI chat.completions body. Everything is optional except model and messages:
{
"model": "moonshot/kimi-k2.6",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Hello!" }
],
"temperature": 0.7,
"top_p": 0.95,
"max_tokens": 4096,
"stream": false,
"tools": [ /* OpenAI function spec — see below */ ],
"tool_choice": "auto",
"parallel_tool_calls": true,
"response_format": { "type": "json_object" }
}
Response (non-streaming)
Two fields are non-obvious on Kimi: reasoning_content (chain of thought) and tool_calls (when the model wants to call a function).
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1776947082,
"model": "moonshot/kimi-k2.6",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The weather in Tokyo is sunny.",
"reasoning_content": "The user asked about Tokyo. I should call the tool...",
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\":\"Tokyo\"}"
}
}
]
},
"finish_reason": "tool_calls"
}
],
"usage": {
"prompt_tokens": 53,
"completion_tokens": 79,
"total_tokens": 132
}
}
If you don't want to show the chain of thought, just read content: reasoning_content is additive, not required. Cursor, Cline, and the OpenAI SDK all ignore it by default.
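As a minimal sketch of the point above, the helper below separates the visible answer from the optional chain of thought. It works on a plain dict shaped like the sample response in this section; `split_message` is a hypothetical name used for illustration, not part of any SDK.

```python
from typing import Optional, Tuple


def split_message(message: dict) -> Tuple[str, Optional[str]]:
    """Return (answer, reasoning) from an assistant message dict.

    Assumes the message shape shown in the sample response above.
    reasoning_content may be missing, so it is read with .get().
    """
    answer = message.get("content") or ""  # content can be null on some turns
    reasoning = message.get("reasoning_content")  # None when not present
    return answer, reasoning


# Message fields copied from the sample response in this section.
sample = {
    "role": "assistant",
    "content": "The weather in Tokyo is sunny.",
    "reasoning_content": "The user asked about Tokyo. I should call the tool...",
}

answer, reasoning = split_message(sample)
print(answer)  # show only the final answer; log `reasoning` separately if wanted
```

Keeping the two fields apart at the point where the response is read makes it easy to hide the reasoning in a UI while still logging it for debugging.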
Streaming (SSE)
Set "stream": true and read text/event-stream. Kimi emits chunks in this order:
// 1. Role
data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
// 2. Reasoning (Kimi thinks first — stream of reasoning_content)
data: {"choices":[{"index":0,"delta":{"reasoning_content":"The "},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"reasoning_content":"user "},"finish_reason":null}]}
// ... many chunks ...
// 3. Content (the actual answer)
data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
// 4. Finish
data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Tool calling
Kimi speaks the standard OpenAI tool-call protocol. Define tools, Kimi decides when to call them, you execute the call and feed the result back as a role: "tool" message.
import json  # needed for json.dumps in turn 2; `client` is the OpenAI client from the Quickstart
# Define a tool
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"],
},
},
}]
# Turn 1 — Kimi asks to call the tool
r = client.chat.completions.create(
model="moonshot/kimi-k2.6",
messages=[{"role": "user", "content": "Weather in Tokyo?"}],
tools=tools,
)
call = r.choices[0].message.tool_calls[0]
# call.function.name == "get_weather"
# call.function.arguments == '{"city":"Tokyo"}'
# You execute the tool
result = {"temperature_c": 22, "conditions": "sunny"}
# Turn 2 — feed the result back, Kimi writes the final answer
r2 = client.chat.completions.create(
model="moonshot/kimi-k2.6",
messages=[
{"role": "user", "content": "Weather in Tokyo?"},
r.choices[0].message, # assistant turn with tool_calls
{"role": "tool", "tool_call_id": call.id,
"content": json.dumps(result)},
],
tools=tools,
)
print(r2.choices[0].message.content)
Streaming tool calls
When streaming with tools, the arguments string arrives as fragments. Concatenate by index until finish_reason === "tool_calls":
data: {"choices":[{"index":0,"delta":{"tool_calls":[
{"index":0,"id":"call_abc","type":"function",
"function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[
{"index":0,"function":{"arguments":"{\"city\":"}}]},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[
{"index":0,"function":{"arguments":"\"Tokyo\"}"}}]},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}
Use Kimi in Cursor (free agent mode)
Cursor's agent mode speaks full OpenAI-compat, so Kimi slots in:
- Get a key at aigateway.sh
- In Cursor: Settings → Models → "Override OpenAI Base URL"
  Base URL: https://api.aigateway.sh/v1
  Add model: moonshot/kimi-k2.6
- Code.
Agent mode, tool calls, multi-turn conversations — all work. Tab autocomplete and Cmd-K still use Cursor's own backend (hardwired on their side), but the chat + agent panel is fully yours.
Use Kimi in Cline
# Cline: Settings → API Provider → "OpenAI Compatible"
Base URL: https://api.aigateway.sh/v1
API key: sk-aig-...
Model ID: moonshot/kimi-k2.6
Use Kimi in LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="moonshot/kimi-k2.6",
base_url="https://api.aigateway.sh/v1",
api_key="sk-aig-...",
)
print(llm.invoke("Hello!").content)
Use Kimi in Vercel AI SDK
import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";
const aigateway = createOpenAI({
baseURL: "https://api.aigateway.sh/v1",
apiKey: process.env.AIG_KEY,
});
const result = await streamText({
model: aigateway("moonshot/kimi-k2.6"),
prompt: "Hello!",
});
for await (const chunk of result.textStream) process.stdout.write(chunk);
Vision input
Kimi sees images. Pass them as OpenAI-style content parts:
{
"model": "moonshot/kimi-k2.6",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "What is in this image?" },
{ "type": "image_url",
"image_url": { "url": "https://example.com/cat.jpg" } }
]
}
]
}
Pricing worked example
A typical agent turn: ~500 input tokens (conversation + tool defs), ~300 output tokens. During the trial this is $0. After the trial:
- Input: 500 × $0.95 / 1,000,000 = $0.000475
- Output: 300 × $4.00 / 1,000,000 = $0.0012
- Total: ~$0.0017 per turn — about 600 agent turns per dollar.
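Cache hits bill at 10% of the uncached rate. Assuming the discount applies to cached input tokens only (output is generated fresh every turn), the best case for a prefix-cached agent loop (system prompt + tool definitions unchanged turn-to-turn) is that all 500 input tokens hit the cache: input drops to about $0.00005 while output stays at $0.0012, so the turn bills at roughly $0.00125, or about 800 turns per dollar.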
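The uncached arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the two prices are copied from the model card, and `turn_cost` is a hypothetical helper name. It deliberately ignores cache discounts and the top-up fee.

```python
# Prices copied from the model card (USD per 1M tokens, after the trial).
INPUT_PRICE_PER_M = 0.95
OUTPUT_PRICE_PER_M = 4.00


def turn_cost(input_tokens: int, output_tokens: int) -> float:
    """Uncached cost in USD of one request with the given token counts."""
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


# The "typical agent turn" from the worked example: 500 in, 300 out.
cost = turn_cost(input_tokens=500, output_tokens=300)
print(f"${cost:.6f} per turn, about {1 / cost:.0f} turns per dollar")
```

Swapping in your own token counts (for example from the `usage` block of a real response) gives a quick estimate for a different workload.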
Limits
- Free tier: 100 Kimi requests / day, 10 requests / minute
- PAYG (with credits): 1,000,000 Kimi requests / day, 10,000 requests / minute
- Trial ends: 2026-04-30 (UTC). After that, PAYG only.
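One way to stay under a per-minute limit is to pace requests on the client. The sketch below is a simple sliding-window limiter; `MinuteLimiter` is a hypothetical class written for illustration, and it assumes the limit is counted over a rolling 60-second window, which may not match how the gateway actually counts.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteLimiter:
    """Client-side pacing: allow at most max_per_minute calls per 60 seconds."""

    def __init__(self, max_per_minute: int = 10,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_per_minute = max_per_minute
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent: Deque[float] = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.max_per_minute:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def record(self) -> None:
        """Call right after sending a request."""
        self.sent.append(self.clock())


# Usage: check the limiter before each request (free tier: 10 requests / minute).
limiter = MinuteLimiter(max_per_minute=10)
delay = limiter.wait_time()
if delay > 0:
    time.sleep(delay)
limiter.record()  # then send the request
```

This only smooths out bursts from a single process; the daily cap and any other clients sharing the same key are not accounted for.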
Benchmarks
Moonshot's published numbers against the common suite:
- MMLU: ~86% — close to Claude Sonnet 4.5
- HumanEval: ~93% — on par with GPT-5.4 on straightforward code
- SWE-Bench: ~58% — meaningfully behind Opus 4.7 on hairy refactors, ahead of Haiku 4.5
- Tool-use: strong on multi-step agent loops, clean argument JSON
In our own eyeballed coding eval across ~200 prompts, Kimi K2.6 is roughly Sonnet-tier on day-to-day agent work and noticeably behind Opus on multi-file architectural refactors.
Common errors
- 429 rate_limit_error: over 10 req/min on free tier. Add credits to unlock 10,000 req/min.
- 402 insufficient_credits: trial exhausted and no wallet balance. Add credits.
- 503 service_unavailable: upstream saturation on Kimi. Retryable; default retry-after is 2 seconds.
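The error list above suggests a simple retry policy: back off on 429 and 503, and stop immediately on 402, since retrying cannot add credits. The following is a sketch under stated assumptions. `retry_delay` and `call_with_retry` are hypothetical helpers, `send` stands in for whatever function performs the HTTP call and returns `(status, body)`, and the doubling backoff that starts from the 2-second default is an illustrative choice, not documented gateway behavior.

```python
import time
from typing import Any, Callable, Optional, Tuple

RETRYABLE = {429, 503}  # 402 is excluded: it needs credits, not a retry


def retry_delay(status: int, attempt: int) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error is not retryable.

    Assumption: exponential backoff starting at 2 seconds (2, 4, 8, ...).
    """
    if status not in RETRYABLE:
        return None
    return 2.0 * (2 ** attempt)


def call_with_retry(send: Callable[[], Tuple[int, Any]],
                    max_attempts: int = 4,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() until it returns HTTP 200 or the attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        delay = retry_delay(status, attempt)
        if delay is None or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

If the response carries a Retry-After header, honoring that value instead of the computed delay is usually the better choice.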