Kimi K2.6

Moonshot's frontier open-weight agent model. 262K context, native tool calling, vision, and extended reasoning. Served through AIgateway with an OpenAI-compatible API — drop-in compatible with the OpenAI SDK, Cursor, Cline, LangChain, Vercel AI SDK, and anything else that speaks OpenAI.

Free through April 30, 2026. No card required. 100 requests per day per account during the trial window — sign up, grab a key, and you're calling it in ~30 seconds.

Quickstart (60 seconds)

Every OpenAI SDK works — just change base_url. Here's the Python version:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

r = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[{"role": "user", "content": "Write a Python one-liner to reverse a string."}],
)
print(r.choices[0].message.content)

Model card

Slug: moonshot/kimi-k2.6
Provider: Moonshot (served on the edge via AIgateway)
Context window: 262,144 tokens (~700 pages of text, most mid-sized repos fit whole)
Max output: 16,384 tokens
Modality: Text + vision
Capabilities: Streaming, tool calling, JSON mode, extended reasoning, vision
Pricing (after trial): $0.95 / 1M input tokens, $4.00 / 1M output tokens. Cache hits at 10% of uncached. Pass-through — our 5% fee applies at credit top-up, not per call.

Request

Full OpenAI chat.completions body. Everything is optional except model and messages:

{
  "model": "moonshot/kimi-k2.6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "Hello!" }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 4096,
  "stream": false,

  "tools": [ /* OpenAI function spec — see below */ ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,

  "response_format": { "type": "json_object" }
}
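
Because the body above is plain JSON over HTTP, an SDK is not strictly required. The sketch below builds (but does not send) such a request with Python's standard library. It assumes the chat endpoint sits at the usual OpenAI path, /chat/completions under the base URL, and that the key travels as a Bearer token; both follow from OpenAI compatibility but are not spelled out in this guide. build_chat_request is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.aigateway.sh/v1"

def build_chat_request(api_key, messages, model="moonshot/kimi-k2.6", **options):
    """Build (but do not send) a chat.completions request.

    Assumption: the endpoint is BASE_URL + "/chat/completions" with Bearer
    auth, as on other OpenAI-compatible servers.
    """
    if not messages:
        raise ValueError("messages must contain at least one message")
    # model and messages are required; everything in options is optional.
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "sk-aig-...",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=4096,
)
print(req.get_method(), req.full_url)
```

To actually send it, pass req to urllib.request.urlopen and decode the JSON response.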

Response (non-streaming)

Two fields are non-obvious on Kimi: reasoning_content (chain of thought) and tool_calls (when the model wants to call a function).

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1776947082,
  "model": "moonshot/kimi-k2.6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "reasoning_content": "The user asked about Tokyo. I should call the tool...",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 53,
    "completion_tokens": 79,
    "total_tokens": 132
  }
}

If you don't want to show the chain of thought, just read content — reasoning_content is additive, not required. Cursor, Cline, and the OpenAI SDK all ignore it by default.
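
As a rough illustration, the three interesting parts of a response can be pulled apart like this. The sketch works on the raw JSON decoded into a dict (with the OpenAI SDK you would use attribute access instead); summarize_message is a hypothetical helper, and the sample dict is a trimmed copy of the shape shown above.

```python
import json

def summarize_message(response):
    """Return (content, reasoning, calls) from a chat.completion dict.

    calls is a list of (name, arguments) pairs. The arguments field arrives
    as a JSON-encoded string, so it is decoded here.
    """
    message = response["choices"][0]["message"]
    calls = [
        (tc["function"]["name"], json.loads(tc["function"]["arguments"]))
        for tc in message.get("tool_calls") or []
    ]
    return message.get("content"), message.get("reasoning_content"), calls

sample = {
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": None,
            "reasoning_content": "The user asked about Tokyo. I should call the tool...",
            "tool_calls": [{
                "id": "call_abc123",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city":"Tokyo"}'},
            }],
        },
        "finish_reason": "tool_calls",
    }],
}

content, reasoning, calls = summarize_message(sample)
print(calls)
```

Using .get() for reasoning_content and tool_calls keeps the helper safe on responses where either field is absent.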

Streaming (SSE)

Set "stream": true and read text/event-stream. Kimi emits chunks in this order:

// 1. Role
data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

// 2. Reasoning (Kimi thinks first — stream of reasoning_content)
data: {"choices":[{"index":0,"delta":{"reasoning_content":"The "},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"reasoning_content":"user "},"finish_reason":null}]}
// ... many chunks ...

// 3. Content (the actual answer)
data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

// 4. Finish
data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
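
A minimal sketch of consuming that stream by hand, assuming you already have the response as an iterable of text lines (the // annotations above are commentary, not part of the wire format). collect_stream is a hypothetical helper; it keeps reasoning and answer text in separate buffers so the chain of thought can be shown, logged, or dropped independently.

```python
import json

def collect_stream(lines):
    """Accumulate reasoning text, answer text, and the finish reason from SSE lines."""
    reasoning, content, finish_reason = [], [], None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        choice = json.loads(payload)["choices"][0]
        delta = choice.get("delta", {})
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            content.append(delta["content"])
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(reasoning), "".join(content), finish_reason

sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"The "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"user "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

thoughts, answer, reason = collect_stream(sample_lines)
print(answer, reason)
```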

Tool calling

Kimi speaks the standard OpenAI tool-call protocol. Define tools, Kimi decides when to call them, you execute the call and feed the result back as a role: "tool" message.

import json  # needed for json.dumps below; client is the one from the quickstart

# Define a tool
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Turn 1 — Kimi asks to call the tool
r = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
)
call = r.choices[0].message.tool_calls[0]
# call.function.name == "get_weather"
# call.function.arguments == '{"city":"Tokyo"}'

# You execute the tool
result = {"temperature_c": 22, "conditions": "sunny"}

# Turn 2 — feed the result back, Kimi writes the final answer
r2 = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[
        {"role": "user", "content": "Weather in Tokyo?"},
        r.choices[0].message,                         # assistant turn with tool_calls
        {"role": "tool", "tool_call_id": call.id,
         "content": json.dumps(result)},
    ],
    tools=tools,
)
print(r2.choices[0].message.content)
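
The example above handles only the first tool call, but with parallel_tool_calls enabled a single turn may carry several. One way to cover that case is a small dispatcher that runs every requested call and produces one role "tool" message per call. This is a sketch under assumptions: it works on tool calls as plain dicts, and TOOL_REGISTRY, run_tool_calls, and the stubbed get_weather are hypothetical names, not part of any SDK.

```python
import json

def get_weather(city):
    # Hypothetical stand-in for a real weather lookup.
    return {"city": city, "temperature_c": 22, "conditions": "sunny"}

# Maps tool names (as declared in the tools list) to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool_calls(tool_calls):
    """Execute each requested call and build the matching role "tool" replies."""
    replies = []
    for call in tool_calls:
        name = call["function"]["name"]
        try:
            args = json.loads(call["function"]["arguments"])
            result = TOOL_REGISTRY[name](**args)
        except (KeyError, TypeError, json.JSONDecodeError) as exc:
            # Report the failure to the model instead of crashing the loop.
            result = {"error": f"{type(exc).__name__}: {exc}"}
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must echo the id of the call it answers
            "content": json.dumps(result),
        })
    return replies

requested = [
    {"id": "call_abc123", "type": "function",
     "function": {"name": "get_weather", "arguments": '{"city":"Tokyo"}'}},
    {"id": "call_def456", "type": "function",
     "function": {"name": "unknown_tool", "arguments": "{}"}},
]
tool_messages = run_tool_calls(requested)
print(tool_messages[0]["content"])
```

Append the returned messages after the assistant turn, then make the follow-up request as in Turn 2.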

Streaming tool calls

When streaming with tools, the arguments string arrives as fragments. Concatenate by index until finish_reason === "tool_calls":

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"id":"call_abc","type":"function",
   "function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"function":{"arguments":"{\"city\":"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"function":{"arguments":"\"Tokyo\"}"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}
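
The concatenation step might look like the following sketch, which takes chunks already decoded from their data: lines and merges fragments by their index field. merge_tool_call_deltas is a hypothetical helper name; the sample chunks mirror the events above.

```python
import json

def merge_tool_call_deltas(chunks):
    """Merge streamed tool_call fragments into complete calls, keyed by index."""
    calls = {}
    for chunk in chunks:
        choice = chunk["choices"][0]
        for frag in choice.get("delta", {}).get("tool_calls") or []:
            call = calls.setdefault(
                frag["index"], {"id": None, "name": None, "arguments": ""}
            )
            if frag.get("id"):
                call["id"] = frag["id"]  # id and name arrive on the first fragment
            fn = frag.get("function", {})
            if fn.get("name"):
                call["name"] = fn["name"]
            call["arguments"] += fn.get("arguments") or ""
        if choice.get("finish_reason") == "tool_calls":
            break  # arguments are complete only at this point
    return [calls[i] for i in sorted(calls)]

chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_abc", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city":'}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Tokyo"}'}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
]

merged = merge_tool_call_deltas(chunks)
print(merged[0]["name"], json.loads(merged[0]["arguments"]))
```

Parsing the arguments only after the final chunk matters: any intermediate prefix such as {"city": is not valid JSON.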

Use Kimi in Cursor (free agent mode)

Cursor's agent mode speaks full OpenAI-compat, so Kimi slots in:

1. Get a key at aigateway.sh
2. In Cursor: Settings → Models → "Override OpenAI Base URL"
3. Base URL: https://api.aigateway.sh/v1
4. Add model: moonshot/kimi-k2.6
5. Code.

Agent mode, tool calls, multi-turn conversations — all work. Tab autocomplete and Cmd-K still use Cursor's own backend (hardwired on their side), but the chat + agent panel is fully yours.

Use Kimi in Cline

# Cline: Settings → API Provider → "OpenAI Compatible"
Base URL:  https://api.aigateway.sh/v1
API key:   sk-aig-...
Model ID:  moonshot/kimi-k2.6

Use Kimi in LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="moonshot/kimi-k2.6",
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)
print(llm.invoke("Hello!").content)

Use Kimi in Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";

const aigateway = createOpenAI({
  baseURL: "https://api.aigateway.sh/v1",
  apiKey: process.env.AIG_KEY,
});

const result = await streamText({
  model: aigateway("moonshot/kimi-k2.6"),
  prompt: "Hello!",
});
for await (const chunk of result.textStream) process.stdout.write(chunk);

Vision input

Kimi sees images. Pass them as OpenAI-style content parts:

{
  "model": "moonshot/kimi-k2.6",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url",
          "image_url": { "url": "https://example.com/cat.jpg" } }
      ]
    }
  ]
}
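
A small helper can assemble that message shape in Python. This is a sketch: image_part and vision_message are hypothetical names, and the bytes branch assumes the gateway accepts base64 data: URLs the way the OpenAI API does, which this guide does not confirm.

```python
import base64

def image_part(source, mime_type="image/png"):
    """Build an image_url content part from a URL string or raw image bytes."""
    if isinstance(source, bytes):
        # Assumption: base64 data URLs are accepted, as on the OpenAI API.
        encoded = base64.b64encode(source).decode("ascii")
        url = f"data:{mime_type};base64,{encoded}"
    else:
        url = source
    return {"type": "image_url", "image_url": {"url": url}}

def vision_message(text, *images):
    """Build a user message with one text part followed by the image parts."""
    parts = [{"type": "text", "text": text}]
    parts.extend(image_part(img) for img in images)
    return {"role": "user", "content": parts}

msg = vision_message("What is in this image?", "https://example.com/cat.jpg")
print(msg["content"][1]["image_url"]["url"])
```

Pass the result as an element of messages in a normal chat.completions call.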

Pricing worked example

A typical agent turn: ~500 input tokens (conversation + tool defs), ~300 output tokens. During the trial this is $0. After the trial:

Input: 500 × $0.95 / 1,000,000 = $0.000475
Output: 300 × $4.00 / 1,000,000 = $0.0012
Total: ~$0.0017 per turn — about 600 agent turns per dollar.

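Cache hits bill at 10% of the uncached rate, but only input tokens can be cached; output tokens are generated fresh every turn. In a prefix-cached agent loop (system prompt + tool definitions unchanged turn-to-turn), if all 500 input tokens hit the cache, the input cost falls to $0.0000475 and the turn bills at ~$0.00125, about 800 turns per dollar. The saving grows with the size of the cached prefix, since output cost is unaffected.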
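
The arithmetic can be reproduced with a few lines of Python for your own token counts. The rates come from the model card; turn_cost is a hypothetical helper, and its cached_input_tokens parameter models the cache discount as applying to input tokens only, which is an assumption rather than something this guide states.

```python
INPUT_USD_PER_M = 0.95    # uncached input, per 1M tokens
OUTPUT_USD_PER_M = 4.00   # output, per 1M tokens
CACHE_FACTOR = 0.10       # cache hits bill at 10% of the uncached rate

def turn_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Post-trial cost in USD of one turn.

    Assumption: the cache discount applies to cached input tokens only.
    """
    uncached = input_tokens - cached_input_tokens
    if uncached < 0:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    billable_input = uncached + cached_input_tokens * CACHE_FACTOR
    input_cost = billable_input * INPUT_USD_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_M / 1_000_000
    return input_cost + output_cost

cost = turn_cost(500, 300)
print(f"${cost:.6f} per turn, about {1 / cost:.0f} turns per dollar")
```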

Limits

Free tier: 100 Kimi requests / day, 10 requests / minute
PAYG (with credits): 1,000,000 Kimi requests / day, 10,000 requests / minute
Trial ends: 2026-04-30 (UTC). After that, PAYG only.
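
To stay under the per-minute limit on the client side, a simple sliding-window throttle is one option. The sketch below defaults to the free tier's 10 requests per minute; SlidingWindowLimiter is a hypothetical class, and how the gateway itself counts requests (sliding versus fixed window) is not documented here, so treat it as a conservative approximation.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds interval."""

    def __init__(self, max_requests=10, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self._sent = deque()    # timestamps of requests still inside the window

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()  # drop timestamps that have left the window
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Note that a request was just sent."""
        self._sent.append(self.clock())

limiter = SlidingWindowLimiter()
print(limiter.wait_time())
```

Before each API call, sleep for limiter.wait_time() seconds if it is non-zero, then call limiter.record(). The daily cap is separate and would need its own counter.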

Benchmarks

Moonshot's published numbers against the common suite:

MMLU: ~86% — close to Claude Sonnet 4.5
HumanEval: ~93% — on par with GPT-5.4 on straightforward code
SWE-Bench: ~58% — meaningfully behind Opus 4.7 on hairy refactors, ahead of Haiku 4.5
Tool-use: strong on multi-step agent loops, clean argument JSON

In our own eyeballed coding eval across ~200 prompts, Kimi K2.6 is roughly Sonnet-tier on day-to-day agent work and noticeably behind Opus on multi-file architectural refactors.

Common errors

429 rate_limit_error — over 10 req/min on free tier. Add credits to unlock 10,000 req/min.
402 insufficient_credits — trial exhausted and no wallet balance. Add credits.
503 service_unavailable — upstream saturation on Kimi. Retryable; default retry-after is 2 seconds.
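
A retry wrapper for the retryable cases could be sketched as follows. GatewayError and call_with_retry are hypothetical names used to keep the example self-contained; with the OpenAI Python SDK you would catch its own status errors and read the status code from them instead. 402 is deliberately not retried, since waiting does not add credits.

```python
import time

RETRYABLE_STATUSES = {429, 503}

class GatewayError(Exception):
    """Hypothetical error carrying an HTTP status and an optional Retry-After value."""

    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after

def call_with_retry(send, max_attempts=4, default_delay=2.0, sleep=time.sleep):
    """Call send(), retrying on 429/503 with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except GatewayError as err:
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            if err.retry_after is not None:
                delay = err.retry_after  # honour the server's hint when present
            else:
                delay = default_delay * 2 ** (attempt - 1)  # 2s, 4s, 8s, ...
            sleep(delay)

# Demo with a fake sender that fails twice with 503, then succeeds.
attempts = []
delays = []

def flaky_send():
    attempts.append(1)
    if len(attempts) < 3:
        raise GatewayError(503)
    return "ok"

outcome = call_with_retry(flaky_send, sleep=delays.append)
print(outcome, delays)
```

Retrying helps with the per-minute 429 and with 503; a 429 caused by the daily cap will keep failing until the quota resets.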
