Kimi K2.6

Moonshot's frontier open-weight agent model. 262K context, native tool calling, vision, and extended reasoning. Served through AIgateway with an OpenAI-compatible API — drop-in compatible with the OpenAI SDK, Cursor, Cline, LangChain, Vercel AI SDK, and anything else that speaks OpenAI.

Free through April 30, 2026. No card required. 100 requests per day per account during the trial window — sign up, grab a key, and you're calling it in ~30 seconds.

Quickstart (60 seconds)

Every OpenAI SDK works — just change base_url. Here's the Python version:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

r = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[{"role": "user", "content": "Write a Python one-liner to reverse a string."}],
)
print(r.choices[0].message.content)

Model card

Request

Full OpenAI chat.completions body. Everything is optional except model and messages:

{
  "model": "moonshot/kimi-k2.6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "Hello!" }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 4096,
  "stream": false,

  "tools": [ /* OpenAI function spec — see below */ ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,

  "response_format": { "type": "json_object" }
}

Response (non-streaming)

Two fields are non-obvious on Kimi: reasoning_content (chain of thought) and tool_calls (when the model wants to call a function).

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1776947082,
  "model": "moonshot/kimi-k2.6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The weather in Tokyo is sunny.",
        "reasoning_content": "The user asked about Tokyo. I should call the tool...",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 53,
    "completion_tokens": 79,
    "total_tokens": 132
  }
}

If you don't want to show the chain of thought, just read content; reasoning_content is additive, not required. Cursor, Cline, and the OpenAI SDK all ignore it by default.
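
To make the split between the answer and the chain of thought concrete, here is a minimal sketch that pulls both fields out of an assistant message shaped like the response above. The helper name split_message is hypothetical, and it works on a plain dict (for example, the parsed JSON body). With the OpenAI Python SDK the same idea is usually expressed as getattr(message, "reasoning_content", None), on the assumption that the SDK keeps unknown fields on the message object.

```python
from typing import Optional, Tuple


def split_message(message: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (content, reasoning_content) from an assistant message dict."""
    # content may be None or absent when the model only requests tool calls.
    content = message.get("content")
    # reasoning_content is optional; a missing key simply means "no reasoning".
    reasoning = message.get("reasoning_content")
    return content, reasoning


sample = {
    "role": "assistant",
    "content": "The weather in Tokyo is sunny.",
    "reasoning_content": "The user asked about Tokyo. I should call the tool...",
}

answer, reasoning = split_message(sample)
print(answer)          # show this to the user
# reasoning can be logged, shown in a collapsible panel, or dropped.
```

Keeping the two values separate at the point of parsing makes it harder to leak the reasoning into a user-facing field by accident.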

Streaming (SSE)

Set "stream": true and read text/event-stream. Kimi emits chunks in this order:

// 1. Role
data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

// 2. Reasoning (Kimi thinks first — stream of reasoning_content)
data: {"choices":[{"index":0,"delta":{"reasoning_content":"The "},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"reasoning_content":"user "},"finish_reason":null}]}
// ... many chunks ...

// 3. Content (the actual answer)
data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

// 4. Finish
data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
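
As a rough illustration of consuming that chunk order, the sketch below accumulates reasoning_content and content separately from raw SSE lines. The function name collect_stream is hypothetical, and it assumes each event arrives as a single "data: ..." line, as in the sample above. A real client would read the lines from the HTTP response rather than from a list.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Accumulate (reasoning, content) from raw SSE lines."""
    reasoning_parts, content_parts = [], []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning_content"):
                reasoning_parts.append(delta["reasoning_content"])
            if delta.get("content"):
                content_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(content_parts)


sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"The "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"user "},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

stream_reasoning, stream_content = collect_stream(sample_lines)
print(stream_content)
```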

Tool calling

Kimi speaks the standard OpenAI tool-call protocol. You define the tools, Kimi decides when to call them, and you execute each call and feed the result back as a role: "tool" message.

import json  # needed for json.dumps below; `client` is the OpenAI client from the Quickstart

# Define a tool
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Turn 1 — Kimi asks to call the tool
r = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
)
call = r.choices[0].message.tool_calls[0]
# call.function.name == "get_weather"
# call.function.arguments == '{"city":"Tokyo"}'

# You execute the tool
result = {"temperature_c": 22, "conditions": "sunny"}

# Turn 2 — feed the result back, Kimi writes the final answer
r2 = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[
        {"role": "user", "content": "Weather in Tokyo?"},
        r.choices[0].message,                         # assistant turn with tool_calls
        {"role": "tool", "tool_call_id": call.id,
         "content": json.dumps(result)},
    ],
    tools=tools,
)
print(r2.choices[0].message.content)
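
The "You execute the tool" step above is hard-coded for brevity. One possible way to do it for real is a small registry that maps tool names to Python functions and turns failures into a JSON error the model can read. Everything here (TOOL_REGISTRY, run_tool_call, and the stub get_weather) is hypothetical and only illustrates the pattern.

```python
import json
from typing import Callable, Dict


def get_weather(city: str) -> dict:
    # Hypothetical stand-in: a real implementation would call a weather API.
    return {"city": city, "temperature_c": 22, "conditions": "sunny"}


# Maps the tool name the model sends to the function that implements it.
TOOL_REGISTRY: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_call(name: str, arguments: str) -> str:
    """Execute one tool call and return the JSON string for the tool message."""
    try:
        kwargs = json.loads(arguments)          # arguments arrive as a JSON string
        result = TOOL_REGISTRY[name](**kwargs)  # KeyError if the tool is unknown
    except KeyError:
        result = {"error": f"unknown tool: {name}"}
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        result = {"error": f"bad arguments: {exc}"}
    return json.dumps(result)


tool_output = run_tool_call("get_weather", '{"city":"Tokyo"}')
print(tool_output)
```

The returned string can be used directly as the "content" of the role: "tool" message, with tool_call_id set to the id of the call it answers.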

Streaming tool calls

When streaming with tools, the arguments string arrives as fragments. Concatenate by index until finish_reason === "tool_calls":

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"id":"call_abc","type":"function",
   "function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"function":{"arguments":"{\"city\":"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"function":{"arguments":"\"Tokyo\"}"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}
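
A minimal sketch of that concatenation, assuming the chunks have already been parsed from JSON into dicts shaped like the events above. The function name assemble_tool_calls is hypothetical. It keys fragments by the tool call's index, so several parallel calls can be rebuilt from interleaved fragments.

```python
import json
from typing import Dict, Iterable, List


def assemble_tool_calls(chunks: Iterable[dict]) -> List[dict]:
    """Rebuild complete tool calls from streamed delta fragments."""
    calls: Dict[int, dict] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for frag in choice.get("delta", {}).get("tool_calls") or []:
                slot = calls.setdefault(
                    frag["index"], {"id": None, "name": None, "arguments": ""}
                )
                if frag.get("id"):
                    slot["id"] = frag["id"]  # id arrives on the first fragment
                fn = frag.get("function", {})
                if fn.get("name"):
                    slot["name"] = fn["name"]
                slot["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason") == "tool_calls":
                return [calls[i] for i in sorted(calls)]
    # Stream ended without the tool_calls finish reason; return what we have.
    return [calls[i] for i in sorted(calls)]


sample_chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_abc", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]},
        "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city":'}}]},
        "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Tokyo"}'}}]},
        "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
]

assembled = assemble_tool_calls(sample_chunks)
print(assembled[0]["name"], json.loads(assembled[0]["arguments"]))
```

Parsing the arguments with json.loads only after the stream finishes avoids errors from half-received JSON.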

Use Kimi in Cursor (free agent mode)

Cursor's agent mode speaks full OpenAI-compat, so Kimi slots in:

  1. Get a key at aigateway.sh
  2. In Cursor: Settings → Models → "Override OpenAI Base URL"
    Base URL: https://api.aigateway.sh/v1
    Add model: moonshot/kimi-k2.6
  3. Code.

Agent mode, tool calls, multi-turn conversations — all work. Tab autocomplete and Cmd-K still use Cursor's own backend (hardwired on their side), but the chat + agent panel is fully yours.

Use Kimi in Cline

# Cline: Settings → API Provider → "OpenAI Compatible"
Base URL:  https://api.aigateway.sh/v1
API key:   sk-aig-...
Model ID:  moonshot/kimi-k2.6

Use Kimi in LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="moonshot/kimi-k2.6",
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)
print(llm.invoke("Hello!").content)

Use Kimi in Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";

const aigateway = createOpenAI({
  baseURL: "https://api.aigateway.sh/v1",
  apiKey: process.env.AIG_KEY,
});

const result = await streamText({
  model: aigateway("moonshot/kimi-k2.6"),
  prompt: "Hello!",
});
for await (const chunk of result.textStream) process.stdout.write(chunk);

Vision input

Kimi sees images. Pass them as OpenAI-style content parts:

{
  "model": "moonshot/kimi-k2.6",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url",
          "image_url": { "url": "https://example.com/cat.jpg" } }
      ]
    }
  ]
}
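
For Python callers, the same message can be built with a small helper. This is a sketch with hypothetical names (image_message, to_data_url). The base64 data-URL variant follows the common OpenAI convention for inline images; whether AIgateway accepts data URLs for this model is an assumption that should be verified before relying on it.

```python
import base64


def image_message(prompt: str, image_url: str) -> dict:
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def to_data_url(raw: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL (assumed, not confirmed, to be accepted)."""
    return f"data:{mime};base64,{base64.b64encode(raw).decode('ascii')}"


vision_msg = image_message("What is in this image?", "https://example.com/cat.jpg")
print(vision_msg["content"][1]["image_url"]["url"])
```

The resulting dict goes into the messages list of client.chat.completions.create exactly like a text-only message.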

Pricing worked example

A typical agent turn: ~500 input tokens (conversation + tool defs), ~300 output tokens. During the trial this is $0. After the trial:

Cache hits are billed at 10% of the uncached cost. A prefix-cached agent loop (system prompt + tool definitions unchanged turn-to-turn) bills at ~$0.00017 per turn, or roughly 6,000 turns per dollar.
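
The arithmetic behind a per-turn figure can be sketched as below. The function turn_cost is hypothetical, and it assumes the 10% cache rate applies to input tokens only. The per-million-token prices in the usage lines are placeholders chosen for easy arithmetic, not AIgateway's actual rates; substitute the published prices to reproduce real numbers.

```python
def turn_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
    cached_fraction: float = 0.0,
    cache_discount: float = 0.10,
) -> float:
    """Estimate the dollar cost of one turn.

    cached_fraction is the share of input tokens served from the prefix cache;
    those tokens are billed at cache_discount times the normal input price.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    input_cost = (uncached + cached * cache_discount) * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Placeholder prices (NOT real rates): $1.00 per 1M input, $2.00 per 1M output.
cold = turn_cost(500, 300, 1.00, 2.00)
warm = turn_cost(500, 300, 1.00, 2.00, cached_fraction=1.0)
print(f"uncached: ${cold:.5f}  fully cached: ${warm:.5f}")
print(f"turns per dollar (fully cached): {1 / warm:.0f}")
```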

Limits

Benchmarks

Moonshot's published numbers against the common suite:

In our own eyeballed coding eval across ~200 prompts, Kimi K2.6 is roughly Sonnet-tier on day-to-day agent work and noticeably behind Opus on multi-file architectural refactors.

Common errors

More