Kimi K2.7 Code

Moonshot's frontier open-weight coding model. 262K context, native tool calling, vision, and extended reasoning — coding-tuned and agentic, built for the multi-step edit loops coding agents run. Served through AIgateway with an OpenAI-compatible API — drop-in compatible with the OpenAI SDK, Cursor, Cline, LangChain, Vercel AI SDK, and anything else that speaks OpenAI.

Included in the $5 signup credit shortlist. Every new account gets $5 in free credits redeemable on a curated 7-model edge tier — Kimi K2.7 Code is the chat model in that set. Sign up, grab a key, and you're calling it in ~30 seconds.

Quickstart (60 seconds)

Every OpenAI SDK works — just change base_url. Here's the Python version:

# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verify.
# openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

r = client.chat.completions.create(
    model="moonshot/kimi-k2.7-code",
    messages=[{"role": "user", "content": "Write a Python one-liner to reverse a string."}],
)
print(r.choices[0].message.content)

Model card

Request

Full OpenAI chat.completions body. Everything is optional except model and messages:

{
  "model": "moonshot/kimi-k2.7-code",
  "messages": [
    { "role": "system", "content": "You are a helpful coding assistant." },
    { "role": "user",   "content": "Hello!" }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 4096,
  "stream": false,

  "tools": [ /* OpenAI function spec — see below */ ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,

  "response_format": { "type": "json_object" }
}

Response (non-streaming)

Two fields are non-obvious on Kimi: reasoning_content (chain of thought) and tool_calls (when the model wants to call a function).

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1776947082,
  "model": "moonshot/kimi-k2.7-code",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "reasoning_content": "The user asked about Tokyo. I should call the tool...",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 53,
    "completion_tokens": 79,
    "total_tokens": 132
  }
}

If you don't want to show the chain of thought, just read content; reasoning_content is additive, not required. Cursor, Cline, and the OpenAI SDK all ignore it by default.

Streaming (SSE)

Set "stream": true and read text/event-stream. Kimi emits chunks in this order:

// 1. Role
data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

// 2. Reasoning (Kimi thinks first — stream of reasoning_content)
data: {"choices":[{"index":0,"delta":{"reasoning_content":"The "},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"reasoning_content":"user "},"finish_reason":null}]}
// ... many chunks ...

// 3. Content (the actual answer)
data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

// 4. Finish
data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
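
The chunk order above can be consumed with a small accumulator. This is a minimal sketch, not the gateway's own client code: it assumes the stream arrives as text lines shaped like the samples above, and the helper name collect_stream is hypothetical.

```python
import json

def collect_stream(lines):
    """Split an SSE stream into (reasoning, content) strings.

    `lines` is any iterable of raw text lines from the response body.
    """
    reasoning, content = [], []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                content.append(delta["content"])
    return "".join(reasoning), "".join(content)

# Sample lines copied from the chunk sequence shown above.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"The "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"user "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
reasoning, answer = collect_stream(sample)
print(answer)
```

Keeping the two buffers separate makes it easy to render the answer while hiding or collapsing the reasoning.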

Tool calling

Kimi speaks the standard OpenAI tool-call protocol: you define tools, Kimi decides when to call them, and you execute each call and feed the result back as a role: "tool" message.

import json

# Reuses the `client` created in the Quickstart.
# Define a tool
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Turn 1 — Kimi asks to call the tool
r = client.chat.completions.create(
    model="moonshot/kimi-k2.7-code",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
)
call = r.choices[0].message.tool_calls[0]
# call.function.name == "get_weather"
# call.function.arguments == '{"city":"Tokyo"}'

# You execute the tool
result = {"temperature_c": 22, "conditions": "sunny"}

# Turn 2 — feed the result back, Kimi writes the final answer
r2 = client.chat.completions.create(
    model="moonshot/kimi-k2.7-code",
    messages=[
        {"role": "user", "content": "Weather in Tokyo?"},
        r.choices[0].message,                         # assistant turn with tool_calls
        {"role": "tool", "tool_call_id": call.id,
         "content": json.dumps(result)},
    ],
    tools=tools,
)
print(r2.choices[0].message.content)
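
The "you execute the tool" step is hard-coded above. One possible way to generalize it is a small registry that maps tool names to local functions and returns the role: "tool" message for each call. This is a sketch under assumptions: TOOL_REGISTRY, run_tool_call, and the get_weather stub are hypothetical names, and reporting failures as a JSON error object is one design choice among several.

```python
import json

def get_weather(city):
    # Hypothetical stand-in for a real weather lookup.
    return {"city": city, "temperature_c": 22, "conditions": "sunny"}

# Maps the tool name the model emits to the local function that implements it.
TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool_call(name, arguments_json, call_id):
    """Execute one tool call and build the role: "tool" message for it."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        result = {"error": f"unknown tool: {name}"}
    else:
        try:
            # Arguments arrive as a JSON string, not a parsed object.
            result = func(**json.loads(arguments_json or "{}"))
        except (json.JSONDecodeError, TypeError) as exc:
            # Malformed JSON or arguments that do not match the function.
            result = {"error": f"bad arguments: {exc}"}
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}

message = run_tool_call("get_weather", '{"city":"Tokyo"}', "call_abc123")
print(message["content"])
```

When parallel_tool_calls is enabled, the same helper can be applied to every entry in message.tool_calls, appending one tool message per call before the next request.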

Streaming tool calls

When streaming with tools, the arguments string arrives as fragments. Concatenate by index until finish_reason === "tool_calls":

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"id":"call_abc","type":"function",
   "function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"function":{"arguments":"{\"city\":"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"function":{"arguments":"\"Tokyo\"}"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}
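
Concatenating by index can be sketched as follows. The code assumes each SSE data payload has already been parsed into a dict shaped like the chunks above; merge_tool_call_deltas is a hypothetical helper name, not part of any SDK.

```python
import json

def merge_tool_call_deltas(chunks):
    """Merge streamed tool_calls deltas into complete calls, ordered by index."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for delta in choice.get("delta", {}).get("tool_calls") or []:
                call = calls.setdefault(
                    delta["index"], {"id": None, "name": "", "arguments": ""}
                )
                if delta.get("id"):
                    call["id"] = delta["id"]  # id arrives on the first fragment
                fn = delta.get("function") or {}
                if fn.get("name"):
                    call["name"] = fn["name"]
                call["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason") == "tool_calls":
                return [calls[i] for i in sorted(calls)]
    return [calls[i] for i in sorted(calls)]

# The four chunks from the stream above, as parsed dicts.
chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_abc", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]},
        "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"city\":"}}]},
        "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"Tokyo\"}"}}]},
        "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
]
merged = merge_tool_call_deltas(chunks)
print(json.loads(merged[0]["arguments"]))
```

Parse the arguments string only after the final chunk; the intermediate fragments are not valid JSON on their own.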

Use Kimi K2.7 Code in Cursor (free agent mode)

Cursor's agent mode speaks full OpenAI-compat, so Kimi slots in:

  1. Get a key at aigateway.sh
  2. In Cursor: Settings → Models → "Override OpenAI Base URL"
    Base URL: https://api.aigateway.sh/v1
    Add model: moonshot/kimi-k2.7-code
  3. Code.

Agent mode, tool calls, multi-turn conversations — all work. Tab autocomplete and Cmd-K still use Cursor's own backend (hardwired on their side), but the chat + agent panel is fully yours.

Use Kimi K2.7 Code in Cline

# Cline: Settings → API Provider → "OpenAI Compatible"
Base URL:  https://api.aigateway.sh/v1
API key:   sk-aig-...
Model ID:  moonshot/kimi-k2.7-code

Use Kimi K2.7 Code in LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="moonshot/kimi-k2.7-code",
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)
print(llm.invoke("Hello!").content)

Use Kimi K2.7 Code in Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";

const aigateway = createOpenAI({
  baseURL: "https://api.aigateway.sh/v1",
  apiKey: process.env.AIG_KEY,
});

const result = await streamText({
  model: aigateway("moonshot/kimi-k2.7-code"),
  prompt: "Hello!",
});
for await (const chunk of result.textStream) process.stdout.write(chunk);

Vision input

Kimi sees images. Pass them as OpenAI-style content parts:

{
  "model": "moonshot/kimi-k2.7-code",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url",
          "image_url": { "url": "https://example.com/cat.jpg" } }
      ]
    }
  ]
}
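
For a local file, the same content part can carry a base64 data URL instead of an https link. This is a sketch that assumes the gateway accepts data URLs the way the OpenAI API does, which this guide does not confirm; image_part_from_bytes is a hypothetical helper.

```python
import base64
import mimetypes

def image_part_from_bytes(data, filename):
    """Build an OpenAI-style image_url content part from raw image bytes."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }

# Three placeholder bytes stand in for real file contents here.
part = image_part_from_bytes(b"\xff\xd8\xff", "cat.jpg")
message = {
    "role": "user",
    "content": [{"type": "text", "text": "What is in this image?"}, part],
}
print(part["image_url"]["url"])
```

In real use, read the bytes with open(path, "rb").read() and pass the file name so the MIME type matches the image format.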

Pricing worked example

A typical agent turn: ~500 input tokens (conversation + tool defs), ~300 output tokens.

With provider prompt caching (distinct from gateway-level response caching), cached input tokens bill at 50% of the input rate ($0.95 / 1M × 50% = $0.475 / 1M) while output tokens remain at the full rate. A prefix-cached agent loop (system prompt + tool definitions unchanged turn-to-turn) cuts input cost roughly in half — about 800 turns per dollar. Check the X-Cached-Input-Units response header to see how many input tokens were cache-served on each call.
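
The input-side arithmetic can be checked with a few lines. This sketch uses only the $0.95 / 1M input rate and the 50% cached discount quoted above; the output rate is not stated in this section, so it is left as a required parameter rather than guessed. Function names are hypothetical.

```python
INPUT_RATE_PER_M = 0.95   # USD per 1M input tokens, from the text above
CACHED_DISCOUNT = 0.50    # cached input tokens bill at 50% of the input rate

def input_cost(input_tokens, cached_tokens=0):
    """USD cost of the input side of one call."""
    per_token = INPUT_RATE_PER_M / 1_000_000
    fresh_tokens = input_tokens - cached_tokens
    return fresh_tokens * per_token + cached_tokens * per_token * CACHED_DISCOUNT

def turns_per_dollar(input_tokens, output_tokens, output_rate_per_m, cached_tokens=0):
    """Turns one dollar buys; output_rate_per_m (USD per 1M) must be supplied."""
    output_cost = output_tokens * output_rate_per_m / 1_000_000
    return 1.0 / (input_cost(input_tokens, cached_tokens) + output_cost)

# The 500-token input from the example turn, uncached and fully cached.
print(f"uncached input: ${input_cost(500):.7f}")
print(f"cached input:   ${input_cost(500, cached_tokens=500):.7f}")
```

With the OpenAI Python SDK, client.chat.completions.with_raw_response.create(...) exposes the response headers, so the X-Cached-Input-Units value can be read there and passed in as cached_tokens, assuming the header carries a plain token count.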

Limits

Why coding agents

K2.7 Code is coding-tuned: it's aimed at the agentic edit loop — read a repo, plan a change, call tools, write a multi-file diff, run a test, iterate. Its strengths:

Run it against Opus 4.7 or GPT-5.4 on your own repo — point an eval at all three and pick the winner for your codebase. K2.6 remains in the catalog at the same pricing if you want to compare versions; see the K2.6 guide.

Common errors

More