Guide

Kimi K2.7 Code

Moonshot's frontier open-weight coding model. 262K context, native tool calling, vision, and extended reasoning — coding-tuned and agentic, built for the multi-step edit loops coding agents run. Served through AIgateway with an OpenAI-compatible API — drop-in compatible with the OpenAI SDK, Cursor, Cline, LangChain, Vercel AI SDK, and anything else that speaks OpenAI.

Included in the $5 signup credit shortlist. Every new account gets $5 in free credits redeemable on a curated 7-model edge tier — Kimi K2.7 Code is the chat model in that set. Sign up, grab a key, and you're calling it in ~30 seconds.

Quickstart (60 seconds)

Every OpenAI SDK works — just change base_url. Here's the Python version:

# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verify.
# openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

r = client.chat.completions.create(
    model="moonshot/kimi-k2.7-code",
    messages=[{"role": "user", "content": "Write a Python one-liner to reverse a string."}],
)
print(r.choices[0].message.content)

Model card

Slug: moonshot/kimi-k2.7-code
Provider: Moonshot (served on the edge via AIgateway)
Context window: 262,144 tokens (~700 pages of text, most mid-sized repos fit whole)
Max output: 16,384 tokens
Modality: Text + vision
Capabilities: Streaming, tool calling, JSON mode, extended reasoning, vision
Pricing: $0.95 / 1M input tokens, $4.00 / 1M output tokens. Provider prompt-cached input tokens bill at 50% of the input rate ($0.475 / 1M). Pass-through — our 5% fee is added to the provider cost on every call.

Request

Full OpenAI chat.completions body. Everything is optional except model and messages:

{
  "model": "moonshot/kimi-k2.7-code",
  "messages": [
    { "role": "system", "content": "You are a helpful coding assistant." },
    { "role": "user",   "content": "Hello!" }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 4096,
  "stream": false,

  "tools": [ /* OpenAI function spec — see below */ ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,

  "response_format": { "type": "json_object" }
}

Response (non-streaming)

Two fields are non-obvious on Kimi: reasoning_content (chain of thought) and tool_calls (when the model wants to call a function).

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1776947082,
  "model": "moonshot/kimi-k2.7-code",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The weather in Tokyo is sunny.",
        "reasoning_content": "The user asked about Tokyo. I should call the tool...",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 53,
    "completion_tokens": 79,
    "total_tokens": 132
  }
}

If you don't want to show the chain of thought, just read content — reasoning_content is additive, not required. Cursor, Cline, and the OpenAI SDK all ignore it by default.
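As a minimal sketch of reading those fields, the helper below works on the raw JSON body as a plain dict, so it needs no SDK. The name split_message is hypothetical, and the sample is a trimmed response in the same shape as the one above.

```python
import json

def split_message(response: dict) -> dict:
    """Separate the visible answer, the reasoning trace, and any tool calls."""
    choice = response["choices"][0]
    message = choice["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        calls.append({
            "id": call["id"],
            "name": call["function"]["name"],
            # arguments arrive as a JSON-encoded string, so decode them here
            "arguments": json.loads(call["function"]["arguments"]),
        })
    return {
        "content": message.get("content"),
        "reasoning": message.get("reasoning_content"),  # optional field, may be absent
        "tool_calls": calls,
        "finish_reason": choice.get("finish_reason"),
    }

# Trimmed response in the shape documented above
sample = {
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": None,
            "reasoning_content": "The user asked about Tokyo. I should call the tool...",
            "tool_calls": [{
                "id": "call_abc123",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\":\"Tokyo\"}"},
            }],
        },
        "finish_reason": "tool_calls",
    }]
}

parsed = split_message(sample)
print(parsed["tool_calls"])
```

Using .get() for reasoning_content keeps the helper working against responses that omit the field.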

Streaming (SSE)

Set "stream": true and read text/event-stream. Kimi emits chunks in this order:

// 1. Role
data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

// 2. Reasoning (Kimi thinks first — stream of reasoning_content)
data: {"choices":[{"index":0,"delta":{"reasoning_content":"The "},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"reasoning_content":"user "},"finish_reason":null}]}
// ... many chunks ...

// 3. Content (the actual answer)
data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

// 4. Finish
data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
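The chunk order above can be folded into two strings with a small parser. This is a sketch that assumes each event arrives as one complete "data: ..." line; collect_stream is a hypothetical helper name, and the // annotations in the listing are treated as commentary rather than part of the stream.

```python
import json

def collect_stream(lines):
    """Fold SSE lines into (reasoning, content, finish_reason)."""
    reasoning, content, finish = [], [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                content.append(delta["content"])
            if choice.get("finish_reason"):
                finish = choice["finish_reason"]
    return "".join(reasoning), "".join(content), finish

# The events from the listing above, in order
events = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"The "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"user "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    '',
    'data: [DONE]',
]

reasoning, answer, finish = collect_stream(events)
print(answer)  # Hello!
```

Keeping reasoning and content in separate buffers lets a UI show or hide the chain of thought without re-parsing the stream.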

Tool calling

Kimi speaks the standard OpenAI tool-call protocol. You define the tools, Kimi decides when to call them, and you execute each call and feed the result back as a role: "tool" message.

import json  # needed for json.dumps() in turn 2

# Reuses the client created in the Quickstart.
# Define a tool
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Turn 1 — Kimi asks to call the tool
r = client.chat.completions.create(
    model="moonshot/kimi-k2.7-code",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
)
call = r.choices[0].message.tool_calls[0]
# call.function.name == "get_weather"
# call.function.arguments == '{"city":"Tokyo"}'

# You execute the tool
result = {"temperature_c": 22, "conditions": "sunny"}

# Turn 2 — feed the result back, Kimi writes the final answer
r2 = client.chat.completions.create(
    model="moonshot/kimi-k2.7-code",
    messages=[
        {"role": "user", "content": "Weather in Tokyo?"},
        r.choices[0].message,                         # assistant turn with tool_calls
        {"role": "tool", "tool_call_id": call.id,
         "content": json.dumps(result)},
    ],
    tools=tools,
)
print(r2.choices[0].message.content)
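The "you execute the tool" step is hard-coded in the example above. One way to generalize it is a small dispatcher that looks up the requested function and builds the role: "tool" message. This is a sketch only: TOOL_REGISTRY, run_tool_call, and the stand-in get_weather body are hypothetical names, not part of any SDK.

```python
import json

def get_weather(city: str) -> dict:
    """Stand-in implementation; a real one would query a weather service."""
    return {"city": city, "temperature_c": 22, "conditions": "sunny"}

# Hypothetical registry mapping tool names to local callables
TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool_call(name: str, arguments: str, call_id: str) -> dict:
    """Execute one tool call and build the tool message for the next turn."""
    if name not in TOOL_REGISTRY:
        result = {"error": f"unknown tool: {name}"}
    else:
        try:
            # arguments is a JSON-encoded string, as in call.function.arguments
            result = TOOL_REGISTRY[name](**json.loads(arguments))
        except (TypeError, ValueError) as exc:
            # malformed JSON or arguments that do not match the function signature
            result = {"error": f"bad arguments: {exc}"}
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}

msg = run_tool_call("get_weather", '{"city":"Tokyo"}', "call_abc123")
print(msg["content"])
```

Returning errors as tool content, rather than raising, gives the model a chance to correct its own call on the next turn.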

Streaming tool calls

When streaming with tools, the arguments string arrives as fragments. Concatenate by index until finish_reason === "tool_calls":

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"id":"call_abc","type":"function",
   "function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"function":{"arguments":"{\"city\":"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"function":{"arguments":"\"Tokyo\"}"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}
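A minimal sketch of that concatenation, operating on chunks already decoded from JSON. The helper name merge_tool_call_deltas is hypothetical; the input mirrors the four events above.

```python
import json

def merge_tool_call_deltas(chunks):
    """Rebuild complete tool calls from streamed fragments, keyed by index."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for part in choice.get("delta", {}).get("tool_calls") or []:
                slot = calls.setdefault(
                    part["index"], {"id": None, "name": None, "arguments": ""}
                )
                if part.get("id"):
                    slot["id"] = part["id"]  # only the first fragment carries the id
                fn = part.get("function", {})
                if fn.get("name"):
                    slot["name"] = fn["name"]
                slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]

# The four events from the listing above, decoded into dicts
chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_abc", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city":'}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Tokyo"}'}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
]

merged = merge_tool_call_deltas(chunks)
print(json.loads(merged[0]["arguments"]))  # {'city': 'Tokyo'}
```

Decode the arguments only after the stream finishes, since an individual fragment is not valid JSON on its own.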

Use Kimi K2.7 Code in Cursor (free agent mode)

Cursor's agent mode speaks full OpenAI-compat, so Kimi slots in:

Get a key at aigateway.sh
In Cursor: Settings → Models → "Override OpenAI Base URL"
Base URL: https://api.aigateway.sh/v1
Add model: moonshot/kimi-k2.7-code
Code.

Agent mode, tool calls, multi-turn conversations — all work. Tab autocomplete and Cmd-K still use Cursor's own backend (hardwired on their side), but the chat + agent panel is fully yours.

Use Kimi K2.7 Code in Cline

# Cline: Settings → API Provider → "OpenAI Compatible"
Base URL:  https://api.aigateway.sh/v1
API key:   sk-aig-...
Model ID:  moonshot/kimi-k2.7-code

Use Kimi K2.7 Code in LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="moonshot/kimi-k2.7-code",
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)
print(llm.invoke("Hello!").content)

Use Kimi K2.7 Code in Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";

const aigateway = createOpenAI({
  baseURL: "https://api.aigateway.sh/v1",
  apiKey: process.env.AIG_KEY,
});

const result = await streamText({
  model: aigateway("moonshot/kimi-k2.7-code"),
  prompt: "Hello!",
});
for await (const chunk of result.textStream) process.stdout.write(chunk);

Vision input

Kimi sees images. Pass them as OpenAI-style content parts:

{
  "model": "moonshot/kimi-k2.7-code",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url",
          "image_url": { "url": "https://example.com/cat.jpg" } }
      ]
    }
  ]
}
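The request above points at a hosted image. For a local file, OpenAI-style APIs commonly accept a base64 data URL in the same image_url field. Whether AIgateway forwards data URLs for this model is an assumption to verify, and image_message is a hypothetical helper name.

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message with a text part and an inline base64 image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumption: the gateway accepts data URLs like OpenAI's API does
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }

# Placeholder bytes (the 8-byte PNG signature); real code would read a file:
#   image_bytes = open("screenshot.png", "rb").read()
msg = image_message("What is in this image?", b"\x89PNG\r\n\x1a\n")
print(msg["content"][1]["image_url"]["url"][:22])
```

The returned dict can be passed as one entry of the messages list in a normal chat.completions request.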

Pricing worked example

A typical agent turn: ~500 input tokens (conversation + tool defs), ~300 output tokens.

Input: 500 × $0.95 / 1,000,000 = $0.000475
Output: 300 × $4.00 / 1,000,000 = $0.0012
Provider cost: $0.001675 per turn; with our 5% platform fee on top, about $0.00176 per turn, or roughly 570 agent turns per dollar.

With provider prompt caching (distinct from gateway-level response caching), cached input tokens bill at 50% of the input rate ($0.95 / 1M × 50% = $0.475 / 1M) while output tokens remain at the full rate. A prefix-cached agent loop (system prompt + tool definitions unchanged turn-to-turn) cuts input cost roughly in half. Output tokens dominate this turn's cost, so the total drops by about 14%: with all 500 input tokens cache-served, the turn costs about $0.00151 including the fee, or roughly 660 turns per dollar. Check the X-Cached-Input-Units response header to see how many input tokens were cache-served on each call.
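The arithmetic above can be wrapped in a small estimator for other token counts. This sketch uses the rates from the model card and assumes the 5% fee applies to the full provider cost, cached tokens included; turn_cost is a hypothetical helper name.

```python
INPUT_RATE = 0.95 / 1_000_000    # dollars per input token
OUTPUT_RATE = 4.00 / 1_000_000   # dollars per output token
CACHED_INPUT_FACTOR = 0.5        # cached input bills at 50% of the input rate
PLATFORM_FEE = 0.05              # 5% added on top of the provider cost

def turn_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate the billed cost of one call in dollars."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh = input_tokens - cached_input_tokens
    provider = (
        fresh * INPUT_RATE
        + cached_input_tokens * INPUT_RATE * CACHED_INPUT_FACTOR
        + output_tokens * OUTPUT_RATE
    )
    return provider * (1 + PLATFORM_FEE)

uncached = turn_cost(500, 300)
fully_cached = turn_cost(500, 300, cached_input_tokens=500)
print(f"uncached: ${uncached:.6f} per turn, {1 / uncached:.0f} turns per dollar")
print(f"cached:   ${fully_cached:.6f} per turn, {1 / fully_cached:.0f} turns per dollar")
```

In practice the cached count would come from the X-Cached-Input-Units header rather than being assumed equal to the full input.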

Limits

Signup-credit tier: 10 requests / minute. Spendable from the $5 free credit on the curated 7-model edge tier; expires 7 days after signup.
Paid (any topup): 600 requests / minute. Auto-promoted on first topup. See rate limits.
Enterprise: 30,000+ RPM under contract.
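To stay under a per-minute cap on the client side, a sliding-window limiter is one option. This is a sketch, not the gateway's own accounting: whether the server counts a sliding or a fixed window is not stated here, and MinuteRateLimiter is a hypothetical class. The clock and sleep functions are injectable so the demo runs instantly.

```python
import time
from collections import deque

class MinuteRateLimiter:
    """Allow at most max_requests calls in any window_seconds span."""

    def __init__(self, max_requests=10, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests inside the current window

    def acquire(self):
        """Block until a request slot is free, then record the request."""
        while True:
            now = self._clock()
            # drop timestamps that have aged out of the window
            while self._sent and now - self._sent[0] >= self._window:
                self._sent.popleft()
            if len(self._sent) < self._max:
                self._sent.append(now)
                return
            # wait until the oldest request leaves the window
            self._sleep(self._window - (now - self._sent[0]))

class FakeClock:
    """Deterministic clock for the demo: sleeping just advances time."""
    def __init__(self):
        self.now = 0.0
    def time(self):
        return self.now
    def sleep(self, seconds):
        self.now += seconds

fake = FakeClock()
limiter = MinuteRateLimiter(max_requests=10, clock=fake.time, sleep=fake.sleep)
for _ in range(11):
    limiter.acquire()  # the 11th call has to wait for the window to roll over
print(fake.now)  # 60.0
```

With the default arguments the limiter uses the real clock; call acquire() before each request.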

Why coding agents

K2.7 Code is coding-tuned: it's aimed at the agentic edit loop — read a repo, plan a change, call tools, write a multi-file diff, run a test, iterate. Its strengths:

Clean tool-call argument JSON across long multi-step loops
262K context (262,144 tokens): most mid-sized repos fit whole, no chunking
Extended reasoning before it commits to an edit plan
Vision input for screenshots, diagrams, and UI references

Run it against Opus 4.7 or GPT-5.4 on your own repo — point an eval at all three and pick the winner for your codebase. K2.6 remains in the catalog at the same pricing if you want to compare versions; see the K2.6 guide.

Common errors

429 rate_limit_error — over 10 req/min on the signup-credit tier. Top up to unlock 600 req/min on the paid tier.
402 insufficient_credits — $5 signup credit exhausted or expired, and no topup balance. Add credits.
503 service_unavailable — upstream saturation on Kimi. Retryable; default retry-after is 2 seconds.
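A retry wrapper for the retryable statuses might look like the sketch below. ApiError and call_with_retry are hypothetical names used for illustration; with the OpenAI Python SDK, the client's own max_retries option may already cover this. The 2-second base delay follows the default retry-after mentioned above, and 402 is deliberately not retried.

```python
import time

RETRYABLE_STATUSES = {429, 503}

class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status and optional retry-after."""
    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after

def call_with_retry(fn, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fn(), retrying on 429/503 with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ApiError as err:
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise  # 402 and other errors need a fix, not a retry
            if err.retry_after is not None:
                delay = err.retry_after          # honor the server's hint
            else:
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)

# Demo: fail twice with 503, then succeed. Sleeps are recorded, not performed.
attempts = {"count": 0}
delays = []

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ApiError(503)
    return "ok"

result = call_with_retry(flaky, sleep=delays.append)
print(result, delays)  # ok [2.0, 4.0]
```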
