Kimi K2.7 Code
Moonshot's frontier open-weight coding model. 262K context, native tool calling, vision, and extended reasoning — coding-tuned and agentic, built for the multi-step edit loops coding agents run. Served through AIgateway with an OpenAI-compatible API — drop-in compatible with the OpenAI SDK, Cursor, Cline, LangChain, Vercel AI SDK, and anything else that speaks OpenAI.
Quickstart (60 seconds)
Every OpenAI SDK works — just change base_url. Here's the Python version:
# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verify.
# openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
from openai import OpenAI
client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)
r = client.chat.completions.create(
    model="moonshot/kimi-k2.7-code",
    messages=[{"role": "user", "content": "Write a Python one-liner to reverse a string."}],
)
print(r.choices[0].message.content)
Model card
- Slug: moonshot/kimi-k2.7-code
- Provider: Moonshot (served on the edge via AIgateway)
- Context window: 262,144 tokens (~700 pages of text, most mid-sized repos fit whole)
- Max output: 16,384 tokens
- Modality: Text + vision
- Capabilities: Streaming, tool calling, JSON mode, extended reasoning, vision
- Pricing: $0.95 / 1M input tokens, $4.00 / 1M output tokens. Provider prompt-cached input tokens bill at 50% of the input rate ($0.475 / 1M). Pass-through — our 5% fee is added to the provider cost on every call.
Request
Full OpenAI chat.completions body. Everything is optional except model and messages:
{
  "model": "moonshot/kimi-k2.7-code",
  "messages": [
    { "role": "system", "content": "You are a helpful coding assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 4096,
  "stream": false,
  "tools": [ /* OpenAI function spec — see below */ ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "response_format": { "type": "json_object" }
}
Response (non-streaming)
Two fields are non-obvious on Kimi: reasoning_content (chain of thought) and tool_calls (when the model wants to call a function).
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1776947082,
  "model": "moonshot/kimi-k2.7-code",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The weather in Tokyo is sunny.",
        "reasoning_content": "The user asked about Tokyo. I should call the tool...",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 53,
    "completion_tokens": 79,
    "total_tokens": 132
  }
}
If you don't want to show the chain of thought, just read content; reasoning_content is additive, not required. Cursor, Cline, and the OpenAI SDK all ignore it by default.
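As a minimal sketch of reading those fields, the helper below separates the answer, the reasoning, and any decoded tool calls from a response dict shaped like the example above. The name `split_message` is hypothetical (it is not part of any SDK), and the sample payload is trimmed from the response shown in this section.

```python
import json


def split_message(response: dict) -> dict:
    """Separate answer, reasoning, and tool calls from the first choice."""
    choice = response["choices"][0]
    message = choice["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append({
            "id": call["id"],
            "name": fn["name"],
            # `arguments` arrives as a JSON-encoded string, not an object.
            "arguments": json.loads(fn["arguments"]),
        })
    return {
        "content": message.get("content"),
        "reasoning": message.get("reasoning_content"),  # optional field
        "tool_calls": calls,
        "finish_reason": choice.get("finish_reason"),
    }


# Sample payload trimmed from the response example in this section.
sample = {
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "The weather in Tokyo is sunny.",
            "reasoning_content": "The user asked about Tokyo. I should call the tool...",
            "tool_calls": [{
                "id": "call_abc123",
                "type": "function",
                "function": {"name": "get_weather",
                             "arguments": "{\"city\":\"Tokyo\"}"},
            }],
        },
        "finish_reason": "tool_calls",
    }]
}

parsed = split_message(sample)
print(parsed["content"])
print(parsed["tool_calls"])
```

A UI that hides the chain of thought would simply never render `parsed["reasoning"]`.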
Streaming (SSE)
Set "stream": true and read text/event-stream. Kimi emits chunks in this order:
// 1. Role
data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
// 2. Reasoning (Kimi thinks first — stream of reasoning_content)
data: {"choices":[{"index":0,"delta":{"reasoning_content":"The "},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"reasoning_content":"user "},"finish_reason":null}]}
// ... many chunks ...
// 3. Content (the actual answer)
data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
// 4. Finish
data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Tool calling
Kimi speaks the standard OpenAI tool-call protocol. Define tools, Kimi decides when to call them, you execute the call and feed the result back as a role: "tool" message.
import json
# Reuses `client` from the Quickstart.
# Define a tool
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
# Turn 1: Kimi asks to call the tool
r = client.chat.completions.create(
    model="moonshot/kimi-k2.7-code",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
)
call = r.choices[0].message.tool_calls[0]
# call.function.name == "get_weather"
# call.function.arguments == '{"city":"Tokyo"}'
# You execute the tool
result = {"temperature_c": 22, "conditions": "sunny"}
# Turn 2: feed the result back, Kimi writes the final answer
r2 = client.chat.completions.create(
    model="moonshot/kimi-k2.7-code",
    messages=[
        {"role": "user", "content": "Weather in Tokyo?"},
        r.choices[0].message,  # assistant turn with tool_calls
        {"role": "tool", "tool_call_id": call.id,
         "content": json.dumps(result)},
    ],
    tools=tools,
)
print(r2.choices[0].message.content)
Streaming tool calls
When streaming with tools, the arguments string arrives as fragments. Concatenate by index until finish_reason === "tool_calls":
data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"id":"call_abc","type":"function",
   "function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"function":{"arguments":"{\"city\":"}}]},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[
  {"index":0,"function":{"arguments":"\"Tokyo\"}"}}]},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}
Use Kimi K2.7 Code in Cursor (free agent mode)
Cursor's agent mode speaks full OpenAI-compat, so Kimi slots in:
- Get a key at aigateway.sh
- In Cursor: Settings → Models → "Override OpenAI Base URL"
  Base URL: https://api.aigateway.sh/v1
  Add model: moonshot/kimi-k2.7-code
- Code.
Agent mode, tool calls, multi-turn conversations — all work. Tab autocomplete and Cmd-K still use Cursor's own backend (hardwired on their side), but the chat + agent panel is fully yours.
Use Kimi K2.7 Code in Cline
# Cline: Settings → API Provider → "OpenAI Compatible"
Base URL: https://api.aigateway.sh/v1
API key: sk-aig-...
Model ID: moonshot/kimi-k2.7-code
Use Kimi K2.7 Code in LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="moonshot/kimi-k2.7-code",
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)
print(llm.invoke("Hello!").content)
Use Kimi K2.7 Code in Vercel AI SDK
import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";
const aigateway = createOpenAI({
  baseURL: "https://api.aigateway.sh/v1",
  apiKey: process.env.AIG_KEY,
});
const result = await streamText({
  model: aigateway("moonshot/kimi-k2.7-code"),
  prompt: "Hello!",
});
for await (const chunk of result.textStream) process.stdout.write(chunk);
Vision input
Kimi sees images. Pass them as OpenAI-style content parts:
{
  "model": "moonshot/kimi-k2.7-code",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url",
          "image_url": { "url": "https://example.com/cat.jpg" } }
      ]
    }
  ]
}
Pricing worked example
A typical agent turn: ~500 input tokens (conversation + tool defs), ~300 output tokens.
- Input: 500 × $0.95 / 1,000,000 = $0.000475
- Output: 300 × $4.00 / 1,000,000 = $0.0012
- Provider cost: $0.001675 per turn; with our 5% platform fee on top, about $0.00176 per turn, or roughly 570 agent turns per dollar.
With provider prompt caching (distinct from gateway-level response caching), cached input tokens bill at 50% of the input rate ($0.95 / 1M × 50% = $0.475 / 1M) while output tokens remain at the full rate. A prefix-cached agent loop (system prompt + tool definitions unchanged turn-to-turn) cuts input cost roughly in half. Because output tokens dominate the bill, the per-turn total drops by about 14%: with all 500 input tokens cache-served, the turn above costs about $0.00151 including the fee, or about 660 turns per dollar. Check the X-Cached-Input-Units response header to see how many input tokens were cache-served on each call.
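The arithmetic in this section can be reproduced with a short calculator. This is a sketch built only from the rates quoted on this page; the function names are hypothetical, and it assumes the 5% fee applies to the provider cost after the cache discount. Figures are computed without intermediate rounding, so they can differ slightly from rounded numbers in the prose.

```python
INPUT_RATE = 0.95 / 1_000_000          # dollars per input token
OUTPUT_RATE = 4.00 / 1_000_000         # dollars per output token
CACHED_INPUT_RATE = INPUT_RATE * 0.5   # cached input bills at 50%
PLATFORM_FEE = 0.05                    # added on top of provider cost


def turn_cost(input_tokens: int, output_tokens: int,
              cached_input_tokens: int = 0) -> float:
    """Dollar cost of one call, including the platform fee."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be within input_tokens")
    uncached = input_tokens - cached_input_tokens
    provider = (uncached * INPUT_RATE
                + cached_input_tokens * CACHED_INPUT_RATE
                + output_tokens * OUTPUT_RATE)
    return provider * (1 + PLATFORM_FEE)


def turns_per_dollar(cost_per_turn: float) -> float:
    """How many identical turns one dollar buys."""
    return 1.0 / cost_per_turn


uncached_turn = turn_cost(500, 300)
# Best case: assumes the entire 500-token input is cache-served.
cached_turn = turn_cost(500, 300, cached_input_tokens=500)

print(f"uncached: ${uncached_turn:.6f} per turn, "
      f"{turns_per_dollar(uncached_turn):.0f} turns per dollar")
print(f"cached:   ${cached_turn:.6f} per turn, "
      f"{turns_per_dollar(cached_turn):.0f} turns per dollar")
```

In practice the cached share would come from the X-Cached-Input-Units header rather than being assumed.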
Limits
- Signup-credit tier: 10 requests / minute. Funded by the $5 free credit, which is spendable on the curated 7-model edge tier and expires 7 days after signup.
- Paid (any topup): 600 requests / minute. Auto-promoted on first topup. See rate limits.
- Enterprise: 30,000+ RPM under contract.
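To stay under a per-minute limit on the client side, one option is a small sliding-window throttle. This is only a sketch: `RequestThrottle` is a hypothetical helper, and how the gateway itself counts the minute (fixed or sliding window) is not stated here, so treat the window model as an assumption.

```python
import time
from collections import deque


class RequestThrottle:
    """Allow at most `limit` calls in any `window`-second span."""

    def __init__(self, limit=10, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self.sent = deque()     # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self.sleep(delay)
        self.sent.append(self.clock())


# Signup-credit tier from the list above: 10 requests / minute.
throttle = RequestThrottle(limit=10, window=60.0)
throttle.acquire()  # call this right before each API request
```

On the paid tier the same class would be constructed with `limit=600`.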
Why coding agents
K2.7 Code is coding-tuned: it's aimed at the agentic edit loop — read a repo, plan a change, call tools, write a multi-file diff, run a test, iterate. Its strengths:
- Clean tool-call argument JSON across long multi-step loops
- 262K context (262,144 tokens): most mid-sized repos fit whole, no chunking
- Extended reasoning before it commits to an edit plan
- Vision input for screenshots, diagrams, and UI references
Run it against Opus 4.7 or GPT-5.4 on your own repo — point an eval at all three and pick the winner for your codebase. K2.6 remains in the catalog at the same pricing if you want to compare versions; see the K2.6 guide.
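The agentic loop described in this section (call the model, run the tools it asks for, feed results back, repeat) can be sketched as below. Everything here is illustrative: `run_agent`, `fake_create`, and the `run_tests` tool are hypothetical names, and the loop works on plain dicts shaped like the JSON response shown earlier rather than on SDK objects, so a stub can stand in for the network call.

```python
import json


def run_agent(create, tools_impl, messages, max_turns=8):
    """Loop until the model answers without requesting a tool."""
    messages = list(messages)  # do not mutate the caller's list
    for _ in range(max_turns):
        response = create(messages=messages)
        message = response["choices"][0]["message"]
        messages.append(message)
        calls = message.get("tool_calls") or []
        if not calls:
            return message.get("content")
        for call in calls:
            fn = call["function"]
            try:
                args = json.loads(fn["arguments"])
                result = tools_impl[fn["name"]](**args)
            except Exception as exc:
                # Report failures back to the model instead of crashing.
                result = {"error": str(exc)}
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("agent did not finish within max_turns")


def fake_create(messages):
    """Stub model: asks for one tool call, then answers."""
    if messages[-1]["role"] == "tool":
        return {"choices": [{"message": {
            "role": "assistant", "content": "Tests pass."},
            "finish_reason": "stop"}]}
    return {"choices": [{"message": {
        "role": "assistant", "content": None,
        "tool_calls": [{"id": "call_1", "type": "function",
                        "function": {"name": "run_tests",
                                     "arguments": "{\"path\": \"tests/\"}"}}]},
        "finish_reason": "tool_calls"}]}


answer = run_agent(
    fake_create,
    {"run_tests": lambda path: {"passed": True, "path": path}},
    [{"role": "user", "content": "Run the tests."}],
)
print(answer)
```

The `max_turns` cap is a design choice to bound cost if the model keeps requesting tools; a real loop would pass a function that wraps the chat completions call in place of `fake_create`.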
Common errors
- 429 rate_limit_error: over 10 req/min on the signup-credit tier. Top up to unlock 600 req/min on the paid tier.
- 402 insufficient_credits: $5 signup credit exhausted or expired, and no topup balance. Add credits.
- 503 service_unavailable: upstream saturation on Kimi. Retryable; default retry-after is 2 seconds.
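A retry wrapper for the errors above might look like the following sketch. It retries 429 and 503, returns 402 immediately (retrying cannot fix a missing balance), and waits for the `Retry-After` value when present. The helper name `call_with_retry`, the `(status, headers, body)` tuple shape, and the assumption that the delay arrives as a numeric `Retry-After` header are all illustrative, not confirmed gateway behavior.

```python
import time

RETRYABLE = {429, 503}  # 402 is not retryable: it needs a topup


def call_with_retry(send, max_attempts=4, default_delay=2.0,
                    sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status.

    `send` is any zero-argument callable returning (status, headers, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status not in RETRYABLE or attempt == max_attempts:
            return status, body
        try:
            delay = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # No usable header: start at 2 s and double each attempt.
            delay = default_delay * 2 ** (attempt - 1)
        sleep(delay)


# Offline demo with canned responses and a recording sleep function.
responses = iter([
    (503, {}, "busy"),
    (429, {"Retry-After": "1"}, "slow down"),
    (200, {}, "ok"),
])
slept = []
result = call_with_retry(lambda: next(responses), sleep=slept.append)
print(result, slept)
```

The 2-second starting delay mirrors the default retry-after quoted for 503; the doubling is a common backoff choice rather than something this page specifies.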