A ReAct loop with four tools — read, plan, edit, test — that reads your repo, makes multi-file patches, runs the tests, and iterates on failures. One AIgateway key, any model slug, full control. No vendor lock-in.
Claude Code and Cursor have trained everyone to expect an agent that knows your repo, plans multi-file changes, runs your tests, and iterates on failures. The good news: the architecture is not mysterious. It's a ReAct loop — reason + act — wrapped around four tools.
This example is the whole thing in 300 lines of Python. The loop, the tools, the diff formatting, the test runner. Point it at Kimi K2.6 and it runs on the free tier; point it at Claude Opus 4.7 and you get something within shouting distance of Claude Code itself. No vendor lock-in, no proprietary surface — it's your agent.
Four is enough for most coding tasks — read files, edit files, run shell commands, run tests. Each is a Python function plus a JSON schema the LLM can call. More tools are a distraction until the core loop works.
TOOLS = [
    {"type": "function", "function": {
        "name": "read_file",
        "description": "Return the full text of a file, optionally a line range.",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"},
                                      "start": {"type": "integer"},
                                      "end": {"type": "integer"}},
                       "required": ["path"]}}},
    {"type": "function", "function": {
        "name": "apply_patch",
        "description": "Apply a unified-diff patch to the repo.",
        "parameters": {"type": "object",
                       "properties": {"patch": {"type": "string"}},
                       "required": ["patch"]}}},
    {"type": "function", "function": {
        "name": "run_shell",
        "description": "Run a shell command in the repo root. Returns stdout + stderr.",
        "parameters": {"type": "object",
                       "properties": {"cmd": {"type": "string"}},
                       "required": ["cmd"]}}},
    {"type": "function", "function": {
        "name": "run_tests",
        "description": "Run the project's test command. Returns pass/fail summary + output.",
        "parameters": {"type": "object",
                       "properties": {"scope": {"type": "string"}}}}},
]
Map each function name to a Python function that does the thing. The run_shell handler is the one that needs care: sandbox it, cap the walltime, and whitelist commands if you're running in a shared environment.
import pathlib
import subprocess

def read_file(path: str, start: int | None = None, end: int | None = None) -> str:
    lines = pathlib.Path(path).read_text().splitlines(keepends=True)
    if start is None:
        return "".join(lines)
    return "".join(lines[start - 1:end])

def apply_patch(patch: str) -> str:
    r = subprocess.run(["git", "apply", "-"], input=patch.encode(),
                       capture_output=True)
    if r.returncode:
        raise RuntimeError(r.stderr.decode())
    return "applied"

def run_shell(cmd: str, timeout: int = 30) -> str:
    r = subprocess.run(cmd, shell=True, capture_output=True, timeout=timeout)
    return r.stdout.decode() + r.stderr.decode()

def run_tests(scope: str = "") -> str:
    cmd = f"pnpm test {scope}" if scope else "pnpm test"
    return run_shell(cmd, timeout=120)

HANDLERS = {"read_file": read_file, "apply_patch": apply_patch,
            "run_shell": run_shell, "run_tests": run_tests}
Ask the model, execute any tool calls it returns, feed the results back, and repeat until the model stops calling tools. That's the whole loop; everything else is quality-of-life on top.
from openai import OpenAI
import json

client = OpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...")
SYSTEM = open("agent-system-prompt.md").read()  # your preferences + conventions

def step(messages: list[dict], model: str = "moonshot/kimi-k2.6") -> list[dict]:
    r = client.chat.completions.create(
        model=model, messages=messages, tools=TOOLS, tool_choice="auto",
        extra_headers={"x-aig-tag": "coder.loop"},
    )
    msg = r.choices[0].message
    assistant: dict = {"role": "assistant", "content": msg.content or ""}
    if msg.tool_calls:  # omit the key when empty; some providers reject an empty list
        assistant["tool_calls"] = [tc.model_dump() for tc in msg.tool_calls]
    messages.append(assistant)
    if not msg.tool_calls:
        return messages
    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments or "{}")
        try:
            out = str(HANDLERS[tc.function.name](**args))[:8000]
        except Exception as e:
            out = f"ERROR: {e}"
        messages.append({"role": "tool", "tool_call_id": tc.id, "content": out})
    return messages

def run(goal: str, model: str = "moonshot/kimi-k2.6"):
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": goal}]
    for _ in range(40):  # hard step cap
        messages = step(messages, model)
        last = messages[-1]
        if last["role"] == "assistant" and not last.get("tool_calls"):
            break
    return messages[-1]["content"]
Auto-approving edits and shell commands is fine in a sandbox; in a real repo, gate the two write-side tools behind a one-key confirmation. That one line is the difference between 'a clever demo' and 'something you'd let run on Friday afternoon.'
RISKY = {"apply_patch", "run_shell"}

def human_ok(tool_name: str, args: dict) -> bool:
    if tool_name not in RISKY:
        return True
    print(f"\n{tool_name}({args})")
    return input("[y/N] > ").strip().lower() == "y"

# Wrap the HANDLERS call sites with human_ok(): 4 extra lines.
The whole loop is model-agnostic. Start on free Kimi K2.6 for dev, upgrade to Opus 4.7 or GPT-5.4 for production, and A/B all three with the eval example in this library. One string change.
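One way to do the human_ok() wiring is to wrap the whole dispatch table once instead of touching each call site. A sketch, with the approval function injected so the gate is testable (in the CLI it would be the `input()` prompt above):

```python
RISKY = {"apply_patch", "run_shell"}

def gated(handlers: dict, approve) -> dict:
    # Wrap each risky handler so it must be approved before it runs.
    # A declined call returns a string the model can see and react to,
    # instead of raising and derailing the loop.
    def wrap(name, fn):
        def inner(**args):
            if name in RISKY and not approve(name, args):
                return f"SKIPPED: user declined {name}"
            return fn(**args)
        return inner
    return {name: wrap(name, fn) for name, fn in handlers.items()}
```

Then a single `HANDLERS = gated(HANDLERS, approve=human_ok)` at startup covers every call site at once.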
if __name__ == "__main__":
    import sys
    goal = sys.argv[1]
    model = sys.argv[2] if len(sys.argv) > 2 else "moonshot/kimi-k2.6"
    print(run(goal, model))

# $ python agent.py "fix the pagination bug in users API"
# $ python agent.py "add a CSV exporter for /v1/usage" anthropic/claude-opus-4.7
MCP tools. If you already built the MCP server example in this library, plug its tools into the coder with no extra handlers; the gateway routes MCP calls through the same key.
A plan step. Before the first tool call, ask the model to produce a written plan. That makes the agent's intent auditable and dramatically cuts the number of dead-end edits on hard tasks.
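A minimal sketch of that plan step, assuming the same message format as the loop above: run one tool-free turn for the plan, then seed the real transcript with it so every later step is anchored to the stated intent. `PLAN_PROMPT` and both helper names are illustrative, not part of any API:

```python
PLAN_PROMPT = ("Before touching any files, write a short numbered plan: which "
               "files you will read, what you will change, and how you will "
               "verify it. Do not call any tools yet.")

def plan_messages(system: str, goal: str) -> list[dict]:
    # One tool-free turn: send this WITHOUT the tools parameter.
    return [{"role": "system", "content": system},
            {"role": "user", "content": f"{goal}\n\n{PLAN_PROMPT}"}]

def seed_with_plan(system: str, goal: str, plan: str) -> list[dict]:
    # Pin the returned plan into the transcript as the first assistant turn,
    # so the agent's later actions can be audited against it.
    return [{"role": "system", "content": system},
            {"role": "user", "content": goal},
            {"role": "assistant", "content": f"Plan:\n{plan}"}]
```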
A cache. Coding workloads repeat — turn on `x-aig-cache: semantic` for read-heavy phases (reading the same files, running the same tests) and you'll see 30-40% bill reduction on iterative tasks.
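Per-phase headers might look like this. A sketch: the `x-aig-tag` and `x-aig-cache` header names are the ones used in this article, and the read/test/edit phase split is an assumption about how you label your loop's turns:

```python
def phase_headers(phase: str) -> dict:
    # Cache only the read-heavy phases; edits and patches should always
    # hit the live model so the agent never acts on a stale response.
    headers = {"x-aig-tag": "coder.loop"}
    if phase in {"read", "test"}:
        headers["x-aig-cache"] = "semantic"
    return headers
```

Pass the result as `extra_headers=phase_headers("read")` on the relevant chat-completions calls.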
The system prompt file is where your team's conventions live — naming, commit style, testing policy, what to avoid. Treat it like a living README for the agent; the prompt is half the agent.
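An illustrative skeleton for `agent-system-prompt.md` (the contents are entirely yours; this is just one shape it can take):

```markdown
You are a coding agent working in our repo.

## Conventions
- TypeScript strict mode; no `any` without a comment explaining why.
- Commit style: conventional commits (`fix:`, `feat:`, `chore:`).

## Testing policy
- Every behavior change ships with a test.
- Run the scoped test suite before declaring a task done.

## Avoid
- Editing generated files (anything under `dist/`).
- Introducing new dependencies without asking.
```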
Custom tools are a one-function addition. A company-specific `deploy_preview(env)` tool can turn this agent from 'writes code' into 'ships features' for your stack specifically.
# Add a tool that knows your stack.
TOOLS.append({"type": "function", "function": {
    "name": "deploy_preview",
    "description": "Create a preview deployment and return the URL.",
    "parameters": {"type": "object",
                   "properties": {"branch": {"type": "string"}},
                   "required": ["branch"]}}})

def deploy_preview(branch: str) -> str:
    return run_shell(f"vercel --env=preview --git-branch={branch}")

HANDLERS["deploy_preview"] = deploy_preview
Architecturally the same: a ReAct loop plus a tool belt. Claude Code is more polished (UI, sandbox, approval policies, file watchers), but it is locked to Anthropic models and Anthropic's tool surface. This agent gives up the polish in exchange for total control: pick any model, define any tool, ship any UX.
Kimi K2.6 is the best free option and genuinely competitive on most coding tasks; Opus 4.7 is best-in-class for complex multi-file refactors; GPT-5.4 is a strong middle ground. Run the eval example in this library on your own task samples before committing.
Not by default. Always gate `run_shell` and `apply_patch` behind a human approval in real repos, or run the agent in a disposable sandbox (Docker, VM, or a CF Containers instance). The example shows the 4-line gate.
Yes, but cap the walltime. `run_tests` in the example has a 120s timeout — if your suite is longer, split it into scoped runs (the `scope` arg in the tool schema). For CI-scale runs, skip in-loop testing and let the CI verify at PR time.
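Splitting a long suite into capped, scoped runs can be as small as this (a sketch; the shell runner is injected as a parameter so the splitting logic is testable without spawning `pnpm`):

```python
def run_tests_scoped(scopes: list[str], run, timeout_each: int = 120) -> str:
    # One capped invocation per scope: a hung scope times out on its own
    # instead of eating the whole walltime budget.
    reports = []
    for scope in scopes:
        cmd = f"pnpm test {scope}".strip()
        reports.append(f"== {scope or 'all'} ==\n{run(cmd, timeout=timeout_each)}")
    return "\n\n".join(reports)
```

In the agent, call it as `run_tests_scoped(["api", "ui"], run_shell)`.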
The 40-step hard cap in the loop is the first line. The `x-aig-tag: coder.loop` header plus a monthly cap on that tag is the second. Human approval for write-side tools is the third. In practice those three guardrails catch 99% of runaway cases.
Yes — set `stream=True` on the chat-completions call and render deltas as they arrive. The real Claude Code does exactly this; it's the single biggest UX lift over a blocking agent.
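The fiddly part of streaming is reassembling tool-call deltas, which arrive as string fragments keyed by index. A sketch of the fold, written over plain dicts shaped like chat-completions stream deltas so it is easy to test; with the SDK objects you would read the same fields off `chunk.choices[0].delta`:

```python
def accumulate(deltas: list[dict]) -> dict:
    # Fold a stream of assistant deltas into one complete message:
    # content concatenates, and each tool call's id/name/arguments
    # build up under its index.
    msg = {"content": "", "tool_calls": {}}
    for d in deltas:
        msg["content"] += d.get("content") or ""
        for tc in d.get("tool_calls") or []:
            slot = msg["tool_calls"].setdefault(
                tc["index"], {"id": "", "name": "", "arguments": ""})
            slot["id"] = tc.get("id") or slot["id"]
            fn = tc.get("function") or {}
            slot["name"] += fn.get("name") or ""
            slot["arguments"] += fn.get("arguments") or ""
    return msg
```

Print each content delta as it arrives for the live render, then feed the accumulated message into the tool dispatch exactly as in the blocking loop.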
Add a `workspace` parameter to the tools and pass a repo root per call. In monorepos, restrict `read_file` to paths under the current working package so the agent doesn't drown in unrelated context.
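A path guard for `read_file` is only a few lines. A sketch: `Path.resolve()` collapses `..` segments, so both traversal tricks and absolute paths outside the root are caught by one check:

```python
import pathlib

def resolve_in(workspace: str, path: str) -> pathlib.Path:
    # Refuse any path that resolves outside the workspace root.
    root = pathlib.Path(workspace).resolve()
    p = (root / path).resolve()
    if p != root and root not in p.parents:
        raise PermissionError(f"{path} escapes {workspace}")
    return p
```

The handler then becomes `read_file(resolve_in(workspace, path), ...)`, with `workspace` set to the current package root in a monorepo.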
Yes. A good pattern is 'agent-on-PR' — when a PR opens with a label, run the agent with a short goal (add tests, fix lint, port to the new API), let it push to the branch, and require human review. The `x-aig-tag` makes the per-PR cost obvious.