A ReAct loop with four tools — read, plan, edit, test — that reads your repo, makes multi-file patches, runs the tests, and iterates on failures. One AIgateway key, any model slug, full control. No vendor lock-in.
Claude Code and Cursor have trained everyone to expect an agent that knows your repo, plans multi-file changes, runs your tests, and iterates on failures. The good news: the architecture is not mysterious. It's a ReAct loop — reason + act — wrapped around four tools.
This example is the whole thing in 300 lines of Python. The loop, the tools, the diff formatting, the test runner. Point it at Kimi K2.6 and it runs on the free tier; point it at Claude Opus 4.7 and you get something within shouting distance of Claude Code itself. No vendor lock-in, no proprietary surface — it's your agent.
Four is enough for most coding tasks — read files, edit files, run shell commands, run tests. Each is a Python function plus a JSON schema the LLM can call. More tools are a distraction until the core loop works.
TOOLS = [
    {"type": "function", "function": {
        "name": "read_file",
        "description": "Return the full text of a file, optionally a line range.",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"},
                                      "start": {"type": "integer"},
                                      "end": {"type": "integer"}},
                       "required": ["path"]}}},
    {"type": "function", "function": {
        "name": "apply_patch",
        "description": "Apply a unified-diff patch to the repo.",
        "parameters": {"type": "object",
                       "properties": {"patch": {"type": "string"}},
                       "required": ["patch"]}}},
    {"type": "function", "function": {
        "name": "run_shell",
        "description": "Run a shell command in the repo root. Returns stdout + stderr.",
        "parameters": {"type": "object",
                       "properties": {"cmd": {"type": "string"}},
                       "required": ["cmd"]}}},
    {"type": "function", "function": {
        "name": "run_tests",
        "description": "Run the project's test command. Returns pass/fail summary + output.",
        "parameters": {"type": "object",
                       "properties": {"scope": {"type": "string"}}}}},
]
Map each function name to a Python function that does the thing. The run_shell handler is the one that needs care: sandbox it, cap the walltime, and whitelist commands if you're running in a shared environment.
import pathlib
import subprocess

def read_file(path: str, start: int | None = None, end: int | None = None) -> str:
    lines = pathlib.Path(path).read_text().splitlines(keepends=True)
    if start is None:
        return "".join(lines)
    return "".join(lines[start - 1:end])

def apply_patch(patch: str) -> str:
    r = subprocess.run(["git", "apply", "-"], input=patch.encode(),
                       capture_output=True)
    if r.returncode:
        raise RuntimeError(r.stderr.decode())
    return "applied"

def run_shell(cmd: str, timeout: int = 30) -> str:
    r = subprocess.run(cmd, shell=True, capture_output=True, timeout=timeout)
    return r.stdout.decode() + r.stderr.decode()

def run_tests(scope: str = "") -> str:
    cmd = f"pnpm test {scope}" if scope else "pnpm test"
    return run_shell(cmd, timeout=120)

HANDLERS = {"read_file": read_file, "apply_patch": apply_patch,
            "run_shell": run_shell, "run_tests": run_tests}
Ask the model, execute any tool calls it returns, feed the results back, and repeat until the model stops calling tools. That's the whole loop; everything else is quality-of-life on top.
from openai import OpenAI
import json

client = OpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...")
SYSTEM = open("agent-system-prompt.md").read()  # your preferences + conventions

def step(messages: list[dict], model: str = "moonshot/kimi-k2.6") -> list[dict]:
    r = client.chat.completions.create(
        model=model, messages=messages, tools=TOOLS, tool_choice="auto",
        extra_headers={"x-aig-tag": "coder.loop"},
    )
    msg = r.choices[0].message
    assistant: dict = {"role": "assistant", "content": msg.content or ""}
    if msg.tool_calls:  # omit the key when empty; some providers reject an empty list
        assistant["tool_calls"] = [tc.model_dump() for tc in msg.tool_calls]
    messages.append(assistant)
    if not msg.tool_calls:
        return messages
    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments or "{}")
        try:
            out = str(HANDLERS[tc.function.name](**args))[:8000]
        except Exception as e:
            out = f"ERROR: {e}"
        messages.append({"role": "tool", "tool_call_id": tc.id, "content": out})
    return messages

def run(goal: str, model: str = "moonshot/kimi-k2.6"):
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": goal}]
    for _ in range(40):  # hard step cap
        messages = step(messages, model)
        last = messages[-1]
        if last["role"] == "assistant" and not last.get("tool_calls"):
            break
    return messages[-1]["content"]
Auto-approving edits and shell commands is fine in a sandbox; in a real repo, gate the two write-side tools behind a one-key confirmation. That one line is the difference between 'a clever demo' and 'something you'd let run on Friday afternoon.'
RISKY = {"apply_patch", "run_shell"}

def human_ok(tool_name: str, args: dict) -> bool:
    if tool_name not in RISKY:
        return True
    print(f"\n{tool_name}({args})")
    return input("[y/N] > ").strip().lower() == "y"

# Wrap the HANDLERS call sites with human_ok(): 4 extra lines.
The whole loop is model-agnostic. Start on free Kimi K2.6 for dev, upgrade to Opus 4.7 or GPT-5.4 for production, and A/B all three with the eval example in this library. One string change.
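One way to do the human_ok() wiring is to wrap the whole dispatch table once instead of touching each call site. A sketch, with the approval function injected so the gate is testable (in the CLI it would be the `input()` prompt above):

```python
RISKY = {"apply_patch", "run_shell"}

def gated(handlers: dict, approve) -> dict:
    # Wrap each risky handler so it must be approved before it runs.
    # A declined call returns a string the model can see and react to,
    # instead of raising and derailing the loop.
    def wrap(name, fn):
        def inner(**args):
            if name in RISKY and not approve(name, args):
                return f"SKIPPED: user declined {name}"
            return fn(**args)
        return inner
    return {name: wrap(name, fn) for name, fn in handlers.items()}
```

Then a single `HANDLERS = gated(HANDLERS, approve=human_ok)` at startup covers every call site at once.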
if __name__ == "__main__":
    import sys
    goal = sys.argv[1]
    model = sys.argv[2] if len(sys.argv) > 2 else "moonshot/kimi-k2.6"
    print(run(goal, model))

# $ python agent.py "fix the pagination bug in users API"
# $ python agent.py "add a CSV exporter for /v1/usage" anthropic/claude-opus-4.7
MCP tools. If you already built the MCP server example in this library, plug its tools into the coder with no extra handlers; the gateway routes MCP calls through the same key.
A plan step. Before the first tool call, ask the model to produce a written plan. That makes the agent's intent auditable and dramatically cuts the number of dead-end edits on hard tasks.
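A minimal sketch of that plan step, assuming the same message format as the loop above: run one tool-free turn for the plan, then seed the real transcript with it so every later step is anchored to the stated intent. `PLAN_PROMPT` and both helper names are illustrative, not part of any API:

```python
PLAN_PROMPT = ("Before touching any files, write a short numbered plan: which "
               "files you will read, what you will change, and how you will "
               "verify it. Do not call any tools yet.")

def plan_messages(system: str, goal: str) -> list[dict]:
    # One tool-free turn: send this WITHOUT the tools parameter.
    return [{"role": "system", "content": system},
            {"role": "user", "content": f"{goal}\n\n{PLAN_PROMPT}"}]

def seed_with_plan(system: str, goal: str, plan: str) -> list[dict]:
    # Pin the returned plan into the transcript as the first assistant turn,
    # so the agent's later actions can be audited against it.
    return [{"role": "system", "content": system},
            {"role": "user", "content": goal},
            {"role": "assistant", "content": f"Plan:\n{plan}"}]
```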
A cache. Coding workloads repeat — turn on `x-aig-cache: semantic` for read-heavy phases (reading the same files, running the same tests) and you'll see 30-40% bill reduction on iterative tasks.
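Per-phase headers might look like this. A sketch: the `x-aig-tag` and `x-aig-cache` header names are the ones used in this article, and the read/test/edit phase split is an assumption about how you label your loop's turns:

```python
def phase_headers(phase: str) -> dict:
    # Cache only the read-heavy phases; edits and patches should always
    # hit the live model so the agent never acts on a stale response.
    headers = {"x-aig-tag": "coder.loop"}
    if phase in {"read", "test"}:
        headers["x-aig-cache"] = "semantic"
    return headers
```

Pass the result as `extra_headers=phase_headers("read")` on the relevant chat-completions calls.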
The system prompt file is where your team's conventions live — naming, commit style, testing policy, what to avoid. Treat it like a living README for the agent; the prompt is half the agent.
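An illustrative skeleton for `agent-system-prompt.md` (the contents are entirely yours; this is just one shape it can take):

```markdown
You are a coding agent working in our repo.

## Conventions
- TypeScript strict mode; no `any` without a comment explaining why.
- Commit style: conventional commits (`fix:`, `feat:`, `chore:`).

## Testing policy
- Every behavior change ships with a test.
- Run the scoped test suite before declaring a task done.

## Avoid
- Editing generated files (anything under `dist/`).
- Introducing new dependencies without asking.
```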
Custom tools are a one-function addition. A company-specific `deploy_preview(env)` tool can turn this agent from 'writes code' into 'ships features' for your stack specifically.
# Add a tool that knows your stack.
TOOLS.append({"type": "function", "function": {
    "name": "deploy_preview",
    "description": "Create a preview deployment and return the URL.",
    "parameters": {"type": "object",
                   "properties": {"branch": {"type": "string"}},
                   "required": ["branch"]}}})

def deploy_preview(branch: str) -> str:
    return run_shell(f"vercel --env=preview --git-branch={branch}")

HANDLERS["deploy_preview"] = deploy_preview
Architecturally the same: a ReAct loop plus a tool belt. Claude Code is more polished (UI, sandbox, approval policies, file watchers), but it is locked to Anthropic models and Anthropic's tool surface. This agent gives up the polish in exchange for total control: pick any model, define any tool, ship any UX.
Kimi K2.6 is the best free option and genuinely competitive on most coding tasks; Opus 4.7 is best-in-class for complex multi-file refactors; GPT-5.4 is a strong middle ground. Run the eval example in this library on your own task samples before committing.
Not by default. Always gate `run_shell` and `apply_patch` behind a human approval in real repos, or run the agent in a disposable sandbox (Docker, VM, or a CF Containers instance). The example shows the 4-line gate.
Yes, but cap the walltime. `run_tests` in the example has a 120s timeout — if your suite is longer, split it into scoped runs (the `scope` arg in the tool schema). For CI-scale runs, skip in-loop testing and let the CI verify at PR time.
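Splitting a long suite into capped, scoped runs can be as small as this (a sketch; the shell runner is injected as a parameter so the splitting logic is testable without spawning `pnpm`):

```python
def run_tests_scoped(scopes: list[str], run, timeout_each: int = 120) -> str:
    # One capped invocation per scope: a hung scope times out on its own
    # instead of eating the whole walltime budget.
    reports = []
    for scope in scopes:
        cmd = f"pnpm test {scope}".strip()
        reports.append(f"== {scope or 'all'} ==\n{run(cmd, timeout=timeout_each)}")
    return "\n\n".join(reports)
```

In the agent, call it as `run_tests_scoped(["api", "ui"], run_shell)`.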
The 40-step hard cap in the loop is the first line. The `x-aig-tag: coder.loop` header plus a monthly cap on that tag is the second. Human approval for write-side tools is the third. In practice those three guardrails catch 99% of runaway cases.
Yes — set `stream=True` on the chat-completions call and render deltas as they arrive. The real Claude Code does exactly this; it's the single biggest UX lift over a blocking agent.
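The fiddly part of streaming is reassembling tool-call deltas, which arrive as string fragments keyed by index. A sketch of the fold, written over plain dicts shaped like chat-completions stream deltas so it is easy to test; with the SDK objects you would read the same fields off `chunk.choices[0].delta`:

```python
def accumulate(deltas: list[dict]) -> dict:
    # Fold a stream of assistant deltas into one complete message:
    # content concatenates, and each tool call's id/name/arguments
    # build up under its index.
    msg = {"content": "", "tool_calls": {}}
    for d in deltas:
        msg["content"] += d.get("content") or ""
        for tc in d.get("tool_calls") or []:
            slot = msg["tool_calls"].setdefault(
                tc["index"], {"id": "", "name": "", "arguments": ""})
            slot["id"] = tc.get("id") or slot["id"]
            fn = tc.get("function") or {}
            slot["name"] += fn.get("name") or ""
            slot["arguments"] += fn.get("arguments") or ""
    return msg
```

Print each content delta as it arrives for the live render, then feed the accumulated message into the tool dispatch exactly as in the blocking loop.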
Add a `workspace` parameter to the tools and pass a repo root per call. In monorepos, restrict `read_file` to paths under the current working package so the agent doesn't drown in unrelated context.
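A path guard for `read_file` is only a few lines. A sketch: `Path.resolve()` collapses `..` segments, so both traversal tricks and absolute paths outside the root are caught by one check:

```python
import pathlib

def resolve_in(workspace: str, path: str) -> pathlib.Path:
    # Refuse any path that resolves outside the workspace root.
    root = pathlib.Path(workspace).resolve()
    p = (root / path).resolve()
    if p != root and root not in p.parents:
        raise PermissionError(f"{path} escapes {workspace}")
    return p
```

The handler then becomes `read_file(resolve_in(workspace, path), ...)`, with `workspace` set to the current package root in a monorepo.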
Yes. A good pattern is 'agent-on-PR' — when a PR opens with a label, run the agent with a short goal (add tests, fix lint, port to the new API), let it push to the branch, and require human review. The `x-aig-tag` makes the per-PR cost obvious.