Spin up a five-agent swarm that runs locally on Hermes, routes every call through AIgateway, and automates your inbox, calendar, research, coding, and reporting — using Kimi K2.6 on the free tier through Apr 30, 2026.
A single chatbot is a toy. A swarm — a planner that delegates to specialists, each with its own memory and tool belt — is what actually replaces a workday.
In this example we wire up five Hermes agents running locally on your laptop, point them at AIgateway, and let Kimi K2.6 do the thinking. You get agent-grade reasoning (1T parameters, 256K context, native tool calling, 300-agent sub-plan fan-out) on the free tier through Apr 30, 2026 — then pass-through pricing after that.
By the end you'll have a five-agent crew that triages your inbox, keeps your calendar sane, drafts research memos, writes + tests code, and ships a daily stand-up report. All local, all logged, all billable against one key.
Sign in at aigateway.sh/signin and copy your `sk-aig-…` key from the dashboard. Every key automatically includes the Kimi K2.6 free tier (100 req/day through Apr 30, 2026).
One-line install. Hermes runs the agent loop on your laptop and ships with an MCP tool client out of the box.
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
hermes --version
Hermes reads the same `OPENAI_*` env vars the OpenAI SDK does. Set them to your AIgateway key and base URL, and pin the default model to Kimi K2.6.
export OPENAI_API_KEY="sk-aig-..."
export OPENAI_BASE_URL="https://api.aigateway.sh/v1"
export HERMES_DEFAULT_MODEL="moonshot/kimi-k2.6"
hermes model set default moonshot/kimi-k2.6
Five specialist agents + one planner. Each agent has a role, a toolset, and a shared memory handle. The planner reads your inbox + today's calendar and hands each specialist a focused brief.
from hermes import Agent, Swarm
from openai import OpenAI
client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)
MODEL = "moonshot/kimi-k2.6"
planner = Agent(
    name="planner",
    client=client, model=MODEL,
    system=(
        "You are the chief-of-staff. Read the user's inbox, calendar, "
        "and running notes. Produce a JSON plan that assigns one brief "
        "to each specialist: inbox, calendar, research, coder, reporter."
    ),
    tools=["memory.read", "inbox.list", "calendar.list"],
)
inbox = Agent(
    name="inbox",
    client=client, model=MODEL,
    system=(
        "You triage email. Archive noise, reply to acknowledgements, "
        "escalate anything that needs the user. Draft replies — never send."
    ),
    tools=["inbox.list", "inbox.archive", "inbox.draft"],
)
calendar = Agent(
    name="calendar",
    client=client, model=MODEL,
    system=(
        "You own the calendar. Find conflicts, propose holds, "
        "reshuffle when priorities change."
    ),
    tools=["calendar.list", "calendar.propose", "calendar.move"],
)
research = Agent(
    name="research",
    client=client, model=MODEL,
    system=(
        "You are a research analyst. Given a brief, gather sources, "
        "extract the 3 strongest facts, write a 200-word memo."
    ),
    tools=["web.search", "web.fetch", "memory.append"],
)
coder = Agent(
    name="coder",
    client=client, model=MODEL,
    system=(
        "You are a senior engineer. Write small, tested patches. "
        "Never touch production without human review."
    ),
    tools=["repo.read", "repo.diff", "shell.run", "tests.run"],
)
reporter = Agent(
    name="reporter",
    client=client, model=MODEL,
    system=(
        "You write a one-page daily stand-up from the swarm's logs. "
        "Bullet what shipped, what's blocked, what the user should decide."
    ),
    tools=["memory.read", "report.post"],
)
swarm = Swarm(
    planner=planner,
    specialists=[inbox, calendar, research, coder, reporter],
    memory="./swarm-memory.db",
)

One call kicks off the whole swarm. Hermes handles the planner → specialist fan-out, tool execution, memory writes, retries, and token accounting.
report = swarm.run(
    goal="Handle my morning: triage inbox, clean up the calendar, "
         "draft the weekly investor update, and ship me a stand-up.",
    budget_usd=0.25,
    max_steps=40,
)
print(report.text)
print("steps:", report.step_count, "cost:", report.cost_usd)

Every agent tags its own calls with `x-aig-tag`, so when you open the AIgateway dashboard you see exactly which specialist burned which cents. Hard-cap any tag with a one-line POST if a specialist goes rogue.
curl -X POST https://api.aigateway.sh/v1/budgets \
  -H "Authorization: Bearer sk-aig-..." \
  -d '{ "tag": "swarm.research", "monthly_cap_cents": 2000 }'

The planner opens the day with one Kimi K2.6 call. It reads your last 24h of mail, today's calendar, and the swarm's long-term memory, then produces a JSON plan — five briefs, one per specialist, ranked by impact.
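The plan's exact schema is up to your planner prompt; here's a hypothetical shape it might emit (field names are invented for illustration, not a Hermes contract) and how you'd dispatch it in impact order:

```python
import json

# Hypothetical plan shape -- "briefs", "specialist", and "impact" are
# illustrative field names, not part of any documented schema.
plan_json = """
{
  "briefs": [
    {"specialist": "inbox",    "impact": 1, "brief": "Triage overnight mail; draft replies."},
    {"specialist": "calendar", "impact": 2, "brief": "Resolve the 10:00 double-booking."},
    {"specialist": "research", "impact": 3, "brief": "Memo: competitor pricing change."},
    {"specialist": "coder",    "impact": 4, "brief": "Patch the flaky retry test."},
    {"specialist": "reporter", "impact": 5, "brief": "Compile the stand-up at the end."}
  ]
}
"""

plan = json.loads(plan_json)

# Dispatch one brief per specialist, highest impact first.
for brief in sorted(plan["briefs"], key=lambda b: b["impact"]):
    print(brief["specialist"], "->", brief["brief"])
```

Having the planner emit strict JSON (rather than prose) is what makes the fan-out mechanical: each specialist gets exactly one brief, and a malformed plan fails loudly at `json.loads` instead of silently mis-routing work.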
The five specialists run in parallel. Hermes watches the token budget; when a specialist overspends, its run is capped and control returns to the planner.
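Hermes does this budget enforcement internally; a simplified sketch of the pattern (thread-per-specialist, with a hypothetical per-agent token cap that returns control early) looks roughly like:

```python
from concurrent.futures import ThreadPoolExecutor

# Simplified stand-in for Hermes's fan-out: each specialist runs in its own
# thread, and a per-specialist token budget stops a runaway loop early.
TOKEN_BUDGET = 1000

def run_specialist(name, step_costs):
    """step_costs: simulated token cost of each agent step."""
    spent, steps = 0, 0
    for cost in step_costs:
        if spent + cost > TOKEN_BUDGET:
            # Cap tripped: stop here and hand control back to the planner.
            return (name, steps, spent, "capped")
        spent += cost
        steps += 1
    return (name, steps, spent, "done")

jobs = {"inbox": [200, 300], "research": [400, 400, 400]}
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda kv: run_specialist(*kv), jobs.items()))

print(results)
```

The point of capping per specialist (rather than per swarm) is isolation: one chatty research loop can't starve the inbox agent of budget.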
The reporter fires last. It reads every specialist's memory writes and produces a one-page stand-up: shipped, blocked, decisions needed. That file is the first thing you read with your coffee.
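The reporter's job is mechanical aggregation. A minimal sketch of folding memory writes into the three stand-up sections (the log format here is invented for illustration):

```python
# Invented log format: each memory write carries an agent, a status bucket,
# and a one-line note.
log = [
    {"agent": "inbox",    "status": "shipped",  "note": "Archived 14 threads, drafted 3 replies"},
    {"agent": "coder",    "status": "blocked",  "note": "CI secrets missing for tests.run"},
    {"agent": "calendar", "status": "decision", "note": "Move Friday 1:1 or decline the offsite?"},
]

def stand_up(entries):
    """Group memory writes into the shipped / blocked / decisions sections."""
    sections = {"shipped": [], "blocked": [], "decision": []}
    for e in entries:
        sections[e["status"]].append(f"- {e['agent']}: {e['note']}")
    return "\n".join(
        f"## {title}\n" + "\n".join(lines)
        for title, lines in sections.items() if lines
    )

print(stand_up(log))
```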
One of the reasons you route through AIgateway instead of hitting Moonshot directly: on any call, you can upgrade to Claude Opus 4.7 or GPT-5.4 by changing one string. Same key, same schema.
Typical pattern: plan + triage on Kimi K2.6 (cheap, fast), escalate to Opus for the research memo when the stakes are high.
# Escalate just the research call — everything else stays on Kimi.
research.model = "anthropic/claude-opus-4.7"
swarm.run(goal="Write the investor memo. Be rigorous.")
Add your own specialists. Hermes agents are just classes — give the new one a role, a system prompt, a toolset, and register it with the swarm.
Replace the `./swarm-memory.db` handle with a Vectorize binding to share memory across devices, or plug in any MCP-compatible memory server.
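The memory handle is duck-typed here for illustration: the real Hermes interface may differ, but a custom backend would need to cover roughly this read/append shape. A sketch using stdlib `sqlite3`:

```python
import sqlite3

# Duck-typed stand-in for a swarm memory backend -- the method names
# (append/read) and namespace column are assumptions, not the Hermes API.
class SqliteMemory:
    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS memory (ns TEXT, entry TEXT)")

    def append(self, ns, entry):
        self.db.execute("INSERT INTO memory VALUES (?, ?)", (ns, entry))
        self.db.commit()

    def read(self, ns):
        rows = self.db.execute(
            "SELECT entry FROM memory WHERE ns = ?", (ns,)
        ).fetchall()
        return [r[0] for r in rows]

mem = SqliteMemory(":memory:")
mem.append("research", "Competitor cut prices 12% on Tuesday.")
print(mem.read("research"))
```

The same shape maps onto a Vectorize binding or an MCP memory server: as long as reads and writes are keyed by namespace, the swarm code doesn't care where the rows live.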
When you're ready to put the swarm in front of customers, mint a sub-account per user with the `/v1/sub-accounts` API — each customer gets their own key, spend cap, and isolated analytics without you touching a billing table.
Do I need a Moonshot account? No. Your AIgateway key (`sk-aig-…`) is the only credential. AIgateway proxies to Moonshot under the hood — you don't install their SDK, manage a second invoice, or register a second account.
Is there really a free tier? Yes — 100 requests/day on `moonshot/kimi-k2.6` through Apr 30, 2026. No card on file. After the trial, Kimi bills pass-through like every other model in the catalog. A five-agent swarm typically runs 20-40 requests per morning loop, so the free tier covers regular development use.
Does everything run locally? The Hermes loop and your tool servers run locally. Model inference is a network call — Kimi K2.6 runs on Moonshot's infrastructure via AIgateway. If you want fully offline, swap the model slug to a Workers AI edge model (`@cf/meta/llama-3.1-8b-instruct`) or host your own and point the OpenAI client at it.
Can I cap what a single specialist spends? Every swarm call carries an `x-aig-tag` header per specialist. POST a hard cap to `/v1/budgets` with that tag and AIgateway enforces it before dispatch — the specialist's calls start returning 402 when the cap trips, and the swarm routes around it.
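"Routes around it" means the planner treats a budget refusal like any other specialist failure. A self-contained sketch of that fallback loop, with a local exception standing in for the gateway's 402:

```python
# Hypothetical sketch: BudgetExceeded stands in for an HTTP 402 from the
# gateway once a tag's cap trips. Names here are illustrative.
class BudgetExceeded(Exception):
    pass

def call_specialist(name, capped_tags):
    if name in capped_tags:
        raise BudgetExceeded(name)
    return f"{name}: ok"

def run_with_fallback(briefs, capped_tags):
    done, skipped = [], []
    for name in briefs:
        try:
            done.append(call_specialist(name, capped_tags))
        except BudgetExceeded:
            skipped.append(name)  # planner drops or reassigns this brief
    return done, skipped

done, skipped = run_with_fallback(["inbox", "research", "reporter"], {"research"})
print(done, skipped)
```

The stand-up still ships even when one specialist is over budget; the skipped brief just surfaces in the "blocked" section instead of silently failing.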
What if I want a different model later? Change one string. The swarm logic doesn't know which upstream model it's on. Swap every specialist to `gpt-5.4` or `claude-opus-4.7` — or A/B a few — without touching a single line of agent code.
Can I put this in front of my own customers? Yes. `POST /v1/sub-accounts` mints a scoped `sk-aig-…` key per customer with its own spend cap, rate limit, and isolated analytics. Hand it to the customer or keep it server-side. The swarm code doesn't change.
Do the agents share memory? By default each specialist has its own memory namespace. The planner is the only agent that reads across namespaces — that's how it hands off context between specialists without leaking irrelevant history into every prompt.
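The namespace rule can be sketched as a simple access check (the semantics here are illustrative, not documented Hermes behavior): specialists read only their own namespace, while the planner reads any of them.

```python
# Illustrative namespace store: one key per specialist.
store = {
    "inbox":    ["3 drafts waiting for review"],
    "research": ["memo: pricing change"],
}

def read(agent, ns):
    """Only the planner may read outside its own namespace."""
    if agent != "planner" and agent != ns:
        raise PermissionError(f"{agent} cannot read namespace {ns!r}")
    return store.get(ns, [])

print(read("planner", "inbox"))   # planner sees every namespace
print(read("inbox", "inbox"))     # specialist sees only its own notes
```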
Should I use Hermes or OpenClaw? Either works. Hermes has a tighter Python surface and lands closer to the agent-framework end of the spectrum; OpenClaw is heavier but ships with 5,700+ prebuilt skills. The AIgateway side is identical — point either at `https://api.aigateway.sh/v1` with your key.