Spin up a five-agent swarm that runs locally on Hermes, routes every call through AIgateway, and automates your inbox, calendar, research, coding, and reporting — using Kimi K2.6 on the free tier through Apr 30, 2026.
A single chatbot is a toy. A swarm — a planner that delegates to specialists, each with its own memory and tool belt — is what actually replaces a workday.
In this example we wire up five Hermes agents running locally on your laptop, point them at AIgateway, and let Kimi K2.6 do the thinking. You get agent-grade reasoning (1T parameters, 256K context, native tool calling, 300-agent sub-plan fan-out) on the free tier through Apr 30, 2026 — then pass-through pricing after that.
By the end you'll have a five-agent crew that triages your inbox, keeps your calendar sane, drafts research memos, writes + tests code, and ships a daily stand-up report. All local, all logged, all billable against one key.
Sign in at aigateway.sh/signin and copy your `sk-aig-…` key from the dashboard. Every key automatically includes the Kimi K2.6 free tier (100 req/day through Apr 30, 2026).
One-line install. Hermes runs the agent loop on your laptop and ships with an MCP tool client out of the box.
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
hermes --version
Hermes reads the same `OPENAI_*` env vars the OpenAI SDK does. Set them to your AIgateway key and base URL, and pin the default model to Kimi K2.6.
export OPENAI_API_KEY="sk-aig-..."
export OPENAI_BASE_URL="https://api.aigateway.sh/v1"
export HERMES_DEFAULT_MODEL="moonshot/kimi-k2.6"
hermes model set default moonshot/kimi-k2.6
Five specialist agents + one planner. Each agent has a role, a toolset, and a shared memory handle. The planner reads your inbox + today's calendar and hands each specialist a focused brief.
from hermes import Agent, Swarm
from openai import OpenAI
client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)
MODEL = "moonshot/kimi-k2.6"
planner = Agent(
    name="planner",
    client=client, model=MODEL,
    system=(
        "You are the chief-of-staff. Read the user's inbox, calendar, "
        "and running notes. Produce a JSON plan that assigns one brief "
        "to each specialist: inbox, calendar, research, coder, reporter."
    ),
    tools=["memory.read", "inbox.list", "calendar.list"],
)
inbox = Agent(
    name="inbox",
    client=client, model=MODEL,
    system=(
        "You triage email. Archive noise, reply to acknowledgements, "
        "escalate anything that needs the user. Draft replies — never send."
    ),
    tools=["inbox.list", "inbox.archive", "inbox.draft"],
)
calendar = Agent(
    name="calendar",
    client=client, model=MODEL,
    system=(
        "You own the calendar. Find conflicts, propose holds, "
        "reshuffle when priorities change."
    ),
    tools=["calendar.list", "calendar.propose", "calendar.move"],
)
research = Agent(
    name="research",
    client=client, model=MODEL,
    system=(
        "You are a research analyst. Given a brief, gather sources, "
        "extract the 3 strongest facts, write a 200-word memo."
    ),
    tools=["web.search", "web.fetch", "memory.append"],
)
coder = Agent(
    name="coder",
    client=client, model=MODEL,
    system=(
        "You are a senior engineer. Write small, tested patches. "
        "Never touch production without human review."
    ),
    tools=["repo.read", "repo.diff", "shell.run", "tests.run"],
)
reporter = Agent(
    name="reporter",
    client=client, model=MODEL,
    system=(
        "You write a one-page daily stand-up from the swarm's logs. "
        "Bullet what shipped, what's blocked, what the user should decide."
    ),
    tools=["memory.read", "report.post"],
)
swarm = Swarm(
    planner=planner,
    specialists=[inbox, calendar, research, coder, reporter],
    memory="./swarm-memory.db",
)

One call kicks off the whole swarm. Hermes handles the planner → specialist fan-out, tool execution, memory writes, retries, and token accounting.
report = swarm.run(
    goal="Handle my morning: triage inbox, clean up the calendar, "
         "draft the weekly investor update, and ship me a stand-up.",
    budget_usd=0.25,
    max_steps=40,
)
print(report.text)
print("steps:", report.step_count, "cost:", report.cost_usd)

Every agent tags its own calls with `x-aig-tag`, so when you open the AIgateway dashboard you see exactly which specialist burned which cents. Hard-cap any tag with a one-line POST if a specialist goes rogue.
curl -X POST https://api.aigateway.sh/v1/budgets \
  -H "Authorization: Bearer sk-aig-..." \
  -d '{ "tag": "swarm.research", "monthly_cap_cents": 2000 }'

The planner opens the day with one Kimi K2.6 call. It reads your last 24h of mail, today's calendar, and the swarm's long-term memory, then produces a JSON plan — five briefs, one per specialist, ranked by impact.
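The plan's exact schema is up to your planner prompt; here's a hypothetical shape it might emit (field names are invented for illustration, not a Hermes contract) and how you'd dispatch it in impact order:

```python
import json

# Hypothetical plan shape -- "briefs", "specialist", and "impact" are
# illustrative field names, not part of any documented schema.
plan_json = """
{
  "briefs": [
    {"specialist": "inbox",    "impact": 1, "brief": "Triage overnight mail; draft replies."},
    {"specialist": "calendar", "impact": 2, "brief": "Resolve the 10:00 double-booking."},
    {"specialist": "research", "impact": 3, "brief": "Memo: competitor pricing change."},
    {"specialist": "coder",    "impact": 4, "brief": "Patch the flaky retry test."},
    {"specialist": "reporter", "impact": 5, "brief": "Compile the stand-up at the end."}
  ]
}
"""

plan = json.loads(plan_json)

# Dispatch one brief per specialist, highest impact first.
for brief in sorted(plan["briefs"], key=lambda b: b["impact"]):
    print(brief["specialist"], "->", brief["brief"])
```

Having the planner emit strict JSON (rather than prose) is what makes the fan-out mechanical: each specialist gets exactly one brief, and a malformed plan fails loudly at `json.loads` instead of silently mis-routing work.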
The five specialists run in parallel. Hermes watches the token budget; when a specialist overspends, its run is capped and control returns to the planner.
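Hermes does this budget enforcement internally; a simplified sketch of the pattern (thread-per-specialist, with a hypothetical per-agent token cap that returns control early) looks roughly like:

```python
from concurrent.futures import ThreadPoolExecutor

# Simplified stand-in for Hermes's fan-out: each specialist runs in its own
# thread, and a per-specialist token budget stops a runaway loop early.
TOKEN_BUDGET = 1000

def run_specialist(name, step_costs):
    """step_costs: simulated token cost of each agent step."""
    spent, steps = 0, 0
    for cost in step_costs:
        if spent + cost > TOKEN_BUDGET:
            # Cap tripped: stop here and hand control back to the planner.
            return (name, steps, spent, "capped")
        spent += cost
        steps += 1
    return (name, steps, spent, "done")

jobs = {"inbox": [200, 300], "research": [400, 400, 400]}
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda kv: run_specialist(*kv), jobs.items()))

print(results)
```

The point of capping per specialist (rather than per swarm) is isolation: one chatty research loop can't starve the inbox agent of budget.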
The reporter fires last. It reads every specialist's memory writes and produces a one-page stand-up: shipped, blocked, decisions needed. That file is the first thing you read with your coffee.
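The reporter's job is mechanical aggregation. A minimal sketch of folding memory writes into the three stand-up sections (the log format here is invented for illustration):

```python
# Invented log format: each memory write carries an agent, a status bucket,
# and a one-line note.
log = [
    {"agent": "inbox",    "status": "shipped",  "note": "Archived 14 threads, drafted 3 replies"},
    {"agent": "coder",    "status": "blocked",  "note": "CI secrets missing for tests.run"},
    {"agent": "calendar", "status": "decision", "note": "Move Friday 1:1 or decline the offsite?"},
]

def stand_up(entries):
    """Group memory writes into the shipped / blocked / decisions sections."""
    sections = {"shipped": [], "blocked": [], "decision": []}
    for e in entries:
        sections[e["status"]].append(f"- {e['agent']}: {e['note']}")
    return "\n".join(
        f"## {title}\n" + "\n".join(lines)
        for title, lines in sections.items() if lines
    )

print(stand_up(log))
```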
One of the reasons you route through AIgateway instead of hitting Moonshot directly: on any call, you can upgrade to Claude Opus 4.7 or GPT-5.4 by changing one string. Same key, same schema.
Typical pattern: plan + triage on Kimi K2.6 (cheap, fast), escalate to Opus for the research memo when the stakes are high.
# Escalate just the research call — everything else stays on Kimi.
research.model = "anthropic/claude-opus-4.7"
swarm.run(goal="Write the investor memo. Be rigorous.")
Add your own specialists. Hermes agents are just classes — give the new one a role, a system prompt, a toolset, and register it with the swarm.
Replace the `./swarm-memory.db` handle with a Vectorize binding to share memory across devices, or plug in any MCP-compatible memory server.
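The memory handle is duck-typed here for illustration: the real Hermes interface may differ, but a custom backend would need to cover roughly this read/append shape. A sketch using stdlib `sqlite3`:

```python
import sqlite3

# Duck-typed stand-in for a swarm memory backend -- the method names
# (append/read) and namespace column are assumptions, not the Hermes API.
class SqliteMemory:
    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS memory (ns TEXT, entry TEXT)")

    def append(self, ns, entry):
        self.db.execute("INSERT INTO memory VALUES (?, ?)", (ns, entry))
        self.db.commit()

    def read(self, ns):
        rows = self.db.execute(
            "SELECT entry FROM memory WHERE ns = ?", (ns,)
        ).fetchall()
        return [r[0] for r in rows]

mem = SqliteMemory(":memory:")
mem.append("research", "Competitor cut prices 12% on Tuesday.")
print(mem.read("research"))
```

The same shape maps onto a Vectorize binding or an MCP memory server: as long as reads and writes are keyed by namespace, the swarm code doesn't care where the rows live.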
When you're ready to put the swarm in front of customers, mint a sub-account per user with the `/v1/sub-accounts` API — each customer gets their own key, spend cap, and isolated analytics without you touching a billing table.
Do I need a Moonshot account? No. Your AIgateway key (`sk-aig-…`) is the only credential. AIgateway proxies to Moonshot under the hood — you don't install their SDK, manage a second invoice, or register a second account.
Is there really a free tier? Yes — 100 requests/day on `moonshot/kimi-k2.6` through Apr 30, 2026. No card on file. After the trial, Kimi bills pass-through like every other model in the catalog. A five-agent swarm typically runs 20-40 requests per morning loop, so the free tier covers regular development use.
Does everything run locally? The Hermes loop and your tool servers run locally. Model inference is a network call — Kimi K2.6 runs on Moonshot's infrastructure via AIgateway. If you want fully offline, swap the model slug to a Workers AI edge model (`@cf/meta/llama-3.1-8b-instruct`) or host your own and point the OpenAI client at it.
Can I cap what a single specialist spends? Every swarm call carries an `x-aig-tag` header per specialist. POST a hard cap to `/v1/budgets` with that tag and AIgateway enforces it before dispatch — the specialist's calls start returning 402 when the cap trips, and the swarm routes around it.
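"Routes around it" means the planner treats a budget refusal like any other specialist failure. A self-contained sketch of that fallback loop, with a local exception standing in for the gateway's 402:

```python
# Hypothetical sketch: BudgetExceeded stands in for an HTTP 402 from the
# gateway once a tag's cap trips. Names here are illustrative.
class BudgetExceeded(Exception):
    pass

def call_specialist(name, capped_tags):
    if name in capped_tags:
        raise BudgetExceeded(name)
    return f"{name}: ok"

def run_with_fallback(briefs, capped_tags):
    done, skipped = [], []
    for name in briefs:
        try:
            done.append(call_specialist(name, capped_tags))
        except BudgetExceeded:
            skipped.append(name)  # planner drops or reassigns this brief
    return done, skipped

done, skipped = run_with_fallback(["inbox", "research", "reporter"], {"research"})
print(done, skipped)
```

The stand-up still ships even when one specialist is over budget; the skipped brief just surfaces in the "blocked" section instead of silently failing.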
What if I want a different model later? Change one string. The swarm logic doesn't know which upstream model it's on. Swap every specialist to `gpt-5.4` or `claude-opus-4.7` — or A/B a few — without touching a single line of agent code.
Can I put this in front of my own customers? Yes. `POST /v1/sub-accounts` mints a scoped `sk-aig-…` key per customer with its own spend cap, rate limit, and isolated analytics. Hand it to the customer or keep it server-side. The swarm code doesn't change.
Do the agents share memory? By default each specialist has its own memory namespace. The planner is the only agent that reads across namespaces — that's how it hands off context between specialists without leaking irrelevant history into every prompt.
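The namespace rule can be sketched as a simple access check (the semantics here are illustrative, not documented Hermes behavior): specialists read only their own namespace, while the planner reads any of them.

```python
# Illustrative namespace store: one key per specialist.
store = {
    "inbox":    ["3 drafts waiting for review"],
    "research": ["memo: pricing change"],
}

def read(agent, ns):
    """Only the planner may read outside its own namespace."""
    if agent != "planner" and agent != ns:
        raise PermissionError(f"{agent} cannot read namespace {ns!r}")
    return store.get(ns, [])

print(read("planner", "inbox"))   # planner sees every namespace
print(read("inbox", "inbox"))     # specialist sees only its own notes
```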
Should I use Hermes or OpenClaw? Either works. Hermes has a tighter Python surface and lands closer to the agent-framework end of the spectrum; OpenClaw is heavier but ships with 5,700+ prebuilt skills. The AIgateway side is identical — point either at `https://api.aigateway.sh/v1` with your key.