Flagship · 15 min read

Build a deep-research agent in a weekend — 200 lines, pennies per run

A planner fans out parallel web searches, an extractor pulls citations, a contradiction check catches hallucinations, and a reporter writes a 1,000-word memo with sources — end-to-end on Kimi K2.6's free tier, with any step swappable to Opus when the stakes are high.

Published 2026-04-25 · Agents + swarms
Planner fans out to four parallel searchers and merges into a cited 1,000-word memo

Perplexity taught the market to expect one thing from research agents: ask a hard question, wait 30 seconds, get a cited memo. The mechanic underneath is not complicated — plan, search in parallel, extract, check for contradictions, write.

This example rebuilds exactly that in 200 lines of Python. Kimi K2.6's 256K context means the agent reads every source it fetches without a vector store. The free tier runs the whole pipeline for pennies through Apr 30, 2026. Swap any step to Opus 4.7 or GPT-5.4 when the stakes rise — the surrounding code does not change.

What you'll need: an AIgateway key · Kimi K2.6 (free) · asyncio fan-out · a web-search tool · Markdown output
Note
Why do this over Perplexity's API? Ownership. The pipeline runs on your key, hits your corpus (not just the open web), writes memos in your voice, and costs pennies because the planner decides when to stop searching.

Build it in five steps

  1. STEP 01

    Plan

    One Kimi call turns the user's question into a ranked list of sub-queries. The planner decides fan-out width based on how specific the question is — narrow questions get 3 queries, broad ones get up to 8.

    from openai import AsyncOpenAI
    import asyncio, json
    
    client = AsyncOpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...")
    MODEL = "moonshot/kimi-k2.6"
    
    PLAN = """You are a research planner. Given a user question, produce 3-8 web search
    queries that together would answer it. Reply JSON: {"queries": ["...", ...]}."""
    
    async def plan(question: str) -> list[str]:
        r = await client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "system", "content": PLAN},
                      {"role": "user", "content": question}],
            response_format={"type": "json_object"},
            extra_headers={"x-aig-tag": "research.plan"},
        )
        return json.loads(r.choices[0].message.content)["queries"]
  2. STEP 02

    Search in parallel

    Every sub-query fires concurrently through a web search tool (swap in Tavily, Serper, or your own crawler). Because the searches run under `asyncio.gather`, wall time is the slowest single search, not the sum.

    import httpx
    
    async def search_one(q: str) -> list[dict]:
        # Tavily's search endpoint takes a POST with a JSON body; swap in your
        # provider of choice — the pipeline only needs title/url/content per hit.
        async with httpx.AsyncClient() as http:
            r = await http.post("https://api.tavily.com/search",
                                json={"query": q, "max_results": 5},
                                headers={"Authorization": "Bearer TAVILY_KEY"})
            r.raise_for_status()
            return r.json()["results"]
    
    async def fan_out_search(queries: list[str]) -> list[dict]:
        results = await asyncio.gather(*(search_one(q) for q in queries))
        return [{"q": q, "hits": hits} for q, hits in zip(queries, results)]
  3. STEP 03

    Read everything (256K context)

    Instead of chunking and embedding, Kimi K2.6's 256K context eats every fetched page whole. The extractor pulls 3-5 key claims from each source with inline citation markers.

    EXTRACT = """You are a research analyst. Given the sources below, extract 3-5 key
    claims with inline citations [1], [2], .... Reply JSON: {"claims": ["...", ...], "citations": [{"n": 1, "url": "...", "title": "..."}, ...]}."""
    
    async def extract(question: str, batch: list[dict]) -> dict:
        sources = "\n\n".join(f"[{i+1}] {h['title']} — {h['url']}\n{h['content']}"
                               for i, h in enumerate(sum((b['hits'] for b in batch), [])))
        r = await client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "system", "content": EXTRACT},
                      {"role": "user", "content": f"QUESTION: {question}\nSOURCES:\n{sources}"}],
            response_format={"type": "json_object"},
            extra_headers={"x-aig-tag": "research.extract"},
        )
        return json.loads(r.choices[0].message.content)
  4. STEP 04

    Contradiction check

    A second Kimi pass reads the extracted claims and flags pairs that disagree. Each flagged pair gets a human-readable note in the memo — "two sources disagree on this point."

    CHECK = """You are a fact checker. Given the claims below, return JSON {"contradictions":
    [{"a": <int>, "b": <int>, "issue": "<15 words>"}]} — pairs where claim a and b disagree."""
    
    async def check(claims: list[str]) -> list[dict]:
        r = await client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "system", "content": CHECK},
                      {"role": "user", "content": json.dumps({"claims": claims})}],
            response_format={"type": "json_object"},
            extra_headers={"x-aig-tag": "research.check"},
        )
        return json.loads(r.choices[0].message.content)["contradictions"]
  5. STEP 05

    Write the memo

    A final Kimi call takes the claims, citations, and contradictions and writes a 1,000-word markdown memo in your voice. Total cost for a real run: about $0.04. Total wall time: 12-20 seconds.

    WRITE = """You are a senior analyst. Write a 1,000-word memo that answers the user's
    question using only the claims/citations provided. Call out contradictions explicitly.
    End with a Sources list."""
    
    async def write(question: str, claims: list[str], citations: list[dict], contras: list[dict]) -> str:
        payload = json.dumps({"question": question, "claims": claims,
                              "citations": citations, "contradictions": contras})
        r = await client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "system", "content": WRITE},
                      {"role": "user", "content": payload}],
            extra_headers={"x-aig-tag": "research.memo"},
        )
        return r.choices[0].message.content
    
    async def research(question: str) -> str:
        qs = await plan(question)
        batches = await fan_out_search(qs)
        facts = await extract(question, batches)
        contras = await check(facts["claims"])
        return await write(question, facts["claims"], facts["citations"], contras)
    
    if __name__ == "__main__":
        print(asyncio.run(research("What changed in global GPU supply in Q1 2026?")))

When to swap Kimi for Opus or GPT-5.4

Plan, search, extract, and check run great on Kimi — the workload is structured, the traces are short, the cost is negligible. The only step worth upgrading is the memo writer when the quality bar is high: a single Opus 4.7 call for the writing step roughly triples the cost of the run and measurably raises the craft of the prose.

Change one string. `MODEL` stays Kimi for the first four stages; `write()` uses `anthropic/claude-opus-4.7`. Run both side by side with the eval example in this library to decide if the lift is worth it on your questions.
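One way to keep the swap to literally one string is a tiny routing helper. This is a sketch of ours, not part of the example above — `model_for` and the stage names are our own:

```python
MODEL = "moonshot/kimi-k2.6"                 # plan / extract / check stay here
WRITER_MODEL = "anthropic/claude-opus-4.7"   # memo only

def model_for(stage: str) -> str:
    """Route only the memo stage to the premium writer; everything else stays on Kimi."""
    return WRITER_MODEL if stage == "memo" else MODEL
```

`write()` then passes `model=model_for("memo")` while the first four stages keep `model=MODEL`, so an A/B run against all-Kimi is a one-line toggle.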

Ground it in your own corpus

The same pipeline works on a private corpus — swap the web search for a file-index lookup. Kimi K2.6's 256K context holds most company wikis whole, so you don't need a vector store for corpora under a few million tokens.

That's the killer combo: a research agent that cites your internal docs by default and falls back to the open web only when the docs don't answer. All on one key, all metered with `x-aig-tag` so you can track how much of the bill is internal vs external research.

# Replace search_one to hit your corpus first, then the web.
# internal_corpus and web_search are stand-ins for your own adapters.
async def search_one(q: str) -> list[dict]:
    hits = await internal_corpus.search(q, k=5)
    if len(hits) < 3:  # thin internal coverage — fall back to the open web
        hits += await web_search(q)
    return hits

FAQ

Why not use a vector store?

You can, and we have an example that does. But for corpora under roughly 2M tokens, Kimi K2.6's 256K context window means you can read the sources whole — no chunking, no embeddings, no retrieval tuning. Smaller pipeline, fewer failure modes, better citation fidelity.

How much does one run cost?

A 6-query research run with 4 sources per query averages ~$0.04 when everything runs on Kimi K2.6 — and the Kimi part is free on AIgateway through Apr 30, 2026. Upgrade the final memo to Opus 4.7 and the run lands around $0.12.

What web search provider should I use?

Tavily and Serper both have generous free tiers and reliable snippet quality. For a fully open-source pipeline, stand up your own SearXNG instance. The example swaps the provider in one function.

How do I add structured output (JSON)?

Every stage already uses `response_format={'type': 'json_object'}` for plan/extract/check. For the memo, switch to a JSON schema when the downstream consumer is another tool — Kimi and Opus both honor the `json_schema` response-format variant.
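As a sketch, a strict-schema variant for the memo could look like this — the schema name and fields (`memo`, `sources`) are illustrative choices of ours, not part of the pipeline above:

```python
# Hypothetical json_schema response format for the memo stage.
# Shape the properties to whatever your downstream consumer expects.
MEMO_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "research_memo",
        "schema": {
            "type": "object",
            "properties": {
                "memo": {"type": "string"},
                "sources": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["memo", "sources"],
        },
    },
}
# Pass response_format=MEMO_FORMAT on the write() call instead of free-form text.
```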

Can I stream the memo to the UI?

Yes — set `stream=True` on the `write()` call. The first four stages are short enough that streaming them is noise; streaming only the memo feels like Perplexity in practice.
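A minimal streaming sketch of the memo stage, reusing `client` and `MODEL` from Step 01 — the `on_token` callback (e.g. an SSE push to the browser) is a placeholder of ours:

```python
async def write_streaming(messages: list[dict], on_token) -> str:
    # Stream only the memo; the earlier stages are too short to be worth streaming.
    parts: list[str] = []
    stream = await client.chat.completions.create(
        model=MODEL, messages=messages, stream=True,
        extra_headers={"x-aig-tag": "research.memo"},
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        if delta:
            parts.append(delta)
            on_token(delta)  # push each token to the UI as it arrives
    return "".join(parts)   # the complete memo, for logging/caching
```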

Can I cache repeated research?

Yes. Identical plan/extract inputs hit the exact-match cache automatically. For semantic similarity — "what changed in GPU supply" vs "Q1 GPU market update" — turn on semantic caching; it saves 20-40% on repeat questions for most teams.

Is this enough for production?

For internal research, yes. For customer-facing research (legal, medical, financial), layer in a guardrail pass and an audit log of every source the memo touches — the `x-aig-tag` header is the anchor point; pair it with AIgateway's replay primitive (Enterprise) to reproduce any memo byte-for-byte.
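The audit log itself can start as plain JSONL. A minimal sketch — the file path and record shape are our own choices, not an AIgateway feature:

```python
import json, time

def audit_memo(question: str, citations: list[dict], tag: str,
               path: str = "memo_audit.jsonl") -> dict:
    # One append-only record per memo: timestamp, routing tag, question,
    # and every source URL the memo touched.
    record = {
        "ts": time.time(),
        "tag": tag,
        "question": question,
        "sources": [c["url"] for c in citations],
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Call it right after `write()` returns, passing the same `citations` list the extractor produced.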

Can I run this offline?

The orchestration runs locally; the model calls are network. Swap the model slug to a Workers-AI-edge model (`@cf/meta/llama-3.1-8b-instruct`) if you need fully-offline inference, but expect a measurable drop in memo quality.

READY TO BUILD?
Get an AIgateway key in 30 seconds. Free Kimi K2.6 through Apr 30, 2026; everything else is pass-through.
Get your key → · API reference · Kimi K2.6 details
