AUTO ROUTER · LIVE

One model name.
Every modality.
Always cheaper than the premium pick.

Set model:"auto" and the router reads each request, picks the cheapest model in a curated pool that still clears the quality floor, and never charges you more than the premium model you'd have called yourself. Text, image, video, speech, transcription, music, embeddings. One field. Every response tells you what ran and what you saved.

Get your key →Read the docs

● cheaper than premium, guaranteed● every generative modality● headers on every call

AUTOmodel: "auto/text"Simple support Q&A, ~400 input + 300 output tokens

Baseline (your pick)

Claude Opus 4.8

$0.00997 / call

Routed → simple

Claude Haiku 4.5

$0.00439 / call

You save 56% vs calling Claude Opus 4.8 directly

Response headers

X-Routing-Selectedanthropic/claude-haiku-4.5

X-Routing-Reasonauto simple -> anthropic/claude-haiku-4.5 (vs anthropic/claude-opus-4.8)

X-Routing-Complexitysimple

X-Routing-Quality0.780

X-Auto-Baseline-Modelanthropic/claude-opus-4.8

X-Auto-Baseline-Cost-Cents0.9975

X-Auto-Route-Fee-Cents0.2394

X-Auto-Savings-Cents0.5586

Why auto

Four reasons to send
one model name.

The auto router is multimodal, guaranteed cheaper than the model you'd have called, transparent on every response, and bounded by a real quality floor.

MULTIMODAL

Truly multimodal

The only auto router that spans generative modalities. Send model:"auto" for chat, image, video, text-to-speech, transcription, music, or embeddings and each request lands on the right model for that job. Or pin a lane with model:"auto/image". One name covers everything you build.

GUARANTEED CHEAPER

Cheaper than the premium model, guaranteed

Every request carries a baseline: the model you'd otherwise call. The router only routes down from it, and the baseline doubles as a hard cost ceiling. You pay strictly less than the premium price on routed calls, and never more. There is no scenario where auto costs you more than calling the expensive model directly.

TRANSPARENT

Transparent by default

Every routed response returns headers showing the model that ran, why it was picked, the complexity read, the quality score, your premium baseline, and the exact dollars saved. No black box. You can audit any single call or log the headers and chart your savings over time.

QUALITY FLOOR

A real quality floor

Auto only selects from curated, tiered pools per modality. Every member carries a quality prior refined by real benchmarks, and candidates below the floor are filtered out before routing. The router trades down on price, never down to garbage.

How it works

Read. Filter.
Pick. Disclose.

Every request runs the same four stages. The baseline you carry is both the quality reference and a hard cost ceiling, so the router only ever routes down.

Read

Complexity

Each request is read for difficulty: simple, moderate, or complex.

Filter

Quality floor + cost ceiling

Drop anything below the floor or above your baseline price.

Pick

Cheapest qualifying model

Choose the lowest-cost model that still clears the quality bar.

Disclose

Headers

Return the pick, the reason, your baseline, and the dollars saved.

premium

GPT-5.5 Pro0.98

Claude Opus 4.80.97

Gemini 3.1 Pro0.95

Grok 40.93

standard

Claude Sonnet 4.60.90

Kimi K2.7 Code0.85

Gemini 3 Flash0.84

Grok 4 Fast0.82

economy

Claude Haiku 4.50.78

GPT-OSS 120B0.74

Gemma 4 26B0.70

Llama 3.3 70B0.68

baseline Claude Opus 4.8 · floor 0.4556% saved

One router, every modality

Scope it to a lane,
or let it cover everything.

Use model:"auto" for everything, or pin a lane with model:"auto/<modality>". Each modality has its own curated, tiered pool.

Text−56%

model: "auto/text"

PremiumGPT-5.5 ProClaude Opus 4.8Gemini 3.1 Pro

StandardClaude Sonnet 4.6Kimi K2.7 CodeGemini 3 Flash

EconomyClaude Haiku 4.5GPT-OSS 120BGemma 4 26B

~~Claude Opus 4.8~~ → Claude Haiku 4.556% cheaper

Image−59%

model: "auto/image"

PremiumNano Banana ProGPT Image 2Recraft V4 Pro

StandardSeedream 4.5FLUX 2 MaxFLUX 2 Dev

EconomyFLUX 2 Klein 9BFLUX 1 Schnell

~~Recraft V4 Pro~~ → Imagen 459% cheaper

Video−56%

model: "auto/video"

PremiumVeo 3.1Seedance 2.0HappyHorse 1 T2V

StandardGen-4.5Veo 3.1 FastHailuo 2.3

EconomyPixVerse V6

~~Veo 3.1~~ → Veo 3.1 Fast56% cheaper

Speech−62%

model: "auto/tts"

PremiumMiniMax Speech 2.8 HDInworld TTS 2TTS-1 HD

StandardMiniMax Speech 2.8 TurboAura 2 EN

EconomyMeloTTS

~~MiniMax Speech 2.8 HD~~ → Aura 2 EN62% cheaper

Transcription−23%

model: "auto/stt"

PremiumGPT-4o TranscribeUniversal 3 ProNova 3

StandardWhisper Large v3 Turbo

EconomyWhisper

~~GPT-4o Transcribe~~ → Whisper23% cheaper

Musicroutes down

model: "auto/music"

PremiumElevenLabs MusicMiniMax Music 2.6

StandardMiniMax Music v2.6

EconomyACE-Step

~~ElevenLabs Music~~ → MiniMax Music v2.6routes down

Embeddingsconsistent

model: "auto/embedding"

PremiumBGE-M3

StandardBGE-Large EN v1.5EmbeddingGemma 300M

EconomyQwen3 Embedding 0.6B

~~BGE-M3~~ → BGE-Large EN v1.5consistent

Where it pays off

Built for the long tail
of trivial calls.

Most production traffic is easy. Auto pays economy prices on the easy calls and reserves frontier models for the genuinely hard ones — without you hand-tuning a model per call site.

Coding harness that steps down by difficulty

Wire your agent to model:"auto" with baseline_model set to your top coding model. Hard architecture prompts hold near Opus 4.8 or Sonnet 4.6; boilerplate, renames, and one-line fixes drop to Haiku 4.5. The harness keeps frontier quality where it matters and pays economy prices on the long tail of trivial edits.

High-volume support chat

Most support turns are factual lookups that a small model handles perfectly. Point your chat endpoint at model:"auto" and the bulk of traffic routes to economy text models while genuinely hard tickets hold at the premium baseline. On a simple Q&A this is a 56% saving per call, multiplied across every conversation.

Batch image generation: drafts vs hero

Use model:"auto/image" for exploration passes so concept drafts route to cheaper image models, then pin your hero render to the premium model by explicit id. A 100-image draft batch lands near $10.82 instead of $26.25 against a Recraft V4 Pro baseline, with the final render untouched.

Transcription pipelines

Run model:"auto/stt" across an ingest queue. Clean audio drops to Whisper while difficult, noisy, or multi-speaker files hold at the premium transcription model. The per-file saving is modest but compounds hard across thousands of hours a month.

Long-running agent loops

Agents fire hundreds of model calls per task, most of them trivial tool-routing or state-update steps. model:"auto" reads each step's complexity independently, so the cheap steps cost cents and only the genuinely hard reasoning turns touch a frontier model — without you hand-tuning a model per call site.

Voiceover and narration at scale

Send model:"auto/tts" for bulk narration. Long-form articles and notifications route to efficient voices like Aura 2 (a 62% saving on a 5k-character run versus an HD baseline), while you reserve the premium HD voice for the moments naturalness actually sells.

Embeddings for ingestion and RAG

Index pipelines embed millions of chunks. model:"auto/embedding" keeps you on the curated embedding pool so re-indexing stays cheap and consistent, with the same OpenAI-compatible response shape your vector store already expects.

Honest comparison

Other auto routers
are text-only black boxes.

Anonymized competitor labels — same public behavior, minus the trash talk. We'll let you Google who's who.

	AIgateway Auto	Other auto routers
Multimodal (image / video / speech / music / embeddings)	Yes — every generative modality	Auto A: text only Auto B: text only Auto C: text only
Optimizes for cost	Yes — routes down to the cheapest model that clears the quality floor	Auto A: opaque Auto B: optimizes for quality, not your bill Auto C: optimizes within a flat fee
Transparent (shows the pick + savings)	Yes — headers show selected model, reason, baseline, and dollars saved	Auto A: opaque, no per-call disclosure Auto B: limited Auto C: limited
Guaranteed cheaper than the premium model	Yes — baseline is a hard cost ceiling; you always pay less than the premium pick	Auto A: no guarantee Auto B: no guarantee Auto C: flat fee regardless of savings
Curated quality floor	Yes — tiered, eval-covered pools; below-floor models filtered out	Auto A: undisclosed Auto B: full open catalog Auto C: undisclosed

See your savings

Pick a modality.
Watch the bill drop.

Prices are computed from the live catalog. Every figure stays under your premium baseline — that's the guarantee, not a marketing line.

Baseline model$5 in / $25 out per Mtok

Routed pick$1 in / $5 out per Mtok

per call / mo1,000

Premium baseline

$9.97

Claude Opus 4.8

With Auto Router

$4.39

− $5.59 (56%)

Guaranteed under your baseline. You never pay more than calling Claude Opus 4.8 directly.

BREAKDOWN · 1,000 per call/mo

Premium baseline cost$9.97Routed model cost$1.99Platform fee (30% of savings)$2.39You save (you keep 70%)− $5.59You pay$4.39

FAQ

Questions before you
flip the model field.

How do I use the Auto Router?

Set model:"auto" on any request, or scope it to a modality with model:"auto/text" (also image, video, tts, stt, music, embedding). You can also omit the model field entirely. Everything else stays OpenAI-compatible — only the model value changes.

Will auto ever cost more than calling the model I'd have picked?

No. Every request carries a baseline (set it with baseline_model, or it defaults to the premium model for that modality). The router only ever selects models no more expensive than the baseline, so the baseline is a hard cost ceiling. On routed calls you pay strictly less than the premium price; in the worst case you pay exactly the baseline.

How do I see what the router picked?

Every routed response returns transparency headers: X-Routing-Selected (the model that ran), X-Routing-Reason, X-Routing-Complexity, X-Routing-Quality, X-Auto-Baseline-Model, X-Auto-Baseline-Cost-Cents, and X-Auto-Savings-Cents. On streaming responses the routing-decision headers arrive up front.

Can I bias the router toward speed or quality?

Yes. Send the x-routing header set to cost, speed, quality, or auto. Cost favors the cheapest model that clears the floor; quality favors the strongest model still at or under your baseline; auto balances both from the complexity read.

Which model does it actually choose from?

A curated, tiered pool per modality (premium / standard / economy), maintained by hand against public benchmark leaderboards. Every candidate carries a quality prior, refined by real eval scores, and anything below the modality's quality floor is filtered out before routing. Your explicit model ids always reach the full catalog — auto just keeps you inside the curated set.

What does it cost?

You pay less than the premium baseline on every routed call, guaranteed. The pricing mechanics are in the docs fine-print; the short version is you keep the majority of every dollar the router saves you versus the model you'd otherwise have called, and you never pay above that baseline.

Does it work for non-text modalities?

Yes — that's the point. Auto routes image, video, text-to-speech, transcription, music, and embeddings as well as text. It's the only auto router that spans generative modalities.

Full pricing mechanics live in the Auto Router docs.

One model name.Every modality.Always cheaper than the premium pick.

Four reasons to sendone model name.

Read. Filter.Pick. Disclose.

Scope it to a lane,or let it cover everything.

Built for the long tailof trivial calls.

Other auto routersare text-only black boxes.

Pick a modality.Watch the bill drop.

Questions before youflip the model field.

Change one string.Pay less on every call.