AUTO ROUTER · LIVE

One model name.
Every modality.
Always cheaper than the premium pick.

Set model:"auto" and the router reads each request, picks the cheapest model in a curated pool that still clears the quality floor, and never charges you more than the premium model you'd have called yourself. Text, image, video, speech, transcription, music, embeddings. One field. Every response tells you what ran and what you saved.

Get your key Read the docs
cheaper than premium, guaranteed every generative modality headers on every call
AUTOmodel: "auto/text"Simple support Q&A, ~400 input + 300 output tokens
Baseline (your pick)
Claude Opus 4.8
$0.00997 / call
Routed → simple
Claude Haiku 4.5
$0.00439 / call
You save 56% vs calling Claude Opus 4.8 directly
Response headers
X-Routing-Selectedanthropic/claude-haiku-4.5
X-Routing-Reasonauto simple -> anthropic/claude-haiku-4.5 (vs anthropic/claude-opus-4.8)
X-Routing-Complexitysimple
X-Routing-Quality0.780
X-Auto-Baseline-Modelanthropic/claude-opus-4.8
X-Auto-Baseline-Cost-Cents0.9975
X-Auto-Route-Fee-Cents0.2394
X-Auto-Savings-Cents0.5586
Why auto

Four reasons to send
one model name.

The auto router is multimodal, guaranteed cheaper than the model you'd have called, transparent on every response, and bounded by a real quality floor.

MULTIMODAL
Truly multimodal
The only auto router that spans generative modalities. Send model:"auto" for chat, image, video, text-to-speech, transcription, music, or embeddings and each request lands on the right model for that job. Or pin a lane with model:"auto/image". One name covers everything you build.
GUARANTEED CHEAPER
Cheaper than the premium model, guaranteed
Every request carries a baseline: the model you'd otherwise call. The router only routes down from it, and the baseline doubles as a hard cost ceiling. You pay strictly less than the premium price on routed calls, and never more. There is no scenario where auto costs you more than calling the expensive model directly.
TRANSPARENT
Transparent by default
Every routed response returns headers showing the model that ran, why it was picked, the complexity read, the quality score, your premium baseline, and the exact dollars saved. No black box. You can audit any single call or log the headers and chart your savings over time.
QUALITY FLOOR
A real quality floor
Auto only selects from curated, tiered pools per modality. Every member carries a quality prior refined by real benchmarks, and candidates below the floor are filtered out before routing. The router trades down on price, never down to garbage.
How it works

Read. Filter.
Pick. Disclose.

Every request runs the same four stages. The baseline you carry is both the quality reference and a hard cost ceiling, so the router only ever routes down.

1
Read
Complexity
Each request is read for difficulty: simple, moderate, or complex.
2
Filter
Quality floor + cost ceiling
Drop anything below the floor or above your baseline price.
3
Pick
Cheapest qualifying model
Choose the lowest-cost model that still clears the quality bar.
4
Disclose
Headers
Return the pick, the reason, your baseline, and the dollars saved.
premium
GPT-5.5 Pro0.98
Claude Opus 4.80.97
Gemini 3.1 Pro0.95
Grok 40.93
standard
Claude Sonnet 4.60.90
Kimi K2.60.85
Gemini 3 Flash0.84
Grok 4 Fast0.82
economy
Claude Haiku 4.50.78
GPT-OSS 120B0.74
Gemma 4 26B0.70
Llama 3.3 70B0.68
baseline Claude Opus 4.8 · floor 0.4556% saved
One router, every modality

Scope it to a lane,
or let it cover everything.

Use model:"auto" for everything, or pin a lane with model:"auto/<modality>". Each modality has its own curated, tiered pool.

Text−56%
model: "auto/text"
PremiumGPT-5.5 ProClaude Opus 4.8Gemini 3.1 Pro
StandardClaude Sonnet 4.6Kimi K2.6Gemini 3 Flash
EconomyClaude Haiku 4.5GPT-OSS 120BGemma 4 26B
Claude Opus 4.8Claude Haiku 4.556% cheaper
Image−59%
model: "auto/image"
PremiumNano Banana ProGPT Image 2Recraft V4 Pro
StandardSeedream 4.5FLUX 2 MaxFLUX 2 Dev
EconomyFLUX 2 Klein 9BFLUX 1 Schnell
Recraft V4 ProImagen 459% cheaper
Video−56%
model: "auto/video"
PremiumVeo 3.1Seedance 2.0HappyHorse 1 T2V
StandardGen-4.5Veo 3.1 FastHailuo 2.3
EconomyPixVerse V6
Veo 3.1Veo 3.1 Fast56% cheaper
Speech−62%
model: "auto/tts"
PremiumMiniMax Speech 2.8 HDInworld TTS 2TTS-1 HD
StandardMiniMax Speech 2.8 TurboAura 2 EN
EconomyMeloTTS
MiniMax Speech 2.8 HDAura 2 EN62% cheaper
Transcription−23%
model: "auto/stt"
PremiumGPT-4o TranscribeUniversal 3 ProNova 3
StandardWhisper Large v3 Turbo
EconomyWhisper
GPT-4o TranscribeWhisper23% cheaper
Musicroutes down
model: "auto/music"
PremiumElevenLabs MusicMiniMax Music 2.6
StandardMiniMax Music v2.6
EconomyACE-Step
ElevenLabs MusicMiniMax Music v2.6routes down
Embeddingsconsistent
model: "auto/embedding"
PremiumBGE-M3
StandardBGE-Large EN v1.5EmbeddingGemma 300M
EconomyQwen3 Embedding 0.6B
BGE-M3BGE-Large EN v1.5consistent
Where it pays off

Built for the long tail
of trivial calls.

Most production traffic is easy. Auto pays economy prices on the easy calls and reserves frontier models for the genuinely hard ones — without you hand-tuning a model per call site.

Coding harness that steps down by difficulty
Wire your agent to model:"auto" with baseline_model set to your top coding model. Hard architecture prompts hold near Opus 4.8 or Sonnet 4.6; boilerplate, renames, and one-line fixes drop to Haiku 4.5. The harness keeps frontier quality where it matters and pays economy prices on the long tail of trivial edits.
High-volume support chat
Most support turns are factual lookups that a small model handles perfectly. Point your chat endpoint at model:"auto" and the bulk of traffic routes to economy text models while genuinely hard tickets hold at the premium baseline. On a simple Q&A this is a 56% saving per call, multiplied across every conversation.
Batch image generation: drafts vs hero
Use model:"auto/image" for exploration passes so concept drafts route to cheaper image models, then pin your hero render to the premium model by explicit id. A 100-image draft batch lands near $10.82 instead of $26.25 against a Recraft V4 Pro baseline, with the final render untouched.
Transcription pipelines
Run model:"auto/stt" across an ingest queue. Clean audio drops to Whisper while difficult, noisy, or multi-speaker files hold at the premium transcription model. The per-file saving is modest but compounds hard across thousands of hours a month.
Long-running agent loops
Agents fire hundreds of model calls per task, most of them trivial tool-routing or state-update steps. model:"auto" reads each step's complexity independently, so the cheap steps cost cents and only the genuinely hard reasoning turns touch a frontier model — without you hand-tuning a model per call site.
Voiceover and narration at scale
Send model:"auto/tts" for bulk narration. Long-form articles and notifications route to efficient voices like Aura 2 (a 62% saving on a 5k-character run versus an HD baseline), while you reserve the premium HD voice for the moments naturalness actually sells.
Embeddings for ingestion and RAG
Index pipelines embed millions of chunks. model:"auto/embedding" keeps you on the curated embedding pool so re-indexing stays cheap and consistent, with the same OpenAI-compatible response shape your vector store already expects.
Honest comparison

Other auto routers
are text-only black boxes.

Anonymized competitor labels — same public behavior, minus the trash talk. We'll let you Google who's who.

 AIgateway AutoOther auto routers
Multimodal (image / video / speech / music / embeddings)Yes — every generative modality
Auto A: text only
Auto B: text only
Auto C: text only
Optimizes for costYes — routes down to the cheapest model that clears the quality floor
Auto A: opaque
Auto B: optimizes for quality, not your bill
Auto C: optimizes within a flat fee
Transparent (shows the pick + savings)Yes — headers show selected model, reason, baseline, and dollars saved
Auto A: opaque, no per-call disclosure
Auto B: limited
Auto C: limited
Guaranteed cheaper than the premium modelYes — baseline is a hard cost ceiling; you always pay less than the premium pick
Auto A: no guarantee
Auto B: no guarantee
Auto C: flat fee regardless of savings
Curated quality floorYes — tiered, eval-covered pools; below-floor models filtered out
Auto A: undisclosed
Auto B: full open catalog
Auto C: undisclosed
See your savings

Pick a modality.
Watch the bill drop.

Prices are computed from the live catalog. Every figure stays under your premium baseline — that's the guarantee, not a marketing line.

$5 in / $25 out per Mtok
$1 in / $5 out per Mtok
1,000
$9.97
Claude Opus 4.8
$4.39
$5.59 (56%)
Guaranteed under your baseline. You never pay more than calling Claude Opus 4.8 directly.
BREAKDOWN · 1,000 per call/mo
Premium baseline cost$9.97Routed model cost$1.99Platform fee (30% of savings)$2.39You save (you keep 70%)$5.59You pay$4.39
FAQ

Questions before you
flip the model field.

How do I use the Auto Router?

Set model:"auto" on any request, or scope it to a modality with model:"auto/text" (also image, video, tts, stt, music, embedding). You can also omit the model field entirely. Everything else stays OpenAI-compatible — only the model value changes.

Will auto ever cost more than calling the model I'd have picked?

No. Every request carries a baseline (set it with baseline_model, or it defaults to the premium model for that modality). The router only ever selects models no more expensive than the baseline, so the baseline is a hard cost ceiling. On routed calls you pay strictly less than the premium price; in the worst case you pay exactly the baseline.

How do I see what the router picked?

Every routed response returns transparency headers: X-Routing-Selected (the model that ran), X-Routing-Reason, X-Routing-Complexity, X-Routing-Quality, X-Auto-Baseline-Model, X-Auto-Baseline-Cost-Cents, and X-Auto-Savings-Cents. On streaming responses the routing-decision headers arrive up front.

Can I bias the router toward speed or quality?

Yes. Send the x-routing header set to cost, speed, quality, or auto. Cost favors the cheapest model that clears the floor; quality favors the strongest model still at or under your baseline; auto balances both from the complexity read.

Which model does it actually choose from?

A curated, tiered pool per modality (premium / standard / economy), maintained by hand against public benchmark leaderboards. Every candidate carries a quality prior, refined by real eval scores, and anything below the modality's quality floor is filtered out before routing. Your explicit model ids always reach the full catalog — auto just keeps you inside the curated set.

What does it cost?

You pay less than the premium baseline on every routed call, guaranteed. The pricing mechanics are in the docs fine-print; the short version is you keep the majority of every dollar the router saves you versus the model you'd otherwise have called, and you never pay above that baseline.

Does it work for non-text modalities?

Yes — that's the point. Auto routes image, video, text-to-speech, transcription, music, and embeddings as well as text. It's the only auto router that spans generative modalities.

Full pricing mechanics live in the Auto Router docs.

Live now

Change one string.
Pay less on every call.

Set model:"auto" on your next request. The router does the rest, the headers prove it, and you never pay above the premium model you'd have called.

Get your key Read the docsTry it in the playground