The auto router is multimodal, guaranteed cheaper than the model you'd have called, transparent on every response, and bounded by a real quality floor.
Every request runs the same four stages. The baseline you carry is both the quality reference and a hard cost ceiling, so the router only ever routes down.
Use model:"auto" for everything, or pin a lane with model:"auto/<modality>". Each modality has its own curated, tiered pool.
model: "auto/text"model: "auto/image"model: "auto/video"model: "auto/tts"model: "auto/stt"model: "auto/music"model: "auto/embedding"Most production traffic is easy. Auto pays economy prices on the easy calls and reserves frontier models for the genuinely hard ones — without you hand-tuning a model per call site.
Anonymized competitor labels — same public behavior, minus the trash talk. We'll let you Google who's who.
| AIgateway Auto | Other auto routers | |
|---|---|---|
| Multimodal (image / video / speech / music / embeddings) | Yes — every generative modality | Auto A: text only Auto B: text only Auto C: text only |
| Optimizes for cost | Yes — routes down to the cheapest model that clears the quality floor | Auto A: opaque Auto B: optimizes for quality, not your bill Auto C: optimizes within a flat fee |
| Transparent (shows the pick + savings) | Yes — headers show selected model, reason, baseline, and dollars saved | Auto A: opaque, no per-call disclosure Auto B: limited Auto C: limited |
| Guaranteed cheaper than the premium model | Yes — baseline is a hard cost ceiling; you always pay less than the premium pick | Auto A: no guarantee Auto B: no guarantee Auto C: flat fee regardless of savings |
| Curated quality floor | Yes — tiered, eval-covered pools; below-floor models filtered out | Auto A: undisclosed Auto B: full open catalog Auto C: undisclosed |
Prices are computed from the live catalog. Every figure stays under your premium baseline — that's the guarantee, not a marketing line.
Set model:"auto" on any request, or scope it to a modality with model:"auto/text" (also image, video, tts, stt, music, embedding). You can also omit the model field entirely. Everything else stays OpenAI-compatible — only the model value changes.
No. Every request carries a baseline (set it with baseline_model, or it defaults to the premium model for that modality). The router only ever selects models no more expensive than the baseline, so the baseline is a hard cost ceiling. On routed calls you pay strictly less than the premium price; in the worst case you pay exactly the baseline.
Every routed response returns transparency headers: X-Routing-Selected (the model that ran), X-Routing-Reason, X-Routing-Complexity, X-Routing-Quality, X-Auto-Baseline-Model, X-Auto-Baseline-Cost-Cents, and X-Auto-Savings-Cents. On streaming responses the routing-decision headers arrive up front.
Yes. Send the x-routing header set to cost, speed, quality, or auto. Cost favors the cheapest model that clears the floor; quality favors the strongest model still at or under your baseline; auto balances both from the complexity read.
A curated, tiered pool per modality (premium / standard / economy), maintained by hand against public benchmark leaderboards. Every candidate carries a quality prior, refined by real eval scores, and anything below the modality's quality floor is filtered out before routing. Your explicit model ids always reach the full catalog — auto just keeps you inside the curated set.
You pay less than the premium baseline on every routed call, guaranteed. The pricing mechanics are in the docs fine-print; the short version is you keep the majority of every dollar the router saves you versus the model you'd otherwise have called, and you never pay above that baseline.
Yes — that's the point. Auto routes image, video, text-to-speech, transcription, music, and embeddings as well as text. It's the only auto router that spans generative modalities.
Full pricing mechanics live in the Auto Router docs.
Set model:"auto" on your next request. The router does the rest, the headers prove it, and you never pay above the premium model you'd have called.