Pick a handful of candidate models, set your monthly workload, and see what each one will actually cost. At typical production workloads, the three cheapest frontier-class models are Llama-4-Scout-17b-16e-Instruct, GPT-5.4 Nano, and Gemini 3.1 Flash Lite; the three fastest text-generation picks are GPT-5.4 Nano, Gemini 3.1 Flash Lite, and Claude Haiku 4.5. Numbers below are live catalog prices; a 5% platform fee is applied at top-up, not per request.
| Model | Provider | Input | Output | Cache save | Per req | Total / mo |
|---|---|---|---|---|---|---|
| Gemini 3 Flash (google/gemini-3-flash), cheapest | Google | $90.00 | $135.00 | −$25.00 | $0.0022 | $225.00 |
| GPT-5.4 Mini (openai/gpt-5.4-mini) | OpenAI | $135.00 | $202.50 | −$37.50 | $0.0034 | $337.50 |
| Kimi K2.6 (moonshot/kimi-k2.6) | Moonshot | $171.00 | $180.00 | −$39.00 | $0.0035 | $351.00 |
| Claude Sonnet 4.6 (anthropic/claude-sonnet-4.6), priciest | Anthropic | $540.00 | $675.00 | −$135.00 | $0.012 | $1.2K |
Formula. For each model: cost = (requests × (1 − cache_rate) × (in_tokens × in_price + out_tokens × out_price)) / 1,000,000. Tokens are counted once per request; output price applies to the full output, input price applies to the full input.
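The formula above can be sketched in a few lines of Python. The workload figures and per-1M-token prices in the example call are assumptions chosen for illustration (they are not stated on this page), and `monthly_cost` is a hypothetical helper name, not part of any AIgateway API.

```python
def monthly_cost(requests, cache_rate, in_tokens, out_tokens, in_price, out_price):
    """Projected monthly cost in USD.

    in_price and out_price are USD per 1M tokens; in_tokens and out_tokens
    are the per-request token counts.
    """
    # Only requests that miss the cache are billed by the provider.
    billable_requests = requests * (1 - cache_rate)
    # Full input and full output are priced once per request.
    per_request = in_tokens * in_price + out_tokens * out_price
    return billable_requests * per_request / 1_000_000


# Assumed workload: 100,000 requests/month, 10% cache rate,
# 2,000 input and 500 output tokens per request,
# at assumed prices of $0.50 (input) and $3.00 (output) per 1M tokens.
total = monthly_cost(
    requests=100_000,
    cache_rate=0.10,
    in_tokens=2_000,
    out_tokens=500,
    in_price=0.50,
    out_price=3.00,
)
print(f"${total:,.2f}")
```

Under these assumed inputs the projection comes to $225.00 per month; swapping in your own request volume and the catalog prices for a given model gives that model's row.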
Cache hit math. A cached request is served from L1/L2 cache without hitting the underlying provider, so both input and output tokens are skipped. The calculator multiplies the full-cost projection by 1 − cache_rate. We don't yet model partial cache hits (prompt-prefix caching), which typically save another 10–20% on top.
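As a rough illustration of the cache multiplier, the sketch below recovers the amount saved by caching from a post-cache total. The 10% cache rate is an assumption for the example, and `cache_saving` is a hypothetical helper name.

```python
def cache_saving(total_after_cache, cache_rate):
    """USD saved by full cache hits, given the total already reduced by caching."""
    if not 0 <= cache_rate < 1:
        raise ValueError("cache_rate must be in [0, 1)")
    # Undo the (1 - cache_rate) multiplier to get the uncached projection.
    full_cost = total_after_cache / (1 - cache_rate)
    # The saving is the share of the full cost served from cache.
    return full_cost * cache_rate


# Assumed inputs: a $225.00 monthly total at a 10% cache rate.
saving = cache_saving(225.00, 0.10)
print(f"${saving:,.2f}")
```

Because partial (prompt-prefix) cache hits are not modeled, a figure computed this way covers only requests served entirely from cache.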
Platform fee. AIgateway takes a flat 5% at top-up, not per request. The totals in the table are raw provider pass-through cost. Multiply by 1.05 for the amount you pre-pay as credits.
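A minimal sketch of the top-up arithmetic, assuming the 5% fee is simply added on top of the pass-through total as described; `prepay_amount` and `PLATFORM_FEE` are hypothetical names used for illustration.

```python
PLATFORM_FEE = 0.05  # flat fee applied at top-up, not per request


def prepay_amount(pass_through_total):
    """Credits to pre-pay (USD) to cover a given raw provider cost."""
    return pass_through_total * (1 + PLATFORM_FEE)


# Example: a $225.00 monthly pass-through total.
topup = prepay_amount(225.00)
print(f"${topup:,.2f}")
```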
What's not included. Vision input tokens (billed separately on some models), image generation (priced per image), video (per second), audio (per minute), and embeddings (priced per 1M input tokens only). The calculator targets chat / completion workloads. For multi-modal pricing see the pricing page or a specific model.
Which should I pick? For maximum quality: Claude Opus 4.7 or GPT-5.4. Best balance: Claude Sonnet 4.6 or Gemini 3.1 Pro. Cost-obsessed: Gemini 3.1 Flash Lite or GPT-5.4 Nano. Open-weight flagship: Kimi K2.6.