Cost calculator · 2026

LLM Cost Calculator.

Pick a handful of candidate models, set your monthly workload, and see what each one will actually cost. At typical production workloads, the three cheapest frontier-class models are Llama-4-Scout-17b-16e-Instruct, GPT-5.4 Nano, and Gemini 3.1 Flash Lite; the three fastest text-generation picks are GPT-5.4 Nano, Gemini 3.1 Flash Lite, and Claude Haiku 4.5. The numbers below use live catalog prices and exclude the 5% platform fee that is added to each API call.

Monthly requests

100K req / mo

Avg input tokens

2K tok / req

Avg output tokens

500 tok / req

Cache hit rate

10%

Candidate models

Monthly spend · side-by-side

4 models selected

Model	Provider	Input	Output	Cache save	Per req	Total / mo
Gemini 3 Flash (google/gemini-3-flash) · cheapest	Google	$95.00	$142.50	−$12.50	$0.0024	$237.50
GPT-5.4 Mini (openai/gpt-5.4-mini)	OpenAI	$142.50	$213.75	−$18.75	$0.0036	$356.25
Kimi-K2.7-Code (moonshot/kimi-k2.7-code)	Moonshot	$180.50	$190.00	−$19.50	$0.0037	$370.50
Claude Sonnet 4.6 (anthropic/claude-sonnet-4.6) · priciest	Anthropic	$570.00	$712.50	−$67.50	$0.013	$1,282.50

How the math works

What we assume.

Formula. For each model: cost = requests × (1 − 0.5 × cache_rate) × (in_tokens × in_price + out_tokens × out_price) / 1,000,000, where in_price and out_price are USD per 1M tokens and cache_rate is the cache hit rate as a fraction (10% = 0.10). Tokens are counted once per request; the input price applies to the full input and the output price to the full output.
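As a rough illustration, the formula can be sketched in Python. The function name `monthly_cost` and its parameter names are hypothetical, not part of any AIgateway API. The example prices ($0.50 input and $3.00 output per 1M tokens) are assumptions back-derived from the Gemini 3 Flash row in the table, not quoted catalog values.

```python
def monthly_cost(requests, in_tokens, out_tokens, in_price, out_price, cache_rate):
    """Projected monthly provider cost in USD.

    in_price / out_price are USD per 1M tokens; cache_rate is a fraction (0.10 = 10%).
    """
    if not 0.0 <= cache_rate <= 1.0:
        raise ValueError("cache_rate must be between 0 and 1")
    # Token cost of one request, still scaled by 1M because prices are per 1M tokens.
    per_request = in_tokens * in_price + out_tokens * out_price
    # Cached requests are billed at half price, so the blended factor is 1 - 0.5 * rate.
    cache_factor = 1 - 0.5 * cache_rate
    return requests * cache_factor * per_request / 1_000_000


# Default workload from the page: 100K requests, 2K input / 500 output tokens, 10% cache hits.
# Prices are assumed values inferred from the table, see the note above.
total = monthly_cost(100_000, 2_000, 500, in_price=0.50, out_price=3.00, cache_rate=0.10)
print(f"${total:,.2f} / mo")  # $237.50 / mo
```

With these assumed prices the result lines up with the Gemini 3 Flash total shown in the table.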

Cache hit math. Cached requests get a flat 50% discount on AIgateway — they're served from L1/L2 cache without hitting the upstream provider, and we pass half of that win back to you. The calculator multiplies the full-cost projection by 1 − 0.5 × cache_rate. We don't yet model partial cache hits (prompt-prefix caching), which typically save another 10–20% on top.
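To make the cache discount concrete, here is a minimal sketch that splits a projection into the full cost, the cache saving, and the resulting total. The helper `cache_breakdown` and its field names are hypothetical. The prices used ($3 input and $15 output per 1M tokens) are assumptions inferred from the Claude Sonnet 4.6 row. Partial (prompt-prefix) cache hits are not modeled, matching the calculator.

```python
def cache_breakdown(requests, in_tokens, out_tokens, in_price, out_price, cache_rate):
    """Split a monthly projection into full cost, cache saving, and net total (USD)."""
    # Cost if every request hit the upstream provider at full price.
    full_cost = requests * (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    # A cache_rate share of requests gets a flat 50% discount.
    cache_saving = 0.5 * cache_rate * full_cost
    total = full_cost - cache_saving
    return {
        "full_cost": full_cost,
        "cache_saving": cache_saving,
        "total": total,
        "per_request": total / requests,
    }


# Assumed prices (USD per 1M tokens), inferred from the table rather than quoted.
row = cache_breakdown(100_000, 2_000, 500, in_price=3.00, out_price=15.00, cache_rate=0.10)
for name, value in row.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the saving comes to about $67.50 on a full cost of $1,350, which agrees with the "Cache save" column. It also suggests the Input and Output columns in the table are shown after the cache discount, since they sum to the total.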

Platform fee. AIgateway adds a flat 5% to the provider cost on every API call; this is our only margin. The totals in the table are raw provider pass-through cost; multiply by 1.05 for what actually leaves your balance. A separate 5% covers card processing when you top up.
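The fee adjustment is a single multiplication, sketched below. The names `PLATFORM_FEE` and `balance_debit` are hypothetical. The card-processing fee is deliberately left out, because the text does not say how it is applied to a top-up.

```python
PLATFORM_FEE = 0.05  # flat 5% on provider cost, per the text


def balance_debit(provider_cost, fee=PLATFORM_FEE):
    """Amount deducted from the account balance for a given provider cost (USD)."""
    return provider_cost * (1 + fee)


# Monthly totals from the comparison table (raw provider pass-through cost).
table_totals = {
    "Gemini 3 Flash": 237.50,
    "GPT-5.4 Mini": 356.25,
    "Kimi-K2.7-Code": 370.50,
    "Claude Sonnet 4.6": 1282.50,
}

for model, cost in table_totals.items():
    print(f"{model}: provider ${cost:,.2f}, balance debit about ${balance_debit(cost):,.2f}")
```

For the cheapest row, a $237.50 provider cost works out to roughly $249.38 leaving the balance.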

What's not included. Vision input tokens (billed separately on some models), image generation (priced per image), video (per second), audio (per minute), and embeddings (priced per 1M input tokens only). The calculator targets chat / completion workloads. For multimodal pricing, see the pricing page or a specific model's page.

Which should I pick? For maximum quality: Claude Opus 4.7 or GPT-5.4. Best balance: Claude Sonnet 4.6 or Gemini 3.1 Pro. Cost-obsessed: Gemini 3.1 Flash Lite or GPT-5.4 Nano. Open-weight flagship: Kimi-K2.7-Code.