google/gemini-3.1-pro at $2/M input (up to 200K tokens) and $12/M output is the cheapest model on AIgateway that will reliably use a full 1M-token context window. Beyond 200K tokens it's $4/$18. Cached reads drop to $0.20/M — feeding a whole codebase once and querying it repeatedly works out to cents per question.
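Concretely, here is the arithmetic behind "cents per question". The rates are the ones quoted above; the codebase and answer sizes are assumptions chosen for illustration:

```python
# Rates quoted above for google/gemini-3.1-pro (<=200K-token tier).
INPUT_PER_M = 2.00    # $ per million fresh input tokens
CACHED_PER_M = 0.20   # $ per million cached input tokens
OUTPUT_PER_M = 12.00  # $ per million output tokens

codebase_tokens = 150_000  # assumed codebase size, fits the <=200K tier
answer_tokens = 500        # assumed answer length per question

# The first question pays the full input rate to populate the cache.
first = (codebase_tokens * INPUT_PER_M + answer_tokens * OUTPUT_PER_M) / 1e6
# Every later question re-reads the codebase at the cached rate.
later = (codebase_tokens * CACHED_PER_M + answer_tokens * OUTPUT_PER_M) / 1e6

print(f"first question: ${first:.3f}, each follow-up: ${later:.3f}")
```

At those assumptions the first question costs about $0.31 and every follow-up about $0.036, which is the "cents per question" claim above.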
This page ranks models on AIgateway that actually expose long-context capability through a stable, OpenAI-compatible interface, not just models that support it on paper. Pricing is compared on blended token cost (2:1 input:output) at list rate, using the published pass-through price; the flat 5% platform fee applies equally to every model, so it's neutral to the ranking.
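For reference, the blend works out like this. A minimal sketch: the 2:1 weighting is the one stated above, and the rates are the published list prices for the ≤200K tier:

```python
def blended_per_m(input_per_m: float, output_per_m: float) -> float:
    """Blended $/M tokens at a 2:1 input:output mix."""
    return (2 * input_per_m + output_per_m) / 3

gemini = blended_per_m(2.00, 12.00)  # google/gemini-3.1-pro, <=200K tier
print(f"${gemini:.2f}/M blended")

# The flat 5% platform fee multiplies every model's cost by the same
# factor, so it never changes which model ranks cheapest.
with_fee = gemini * 1.05
```

That comes to about $5.33/M blended for google/gemini-3.1-pro at list rate.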
Install the OpenAI SDK, set base_url to https://api.aigateway.sh/v1, and set the model string. No provider-specific client libraries, no separate billing setup — just a model swap. If the winning model ever gets undercut, AIgateway's eval+routing layer can automatically shift traffic to a cheaper-or-equal alternative.
POST /v1/evals with the top 2-3 candidate models, a dataset of 20-50 production prompts, and a grader. AIgateway returns a quality score per model; if the cheapest one passes, pin your alias there. Re-run the eval monthly — frontier prices drop by 40-60% a year and the winner changes fast.
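A sketch of that eval workflow, using only the standard library. The endpoint is the one named above; the payload field names (models, dataset, grader) and the response shape are assumptions, so check them against the API reference before relying on this:

```python
import json
import urllib.request

def run_eval(api_key: str, models: list, prompts: list, grader: str) -> dict:
    # POST candidates + dataset to the evals endpoint (field names assumed).
    req = urllib.request.Request(
        "https://api.aigateway.sh/v1/evals",
        data=json.dumps(
            {"models": models, "dataset": prompts, "grader": grader}
        ).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # assumed shape: {"scores": {model: score}}

def cheapest_passing(scores: dict, blended: dict, threshold: float):
    # Lowest blended-cost model whose quality score clears the bar.
    passing = [m for m, s in scores.items() if s >= threshold]
    return min(passing, key=blended.__getitem__, default=None)
```

For example, `cheapest_passing({"a": 0.91, "b": 0.95}, {"a": 3.0, "b": 5.3}, 0.9)` returns `"a"`: the cheaper model wins as long as it passes the bar.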
# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verify.
# openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
from openai import OpenAI
client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)
r = client.chat.completions.create(
    model="google/gemini-3.1-pro",
    messages=[{"role": "user", "content": "Explain vector databases in two sentences."}],
)
print(r.choices[0].message.content)

// npm i aigateway-js openai
// aigateway-js: sub-accounts, evals, replays, jobs, webhook verify.
// openai SDK: chat/embeddings/images/audio — drop-in compat.
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.aigateway.sh/v1",
  apiKey: process.env.AIGATEWAY_KEY,
});
const r = await client.chat.completions.create({
  model: "google/gemini-3.1-pro",
  messages: [{ role: "user", content: "Explain vector databases in two sentences." }],
});
console.log(r.choices[0].message.content);

Yes — the shortlisted winner on this page is the model our team and customers use in production today. Run a 20-prompt eval on your own workload to confirm; if it fails, the second-cheapest model is one model-string change away.
Frontier model prices drop 40-60% per year. Expect the ranking to change every 2-4 months as new models ship. Pin an alias in AIgateway's router and re-run your eval monthly to stay on the current winner.
Yes. Every model on AIgateway speaks OpenAI's request/response format, so picking a cheaper model is a one-line change in your application code.
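In practice that one-line change looks like this. A sketch: the constant and helper names are ours, and the only line that matters is the model string:

```python
MODEL = "google/gemini-3.1-pro"  # swap this string to switch models; nothing else moves

def ask(client, prompt: str) -> str:
    # `client` is any OpenAI-compatible client pointed at AIgateway.
    r = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return r.choices[0].message.content
```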
No. Pass-through pricing plus a flat 5% platform fee applied at credit top-up. The fee doesn't change by model; the per-token rate is what each provider publishes.