Cheapest by capability

What is the cheapest model for tool calling?

The cheapest model with reliable tool calling on AIgateway right now is google/gemini-3.1-flash-lite at $0.25/M input and $1.50/M output tokens. It supports parallel function calls, JSON mode, and streaming. Runner-up: openai/gpt-5.4-nano at $0.20/M input and $1.25/M output. Both return OpenAI-shaped tool_calls.

How it works

How the pick was made

This page ranks models on AIgateway that actually expose the capability through a stable, OpenAI-compatible interface, not just models that support it in theory. Pricing is compared on blended token cost (2 parts input to 1 part output) at list rate, using the published pass-through price; the flat 5% platform fee applies to every model equally, so it does not affect the ranking.
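As a sketch, the blended comparison described above works out like this. The helper name is ours, not part of any SDK; only the 2:1 weighting and the list prices come from this page.

```python
def blended_cost_per_m(input_per_m: float, output_per_m: float) -> float:
    """Blended $/M tokens at a 2:1 input:output mix, per the method above."""
    return (2 * input_per_m + output_per_m) / 3

# google/gemini-3.1-flash-lite at $0.25/M input, $1.50/M output:
print(round(blended_cost_per_m(0.25, 1.50), 4))
```

Because the platform fee is a flat percentage on every model, it scales all blended costs identically and drops out of the comparison.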

How to use the winner

Install the OpenAI SDK, set base_url to https://api.aigateway.sh/v1, and set the model string. No provider-specific client libraries, no separate billing setup — just a model swap. If the winning model ever gets undercut, AIgateway's eval+routing layer can automatically shift traffic to a cheaper-or-equal alternative.

Run an eval if cost matters on your workload

POST /v1/evals with the top 2-3 candidate models, a dataset of 20-50 production prompts, and a grader. AIgateway returns a quality score per model; if the cheapest one passes, pin your alias there. Re-run the eval monthly — frontier prices drop by 40-60% a year and the winner changes fast.
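A minimal sketch of that eval request. The endpoint comes from the text above, but the payload field names ("models", "prompts", "grader") are illustrative assumptions, not a documented schema; check the API reference before relying on them.

```python
def build_eval_payload(models, prompts, grader):
    # Field names are assumed for illustration, not a documented schema.
    return {"models": models, "prompts": prompts, "grader": grader}

payload = build_eval_payload(
    models=["google/gemini-3.1-flash-lite", "openai/gpt-5.4-nano"],
    prompts=["Summarize this support ticket in one line."],  # 20-50 production prompts in practice
    grader="exact_match",  # assumed grader identifier
)

# To send it (requires a real key):
# import requests
# requests.post(
#     "https://api.aigateway.sh/v1/evals",
#     json=payload,
#     headers={"Authorization": "Bearer sk-aig-..."},
# )
```

Pin your alias to whichever model passes, and re-run with the same prompt set each month so ranking changes show up as a diff in scores rather than a production surprise.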

Code example

Python
# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verification.
# openai SDK: chat, embeddings, images, audio (drop-in compatible).
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

r = client.chat.completions.create(
    model="google/gemini-3.1-flash-lite",
    messages=[{"role": "user", "content": "Explain vector databases in two sentences."}],
)
print(r.choices[0].message.content)
Node / TypeScript
// npm i aigateway-js openai
// aigateway-js: sub-accounts, evals, replays, jobs, webhook verification.
// openai SDK: chat, embeddings, images, audio (drop-in compatible).
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.aigateway.sh/v1",
  apiKey: process.env.AIGATEWAY_KEY,
});

const r = await client.chat.completions.create({
  model: "google/gemini-3.1-flash-lite",
  messages: [{ role: "user", content: "Explain vector databases in two sentences." }],
});
console.log(r.choices[0].message.content);
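Since this page is about tool calling specifically, here is a sketch of wiring a tool into the same request. The tools schema follows OpenAI's standard function-calling format, which the page says both shortlisted models return; the get_weather stub and dispatch helper are our own illustrative names, not part of any SDK.

```python
import json

# Tool schema in OpenAI's function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stub implementation for illustration.
    return f"18C and clear in {city}"

def dispatch(tool_call) -> str:
    # Route an OpenAI-shaped tool_call (dict form) to a local function.
    fn = {"get_weather": get_weather}[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# With the client from the example above (requires a real key):
# r = client.chat.completions.create(
#     model="google/gemini-3.1-flash-lite",
#     messages=[{"role": "user", "content": "Weather in Oslo?"}],
#     tools=tools,
# )
# for tc in r.choices[0].message.tool_calls:
#     print(dispatch(tc.model_dump()))
```

Because every model on AIgateway returns the same tool_calls shape, the dispatch code stays unchanged when you swap the model string for a cheaper alternative.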

FAQ

Does the cheapest option actually work in production?

Yes — the shortlisted winner on this page is the model our team and customers use in production today. Run a 20-prompt eval on your own workload to confirm; if it fails, the second-cheapest model is one model-string change away.

How often does the cheapest model change?

Frontier model prices drop 40-60% per year. Expect the ranking to change every 2-4 months as new models ship. Pin an alias in AIgateway's router and re-run your eval monthly to stay on the current winner.

Is this compatible with the OpenAI SDK?

Yes. Every model on AIgateway speaks OpenAI's request/response format, so picking a cheaper model is a one-line change in your application code.

Does AIgateway charge extra for cheaper models?

No. Pass-through pricing plus a flat 5% platform fee applied at credit top-up. The fee doesn't change by model; the per-token rate is what each provider publishes.

TRY IT NOW

One key, every model. Free tier, no card.

Get an AIgateway key · Open the playground