Cheapest by capability

What is the cheapest model for tool calling?

The cheapest model with reliable tool calling on AIgateway right now is google/gemini-3.1-flash-lite at $0.25/M input and $1.50/M output tokens. It supports parallel function calls, JSON mode, and streaming. Runner-up: openai/gpt-5.4-nano at $0.20/M input and $1.25/M output. Both return OpenAI-shaped tool_calls.

How it works

How the pick was made

This page ranks models on AIgateway that actually expose the capability through a stable, OpenAI-compatible interface, not just models that support it in theory. Pricing is compared on blended token cost (2 parts input to 1 part output) at list rate, using the published pass-through price; the flat 5% platform fee applies to every model equally, so it does not affect the ranking.
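As a sketch, the blended comparison described above works out like this. The helper name is ours, not part of any SDK; only the 2:1 weighting and the list prices come from this page.

```python
def blended_cost_per_m(input_per_m: float, output_per_m: float) -> float:
    """Blended $/M tokens at a 2:1 input:output mix, per the method above."""
    return (2 * input_per_m + output_per_m) / 3

# google/gemini-3.1-flash-lite at $0.25/M input, $1.50/M output:
print(round(blended_cost_per_m(0.25, 1.50), 4))
```

Because the platform fee is a flat percentage on every model, it scales all blended costs identically and drops out of the comparison.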

How to use the winner

Install the OpenAI SDK, set base_url to https://api.aigateway.sh/v1, and set the model string. No provider-specific client libraries, no separate billing setup — just a model swap. If the winning model ever gets undercut, AIgateway's eval+routing layer can automatically shift traffic to a cheaper-or-equal alternative.

Run an eval if cost matters on your workload

POST /v1/evals with the top 2-3 candidate models, a dataset of 20-50 production prompts, and a grader. AIgateway returns a quality score per model; if the cheapest one passes, pin your alias there. Re-run the eval monthly — frontier prices drop by 40-60% a year and the winner changes fast.
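A minimal sketch of that eval request. The endpoint comes from the text above, but the payload field names ("models", "prompts", "grader") are illustrative assumptions, not a documented schema; check the API reference before relying on them.

```python
def build_eval_payload(models, prompts, grader):
    # Field names are assumed for illustration, not a documented schema.
    return {"models": models, "prompts": prompts, "grader": grader}

payload = build_eval_payload(
    models=["google/gemini-3.1-flash-lite", "openai/gpt-5.4-nano"],
    prompts=["Summarize this support ticket in one line."],  # 20-50 production prompts in practice
    grader="exact_match",  # assumed grader identifier
)

# To send it (requires a real key):
# import requests
# requests.post(
#     "https://api.aigateway.sh/v1/evals",
#     json=payload,
#     headers={"Authorization": "Bearer sk-aig-..."},
# )
```

Pin your alias to whichever model passes, and re-run with the same prompt set each month so ranking changes show up as a diff in scores rather than a production surprise.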

Code example

Python
# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verification.
# openai SDK: chat, embeddings, images, audio (drop-in compatible).
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

r = client.chat.completions.create(
    model="google/gemini-3.1-flash-lite",
    messages=[{"role": "user", "content": "Explain vector databases in two sentences."}],
)
print(r.choices[0].message.content)
Node / TypeScript
// npm i aigateway-js openai
// aigateway-js: sub-accounts, evals, replays, jobs, webhook verification.
// openai SDK: chat, embeddings, images, audio (drop-in compatible).
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.aigateway.sh/v1",
  apiKey: process.env.AIGATEWAY_KEY,
});

const r = await client.chat.completions.create({
  model: "google/gemini-3.1-flash-lite",
  messages: [{ role: "user", content: "Explain vector databases in two sentences." }],
});
console.log(r.choices[0].message.content);
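Since this page is about tool calling specifically, here is a sketch of wiring a tool into the same request. The tools schema follows OpenAI's standard function-calling format, which the page says both shortlisted models return; the get_weather stub and dispatch helper are our own illustrative names, not part of any SDK.

```python
import json

# Tool schema in OpenAI's function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    # Stub implementation for illustration.
    return f"18C and clear in {city}"

def dispatch(tool_call) -> str:
    # Route an OpenAI-shaped tool_call (dict form) to a local function.
    fn = {"get_weather": get_weather}[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# With the client from the example above (requires a real key):
# r = client.chat.completions.create(
#     model="google/gemini-3.1-flash-lite",
#     messages=[{"role": "user", "content": "Weather in Oslo?"}],
#     tools=tools,
# )
# for tc in r.choices[0].message.tool_calls:
#     print(dispatch(tc.model_dump()))
```

Because every model on AIgateway returns the same tool_calls shape, the dispatch code stays unchanged when you swap the model string for a cheaper alternative.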

FAQ

Does the cheapest option actually work in production?

Yes — the shortlisted winner on this page is the model our team and customers use in production today. Run a 20-prompt eval on your own workload to confirm; if it fails, the second-cheapest model is one model-string change away.

How often does the cheapest model change?

Frontier model prices drop 40-60% per year. Expect the ranking to change every 2-4 months as new models ship. Pin an alias in AIgateway's router and re-run your eval monthly to stay on the current winner.

Is this compatible with the OpenAI SDK?

Yes. Every model on AIgateway speaks OpenAI's request/response format, so picking a cheaper model is a one-line change in your application code.

Does AIgateway charge extra for cheaper models?

No. Pass-through pricing plus a flat 5% platform fee applied at credit top-up. The fee doesn't change by model; the per-token rate is what each provider publishes.

TRY IT NOW

One key, every model. Free tier, no card.

Get an AIgateway key · Open the playground