Every model you can call through AIgateway, ranked by the benchmarks engineers actually care about: SWE-bench for agentic coding, MMLU and GPQA for reasoning, and HumanEval for code generation, plus input and output price per 1M tokens and context window size. Numbers come from each provider's reported eval results; an empty cell (shown as a dash) means the lab has not published that benchmark. Sort any column. The same data is available as JSON at /api/leaderboard.
964 models across 76 providers. Every row is a click away from a full model page with pricing, quickstart code, and a live playground.
| Model | Provider | SWE-bench | MMLU | GPQA | HumanEval | Input $/M | Output $/M | Context |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.8 anthropic/claude-opus-4.8 | Anthropic | — | — | — | — | $5.00 | $25.00 | 1M |
| MiniMax M3 minimax/m3 | MiniMax | — | — | — | — | $0.30 | $1.20 | 1M |
| Gemini 3.1 Pro google/gemini-3.1-pro | Google | — | 89.6 | — | — | $2.00 | $12.00 | 1M |
| Claude Sonnet 4.6 anthropic/claude-sonnet-4.6 | Anthropic | 62.1 | 87.2 | — | 93.4 | $3.00 | $15.00 | 200K |
| GPT-5.5 openai/gpt-5.5 | OpenAI | — | — | — | — | $5.00 | $30.00 | 1M |
| O3 openai/o3 | OpenAI | — | — | — | — | $2.00 | $8.00 | 200K |
| Gemini 2.5 Pro google/gemini-2.5-pro | Google | — | — | — | — | $1.25 | $10.00 | 1M |
| Kimi-K2.6 moonshot/kimi-k2.6 | Moonshot | 68.2 | — | — | 92.7 | $0.95 | $4.00 | 262K |
| Qwen 3.5 397B A17B alibaba/qwen3.5-397b-a17b | Alibaba | — | — | — | — | $0.60 | $3.60 | 262K |
| Grok 4.20 Multi-Agent xai/grok-4.20-multi-agent-0309 | xAI | — | — | — | — | $2.00 | $6.00 | 2M |
| GPT-5.4 Mini openai/gpt-5.4-mini | OpenAI | — | 84.5 | — | 88.6 | $0.75 | $4.50 | 128K |
| GPT-5.5 Pro openai/gpt-5.5-pro | OpenAI | — | — | — | — | $30.00 | $180.00 | 1M |
| Gemma-4-26b-A4b-IT google/gemma-4-26b-a4b-it | Google | — | — | — | — | $0.34 | $0.56 | 131K |
| Claude Haiku 4.5 anthropic/claude-haiku-4.5 | Anthropic | — | 80.1 | — | 85.2 | $1.00 | $5.00 | 200K |
| O4-Mini openai/o4-mini | OpenAI | — | — | — | — | $1.10 | $4.40 | 200K |
| Gpt-Oss-120b openai/gpt-oss-120b | OpenAI | — | — | — | — | $0.35 | $0.75 | 131K |
| Sonar Reasoning Pro perplexity/sonar-reasoning-pro | Perplexity | — | — | — | — | $2.00 | $8.00 | 127K |
| Nemotron-3-120b-A12b nvidia/nemotron-3-120b-a12b | Nvidia | — | — | — | — | $0.50 | $1.20 | 131K |
| Sonar Deep Research perplexity/sonar-deep-research | Perplexity | — | — | — | — | $2.00 | $8.00 | 127K |
| Gemma-2b-IT-Lora google/gemma-2b-it-lora | Google | — | — | — | — | $0.030 | $0.060 | 4K |
| Llama-3.2-3b-Instruct meta/llama-3.2-3b-instruct | Meta | — | — | — | — | $0.030 | $0.060 | 128K |
| Mistral-7b-Instruct-V0.2-Lora mistral/mistral-7b-instruct-v0.2-lora | Mistral | — | — | — | — | $0.050 | $0.10 | 4K |
| Llama-3.1-8b-Instruct-Fp8 meta/llama-3.1-8b-instruct-fp8 | Meta | — | — | — | — | $0.050 | $0.10 | 131K |
| Llama-3.2-1b-Instruct meta/llama-3.2-1b-instruct | Meta | — | — | — | — | $0.015 | $0.030 | 128K |
| Glm-4.7-Flash zai-org/glm-4.7-flash | Zai-org | — | — | — | — | $0.050 | $0.10 | 131K |
| Llama-2-7b-Chat-HF-Lora meta-llama/llama-2-7b-chat-hf-lora | Meta-llama | — | — | — | — | $0.040 | $0.080 | 4K |
| Llama-3.3-70b-Instruct-Fp8-Fast meta/llama-3.3-70b-instruct-fp8-fast | Meta | — | — | — | — | $0.29 | $2.25 | 131K |
| Granite-4.0-H-Micro ibm-granite/granite-4.0-h-micro | Ibm-granite | — | — | — | — | $0.020 | $0.11 | 8K |
| Qwen2.5-Coder-32b-Instruct qwen/qwen2.5-coder-32b-instruct | Alibaba Qwen | — | — | — | — | $0.66 | $1.00 | 131K |
| Gemma-Sea-Lion-V4-27b-IT aisingapore/gemma-sea-lion-v4-27b-it | AI Singapore | — | — | — | — | $0.30 | $0.50 | 4K |
| Qwen3-30b-A3b-Fp8 qwen/qwen3-30b-a3b-fp8 | Alibaba Qwen | — | — | — | — | $0.25 | $0.50 | 131K |
| Gemma-7b-IT-Lora google/gemma-7b-it-lora | Google | — | — | — | — | $0.080 | $0.16 | 4K |
| Mistral-Small-3.1-24b-Instruct mistralai/mistral-small-3.1-24b-instruct | Mistral | — | — | — | — | $0.35 | $0.55 | 131K |
| Gpt-Oss-20b openai/gpt-oss-20b | OpenAI | — | — | — | — | $0.20 | $0.30 | 131K |
| Llama-4-Scout-17b-16e-Instruct meta/llama-4-scout-17b-16e-instruct | Meta | — | — | — | — | $0.27 | $0.85 | 131K |
| Grok 4 Fast xai/grok-4-fast | xAI | — | — | — | — | $0.50 | $2.00 | 256K |
| Grok 4 xai/grok-4 | xAI | — | — | — | — | $5.00 | $15.00 | 256K |
| Claude Opus 4.5 anthropic/claude-opus-4.5 | Anthropic | — | — | — | — | $5.00 | $25.00 | 200K |
| Claude Opus 4.6 anthropic/claude-opus-4.6 | Anthropic | — | — | — | — | $5.00 | $25.00 | 1M |
| Claude Opus 4.7 anthropic/claude-opus-4.7 | Anthropic | 72.5 | 90.4 | — | 95.1 | $5.00 | $25.00 | 1M |
| Claude Sonnet 4 anthropic/claude-sonnet-4 | Anthropic | — | — | — | — | $3.00 | $15.00 | 200K |
| Claude Sonnet 4.5 anthropic/claude-sonnet-4.5 | Anthropic | — | — | — | — | $3.00 | $15.00 | 200K |
| GPT-4.1 openai/gpt-4.1 | OpenAI | — | — | — | — | $2.00 | $8.00 | 1M |
| GPT-4.1 Mini openai/gpt-4.1-mini | OpenAI | — | — | — | — | $0.40 | $1.60 | 1M |
| GPT-4.1 Nano openai/gpt-4.1-nano | OpenAI | — | — | — | — | $0.10 | $0.40 | 1M |
| GPT-4o openai/gpt-4o | OpenAI | — | — | — | — | $2.50 | $10.00 | 128K |
| GPT-4o Mini openai/gpt-4o-mini | OpenAI | — | — | — | — | $0.15 | $0.60 | 128K |
| GPT-5 openai/gpt-5 | OpenAI | — | — | — | — | $1.25 | $10.00 | 128K |
| GPT-5 Chat openai/gpt-5-chat | OpenAI | — | — | — | — | $1.25 | $10.00 | 128K |
| GPT-5 Mini openai/gpt-5-mini | OpenAI | — | — | — | — | $0.25 | $2.00 | 128K |
| GPT-5 Nano openai/gpt-5-nano | OpenAI | — | — | — | — | $0.050 | $0.40 | 128K |
| GPT-5.1 openai/gpt-5.1 | OpenAI | — | — | — | — | $1.25 | $10.00 | 128K |
| GPT-5.1 Chat openai/gpt-5.1-chat | OpenAI | — | — | — | — | $1.25 | $10.00 | 128K |
| GPT-5.4 openai/gpt-5.4 | OpenAI | — | 91.8 | — | 94.0 | $2.50 | $15.00 | 1M |
| GPT-5.4 Nano openai/gpt-5.4-nano | OpenAI | — | — | — | — | $0.20 | $1.25 | 128K |
| GPT-5.4 Pro openai/gpt-5.4-pro | OpenAI | — | — | — | — | $30.00 | $180.00 | 1M |
| Gemini 2.5 Flash google/gemini-2.5-flash | Google | — | — | — | — | $0.30 | $2.50 | 1M |
| Gemini 2.5 Flash Lite google/gemini-2.5-flash-lite | Google | — | — | — | — | $0.10 | $0.40 | 1M |
| Gemini 3 Flash google/gemini-3-flash | Google | — | 82.3 | — | — | $0.50 | $3.00 | 1M |
| Gemini 3.1 Flash Lite google/gemini-3.1-flash-lite | Google | — | — | — | — | $0.25 | $1.50 | 1M |
| Grok 4.20 Non-Reasoning xai/grok-4.20-0309-non-reasoning | xAI | — | — | — | — | $2.00 | $6.00 | 2M |
| Grok 4.20 Reasoning xai/grok-4.20-0309-reasoning | xAI | — | — | — | — | $2.00 | $6.00 | 2M |
| Grok 4.3 xai/grok-4.3 | xAI | — | — | — | — | $1.25 | $2.50 | 1M |
| MiniMax M2.7 minimax/m2.7 | MiniMax | — | — | — | — | $0.30 | $1.20 | 128K |
| Qwen 3 Max alibaba/qwen3-max | Alibaba | — | — | — | — | $1.20 | $6.00 | 262K |
| O3-Mini openai/o3-mini | OpenAI | — | — | — | — | $1.10 | $4.40 | 200K |
| IndicTrans2 EN→Indic 1B ai4bharat/indictrans2-en-indic-1B | AI4Bharat | — | — | — | — | $0.021 | $0.042 | — |
| BART Large CNN facebook/bart-large-cnn | Meta | — | — | — | — | $0.050 | $0.10 | — |
| DistilBERT SST-2 huggingface/distilbert-sst-2-int8 | Hugging Face | — | — | — | — | — | — | — |
| M2M100 1.2B meta/m2m100-1.2b | Meta | — | — | — | — | $0.021 | $0.042 | — |
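
Because most benchmark cells are empty, sorting a column needs a rule for models with no published score. The following is a minimal sketch of one reasonable rule (highest score first, unpublished scores last), not the page's actual sorting logic. The `model` and `swe_bench` field names are hypothetical; the scores are copied from the table above.

```python
# Sample rows from the table above; None marks a score the lab has not published.
# The field names are illustrative, not a documented schema.
ROWS = [
    {"model": "openai/gpt-5.4", "swe_bench": None},
    {"model": "anthropic/claude-sonnet-4.6", "swe_bench": 62.1},
    {"model": "anthropic/claude-opus-4.7", "swe_bench": 72.5},
    {"model": "moonshot/kimi-k2.6", "swe_bench": 68.2},
]


def sort_by_score(rows, column):
    """Return rows sorted highest score first, with unpublished scores last."""
    return sorted(
        rows,
        # The first key element pushes None to the end; the second sorts descending.
        key=lambda row: (row[column] is None, -(row[column] or 0.0)),
    )


if __name__ == "__main__":
    for row in sort_by_score(ROWS, "swe_bench"):
        print(row["model"], row["swe_bench"])
```

Keeping unpublished scores at the end, rather than treating them as zero, matches the note that an empty cell means "not disclosed" and not "failed".
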
Five opinionated slices: coding agents, reasoning-heavy work, cheapest model with tool calling, longest context, and fastest text models. Each slice cites the underlying benchmark or pricing field directly.

Coding agents (SWE-bench):

| Model | Provider | SWE-bench |
|---|---|---|
| Claude Opus 4.7 anthropic/claude-opus-4.7 | Anthropic | 72.5 |
| Kimi-K2.6 moonshot/kimi-k2.6 | Moonshot | 68.2 |
| Claude Sonnet 4.6 anthropic/claude-sonnet-4.6 | Anthropic | 62.1 |

Reasoning-heavy work (MMLU):

| Model | Provider | Score |
|---|---|---|
| GPT-5.4 openai/gpt-5.4 | OpenAI | 91.8 MMLU |
| Claude Opus 4.7 anthropic/claude-opus-4.7 | Anthropic | 90.4 MMLU |
| Gemini 3.1 Pro google/gemini-3.1-pro | Google | 89.6 MMLU |
| Claude Sonnet 4.6 anthropic/claude-sonnet-4.6 | Anthropic | 87.2 MMLU |
| GPT-5.4 Mini openai/gpt-5.4-mini | OpenAI | 84.5 MMLU |
| Gemini 3 Flash google/gemini-3-flash | Google | 82.3 MMLU |
| Claude Haiku 4.5 anthropic/claude-haiku-4.5 | Anthropic | 80.1 MMLU |

Cheapest model with tool calling (input plus output price per 1M tokens):

| Model | Provider | In + Out /M |
|---|---|---|
| Granite-4.0-H-Micro ibm-granite/granite-4.0-h-micro | Ibm-granite | $0.13 |
| GPT-5 Nano openai/gpt-5-nano | OpenAI | $0.45 |
| Gpt-Oss-20b openai/gpt-oss-20b | OpenAI | $0.50 |
| GPT-4.1 Nano openai/gpt-4.1-nano | OpenAI | $0.50 |
| Mistral-Small-3.1-24b-Instruct mistralai/mistral-small-3.1-24b-instruct | Mistral | $0.90 |
| Gpt-Oss-120b openai/gpt-oss-120b | OpenAI | $1.10 |
| Llama-4-Scout-17b-16e-Instruct meta/llama-4-scout-17b-16e-instruct | Meta | $1.12 |
| GPT-5.4 Nano openai/gpt-5.4-nano | OpenAI | $1.45 |
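
The "In + Out /M" figure is the sum of a model's input and output price per 1M tokens from the main table. As a rough sketch of how that ranking can be reproduced (the tool-calling filter is not modeled here, because that field does not appear in the tables), with prices copied from the rows above:

```python
from decimal import Decimal

# (model id, input $/M, output $/M), copied from the main table.
PRICES = [
    ("openai/gpt-5-nano", "0.050", "0.40"),
    ("ibm-granite/granite-4.0-h-micro", "0.020", "0.11"),
    ("openai/gpt-oss-20b", "0.20", "0.30"),
    ("mistralai/mistral-small-3.1-24b-instruct", "0.35", "0.55"),
]


def combined_price(input_per_m, output_per_m):
    """Sum of input and output price per 1M tokens, as an exact Decimal."""
    return Decimal(input_per_m) + Decimal(output_per_m)


def rank_by_combined_price(rows):
    """Return (model, combined price) pairs, cheapest first."""
    priced = [(model, combined_price(inp, out)) for model, inp, out in rows]
    return sorted(priced, key=lambda pair: pair[1])


if __name__ == "__main__":
    for model, price in rank_by_combined_price(PRICES):
        print(f"{model}: ${price:.2f}")
```

`Decimal` is used instead of floats so that sums such as 0.020 + 0.11 come out as exactly 0.13. Note that a plain sum weights input and output tokens equally, which may not reflect a workload that is heavily skewed toward one side.
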

Longest context:

| Model | Provider | Context |
|---|---|---|
| Grok 4.20 Multi-Agent xai/grok-4.20-multi-agent-0309 | xAI | 2M |
| Grok 4.20 Non-Reasoning xai/grok-4.20-0309-non-reasoning | xAI | 2M |
| Grok 4.20 Reasoning xai/grok-4.20-0309-reasoning | xAI | 2M |
| GPT-4.1 openai/gpt-4.1 | OpenAI | 1M |
| GPT-4.1 Mini openai/gpt-4.1-mini | OpenAI | 1M |
| Claude Opus 4.6 anthropic/claude-opus-4.6 | Anthropic | 1M |
| Claude Opus 4.7 anthropic/claude-opus-4.7 | Anthropic | 1M |
| Claude Opus 4.8 anthropic/claude-opus-4.8 | Anthropic | 1M |

Fastest text models:

| Model | Provider | Tier |
|---|---|---|
| Llama-3.3-70b-Instruct-Fp8-Fast meta/llama-3.3-70b-instruct-fp8-fast | Meta | edge |
| Llama-4-Scout-17b-16e-Instruct meta/llama-4-scout-17b-16e-instruct | Meta | edge |
| Grok 4 Fast xai/grok-4-fast | xAI | edge |
| Claude Haiku 4.5 anthropic/claude-haiku-4.5 | Anthropic | edge |
| GPT-4.1 Mini openai/gpt-4.1-mini | OpenAI | edge |
| GPT-4.1 Nano openai/gpt-4.1-nano | OpenAI | edge |
| GPT-4o Mini openai/gpt-4o-mini | OpenAI | edge |
| GPT-5 Mini openai/gpt-5-mini | OpenAI | edge |
Benchmarks: sourced from each lab's published eval card (Anthropic system cards, OpenAI release notes, Google's technical reports, Moonshot's Kimi paper, Meta's Llama reports). Missing cells mean the lab hasn't disclosed that benchmark — not that the model failed it.
Pricing: the exact pass-through rate you pay on AIgateway. A 5% platform fee is applied at top-up, not per request, so the $/M prices here are exactly what the provider charges, with nothing added.
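
To make the pricing arithmetic concrete, here is a small sketch that computes the cost of one request from the listed $/M rates and, separately, the charge for a top-up. The function names are illustrative, and `top_up_charge` assumes the 5% fee is added on top of the credit purchased; the text does not say whether it is added or deducted, so treat that part as an assumption.

```python
from decimal import Decimal

PLATFORM_FEE_RATE = Decimal("0.05")  # the 5% fee from the text, charged at top-up
TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Dollar cost of one request at the listed per-1M-token prices."""
    return (
        Decimal(input_tokens) * Decimal(input_per_m)
        + Decimal(output_tokens) * Decimal(output_per_m)
    ) / TOKENS_PER_MILLION


def top_up_charge(credit):
    """Amount charged for a top-up, ASSUMING the fee is added on top of the credit."""
    return Decimal(credit) * (1 + PLATFORM_FEE_RATE)


if __name__ == "__main__":
    # Claude Sonnet 4.6 at $3.00 input and $15.00 output per 1M tokens.
    print(request_cost(10_000, 2_000, "3.00", "15.00"))
    print(top_up_charge("100"))
```

With the Claude Sonnet 4.6 prices from the table, a request with 10,000 input tokens and 2,000 output tokens costs $0.06, and no per-request fee is added to that figure.
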
Updates: the catalog is refreshed on every release, so this page changes when a new frontier model lands. The JSON endpoint is GET /api/leaderboard, served with a one-hour browser cache; embed it in your own comparison post.
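
For server-side use, where there is no browser cache, a client can apply the same one-hour freshness window itself. The sketch below is one way to do that with the Python standard library. The base URL is a placeholder, the response is treated as opaque JSON because its schema is not documented here, and the `fetch` and `now` parameters exist only so the function can be exercised without a network call.

```python
import json
import time
import urllib.request

CACHE_TTL_SECONDS = 3600  # mirrors the one-hour cache mentioned above

# Hypothetical base URL; replace it with the real AIgateway host.
BASE_URL = "https://aigateway.example"

_cache = {}  # url -> (fetched_at, parsed_json)


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def get_leaderboard(base_url=BASE_URL, fetch=http_get, now=time.time):
    """Return the parsed leaderboard JSON, re-fetching at most once per hour."""
    url = base_url.rstrip("/") + "/api/leaderboard"
    cached = _cache.get(url)
    if cached is not None and now() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]
    data = json.loads(fetch(url))
    _cache[url] = (now(), data)
    return data
```

Calling `get_leaderboard()` twice within an hour performs a single HTTP request; inspect the returned structure before relying on any particular field name.
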
Want to run any of these? Every row links to a model page with quickstart code. Or try the cost calculator to project spend across candidate models for your workload.