AI Model Leaderboard — 2026

Every model you can call through AIgateway, ranked by the benchmarks engineers actually care about: SWE-bench for agentic coding, MMLU + GPQA for reasoning, HumanEval for code generation — plus input and output price per 1M tokens and context window size. Numbers are sourced from each provider's reported eval results; blank cells mean the lab has not published that benchmark. Sort any column. The same data is available as JSON at /api/leaderboard.

169 models across 48 providers. Every row is a click away from a full model page with pricing, quickstart code, and a live playground.
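Because the same data ships as JSON, any of these rankings can be rebuilt client-side. Below is a minimal sketch of sorting by a benchmark column while skipping unreported scores; the row field names (`model`, `swe_bench`) are illustrative assumptions, so check the actual `/api/leaderboard` response for the real schema. It runs here against a few rows copied from the table:

```python
# Sketch: rank leaderboard rows by a benchmark column.
# Field names (model, swe_bench) are assumptions for illustration;
# inspect GET /api/leaderboard for the actual JSON schema.

def rank_by(rows, key):
    """Sort rows descending by `key`, dropping models that don't report it."""
    scored = [r for r in rows if r.get(key) is not None]
    return sorted(scored, key=lambda r: r[key], reverse=True)

# Sample rows copied from the table below; None = benchmark not published.
rows = [
    {"model": "anthropic/claude-opus-4.7", "swe_bench": 72.5},
    {"model": "openai/gpt-5.4", "swe_bench": None},
    {"model": "moonshot/kimi-k2.6", "swe_bench": 68.2},
    {"model": "anthropic/claude-sonnet-4.6", "swe_bench": 62.1},
]

top = rank_by(rows, "swe_bench")
print([r["model"] for r in top])
# ['anthropic/claude-opus-4.7', 'moonshot/kimi-k2.6', 'anthropic/claude-sonnet-4.6']
```

Dropping unreported scores rather than treating them as zero matters: a blank cell means "not published", not "failed".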

| Model | ID | Provider | SWE-bench | MMLU | GPQA | HumanEval | Input $/M | Output $/M | Context |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | anthropic/claude-opus-4.7 | Anthropic | 72.5 | 90.4 | — | 95.1 | $5.00 | $25.00 | 1M |
| GPT-5.4 | openai/gpt-5.4 | OpenAI | — | 91.8 | — | 94.0 | $2.50 | $15.00 | 128K |
| Gemini 3.1 Pro | google/gemini-3.1-pro | Google | — | 89.6 | — | — | $2.00 | $12.00 | 1M |
| Kimi K2.6 | moonshot/kimi-k2.6 | Moonshot | 68.2 | — | — | 92.7 | $0.95 | $4.00 | 262K |
| Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | Anthropic | 62.1 | 87.2 | — | 93.4 | $3.00 | $15.00 | 200K |
| M2.7 | minimax/m2.7 | MiniMax | — | — | — | — | $0.30 | $1.20 | 128K |
| GPT-5.4 Mini | openai/gpt-5.4-mini | OpenAI | — | 84.5 | — | 88.6 | $0.75 | $4.50 | 128K |
| Grok 4 | xai/grok-4 | xAI | — | — | — | — | $5.00 | $15.00 | 256K |
| Gemini 3 Flash | google/gemini-3-flash | Google | — | 82.3 | — | — | $0.50 | $3.00 | 1M |
| Claude Haiku 4.5 | anthropic/claude-haiku-4.5 | Anthropic | — | 80.1 | — | 85.2 | $1.00 | $5.00 | 200K |
| Llama-4-Scout-17b-16e-Instruct | meta/llama-4-scout-17b-16e-instruct | Meta | — | — | — | — | $0.27 | $0.85 | 131K |
| Gpt-Oss-120b | openai/gpt-oss-120b | OpenAI | — | — | — | — | $0.35 | $0.75 | 128K |
| Qwen2.5-Coder-32b-Instruct | qwen/qwen2.5-coder-32b-instruct | Alibaba Qwen | — | — | — | — | $0.66 | $1.00 | 33K |
| Gemma-Sea-Lion-V4-27b-IT | aisingapore/gemma-sea-lion-v4-27b-it | AI Singapore | — | — | — | — | $0.35 | $0.56 | 128K |
| Deepseek-Math-7b-Instruct | deepseek/deepseek-math-7b-instruct | DeepSeek | — | — | — | — | — | — | 4K |
| Sqlcoder-7b-2 | defog/sqlcoder-7b-2 | Defog | — | — | — | — | $0.050 | $0.10 | 10K |
| Una-Cybertron-7b-V2-Bf16 | fblgit/una-cybertron-7b-v2-bf16 | FBL | — | — | — | — | $0.050 | $0.10 | 4K |
| Gemma-2b-IT-Lora | google/gemma-2b-it-lora | Google | — | — | — | — | $0.030 | $0.060 | 8K |
| Gemma-3-12b-IT | google/gemma-3-12b-it | Google | — | — | — | — | $0.35 | $0.56 | 80K |
| Gemma-4-26b-A4b-IT | google/gemma-4-26b-a4b-it | Google | — | — | — | — | $0.10 | $0.30 | 256K |
| Gemma-7b-IT-Lora | google/gemma-7b-it-lora | Google | — | — | — | — | $0.080 | $0.16 | 4K |
| Granite-4.0-H-Micro | ibm-granite/granite-4.0-h-micro | IBM | — | — | — | — | $0.017 | $0.11 | 131K |
| Llama-2-7b-Chat-HF-Lora | meta-llama/llama-2-7b-chat-hf-lora | Meta | — | — | — | — | $0.040 | $0.080 | 8K |
| Llama-2-7b-Chat-Fp16 | meta/llama-2-7b-chat-fp16 | Meta | — | — | — | — | $0.56 | $6.67 | 4K |
| Llama-2-7b-Chat-Int8 | meta/llama-2-7b-chat-int8 | Meta | — | — | — | — | $0.040 | $0.080 | 8K |
| Llama-3-8b-Instruct | meta/llama-3-8b-instruct | Meta | — | — | — | — | $0.28 | $0.83 | 8K |
| Llama-3-8b-Instruct-Awq | meta/llama-3-8b-instruct-awq | Meta | — | — | — | — | $0.12 | $0.27 | 8K |
| Llama-3.1-70b-Instruct | meta/llama-3.1-70b-instruct | Meta | — | — | — | — | $0.29 | $0.60 | 131K |
| Llama-3.1-8b-Instruct | meta/llama-3.1-8b-instruct | Meta | — | — | — | — | $0.050 | $0.10 | 131K |
| Llama-3.1-8b-Instruct-Awq | meta/llama-3.1-8b-instruct-awq | Meta | — | — | — | — | $0.12 | $0.27 | 8K |
| Llama-3.1-8b-Instruct-Fast | meta/llama-3.1-8b-instruct-fast | Meta | — | — | — | — | $0.050 | $0.10 | 131K |
| Llama-3.1-8b-Instruct-Fp8 | meta/llama-3.1-8b-instruct-fp8 | Meta | — | — | — | — | $0.15 | $0.29 | 32K |
| Llama-3.2-11b-Vision-Instruct | meta/llama-3.2-11b-vision-instruct | Meta | — | — | — | — | $0.049 | $0.68 | 128K |
| Llama-3.2-1b-Instruct | meta/llama-3.2-1b-instruct | Meta | — | — | — | — | $0.027 | $0.20 | 60K |
| Llama-3.2-3b-Instruct | meta/llama-3.2-3b-instruct | Meta | — | — | — | — | $0.051 | $0.34 | 80K |
| Llama-3.3-70b-Instruct-Fp8-Fast | meta/llama-3.3-70b-instruct-fp8-fast | Meta | — | — | — | — | $0.29 | $2.25 | 24K |
| Phi-2 | microsoft/phi-2 | Microsoft | — | — | — | — | $0.020 | $0.040 | 2K |
| Mistral-7b-Instruct-V0.1 | mistral/mistral-7b-instruct-v0.1 | Mistral | — | — | — | — | $0.11 | $0.19 | 3K |
| Mistral-7b-Instruct-V0.2-Lora | mistral/mistral-7b-instruct-v0.2-lora | Mistral | — | — | — | — | $0.050 | $0.10 | 15K |
| Mistral-Small-3.1-24b-Instruct | mistralai/mistral-small-3.1-24b-instruct | Mistral | — | — | — | — | $0.35 | $0.55 | 128K |
| Kimi-K2.5 | moonshot/kimi-k2.5 | Moonshot | — | — | — | — | $0.60 | $3.00 | 128K |
| Nemotron-3-120b-A12b | nvidia/nemotron-3-120b-a12b | NVIDIA | — | — | — | — | $0.50 | $1.50 | 256K |
| Gpt-Oss-20b | openai/gpt-oss-20b | OpenAI | — | — | — | — | $0.20 | $0.30 | 128K |
| Openchat-3.5-0106 | openchat/openchat-3.5-0106 | OpenChat | — | — | — | — | $0.050 | $0.10 | 4K |
| Qwen1.5-0.5b-Chat | qwen/qwen1.5-0.5b-chat | Alibaba Qwen | — | — | — | — | $0.010 | $0.020 | 4K |
| Qwen1.5-1.8b-Chat | qwen/qwen1.5-1.8b-chat | Alibaba Qwen | — | — | — | — | $0.020 | $0.040 | 4K |
| Qwen1.5-14b-Chat-Awq | qwen/qwen1.5-14b-chat-awq | Alibaba Qwen | — | — | — | — | $0.12 | $0.24 | 4K |
| Qwen1.5-7b-Chat-Awq | qwen/qwen1.5-7b-chat-awq | Alibaba Qwen | — | — | — | — | $0.060 | $0.12 | 4K |
| Qwen3-30b-A3b-Fp8 | qwen/qwen3-30b-a3b-fp8 | Alibaba Qwen | — | — | — | — | $0.051 | $0.34 | 33K |
| Discolm-German-7b-V1-Awq | thebloke/discolm-german-7b-v1-awq | TheBloke | — | — | — | — | $0.050 | $0.10 | 4K |
| Falcon-7b-Instruct | tiiuae/falcon-7b-instruct | TII | — | — | — | — | $0.050 | $0.10 | 4K |
| Tinyllama-1.1b-Chat-V1.0 | tinyllama/tinyllama-1.1b-chat-v1.0 | TinyLlama | — | — | — | — | $0.008 | $0.016 | 2K |
| Glm-4.7-Flash | zai-org/glm-4.7-flash | Zhipu AI | — | — | — | — | $0.060 | $0.40 | 131K |
| Gemma-7b-IT | hf/google/gemma-7b-it | Hugging Face | — | — | — | — | $0.080 | $0.16 | 8K |
| Meta-Llama-3-8b-Instruct | hf/meta-llama/meta-llama-3-8b-instruct | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Mistral-7b-Instruct-V0.2 | hf/mistral/mistral-7b-instruct-v0.2 | Hugging Face | — | — | — | — | $0.050 | $0.10 | 3K |
| Starling-LM-7b-Beta | hf/nexusflow/starling-lm-7b-beta | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Hermes-2-Pro-Mistral-7b | hf/nousresearch/hermes-2-pro-mistral-7b | Hugging Face | — | — | — | — | $0.050 | $0.10 | 24K |
| Deepseek-Coder-6.7b-Base-Awq | hf/thebloke/deepseek-coder-6.7b-base-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Deepseek-Coder-6.7b-Instruct-Awq | hf/thebloke/deepseek-coder-6.7b-instruct-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Llama-2-13b-Chat-Awq | hf/thebloke/llama-2-13b-chat-awq | Hugging Face | — | — | — | — | $0.070 | $0.14 | 4K |
| Llamaguard-7b-Awq | hf/thebloke/llamaguard-7b-awq | Hugging Face | — | — | — | — | $0.040 | $0.080 | 4K |
| Mistral-7b-Instruct-V0.1-Awq | hf/thebloke/mistral-7b-instruct-v0.1-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Neural-Chat-7b-V3-1-Awq | hf/thebloke/neural-chat-7b-v3-1-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Openhermes-2.5-Mistral-7b-Awq | hf/thebloke/openhermes-2.5-mistral-7b-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Zephyr-7b-Beta-Awq | hf/thebloke/zephyr-7b-beta-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Sonar Deep Research | perplexity/sonar-deep-research | Perplexity | — | — | — | — | $2.00 | $8.00 | 127K |
| Sonar Reasoning Pro | perplexity/sonar-reasoning-pro | Perplexity | — | — | — | — | $2.00 | $8.00 | 127K |
| Mistral Small 4 | mistral/mistral-small-4-0-26-03 | Mistral | — | — | — | — | $0.20 | $0.60 | 131K |
| Grok 4 Fast | xai/grok-4-fast | xAI | — | — | — | — | $0.50 | $2.00 | 256K |
| Qwen 3 Max | alibaba/qwen3-max | Alibaba | — | — | — | — | $1.20 | $6.00 | 262K |
| Qwen 3.5 397B A17B | alibaba/qwen3.5-397b-a17b | Alibaba | — | — | — | — | $0.60 | $3.60 | 262K |
| Claude Opus 4.6 | anthropic/claude-opus-4.6 | Anthropic | — | — | — | — | $5.00 | $25.00 | 1M |
| Claude Sonnet 4 | anthropic/claude-sonnet-4 | Anthropic | — | — | — | — | $3.00 | $15.00 | 200K |
| Claude Sonnet 4.5 | anthropic/claude-sonnet-4.5 | Anthropic | — | — | — | — | $3.00 | $15.00 | 200K |
| Gemini 3.1 Flash Lite | google/gemini-3.1-flash-lite | Google | — | — | — | — | $0.25 | $1.50 | 1M |
| GPT-4.1 | openai/gpt-4.1 | OpenAI | — | — | — | — | $2.00 | $8.00 | 1M |
| GPT-4.1 Mini | openai/gpt-4.1-mini | OpenAI | — | — | — | — | $0.40 | $1.60 | 1M |
| GPT-5 | openai/gpt-5 | OpenAI | — | — | — | — | $1.25 | $10.00 | 128K |
| GPT-5.4 Nano | openai/gpt-5.4-nano | OpenAI | — | — | — | — | $0.20 | $1.25 | 128K |
| o4-mini | openai/o4-mini | OpenAI | — | — | — | — | $1.10 | $4.40 | 200K |
| IndicTrans2 EN→Indic 1B | ai4bharat/indictrans2-en-indic-1B | AI4Bharat | — | — | — | — | $0.021 | $0.042 | — |
| BART Large CNN | facebook/bart-large-cnn | Meta | — | — | — | — | $0.050 | $0.10 | — |
| DistilBERT SST-2 | huggingface/distilbert-sst-2-int8 | Hugging Face | — | — | — | — | — | — | — |
| M2M100 1.2B | meta/m2m100-1.2b | Meta | — | — | — | — | $0.021 | $0.042 | — |

By category

Shortcuts by what you're actually optimizing for.

Five opinionated slices: coding agents, reasoning-heavy work, cheapest model with tool calling, longest context, and fastest text models. Each slice cites the underlying benchmark or pricing field directly.

Best at coding · SWE-bench

| Model | ID | Provider | SWE-bench |
| --- | --- | --- | --- |
| Claude Opus 4.7 | anthropic/claude-opus-4.7 | Anthropic | 72.5 |
| Kimi K2.6 | moonshot/kimi-k2.6 | Moonshot | 68.2 |
| Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | Anthropic | 62.1 |

Best reasoning · GPQA / MMLU

| Model | ID | Provider | Score |
| --- | --- | --- | --- |
| GPT-5.4 | openai/gpt-5.4 | OpenAI | 91.8 MMLU |
| Claude Opus 4.7 | anthropic/claude-opus-4.7 | Anthropic | 90.4 MMLU |
| Gemini 3.1 Pro | google/gemini-3.1-pro | Google | 89.6 MMLU |
| Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | Anthropic | 87.2 MMLU |
| GPT-5.4 Mini | openai/gpt-5.4-mini | OpenAI | 84.5 MMLU |
| Gemini 3 Flash | google/gemini-3-flash | Google | 82.3 MMLU |
| Claude Haiku 4.5 | anthropic/claude-haiku-4.5 | Anthropic | 80.1 MMLU |

Cheapest with tool calling

| Model | ID | Provider | In + Out /M |
| --- | --- | --- | --- |
| Granite-4.0-H-Micro | ibm-granite/granite-4.0-h-micro | IBM | $0.13 |
| Hermes-2-Pro-Mistral-7b | hf/nousresearch/hermes-2-pro-mistral-7b | Hugging Face | $0.15 |
| Qwen3-30b-A3b-Fp8 | qwen/qwen3-30b-a3b-fp8 | Alibaba Qwen | $0.39 |
| Gemma-4-26b-A4b-IT | google/gemma-4-26b-a4b-it | Google | $0.40 |
| Glm-4.7-Flash | zai-org/glm-4.7-flash | Zhipu AI | $0.46 |
| Gpt-Oss-20b | openai/gpt-oss-20b | OpenAI | $0.50 |
| Mistral Small 4 | mistral/mistral-small-4-0-26-03 | Mistral | $0.80 |
| Mistral-Small-3.1-24b-Instruct | mistralai/mistral-small-3.1-24b-instruct | Mistral | $0.90 |
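The "In + Out /M" column is simply the input price plus the output price per 1M tokens, rounded to the cent. A quick sketch of that calculation, using prices from the main table; the function name is illustrative:

```python
# Sketch: reproduce the "In + Out /M" column by summing per-1M prices.
# Prices are copied from the main leaderboard table.

def blended_per_m(input_per_m: float, output_per_m: float) -> float:
    """Combined cost of 1M input + 1M output tokens, rounded to cents."""
    return round(input_per_m + output_per_m, 2)

print(blended_per_m(0.017, 0.11))   # Granite-4.0-H-Micro -> 0.13
print(blended_per_m(0.051, 0.34))   # Qwen3-30b-A3b-Fp8   -> 0.39
```

Note this weights input and output tokens equally; for a workload that is mostly prompt tokens, ranking by input price alone may give a different order.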

Longest context window

| Model | ID | Provider | Context |
| --- | --- | --- | --- |
| GPT-4.1 | openai/gpt-4.1 | OpenAI | 1M |
| GPT-4.1 Mini | openai/gpt-4.1-mini | OpenAI | 1M |
| Claude Opus 4.6 | anthropic/claude-opus-4.6 | Anthropic | 1M |
| Claude Opus 4.7 | anthropic/claude-opus-4.7 | Anthropic | 1M |
| Gemini 3 Flash | google/gemini-3-flash | Google | 1M |
| Gemini 3.1 Flash Lite | google/gemini-3.1-flash-lite | Google | 1M |
| Gemini 3.1 Pro | google/gemini-3.1-pro | Google | 1M |
| Kimi K2.6 | moonshot/kimi-k2.6 | Moonshot | 262K |

Fastest · edge + mini tier

| Model | ID | Provider | Tier |
| --- | --- | --- | --- |
| Llama-3.1-8b-Instruct-Fast | meta/llama-3.1-8b-instruct-fast | Meta | edge |
| Llama-3.3-70b-Instruct-Fp8-Fast | meta/llama-3.3-70b-instruct-fp8-fast | Meta | edge |
| Llama-4-Scout-17b-16e-Instruct | meta/llama-4-scout-17b-16e-instruct | Meta | edge |
| Tinyllama-1.1b-Chat-V1.0 | tinyllama/tinyllama-1.1b-chat-v1.0 | TinyLlama | edge |
| Grok 4 Fast | xai/grok-4-fast | xAI | edge |
| Claude Haiku 4.5 | anthropic/claude-haiku-4.5 | Anthropic | edge |
| Gemini 3 Flash | google/gemini-3-flash | Google | edge |
| Gemini 3.1 Flash Lite | google/gemini-3.1-flash-lite | Google | edge |

How this is built

Public benchmarks. Live pricing. Free JSON.

Benchmarks: sourced from each lab's published eval card (Anthropic system cards, OpenAI release notes, Google's technical reports, Moonshot's Kimi paper, Meta's Llama reports). Missing cells mean the lab hasn't disclosed that benchmark — not that the model failed it.

Pricing: the exact pass-through rate you pay on AIgateway. A 5% platform fee is applied once at top-up, not per request, so the per-1M-token rates shown here are exactly what the provider charges — nothing added.
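As a worked example of that fee model, assuming the 5% is added on top of the amount credited (the exact top-up mechanics are not spelled out here, so treat this as a sketch): topping up $100 of credit costs $105, while each request draws down the balance at raw provider rates.

```python
# Sketch of the fee model described above. Assumption: the 5% fee is
# charged on top of the credited amount at top-up time.
FEE = 0.05

def topup_charge(credit: float) -> float:
    """What you pay to receive `credit` dollars of usable balance."""
    return round(credit * (1 + FEE), 2)

def request_cost(in_tokens, out_tokens, in_per_m, out_per_m):
    """Per-request cost at raw provider rates, with no per-request markup."""
    return in_tokens / 1e6 * in_per_m + out_tokens / 1e6 * out_per_m

print(topup_charge(100))  # 105.0
# 10K input / 2K output tokens on Claude Opus 4.7 ($5.00 / $25.00 per 1M):
print(request_cost(10_000, 2_000, 5.00, 25.00))  # 0.1
```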

Updates: the catalog updates on every release, so this page moves when a new frontier model lands. The JSON endpoint is at GET /api/leaderboard with a one-hour browser cache — embed it on your own comparison post.

Want to run any of these? Every row links to a model page with quickstart code. Or try the cost calculator to project spend across candidate models for your workload.
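The kind of projection a cost calculator performs can be sketched in a few lines. The per-1M prices below come from the leaderboard table; the workload numbers (requests per month, average token counts) are invented for illustration:

```python
# Sketch: project monthly spend for one workload across candidate models.
# Prices ($ per 1M tokens) are from the leaderboard table; the workload
# figures are hypothetical.

CANDIDATES = {
    "openai/gpt-5.4-mini":   (0.75, 4.50),
    "moonshot/kimi-k2.6":    (0.95, 4.00),
    "google/gemini-3-flash": (0.50, 3.00),
}

def monthly_spend(requests, in_tok, out_tok, in_per_m, out_per_m):
    """Total $ for `requests` calls averaging in_tok/out_tok tokens each."""
    per_request = in_tok / 1e6 * in_per_m + out_tok / 1e6 * out_per_m
    return requests * per_request

# 200K requests/month, ~1.5K input and 400 output tokens per call:
for model, (inp, outp) in CANDIDATES.items():
    print(f"{model}: ${monthly_spend(200_000, 1_500, 400, inp, outp):,.2f}")
# openai/gpt-5.4-mini: $585.00
# moonshot/kimi-k2.6: $605.00
# google/gemini-3-flash: $390.00
```

Even among models in the same tier, output-heavy workloads shift the ranking, since output rates run several times the input rate.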