Every model you can call through AIgateway, ranked by the benchmarks engineers actually care about: SWE-bench for agentic coding, MMLU + GPQA for reasoning, HumanEval for code generation — plus input and output price per 1M tokens and context window size. Numbers are sourced from each provider's reported eval results; blank cells mean the lab has not published that benchmark. Sort any column. The same data is available as JSON at /api/leaderboard.
169 models across 48 providers. Every row is a click away from a full model page with pricing, quickstart code, and a live playground.
| Model | Provider | SWE-bench | MMLU | GPQA | HumanEval | Input $/M | Output $/M | Context |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.7 anthropic/claude-opus-4.7 | Anthropic | 72.5 | 90.4 | — | 95.1 | $5.00 | $25.00 | 1M |
| GPT-5.4 openai/gpt-5.4 | OpenAI | — | 91.8 | — | 94.0 | $2.50 | $15.00 | 128K |
| Gemini 3.1 Pro google/gemini-3.1-pro | Google | — | 89.6 | — | — | $2.00 | $12.00 | 1M |
| Kimi K2.6 moonshot/kimi-k2.6 | Moonshot | 68.2 | — | — | 92.7 | $0.95 | $4.00 | 262K |
| Claude Sonnet 4.6 anthropic/claude-sonnet-4.6 | Anthropic | 62.1 | 87.2 | — | 93.4 | $3.00 | $15.00 | 200K |
| M2.7 minimax/m2.7 | MiniMax | — | — | — | — | $0.30 | $1.20 | 128K |
| GPT-5.4 Mini openai/gpt-5.4-mini | OpenAI | — | 84.5 | — | 88.6 | $0.75 | $4.50 | 128K |
| Grok 4 xai/grok-4 | xAI | — | — | — | — | $5.00 | $15.00 | 256K |
| Gemini 3 Flash google/gemini-3-flash | Google | — | 82.3 | — | — | $0.50 | $3.00 | 1M |
| Claude Haiku 4.5 anthropic/claude-haiku-4.5 | Anthropic | — | 80.1 | — | 85.2 | $1.00 | $5.00 | 200K |
| Llama-4-Scout-17b-16e-Instruct meta/llama-4-scout-17b-16e-instruct | Meta | — | — | — | — | $0.27 | $0.85 | 131K |
| Gpt-Oss-120b openai/gpt-oss-120b | OpenAI | — | — | — | — | $0.35 | $0.75 | 128K |
| Qwen2.5-Coder-32b-Instruct qwen/qwen2.5-coder-32b-instruct | Alibaba Qwen | — | — | — | — | $0.66 | $1.00 | 33K |
| Gemma-Sea-Lion-V4-27b-IT aisingapore/gemma-sea-lion-v4-27b-it | AI Singapore | — | — | — | — | $0.35 | $0.56 | 128K |
| Deepseek-Math-7b-Instruct deepseek/deepseek-math-7b-instruct | DeepSeek | — | — | — | — | — | — | 4K |
| Sqlcoder-7b-2 defog/sqlcoder-7b-2 | Defog | — | — | — | — | $0.050 | $0.10 | 10K |
| Una-Cybertron-7b-V2-Bf16 fblgit/una-cybertron-7b-v2-bf16 | FBL | — | — | — | — | $0.050 | $0.10 | 4K |
| Gemma-2b-IT-Lora google/gemma-2b-it-lora | Google | — | — | — | — | $0.030 | $0.060 | 8K |
| Gemma-3-12b-IT google/gemma-3-12b-it | Google | — | — | — | — | $0.35 | $0.56 | 80K |
| Gemma-4-26b-A4b-IT google/gemma-4-26b-a4b-it | Google | — | — | — | — | $0.10 | $0.30 | 256K |
| Gemma-7b-IT-Lora google/gemma-7b-it-lora | Google | — | — | — | — | $0.080 | $0.16 | 4K |
| Granite-4.0-H-Micro ibm-granite/granite-4.0-h-micro | IBM | — | — | — | — | $0.017 | $0.11 | 131K |
| Llama-2-7b-Chat-HF-Lora meta-llama/llama-2-7b-chat-hf-lora | Meta | — | — | — | — | $0.040 | $0.080 | 8K |
| Llama-2-7b-Chat-Fp16 meta/llama-2-7b-chat-fp16 | Meta | — | — | — | — | $0.56 | $6.67 | 4K |
| Llama-2-7b-Chat-Int8 meta/llama-2-7b-chat-int8 | Meta | — | — | — | — | $0.040 | $0.080 | 8K |
| Llama-3-8b-Instruct meta/llama-3-8b-instruct | Meta | — | — | — | — | $0.28 | $0.83 | 8K |
| Llama-3-8b-Instruct-Awq meta/llama-3-8b-instruct-awq | Meta | — | — | — | — | $0.12 | $0.27 | 8K |
| Llama-3.1-70b-Instruct meta/llama-3.1-70b-instruct | Meta | — | — | — | — | $0.29 | $0.60 | 131K |
| Llama-3.1-8b-Instruct meta/llama-3.1-8b-instruct | Meta | — | — | — | — | $0.050 | $0.10 | 131K |
| Llama-3.1-8b-Instruct-Awq meta/llama-3.1-8b-instruct-awq | Meta | — | — | — | — | $0.12 | $0.27 | 8K |
| Llama-3.1-8b-Instruct-Fast meta/llama-3.1-8b-instruct-fast | Meta | — | — | — | — | $0.050 | $0.10 | 131K |
| Llama-3.1-8b-Instruct-Fp8 meta/llama-3.1-8b-instruct-fp8 | Meta | — | — | — | — | $0.15 | $0.29 | 32K |
| Llama-3.2-11b-Vision-Instruct meta/llama-3.2-11b-vision-instruct | Meta | — | — | — | — | $0.049 | $0.68 | 128K |
| Llama-3.2-1b-Instruct meta/llama-3.2-1b-instruct | Meta | — | — | — | — | $0.027 | $0.20 | 60K |
| Llama-3.2-3b-Instruct meta/llama-3.2-3b-instruct | Meta | — | — | — | — | $0.051 | $0.34 | 80K |
| Llama-3.3-70b-Instruct-Fp8-Fast meta/llama-3.3-70b-instruct-fp8-fast | Meta | — | — | — | — | $0.29 | $2.25 | 24K |
| Phi-2 microsoft/phi-2 | Microsoft | — | — | — | — | $0.020 | $0.040 | 2K |
| Mistral-7b-Instruct-V0.1 mistral/mistral-7b-instruct-v0.1 | Mistral | — | — | — | — | $0.11 | $0.19 | 3K |
| Mistral-7b-Instruct-V0.2-Lora mistral/mistral-7b-instruct-v0.2-lora | Mistral | — | — | — | — | $0.050 | $0.10 | 15K |
| Mistral-Small-3.1-24b-Instruct mistralai/mistral-small-3.1-24b-instruct | Mistral | — | — | — | — | $0.35 | $0.55 | 128K |
| Kimi-K2.5 moonshot/kimi-k2.5 | Moonshot | — | — | — | — | $0.60 | $3.00 | 128K |
| Nemotron-3-120b-A12b nvidia/nemotron-3-120b-a12b | NVIDIA | — | — | — | — | $0.50 | $1.50 | 256K |
| Gpt-Oss-20b openai/gpt-oss-20b | OpenAI | — | — | — | — | $0.20 | $0.30 | 128K |
| Openchat-3.5-0106 openchat/openchat-3.5-0106 | OpenChat | — | — | — | — | $0.050 | $0.10 | 4K |
| Qwen1.5-0.5b-Chat qwen/qwen1.5-0.5b-chat | Alibaba Qwen | — | — | — | — | $0.010 | $0.020 | 4K |
| Qwen1.5-1.8b-Chat qwen/qwen1.5-1.8b-chat | Alibaba Qwen | — | — | — | — | $0.020 | $0.040 | 4K |
| Qwen1.5-14b-Chat-Awq qwen/qwen1.5-14b-chat-awq | Alibaba Qwen | — | — | — | — | $0.12 | $0.24 | 4K |
| Qwen1.5-7b-Chat-Awq qwen/qwen1.5-7b-chat-awq | Alibaba Qwen | — | — | — | — | $0.060 | $0.12 | 4K |
| Qwen3-30b-A3b-Fp8 qwen/qwen3-30b-a3b-fp8 | Alibaba Qwen | — | — | — | — | $0.051 | $0.34 | 33K |
| Discolm-German-7b-V1-Awq thebloke/discolm-german-7b-v1-awq | TheBloke | — | — | — | — | $0.050 | $0.10 | 4K |
| Falcon-7b-Instruct tiiuae/falcon-7b-instruct | TII | — | — | — | — | $0.050 | $0.10 | 4K |
| Tinyllama-1.1b-Chat-V1.0 tinyllama/tinyllama-1.1b-chat-v1.0 | TinyLlama | — | — | — | — | $0.008 | $0.016 | 2K |
| Glm-4.7-Flash zai-org/glm-4.7-flash | Zhipu AI | — | — | — | — | $0.060 | $0.40 | 131K |
| Gemma-7b-IT hf/google/gemma-7b-it | Hugging Face | — | — | — | — | $0.080 | $0.16 | 8K |
| Meta-Llama-3-8b-Instruct hf/meta-llama/meta-llama-3-8b-instruct | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Mistral-7b-Instruct-V0.2 hf/mistral/mistral-7b-instruct-v0.2 | Hugging Face | — | — | — | — | $0.050 | $0.10 | 3K |
| Starling-LM-7b-Beta hf/nexusflow/starling-lm-7b-beta | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Hermes-2-Pro-Mistral-7b hf/nousresearch/hermes-2-pro-mistral-7b | Hugging Face | — | — | — | — | $0.050 | $0.10 | 24K |
| Deepseek-Coder-6.7b-Base-Awq hf/thebloke/deepseek-coder-6.7b-base-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Deepseek-Coder-6.7b-Instruct-Awq hf/thebloke/deepseek-coder-6.7b-instruct-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Llama-2-13b-Chat-Awq hf/thebloke/llama-2-13b-chat-awq | Hugging Face | — | — | — | — | $0.070 | $0.14 | 4K |
| Llamaguard-7b-Awq hf/thebloke/llamaguard-7b-awq | Hugging Face | — | — | — | — | $0.040 | $0.080 | 4K |
| Mistral-7b-Instruct-V0.1-Awq hf/thebloke/mistral-7b-instruct-v0.1-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Neural-Chat-7b-V3-1-Awq hf/thebloke/neural-chat-7b-v3-1-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Openhermes-2.5-Mistral-7b-Awq hf/thebloke/openhermes-2.5-mistral-7b-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Zephyr-7b-Beta-Awq hf/thebloke/zephyr-7b-beta-awq | Hugging Face | — | — | — | — | $0.050 | $0.10 | 4K |
| Sonar Deep Research perplexity/sonar-deep-research | Perplexity | — | — | — | — | $2.00 | $8.00 | 127K |
| Sonar Reasoning Pro perplexity/sonar-reasoning-pro | Perplexity | — | — | — | — | $2.00 | $8.00 | 127K |
| Mistral Small 4 mistral/mistral-small-4-0-26-03 | Mistral | — | — | — | — | $0.20 | $0.60 | 131K |
| Grok 4 Fast xai/grok-4-fast | xAI | — | — | — | — | $0.50 | $2.00 | 256K |
| Qwen 3 Max alibaba/qwen3-max | Alibaba | — | — | — | — | $1.20 | $6.00 | 262K |
| Qwen 3.5 397B A17B alibaba/qwen3.5-397b-a17b | Alibaba | — | — | — | — | $0.60 | $3.60 | 262K |
| Claude Opus 4.6 anthropic/claude-opus-4.6 | Anthropic | — | — | — | — | $5.00 | $25.00 | 1M |
| Claude Sonnet 4 anthropic/claude-sonnet-4 | Anthropic | — | — | — | — | $3.00 | $15.00 | 200K |
| Claude Sonnet 4.5 anthropic/claude-sonnet-4.5 | Anthropic | — | — | — | — | $3.00 | $15.00 | 200K |
| Gemini 3.1 Flash Lite google/gemini-3.1-flash-lite | Google | — | — | — | — | $0.25 | $1.50 | 1M |
| GPT-4.1 openai/gpt-4.1 | OpenAI | — | — | — | — | $2.00 | $8.00 | 1M |
| GPT-4.1 Mini openai/gpt-4.1-mini | OpenAI | — | — | — | — | $0.40 | $1.60 | 1M |
| GPT-5 openai/gpt-5 | OpenAI | — | — | — | — | $1.25 | $10.00 | 128K |
| GPT-5.4 Nano openai/gpt-5.4-nano | OpenAI | — | — | — | — | $0.20 | $1.25 | 128K |
| o4-mini openai/o4-mini | OpenAI | — | — | — | — | $1.10 | $4.40 | 200K |
| IndicTrans2 EN→Indic 1B ai4bharat/indictrans2-en-indic-1B | AI4Bharat | — | — | — | — | $0.021 | $0.042 | — |
| BART Large CNN facebook/bart-large-cnn | Meta | — | — | — | — | $0.050 | $0.10 | — |
| DistilBERT SST-2 huggingface/distilbert-sst-2-int8 | Hugging Face | — | — | — | — | — | — | — |
| M2M100 1.2B meta/m2m100-1.2b | Meta | — | — | — | — | $0.021 | $0.042 | — |
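If you'd rather slice the data yourself, the /api/leaderboard endpoint mentioned above returns these rows as JSON. A minimal TypeScript sketch of consuming it, assuming a row shape that mirrors the table's columns — the field names and `baseUrl` parameter here are assumptions, so verify them against the actual response:

```ts
// Assumed row shape, mirroring the table's columns. Verify against the
// real /api/leaderboard payload before relying on these field names.
interface LeaderboardRow {
  model: string;             // e.g. "anthropic/claude-opus-4.7"
  provider: string;
  sweBench: number | null;   // null where the lab hasn't published the eval
  mmlu: number | null;
  gpqa: number | null;
  humanEval: number | null;
  inputPerM: number | null;  // USD per 1M input tokens
  outputPerM: number | null; // USD per 1M output tokens
  contextTokens: number | null;
}

// Fetch the catalog and sort by SWE-bench, pushing unpublished scores
// to the bottom, the same as clicking the column header in the UI.
async function rankBySweBench(baseUrl: string): Promise<LeaderboardRow[]> {
  const res = await fetch(`${baseUrl}/api/leaderboard`);
  if (!res.ok) throw new Error(`leaderboard fetch failed: ${res.status}`);
  const rows: LeaderboardRow[] = await res.json();
  return [...rows].sort((a, b) => (b.sweBench ?? -1) - (a.sweBench ?? -1));
}
```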
Five opinionated slices follow: coding agents, reasoning-heavy work, cheapest models with tool calling, longest context, and fastest text models. Each slice cites the underlying benchmark or pricing field directly; a sketch after the tables shows how to rebuild one of these slices from the JSON.
Coding agents, ranked by SWE-bench:
| Model | Provider | SWE-bench |
|---|---|---|
| Claude Opus 4.7 anthropic/claude-opus-4.7 | Anthropic | 72.5 |
| Kimi K2.6 moonshot/kimi-k2.6 | Moonshot | 68.2 |
| Claude Sonnet 4.6 anthropic/claude-sonnet-4.6 | Anthropic | 62.1 |
Reasoning-heavy work, ranked by MMLU:
| Model | Provider | Score |
|---|---|---|
| GPT-5.4 openai/gpt-5.4 | OpenAI | 91.8 MMLU |
| Claude Opus 4.7 anthropic/claude-opus-4.7 | Anthropic | 90.4 MMLU |
| Gemini 3.1 Pro google/gemini-3.1-pro | Google | 89.6 MMLU |
| Claude Sonnet 4.6 anthropic/claude-sonnet-4.6 | Anthropic | 87.2 MMLU |
| GPT-5.4 Mini openai/gpt-5.4-mini | OpenAI | 84.5 MMLU |
| Gemini 3 Flash google/gemini-3-flash | Google | 82.3 MMLU |
| Claude Haiku 4.5 anthropic/claude-haiku-4.5 | Anthropic | 80.1 MMLU |
Cheapest models with tool calling, by combined input + output price:
| Model | Provider | Input + Output $/M |
|---|---|---|
| Granite-4.0-H-Micro ibm-granite/granite-4.0-h-micro | IBM | $0.13 |
| Hermes-2-Pro-Mistral-7b hf/nousresearch/hermes-2-pro-mistral-7b | Hugging Face | $0.15 |
| Qwen3-30b-A3b-Fp8 qwen/qwen3-30b-a3b-fp8 | Alibaba Qwen | $0.39 |
| Gemma-4-26b-A4b-IT google/gemma-4-26b-a4b-it | Google | $0.40 |
| Glm-4.7-Flash zai-org/glm-4.7-flash | Zhipu AI | $0.46 |
| Gpt-Oss-20b openai/gpt-oss-20b | OpenAI | $0.50 |
| Mistral Small 4 mistral/mistral-small-4-0-26-03 | Mistral | $0.80 |
| Mistral-Small-3.1-24b-Instruct mistralai/mistral-small-3.1-24b-instruct | Mistral | $0.90 |
Longest context windows:
| Model | Provider | Context |
|---|---|---|
| GPT-4.1 openai/gpt-4.1 | OpenAI | 1M |
| GPT-4.1 Mini openai/gpt-4.1-mini | OpenAI | 1M |
| Claude Opus 4.6 anthropic/claude-opus-4.6 | Anthropic | 1M |
| Claude Opus 4.7 anthropic/claude-opus-4.7 | Anthropic | 1M |
| Gemini 3 Flash google/gemini-3-flash | Google | 1M |
| Gemini 3.1 Flash Lite google/gemini-3.1-flash-lite | Google | 1M |
| Gemini 3.1 Pro google/gemini-3.1-pro | Google | 1M |
| Kimi K2.6 moonshot/kimi-k2.6 | Moonshot | 262K |
Fastest text models, by serving tier:
| Model | Provider | Tier |
|---|---|---|
| Llama-3.1-8b-Instruct-Fast meta/llama-3.1-8b-instruct-fast | Meta | edge |
| Llama-3.3-70b-Instruct-Fp8-Fast meta/llama-3.3-70b-instruct-fp8-fast | Meta | edge |
| Llama-4-Scout-17b-16e-Instruct meta/llama-4-scout-17b-16e-instruct | Meta | edge |
| Tinyllama-1.1b-Chat-V1.0 tinyllama/tinyllama-1.1b-chat-v1.0 | TinyLlama | edge |
| Grok 4 Fast xai/grok-4-fast | xAI | edge |
| Claude Haiku 4.5 anthropic/claude-haiku-4.5 | Anthropic | edge |
| Gemini 3 Flash google/gemini-3-flash | Google | edge |
| Gemini 3.1 Flash Lite google/gemini-3.1-flash-lite | Google | edge |
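As promised above, the cheapest-with-tool-calling slice is just a filter and a sort over the same JSON. A sketch reusing the `LeaderboardRow` interface from the earlier snippet — `supportsTools` is an assumed field name, not a documented part of the payload:

```ts
// `supportsTools` is an assumed field name; check the real response shape.
interface ToolAwareRow extends LeaderboardRow {
  supportsTools?: boolean;
}

// Rebuild the "cheapest with tool calling" slice: keep priced models that
// expose tool calling, then rank by combined input + output $/M tokens.
function cheapestWithTools(rows: ToolAwareRow[], n = 8): ToolAwareRow[] {
  return rows
    .filter((r) => r.supportsTools && r.inputPerM !== null && r.outputPerM !== null)
    .sort((a, b) => (a.inputPerM! + a.outputPerM!) - (b.inputPerM! + b.outputPerM!))
    .slice(0, n);
}
```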
Benchmarks: sourced from each lab's published eval card (Anthropic system cards, OpenAI release notes, Google's technical reports, Moonshot's Kimi paper, Meta's Llama reports). Missing cells mean the lab hasn't disclosed that benchmark — not that the model failed it.
Pricing: the exact pass-through rate you pay on AIgateway. The 5% platform fee is applied once at top-up, not per request, so the per-million-token rates listed here are exactly what the provider charges, with nothing added.
Updates: the catalog updates on every release, so this page moves whenever a new frontier model lands. The JSON endpoint is GET /api/leaderboard, served with a one-hour browser cache, so you can embed it in your own comparison post.
Want to run any of these? Every row links to a model page with quickstart code. Or try the cost calculator to project spend across candidate models for your workload.
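For a quick estimate before opening the calculator, projecting spend from the table is plain arithmetic: tokens divided by one million, times the listed rate. A sketch using Claude Sonnet 4.6's listed rates; the daily token volumes are made up for illustration:

```ts
// Project monthly spend from the table's $/M rates. The rates come straight
// from the leaderboard; the token volumes below are illustrative only.
function monthlyCost(
  inputPerM: number,      // $ per 1M input tokens
  outputPerM: number,     // $ per 1M output tokens
  inTokensPerDay: number,
  outTokensPerDay: number,
  days = 30,
): number {
  const perDay =
    (inTokensPerDay / 1_000_000) * inputPerM +
    (outTokensPerDay / 1_000_000) * outputPerM;
  return days * perDay;
}

// Claude Sonnet 4.6 ($3.00 in / $15.00 out) at 2M input + 500K output per day:
// 30 * (2 * 3.00 + 0.5 * 15.00) = $405/month
console.log(monthlyCost(3.0, 15.0, 2_000_000, 500_000)); // 405
```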