baai/bge-m3 at $0.012/M tokens is AIgateway's default embedding model: 1024 dimensions, 100+ languages, and a 60K-token input window. For smaller, faster options, google/embeddinggemma-300m and qwen/qwen3-embedding-0.6b are both sub-cent open-weight alternatives. Pair any of them with baai/bge-reranker-base for two-stage retrieval. All are served via the /v1/embeddings endpoint.
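In two-stage retrieval, the first stage ranks candidates by cosine similarity of their embeddings and only the top hits go to the reranker. A minimal sketch of that first-stage scoring, using short dummy vectors in place of real 1024-dimensional bge-m3 outputs from /v1/embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Stage 1: score candidates by cosine similarity of their embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Dummy 4-d vectors standing in for real /v1/embeddings responses.
query = [0.1, 0.9, 0.0, 0.2]
docs = {
    "doc-a": [0.1, 0.8, 0.1, 0.3],
    "doc-b": [0.9, 0.0, 0.4, 0.0],
}

# Keep the top-k by similarity; stage 2 would pass these to the reranker.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # ['doc-a', 'doc-b']
```

With real embeddings the vectors come back from the /v1/embeddings endpoint; the scoring logic is unchanged.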
AIgateway exposes every model in this modality through the same OpenAI-compatible endpoint. Switch between options with a single model-string change; the request and response schemas don't change. This matters when you're A/B testing or want to hedge on availability.
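Because the schema is fixed, the only field that varies between two candidate models is the model string itself, which is what makes A/B testing cheap. A no-network illustration of that claim (embed_request is an illustrative helper, not part of either SDK):

```python
def embed_request(model: str, texts: list[str]) -> dict:
    # The OpenAI-compatible /v1/embeddings payload is identical for every model.
    return {"model": model, "input": texts}

a = embed_request("google/embeddinggemma-300m", ["hello world"])
b = embed_request("qwen/qwen3-embedding-0.6b", ["hello world"])

# The only key that differs between the two requests is "model".
diff = {k for k in a if a[k] != b[k]}
print(diff)  # {'model'}
```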
For production, default to baai/bge-m3, the model that currently wins on blended cost and quality. For edge cases (extreme throughput, unusual languages, special licensing needs), pick one of the alternatives and run a 20-prompt eval to confirm it meets your quality bar.
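A 20-prompt eval can be as simple as keyword checks over model outputs. A sketch of such a harness, with call_model as a hypothetical stub standing in for a real /v1/chat/completions call:

```python
# Hypothetical stub: a real harness would call /v1/chat/completions here.
def call_model(model: str, prompt: str) -> str:
    canned = {"capital of France": "Paris is the capital of France."}
    return next((v for k, v in canned.items() if k in prompt), "I don't know.")

# Tiny eval set: (prompt, keyword the answer must contain).
evals = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Atlantis?", "don't know"),  # expect a refusal
]

def pass_rate(model: str) -> float:
    hits = sum(keyword in call_model(model, p) for p, keyword in evals)
    return hits / len(evals)

rate = pass_rate("moonshot/kimi-k2.6")
print(f"pass rate: {rate:.0%}")  # pass rate: 100%
```

Scale the eval set to ~20 prompts drawn from your real traffic and gate the model swap on the pass rate clearing your quality bar.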
Because every option speaks the same API, upgrading from the budget model to the flagship (or down to a cheaper alternative) is one line of code. AIgateway's eval-driven routing can do it for you automatically — define an SLO, upload a dataset, and the router maintains the winning choice as providers ship new versions.
# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verify.
# openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
from openai import OpenAI
client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

r = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[{"role": "user", "content": "Explain vector databases in two sentences."}],
)
print(r.choices[0].message.content)

// npm i aigateway-js openai
// aigateway-js: sub-accounts, evals, replays, jobs, webhook verify.
// openai SDK: chat/embeddings/images/audio — drop-in compat.
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.aigateway.sh/v1",
  apiKey: process.env.AIGATEWAY_KEY,
});

const r = await client.chat.completions.create({
  model: "moonshot/kimi-k2.6",
  messages: [{ role: "user", content: "Explain vector databases in two sentences." }],
});
console.log(r.choices[0].message.content);

Switching models really is a one-line change: AIgateway normalises the request and response shapes across providers, so your code doesn't need to care about vendor-specific quirks.
You don't need a separate key per provider: one sk-aig-... key routes to every provider, and billing is unified on a single line-itemised invoice.
SLAs are available on Enterprise (99.95% uptime, a latency SLO per model); the Free and pay-as-you-go tiers share best-effort routing with automatic provider failover.
AIgateway runs on Cloudflare Workers at 300+ PoPs globally. First-byte latency from your region to the gateway is typically <50ms; total end-to-end latency is dominated by the provider's own inference time.