baai/bge-m3 at $0.012/M tokens is AIgateway's default embedding model: 1024 dimensions, 100+ languages, and a 60K-token input window. For smaller, faster options, google/embeddinggemma-300m and qwen/qwen3-embedding-0.6b are both sub-cent open-weight alternatives. Pair any of them with baai/bge-reranker-base for two-stage retrieval. All are served via the /v1/embeddings endpoint.
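In two-stage retrieval, the first stage ranks candidates by cosine similarity of their embeddings and only the top hits go to the reranker. A minimal sketch of that first-stage scoring, using short dummy vectors in place of real 1024-dimensional bge-m3 outputs from /v1/embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Stage 1: score candidates by cosine similarity of their embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Dummy 4-d vectors standing in for real /v1/embeddings responses.
query = [0.1, 0.9, 0.0, 0.2]
docs = {
    "doc-a": [0.1, 0.8, 0.1, 0.3],
    "doc-b": [0.9, 0.0, 0.4, 0.0],
}

# Keep the top-k by similarity; stage 2 would pass these to the reranker.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # ['doc-a', 'doc-b']
```

With real embeddings the vectors come back from the /v1/embeddings endpoint; the scoring logic is unchanged.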
AIgateway exposes every model in this modality through the same OpenAI-compatible endpoint. Switch between options with a single model-string change; the request and response schemas don't change. This matters when you're A/B testing or want to hedge on availability.
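Because the schema is fixed, the only field that varies between two candidate models is the model string itself, which is what makes A/B testing cheap. A no-network illustration of that claim (embed_request is an illustrative helper, not part of either SDK):

```python
def embed_request(model: str, texts: list[str]) -> dict:
    # The OpenAI-compatible /v1/embeddings payload is identical for every model.
    return {"model": model, "input": texts}

a = embed_request("google/embeddinggemma-300m", ["hello world"])
b = embed_request("qwen/qwen3-embedding-0.6b", ["hello world"])

# The only key that differs between the two requests is "model".
diff = {k for k in a if a[k] != b[k]}
print(diff)  # {'model'}
```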
For production, default to baai/bge-m3, the model that currently wins on blended cost and quality. For edge cases (extreme throughput, unusual languages, special licensing needs), pick one of the alternatives and run a 20-prompt eval to confirm it meets your quality bar.
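A 20-prompt eval can be as simple as keyword checks over model outputs. A sketch of such a harness, with call_model as a hypothetical stub standing in for a real /v1/chat/completions call:

```python
# Hypothetical stub: a real harness would call /v1/chat/completions here.
def call_model(model: str, prompt: str) -> str:
    canned = {"capital of France": "Paris is the capital of France."}
    return next((v for k, v in canned.items() if k in prompt), "I don't know.")

# Tiny eval set: (prompt, keyword the answer must contain).
evals = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Atlantis?", "don't know"),  # expect a refusal
]

def pass_rate(model: str) -> float:
    hits = sum(keyword in call_model(model, p) for p, keyword in evals)
    return hits / len(evals)

rate = pass_rate("moonshot/kimi-k2.6")
print(f"pass rate: {rate:.0%}")  # pass rate: 100%
```

Scale the eval set to ~20 prompts drawn from your real traffic and gate the model swap on the pass rate clearing your quality bar.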
Because every option speaks the same API, upgrading from the budget model to the flagship (or down to a cheaper alternative) is one line of code. AIgateway's eval-driven routing can do it for you automatically — define an SLO, upload a dataset, and the router maintains the winning choice as providers ship new versions.
# pip install aigateway-py openai
# aigateway-py: sub-accounts, evals, replays, jobs, webhook verify.
# openai SDK: chat/embeddings/images/audio — drop-in compat per our SDK's own guidance.
from openai import OpenAI
client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

r = client.chat.completions.create(
    model="moonshot/kimi-k2.6",
    messages=[{"role": "user", "content": "Explain vector databases in two sentences."}],
)
print(r.choices[0].message.content)

// npm i aigateway-js openai
// aigateway-js: sub-accounts, evals, replays, jobs, webhook verify.
// openai SDK: chat/embeddings/images/audio — drop-in compat.
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.aigateway.sh/v1",
  apiKey: process.env.AIGATEWAY_KEY,
});

const r = await client.chat.completions.create({
  model: "moonshot/kimi-k2.6",
  messages: [{ role: "user", content: "Explain vector databases in two sentences." }],
});
console.log(r.choices[0].message.content);

Switching models really is a one-line change: AIgateway normalises the request and response shapes across providers, so your code doesn't need to care about vendor-specific quirks.
You don't need a separate key per provider: one sk-aig-... key routes to every provider, and billing is unified on a single line-itemised invoice.
SLAs are available on Enterprise (99.95% uptime, a latency SLO per model); the Free and pay-as-you-go tiers share best-effort routing with automatic provider failover.
AIgateway runs on Cloudflare Workers at 300+ PoPs globally. First-byte latency from your region to the gateway is typically <50ms; total end-to-end latency is dominated by the provider's own inference time.