Inference

Embeddings

Generate dense vector embeddings through an OpenAI-compatible /v1/embeddings endpoint. Swap the `model` field to target OpenAI text-embedding-3-large, Cohere embed-v4, Voyage voyage-3-large, BGE M3, Jina, Mixedbread, or Snowflake Arctic. Matryoshka-trained models let you truncate dimensions after the fact without re-embedding.

Embed text

```http
POST /v1/embeddings
{
  "model": "openai/text-embedding-3-large",
  "input": ["cat", "dog", "airplane"],
  "dimensions": 512
}
// → { "data": [{ "embedding": [0.012, -0.4, ...], "index": 0 }, ...], "usage": {...} }
```
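The same call from Python, stdlib only. This is a sketch: the base URL, the `EMBEDDINGS_API_KEY` environment variable, and the helper names are illustrative, not part of the API.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder for your deployment


def build_embedding_request(model, texts, dimensions=None):
    """Assemble the JSON body for POST /v1/embeddings."""
    body = {"model": model, "input": texts}
    if dimensions is not None:
        body["dimensions"] = dimensions  # only for matryoshka-capable models
    return body


def embed(texts, model="openai/text-embedding-3-large", dimensions=512):
    req = urllib.request.Request(
        f"{BASE_URL}/v1/embeddings",
        data=json.dumps(build_embedding_request(model, texts, dimensions)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['EMBEDDINGS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Items carry an `index` matching input order; sort defensively.
    data = sorted(payload["data"], key=lambda d: d["index"])
    return [d["embedding"] for d in data]
```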

Picking a model

| Model | Dims | Best for |
| --- | --- | --- |
| openai/text-embedding-3-small | 1536 | Cheap, fast, solid English baseline. |
| openai/text-embedding-3-large | 3072 (matryoshka → 256–3072) | Higher recall when accuracy matters. |
| cohere/embed-v4 | 1536 | Best multilingual + RAG-tuned. |
| voyage/voyage-3-large | 2048 | Top-of-leaderboard code + law + finance. |
| baai/bge-m3 | 1024 | Open-weight, multilingual, free self-host path. |

Batching

`input` accepts a string or an array (up to 2,048 items / 300k tokens per call). We auto-batch across provider limits: one request in, one response out, regardless of each provider's batch size. For tens of millions of embeddings, use the Batch API at 50% off.
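For corpora larger than the 2,048-item per-call cap, a small client-side chunker keeps each request within limits. A sketch; `chunked` is a hypothetical helper, not part of the API:

```python
def chunked(items, size=2048):
    """Yield successive slices of at most `size` items, preserving order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Each slice becomes one POST /v1/embeddings call:
# vectors = [v for batch in chunked(corpus) for v in embed_batch(batch)]
```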

Matryoshka truncation

Models trained with matryoshka representation learning (OpenAI 3-large, Nomic v1.5, Jina v3) expose a `dimensions` parameter. Requesting fewer dimensions keeps the highest-information prefix of the full vector: smaller storage, comparable recall. Not all models support this; passing `dimensions` to an unsupported model returns a `422 unsupported_parameter` error.
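You can also truncate after the fact: slice the prefix of a full-dimension matryoshka vector, then re-normalize so dot-product similarity still behaves. A minimal sketch, assuming the input vector came from a matryoshka-trained model:

```python
import math


def truncate_matryoshka(vec, dims):
    """Keep the highest-information prefix and re-normalize to unit length."""
    prefix = vec[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]
```

Note the re-normalization step: the prefix of a unit vector is no longer unit-length on its own.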

Are vectors normalized?

Yes: all embedding responses are L2-normalized, so cosine similarity is equivalent to the dot product. If a provider returns unnormalized vectors, we normalize them before returning the response. Pass `normalize: false` if you need the raw output.
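Because vectors come back unit-length, similarity search can use the cheaper dot product. A quick check of the equivalence, stdlib only:

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# For unit vectors, dot(a, b) == cosine(a, b) up to floating-point error.
```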