Inference

Embeddings

Generate dense vector embeddings through an OpenAI-compatible /v1/embeddings endpoint. Swap the `model` field to target OpenAI text-embedding-3-large, Cohere embed-v4, Voyage voyage-3-large, BGE M3, Jina, Mixedbread, or Snowflake Arctic. Matryoshka-trained models let you truncate dimensions after the fact without re-embedding.

Embed text

```http
POST /v1/embeddings
{
  "model": "openai/text-embedding-3-large",
  "input": ["cat", "dog", "airplane"],
  "dimensions": 512
}
// → { "data": [{ "embedding": [0.012, -0.4, ...], "index": 0 }, ...], "usage": {...} }
```
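The same call from Python, stdlib only. This is a sketch: the base URL, the `EMBEDDINGS_API_KEY` environment variable, and the helper names are illustrative, not part of the API.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder for your deployment


def build_embedding_request(model, texts, dimensions=None):
    """Assemble the JSON body for POST /v1/embeddings."""
    body = {"model": model, "input": texts}
    if dimensions is not None:
        body["dimensions"] = dimensions  # only for matryoshka-capable models
    return body


def embed(texts, model="openai/text-embedding-3-large", dimensions=512):
    req = urllib.request.Request(
        f"{BASE_URL}/v1/embeddings",
        data=json.dumps(build_embedding_request(model, texts, dimensions)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['EMBEDDINGS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Items carry an `index` matching input order; sort defensively.
    data = sorted(payload["data"], key=lambda d: d["index"])
    return [d["embedding"] for d in data]
```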

Picking a model

| Model | Dims | Best for |
| --- | --- | --- |
| openai/text-embedding-3-small | 1536 | Cheap, fast, solid English baseline. |
| openai/text-embedding-3-large | 3072 (matryoshka → 256–3072) | Higher recall when accuracy matters. |
| cohere/embed-v4 | 1536 | Best multilingual + RAG-tuned. |
| voyage/voyage-3-large | 2048 | Top-of-leaderboard code + law + finance. |
| baai/bge-m3 | 1024 | Open-weight, multilingual, free self-host path. |

Batching

`input` accepts a string or an array (up to 2,048 items / 300k tokens per call). We auto-batch across provider limits: one request in, one response out, regardless of each provider's batch size. For tens of millions of embeddings, use the Batch API at 50% off.
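For corpora larger than the 2,048-item per-call cap, a small client-side chunker keeps each request within limits. A sketch; `chunked` is a hypothetical helper, not part of the API:

```python
def chunked(items, size=2048):
    """Yield successive slices of at most `size` items, preserving order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Each slice becomes one POST /v1/embeddings call:
# vectors = [v for batch in chunked(corpus) for v in embed_batch(batch)]
```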

Matryoshka truncation

Models trained with matryoshka representation learning (OpenAI 3-large, Nomic v1.5, Jina v3) expose a `dimensions` parameter. Requesting fewer dimensions keeps the highest-information prefix of the full vector: smaller storage, comparable recall. Not all models support this; passing `dimensions` to an unsupported model returns a `422 unsupported_parameter` error.
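You can also truncate after the fact: slice the prefix of a full-dimension matryoshka vector, then re-normalize so dot-product similarity still behaves. A minimal sketch, assuming the input vector came from a matryoshka-trained model:

```python
import math


def truncate_matryoshka(vec, dims):
    """Keep the highest-information prefix and re-normalize to unit length."""
    prefix = vec[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]
```

Note the re-normalization step: the prefix of a unit vector is no longer unit-length on its own.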

Are vectors normalized?

Yes: all embedding responses are L2-normalized, so cosine similarity is equivalent to the dot product. If a provider returns unnormalized vectors, we normalize them before returning the response. Pass `normalize: false` if you need the raw output.
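Because vectors come back unit-length, similarity search can use the cheaper dot product. A quick check of the equivalence, stdlib only:

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# For unit vectors, dot(a, b) == cosine(a, b) up to floating-point error.
```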