xAI's Grok speech-to-text model. Transcribes audio files into text across 25 languages with word-level timestamps, multichannel transcription, speaker diarization, and key-term biasing.
Grok STT (xai/grok-stt) is an audio-stt model from xAI, released 2026-06-03. Pricing via AIgateway: $0.0017 per minute. Call it via https://api.aigateway.sh/v1/audio/transcriptions with model="xai/grok-stt". Best for: Meeting transcripts, Captions, Voice agents.
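At the listed rate of $0.0017 per minute, the cost of a batch can be estimated up front. The sketch below assumes billing is prorated linearly by audio duration. The listing does not say whether the gateway rounds up to whole minutes or seconds, so treat the result as an approximation. The function name is illustrative.

```python
# Rough cost estimate for xai/grok-stt at the listed AIgateway rate.
# Assumption: billing is prorated linearly by audio duration (no rounding).

PRICE_PER_MINUTE_USD = 0.0017  # rate quoted in the listing


def estimate_cost_usd(duration_seconds: float) -> float:
    """Return the estimated transcription cost in USD for the given audio length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60.0 * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    # A one-hour meeting recording: 60 minutes at the listed rate.
    print(f"${estimate_cost_usd(3600):.4f}")
```

Under that assumption, an hour of audio comes to roughly ten cents.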
curl https://api.aigateway.sh/v1/audio/transcriptions \
-H "Authorization: Bearer $AIGATEWAY_API_KEY" \
-F model="xai/grok-stt" \
-F file="@audio.mp3"

# multipart/form-data: use curl -F or SDK file upload
model="xai/grok-stt"
file=@audio.mp3
response_format=json  # or "verbose_json", "text", "srt", "vtt"
language=en  # optional
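The request fields above can be assembled by a small helper that rejects an unsupported response_format before any upload happens. This is a sketch only: the helper name is hypothetical, and the set of formats is simply the one listed in the request fields, not a confirmed exhaustive list.

```python
# Build the text fields of the multipart/form-data request described above.
# The file part itself is attached separately by curl -F or an SDK.

ALLOWED_FORMATS = ("json", "verbose_json", "text", "srt", "vtt")


def build_transcription_fields(response_format="json", language=None,
                               model="xai/grok-stt"):
    """Return the form fields as a dict, omitting optional fields left unset."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(
            f"unsupported response_format {response_format!r}; "
            f"expected one of {ALLOWED_FORMATS}"
        )
    fields = {"model": model, "response_format": response_format}
    if language is not None:
        fields["language"] = language  # optional, e.g. "en"
    return fields


if __name__ == "__main__":
    print(build_transcription_fields(response_format="srt", language="en"))
```

Validating on the client side keeps a typo in response_format from costing an upload round trip.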
{
"text": "Hello from AIgateway.",
"language": "en",
"duration": 1.82
}

from openai import OpenAI

# Point the OpenAI SDK at the AIgateway endpoint.
client = OpenAI(base_url="https://api.aigateway.sh/v1", api_key="sk-aig-...")

# Upload the audio file and print the transcript text.
with open("audio.mp3", "rb") as f:
    r = client.audio.transcriptions.create(model="xai/grok-stt", file=f)
print(r.text)
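
For the captions use case, the same call can request subtitle output directly. The sketch below assumes the gateway honors the response_format and language fields when they are sent through the OpenAI SDK, and that the SDK returns plain text for the "srt" format, as it does against the OpenAI API. The function name and file paths are illustrative.

```python
from pathlib import Path


def transcribe_to_srt(client, audio_path, srt_path, language="en"):
    """Transcribe audio_path and write the subtitles to srt_path.

    `client` is expected to be an openai.OpenAI instance pointed at
    https://api.aigateway.sh/v1 (assumption: the gateway accepts
    response_format="srt" on this endpoint, as the field list suggests).
    """
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            model="xai/grok-stt",
            file=f,
            response_format="srt",
            language=language,
        )
    # For text-like formats the SDK returns a string; fall back to .text
    # in case an object is returned instead.
    srt_text = result if isinstance(result, str) else result.text
    Path(srt_path).write_text(srt_text, encoding="utf-8")
    return srt_text
```

With the client created in the example above, a call would look like transcribe_to_srt(client, "audio.mp3", "audio.srt").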