Question 1

What is Llama-3.1-70b-Instruct?

Accepted Answer

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models. The Llama 3.1 instruction tuned text only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks. It is a text model from Meta, accessible via AIgateway's OpenAI-compatible API at slug meta/llama-3.1-70b-instruct.

Question 2

How much does Llama-3.1-70b-Instruct cost via AIgateway?

Accepted Answer

Input costs $0.290 per 1M tokens; output costs $0.600 per 1M tokens. Pass-through plus a 5% platform fee applied at top-up, not per call.

Question 3

What is the context window of Llama-3.1-70b-Instruct?

Accepted Answer

131,072 tokens. Maximum output is 8,192 tokens.

Question 4

How do I call Llama-3.1-70b-Instruct from my code?

Accepted Answer

Point the OpenAI SDK at https://api.aigateway.sh/v1 with your AIgateway key and set model to "meta/llama-3.1-70b-instruct". The request and response shapes match OpenAI exactly.

Question 5

Does Llama-3.1-70b-Instruct support streaming, tool calling, vision, and JSON mode?

Accepted Answer

Streaming — yes. Tool calling — no. Vision — no. JSON mode — no. Prompt caching — no.

Question 6

What are the best use cases for Llama-3.1-70b-Instruct?

Accepted Answer

Chatbots, Content generation, Agentic workflows. Key strengths: General-purpose chat; Long context; Tool use.

Question 7

Can I bring my own Meta API key (BYOK)?

Accepted Answer

Yes. Attach a Meta key in your AIgateway dashboard and this model flips to pass-through — you pay Meta directly and AIgateway waives the 5% platform fee on those calls.

Llama-3.1-70b-Instruct

Quickstart

Capabilities

Strengths

Use cases

Pricing

Collections

Call Llama-3.1-70b-Instruct from any OpenAI SDK

Request body

Response

Streaming (SSE) — set `"stream": true`

Quickstart

Errors

Llama-3.1-70b-Instruct

Quickstart

Capabilities

Strengths

Use cases

Pricing

Collections

Call Llama-3.1-70b-Instruct from any OpenAI SDK

Request body

Response

Streaming (SSE) — set "stream": true

Quickstart

Errors

Streaming (SSE) — set `"stream": true`