Question 1

What is Llama-Guard-3-8b?

Accepted Answer

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated. It is a moderation model from Meta, accessible via AIgateway's OpenAI-compatible API at slug meta/llama-guard-3-8b.

Question 2

How much does Llama-Guard-3-8b cost via AIgateway?

Accepted Answer

Input costs $0.480 per 1M tokens; output costs $0.030 per 1M tokens. Pass-through plus a 5% platform fee applied at top-up, not per call.

Question 3

What is the context window of Llama-Guard-3-8b?

Accepted Answer

131,072 tokens. Maximum output is 4,096 tokens.

Question 4

How do I call Llama-Guard-3-8b from my code?

Accepted Answer

Point the OpenAI SDK at https://api.aigateway.sh/v1 with your AIgateway key and set model to "meta/llama-guard-3-8b". The request and response shapes match OpenAI exactly.

Question 5

Does Llama-Guard-3-8b support streaming, tool calling, vision, and JSON mode?

Accepted Answer

Streaming — yes. Tool calling — no. Vision — no. JSON mode — no. Prompt caching — no.

Question 6

What are the best use cases for Llama-Guard-3-8b?

Accepted Answer

Input/output moderation, Abuse detection. Key strengths: Policy-aware; Fast; Open-weight.

Question 7

Can I bring my own Meta API key (BYOK)?

Accepted Answer

Yes. Attach a Meta key in your AIgateway dashboard and this model flips to pass-through — you pay Meta directly and AIgateway waives the 5% platform fee on those calls.

Llama-Guard-3-8b

Quickstart

Capabilities

Strengths

Use cases

Pricing

Collections

Call Llama-Guard-3-8b from any OpenAI SDK

Request body

Response

Streaming (SSE) — set `"stream": true`

Quickstart

Errors

Llama-Guard-3-8b

Quickstart

Capabilities

Strengths

Use cases

Pricing

Collections

Call Llama-Guard-3-8b from any OpenAI SDK

Request body

Response

Streaming (SSE) — set "stream": true

Quickstart

Errors

Streaming (SSE) — set `"stream": true`