Pricing per million tokens, context window, and capabilities are pulled from each provider's public docs. Both models are available through the same AIgateway OpenAI-compatible endpoint; change the model string to switch.
The Llama 3.2 instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.
Llama 3.3 70B Instruct, quantized to FP8 precision and optimized for faster inference.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

# Llama-3.2-3b-Instruct
client.chat.completions.create(
    model="meta/llama-3.2-3b-instruct",
    messages=[{"role": "user", "content": "hello"}],
)

# Llama-3.3-70b-Instruct-Fp8-Fast
client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct-fp8-fast",
    messages=[{"role": "user", "content": "hello"}],
)
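Since the two calls above differ only in the model string, a side-by-side comparison can be reduced to a loop over model identifiers. The following is a minimal sketch, not an official AIgateway helper: the `MODELS` labels, the `make_client`, `ask`, and `compare` function names, and the `AIGATEWAY_API_KEY` environment variable are all hypothetical choices made for illustration. It assumes the gateway returns responses in the standard OpenAI chat-completions shape.

```python
import os

# Model identifiers copied from the examples above; the labels are illustrative.
MODELS = {
    "small": "meta/llama-3.2-3b-instruct",
    "large": "meta/llama-3.3-70b-instruct-fp8-fast",
}


def make_client():
    """Build an OpenAI-compatible client pointed at the gateway endpoint."""
    # Imported lazily so the helpers below can be exercised without the SDK.
    from openai import OpenAI

    return OpenAI(
        base_url="https://api.aigateway.sh/v1",
        # AIGATEWAY_API_KEY is a hypothetical variable name chosen for this sketch.
        api_key=os.environ["AIGATEWAY_API_KEY"],
    )


def ask(client, model, prompt):
    """Send a single user message to `model` and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def compare(client, prompt, models=MODELS):
    """Run the same prompt against every model and collect replies by label."""
    return {label: ask(client, model, prompt) for label, model in models.items()}
```

With an API key exported in the environment, `compare(make_client(), "hello")` would return a dictionary with one reply per label, which makes it easy to inspect how the two models answer the same prompt.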