Pricing per million tokens, context window, and capabilities, pulled from each provider's public docs. Both models are available through the same AIgateway OpenAI-compatible endpoint; change the model string to switch.
Llama 3.1 8B, quantized to FP8 precision.
Llama 3.3 70B, quantized to FP8 precision and optimized for faster inference.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aigateway.sh/v1",
    api_key="sk-aig-...",
)

# Llama-3.1-8b-Instruct-Fp8
client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct-fp8",
    messages=[{"role": "user", "content": "hello"}],
)

# Llama-3.3-70b-Instruct-Fp8-Fast
client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct-fp8-fast",
    messages=[{"role": "user", "content": "hello"}],
)
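
To compare the two models side by side, the same prompt can be sent to each in a loop, changing only the model string. The sketch below reuses the endpoint and model strings from the snippet above. The helper names `ask_each` and `make_client` and the `AIGATEWAY_API_KEY` environment variable are assumptions made for illustration, not part of the gateway's documentation.

```python
import os

# Model strings copied from the snippet above.
MODELS = (
    "meta/llama-3.1-8b-instruct-fp8",
    "meta/llama-3.3-70b-instruct-fp8-fast",
)


def make_client():
    """Build an OpenAI client pointed at the gateway endpoint (hypothetical helper)."""
    from openai import OpenAI  # imported lazily so ask_each can be used without it

    return OpenAI(
        base_url="https://api.aigateway.sh/v1",
        # AIGATEWAY_API_KEY is an assumed variable name; use whatever holds your key.
        api_key=os.environ["AIGATEWAY_API_KEY"],
    )


def ask_each(client, prompt, models=MODELS):
    """Send the same prompt to every model and return {model: reply text}."""
    replies = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,  # the only value that changes between calls
            messages=[{"role": "user", "content": prompt}],
        )
        replies[model] = response.choices[0].message.content
    return replies
```

Calling `ask_each(make_client(), "hello")` would return one reply per model, keyed by model string, which makes it easy to compare output quality or to wrap each call in a timer.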