Pricing per million tokens, context window, capabilities — pulled from each provider's public docs. All 1 are available via the same AIgateway OpenAI-compatible endpoint; flip the model string to switch.
This is a Llama2 base model that Cloudflare dedicated for inference with LoRA adapters. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.