Meta: Llama 4 Scout
Meta: Llama 4 Scout is a multimodal model from Meta that accepts both text and image inputs and supports tool use. Its headline feature is a 10 million token context window, one of the largest available, making it suited for workflows that require holding large documents, codebases, or conversation histories in a single session. It does not support native reasoning mode, and structured output support is unconfirmed. Maximum output is capped at 16,384 tokens per response. At $0.10 per million input tokens and $0.30 per million output tokens, Llama 4 Scout sits at the budget end of the market, which is its clearest advantage. However, its blended benchmark score of 6.1 across only three benchmarks leaves its general capability largely unproven relative to more thoroughly evaluated alternatives. Teams with cost-sensitive, high-volume workloads involving long context or image inputs may find it worth trialing, but those prioritizing demonstrated performance should treat the benchmark picture as thin and weigh that gap carefully.
- Model ID
- meta-llama/llama-4-scout
- Vendor
- meta-llama
- Tokenizer
- Llama4
- Input Modalities
- text, image
- Output Modalities
- text
- Max Output
- 16,384 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- not supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no
Category rankings
Where Meta: Llama 4 Scout places across the 3 categories it ranks in. How we rank →
| # | Category | Score |
|---|---|---|
| #23 | Code CompletionCode · of 25 ranked | 132 |
| #23 | Cheap Bulk InferenceCost · of 25 ranked | 137 |
| #25 | Self-Hosted / LocalCost · of 25 ranked | 117 |