Meta: Llama 4 Scout
Meta: Llama 4 Scout is a multimodal model from Meta that accepts both text and image inputs and supports tool use. Its headline specification is a 10 million token context window, which is among the largest available and makes it a candidate for tasks requiring very long document ingestion. Completions are capped at 16,384 tokens. Reasoning mode and structured output are not confirmed as supported features based on available information. At $0.10 per million input tokens and $0.30 per million output tokens, the pricing is low, which is its clearest practical advantage. However, benchmark coverage is thin at only 3 benchmarks, and the blended score of 10.9 leaves its performance standing largely unproven against the broader field. Buyers who need an inexpensive model capable of processing very long multimodal context should shortlist it, but those prioritizing verified capability across diverse tasks should wait for broader benchmark data before committing.
- Model ID
- meta-llama/llama-4-scout
- Vendor
- meta-llama
- Tokenizer
- Llama4
- Input Modalities
- text, image
- Output Modalities
- text
- Max Output
- 16,384 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- not supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no