Meta: Llama 4 Scout

Meta: Llama 4 Scout is a multimodal model from Meta that accepts text and image inputs, produces text, and supports tool calling. Its headline specification is a 10,000,000-token context window, among the largest available, which makes it a candidate for very long document ingestion. Completions are capped at 16,384 tokens. Structured output is listed as supported; reasoning mode is not. At $0.10 per million input tokens and $0.30 per million output tokens, its low price is its clearest practical advantage. Benchmark coverage is thin, however: with only 3 benchmarks behind it, the blended score of 10.9 leaves its standing against the broader field largely unproven. Buyers who need an inexpensive model for very long multimodal context should shortlist it; those who prioritize verified capability across diverse tasks should wait for broader benchmark data before committing.
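To make the pricing concrete, here is a minimal sketch of a per-request cost estimate using the listed prices and output cap. The function name `estimate_cost` and the constant names are hypothetical, and the sketch assumes billing is strictly linear in token count, which the listing does not confirm.

```python
# Listed prices in USD per 1M tokens, and the listed completion cap.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.30
MAX_OUTPUT_TOKENS = 16_384


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (assumes linear per-token billing)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens exceeds the listed 16,384-token cap")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# A full 10M-token prompt with a maximum-length completion.
cost = estimate_cost(10_000_000, 16_384)
print(f"${cost:.4f}")
```

Under these assumptions, even a prompt that fills the entire context window costs about one dollar, with the completion adding less than a cent.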

Quality Score: 99/100 (price + capability + benchmarks)
Input Price: $0.10 per 1M tokens
Output Price: $0.30 per 1M tokens
Context Window: 10,000,000 tokens
Model ID: meta-llama/llama-4-scout
Vendor: meta-llama
Tokenizer: Llama4
Input Modalities: text, image
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: ✓ accepts images
Audio: no
Moderated: no
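
The context window and output cap can be combined into a simple token budget. The sketch below assumes that completion tokens count against the same 10,000,000-token window as the prompt, which is common but not stated in this listing; the function name `max_prompt_tokens` is hypothetical.

```python
# Limits as listed for meta-llama/llama-4-scout.
CONTEXT_WINDOW = 10_000_000
MAX_OUTPUT_TOKENS = 16_384


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Largest prompt that still leaves room for `reserved_output` completion tokens.

    Assumption: prompt and completion share one context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 0 and the output cap")
    return CONTEXT_WINDOW - reserved_output


print(max_prompt_tokens())  # room left for the prompt with a full-length completion
```

Token counts would need to come from the model's own tokenizer (listed as Llama4); counts from other tokenizers are only approximate.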