Meta: Llama Guard 4 12B

Llama Guard 4 12B is a multimodal model from Meta that accepts image and text inputs and produces text, with a context window of 163,840 tokens and a maximum output of 16,384 tokens. It does not support tool calling or reasoning modes. Structured output is marked as supported in the specifications on this page, so confirm that capability with the provider before depending on it. As the Llama Guard name indicates, the model is built for content safety classification of prompts and responses rather than general-purpose generation or agent use; how well it handles tasks outside that scope cannot be confirmed from available data alone. At $0.18 per million tokens for both input and output, the pricing is competitive for a multimodal model, but no independent benchmark coverage currently exists to validate performance claims. Buyers who need a low-cost option for image-and-text processing pipelines may want to shortlist it, and teams with strict performance requirements should treat it as unvalidated until third-party evaluations are available.
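To make the pricing concrete, here is a minimal cost-estimation sketch. It assumes billing is strictly linear in token count at the listed $0.18 per million tokens for both input and output; the function name and the example workload are hypothetical, and image inputs may be metered differently by a given provider.

```python
# Listed price in USD per 1M tokens; the listing gives the same rate
# for input and output.
PRICE_PER_MILLION_USD = 0.18


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost, assuming purely linear per-token billing."""
    total_tokens = input_tokens + output_tokens
    return total_tokens * PRICE_PER_MILLION_USD / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 50,000 checks, each with 800 input tokens
    # and 20 output tokens.
    per_request = estimate_cost_usd(800, 20)
    print(f"per request: ${per_request:.8f}")
    print(f"50,000 requests: ${per_request * 50_000:.2f}")
```

Because the input and output rates are identical, only the total token count matters for this estimate.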

Quality Score: 85/100 (price + capability + benchmarks)
Input Price: $0.18 per 1M tokens
Output Price: $0.18 per 1M tokens
Context Window: 163,840 tokens
Model ID: meta-llama/llama-guard-4-12b
Vendor: meta-llama
Tokenizer: Other
Input Modalities: image, text
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: not supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: ✓ accepts images
Audio: no
Moderated: no
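The token limits listed above can be turned into a simple pre-flight check. This is a sketch under one stated assumption: that the requested output counts against the same 163,840-token context window as the prompt, which is common but not confirmed by this listing. The function name is hypothetical, and token counts must come from whatever tokenizer the serving provider uses.

```python
CONTEXT_WINDOW_TOKENS = 163_840  # from the listing
MAX_OUTPUT_TOKENS = 16_384       # from the listing


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within the listed limits.

    Assumes prompt and output share one context window (an assumption,
    not something the listing states).
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    print(fits_limits(120_000, 16_384))
    print(fits_limits(160_000, 16_384))
```

Under that assumption, a request that reserves the full 16,384 output tokens leaves at most 147,456 tokens for the prompt.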