Meta: Llama 3.2 11B Vision Instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Quality Score: 81/100 (composite of price, context, capability)
Input Price: $0.24 per 1M tokens
Output Price: $0.24 per 1M tokens
Context Window: 131,072 tokens
Model ID: meta-llama/llama-3.2-11b-vision-instruct
Vendor: meta-llama
Tokenizer: Llama3
Input Modalities: text, image
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: not supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: ✓ accepts images
Audio: no
Moderated: no
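
The listed prices and limits can be turned into a quick per-request estimate. The following is a minimal sketch, not part of the listing: the function names `estimate_cost` and `fits_limits` are hypothetical, the numeric constants are copied from the values above, and two things are assumptions: that image inputs have already been converted to a token count by the caller, and that input and output tokens together must fit inside the context window.

```python
# Values copied from the listing above.
INPUT_PRICE_PER_M = 0.24   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.24  # USD per 1M output tokens
CONTEXT_WINDOW = 131_072   # tokens
MAX_OUTPUT = 16_384        # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request.

    Assumption: input_tokens already includes whatever token count
    the provider assigns to attached images.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


def fits_limits(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the listed limits.

    Assumption: input and output share the context window; the
    listing does not state how the two limits interact.
    """
    if output_tokens > MAX_OUTPUT:
        return False
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # 10,000 input tokens and 2,000 output tokens.
    print(f"${estimate_cost(10_000, 2_000):.5f}")  # prints $0.00288
    print(fits_limits(10_000, 2_000))              # prints True
```

Because input and output are priced identically here, the cost depends only on the total token count; the two rates are kept separate so the sketch still works if they diverge.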
