meta-llama
Meta: Llama 3.2 11B Vision Instruct
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...
Quality Score
81/100
composite of price, context, capability
Input Price
$0.24
per 1M tokens
Output Price
$0.24
per 1M tokens
Context Window
131,072
tokens
- Model ID
- meta-llama/llama-3.2-11b-vision-instruct
- Vendor
- meta-llama
- Tokenizer
- Llama3
- Input Modalities
- text, image
- Output Modalities
- text
- Max Output
- 16,384 tokens
- Tool Calling
- not supported
- Structured Output
- ✓ supported
- Reasoning Mode
- not supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no
Similar models
meta-llama
Meta: Llama Guard 4 12B
$0.18 in / $0.18 out
163,840 ctx
85
meta-llama
Meta: Llama 3.1 70B Instruct
$0.40 in / $0.40 out
131,072 ctx
86
meta-llama
Meta: Llama 3.3 70B Instruct
$0.10 in / $0.32 out
131,072 ctx
86
meta-llama
Meta: Llama 4 Maverick
$0.15 in / $0.60 out
1,048,576 ctx
89
meta-llama
Meta: Llama 3.1 8B Instruct
$0.02 in / $0.05 out
16,384 ctx
72
meta-llama
Llama Guard 3 8B
$0.48 in / $0.03 out
131,072 ctx
71