meta-llama

Meta: Llama 3.2 11B Vision Instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Query via API → View on meta-llama → Estimate cost

Quality Score

81/100

price + capability + benchmarks

Input Price

$0.24

per 1M tokens

Output Price

$0.24

per 1M tokens

Context Window

131,072

tokens

Model ID: meta-llama/llama-3.2-11b-vision-instruct
Vendor: meta-llama
Tokenizer: Llama3
Input Modalities: text, image
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: not supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: ✓ accepts images
Audio: no
Moderated: no

Similar models

meta-llama

Meta: Llama 3.2 11B Vision Instruct

Similar models

Meta: Llama Guard 4 12B

Meta: Llama 3.1 70B Instruct

Meta: Llama 3.3 70B Instruct

Meta: Llama 3.1 8B Instruct

Meta: Llama 3.3 70B Instruct (free)

Meta: Llama 3.2 1B Instruct