Qwen: Qwen3 VL 8B Instruct

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...

Quality Score: 91/100 (composite of price, context, capability)
Input Price: $0.08 per 1M tokens
Output Price: $0.50 per 1M tokens
Context Window: 131,072 tokens
Model ID: qwen/qwen3-vl-8b-instruct
Vendor: qwen
Tokenizer: Qwen3
Input Modalities: image, text
Output Modalities: text
Max Output: 32,768 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: ✓ accepts images
Audio: no
Moderated: no
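
The listed prices and limits can be combined into a rough per-request cost estimate. The sketch below is illustrative only: the function name `estimate_cost` is hypothetical, the constants are copied from the figures above, and it assumes the context window covers input and output tokens together. The listing does not say how images are converted to tokens, so image inputs would need to be counted by whatever tokenizer accounting the provider reports.

```python
from decimal import Decimal

# Figures copied from the listing above.
INPUT_PRICE_PER_M = Decimal("0.08")   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = Decimal("0.50")  # USD per 1M output tokens
CONTEXT_WINDOW = 131_072              # tokens
MAX_OUTPUT = 32_768                   # tokens
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return an estimated request cost in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT:
        raise ValueError("output exceeds the listed max output")
    # Assumption: the context window is shared by input and output tokens.
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the listed context window")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 100,000 input tokens and 2,000 output tokens
    print(estimate_cost(100_000, 2_000))
```

At these rates output tokens cost 6.25 times as much as input tokens, so long generations tend to dominate the bill even when the prompt is large.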
