qwen
Qwen: Qwen3 VL 8B Instruct
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
Quality Score
91/100
composite of price, context, capability
Input Price
$0.08
per 1M tokens
Output Price
$0.50
per 1M tokens
Context Window
131,072
tokens
- Model ID
- qwen/qwen3-vl-8b-instruct
- Vendor
- qwen
- Tokenizer
- Qwen3
- Input Modalities
- image, text
- Output Modalities
- text
- Max Output
- 32,768 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- not supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no
Similar models
qwen
Qwen: Qwen3 VL 32B Instruct
$0.10 in / $0.42 out
131,072 ctx
91
qwen
Qwen: Qwen3 VL 30B A3B Instruct
$0.13 in / $0.52 out
131,072 ctx
91
qwen
Qwen: Qwen3 30B A3B Thinking 2507
$0.08 in / $0.40 out
131,072 ctx
91
qwen
Qwen: Qwen3 Next 80B A3B Thinking
$0.10 in / $0.78 out
131,072 ctx
90
qwen
Qwen: Qwen3 235B A22B
$0.46 in / $1.82 out
131,072 ctx
90
qwen
Qwen: Qwen VL Max
$0.52 in / $2.08 out
131,072 ctx
90