qwen
Qwen: Qwen3 VL 8B Instruct
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
Quality Score: 99/100 (price + capability + benchmarks)
Input Price: $0.08 per 1M tokens
Output Price: $0.50 per 1M tokens
Context Window: 256,000 tokens
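To make the per-1M-token pricing concrete, here is a minimal sketch that estimates the cost of one request from the listed rates. The function name `estimate_cost` and the example workload are hypothetical, and the sketch assumes every billed unit is a token charged at the flat listed rate; how image inputs are converted to tokens is not stated in this listing.

```python
# Listed prices for qwen/qwen3-vl-8b-instruct, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.08
OUTPUT_PRICE_PER_M = 0.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed flat rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 10,000 output tokens.
    print(f"${estimate_cost(200_000, 10_000):.4f}")  # prints $0.0210
```

At these rates the input side dominates only for very long prompts: one output token costs as much as 6.25 input tokens.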
- Model ID: qwen/qwen3-vl-8b-instruct
- Vendor: qwen
- Tokenizer: Qwen3
- Input Modalities: image, text
- Output Modalities: text
- Max Output: 32,768 tokens
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: not supported
- Vision: ✓ accepts images
- Audio: no
- Moderated: no
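The limits listed above can be turned into a simple pre-flight check before sending a request. This is a sketch only: `ModelSpec` and `check_request` are hypothetical names, not part of any provider SDK, and the check assumes that prompt tokens and generated tokens share the same context window, which this listing does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Subset of the listed model properties (hypothetical container)."""
    model_id: str
    context_window: int
    max_output: int
    input_modalities: frozenset


# Values copied from the listing for this model.
SPEC = ModelSpec(
    model_id="qwen/qwen3-vl-8b-instruct",
    context_window=256_000,
    max_output=32_768,
    input_modalities=frozenset({"image", "text"}),
)


def check_request(spec, prompt_tokens, max_tokens, modalities):
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    unsupported = set(modalities) - spec.input_modalities
    if unsupported:
        problems.append(f"unsupported input modalities: {sorted(unsupported)}")
    if max_tokens > spec.max_output:
        problems.append(f"max_tokens {max_tokens} exceeds {spec.max_output}")
    # Assumption: prompt and output tokens count against one shared window.
    if prompt_tokens + max_tokens > spec.context_window:
        problems.append("prompt plus output exceeds the context window")
    return problems


if __name__ == "__main__":
    print(check_request(SPEC, 250_000, 8_000, ["text", "audio"]))
```

Returning a list of problems rather than raising on the first one lets a caller report every mismatch at once.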
Similar models
- Qwen: Qwen3 VL 32B Instruct ($0.10 in / $0.42 out, 262,144 ctx, score 99)
- Qwen: Qwen3 VL 30B A3B Instruct ($0.13 in / $0.52 out, 262,144 ctx, score 99)
- Qwen: Qwen3 Next 80B A3B Thinking ($0.10 in / $0.78 out, 262,144 ctx, score 99)
- Qwen: Qwen3 235B A22B Thinking 2507 ($0.10 in / $0.10 out, 262,144 ctx, score 99)
- Qwen: Qwen3 VL 235B A22B Instruct ($0.20 in / $0.88 out, 262,144 ctx, score 99)
- Qwen: Qwen Plus 0728 (thinking) ($0.26 in / $0.78 out, 1,000,000 ctx, score 99)