qwen

Qwen: Qwen3 VL 8B Thinking

Qwen3 VL 8B Thinking is a vision-language model from Qwen that accepts both image and text inputs, making it suitable for tasks that involve visual content alongside written prompts. It supports a 256,000-token context window, tool use, and reasoning, which covers a reasonable range of agentic and multi-step workflows. Structured output support is unconfirmed, and maximum completions are capped at 32,768 tokens. It is not free to use. At $0.117 per million input tokens and $1.365 per million output tokens, the output cost is the main variable to watch for high-volume generation. There is currently no independent benchmark coverage, so performance relative to competing models is unproven. Teams looking for a multimodal model with a large context window and reasoning support may find it worth testing, but those who need verified benchmark standing before committing should treat it as experimental until third-party evaluations are available.

Quality Score
100/100
price + capability + benchmarks
Input Price
$0.12
per 1M tokens
Output Price
$1.36
per 1M tokens
Context Window
256,000
tokens
Model ID
qwen/qwen3-vl-8b-thinking
Vendor
qwen
Tokenizer
Qwen3
Input Modalities
image, text
Output Modalities
text
Max Output
32,768 tokens
Tool Calling
✓ supported
Structured Output
✓ supported
Reasoning Mode
✓ supported
Vision
✓ accepts images
Audio
no
Moderated
no

Similar models