qwen

Qwen: Qwen3 VL 235B A22B Instruct

Qwen3 VL 235B A22B Instruct is a multimodal model from Qwen that accepts both text and image inputs, making it suitable for tasks that involve visual content alongside written prompts. It supports a 262,144-token context window, which accommodates long documents or extended conversations, and it includes tool use. Maximum output is capped at 16,384 tokens per response. Structured output support is unconfirmed, and reasoning mode is not available. At $0.20 per million input tokens and $0.88 per million output tokens, the pricing sits on the lower end for a model of this scale, which may appeal to teams running vision-heavy workloads at volume. The significant caveat is that no independent benchmark coverage exists yet, so there is no external performance data to anchor quality expectations. Buyers willing to run their own evaluations may find the cost-per-token attractive, but those who need verified benchmark results before committing should wait for more coverage.

Quality Score
99/100
price + capability + benchmarks
Input Price
$0.20
per 1M tokens
Output Price
$0.88
per 1M tokens
Context Window
262,144
tokens
Model ID
qwen/qwen3-vl-235b-a22b-instruct
Vendor
qwen
Tokenizer
Qwen3
Input Modalities
text, image
Output Modalities
text
Max Output
16,384 tokens
Tool Calling
✓ supported
Structured Output
✓ supported
Reasoning Mode
not supported
Vision
✓ accepts images
Audio
no
Moderated
no

Similar models