Qwen: Qwen3 VL 235B A22B Instruct
Qwen3 VL 235B A22B Instruct is a multimodal model from Qwen that accepts both text and image inputs, making it suitable for tasks that involve visual content alongside written prompts. It supports a 262,144-token context window, which accommodates long documents or extended conversations, and it includes tool use. Maximum output is capped at 16,384 tokens per response. Structured output support is unconfirmed, and reasoning mode is not available. At $0.20 per million input tokens and $0.88 per million output tokens, the pricing sits on the lower end for a model of this scale, which may appeal to teams running vision-heavy workloads at volume. The significant caveat is that no independent benchmark coverage exists yet, so there is no external performance data to anchor quality expectations. Buyers willing to run their own evaluations may find the cost-per-token attractive, but those who need verified benchmark results before committing should wait for more coverage.
- Model ID
- qwen/qwen3-vl-235b-a22b-instruct
- Vendor
- qwen
- Tokenizer
- Qwen3
- Input Modalities
- text, image
- Output Modalities
- text
- Max Output
- 16,384 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- not supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no