qwen

Qwen: Qwen3.5-Flash

Qwen3.5-Flash is a multimodal model from Qwen that accepts text, image, and video inputs, making it applicable to tasks that involve mixed media content. It supports a context window of up to one million tokens, tool use, and reasoning, which positions it for agentic workflows and long-document tasks. Structured output support is unconfirmed. Maximum output is capped at 65,536 tokens per response. At $0.065 per million input tokens and $0.26 per million output tokens, this model sits at the budget end of the multimodal market, which is its clearest selling point. However, it carries zero independent benchmark coverage, so there is no external evidence to validate its reasoning or task performance claims. Buyers who prioritize low cost and need video input support may find it worth testing, but teams requiring verified quality baselines before committing should treat Qwen3.5-Flash as unproven until coverage appears.

Quality Score
100/100
price + capability + benchmarks
Input Price
$0.07
per 1M tokens
Output Price
$0.26
per 1M tokens
Context Window
1,000,000
tokens
Model ID
qwen/qwen3.5-flash-02-23
Vendor
qwen
Tokenizer
Qwen3
Input Modalities
text, image, video
Output Modalities
text
Max Output
65,536 tokens
Tool Calling
✓ supported
Structured Output
✓ supported
Reasoning Mode
✓ supported
Vision
✓ accepts images
Audio
no
Moderated
no

Similar models