Qwen: Qwen3.5-Flash
Qwen3.5-Flash is a multimodal model from Qwen that accepts text, image, and video inputs, making it applicable to tasks that involve mixed media content. It supports a context window of up to one million tokens, tool use, and reasoning, which positions it for agentic workflows and long-document tasks. Structured output support is unconfirmed. Maximum output is capped at 65,536 tokens per response. At $0.065 per million input tokens and $0.26 per million output tokens, this model sits at the budget end of the multimodal market, which is its clearest selling point. However, it carries zero independent benchmark coverage, so there is no external evidence to validate its reasoning or task performance claims. Buyers who prioritize low cost and need video input support may find it worth testing, but teams requiring verified quality baselines before committing should treat Qwen3.5-Flash as unproven until coverage appears.
- Model ID
- qwen/qwen3.5-flash-02-23
- Vendor
- qwen
- Tokenizer
- Qwen3
- Input Modalities
- text, image, video
- Output Modalities
- text
- Max Output
- 65,536 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- ✓ supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no