Vision · best for
Top picks for Diagram Extraction (2026)
Reading flowcharts, org charts, architecture diagrams. Ranked from 352 live models on the OpenRouter catalog, weighted for vision input, structured output, reasoning quality.
What this is
A capability-matched shortlist, not a benchmark-tested winner. Models are scored by the fit of their declared specs (structured output, reasoning, context, modality, price) against Diagram Extraction. Pair with benchmark sources like Artificial Analysis or LMSys Arena before you ship. Full methodology →
| # | Model | Score | In / 1M | Out / 1M | Context | |
|---|---|---|---|---|---|---|
| 1 | Xiaomi: MiMo-V2.5xiaomi/mimo-v2.5 | 127 | $0.40 | $2.00 | 1,048,576 | Details → |
| 2 | MoonshotAI: Kimi K2.6moonshotai/kimi-k2.6 | 127 | $0.74 | $4.66 | 256,000 | Details → |
| 3 | Google: Gemma 4 26B A4B (free)google/gemma-4-26b-a4b-it:free | 127 | Free | Free | 262,144 | Details → |
| 4 | Google: Gemma 4 26B A4B google/gemma-4-26b-a4b-it | 127 | $0.06 | $0.33 | 262,144 | Details → |
| 5 | Google: Gemma 4 31B (free)google/gemma-4-31b-it:free | 127 | Free | Free | 262,144 | Details → |
| 6 | Google: Gemma 4 31Bgoogle/gemma-4-31b-it | 127 | $0.13 | $0.38 | 262,144 | Details → |
| 7 | Qwen: Qwen3.6 Plusqwen/qwen3.6-plus | 127 | $0.33 | $1.95 | 1,000,000 | Details → |
| 8 | Z.ai: GLM 5V Turboz-ai/glm-5v-turbo | 127 | $1.20 | $4.00 | 202,752 | Details → |
| 9 | xAI: Grok 4.20x-ai/grok-4.20 | 127 | $2.00 | $6.00 | 2,000,000 | Details → |
| 10 | Xiaomi: MiMo-V2-Omnixiaomi/mimo-v2-omni | 127 | $0.40 | $2.00 | 262,144 | Details → |
| 11 | OpenAI: GPT-5.4 Nanoopenai/gpt-5.4-nano | 127 | $0.20 | $1.25 | 400,000 | Details → |
| 12 | OpenAI: GPT-5.4 Miniopenai/gpt-5.4-mini | 127 | $0.75 | $4.50 | 400,000 | Details → |
| 13 | Mistral: Mistral Small 4mistralai/mistral-small-2603 | 127 | $0.15 | $0.60 | 262,144 | Details → |
| 14 | ByteDance Seed: Seed-2.0-Litebytedance-seed/seed-2.0-lite | 127 | $0.25 | $2.00 | 262,144 | Details → |
| 15 | Qwen: Qwen3.5-9Bqwen/qwen3.5-9b | 127 | $0.10 | $0.15 | 262,144 | Details → |
How we ranked these
For Diagram Extraction, we weight models on vision input, structured output, reasoning quality. Higher means better. Scores combine each model's public metadata (context length, modality support, tool calling, structured output, reasoning capability) with live pricing. See full methodology →
Related tasks
Vision
Best for Image Captioning
Accessible alt text and detailed image descriptions.
Vision
Best for Image Generation
Models that produce images, not just read them.
Vision
Best for Screenshot Debugging
Diagnosing UI bugs from a screenshot.
Vision
Best for Chart & Graph Reading
Pulling numbers off charts in research papers and reports.