
Best AI model for Image Captioning (2026)

Models for accessible alt text and detailed image descriptions. Ranked from 346 live models in the OpenRouter catalog, weighted for vision input and low latency.

| # | Model | Model ID | Score | Input / 1M tokens | Output / 1M tokens | Context (tokens) |
|---|-------|----------|-------|-------------------|--------------------|------------------|
| 1 | MoonshotAI: Kimi K2.6 | moonshotai/kimi-k2.6 | 119 | $0.80 | $3.50 | 262,144 |
| 2 | Google: Gemma 4 26B A4B (free) | google/gemma-4-26b-a4b-it:free | 119 | Free | Free | 262,144 |
| 3 | Google: Gemma 4 26B A4B | google/gemma-4-26b-a4b-it | 119 | $0.07 | $0.35 | 262,144 |
| 4 | Google: Gemma 4 31B (free) | google/gemma-4-31b-it:free | 119 | Free | Free | 262,144 |
| 5 | Google: Gemma 4 31B | google/gemma-4-31b-it | 119 | $0.13 | $0.38 | 262,144 |
| 6 | Qwen: Qwen3.6 Plus | qwen/qwen3.6-plus | 119 | $0.33 | $1.95 | 1,000,000 |
| 7 | Xiaomi: MiMo-V2-Omni | xiaomi/mimo-v2-omni | 119 | $0.40 | $2.00 | 262,144 |
| 8 | OpenAI: GPT-5.4 Nano | openai/gpt-5.4-nano | 119 | $0.20 | $1.25 | 400,000 |
| 9 | OpenAI: GPT-5.4 Mini | openai/gpt-5.4-mini | 119 | $0.75 | $4.50 | 400,000 |
| 10 | Mistral: Mistral Small 4 | mistralai/mistral-small-2603 | 119 | $0.15 | $0.60 | 262,144 |
| 11 | ByteDance Seed: Seed-2.0-Lite | bytedance-seed/seed-2.0-lite | 119 | $0.25 | $2.00 | 262,144 |
| 12 | Qwen: Qwen3.5-9B | qwen/qwen3.5-9b | 119 | $0.10 | $0.15 | 262,144 |
| 13 | Google: Gemini 3.1 Flash Lite Preview | google/gemini-3.1-flash-lite-preview | 119 | $0.25 | $1.50 | 1,048,576 |
| 14 | ByteDance Seed: Seed-2.0-Mini | bytedance-seed/seed-2.0-mini | 119 | $0.10 | $0.40 | 262,144 |
| 15 | Qwen: Qwen3.5-35B-A3B | qwen/qwen3.5-35b-a3b | 119 | $0.16 | $1.30 | 262,144 |
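Because every listed model has the same score, price is the practical tiebreaker for bulk captioning. The sketch below estimates cost per caption from the per-1M-token prices in the table. The token counts (1,000 input tokens per image plus prompt, 100 output tokens per caption) are illustrative assumptions, not measured values; how many tokens an image consumes varies by model and provider. The function names are hypothetical.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "moonshotai/kimi-k2.6": (0.80, 3.50),
    "google/gemma-4-26b-a4b-it": (0.07, 0.35),
    "qwen/qwen3.5-9b": (0.10, 0.15),
}


def caption_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one captioning request."""
    price_in, price_out = PRICES[model_id]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def cost_per_thousand_captions(
    model_id: str,
    input_tokens: int = 1_000,  # assumed: image plus prompt
    output_tokens: int = 100,   # assumed: one short caption
) -> float:
    """Estimated USD cost of 1,000 captions at the assumed token counts."""
    return 1_000 * caption_cost(model_id, input_tokens, output_tokens)


if __name__ == "__main__":
    # Print the models from cheapest to most expensive per 1,000 captions.
    for model_id in sorted(PRICES, key=cost_per_thousand_captions):
        print(f"{model_id}: ${cost_per_thousand_captions(model_id):.3f}")
```

Longer descriptions shift the balance toward the output price, so it is worth re-running the estimate with your own token counts.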

How we ranked these

For image captioning, we weight models on vision input and low latency. A higher score is better. Scores combine OpenRouter's model metadata (context length, modality support, tool calling, structured output, and reasoning capability) with public pricing. See the full methodology.
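To make the idea of a weighted score concrete, here is a minimal sketch of how capability flags and pricing could be combined into one number. This is not the actual ranking formula, which is not given here: the `ModelInfo` type, the feature names, the weights, and the price bonus are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical weights: vision input counts most for captioning.
WEIGHTS = {
    "vision": 50.0,
    "structured_output": 10.0,
    "tool_calling": 5.0,
    "reasoning": 5.0,
}
PRICE_BONUS = 10.0  # hypothetical maximum bonus, awarded to free models


@dataclass
class ModelInfo:
    model_id: str
    price_in: float   # USD per 1M input tokens
    price_out: float  # USD per 1M output tokens
    features: set = field(default_factory=set)


def score(model: ModelInfo) -> float:
    """Sum the weights of supported features, plus a bonus that shrinks with price."""
    capability = sum(w for name, w in WEIGHTS.items() if name in model.features)
    blended_price = model.price_in + model.price_out
    return capability + PRICE_BONUS / (1.0 + blended_price)


def rank(models: list) -> list:
    """Highest score first; ties are broken alphabetically by model ID."""
    return sorted(models, key=lambda m: (-score(m), m.model_id))
```

A scheme like this can produce many tied scores when models share the same capability flags, which is why secondary criteria such as price matter when choosing among them.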

Related tasks