Top picks for Video Auto-Tagging (2026)
Bulk video metadata generation. Ranked from 337 live models on the OpenRouter catalog, weighted for video input and low latency, with video support required.
| # | Model | Model ID | Score | Input / 1M tokens | Output / 1M tokens | Context (tokens) |
|---|---|---|---|---|---|---|
| 1 | MiniMax: MiniMax M3 | minimax/minimax-m3 | 123 | $0.30 | $1.20 | 1,048,576 |
| 2 | StepFun: Step 3.7 Flash | stepfun/step-3.7-flash | 123 | $0.20 | $1.15 | 256,000 |
| 3 | Google: Gemini 3.1 Flash Lite | google/gemini-3.1-flash-lite | 123 | $0.25 | $1.50 | 1,048,576 |
| 4 | NVIDIA: Nemotron 3 Nano Omni (free) | nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free | 123 | Free | Free | 256,000 |
| 5 | Qwen: Qwen3.5 Plus 2026-04-20 | qwen/qwen3.5-plus-20260420 | 123 | $0.30 | $1.80 | 1,000,000 |
| 6 | Qwen: Qwen3.6 Flash | qwen/qwen3.6-flash | 123 | $0.19 | $1.12 | 1,000,000 |
| 7 | Qwen: Qwen3.6 35B A3B | qwen/qwen3.6-35b-a3b | 123 | $0.14 | $1.00 | 262,144 |
| 8 | Qwen: Qwen3.6 27B | qwen/qwen3.6-27b | 123 | $0.29 | $2.40 | 262,144 |
| 9 | Xiaomi: MiMo-V2.5 | xiaomi/mimo-v2.5 | 123 | $0.14 | $0.28 | 1,048,576 |
| 10 | Google: Gemma 4 26B A4B (free) | google/gemma-4-26b-a4b-it:free | 123 | Free | Free | 262,144 |
| 11 | Google: Gemma 4 26B A4B | google/gemma-4-26b-a4b-it | 123 | $0.06 | $0.33 | 262,144 |
| 12 | Google: Gemma 4 31B (free) | google/gemma-4-31b-it:free | 123 | Free | Free | 262,144 |
| 13 | Google: Gemma 4 31B | google/gemma-4-31b-it | 123 | $0.12 | $0.36 | 262,144 |
| 14 | Qwen: Qwen3.6 Plus | qwen/qwen3.6-plus | 123 | $0.33 | $1.95 | 1,000,000 |
| 15 | ByteDance Seed: Seed-2.0-Lite | bytedance-seed/seed-2.0-lite | 123 | $0.25 | $2.00 | 262,144 |
Affiliate link. PicksByModel may earn a commission at no extra cost to you.
How we ranked these
For Video Auto-Tagging, we weight models on video input and low latency, and we require video support. Scores combine each model's public specs with independent benchmark results (Aider Polyglot coding scores, Artificial Analysis intelligence/coding/agentic indices) and live pricing.
About Video Auto-Tagging
Video auto-tagging is the process of automatically generating metadata labels, categories, and descriptions for video files at scale. You need this when you have dozens to thousands of videos and manually tagging each one would consume weeks of labor or is simply impractical. A good model identifies objects, actions, scenes, text overlays, and audio cues with minimal false positives, then structures output as machine-readable tags or descriptions. Bad models miss context, hallucinate tags unrelated to actual content, or fail on lower-quality video formats. Speed matters here: processing 100 hours of video with a slow model can easily cost 10x more than a fast one, so batch inference efficiency and codec support directly impact your per-video cost.
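As a minimal sketch of what "machine-readable tags" with few false positives can look like in practice, the snippet below parses a model's JSON reply into validated tags. The schema (`tags`, `label`, `category`, `confidence`), the category list, and the 0.5 threshold are illustrative assumptions, not a format any listed model is known to emit; you would ask for this shape in your own prompt.

```python
import json
from dataclasses import dataclass

# Assumed category vocabulary, mirroring the kinds of content named above.
ALLOWED_CATEGORIES = {"object", "action", "scene", "text", "audio"}


@dataclass(frozen=True)
class Tag:
    label: str
    category: str
    confidence: float


def parse_tags(raw: str, min_confidence: float = 0.5) -> list[Tag]:
    """Parse a hypothetical JSON tag payload, dropping malformed,
    out-of-vocabulary, low-confidence, and duplicate entries."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []  # The model did not return valid JSON.
    if not isinstance(payload, dict):
        return []
    items = payload.get("tags", [])
    if not isinstance(items, list):
        return []

    best: dict[tuple[str, str], Tag] = {}
    for item in items:
        if not isinstance(item, dict):
            continue
        label = str(item.get("label", "")).strip().lower()
        category = str(item.get("category", "")).strip().lower()
        try:
            confidence = float(item.get("confidence", 0.0))
        except (TypeError, ValueError):
            continue
        if not label or category not in ALLOWED_CATEGORIES:
            continue
        # Reject NaN and out-of-range values, then apply the threshold.
        if not (0.0 <= confidence <= 1.0) or confidence < min_confidence:
            continue
        key = (category, label)
        # Keep the highest-confidence copy of a repeated tag.
        if key not in best or confidence > best[key].confidence:
            best[key] = Tag(label, category, confidence)

    return sorted(best.values(), key=lambda t: (-t.confidence, t.category, t.label))
```

Raising `min_confidence` trades recall for fewer spurious tags; the right value depends on how costly a wrong tag is in your search index.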
When to use: Use this when you have a library of videos that need searchable metadata but no budget or bandwidth for manual tagging. Common cases include e-commerce product videos, video archives, user-generated content platforms, or media asset management systems.
Common questions
What is the difference between video auto-tagging and video understanding?
Video auto-tagging produces structured metadata (tags, labels, categories) optimized for search and filtering. Video understanding is broader and may include generating captions, summaries, or answering questions about content. For bulk metadata generation, a video-capable model prompted for short structured tags is usually faster and cheaper than an open-ended understanding task, because it emits far fewer output tokens.
How much does it cost to auto-tag 1,000 videos?
Cost depends on video length, resolution, and model choice. Cloud vision APIs typically charge $1 to $4 per video for moderate lengths, or roughly $1,000 to $4,000 for 1,000 videos. Batch processing and open-source models like CLIP-based taggers can reduce cost to under $0.10 per video (under $100 for 1,000 videos) if you have GPU infrastructure. Expect trade-offs: cheaper models produce fewer or less precise tags.
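For token-priced models such as those in the table above, a rough estimate can be sketched as follows. The per-million prices are copied from the table; the token counts per video (50,000 input, 300 output) are placeholder assumptions, since how many tokens a video consumes varies by model, provider, length, and resolution. Measure your own usage before budgeting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


def cost_per_video(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Token-based cost in USD for tagging one video."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def batch_cost(pricing: Pricing, videos: int, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a batch, assuming every video uses the same token counts."""
    return videos * cost_per_video(pricing, input_tokens, output_tokens)


# Prices taken from the ranking table.
MIMO_V2_5 = Pricing(input_per_million=0.14, output_per_million=0.28)
GEMINI_3_1_FLASH_LITE = Pricing(input_per_million=0.25, output_per_million=1.50)

# Hypothetical per-video token usage; replace with measured values.
ASSUMED_INPUT_TOKENS = 50_000
ASSUMED_OUTPUT_TOKENS = 300

if __name__ == "__main__":
    for name, pricing in [
        ("xiaomi/mimo-v2.5", MIMO_V2_5),
        ("google/gemini-3.1-flash-lite", GEMINI_3_1_FLASH_LITE),
    ]:
        total = batch_cost(pricing, 1_000, ASSUMED_INPUT_TOKENS, ASSUMED_OUTPUT_TOKENS)
        print(f"{name}: ${total:,.2f} for 1,000 videos")
```

Because input tokens dominate in this setup, the input price and the number of tokens each video consumes drive the total far more than the output price does.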