Top picks for Video Auto-Tagging (2026)
Bulk video metadata generation. Ranked from 352 live models on the OpenRouter catalog, weighted for video input and low latency.
What this is
A capability-matched shortlist, not a benchmark-tested winner. Models are scored by how well their declared specs (structured output, reasoning, context, modality, price) fit Video Auto-Tagging. Pair this list with benchmark sources such as Artificial Analysis or LMSys Arena before you ship. Full methodology →
| # | Model | Slug | Score | Input / 1M | Output / 1M | Context (tokens) |
|---|---|---|---|---|---|---|
| 1 | Xiaomi: MiMo-V2.5 | `xiaomi/mimo-v2.5` | 123 | $0.40 | $2.00 | 1,048,576 |
| 2 | Google: Gemma 4 26B A4B (free) | `google/gemma-4-26b-a4b-it:free` | 123 | Free | Free | 262,144 |
| 3 | Google: Gemma 4 26B A4B | `google/gemma-4-26b-a4b-it` | 123 | $0.06 | $0.33 | 262,144 |
| 4 | Google: Gemma 4 31B (free) | `google/gemma-4-31b-it:free` | 123 | Free | Free | 262,144 |
| 5 | Google: Gemma 4 31B | `google/gemma-4-31b-it` | 123 | $0.13 | $0.38 | 262,144 |
| 6 | Qwen: Qwen3.6 Plus | `qwen/qwen3.6-plus` | 123 | $0.33 | $1.95 | 1,000,000 |
| 7 | Xiaomi: MiMo-V2-Omni | `xiaomi/mimo-v2-omni` | 123 | $0.40 | $2.00 | 262,144 |
| 8 | ByteDance Seed: Seed-2.0-Lite | `bytedance-seed/seed-2.0-lite` | 123 | $0.25 | $2.00 | 262,144 |
| 9 | Qwen: Qwen3.5-9B | `qwen/qwen3.5-9b` | 123 | $0.10 | $0.15 | 262,144 |
| 10 | Google: Gemini 3.1 Flash Lite Preview | `google/gemini-3.1-flash-lite-preview` | 123 | $0.25 | $1.50 | 1,048,576 |
| 11 | ByteDance Seed: Seed-2.0-Mini | `bytedance-seed/seed-2.0-mini` | 123 | $0.10 | $0.40 | 262,144 |
| 12 | Qwen: Qwen3.5-35B-A3B | `qwen/qwen3.5-35b-a3b` | 123 | $0.16 | $1.30 | 262,144 |
| 13 | Qwen: Qwen3.5-27B | `qwen/qwen3.5-27b` | 123 | $0.20 | $1.56 | 262,144 |
| 14 | Qwen: Qwen3.5-122B-A10B | `qwen/qwen3.5-122b-a10b` | 123 | $0.26 | $2.08 | 262,144 |
| 15 | Qwen: Qwen3.5-Flash | `qwen/qwen3.5-flash-02-23` | 123 | $0.07 | $0.26 | 1,000,000 |
How we ranked these
For Video Auto-Tagging, we weight models on video input and low latency; higher scores mean a better fit. Scores combine each model's public metadata (context length, modality support, tool calling, structured output, reasoning capability) with live pricing. See full methodology →
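The capability-fit idea can be sketched roughly as follows. This is a minimal illustration, not the site's actual formula: the field names (`input_modalities`, `out_price_per_m`, `context`), the weight values, and the use of output price as a cheap proxy for latency tier are all assumptions made for the example.

```python
# Hypothetical sketch of a capability-fit score for a tagging task.
# Field names, weights, and the latency proxy are assumptions, not
# the catalog's real methodology.

def fit_score(model: dict, weights: dict) -> float:
    """Score a model's declared specs against task weights (higher = better fit)."""
    score = 0.0
    # Hard capability match: video input is the dominant requirement.
    if "video" in model["input_modalities"]:
        score += weights["video_input"]
    # Cheaper output tokens stand in for lower-latency tiers in this sketch.
    score += weights["low_latency"] / (1.0 + model["out_price_per_m"])
    # Longer context lets you batch more clips per request (capped at 1M).
    score += weights["context"] * min(model["context"], 1_048_576) / 1_048_576
    return score

catalog = [
    {"id": "xiaomi/mimo-v2.5", "input_modalities": ["text", "image", "video"],
     "out_price_per_m": 2.00, "context": 1_048_576},
    {"id": "example/text-only", "input_modalities": ["text"],
     "out_price_per_m": 0.10, "context": 32_768},
]
weights = {"video_input": 100.0, "low_latency": 10.0, "context": 5.0}
ranked = sorted(catalog, key=lambda m: fit_score(m, weights), reverse=True)
print([m["id"] for m in ranked])  # video-capable model ranks first
```

The key design choice is that modality support acts as a large additive bonus rather than a hard filter, so text-only models still get a score (and sink to the bottom) instead of disappearing from the list.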