Writing · best for

Top picks for Short-Form Summarization (2026)

TL;DRs of articles and emails at scale. Ranked from 334 live models on the OpenRouter catalog, weighted for low latency, low cost, reasoning quality.

What this is Ranked by capability match + real benchmark scores (Aider Polyglot, Artificial Analysis Intelligence Index) + live pricing. Models need the right specs for Short-Form Summarization, then benchmark performance refines the order. Full methodology →
#ModelScoreIn / 1MOut / 1MContext
1 DeepSeek: DeepSeek V4 Flashdeepseek/deepseek-v4-flash 132 $0.09 $0.18 1,048,576 Details →
2 DeepSeek: DeepSeek V4 Prodeepseek/deepseek-v4-pro 132 $0.43 $0.87 1,048,576 Details →
3 MoonshotAI: Kimi K2.6moonshotai/kimi-k2.6 131 $0.66 $3.50 262,144 Details →
4 MiniMax: MiniMax M3minimax/minimax-m3 131 $0.30 $1.20 1,048,576 Details →
5 MoonshotAI: Kimi K2.7 Codemoonshotai/kimi-k2.7-code 130 $0.61 $3.07 262,144 Details →
6 Qwen: Qwen3.5 397B A17Bqwen/qwen3.5-397b-a17b 130 $0.39 $2.45 256,000 Details →
7 OpenAI: GPT-5.4 Nanoopenai/gpt-5.4-nano 130 $0.20 $1.25 400,000 Details →
8 Qwen: Qwen3.6 Plusqwen/qwen3.6-plus 130 $0.33 $1.95 1,000,000 Details →
9 Google: Gemma 4 31Bgoogle/gemma-4-31b-it 129 $0.12 $0.35 262,144 Details →
10 Xiaomi: MiMo-V2.5-Proxiaomi/mimo-v2.5-pro 129 $0.43 $0.87 1,048,576 Details →
11 Qwen: Qwen3.7 Plusqwen/qwen3.7-plus 129 $0.32 $1.28 1,000,000 Details →
12 OpenAI: GPT-5.4 Miniopenai/gpt-5.4-mini 129 $0.75 $4.50 400,000 Details →
13 Qwen: Qwen3.6 27Bqwen/qwen3.6-27b 129 $0.29 $3.17 262,144 Details →
14 Google: Gemma 4 26B A4B google/gemma-4-26b-a4b-it 129 $0.06 $0.33 262,144 Details →
15 MiniMax: MiniMax M2.7minimax/minimax-m2.7 129 $0.25 $1.00 204,800 Details →

How we ranked these

For Short-Form Summarization, we weight models on low latency, low cost, reasoning quality. Scores combine each model's public specs with independent benchmark results (Aider Polyglot coding scores, Artificial Analysis intelligence/coding/agentic indices) and live pricing. See full methodology →

About Short-Form Summarization

Short-form summarization is the task of reducing articles, emails, and documents into concise TL;DRs (typically 1-3 sentences) while preserving key information and intent. You need this when processing high-volume communications where humans can't read everything, or when building systems that flag critical information before deep review. Good models at this task preserve factual accuracy, maintain original tone/urgency, and extract what actually matters (not just first sentences). Poor ones hallucinate details, strip context until summaries become useless, or focus on trivial points. The main tradeoff is speed versus detail: token-efficient models like Claude 3.5 Haiku process emails fast and cheap, but larger models like GPT-4 or Claude 3.5 Sonnet catch nuance in complex documents. For inbox-scale work, latency compounds quickly across hundreds of messages.

When to use: Use this when you're drowning in emails, articles, or reports and need to know what each one actually says before deciding whether to read it fully or respond to it.

Common questions

What is the difference between short-form and long-form summarization for AI models?

Short-form summarization targets extreme brevity (1-3 sentences, often under 100 tokens), optimized for rapid scanning and routing. Long-form summarization produces detailed multi-paragraph summaries preserving structure and nuance. Short-form is harder because models must identify signal in noise with almost no room for explanation; Claude 3.5 Haiku and GPT-4o Mini excel here because they're trained to be economical with tokens.

How much does it cost to summarize thousands of emails per day?

Using Claude 3.5 Haiku at roughly $0.80 per 1M input tokens, a typical email (500 tokens) costs under $0.0005 to summarize, putting 1,000 emails around $0.50. GPT-4o Mini costs slightly less ($0.15 per 1M input tokens) but is slower. Batch processing (if your system can wait hours) cuts costs further; real-time summarization during inbox sync costs more but feels instant to users.

Related tasks