Top picks for Short-Form Summarization (2026)
TL;DRs of articles and emails at scale. Ranked from 335 live models on the OpenRouter catalog, weighted for low latency, low cost, reasoning quality.
| # | Model | Score | In / 1M | Out / 1M | Context | |
|---|---|---|---|---|---|---|
| 1 | Google: Gemini 2.5 Flashgoogle/gemini-2.5-flash | 125 | $0.30 | $2.50 | 1,048,576 | Details → |
| 2 | OpenAI: GPT-5openai/gpt-5 | 124 | $1.25 | $10.00 | 400,000 | Details → |
| 3 | NVIDIA: Nemotron 3 Nano Omni (free)nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free | 124 | Free | Free | 256,000 | Details → |
| 4 | Google: Gemma 4 26B A4B (free)google/gemma-4-26b-a4b-it:free | 124 | Free | Free | 262,144 | Details → |
| 5 | Google: Gemma 4 31B (free)google/gemma-4-31b-it:free | 124 | Free | Free | 262,144 | Details → |
| 6 | Qwen: Qwen3.5-9Bqwen/qwen3.5-9b | 124 | $0.10 | $0.15 | 262,144 | Details → |
| 7 | Xiaomi: MiMo-V2.5xiaomi/mimo-v2.5 | 123 | $0.14 | $0.28 | 1,048,576 | Details → |
| 8 | Google: Gemma 4 26B A4B google/gemma-4-26b-a4b-it | 123 | $0.06 | $0.33 | 262,144 | Details → |
| 9 | Google: Gemma 4 31Bgoogle/gemma-4-31b-it | 123 | $0.12 | $0.36 | 262,144 | Details → |
| 10 | ByteDance Seed: Seed-2.0-Minibytedance-seed/seed-2.0-mini | 123 | $0.10 | $0.40 | 262,144 | Details → |
| 11 | Qwen: Qwen3.5-Flashqwen/qwen3.5-flash-02-23 | 123 | $0.07 | $0.26 | 1,000,000 | Details → |
| 12 | ByteDance Seed: Seed 1.6 Flashbytedance-seed/seed-1.6-flash | 123 | $0.07 | $0.30 | 262,144 | Details → |
| 13 | Google: Gemini 2.5 Flash Lite Preview 09-2025google/gemini-2.5-flash-lite-preview-09-2025 | 123 | $0.10 | $0.40 | 1,048,576 | Details → |
| 14 | OpenAI: GPT-5 Nanoopenai/gpt-5-nano | 123 | $0.05 | $0.40 | 400,000 | Details → |
| 15 | Google: Gemini 2.5 Flash Litegoogle/gemini-2.5-flash-lite | 123 | $0.10 | $0.40 | 1,048,576 | Details → |
How we ranked these
For Short-Form Summarization, we weight models on low latency, low cost, reasoning quality. Scores combine each model's public specs with independent benchmark results (Aider Polyglot coding scores, Artificial Analysis intelligence/coding/agentic indices) and live pricing. See full methodology →
About Short-Form Summarization
Short-form summarization is the task of reducing articles, emails, and documents into concise TL;DRs (typically 1-3 sentences) while preserving key information and intent. You need this when processing high-volume communications where humans can't read everything, or when building systems that flag critical information before deep review. Good models at this task preserve factual accuracy, maintain original tone/urgency, and extract what actually matters (not just first sentences). Poor ones hallucinate details, strip context until summaries become useless, or focus on trivial points. The main tradeoff is speed versus detail: token-efficient models like Claude 3.5 Haiku process emails fast and cheap, but larger models like GPT-4 or Claude 3.5 Sonnet catch nuance in complex documents. For inbox-scale work, latency compounds quickly across hundreds of messages.
When to use: Use this when you're drowning in emails, articles, or reports and need to know what each one actually says before deciding whether to read it fully or respond to it.
Common questions
What is the difference between short-form and long-form summarization for AI models?
Short-form summarization targets extreme brevity (1-3 sentences, often under 100 tokens), optimized for rapid scanning and routing. Long-form summarization produces detailed multi-paragraph summaries preserving structure and nuance. Short-form is harder because models must identify signal in noise with almost no room for explanation; Claude 3.5 Haiku and GPT-4o Mini excel here because they're trained to be economical with tokens.
How much does it cost to summarize thousands of emails per day?
Using Claude 3.5 Haiku at roughly $0.80 per 1M input tokens, a typical email (500 tokens) costs under $0.0005 to summarize, putting 1,000 emails around $0.50. GPT-4o Mini costs slightly less ($0.15 per 1M input tokens) but is slower. Batch processing (if your system can wait hours) cuts costs further; real-time summarization during inbox sync costs more but feels instant to users.