
Top picks for Creative Writing (2026)

Fiction, poetry, screenplays. Ranked from 340 live models on the OpenRouter catalog, weighted for reasoning quality and context window.

What this is: Ranked by capability match, real benchmark scores (Aider Polyglot, Artificial Analysis Intelligence Index), and live pricing. Models must first have the right specs for Creative Writing; benchmark performance then refines the order.
| # | Model | Model ID | Score | Input / 1M tokens | Output / 1M tokens | Context (tokens) |
|---|---|---|---|---|---|---|
| 1 | Anthropic: Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | 166 | $3.00 | $15.00 | 1,000,000 |
| 2 | Anthropic: Claude Opus 4.8 | anthropic/claude-opus-4.8 | 166 | $5.00 | $25.00 | 1,000,000 |
| 3 | OpenAI: GPT-5 | openai/gpt-5 | 166 | $1.25 | $10.00 | 400,000 |
| 4 | Anthropic: Claude Opus 4.7 | anthropic/claude-opus-4.7 | 165 | $5.00 | $25.00 | 1,000,000 |
| 5 | OpenAI: o3 | openai/o3 | 152 | $2.00 | $8.00 | 200,000 |
| 6 | Google: Gemini 2.5 Pro | google/gemini-2.5-pro | 139 | $1.25 | $10.00 | 1,048,576 |
| 7 | OpenAI: GPT-4.1 | openai/gpt-4.1 | 137 | $2.00 | $8.00 | 1,047,576 |
| 8 | Google: Gemini 2.5 Flash | google/gemini-2.5-flash | 135 | $0.30 | $2.50 | 1,048,576 |
| 9 | Anthropic: Claude Sonnet 4 | anthropic/claude-sonnet-4 | 133 | $3.00 | $15.00 | 1,000,000 |
| 10 | DeepSeek: DeepSeek V3 | deepseek/deepseek-chat | 130 | $0.20 | $0.80 | 131,072 |
| 11 | OpenAI: o4 Mini High | openai/o4-mini-high | 130 | $1.10 | $4.40 | 200,000 |
| 12 | OpenAI: o3 Pro | openai/o3-pro | 129 | $20.00 | $80.00 | 200,000 |
| 13 | OpenAI: o3 Mini High | openai/o3-mini-high | 128 | $1.10 | $4.40 | 200,000 |
| 14 | Qwen: Qwen3.7 Plus | qwen/qwen3.7-plus | 128 | $0.40 | $1.60 | 1,000,000 |
| 15 | MiniMax: MiniMax M3 | minimax/minimax-m3 | 128 | $0.30 | $1.20 | 1,048,576 |

How we ranked these

For Creative Writing, we weight models on reasoning quality and context window. Scores combine each model's public specs with independent benchmark results (Aider Polyglot coding scores; Artificial Analysis intelligence, coding, and agentic indices) and live pricing.
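
The two-stage approach described above (a spec filter first, then a benchmark-refined ordering) can be sketched roughly as follows. This is a minimal illustration and not the site's actual scoring code: the `Model` fields, the `min_context` threshold, and both weights are hypothetical, since the page does not publish them.

```python
from dataclasses import dataclass


@dataclass
class Model:
    model_id: str
    context: int       # context window in tokens
    reasoning: float   # hypothetical 0-100 reasoning-quality score
    benchmark: float   # hypothetical 0-100 aggregate benchmark score


def rank_models(models, min_context=100_000, w_reasoning=0.6, w_benchmark=0.4):
    """Filter on specs first, then order by a weighted score.

    The threshold and the weights are illustrative assumptions only.
    """
    # Stage 1: drop models that lack the required specs for the task.
    eligible = [m for m in models if m.context >= min_context]

    # Stage 2: order the remaining models by a weighted combination.
    def score(m):
        return w_reasoning * m.reasoning + w_benchmark * m.benchmark

    return sorted(eligible, key=score, reverse=True)


# Made-up candidates, purely for demonstration.
candidates = [
    Model("example/model-a", 200_000, 80.0, 70.0),
    Model("example/model-b", 32_000, 95.0, 95.0),    # fails the spec filter
    Model("example/model-c", 1_000_000, 85.0, 90.0),
]
ranked = rank_models(candidates)
print([m.model_id for m in ranked])
```

Filtering before scoring means a model with strong benchmark numbers but an unsuitable context window never enters the ranking, which matches the "right specs first" description.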

About Creative Writing

Creative Writing is the task of generating original fiction, poetry, screenplays, and narrative prose where coherence, voice consistency, and emotional resonance matter more than factual accuracy. Use it when you need a model to sustain a character's perspective, maintain plot logic across 2,000+ tokens, or match a specific stylistic tone. Models that are good at this task maintain character voice over long outputs, handle dialogue naturally, and build scenes with sensory detail without repetition. Poor performers lose track of established plot points, repeat phrases mechanically, or produce generic prose that reads like an aggregate of training data. Speed is a real constraint here: generating a 5,000-word short story on a slower model takes 3-5x longer than on inference-optimized systems like Claude 3.5 Sonnet or GPT-4o.

When to use: Choose this task when you need a language model to write stories, poems, or scripts where character consistency and emotional authenticity matter more than strict factual accuracy. It is not the right fit for information retrieval or data analysis.

Common questions

Which AI models are best at writing longer creative fiction without losing plot consistency?

Claude 3.5 Sonnet and GPT-4o are the strongest for multi-thousand-token creative work because they maintain character voice and plot threads reliably. GPT-4o is slightly faster for long outputs; Claude tends to produce more naturalistic dialogue, but at higher latency. For pure poetry or experimental forms, both perform equally well, though Claude shows more restraint and produces fewer overwrought metaphors.

How much does creative writing generation cost compared to other writing tasks, and is speed a practical problem?

Creative writing costs 2-3x more per output than summarization or copywriting because outputs are longer and models need larger context windows to track continuity. Speed matters only if you're generating scripts or serialized content on a tight deadline. For one-off short stories, latency is usually acceptable even on slower models, but batch generation of novel chapters will be expensive on pay-per-token systems.
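
To make the pay-per-token point concrete, here is a rough cost estimate for one 5,000-word story using per-million-token prices from the ranking table. It is a back-of-the-envelope sketch: the tokens-per-word ratio (about 4/3) is a common heuristic for English prose and is an assumption here, as are the 1,000-token prompt and the `generation_cost` helper itself. Real token counts vary by tokenizer and text.

```python
# Rough heuristic for English prose; an assumption, not a measured value.
TOKENS_PER_WORD = 4 / 3


def generation_cost(words_out, out_price_per_m, prompt_tokens=0, in_price_per_m=0.0):
    """Estimate the USD cost of one generation from per-1M-token prices."""
    output_tokens = words_out * TOKENS_PER_WORD
    total = prompt_tokens * in_price_per_m + output_tokens * out_price_per_m
    return total / 1_000_000


# (input price, output price) per 1M tokens, copied from the table.
prices = {
    "anthropic/claude-sonnet-4.6": (3.00, 15.00),
    "openai/gpt-5": (1.25, 10.00),
    "deepseek/deepseek-chat": (0.20, 0.80),
}

for model_id, (in_price, out_price) in prices.items():
    # Hypothetical job: a 1,000-token prompt producing a 5,000-word story.
    cost = generation_cost(5000, out_price, prompt_tokens=1000, in_price_per_m=in_price)
    print(f"{model_id}: ${cost:.4f} per story")
```

A single story costs little on any of these models under these assumptions. The gap becomes significant when the per-story figure is multiplied across many chapters or drafts, which is the batch-generation case mentioned above.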
