Best AI Models for Creative Writing
Nine models ranked by creative writing quality, evaluated on fiction, poetry, narrative craft, and real user creative prompts.
Best Overall
Claude Opus 4.6
Anthropic · Avg: 90.0
Best Value
Llama 4 Maverick
Meta · $0.15/M in
Best Open Source
Mistral Large
Mistral · Avg: 80.0
| # | Model | Provider | Open Source | Writing Avg | Price ($/M input tokens) |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 | Anthropic | | 90.0 | $15.00 |
| 2 | GPT-5.2 | OpenAI | | 88.0 | $1.75 |
| 3 | Claude Opus 4 | Anthropic | | 86.0 | $15.00 |
| 4 | Claude Sonnet 4 | Anthropic | | 84.0 | $3.00 |
| 5 | Gemini 2.5 Pro | Google | | 83.5 | $1.25 |
| 6 | GPT-4o (2024-05-13) | OpenAI | | 80.0 | $5.00 |
| 7 | Mistral Large | Mistral | Yes | 80.0 | $2.00 |
| 8 | Llama 4 Maverick | Meta | Yes | 78.0 | $0.15 |
| 9 | DeepSeek R1 | DeepSeek | Yes | 70.0 | $0.70 |
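The page does not say how "Best Value" is chosen. One plausible reading is writing score per dollar of input price, and the sketch below ranks the nine models that way. The metric, the `MODELS` list layout, and the `rank_by_value` helper are assumptions made for illustration, and the prices are taken to be USD per million input tokens, as the "$0.15/M in" label suggests.

```python
# Scores and prices copied from the ranking table.
# Assumption: price is USD per million input tokens.
MODELS = [
    ("Claude Opus 4.6", 90.0, 15.00),
    ("GPT-5.2", 88.0, 1.75),
    ("Claude Opus 4", 86.0, 15.00),
    ("Claude Sonnet 4", 84.0, 3.00),
    ("Gemini 2.5 Pro", 83.5, 1.25),
    ("GPT-4o (2024-05-13)", 80.0, 5.00),
    ("Mistral Large", 80.0, 2.00),
    ("Llama 4 Maverick", 78.0, 0.15),
    ("DeepSeek R1", 70.0, 0.70),
]


def rank_by_value(models):
    """Sort models by writing score per dollar, highest first.

    This is a hypothetical value metric, not the page's own method.
    """
    return sorted(models, key=lambda m: m[1] / m[2], reverse=True)


if __name__ == "__main__":
    for name, score, price in rank_by_value(MODELS):
        print(f"{name:22s} {score / price:7.1f} points per $")
```

Under this metric Llama 4 Maverick comes out first, which agrees with the page's "Best Value" pick. A different weighting, for example one that also counts output-token price, could reorder the rest of the list.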
About Creative Writing Benchmarks
Creative Writing Bench uses expert judges to evaluate fiction, poetry, and narrative quality across multiple dimensions including originality, coherence, style, and emotional impact. WildBench Creative evaluates models on real user creative prompts from the wild, judged by GPT-4 for quality and faithfulness to instructions.
Creative writing quality is inherently subjective. These benchmarks capture only part of what makes writing good, so your own preferences may differ.
Other Notable Models
These models do not yet have published creative writing scores but are widely used for writing tasks. The Quality value listed for each is therefore not a creative writing score.
GPT-5.2 Pro
OpenAI · Quality: 93
GPT-5 Pro
OpenAI · Quality: 90
O4 Mini
OpenAI · Quality: 90
O3
OpenAI · Quality: 88
O3 Pro
OpenAI · Quality: 88
GPT-5
OpenAI · Quality: 87
Qwen3 235B A22B
Alibaba · Quality: 87
Claude Opus 4.5
Anthropic · Quality: 86
Claude Sonnet 4.6
Anthropic · Quality: 86
Qwen3 Max
Alibaba · Quality: 85