Creative Writing Bench
Expert-judged creative writing quality across fiction, poetry, and narrative tasks
10
Models Tested
92.0
Best Score
83.7
Average Score
0–100
Scale Range
0.8x
Weight
How It Works
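Expert judges rate each model's creative writing across fiction, poetry, and narrative tasks, following the benchmark's standardised protocol. Scores are reported on a 0–100 scale.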
Why It Matters
This benchmark gives a standardised basis for comparing the creative writing ability of AI models.
Limitations
Like all benchmarks, this one has limitations. Its scores rest on expert judgment of creative quality, so they should be considered alongside other evaluations.
Leaderboard — Creative Writing Bench
| # | Model | Provider | Score |
|---|---|---|---|
| 🥇 | Claude Opus 4.6 | Anthropic | 92.0 |
| 🥈 | GPT-5.2 | OpenAI | 90.0 |
| 🥉 | Claude Opus 4 | Anthropic | 88.0 |
| 4 | Claude Sonnet 4 | Anthropic | 86.0 |
| 5 | Gemini 2.5 Pro Preview 06-05 | Google | 85.0 |
| 6 | Grok 4 | xAI | 84.0 |
| 7 | GPT-4o | OpenAI | 82.0 |
| 8 | Mistral Large | Mistral | 80.0 |
| 9 | Llama 4 Maverick | Meta | 78.0 |
| 10 | R1 | DeepSeek | 72.0 |
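
The summary figures at the top of the page (models tested, best score, average score) can be recomputed from the leaderboard rows. The following is a minimal sketch of that arithmetic, not the site's own code. The `leaderboard` dictionary and the `summarize` helper are hypothetical names introduced here for illustration, and rounding the average to one decimal place is an assumption based on how the figure is displayed.

```python
from statistics import mean

# Scores copied from the leaderboard table (model name -> score on the 0-100 scale).
# The dictionary layout is an illustrative assumption, not the benchmark's data format.
leaderboard = {
    "Claude Opus 4.6": 92.0,
    "GPT-5.2": 90.0,
    "Claude Opus 4": 88.0,
    "Claude Sonnet 4": 86.0,
    "Gemini 2.5 Pro Preview 06-05": 85.0,
    "Grok 4": 84.0,
    "GPT-4o": 82.0,
    "Mistral Large": 80.0,
    "Llama 4 Maverick": 78.0,
    "R1": 72.0,
}


def summarize(scores):
    """Return the model count, the best score, and the mean score.

    The mean is rounded to one decimal place (an assumption matching the
    precision shown in the page's summary figures).
    """
    values = list(scores.values())
    return {
        "models_tested": len(values),
        "best_score": max(values),
        "average_score": round(mean(values), 1),
    }


summary = summarize(leaderboard)
print(summary)
# → {'models_tested': 10, 'best_score': 92.0, 'average_score': 83.7}
```

The recomputed values agree with the summary figures shown above the leaderboard: 10 models, a best score of 92.0, and an average of 83.7.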