Creative Writing Bench

domain

Expert-judged creative writing quality across fiction, poetry, and narrative tasks

Models Tested: 10
Best Score: 92.0
Average Score: 83.7
Scale Range: 0–100
Weight: 0.8x

How It Works

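Expert judges rate each model's output on fiction, poetry, and narrative writing tasks, following the benchmark's standardised protocol. Scores are reported on a 0–100 scale.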

Why It Matters

This benchmark gives a common, expert-judged measure for comparing the creative writing quality of AI models in a standardised way.

Limitations

Like any benchmark, this one has limitations, and its scores should be read alongside other evaluations rather than on their own.

Leaderboard — Creative Writing Bench

#    Model                          Provider    Score
🥇   Claude Opus 4.6                Anthropic   92.0
🥈   GPT-5.2                        OpenAI      90.0
🥉   Claude Opus 4                  Anthropic   88.0
4    Claude Sonnet 4                Anthropic   86.0
5    Gemini 2.5 Pro Preview 06-05   Google      85.0
6    Grok 4                         xAI         84.0
7    GPT-4o                         OpenAI      82.0
8    Mistral Large                  Mistral     80.0
9    Llama 4 Maverick               Meta        78.0
10   R1                             DeepSeek    72.0
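
The summary figures at the top of the page (models tested, best score, average score) can be recomputed from the leaderboard rows. The following is a minimal sketch of that arithmetic, not the site's own code. The variable names are illustrative assumptions, and the 0.8x weight is left out because the page does not say how it is applied.

```python
from statistics import mean

# Scores copied from the leaderboard rows above.
scores = {
    "Claude Opus 4.6": 92.0,
    "GPT-5.2": 90.0,
    "Claude Opus 4": 88.0,
    "Claude Sonnet 4": 86.0,
    "Gemini 2.5 Pro Preview 06-05": 85.0,
    "Grok 4": 84.0,
    "GPT-4o": 82.0,
    "Mistral Large": 80.0,
    "Llama 4 Maverick": 78.0,
    "R1": 72.0,
}

# Summary statistics (hypothetical names, chosen for this sketch).
models_tested = len(scores)
best_model = max(scores, key=scores.get)
best_score = scores[best_model]
average_score = round(mean(scores.values()), 1)

print(f"Models tested: {models_tested}")
print(f"Best score:    {best_score} ({best_model})")
print(f"Average score: {average_score}")
```

Run against the ten rows listed, this reproduces the page's reported values: 10 models, a best score of 92.0, and an average of 83.7.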