
Best AI Models for Creative Writing

9 models ranked by creative writing quality — evaluated on fiction, poetry, narrative craft, and real user creative prompts.

Best Overall

Claude Opus 4.6

Anthropic · Avg: 90.0

Best Value

Llama 4 Maverick

Meta · $0.20 per million input tokens

Best Open Source

Mistral Large

Mistral · Avg: 80.0

#	Model	Provider	Open Source	Writing Avg	Creative Writing Bench	WildBench Creative	Quality	Price ($/M input tokens)
1	Claude Opus 4.6	Anthropic	No	90.0	92.0	88.0	89.0	$5.00
2	GPT-5.2	OpenAI	No	88.0	90.0	86.0	90.0	$1.75
3	Claude Opus 4	Anthropic	No	86.0	88.0	84.0	84.0	$15.00
4	Claude Sonnet 4	Anthropic	No	84.0	86.0	82.0	79.0	$3.00
5	Gemini 2.5 Pro	Google	No	83.5	85.0	82.0	83.0	$1.25
6	GPT-4o (2024-05-13)	OpenAI	No	80.0	82.0	78.0	75.0	$5.00
7	Mistral Large	Mistral	Yes	80.0	80.0	n/a	73.0	$2.00
8	Llama 4 Maverick	Meta	Yes	78.0	78.0	n/a	76.0	$0.20
9	DeepSeek R1	DeepSeek	Yes	70.0	72.0	68.0	85.0	$0.70
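
The page does not say how Writing Avg is computed, but every row in the table is consistent with a plain mean of whichever of the two benchmark scores is available. The Python sketch below reproduces the column under that assumption and then re-derives the three summary picks. The selection rules are also assumptions: "Best Value" is taken to mean the lowest input price, and "Best Open Source" the highest Writing Avg among models tagged OSS. The names `ROWS` and `writing_avg` are illustrative and do not come from the site.

```python
from statistics import mean

# (model, Creative Writing Bench, WildBench Creative, price in $/M input tokens, open source)
# Values copied from the table; None marks a missing WildBench Creative score.
ROWS = [
    ("Claude Opus 4.6", 92.0, 88.0, 5.00, False),
    ("GPT-5.2", 90.0, 86.0, 1.75, False),
    ("Claude Opus 4", 88.0, 84.0, 15.00, False),
    ("Claude Sonnet 4", 86.0, 82.0, 3.00, False),
    ("Gemini 2.5 Pro", 85.0, 82.0, 1.25, False),
    ("GPT-4o (2024-05-13)", 82.0, 78.0, 5.00, False),
    ("Mistral Large", 80.0, None, 2.00, True),
    ("Llama 4 Maverick", 78.0, None, 0.20, True),
    ("DeepSeek R1", 72.0, 68.0, 0.70, True),
]


def writing_avg(*scores):
    """Mean of the scores that are present (assumed rule, not confirmed by the page)."""
    available = [s for s in scores if s is not None]
    return round(mean(available), 1)


averages = {name: writing_avg(cwb, wild) for name, cwb, wild, _, _ in ROWS}

# Assumed selection rules for the three summary cards.
best_overall = max(averages, key=averages.get)
best_value = min(ROWS, key=lambda row: row[3])[0]
best_open_source = max(
    (row for row in ROWS if row[4]), key=lambda row: averages[row[0]]
)[0]

for name, avg in averages.items():
    print(f"{name}: {avg}")
print("Best Overall:", best_overall)
print("Best Value:", best_value)
print("Best Open Source:", best_open_source)
```

Under these assumptions the computed averages match the Writing Avg column for all nine rows. A model with only one benchmark score, such as Mistral Large, is averaged over that single score, so its Writing Avg is not directly comparable to models scored on both benchmarks.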

About Creative Writing Benchmarks

Creative Writing Bench uses expert judges to evaluate fiction, poetry, and narrative quality across multiple dimensions including originality, coherence, style, and emotional impact. WildBench Creative evaluates models on real user creative prompts from the wild, judged by GPT-4 for quality and faithfulness to instructions.

Creative writing quality is inherently subjective. These benchmarks capture one dimension of writing ability — your own preferences may differ.

Other Notable Models

These models don't have published creative writing scores yet but are widely used for writing tasks.

GPT-5.2 Pro

OpenAI · Quality: 93

GPT-5 Pro

OpenAI · Quality: 90

O4 Mini

OpenAI · Quality: 90

OpenAI · Quality: 88

O3 Pro

OpenAI · Quality: 88

GPT-5

OpenAI · Quality: 87

Qwen3 235B A22B

Alibaba · Quality: 87

Claude Opus 4.5

Anthropic · Quality: 86

Claude Sonnet 4.6

Anthropic · Quality: 86

Qwen3 Max

Alibaba · Quality: 85
