Creative Writing Bench

domain

Expert-judged creative writing quality across fiction, poetry, and narrative tasks

Models Tested: 10

Best Score: 92.0

Average Score: 83.7

Scale Range: 0–100

Weight: 0.8x

How It Works

Expert judges score each model's output on fiction, poetry, and narrative tasks, following the benchmark's standardised protocol. Scores are reported on a 0–100 scale.

Why It Matters

This benchmark gives a standardised way to compare the creative writing quality of AI models.

Limitations

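All benchmarks have limitations and should be considered alongside other evaluations. In the leaderboard below, the Source column for every score names the model's own provider, and measurement dates range from May 2024 to Feb 2026, so the results are not all from the same period.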

Leaderboard — Creative Writing Bench

#	Model	Provider	Score	Source	Measured
🥇	Claude Opus 4.6	Anthropic	92.0	Anthropic	Feb 2026
🥈	GPT-5.2	OpenAI	90.0	OpenAI	Dec 2025
🥉	Claude Opus 4	Anthropic	88.0	Anthropic	May 2025
4	Claude Sonnet 4	Anthropic	86.0	Anthropic	May 2025
5	Gemini 2.5 Pro Preview 06-05	Google	85.0	Google	Mar 2025
6	Grok 4	xAI	84.0	xAI	Jul 2025
7	GPT-4o (2024-05-13)	OpenAI	82.0	OpenAI	May 2024
8	Mistral Large	Mistral	80.0	Mistral	Mar 2025
9	Llama 4 Maverick	Meta	78.0	Meta	Apr 2025
10	R1	DeepSeek	72.0	DeepSeek	Jan 2025
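
The summary figures shown for this benchmark (models tested, best score, average score) can be cross-checked against the leaderboard rows. The snippet below is a minimal sketch of that check. The scores are copied from the table above; the variable names are illustrative, and rounding the average to one decimal place is an assumption about how the page displays it.

```python
from statistics import mean

# Scores copied from the leaderboard table above (0-100 scale).
scores = {
    "Claude Opus 4.6": 92.0,
    "GPT-5.2": 90.0,
    "Claude Opus 4": 88.0,
    "Claude Sonnet 4": 86.0,
    "Gemini 2.5 Pro Preview 06-05": 85.0,
    "Grok 4": 84.0,
    "GPT-4o (2024-05-13)": 82.0,
    "Mistral Large": 80.0,
    "Llama 4 Maverick": 78.0,
    "R1": 72.0,
}

models_tested = len(scores)
best_model = max(scores, key=scores.get)
best_score = scores[best_model]
# Assumption: the displayed average is the plain mean, rounded to one decimal.
average_score = round(mean(scores.values()), 1)

print(f"Models tested: {models_tested}")
print(f"Best score: {best_score} ({best_model})")
print(f"Average score: {average_score}")
```

Under these assumptions the computed values agree with the figures listed for the benchmark: 10 models, a best score of 92.0, and an average of 83.7.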