WildBench Creative

domain

The creative subset of WildBench: real user creative-writing prompts, with model responses judged by GPT-4.

Models Tested: 8
Best Score: 88.0
Average Score: 81.0
Scale Range: 0–100
Weight: 0.8x

How It Works

Models respond to real user creative-writing prompts drawn from the creative subset of WildBench, and GPT-4 judges the responses. Scores are reported on a 0–100 scale.

Why It Matters

This benchmark gives a standardised way to compare models on creative writing, using prompts that come from real users.

Limitations

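Scores rest on a single automated judge (GPT-4), and the leaderboard covers only 8 models, with measurements dated from May 2024 to Feb 2026. Like all benchmarks, it has limitations and should be considered alongside other evaluations.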

Leaderboard — WildBench Creative

#	Model	Provider	Score	Source	Measured
🥇	Claude Opus 4.6	Anthropic	88.0	Anthropic	Feb 2026
🥈	GPT-5.2	OpenAI	86.0	OpenAI	Dec 2025
🥉	Claude Opus 4	Anthropic	84.0	Anthropic	May 2025
4	Gemini 2.5 Pro Preview 06-05	Google	82.0	Google	Mar 2025
5	Claude Sonnet 4	Anthropic	82.0	Anthropic	May 2025
6	Grok 4	xAI	80.0	xAI	Jul 2025
7	GPT-4o (2024-05-13)	OpenAI	78.0	OpenAI	May 2024
8	R1	DeepSeek	68.0	DeepSeek	Jan 2025
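
The summary figures at the top of the page (models tested, best score, average score) can be recomputed from the leaderboard rows. The snippet below is a minimal sketch of that arithmetic, not the site's own code: the `leaderboard` dictionary and the `summarize` and `tied_models` helpers are hypothetical names introduced for illustration, and the scores are copied from the table above.

```python
from collections import defaultdict
from statistics import mean

# Scores copied from the leaderboard table (model name -> score on the 0-100 scale).
leaderboard = {
    "Claude Opus 4.6": 88.0,
    "GPT-5.2": 86.0,
    "Claude Opus 4": 84.0,
    "Gemini 2.5 Pro Preview 06-05": 82.0,
    "Claude Sonnet 4": 82.0,
    "Grok 4": 80.0,
    "GPT-4o (2024-05-13)": 78.0,
    "R1": 68.0,
}


def summarize(scores):
    """Return the headline statistics for a model -> score mapping."""
    values = list(scores.values())
    return {
        "models_tested": len(values),
        "best_score": max(values),
        "average_score": round(mean(values), 1),
    }


def tied_models(scores):
    """Group models that share exactly the same score."""
    by_score = defaultdict(list)
    for model, score in scores.items():
        by_score[score].append(model)
    return {score: models for score, models in by_score.items() if len(models) > 1}


summary = summarize(leaderboard)
ties = tied_models(leaderboard)
print(summary)  # {'models_tested': 8, 'best_score': 88.0, 'average_score': 81.0}
print(ties)     # {82.0: ['Gemini 2.5 Pro Preview 06-05', 'Claude Sonnet 4']}
```

The recomputed values match the figures shown on the page. The tie check also shows that the models listed at ranks 4 and 5 have the same score of 82.0, so their relative order does not reflect a score difference.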