Chatbot Arena Elo

conversational

Chatbot Arena (by LMSYS) is a crowdsourced evaluation where real users have blind conversations with two anonymous models and vote for which response they prefer. Results are compiled into an Elo rating system.

Models Tested: 27
Best Score: 1375.0
Average Score: 1306.0
Scale Range: 800–1400
Weight: 1.5x
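
The Scale Range and Weight figures hint at how this page might fold the benchmark into a cross-benchmark composite. As a minimal sketch, assuming scores are min-max normalised over the 800–1400 range and then multiplied by the 1.5x weight (the page does not state its formula, and `weighted_normalized_score` is a hypothetical name):

```python
# Hypothetical composite contribution: min-max normalise an Arena rating
# over the page's stated scale range, then apply the benchmark weight.
# The page does not document its actual aggregation formula.
SCALE_MIN, SCALE_MAX = 800.0, 1400.0
WEIGHT = 1.5

def weighted_normalized_score(rating: float) -> float:
    """Map a rating to [0, 1] over the scale range, then weight it."""
    clipped = min(max(rating, SCALE_MIN), SCALE_MAX)
    return WEIGHT * (clipped - SCALE_MIN) / (SCALE_MAX - SCALE_MIN)

print(weighted_normalized_score(1375.0))  # best score: 1.5 * 0.9583... ≈ 1.4375
```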

How It Works

Users interact with two anonymous models side by side and pick the better response. Votes are aggregated using the Bradley-Terry model (the statistical model underlying chess Elo ratings) to produce a ranking. Over 2 million human votes have been collected.
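
To make the aggregation concrete, below is a minimal sketch of fitting Bradley-Terry strengths to pairwise votes with the classic minorise-maximise iteration, then mapping them onto an Elo-style scale. The vote data, model names, iteration count, and 1000-point anchor are all illustrative assumptions; the real leaderboard fits the same model over millions of votes with a far more robust pipeline.

```python
import math
from collections import defaultdict

# Toy pairwise votes (winner, loser) -- illustrative only, not real Arena data.
votes = [
    ("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a"),
    ("model_a", "model_c"), ("model_a", "model_c"), ("model_c", "model_b"),
]

models = sorted({m for pair in votes for m in pair})
wins = defaultdict(float)   # total wins per model
games = defaultdict(float)  # games per unordered pair of models
for winner, loser in votes:
    wins[winner] += 1.0
    games[frozenset((winner, loser))] += 1.0

# Bradley-Terry strengths p_i via the minorise-maximise iteration:
#   p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j)
# This is the maximum-likelihood fit of P(i beats j) = p_i / (p_i + p_j).
p = {m: 1.0 for m in models}
for _ in range(200):
    p_new = {}
    for i in models:
        denom = sum(
            games[frozenset((i, j))] / (p[i] + p[j])
            for j in models
            if j != i and games[frozenset((i, j))] > 0
        )
        p_new[i] = wins[i] / denom if denom > 0 else p[i]
    # Strengths are scale-free; pin the geometric mean to 1.
    gmean = math.exp(sum(math.log(v) for v in p_new.values()) / len(p_new))
    p = {m: v / gmean for m, v in p_new.items()}

# Elo convention: a 400-point rating gap means 10:1 win odds, so
#   rating_i = anchor + (400 / ln 10) * ln(p_i).
# The anchor (1000 here) is arbitrary; only rating differences matter.
ratings = {m: 1000.0 + 400.0 / math.log(10.0) * math.log(p[m]) for m in models}

for m in sorted(ratings, key=ratings.get, reverse=True):
    print(f"{m}: {ratings[m]:.0f}")
```

One property worth noting: unlike sequential chess-style Elo updates, the maximum-likelihood Bradley-Terry fit does not depend on the order in which votes arrive, which matters when votes trickle in from many users over months.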

Why It Matters

Chatbot Arena is widely considered the most reliable benchmark for overall model quality because it captures real human preferences across diverse, unconstrained conversations — not just narrow academic tasks.

Limitations

Ratings reflect crowd preferences, which may favour style over substance. Votes skew heavily toward English-language conversations, and the user base may not be representative of all use cases. Models can also be optimised for the short, arena-style exchanges the format rewards.

Leaderboard — Chatbot Arena Elo

#   Model                          Provider   Score
🥇  Gemini 3.1 Pro Preview         Google     1375
🥈  GPT-5.2                        OpenAI     1370
🥉  Claude Opus 4.6                Anthropic  1365
4   GPT-5                          OpenAI     1355
5   Claude Sonnet 4.6              Anthropic  1350
6   Grok 4                         xAI        1345
7   Gemini 2.5 Pro Preview 06-05   Google     1340
8   o3                             OpenAI     1337
9   Claude Opus 4                  Anthropic  1330
10  Grok 3 Beta                    xAI        1329
11  Qwen3 235B A22B                Alibaba    1320
12  R1                             DeepSeek   1318
13  Claude Sonnet 4                Anthropic  1310
14  DeepSeek V3 0324               DeepSeek   1310
15  Gemini 2.5 Flash               Google     1300
16  Mistral Large                  Mistral    1295
17  Llama 4 Maverick               Meta       1290
18  GPT-4o (2024-05-13)            OpenAI     1285
19  GPT-4.1                        OpenAI     1283
20  Command A                      Cohere     1280
21  DeepSeek V3                    DeepSeek   1275
22  Gemini 2.0 Flash               Google     1270
23  Claude 3.5 Haiku               Anthropic  1260
24  Llama 3.3 70B Instruct         Meta       1250
25  Qwen2.5 72B Instruct           Alibaba    1245
26  GPT-4o-mini (2024-07-18)       OpenAI     1240
27  Mistral Small 3.1 24B          Mistral    1235