LegalBench
LegalBench tests legal reasoning across 162 tasks designed by legal professionals, covering issue-spotting, rule-recall, rule-application, interpretation, and rhetorical understanding.
- Models Tested: 10
- Best Score: 88.0
- Average Score: 81.3
- Scale Range: 0–100
- Weight: 0.8x
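The page lists a 0.8x weight for LegalBench but does not state how that weight enters the site's aggregate ranking. A minimal sketch, assuming the weight simply multiplies each benchmark's 0–100 score in a weighted mean (the function name and the second benchmark are illustrative, not from the source):

```python
# Hypothetical sketch of a weighted aggregate score. The site's actual
# aggregation formula is not documented; this assumes a weighted mean
# over per-benchmark scores, each on a 0-100 scale.

def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of benchmark scores."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Illustrative values only: LegalBench carries a 0.8x weight,
# a second (made-up) benchmark carries the default 1.0x.
scores = {"LegalBench": 88.0, "OtherBench": 90.0}
weights = {"LegalBench": 0.8, "OtherBench": 1.0}

print(round(weighted_average(scores, weights), 2))  # → 89.11
```

Under this assumption, a 0.8x weight means a strong LegalBench result pulls the aggregate up less than an equally strong result on a 1.0x benchmark would.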
How It Works
Models perform diverse legal tasks: identifying relevant legal issues, recalling specific rules, applying rules to fact patterns, interpreting statutes, and analysing legal rhetoric. Tasks are hand-crafted by practising lawyers.
Why It Matters
Legal AI is a rapidly growing field, but one that has been poorly benchmarked. LegalBench provides the first comprehensive evaluation of legal reasoning capabilities, created by legal domain experts rather than AI researchers.
Limitations
Focused on US/common law legal systems. Tasks are simplified compared to real legal practice. Does not test legal writing, case strategy, or client interaction skills.
Leaderboard — LegalBench
| # | Model | Provider | Score |
|---|---|---|---|
| 🥇 | GPT-5.2 | OpenAI | 88.0 |
| 🥈 | Claude Opus 4.6 | Anthropic | 86.0 |
| 🥉 | o3 | OpenAI | 85.0 |
| 4 | Gemini 2.5 Pro Preview 06-05 | Google | 84.0 |
| 5 | Grok 4 | xAI | 83.0 |
| 6 | Claude Opus 4 | Anthropic | 82.0 |
| 7 | Claude Sonnet 4 | Anthropic | 79.0 |
| 8 | GPT-4o | OpenAI | 78.0 |
| 9 | R1 | DeepSeek | 76.0 |
| 10 | Llama 4 Maverick | Meta | 72.0 |