LegalBench

Category: domain

LegalBench tests legal reasoning across 162 tasks designed by legal professionals, covering issue-spotting, rule-recall, rule-application, interpretation, and rhetorical understanding.

Models Tested: 10
Best Score: 88.0
Average Score: 81.3
Scale Range: 0–100
Weight: 0.8x

How It Works

Models perform diverse legal tasks: identifying relevant legal issues, recalling specific rules, applying rules to fact patterns, interpreting statutes, and analysing legal rhetoric. Tasks are hand-crafted by practising lawyers.
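As a rough illustration of how per-task grading across these reasoning categories could work, here is a minimal sketch. The task names, answers, and category labels below are hypothetical examples, not actual LegalBench data; the real benchmark defines 162 distinct tasks with their own answer formats.

```python
# Illustrative per-category scoring for a LegalBench-style evaluation.
# Records and category labels are made up for demonstration only.
from collections import defaultdict

# Each record: (reasoning category, model answer, gold answer)
results = [
    ("issue-spotting", "Yes", "Yes"),
    ("issue-spotting", "No", "Yes"),
    ("rule-recall", "hearsay", "hearsay"),
    ("rule-application", "liable", "not liable"),
    ("interpretation", "ambiguous", "ambiguous"),
    ("rhetorical-understanding", "dicta", "dicta"),
]

def category_accuracy(records):
    """Return a dict mapping each category to its fraction of correct answers."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, predicted, gold in records:
        total[category] += 1
        correct[category] += int(predicted == gold)
    return {c: correct[c] / total[c] for c in total}

scores = category_accuracy(results)
```

A real harness would load each task's prompts and gold labels, query the model, and aggregate accuracies into a single 0–100 score.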

Why It Matters

Legal AI is a rapidly growing field but remains poorly benchmarked. LegalBench provides the first comprehensive evaluation of legal reasoning capabilities, created by legal domain experts rather than AI researchers.

Limitations

Focused on US/common law legal systems. Tasks are simplified compared to real legal practice. Does not test legal writing, case strategy, or client interaction skills.

Leaderboard — LegalBench

| # | Model | Provider | Score |
|---|-------|----------|-------|
| 🥇 | GPT-5.2 | OpenAI | 88.0 |
| 🥈 | Claude Opus 4.6 | Anthropic | 86.0 |
| 🥉 | o3 | OpenAI | 85.0 |
| 4 | Gemini 2.5 Pro Preview 06-05 | Google | 84.0 |
| 5 | Grok 4 | xAI | 83.0 |
| 6 | Claude Opus 4 | Anthropic | 82.0 |
| 7 | Claude Sonnet 4 | Anthropic | 79.0 |
| 8 | GPT-4o | OpenAI | 78.0 |
| 9 | R1 | DeepSeek | 76.0 |
| 10 | Llama 4 Maverick | Meta | 72.0 |
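The summary statistics reported for this benchmark can be checked directly against the leaderboard: assuming the Average Score is the unweighted mean of the ten listed scores, the numbers line up exactly.

```python
# Leaderboard scores as listed above, top rank first.
scores = [88.0, 86.0, 85.0, 84.0, 83.0, 82.0, 79.0, 78.0, 76.0, 72.0]

best = max(scores)               # 88.0, the reported Best Score
average = sum(scores) / len(scores)  # 813.0 / 10 = 81.3, the reported Average Score
```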