LegalBench
LegalBench tests legal reasoning across 162 tasks designed by legal professionals, covering issue-spotting, rule-recall, rule-application, interpretation, and rhetorical understanding.
- Models Tested: 10
- Best Score: 88.0
- Average Score: 81.3
- Scale Range: 0–100
- Weight: 0.8x
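The page lists a 0.8x weight for LegalBench but does not state how that weight enters the site's aggregate ranking. A minimal sketch, assuming the weight simply multiplies each benchmark's 0–100 score in a weighted mean (the function name and the second benchmark are illustrative, not from the source):

```python
# Hypothetical sketch of a weighted aggregate score. The site's actual
# aggregation formula is not documented; this assumes a weighted mean
# over per-benchmark scores, each on a 0-100 scale.

def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of benchmark scores."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Illustrative values only: LegalBench carries a 0.8x weight,
# a second (made-up) benchmark carries the default 1.0x.
scores = {"LegalBench": 88.0, "OtherBench": 90.0}
weights = {"LegalBench": 0.8, "OtherBench": 1.0}

print(round(weighted_average(scores, weights), 2))  # → 89.11
```

Under this assumption, a 0.8x weight means a strong LegalBench result pulls the aggregate up less than an equally strong result on a 1.0x benchmark would.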
How It Works
Models perform diverse legal tasks: identifying relevant legal issues, recalling specific rules, applying rules to fact patterns, interpreting statutes, and analysing legal rhetoric. Tasks are hand-crafted by practising lawyers.
Why It Matters
Legal AI is a rapidly growing field, but one that has been poorly benchmarked. LegalBench provides the first comprehensive evaluation of legal reasoning capabilities, created by legal domain experts rather than AI researchers.
Limitations
Focused on US/common law legal systems. Tasks are simplified compared to real legal practice. Does not test legal writing, case strategy, or client interaction skills.
Leaderboard — LegalBench
| # | Model | Provider | Score |
|---|---|---|---|
| 🥇 | GPT-5.2 | OpenAI | 88.0 |
| 🥈 | Claude Opus 4.6 | Anthropic | 86.0 |
| 🥉 | o3 | OpenAI | 85.0 |
| 4 | Gemini 2.5 Pro Preview 06-05 | Google | 84.0 |
| 5 | Grok 4 | xAI | 83.0 |
| 6 | Claude Opus 4 | Anthropic | 82.0 |
| 7 | Claude Sonnet 4 | Anthropic | 79.0 |
| 8 | GPT-4o | OpenAI | 78.0 |
| 9 | R1 | DeepSeek | 76.0 |
| 10 | Llama 4 Maverick | Meta | 72.0 |