Best AI Models for Legal
9 models ranked by their average score on LegalBench, which covers issue-spotting, rule-recall, legal interpretation, and rhetorical analysis.
Best Overall
GPT-5.2
OpenAI · Avg: 88.0
Best Value
Llama 4 Maverick
Meta · $0.15/M input tokens
Best Open Source
DeepSeek R1
DeepSeek · Avg: 76.0
| # | Model | Provider | Legal Avg | Price ($/M input tokens) |
|---|---|---|---|---|
| 1 | GPT-5.2 | OpenAI | 88.0 | $1.75 |
| 2 | Claude Opus 4.6 | Anthropic | 86.0 | $15.00 |
| 3 | O3 | OpenAI | 85.0 | $2.00 |
| 4 | Gemini 2.5 Pro | Google | 84.0 | $1.25 |
| 5 | Claude Opus 4 | Anthropic | 82.0 | $15.00 |
| 6 | Claude Sonnet 4 | Anthropic | 79.0 | $3.00 |
| 7 | GPT-4o | OpenAI | 78.0 | $2.50 |
| 8 | DeepSeek R1 (OSS) | DeepSeek | 76.0 | $0.70 |
| 9 | Llama 4 Maverick (OSS) | Meta | 72.0 | $0.15 |
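The page does not say how "Best Value" is chosen. As a rough sanity check, the sketch below divides each model's Legal Avg by its listed price, assuming the price is in dollars per million input tokens. The `points_per_dollar` metric and the function names are illustrative assumptions, not the site's actual method.

```python
# Scores and prices copied from the ranking table.
# Assumption: price is USD per million input tokens.
MODELS = [
    ("GPT-5.2", 88.0, 1.75),
    ("Claude Opus 4.6", 86.0, 15.00),
    ("O3", 85.0, 2.00),
    ("Gemini 2.5 Pro", 84.0, 1.25),
    ("Claude Opus 4", 82.0, 15.00),
    ("Claude Sonnet 4", 79.0, 3.00),
    ("GPT-4o", 78.0, 2.50),
    ("DeepSeek R1", 76.0, 0.70),
    ("Llama 4 Maverick", 72.0, 0.15),
]


def points_per_dollar(score: float, price: float) -> float:
    """Hypothetical value metric: benchmark points per dollar of input price."""
    return score / price


def rank_by_value(models):
    """Return the models sorted from best to worst points-per-dollar."""
    return sorted(
        models,
        key=lambda m: points_per_dollar(m[1], m[2]),
        reverse=True,
    )


if __name__ == "__main__":
    for name, score, price in rank_by_value(MODELS):
        value = points_per_dollar(score, price)
        print(f"{name:<18} {value:7.1f} points per $")
```

Under this metric Llama 4 Maverick ranks first, which matches the "Best Value" pick. A ratio like this favors very cheap models heavily, so it is best read alongside the absolute score rather than on its own.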
About Legal AI Benchmarks
LegalBench evaluates AI across 162 legal reasoning tasks spanning six categories: issue-spotting, rule-recall, rule-application, rule-conclusion, interpretation, and rhetorical understanding. Scores above 85% indicate strong legal reasoning capability.
Note: AI models should not be used as a substitute for qualified legal counsel. Benchmark scores measure reasoning ability, not legal advice quality.
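To show how the 85% guideline applies to the ranked models, here is a minimal sketch that filters the table's Legal Avg values. It reads "above 85%" as a strict comparison, which is an assumption; the names `STRONG_THRESHOLD` and `strong_models` are hypothetical.

```python
# Legal Avg values copied from the ranking table.
LEGAL_AVG = {
    "GPT-5.2": 88.0,
    "Claude Opus 4.6": 86.0,
    "O3": 85.0,
    "Gemini 2.5 Pro": 84.0,
    "Claude Opus 4": 82.0,
    "Claude Sonnet 4": 79.0,
    "GPT-4o": 78.0,
    "DeepSeek R1": 76.0,
    "Llama 4 Maverick": 72.0,
}

# Assumption: "above 85%" means strictly greater than 85.0.
STRONG_THRESHOLD = 85.0


def strong_models(scores, threshold=STRONG_THRESHOLD):
    """Return the names of models whose score is strictly above the threshold."""
    return [name for name, score in scores.items() if score > threshold]


if __name__ == "__main__":
    print(strong_models(LEGAL_AVG))
```

With a strict comparison, O3 at exactly 85.0 falls just outside the "strong" group. Using `>=` instead would include it.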
Other Notable Models
These models don't have published legal benchmark scores yet.
GPT-5.2 Pro
OpenAI · Quality: 93
GPT-5 Pro
OpenAI · Quality: 90
O4 Mini
OpenAI · Quality: 90
O3 Pro
OpenAI · Quality: 88
GPT-5
OpenAI · Quality: 87
Qwen3 235B A22B
Alibaba · Quality: 87
Claude Opus 4.5
Anthropic · Quality: 86
Claude Sonnet 4.6
Anthropic · Quality: 86
Qwen3 Max
Alibaba · Quality: 85
o1
OpenAI · Quality: 84