FinanceBench

domain

Open-ended financial analysis — 150 questions over 10-K and 10-Q filings

Models Tested: 8
Best Score: 82.0
Average Score: 76.0
Scale Range: 0–100
Weight: 0.8x

How It Works

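Each model answers the 150 open-ended questions, which are drawn from annual (10-K) and quarterly (10-Q) SEC filings, under the benchmark's standardised protocol. Results are reported on a 0–100 scale, where higher is better.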

Why It Matters

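It tests open-ended financial analysis grounded in public-company filings, and because every model is scored under the same protocol, the results can be compared directly across models.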

Limitations

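With only 150 questions, a single question is worth about 0.67 points if the score is the percentage answered correctly, so the 2-point gap between first and second place corresponds to about three questions. Read these results alongside other evaluations rather than in isolation.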
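To put a rough number on the sampling noise of a 150-question benchmark, the sketch below computes a binomial standard error in score points. It assumes each score is the percentage of questions answered correctly and treats questions as independent trials; neither assumption is stated on this page, and `binomial_se_points` is a hypothetical helper written for illustration.

```python
import math

N_QUESTIONS = 150  # question count from the benchmark description


def binomial_se_points(score: float, n: int = N_QUESTIONS) -> float:
    """Standard error, in score points, of a 0-100 accuracy score over n questions.

    Assumes (not confirmed by the page) that the score is percent correct
    and that questions behave like independent Bernoulli trials.
    """
    p = score / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


# Compare the top two leaderboard entries.
for name, score in [("GPT-5.2", 82.0), ("Claude Opus 4.6", 80.0)]:
    print(f"{name}: {score:.1f} ± {binomial_se_points(score):.1f}")
# GPT-5.2: 82.0 ± 3.1
# Claude Opus 4.6: 80.0 ± 3.3
```

Under these assumptions the standard error (about 3 points) is larger than the gap between the top two models, which is why small differences on this leaderboard should be read cautiously.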

Leaderboard — FinanceBench

Rank  Model                          Provider   Score
1     GPT-5.2                        OpenAI     82.0
2     Claude Opus 4.6                Anthropic  80.0
3     o3                             OpenAI     79.0
4     Gemini 2.5 Pro Preview 06-05   Google     77.0
5     Grok 4                         xAI        76.0
6     Claude Opus 4                  Anthropic  74.0
7     R1                             DeepSeek   72.0
8     GPT-4o                         OpenAI     68.0
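The summary figures at the top of the page (models tested, best score, average score) appear to be derived from this table. A minimal Python check, assuming Average Score is the unweighted mean of the listed scores (the page does not say how it is computed, and the 0.8x weight is not applied here):

```python
from statistics import mean

# Leaderboard rows as listed on this page: (model, provider, score).
LEADERBOARD = [
    ("GPT-5.2", "OpenAI", 82.0),
    ("Claude Opus 4.6", "Anthropic", 80.0),
    ("o3", "OpenAI", 79.0),
    ("Gemini 2.5 Pro Preview 06-05", "Google", 77.0),
    ("Grok 4", "xAI", 76.0),
    ("Claude Opus 4", "Anthropic", 74.0),
    ("R1", "DeepSeek", 72.0),
    ("GPT-4o", "OpenAI", 68.0),
]

scores = [score for _, _, score in LEADERBOARD]
models_tested = len(LEADERBOARD)
best_score = max(scores)
average_score = mean(scores)  # assumption: plain unweighted mean

print(f"Models tested: {models_tested}")  # Models tested: 8
print(f"Best score: {best_score:.1f}")  # Best score: 82.0
print(f"Average score: {average_score:.1f}")  # Average score: 76.0
```

The computed values match the figures reported on the page, which is consistent with (though does not prove) the unweighted-mean assumption.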