FinQA

domain

Financial question answering over earnings reports — numerical reasoning on real SEC filings


Models Tested: 10
Best Score: 85.0
Average Score: 77.7
Scale Range: 0–100
Weight: 0.8x

How It Works

Models are evaluated according to the benchmark's standardised protocol, and results are reported as a score on the 0–100 scale.
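The page does not spell out the scoring protocol, so the following is only a minimal sketch of one common way to score numerical question answering: count an answer as correct when it falls within a small relative tolerance of the reference value, then report the percentage correct on a 0–100 scale. The function names, the tolerance of `1e-3`, and the sample values are all assumptions made for illustration, not details taken from this leaderboard.

```python
import math
from typing import Sequence


def is_correct(predicted: float, gold: float, rel_tol: float = 1e-3) -> bool:
    """Treat a numeric answer as correct if it is within rel_tol of the gold value.

    The tolerance is an assumed value, not one documented by the leaderboard.
    """
    return math.isclose(predicted, gold, rel_tol=rel_tol, abs_tol=1e-9)


def accuracy_score(
    predictions: Sequence[float],
    golds: Sequence[float],
    rel_tol: float = 1e-3,
) -> float:
    """Return the percentage of correct answers on a 0-100 scale."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    correct = sum(is_correct(p, g, rel_tol) for p, g in zip(predictions, golds))
    return 100.0 * correct / len(golds)


# Hypothetical model answers and reference answers (not real FinQA data).
example_predictions = [0.1432, 25.0, 3.7, 118.0]
example_golds = [0.1433, 25.0, 4.1, 118.0]

example_score = accuracy_score(example_predictions, example_golds)
print(example_score)  # 75.0: three of the four answers fall within tolerance
```

A relative tolerance avoids penalising harmless rounding differences (0.1432 versus 0.1433) while still rejecting answers that are materially wrong (3.7 versus 4.1).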

Why It Matters

FinQA offers a standardised way to compare how well AI models handle numerical reasoning over real financial filings.

Limitations

Like all benchmarks, FinQA has limitations, and its scores should be considered alongside other evaluations.

Leaderboard — FinQA

| # | Model | Provider | Score |
|---|---|---|---|
| 🥇 | GPT-5.2 | OpenAI | 85.0 |
| 🥈 | Claude Opus 4.6 | Anthropic | 83.0 |
| 🥉 | o3 | OpenAI | 82.0 |
| 4 | Gemini 2.5 Pro Preview 06-05 | Google | 80.0 |
| 5 | Grok 4 | xAI | 79.0 |
| 6 | Claude Opus 4 | Anthropic | 78.0 |
| 7 | R1 | DeepSeek | 76.0 |
| 8 | Claude Sonnet 4 | Anthropic | 74.0 |
| 9 | GPT-4o | OpenAI | 72.0 |
| 10 | Llama 4 Maverick | Meta | 68.0 |
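The summary figures shown for this benchmark (models tested, best score, average score) can be reproduced from the leaderboard rows. The sketch below copies the scores from the leaderboard; the `summarize` helper and its field names are illustrative, not part of any published tooling.

```python
from statistics import mean

# Scores copied from the FinQA leaderboard rows.
leaderboard = {
    "GPT-5.2": 85.0,
    "Claude Opus 4.6": 83.0,
    "o3": 82.0,
    "Gemini 2.5 Pro Preview 06-05": 80.0,
    "Grok 4": 79.0,
    "Claude Opus 4": 78.0,
    "R1": 76.0,
    "Claude Sonnet 4": 74.0,
    "GPT-4o": 72.0,
    "Llama 4 Maverick": 68.0,
}


def summarize(scores: dict[str, float]) -> dict[str, object]:
    """Compute the headline statistics for a leaderboard (hypothetical helper)."""
    best_model = max(scores, key=scores.get)
    return {
        "models_tested": len(scores),
        "best_model": best_model,
        "best_score": scores[best_model],
        "average_score": round(mean(scores.values()), 1),
    }


summary = summarize(leaderboard)
print(summary)
```

Run against the rows above, this gives 10 models tested, a best score of 85.0 (GPT-5.2), and an average of 77.7, matching the figures displayed for the benchmark.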