FinQA
Domain: Financial question answering over earnings reports — numerical reasoning on real SEC filings
View paper / source

- Models tested: 10
- Best score: 85.0
- Average score: 77.7
- Scale range: 0–100
- Weight: 0.8x
How It Works
Models answer questions grounded in the tables and text of real company filings, with each question requiring multi-step arithmetic over the reported figures. Scoring follows the benchmark's standardised protocol: a model's final numeric answer is checked against the reference answer.
Why It Matters
Financial analysis demands precise multi-step arithmetic over dense tables and prose, where a small numerical slip invalidates the answer. FinQA isolates that skill on real filings, giving a standardised basis for comparing models on exact numerical reasoning rather than fluent approximation.
Limitations
FinQA covers a single domain and question style, and high scores may reflect familiarity with SEC-filing formats rather than general numerical reasoning. As with any benchmark, results should be weighed alongside other evaluations.
Leaderboard — FinQA
| # | Model | Provider | Score |
|---|---|---|---|
| 🥇 | GPT-5.2 | OpenAI | 85.0 |
| 🥈 | Claude Opus 4.6 | Anthropic | 83.0 |
| 🥉 | o3 | OpenAI | 82.0 |
| 4 | Gemini 2.5 Pro Preview 06-05 | Google | 80.0 |
| 5 | Grok 4 | xAI | 79.0 |
| 6 | Claude Opus 4 | Anthropic | 78.0 |
| 7 | R1 | DeepSeek | 76.0 |
| 8 | Claude Sonnet 4 | Anthropic | 74.0 |
| 9 | GPT-4o | OpenAI | 72.0 |
| 10 | Llama 4 Maverick | Meta | 68.0 |
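The summary statistics above follow directly from the leaderboard scores; a minimal sketch that recomputes them (how the 0.8x weight feeds into any composite index is an assumption — the aggregation formula is not specified on this page):

```python
# Leaderboard scores, copied from the FinQA table above.
scores = {
    "GPT-5.2": 85.0,
    "Claude Opus 4.6": 83.0,
    "o3": 82.0,
    "Gemini 2.5 Pro Preview 06-05": 80.0,
    "Grok 4": 79.0,
    "Claude Opus 4": 78.0,
    "R1": 76.0,
    "Claude Sonnet 4": 74.0,
    "GPT-4o": 72.0,
    "Llama 4 Maverick": 68.0,
}

best = max(scores.values())                   # Best score: 85.0
average = sum(scores.values()) / len(scores)  # Average score: 77.7

# Hypothetical: if the 0.8x weight scales this benchmark's contribution
# to a composite index, each model's weighted contribution would be:
weighted = {model: 0.8 * s for model, s in scores.items()}

print(best, round(average, 1))  # → 85.0 77.7
```

Recomputing the cards from the table is a quick sanity check that the page's summary numbers and leaderboard rows are consistent.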