FinanceBench

domain

Open-ended financial analysis — 150 questions over 10-K and 10-Q filings

Models Tested: 8
Best Score: 82.0
Average Score: 76.0
Scale Range: 0–100
Weight: 0.8x

How It Works

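Models answer 150 open-ended financial analysis questions over 10-K and 10-Q filings, following the benchmark's standardised protocol. Scores are reported on a 0–100 scale.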

Why It Matters

This benchmark helps compare how well AI models handle open-ended financial analysis of company filings in a standardised way.

Limitations

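All benchmarks have limitations and should be considered alongside other evaluations. The scores in the leaderboard are sourced from each model's own provider and were measured at different times (May 2024 to Feb 2026), so small differences between models should be read with caution.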

Leaderboard — FinanceBench

#	Model	Provider	Score	Source	Measured
🥇	GPT-5.2	OpenAI	82.0	OpenAI	Dec 2025
🥈	Claude Opus 4.6	Anthropic	80.0	Anthropic	Feb 2026
🥉	o3	OpenAI	79.0	OpenAI	Apr 2025
4	Gemini 2.5 Pro Preview 06-05	Google	77.0	Google	Mar 2025
5	Grok 4	xAI	76.0	xAI	Jul 2025
6	Claude Opus 4	Anthropic	74.0	Anthropic	May 2025
7	R1	DeepSeek	72.0	DeepSeek	Jan 2025
8	GPT-4o (2024-05-13)	OpenAI	68.0	OpenAI	May 2024
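
The summary figures on this page (models tested, best score, average score) can be re-derived from the leaderboard rows. The following is a minimal sketch in Python, assuming the average is a plain unweighted mean of the eight listed scores. The `leaderboard` list and the `summarize` helper are illustrative names, not part of the benchmark or of this site.

```python
from statistics import mean

# Scores copied from the leaderboard table above: (model, score).
leaderboard = [
    ("GPT-5.2", 82.0),
    ("Claude Opus 4.6", 80.0),
    ("o3", 79.0),
    ("Gemini 2.5 Pro Preview 06-05", 77.0),
    ("Grok 4", 76.0),
    ("Claude Opus 4", 74.0),
    ("R1", 72.0),
    ("GPT-4o (2024-05-13)", 68.0),
]


def summarize(rows):
    """Return the headline statistics for a list of (model, score) rows.

    Hypothetical helper: assumes an unweighted mean, rounded to one decimal.
    """
    best_model, best_score = max(rows, key=lambda row: row[1])
    return {
        "models_tested": len(rows),
        "best_model": best_model,
        "best_score": best_score,
        "average_score": round(mean(score for _, score in rows), 1),
    }


summary = summarize(leaderboard)
print(summary)
```

Under that assumption the result matches the figures shown on the page: 8 models, a best score of 82.0, and an average of 76.0.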