
Best AI for Math

Models are ranked by mathematical reasoning performance on MATH-500, GSM8K, and AIME 2025. The composite score is the average of whichever of these benchmarks a model has results for.

Rank  Model                   Provider    Score  Benchmark results  Input $/1M tokens  Open Source
1     O3 Pro                  OpenAI      97     98, 96.7           $20                Proprietary
2     O4 Mini                 OpenAI      95     96.3, 92.7         $1.1               Proprietary
3     Grok 4                  xAI         95     95                 $3                 Proprietary
4     O3                      OpenAI      94     96.7, 91.6         $2                 Proprietary
5     Qwen3 235B A22B         Alibaba     92     92                 $0.455             ✓ Open
6     DeepSeek R1             DeepSeek    89     97.3, 79.8         $0.7               ✓ Open
7     Claude Opus 4           Anthropic   89     88.7               $15                Proprietary
8     Grok 3 Beta             xAI         88     91.5, 83.9         $3                 Proprietary
9     Gemini 2.5 Pro          Google      88     90.2, 86.7         $1.25              Proprietary
10    Claude Sonnet 4         Anthropic   85     85.4               $3                 Proprietary
11    QwQ 32B                 Alibaba     85     90.6, 79.5         $0.15              ✓ Open
12    GPT-4.1                 OpenAI      83     83                 $2                 Proprietary
13    Gemini 2.5 Flash        Google      82     82.3               $0.3               Proprietary
14    Qwen2.5 72B Instruct    Alibaba     80     80                 $0.12              ✓ Open
15    DeepSeek V3             DeepSeek    78     78.3               $0.32              ✓ Open
16    GPT-4o (extended)       OpenAI      77     76.6               $6                 Proprietary
17    GPT-5.2                 OpenAI      45     n/a                $1.75              Proprietary
18    Claude Opus 4.6         Anthropic   45     n/a                $15                Proprietary
19    GPT-5                   OpenAI      44     n/a                $1.25              Proprietary
20    Claude Sonnet 4.6       Anthropic   43     n/a                $3                 Proprietary
21    DeepSeek V3.2           DeepSeek    43     n/a                $0.2               ✓ Open
22    Mistral Large           Mistral     43     n/a                $2                 ✓ Open
23    Claude 3.5 Haiku        Anthropic   41     n/a                $0.8               Proprietary
24    Command A               Cohere      41     n/a                $2.5               ✓ Open
25    Gemini 2.0 Flash        Google      41     n/a                $0.1               Proprietary
26    GPT-4o-mini             OpenAI      40     n/a                $0.15              Proprietary
27    Command R+ (08-2024)    Cohere      40     n/a                $2.5               ✓ Open
28    Llama 3.3 70B Instruct  Meta        40     n/a                $0.1               ✓ Open
29    GPT-5 Nano              OpenAI      39     n/a                $0.05              Proprietary
30    Gemini 2.5 Flash Lite   Google      39     n/a                $0.1               Proprietary
31    Nova Pro 1.0            Amazon      39     n/a                $0.8               Proprietary
32    Gemini 2.0 Flash Lite   Google      38     n/a                $0.075             Proprietary
33    Mistral Small 3.1 24B   Mistral     38     n/a                $0.35              ✓ Open
34    GPT-4.1 Nano            OpenAI      38     n/a                $0.1               Proprietary
35    Llama 4 Maverick        Meta        38     n/a                $0.15              ✓ Open
36    Reka Flash 3            Reka        37     n/a                $0.1               Proprietary
37    Sonar                   Perplexity  37     n/a                $1                 Proprietary
38    Command R (08-2024)     Cohere      37     n/a                $0.15              ✓ Open
39    Mistral Nemo            Mistral     36     n/a                $0.02              ✓ Open
40    Nova Lite 1.0           Amazon      36     n/a                $0.06              Proprietary

Composite score averages available math benchmarks (MATH-500, GSM8K, AIME 2025). Models without benchmark data use a quality-weighted estimate.
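The composite rule described above can be sketched in a few lines of Python. This is a minimal sketch that rests on two assumptions the page does not state. First, the composite is an unweighted mean of the available results. Second, the mean is rounded half up to a whole number. Both assumptions agree with every row that lists benchmark results. For example, O4 Mini's 96.3 and 92.7 average to 94.5, which the table shows as 95. The helper name `composite_score` is hypothetical. The page does not describe how the quality-weighted estimate works, so the sketch returns `None` for models without results.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Iterable, Optional


def composite_score(results: Iterable[float]) -> Optional[int]:
    """Hypothetical helper: mean of a model's available math benchmark results.

    Assumes an unweighted mean over whichever of MATH-500, GSM8K and
    AIME 2025 the model has, rounded half up to a whole number.
    Returns None when no result is available, because the page's
    quality-weighted estimate for that case is not specified.
    """
    # Convert via str() so 96.3 becomes exactly Decimal("96.3"), not a binary float.
    values = [Decimal(str(r)) for r in results]
    if not values:
        return None
    mean = sum(values) / len(values)
    # Python's built-in round() rounds 94.5 to 94 (round half to even),
    # so round half up explicitly instead.
    return int(mean.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Results as listed in the table; no mapping of values to specific benchmarks is assumed.
print(composite_score([96.3, 92.7]))  # O4 Mini -> 95
print(composite_score([97.3, 79.8]))  # DeepSeek R1 -> 89
print(composite_score([88.7]))        # Claude Opus 4 -> 89
print(composite_score([]))            # no results -> None
```

The function takes only the values a model actually has, so a missing benchmark simply drops out of the mean.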