AI Reasoning Models
Models ranked by reasoning, math, and logic benchmark performance. Reasoning models use extended "thinking" to solve complex multi-step problems.
Dedicated Reasoning Models
Models specifically designed for extended reasoning and chain-of-thought problem solving.
O4 Mini
OpenAI
Fast reasoning; excels at math/code
Claude Opus 4.6
Anthropic
Most capable; 1M context beta; adaptive thinking. Anthropic flagship model.
Grok 4
xAI
Frontier reasoning (matches o3 quality)
O3
OpenAI
Reasoning model; 80% price cut from launch
O3 Pro
OpenAI
Highest reasoning quality
Qwen3 235B A22B
Alibaba
Thinking mode: $0.65/$3.00
Claude Sonnet 4.6
Anthropic
Default model; extended thinking. Anthropic balanced frontier model.
DeepSeek V3.2
DeepSeek
685B params (37B active) MoE; 90% off cache hits
DeepSeek R1
DeepSeek
Open-weight reasoning; CoT tokens billed as output
Gemini 2.5 Pro
Google
Thinking model with 1M context
Gemini 2.5 Flash
Google
With thinking: $3.50 output
QwQ 32B
Alibaba
Reasoning model on Qwen2.5 base
DeepSeek V3
DeepSeek
Original V3; 671B params (37B active)
Reasoning Benchmark Rankings
All models ranked by average score across reasoning/math benchmarks (AIME 2025, ARC Challenge, GPQA Diamond, Humanity's Last Exam, LiveBench, MATH-500).
| # | Model | Provider | Tags | Reasoning Avg |
|---|---|---|---|---|
| 1 | O4 Mini | OpenAI | Reasoning | 90.1 |
| 2 | O3 | OpenAI | Reasoning | 89.2 |
| 3 | GPT-5 | OpenAI | | 86.0 |
| 4 | Qwen3 235B A22B | Alibaba | Reasoning, OSS | 85.0 |
| 5 | Claude Sonnet 4.6 | Anthropic | Reasoning | 83.0 |
| 6 | Gemini 2.5 Pro | Google | Reasoning | 81.8 |
| 7 | Grok 3 Beta | xAI | | 81.2 |
| 8 | Claude Opus 4 | Anthropic | | 80.4 |
| 9 | QwQ 32B | Alibaba | Reasoning, OSS | 77.7 |
| 10 | O3 Pro | OpenAI | Reasoning | 77.2 |
| 11 | Claude Sonnet 4 | Anthropic | | 76.5 |
| 12 | GPT-4.1 | OpenAI | | 74.7 |
| 13 | Gemini 2.5 Flash | Google | Reasoning | 70.8 |
| 14 | Grok 4 | xAI | Reasoning | 70.5 |
| 15 | DeepSeek R1 | DeepSeek | Reasoning, OSS | 69.3 |
| 16 | DeepSeek V3 | DeepSeek | Reasoning, OSS | 68.7 |
| 17 | GPT-5.2 | OpenAI | | 67.0 |
| 18 | GPT-4o (extended) | OpenAI | | 65.1 |
| 19 | Qwen2.5 72B Instruct | Alibaba | OSS | 64.5 |
| 20 | Llama 4 Maverick | Meta | OSS | 56.0 |
| 21 | Claude Opus 4.6 | Anthropic | Reasoning | 54.3 |
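One way a "Reasoning Avg" column like this could be computed is sketched below. The page does not say how missing benchmarks or weighting are handled, so this sketch assumes an unweighted mean over whichever of the six listed benchmarks a model has a score for. The model names and scores in the example are placeholders, not real results.

```python
from statistics import mean

# Benchmark names taken from the ranking description.
BENCHMARKS = [
    "AIME 2025",
    "ARC Challenge",
    "GPQA Diamond",
    "Humanity's Last Exam",
    "LiveBench",
    "MATH-500",
]

def reasoning_average(scores):
    """Unweighted mean of the available benchmark scores, rounded to 1 decimal.

    Assumption: benchmarks without a score are skipped rather than counted as 0.
    Returns None when a model has no scores at all.
    """
    available = [scores[b] for b in BENCHMARKS if scores.get(b) is not None]
    return round(mean(available), 1) if available else None

def rank_models(all_scores):
    """Return (model, average) pairs sorted from highest to lowest average."""
    averaged = [(name, reasoning_average(s)) for name, s in all_scores.items()]
    scored = [(name, avg) for name, avg in averaged if avg is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Placeholder scores for two hypothetical models (not real benchmark data).
example = {
    "model-a": {"AIME 2025": 80.0, "GPQA Diamond": 70.0},
    "model-b": {"AIME 2025": 90.0, "GPQA Diamond": 50.0, "MATH-500": 100.0},
}
ranking = rank_models(example)
for position, (name, avg) in enumerate(ranking, start=1):
    print(position, name, avg)
```

Skipping missing benchmarks keeps partially evaluated models in the ranking, but it also means two averages may not cover the same benchmarks, which is worth keeping in mind when comparing rows.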
What are reasoning models?
Reasoning models (like OpenAI's o-series and DeepSeek R1) use extended "chain-of-thought" processing to work through complex problems step by step. They're particularly strong at:
- Mathematics: Competition-level math problems (AIME, MATH-500)
- Science: Graduate-level science questions (GPQA Diamond)
- Coding: Complex software engineering tasks (SWE-Bench)
- Logic: Multi-step logical deduction and constraint satisfaction
The trade-off is higher latency and cost: reasoning models "think" before responding, which takes longer and generates extra tokens, but tends to produce more accurate answers on hard problems.
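The cost side of this trade-off can be illustrated with a rough estimate. The sketch below assumes that reasoning ("thinking") tokens are billed at the output rate, as the DeepSeek R1 entry notes, and that prices are quoted in USD per one million tokens, which is an assumption about how figures such as $0.65/$3.00 should be read. The function name and the token counts are hypothetical.

```python
def request_cost(input_tokens, reasoning_tokens, answer_tokens,
                 input_price_per_m, output_price_per_m):
    """Estimate the USD cost of one request.

    Assumptions: prices are per 1,000,000 tokens, and reasoning tokens
    are billed at the same rate as visible output tokens.
    """
    billed_output = reasoning_tokens + answer_tokens
    total = input_tokens * input_price_per_m + billed_output * output_price_per_m
    return total / 1_000_000

# Hypothetical request: 1,000 prompt tokens and a 500-token answer,
# with and without 4,000 tokens of hidden reasoning.
with_thinking = request_cost(1_000, 4_000, 500, 0.65, 3.00)
without_thinking = request_cost(1_000, 0, 500, 0.65, 3.00)

print(f"with thinking:    ${with_thinking:.5f}")
print(f"without thinking: ${without_thinking:.5f}")
```

In this made-up case most of the bill comes from the reasoning tokens rather than the visible answer, which is why the per-token output price matters more for reasoning models than the answer length alone would suggest.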