Best AI Models for Coding
38 models ranked by coding benchmark performance across HumanEval, SWE-bench, and other coding evaluations.
Best Overall
O4 Mini
OpenAI · Avg: 96.0
Best Value for Coding
Phi 4
Microsoft · $0.07 per million input tokens
Best Open Source
Qwen3 Max
Alibaba · Avg: 93.0
| # | Model | Provider | Open Source | Coding Avg | Price ($/M input tokens) |
|---|---|---|---|---|---|
| 1 | O4 Mini | OpenAI | | 96.0 | $1.10 |
| 2 | o3 Mini | OpenAI | | 94.5 | $1.10 |
| 3 | Qwen3 Max | Alibaba | Yes | 93.0 | $0.78 |
| 4 | DeepSeek V3 | DeepSeek | Yes | 89.5 | $0.23 |
| 5 | Gemini 2.5 Flash | Google | | 88.5 | $0.30 |
| 6 | Llama 3.3 70B Instruct | Meta | Yes | 88.4 | $0.10 |
| 7 | Claude 3.5 Haiku | Anthropic | | 88.1 | $0.80 |
| 8 | GPT-4o (2024-05-13) | OpenAI | | 87.6 | $5.00 |
| 9 | Llama 4 Maverick | Meta | Yes | 87.5 | $0.15 |
| 10 | GPT-4o-mini | OpenAI | | 87.2 | $0.15 |
| 11 | Qwen2.5 72B Instruct | Alibaba | Yes | 86.6 | $0.36 |
| 12 | Claude Opus 4.5 | Anthropic | | 85.3 | $5.00 |
| 13 | Command A | Cohere | Yes | 85.0 | $2.50 |
| 14 | Llama 4 Scout | Meta | Yes | 85.0 | $0.08 |
| 15 | Claude Opus 4 | Anthropic | | 83.8 | $15.00 |
| 16 | Qwen2.5 Coder 32B Instruct | Alibaba | Yes | 83.2 | $0.66 |
| 17 | Phi 4 | Microsoft | Yes | 82.6 | $0.07 |
| 18 | Gemini 2.5 Flash Lite | Google | | 82.0 | $0.10 |
| 19 | GPT-5.2 | OpenAI | | 80.8 | $1.75 |
| 20 | GPT-5.2 Pro | OpenAI | | 80.0 | $21.00 |
| 21 | Mistral Small 3.1 24B | Mistral | Yes | 80.0 | $0.35 |
| 22 | O3 | OpenAI | | 79.0 | $2.00 |
| 23 | Claude Sonnet 4 | Anthropic | | 78.2 | $3.00 |
| 24 | Claude Opus 4.6 | Anthropic | | 77.3 | $15.00 |
| 25 | Gemini 2.5 Pro | Google | | 76.8 | $1.25 |
| 26 | GPT-5 Pro | OpenAI | | 76.5 | $15.00 |
| 27 | R1 0528 | DeepSeek | Yes | 75.8 | $0.50 |
| 28 | Claude Sonnet 4.6 | Anthropic | | 75.5 | $3.00 |
| 29 | GPT-5 | OpenAI | | 75.0 | $1.25 |
| 30 | DeepSeek R1 | DeepSeek | Yes | 74.2 | $0.70 |
| 31 | GPT-4.1 | OpenAI | | 74.0 | $2.00 |
| 32 | O3 Pro | OpenAI | | 73.0 | $20.00 |
| 33 | o1 | OpenAI | | 71.5 | $15.00 |
| 34 | Claude Haiku 4.5 | Anthropic | | 71.0 | $1.00 |
| 35 | DeepSeek V3.2 | DeepSeek | Yes | 70.5 | $0.20 |
| 36 | Claude Sonnet 4.5 | Anthropic | | 68.0 | $3.00 |
| 37 | Mistral Large | Mistral | Yes | 67.3 | $2.00 |
| 38 | GPT-4.1 Mini | OpenAI | | 66.3 | $0.40 |
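The page does not say how "Best Value for Coding" is chosen. One plausible reading is coding score divided by price, and the sketch below ranks a few rows from the table above that way. It assumes the price column is USD per million input tokens (consistent with the "$0.07" shown for Phi 4); the `points_per_dollar` metric and the function names are illustrative, not the site's actual method.

```python
# Rows copied from the table above:
# (model, coding average, price in USD per million input tokens; unit is assumed)
ROWS = [
    ("O4 Mini", 96.0, 1.10),
    ("Qwen3 Max", 93.0, 0.78),
    ("DeepSeek V3", 89.5, 0.23),
    ("Llama 3.3 70B Instruct", 88.4, 0.10),
    ("Llama 4 Scout", 85.0, 0.08),
    ("Phi 4", 82.6, 0.07),
]


def points_per_dollar(score: float, price: float) -> float:
    """Hypothetical value metric: coding-average points per dollar of input price."""
    return score / price


def rank_by_value(rows):
    """Return rows sorted from best to worst value under the metric above."""
    return sorted(rows, key=lambda r: points_per_dollar(r[1], r[2]), reverse=True)


if __name__ == "__main__":
    for name, score, price in rank_by_value(ROWS):
        print(f"{name:<24} {points_per_dollar(score, price):8.1f}")
```

Under this metric Phi 4 comes out on top among the sampled rows, which matches the page's pick, though a ranking that also weighs output-token price or a minimum score threshold could order the models differently.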
Other Notable Models
These models don't have published coding benchmark scores yet but are commonly used for coding tasks.
Qwen3 235B A22B
Alibaba · Quality: 87
Gemini 2.0 Flash
Google · Quality: 81
Command R+ (08-2024)
Cohere · Quality: 79
GPT-5 Nano
OpenAI · Quality: 78
Nova Pro 1.0
Amazon · Quality: 78
Llama 3.1 70B Instruct
Meta · Quality: 77
GPT-4.1 Nano
OpenAI · Quality: 75
Gemini 2.0 Flash Lite
Google · Quality: 75
Reka Flash 3
Reka · Quality: 74
Sonar
Perplexity · Quality: 74