# Best AI Models for Coding
20 models ranked by coding benchmark performance across HumanEval, SWE-bench, and other coding evaluations.
- **Best Overall:** O4 Mini (OpenAI) · Coding Avg: 96.0
- **Best Value for Coding:** Qwen2.5 72B Instruct (Alibaba) · $0.12 per million input tokens
- **Best Open Source:** DeepSeek V3.2 (DeepSeek) · Coding Avg: 91.0
| # | Model | Vendor | Open Source | Coding Avg | Price ($/M input tokens) |
|---|---|---|---|---|---|
| 1 | O4 Mini | OpenAI | No | 96.0 | $1.10 |
| 2 | Grok 3 Beta | xAI | No | 93.8 | $3.00 |
| 3 | DeepSeek V3.2 | DeepSeek | Yes | 91.0 | $0.20 |
| 4 | GPT-4o (extended) | OpenAI | No | 90.2 | $6.00 |
| 5 | DeepSeek V3 | DeepSeek | Yes | 89.5 | $0.32 |
| 6 | Gemini 2.5 Flash | Google | No | 88.5 | $0.30 |
| 7 | QwQ 32B | Alibaba | Yes | 88.0 | $0.15 |
| 8 | Llama 4 Maverick | Meta | Yes | 87.5 | $0.15 |
| 9 | Qwen2.5 72B Instruct | Alibaba | Yes | 86.6 | $0.12 |
| 10 | Claude Opus 4 | Anthropic | No | 83.8 | $15.00 |
| 11 | O3 | OpenAI | No | 80.0 | $2.00 |
| 12 | DeepSeek R1 | DeepSeek | Yes | 78.8 | $0.70 |
| 13 | GPT-5.2 | OpenAI | No | 75.5 | $1.75 |
| 14 | Claude Opus 4.6 | Anthropic | No | 75.0 | $15.00 |
| 15 | GPT-5 | OpenAI | No | 75.0 | $1.25 |
| 16 | Gemini 2.5 Pro | Google | No | 75.0 | $1.25 |
| 17 | GPT-4.1 | OpenAI | No | 74.0 | $2.00 |
| 18 | Claude Sonnet 4 | Anthropic | No | 73.3 | $3.00 |
| 19 | O3 Pro | OpenAI | No | 73.0 | $20.00 |
| 20 | Claude Sonnet 4.6 | Anthropic | No | 72.0 | $3.00 |
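A "best value" pick like the one above can be read as score earned per dollar of input-token cost. The sketch below computes that ratio for a few rows from the table; the score-per-price metric is an illustrative assumption, not necessarily the methodology behind this ranking.

```python
# Illustrative value metric: coding average divided by price per
# million input tokens. Three sample rows taken from the table above.
models = [
    ("O4 Mini", 96.0, 1.10),
    ("DeepSeek V3.2", 91.0, 0.20),
    ("Qwen2.5 72B Instruct", 86.6, 0.12),
]

# Sort descending by score-per-dollar; higher means more benchmark
# points per $/M input tokens spent.
ranked = sorted(models, key=lambda m: m[1] / m[2], reverse=True)

for name, score, price in ranked:
    print(f"{name}: {score / price:.0f} points per $/M input tokens")
```

By this measure the cheap open-weight models dominate: Qwen2.5 72B Instruct comes out ahead despite its lower raw score, because its price is roughly a tenth of O4 Mini's.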
## Other Notable Models
These models don't have published coding benchmark scores yet but are commonly used for coding tasks.
| Model | Vendor | Quality |
|---|---|---|
| Grok 4 | xAI | 88 |
| Qwen3 235B A22B | Alibaba | 87 |
| Mistral Large | Mistral | 86 |
| Claude 3.5 Haiku | Anthropic | 82 |
| Command A | Cohere | 82 |
| Gemini 2.0 Flash | Google | 81 |
| GPT-4o-mini | OpenAI | 80 |
| Command R+ (08-2024) | Cohere | 79 |
| Llama 3.3 70B Instruct | Meta | 79 |
| GPT-5 Nano | OpenAI | 78 |