Fastest AI Models
27 LLMs ranked by output speed (tokens per second), for latency-sensitive applications such as chatbots, real-time coding assistants, and interactive agents.
Note: Speed ratings are relative estimates. For production latency data, we recommend Artificial Analysis. See also: Full Speed Comparison with TTFT and provider endpoint data.
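Throughput figures like the ones below are typically measured by timing a streaming response and counting decoded tokens, with the time to first token (TTFT) excluded. A minimal sketch of that measurement, assuming any iterable of streamed tokens (the streaming client itself is not shown):

```python
import time

def measure_throughput(stream):
    """Decode speed in tok/s for an iterable of streamed tokens.

    The clock starts at the first token, so TTFT is excluded --
    matching how leaderboards usually report decode throughput.
    """
    start = None
    count = 0
    for _ in stream:
        now = time.perf_counter()
        if start is None:
            start = now  # first token arrives: start the decode clock
        count += 1
    if count < 2:
        return 0.0  # need at least two tokens to measure an interval
    elapsed = time.perf_counter() - start
    return (count - 1) / elapsed if elapsed > 0 else 0.0
```

In practice `stream` would be the token iterator returned by a provider's streaming API; averaging several runs smooths out network jitter.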
- Ultra Fast (100+ tok/s): 9 models
- Fast (50–99 tok/s): 13 models
- Moderate (20–49 tok/s): 4 models
- Slower (<20 tok/s): 1 model
Speed Ranking
| # | Model | Vendor | Speed | Input Price ($/1M tok) |
|---|---|---|---|---|
| 1 | Gemini 2.0 Flash Lite | Google | 450 tok/s | $0.07 |
| 2 | Gemini 2.0 Flash | Google | 400 tok/s | $0.10 |
| 3 | Gemini 2.5 Flash | Google | 350 tok/s | $0.30 |
| 4 | GPT-4.1 Nano | OpenAI | 200 tok/s | $0.10 |
| 5 | GPT-4o-mini | OpenAI | 150 tok/s | $0.15 |
| 6 | Mistral Small 3.1 24B (OSS) | Mistral | 150 tok/s | $0.35 |
| 7 | Claude 3.5 Haiku | Anthropic | 120 tok/s | $0.80 |
| 8 | GPT-4.1 | OpenAI | 110 tok/s | $2.00 |
| 9 | GPT-4o (extended) | OpenAI | 100 tok/s | $6.00 |
| 10 | Llama 4 Maverick (OSS) | Meta | 95 tok/s | $0.15 |
| 11 | Claude Sonnet 4.6 | Anthropic | 90 tok/s | $3.00 |
| 12 | Gemini 2.5 Pro | Google | 90 tok/s | $1.25 |
| 13 | GPT-5.2 | OpenAI | 85 tok/s | $1.75 |
| 14 | Mistral Large (OSS) | Mistral | 80 tok/s | $2.00 |
| 15 | Claude Sonnet 4 | Anthropic | 80 tok/s | $3.00 |
| 16 | Llama 3.3 70B Instruct (OSS) | Meta | 80 tok/s | $0.10 |
| 17 | GPT-5 | OpenAI | 75 tok/s | $1.25 |
| 18 | Grok 3 Beta | xAI | 70 tok/s | $3.00 |
| 19 | o4-mini | OpenAI | 65 tok/s | $1.10 |
| 20 | Qwen2.5 72B Instruct (OSS) | Alibaba | 65 tok/s | $0.12 |
| 21 | Claude Opus 4.6 | Anthropic | 50 tok/s | $15.00 |
| 22 | Grok 4 | xAI | 50 tok/s | $3.00 |
| 23 | DeepSeek V3.2 (OSS) | DeepSeek | 49 tok/s | $0.20 |
| 24 | Qwen3 235B A22B (OSS) | Alibaba | 40 tok/s | $0.46 |
| 25 | DeepSeek R1 (OSS) | DeepSeek | 30 tok/s | $0.70 |
| 26 | Claude Opus 4 | Anthropic | 30 tok/s | $15.00 |
| 27 | o3 | OpenAI | 15 tok/s | $2.00 |
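What these throughput differences mean in wall-clock terms: total response time is roughly TTFT plus decode time. A back-of-the-envelope estimate, assuming an illustrative 0.5 s TTFT (real TTFT varies widely by provider and load):

```python
def generation_time(n_tokens, tok_per_s, ttft_s=0.5):
    """Rough wall-clock estimate: time to first token + decode time."""
    return ttft_s + n_tokens / tok_per_s

# For a 500-token chatbot reply (0.5 s TTFT assumed, illustrative only):
# fastest in the table (450 tok/s)  -> ~1.6 s
# slowest in the table (15 tok/s)   -> ~33.8 s
```

That 20x spread is why decode speed, not just model quality, dominates the user experience of interactive applications.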
Speed + Quality Sweet Spot
Models scoring 70+ on quality that also offer strong speed, ranked by the speed × quality product.
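The ranking rule above is just a filter plus a sort. A sketch with placeholder quality scores (these numbers are illustrative, not the site's actual benchmark results):

```python
# (model, speed in tok/s, quality score 0-100) -- quality values are
# hypothetical placeholders for illustration only.
models = [
    ("Gemini 2.5 Flash", 350, 75),
    ("Claude Sonnet 4.6", 90, 85),
    ("GPT-5", 75, 88),
    ("GPT-4.1 Nano", 200, 60),  # fast, but fails the 70-quality bar
]

# Keep models at or above the 70-quality bar, rank by speed x quality.
sweet_spot = sorted(
    (m for m in models if m[2] >= 70),
    key=lambda m: m[1] * m[2],
    reverse=True,
)
```

The product metric deliberately rewards balance: a very fast model with mediocre quality and a very smart model with sluggish decoding both rank below a model that is good at both.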