AI Speed Comparison
Speed analysis covering time to first token (TTFT), output speed in tokens per second, and per-provider endpoint performance for models available on multiple platforms.
| Metric | Count |
|---|---|
| Ultra Fast (100+ tok/s) | 9 |
| Fast (50-99 tok/s) | 13 |
| Models with TTFT data | 27 |
| Multi-provider models | 0 |
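The tier thresholds in the summary counts can be expressed as a small classifier. This is a minimal sketch, not the page's actual logic: the function name `speed_tier` is hypothetical, the label `"Other"` for speeds below 50 tok/s is an assumption (the page does not name that tier), and fractional speeds between 99 and 100 tok/s are assumed to count as Fast.

```python
def speed_tier(tokens_per_second: float) -> str:
    """Map an output speed (tok/s) to the tier labels used in the summary."""
    if tokens_per_second < 0:
        raise ValueError("output speed cannot be negative")
    if tokens_per_second >= 100:
        return "Ultra Fast"  # 100+ tok/s
    if tokens_per_second >= 50:
        return "Fast"  # 50-99 tok/s; assumes values just under 100 belong here
    # Assumed label: the page does not name the tier below 50 tok/s.
    return "Other"
```

For example, `speed_tier(120)` returns `"Ultra Fast"` and `speed_tier(75)` returns `"Fast"`.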
Time to First Token (TTFT)
How quickly the model starts responding. Lower is better — critical for real-time applications and chat interfaces.
| # | Model | Provider | TTFT |
|---|---|---|---|
| 1 | Gemini 2.0 Flash Lite | Google | 120 ms |
| 2 | Gemini 2.0 Flash | Google | 140 ms |
| 3 | GPT-4.1 Nano | OpenAI | 150 ms |
| 4 | Gemini 2.5 Flash | Google | 160 ms |
| 5 | Mistral Small 3.1 24B | Mistral | 180 ms |
| 6 | Claude 3.5 Haiku | Anthropic | 200 ms |
| 7 | GPT-4o-mini | OpenAI | 210 ms |
| 8 | Claude Sonnet 4.6 | Anthropic | 240 ms |
| 9 | GPT-4.1 | OpenAI | 250 ms |
| 10 | Llama 4 Maverick | Meta | 250 ms |
| 11 | Claude Sonnet 4 | Anthropic | 260 ms |
| 12 | Gemini 2.5 Pro | Google | 270 ms |
| 13 | Claude Opus 4.6 | Anthropic | 280 ms |
| 14 | Llama 3.3 70B Instruct | Meta | 280 ms |
| 15 | Grok 3 Beta | xAI | 290 ms |
| 16 | GPT-4o (extended) | OpenAI | 290 ms |
| 17 | Mistral Large | Mistral | 310 ms |
| 18 | GPT-5.2 | OpenAI | 320 ms |
| 19 | Qwen2.5 72B Instruct | Alibaba | 340 ms |
| 20 | Grok 4 | xAI | 350 ms |
| 21 | Claude Opus 4 | Anthropic | 350 ms |
| 22 | GPT-5 | OpenAI | 380 ms |
| 23 | DeepSeek V3.2 | DeepSeek | 400 ms |
| 24 | Qwen3 235B A22B | Alibaba | 420 ms |
| 25 | DeepSeek R1 | DeepSeek | 500 ms |
| 26 | O4 Mini | OpenAI | 550 ms |
| 27 | O3 | OpenAI | 1200 ms |
Output Speed Rankings
Tokens generated per second. Higher is better for long-form content generation.
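To see why output speed matters most for long responses, total response time can be approximated as TTFT plus generation time. The sketch below assumes a constant generation rate, which real endpoints will not hold exactly; the function name `estimated_response_seconds` and the numbers in the usage note are illustrative assumptions, not measurements from this page.

```python
def estimated_response_seconds(
    ttft_ms: float, tokens_per_second: float, output_tokens: int
) -> float:
    """Approximate total response time as TTFT plus generation time.

    Assumes a constant output speed after the first token, which is a
    simplification of real endpoint behavior.
    """
    if tokens_per_second <= 0:
        raise ValueError("output speed must be positive")
    if ttft_ms < 0 or output_tokens < 0:
        raise ValueError("TTFT and token count cannot be negative")
    return ttft_ms / 1000 + output_tokens / tokens_per_second
```

With a hypothetical 200 ms TTFT and 100 tok/s, a 500-token reply takes about 5.2 seconds, of which only 0.2 seconds is TTFT. For a 20-token reply at the same settings, TTFT is half of the 0.4-second total.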
About Speed Measurements
TTFT (Time to First Token): how long until the model starts generating its response, measured in milliseconds. Critical for chat applications where perceived responsiveness matters.
Output Speed (tok/s): how many tokens the model generates per second once it starts responding. Important for long-form content generation.
Speed data is sourced from Artificial Analysis and provider benchmarks. Actual speeds may vary based on load, region, and input complexity.
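The two metrics defined above can be measured from any streaming response. This is a minimal sketch under stated assumptions, not the methodology used by the data sources: `measure_stream` and `fake_stream` are hypothetical names, the stream is assumed to be a plain Python iterable that yields one token at a time, and output speed is computed over the tokens that follow the first one.

```python
import time
from typing import Iterable, Iterator, NamedTuple


class SpeedSample(NamedTuple):
    ttft_ms: float
    tokens_per_second: float
    token_count: int


def measure_stream(tokens: Iterable[str]) -> SpeedSample:
    """Time a token stream and report TTFT and output speed."""
    start = time.perf_counter()
    first = None
    last = start
    count = 0
    for _ in tokens:
        last = time.perf_counter()
        if first is None:
            first = last  # arrival time of the first token
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft_ms = (first - start) * 1000
    generation_seconds = last - first
    # Output speed covers only the tokens generated after the first one.
    if count > 1 and generation_seconds > 0:
        tokens_per_second = (count - 1) / generation_seconds
    else:
        tokens_per_second = 0.0
    return SpeedSample(ttft_ms, tokens_per_second, count)


def fake_stream(n: int, first_delay: float, gap: float) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming response."""
    time.sleep(first_delay)  # simulated time to first token
    for i in range(n):
        if i:
            time.sleep(gap)  # simulated gap between tokens
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream(20, 0.2, 0.01)))
```

In practice `fake_stream` would be replaced by the token iterator of a real client. Measured values also include network latency, so results from different regions are not directly comparable.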