AI Speed Comparison

A speed comparison of AI models covering time to first token (TTFT), output speed in tokens per second, and per-provider endpoint performance for models available on multiple platforms.

Ultra Fast (100+ tok/s): 9 models
Fast (50-99 tok/s): 13 models
Models with TTFT data: 27
Multi-provider models: 0
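
The tier boundaries in the summary above can be written as a small classifier. This is only a sketch: the function name `speed_tier` is hypothetical, the page names no tier below 50 tok/s (the label used here is a placeholder), and treating fractional values between 99 and 100 tok/s as Fast is an assumption.

```python
def speed_tier(tok_per_s: float) -> str:
    """Map an output speed in tokens per second to a tier label."""
    if tok_per_s >= 100:
        return "Ultra Fast"  # 100+ tok/s, as stated on the page
    if tok_per_s >= 50:
        return "Fast"  # 50-99 tok/s, as stated on the page
    # The page does not name a tier below 50 tok/s; this label is a placeholder.
    return "Below Fast (unnamed)"


# A few output speeds taken from the rankings on this page.
for model, speed in [("Gemini 2.0 Flash", 400), ("GPT-4.1", 110), ("Llama 4 Maverick", 95)]:
    print(f"{model}: {speed} tok/s -> {speed_tier(speed)}")
```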

Time to First Token (TTFT)

TTFT measures how quickly a model starts responding. Lower is better, which matters most for real-time applications and chat interfaces.

| # | Model | Provider | TTFT |
|---|---|---|---|
| 1 | Gemini 2.0 Flash Lite | Google | 120 ms |
| 2 | Gemini 2.0 Flash | Google | 140 ms |
| 3 | GPT-4.1 Nano | OpenAI | 150 ms |
| 4 | Gemini 2.5 Flash | Google | 160 ms |
| 5 | Mistral Small 3.1 24B | Mistral | 180 ms |
| 6 | Claude 3.5 Haiku | Anthropic | 200 ms |
| 7 | GPT-4o-mini | OpenAI | 210 ms |
| 8 | Claude Sonnet 4.6 | Anthropic | 240 ms |
| 9 | GPT-4.1 | OpenAI | 250 ms |
| 10 | Llama 4 Maverick | Meta | 250 ms |
| 11 | Claude Sonnet 4 | Anthropic | 260 ms |
| 12 | Gemini 2.5 Pro | Google | 270 ms |
| 13 | Claude Opus 4.6 | Anthropic | 280 ms |
| 14 | Llama 3.3 70B Instruct | Meta | 280 ms |
| 15 | Grok 3 Beta | xAI | 290 ms |
| 16 | GPT-4o (extended) | OpenAI | 290 ms |
| 17 | Mistral Large | Mistral | 310 ms |
| 18 | GPT-5.2 | OpenAI | 320 ms |
| 19 | Qwen2.5 72B Instruct | Alibaba | 340 ms |
| 20 | Grok 4 | xAI | 350 ms |
| 21 | Claude Opus 4 | Anthropic | 350 ms |
| 22 | GPT-5 | OpenAI | 380 ms |
| 23 | DeepSeek V3.2 | DeepSeek | 400 ms |
| 24 | Qwen3 235B A22B | Alibaba | 420 ms |
| 25 | DeepSeek R1 | DeepSeek | 500 ms |
| 26 | O4 Mini | OpenAI | 550 ms |
| 27 | O3 | OpenAI | 1200 ms |

Output Speed Rankings

Tokens generated per second. Higher is better for long-form content generation.

| # | Model | Output speed |
|---|---|---|
| 2 | Gemini 2.0 Flash | 400 tok/s |
| 3 | Gemini 2.5 Flash | 350 tok/s |
| 4 | GPT-4.1 Nano | 200 tok/s |
| 5 | GPT-4o-mini | 150 tok/s |
| 7 | Claude 3.5 Haiku | 120 tok/s |
| 8 | GPT-4.1 | 110 tok/s |
| 10 | Llama 4 Maverick | 95 tok/s |
| 12 | Gemini 2.5 Pro | 90 tok/s |
| 13 | GPT-5.2 | 85 tok/s |
| 14 | Mistral Large | 80 tok/s |
| 15 | Claude Sonnet 4 | 80 tok/s |
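
The two metrics can be combined into a rough estimate of how long a complete response takes: the wait for the first token plus the time to stream the remaining output. The sketch below assumes a simple model, total ≈ TTFT + output tokens / output speed, which ignores load, region, and prompt length. The function name and the 500-token response length are illustrative choices, not part of the source data. The TTFT and tok/s figures are taken from the tables on this page.

```python
def estimated_response_seconds(ttft_ms: float, tok_per_s: float, output_tokens: int) -> float:
    """Rough end-to-end time: wait for the first token, then stream at a steady rate."""
    return ttft_ms / 1000.0 + output_tokens / tok_per_s


# (TTFT in ms, output speed in tok/s) as listed on this page.
MODELS = {
    "Gemini 2.0 Flash": (140, 400),
    "GPT-4.1": (250, 110),
    "Claude Sonnet 4": (260, 80),
}

OUTPUT_TOKENS = 500  # assumed response length, for illustration only

for name, (ttft_ms, tok_per_s) in MODELS.items():
    secs = estimated_response_seconds(ttft_ms, tok_per_s, OUTPUT_TOKENS)
    print(f"{name}: ~{secs:.2f} s for {OUTPUT_TOKENS} tokens")
# Gemini 2.0 Flash: ~1.39 s for 500 tokens
# GPT-4.1: ~4.80 s for 500 tokens
# Claude Sonnet 4: ~6.51 s for 500 tokens
```

Under this model, TTFT dominates for short replies, while output speed dominates once responses run to hundreds of tokens.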

About Speed Measurements

TTFT (Time to First Token): how long the model takes to start generating its response, measured in milliseconds. This is critical for chat applications, where perceived responsiveness matters.

Output Speed (tok/s): how many tokens the model generates per second once it has started responding. This is important for long-form content generation.
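
Both definitions can be applied to any streamed response by timestamping tokens as they arrive. The following is a minimal sketch, not the methodology used by the data sources. `measure_stream` is a hypothetical helper that accepts any iterable of tokens, and it assumes output speed is counted from the first token to the last (other conventions exist, such as dividing by total request time). The `clock` parameter exists only so the timing can be replaced in tests.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (ttft_ms, output_tok_per_s) for one streamed response.

    Call this immediately before the request is consumed, so that the
    start timestamp approximates the moment the request was sent.
    """
    start = clock()
    first_at: Optional[float] = None
    last_at: Optional[float] = None
    count = 0

    for _ in tokens:
        now = clock()
        if first_at is None:
            first_at = now  # arrival of the first token defines TTFT
        last_at = now
        count += 1

    if first_at is None or last_at is None:
        return None, None  # the stream produced no tokens

    ttft_ms = (first_at - start) * 1000.0
    generation_time = last_at - first_at
    # Tokens after the first, divided by the time spent generating them.
    speed = (count - 1) / generation_time if generation_time > 0 else None
    return ttft_ms, speed
```

In practice the `tokens` argument would be the streaming iterator returned by a provider's client library. A single measurement is noisy, so averaging over several requests gives a more representative figure.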

Speed data is sourced from Artificial Analysis and provider benchmarks. Actual speeds may vary with load, region, and input complexity.