AI Speed Comparison

Comprehensive speed analysis — time to first token (TTFT), output speed in tokens per second, and per-provider endpoint performance for models available on multiple platforms.

Ultra Fast (100+ tok/s): 11
Fast (50-99 tok/s): 12
Models with TTFT data: 28
Multi-provider models: 1
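The tier thresholds in the summary can be written as a small classifier. This is only a sketch: the function name `speed_tier` is hypothetical, the page names no tier below 50 tok/s (so "Other" is a placeholder), and treating fractional speeds under 100 as Fast is an assumption.

```python
def speed_tier(tokens_per_second: float) -> str:
    """Map an output speed (tok/s) to the tier labels used in the summary."""
    if tokens_per_second >= 100:
        return "Ultra Fast"
    if tokens_per_second >= 50:
        return "Fast"
    # The page does not name a tier below 50 tok/s; "Other" is a placeholder.
    return "Other"


# A few output speeds (tok/s) taken from the rankings on this page.
speeds = {
    "Gemini 2.0 Flash": 400,
    "GPT-4.1": 110,
    "Llama 4 Maverick": 95,
    "GPT-5.2": 85,
}
tiers = {model: speed_tier(speed) for model, speed in speeds.items()}
```

Here `tiers` puts Gemini 2.0 Flash and GPT-4.1 in Ultra Fast, and Llama 4 Maverick and GPT-5.2 in Fast.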

Time to First Token (TTFT)

How quickly the model starts responding. Lower is better — critical for real-time applications and chat interfaces.

# Model Provider TTFT
1 Gemini 2.0 Flash Lite Google 120ms
2 Gemini 2.0 Flash Google 140ms
3 GPT-4.1 Nano OpenAI 150ms
4 Gemini 2.5 Flash Google 160ms
5 Mistral Small 3.1 24B Mistral 180ms
6 GPT-4.1 Mini OpenAI 190ms
7 Claude 3.5 Haiku Anthropic 200ms
8 GPT-4o-mini OpenAI 210ms
9 Llama 4 Scout Meta 220ms
10 Claude Sonnet 4.6 Anthropic 240ms
11 GPT-4.1 OpenAI 250ms
12 Llama 4 Maverick Meta 250ms
13 Claude Sonnet 4 Anthropic 260ms
14 Gemini 2.5 Pro Google 270ms
15 Claude Opus 4.6 Anthropic 280ms
16 Llama 3.3 70B Instruct Meta 280ms
17 GPT-4o (2024-05-13) OpenAI 290ms
18 Mistral Large Mistral 310ms
19 GPT-5.2 OpenAI 320ms
20 Qwen2.5 72B Instruct Alibaba 340ms
21 Claude Opus 4 Anthropic 350ms
22 GPT-5 OpenAI 380ms
23 DeepSeek V3.2 DeepSeek 400ms
24 Qwen3 235B A22B Alibaba 420ms
25 DeepSeek R1 DeepSeek 500ms
26 o4 Mini OpenAI 550ms
27 o3 Mini OpenAI 600ms
28 o3 OpenAI 1200ms

Output Speed Rankings

Tokens generated per second. Higher is better for long-form content generation.

# Model Speed
2 Gemini 2.0 Flash 400 tok/s
3 Gemini 2.5 Flash 350 tok/s
4 GPT-4.1 Nano 200 tok/s
5 GPT-4.1 Mini 160 tok/s
6 GPT-4o-mini 150 tok/s
8 Llama 4 Scout 120 tok/s
9 Claude 3.5 Haiku 120 tok/s
10 GPT-4.1 110 tok/s
12 Llama 4 Maverick 95 tok/s
14 Gemini 2.5 Pro 90 tok/s
15 GPT-5.2 85 tok/s

Per-Provider Endpoint Comparison

Some open-source models are available on multiple platforms with different speed and pricing.

Llama 4 Maverick (2 providers)
Provider Speed TTFT
Meta Direct 110 tok/s 180ms
AWS Bedrock 85 tok/s 250ms

About Speed Measurements

TTFT (Time to First Token) — How long until the model starts generating its response, measured in milliseconds. Critical for chat applications where perceived responsiveness matters.

Output Speed (tok/s) — How many tokens the model generates per second once it starts responding. Important for long-form content generation.
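Both metrics can be measured from any streamed response. The following is a minimal sketch, not the methodology behind the figures on this page. It assumes the iterable sends its request when iteration begins and yields one token per item (real APIs often stream multi-token chunks). The `measure_stream` name and the injectable `clock` parameter are hypothetical.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter
                   ) -> Tuple[float, float, int]:
    """Return (ttft_ms, tokens_per_second, token_count) for one streamed response."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival time of the first token
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft_ms = (first - start) * 1000
    generation_seconds = last - first
    # Output speed counts only tokens generated after the first one arrived.
    if generation_seconds > 0:
        tokens_per_second = (count - 1) / generation_seconds
    else:
        tokens_per_second = float("nan")
    return ttft_ms, tokens_per_second, count
```

Passing a fake `clock` makes the function easy to check without a network call. In real use, a single run is noisy, so averaging over several requests is advisable.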

Speed data is sourced from Artificial Analysis and provider benchmarks. Actual speeds may vary with load, region, and input complexity.