AIME 2025

Category: reasoning

The AIME 2025 benchmark consists of 15 extremely challenging problems from the American Invitational Mathematics Examination (AIME), a prestigious competition that serves as a qualifying step toward the USA Mathematical Olympiad (USAMO).

Models Tested: 9
Best Score: 96.7
Average Score: 86.0
Scale Range: 0–100
Weight: 1.4x

How It Works

Models solve 15 problems, each with an integer answer from 0 to 999, and only the final answer is graded. The problems demand sophisticated mathematical reasoning across algebra, geometry, number theory, and combinatorics. Because they were first published in February 2025, they are unlikely to appear in the training data of models whose training cutoff predates that release.
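The grading rule above can be sketched as exact-match scoring. This is a minimal illustration, assuming each model's final answer arrives as a string and that the benchmark reports the percentage of problems answered correctly; `parse_aime_answer` and `score_aime` are hypothetical helper names, not part of any official evaluation harness.

```python
import re
from typing import List, Optional

# AIME answers are integers from 0 to 999, often written with leading zeros (e.g. "042").
_ANSWER_PATTERN = re.compile(r"[0-9]{1,3}")


def parse_aime_answer(raw: str) -> Optional[int]:
    """Return the answer as an int, or None if it is not a 1-3 digit integer (0-999)."""
    text = raw.strip()
    if _ANSWER_PATTERN.fullmatch(text) is None:
        return None
    return int(text)


def score_aime(predictions: List[str], answers: List[int]) -> float:
    """Exact-match accuracy on a 0-100 scale (hypothetical scoring rule)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one problem")
    correct = sum(
        1
        for raw, gold in zip(predictions, answers)
        if parse_aime_answer(raw) == gold
    )
    return 100.0 * correct / len(answers)


# Made-up answers for illustration: "007" parses to 7, while "33" misses 34.
print(score_aime(["204", "33", "007"], [204, 34, 7]))
```

Malformed or out-of-range outputs simply count as wrong, which mirrors the fact that only the final integer is checked.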

Why It Matters

AIME 2025 is particularly valuable because its problems are recent enough to reduce the risk of data contamination. Its difficulty (only roughly the top 5% of AMC 12 participants qualify to sit the AIME) makes it an excellent discriminator of frontier-model reasoning.

Limitations

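With only 15 problems, scores have high variance: each problem is worth about 6.7 points on the 0–100 scale. Grading only the integer answer ignores the reasoning process, so an answer reached through flawed reasoning scores the same as a rigorous solution. The problems are written in a mathematical-competition style and do not reflect real-world maths applications.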
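To make the variance point concrete, the sketch below computes how far a single problem moves a 0–100 score and the binomial standard error of that score. It rests on a simplifying assumption: problems are treated as independent trials with the same solve probability. The helper names are hypothetical, and the 30-problem row is only a comparison point.

```python
import math


def score_step(num_problems: int) -> float:
    """Points gained or lost on a 0-100 scale when one problem flips."""
    return 100.0 / num_problems


def binomial_standard_error(accuracy: float, num_problems: int) -> float:
    """Standard error of the 0-100 score, treating problems as independent Bernoulli trials."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    return 100.0 * math.sqrt(accuracy * (1.0 - accuracy) / num_problems)


# Compare the 15-problem set with a hypothetical set twice as large, at 86% accuracy.
for n in (15, 30):
    print(
        f"n={n}: step={score_step(n):.1f} pts, "
        f"SE at 86%={binomial_standard_error(0.86, n):.1f} pts"
    )
```

Under these assumptions, one problem is worth about 6.7 points and the standard error at 86% accuracy is roughly 9 points for 15 problems, so gaps of a few points between models may not be meaningful.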

Leaderboard — AIME 2025

#	Model	Provider	Score	Source	Measured
🥇	o3 Pro	OpenAI	96.7	OpenAI	Jun 2025
🥈	o4 Mini	OpenAI	92.7	OpenAI	Apr 2025
🥉	o3	OpenAI	91.6	OpenAI	Apr 2025
4	R1 0528	DeepSeek	87.5	DeepSeek	May 2025
5	Gemini 2.5 Pro Preview 06-05	Google	86.7	Google	Mar 2025
6	Grok 3	xAI	83.9	xAI	Jun 2025
7	R1	DeepSeek	79.8	DeepSeek	Jan 2025
8	QwQ 32B	Alibaba	79.5	Alibaba	Mar 2025
9	Phi-4 Reasoning	Microsoft	75.3	Microsoft	May 2025
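As a consistency check, the summary statistics at the top of the page can be recomputed from the leaderboard rows. This sketch assumes the Average Score is an unweighted mean over the listed models; the page does not state how it is computed.

```python
from statistics import mean

# AIME 2025 scores copied from the leaderboard table (model -> score).
LEADERBOARD = {
    "o3 Pro": 96.7,
    "o4 Mini": 92.7,
    "o3": 91.6,
    "R1 0528": 87.5,
    "Gemini 2.5 Pro Preview 06-05": 86.7,
    "Grok 3": 83.9,
    "R1": 79.8,
    "QwQ 32B": 79.5,
    "Phi-4 Reasoning": 75.3,
}

best_model = max(LEADERBOARD, key=LEADERBOARD.get)
best_score = LEADERBOARD[best_model]
# Assumption: the page's "Average Score" is an unweighted mean of these rows.
average_score = mean(LEADERBOARD.values())

print(f"Models tested: {len(LEADERBOARD)}")
print(f"Best score: {best_score} ({best_model})")
print(f"Average score: {average_score:.1f}")
```

With the nine rows in the table, this gives a best score of 96.7 (o3 Pro) and an average of about 86.0, matching the figures shown on the page.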
