MGSM

multilingual

MGSM (Multilingual Grade School Math) tests mathematical reasoning across 10 typologically diverse languages: Bengali, Chinese, French, German, Japanese, Russian, Spanish, Swahili, Telugu, and Thai.

Models Tested: 6

Best Score: 95.0

Average Score: 90.1

Scale Range: 0–100

Weight: 0.7x

How It Works

Models solve the same 250 grade-school math word problems, drawn from GSM8K and translated into each of the 10 languages. Success requires both language understanding and mathematical reasoning, so the benchmark tests cross-lingual chain-of-thought transfer.
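The scoring described above can be sketched as follows. This is a minimal illustration, not the harness behind the leaderboard: it assumes the final answer is the last number in a model response, that grading is exact numeric match, and that the headline score is the unweighted mean of per-language accuracies. The helper names (`extract_answer`, `language_accuracy`, `mgsm_score`) and the `demo` data are hypothetical.

```python
import re
from statistics import mean

# Assumption: the final answer is the last number that appears in the response.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_answer(text):
    """Return the last number in a model response as a float, or None."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    # Drop thousands separators such as "1,200" before converting.
    return float(matches[-1].replace(",", ""))


def language_accuracy(responses, gold):
    """Percentage of responses whose extracted answer equals the gold answer."""
    correct = sum(extract_answer(r) == g for r, g in zip(responses, gold))
    return 100.0 * correct / len(gold)


def mgsm_score(per_language):
    """Unweighted mean of per-language accuracies (an assumed aggregation)."""
    return mean(language_accuracy(r, g) for r, g in per_language.values())


# Made-up responses for two problems in two languages, for illustration only.
demo = {
    "de": (["Die Antwort ist 18.", "Also 5 Eier."], [18, 3]),
    "es": (["La respuesta es 18.", "Son 3 huevos."], [18, 3]),
}
print(mgsm_score(demo))  # 75.0
```

In this toy run the German responses score 50% and the Spanish ones 100%, so the mean is 75.0. Reporting the per-language values alongside the mean is what exposes a drop in non-English contexts.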

Why It Matters

Most AI benchmarks are English-only, but AI models serve a global audience. MGSM reveals whether mathematical reasoning transfers across languages or drops significantly in non-English contexts.

Limitations

With only 250 problems per language, scores carry higher variance than on larger benchmarks. Translation quality may vary between languages. Grade-school math is relatively simple for frontier models. The benchmark does not test culturally specific mathematical concepts.
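To put a rough number on the variance concern, the sketch below computes the binomial standard error of an accuracy measured on 250 problems. It assumes each problem is an independent pass/fail trial, which is a simplification, and `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(accuracy_pct, n):
    """Binomial standard error of an accuracy, in percentage points."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


# Assumption: a 90% accuracy measured on the 250 problems of one language.
se = standard_error(90.0, 250)
print(round(se, 2))         # 1.9
print(round(1.96 * se, 2))  # 3.72
```

Under these assumptions, a 90% score in a single language has an approximate 95% interval of plus or minus 3.7 points, so gaps of one or two points between models in one language may not be meaningful. Averaging across the 10 languages narrows the interval, but because every language uses translations of the same 250 problems, the items are correlated and the gain is likely smaller than a simple 2,500-item calculation would suggest.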

Leaderboard — MGSM

#	Model	Provider	Score	Source	Measured
🥇	GPT-5.2	OpenAI	95.0	OpenAI	Dec 2025
🥈	o3	OpenAI	93.0	OpenAI	Apr 2025
🥉	Gemini 2.5 Pro Preview 06-05	Google	92.0	Google	Mar 2025
4	Claude Opus 4	Anthropic	90.0	Anthropic	May 2025
5	GPT-4o (2024-05-13)	OpenAI	86.5	OpenAI	May 2024
6	Llama 4 Maverick	Meta	84.0	Meta	Apr 2025