MGSM

Category: multilingual

MGSM (Multilingual Grade School Math) tests mathematical reasoning across 10 typologically diverse languages: Bengali, Chinese, French, German, Japanese, Russian, Spanish, Swahili, Telugu, and Thai.

Models Tested: 6
Best Score: 95.0
Average Score: 90.1
Scale Range: 0–100
Weight: 0.7x

How It Works

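Models solve 250 grade-school math word problems drawn from GSM8K, each translated into all 10 languages. Because every language uses the same problems, differences in accuracy between languages largely reflect the language rather than problem difficulty. A correct answer requires understanding the problem in the target language and then reasoning through the arithmetic, so the benchmark tests how well chain-of-thought reasoning transfers across languages.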

Why It Matters

Most AI benchmarks are English-only, yet AI models serve a global audience. MGSM shows whether a model's mathematical reasoning carries over to other languages or degrades significantly outside English, including in lower-resource languages such as Swahili, Telugu, and Bengali.

Limitations

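With only 250 problems per language, each problem is worth 0.4 percentage points, so scores have relatively high variance. Translation quality may vary from language to language. Grade-school maths is relatively easy for frontier models, so top scores cluster near the ceiling. The benchmark does not test culturally specific mathematical concepts.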
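To put the variance concern in numbers, here is a rough sketch that treats each of a language's 250 problems as an independent pass/fail trial (a simplifying assumption) and computes the binomial standard error of the accuracy, SE = sqrt(p(1 - p) / n). The helper name `accuracy_standard_error` is hypothetical.

```python
import math


def accuracy_standard_error(accuracy_pct: float, n: int = 250) -> float:
    """Binomial standard error of an accuracy, in percentage points.

    Assumes n independent pass/fail problems; n defaults to MGSM's
    250 problems per language.
    """
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


# Each problem's share of a single-language score, in percentage points.
points_per_problem = 100.0 / 250

for acc in (84.0, 90.0, 95.0):
    se = accuracy_standard_error(acc)
    # 1.96 * SE is the usual normal-approximation 95% interval half-width.
    print(f"{acc:.1f}%: SE ~ {se:.2f} pts, 95% CI ~ +/-{1.96 * se:.2f} pts")
```

At 90% accuracy this gives a standard error of about 1.9 points for a single language, so gaps of a point or two between neighbouring models should be read cautiously. If a reported score averages all 10 languages, the error is smaller. It is likely not reduced by the full factor of sqrt(10), however, because the same problems appear in every language.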

Leaderboard — MGSM

Rank  Model                         Provider  Score
1     GPT-5.2                       OpenAI    95.0
2     o3                            OpenAI    93.0
3     Gemini 2.5 Pro Preview 06-05  Google    92.0
4     Claude Opus 4                 Anthropic 90.0
5     GPT-4o                        OpenAI    86.5
6     Llama 4 Maverick              Meta      84.0
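The summary figures at the top of the page appear consistent with this leaderboard: six models, a best score of 95.0, and a mean of 90.1. The page does not say how the average is computed, so the check below assumes a plain unweighted arithmetic mean over the listed scores. It does not use the 0.7x weight, whose role the page does not describe.

```python
from statistics import mean

# Scores copied from the leaderboard above.
leaderboard = {
    "GPT-5.2": 95.0,
    "o3": 93.0,
    "Gemini 2.5 Pro Preview 06-05": 92.0,
    "Claude Opus 4": 90.0,
    "GPT-4o": 86.5,
    "Llama 4 Maverick": 84.0,
}

best_model = max(leaderboard, key=leaderboard.get)
average = mean(leaderboard.values())  # assumed: unweighted mean of all listed scores

print(f"Models tested: {len(leaderboard)}")
print(f"Best score: {leaderboard[best_model]:.1f} ({best_model})")
print(f"Average score: {average:.1f}")
```

The six scores sum to 540.5, and 540.5 / 6 is about 90.08, which rounds to the reported 90.1.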