MGSM
multilingual
MGSM (Multilingual Grade School Math) tests mathematical reasoning across 10 typologically diverse languages: Bengali, Chinese, French, German, Japanese, Russian, Spanish, Swahili, Telugu, and Thai.
6
Models Tested
95.0
Best Score
90.1
Average Score
0–100
Scale Range
0.7x
Weight
How It Works
Models solve 250 grade-school math word problems (from GSM8K) translated into 10 languages. Success requires both language understanding and mathematical reasoning, testing cross-lingual chain-of-thought transfer.
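As a rough illustration of how a score like this might be computed, the sketch below pulls the last number out of each model response, compares it with the reference answer, and reports accuracy per language on the 0–100 scale. The function names, the record layout, and the "last number wins" extraction rule are assumptions made for this example, not the benchmark's official evaluation harness.

```python
import re
from collections import defaultdict

# Assumption: the final answer is the last number that appears in the response.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_answer(response):
    """Return the last number in the response as a float, or None if absent."""
    matches = _NUMBER.findall(response)
    if not matches:
        return None
    # Drop thousands separators (and any trailing comma) before parsing.
    return float(matches[-1].replace(",", ""))


def score_by_language(records):
    """Compute accuracy (0-100) per language.

    records: iterable of (language, model_response, reference_answer) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, response, reference in records:
        total[lang] += 1
        predicted = extract_answer(response)
        if predicted is not None and abs(predicted - reference) < 1e-6:
            correct[lang] += 1
    return {lang: 100.0 * correct[lang] / total[lang] for lang in total}


def overall_score(per_language):
    """Unweighted mean of the per-language accuracies."""
    return sum(per_language.values()) / len(per_language)
```

Averaging the per-language accuracies without weights treats every language equally, which is reasonable here because each language has the same number of problems. A real harness would likely need a more careful answer extractor than this one.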
Why It Matters
Most AI benchmarks are English-only, but AI models serve a global audience. MGSM reveals whether mathematical reasoning transfers across languages or drops significantly in non-English contexts.
Limitations
With only 250 problems per language, scores carry noticeable sampling variance. Translation quality may vary across languages. Grade-school math is relatively simple for frontier models. The benchmark does not test culturally specific mathematical concepts.
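To put a rough number on the variance concern, the sketch below estimates the binomial standard error of an accuracy score measured on 250 problems. It assumes each problem is an independent pass/fail trial and that the score comes from a single 250-problem set; the function names are illustrative only.

```python
import math


def standard_error(score_pct, n_problems=250):
    """Binomial standard error of an accuracy score, in percentage points."""
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_problems)


def confidence_interval_95(score_pct, n_problems=250):
    """Approximate 95% interval (normal approximation), clipped to 0-100."""
    half_width = 1.96 * standard_error(score_pct, n_problems)
    return max(0.0, score_pct - half_width), min(100.0, score_pct + half_width)
```

Under these assumptions, a score of 90.0 has a standard error of about 1.9 points, so differences of one or two points between models on a single language may not be meaningful.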
Leaderboard — MGSM
| # | Model | Provider | Score | |
|---|---|---|---|---|
| 🥇 | GPT-5.2 | OpenAI | 95.0 | |
| 🥈 | o3 | OpenAI | 93.0 | |
| 🥉 | Gemini 2.5 Pro Preview 06-05 | Google | 92.0 | |
| 4 | Claude Opus 4 | Anthropic | 90.0 | |
| 5 | GPT-4o | OpenAI | 86.5 | |
| 6 | Llama 4 Maverick | Meta | 84.0 | |
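The summary figures at the top of the page (models tested, best score, average score) can be reproduced from the leaderboard rows. The snippet below is a minimal sketch that copies the scores from the table by hand; the variable names are illustrative.

```python
from statistics import mean

# Scores copied from the leaderboard table above.
scores = {
    "GPT-5.2": 95.0,
    "o3": 93.0,
    "Gemini 2.5 Pro Preview 06-05": 92.0,
    "Claude Opus 4": 90.0,
    "GPT-4o": 86.5,
    "Llama 4 Maverick": 84.0,
}

models_tested = len(scores)
best_score = max(scores.values())
average_score = round(mean(scores.values()), 1)  # rounded to one decimal place
```

The six scores sum to 540.5, which gives an average of 90.1 after rounding, matching the reported value.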