
MathVista

multimodal

MathVista tests mathematical reasoning in visual contexts using 6,141 examples from 31 datasets. It evaluates whether models can solve maths problems involving diagrams, graphs, charts, and geometric figures.


Models Tested

Average Score: 0.0

Scale Range: 0–100

Weight

How It Works

Models receive images containing mathematical content (function plots, geometry diagrams, statistical charts) and must solve problems that require both visual understanding and mathematical reasoning.
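As a rough illustration of how such an evaluation could be scored on the 0–100 scale, here is a minimal sketch. It assumes exact-match scoring after light normalization; the `Example` record, the `normalize` helper, and the `predict` callable are hypothetical names introduced for this example, and the official MathVista evaluation pipeline may extract and compare answers differently.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Example:
    """Hypothetical record for one benchmark item (not the official schema)."""
    question: str
    image_path: str
    answer: str


def normalize(answer: str) -> str:
    """Lowercase, trim whitespace, and drop a trailing period before comparing."""
    return answer.strip().lower().rstrip(".")


def accuracy_score(
    examples: Sequence[Example],
    predict: Callable[[str, str], str],
) -> float:
    """Return the percentage of examples answered correctly (0 to 100).

    `predict` is an assumed stand-in for a model call that takes the
    question text and an image path and returns an answer string.
    """
    if not examples:
        return 0.0
    correct = sum(
        normalize(predict(ex.question, ex.image_path)) == normalize(ex.answer)
        for ex in examples
    )
    return 100.0 * correct / len(examples)
```

In practice `predict` would wrap a call to a multimodal model; exact match is only one possible scoring rule, and free-form answers usually need a more forgiving comparison.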

Why It Matters

Mathematical reasoning over visual inputs is a critical skill for STEM applications. MathVista combines vision and mathematics in a single benchmark, testing a capability that text-only benchmarks cannot measure.

Limitations

Image quality and format can affect results. Some problems may be solvable from the question text alone, without looking at the image. Most of the dataset comes from existing sources (28 of its 31 source datasets), which may appear in models' training data.
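One way to probe the text-only concern is to score a model with the image withheld and see how many problems it still answers correctly. The sketch below assumes exact-match comparison and a hypothetical `predict_text_only` callable that sees only the question text; it is an illustrative check, not part of the benchmark itself.

```python
from typing import Callable, Sequence, Tuple


def text_only_solvable_rate(
    examples: Sequence[Tuple[str, str]],
    predict_text_only: Callable[[str], str],
) -> float:
    """Percentage of (question, answer) pairs answered correctly without the image.

    `predict_text_only` is an assumed stand-in for a model call that
    receives the question text only.
    """
    if not examples:
        return 0.0
    hits = sum(
        predict_text_only(question).strip().lower() == answer.strip().lower()
        for question, answer in examples
    )
    return 100.0 * hits / len(examples)
```

A high rate would suggest that part of the score reflects text reasoning or memorized answers rather than visual understanding.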

Leaderboard — MathVista

No model scores recorded yet for this benchmark.
