MMMU

Category: multimodal

MMMU (Massive Multi-discipline Multimodal Understanding) tests multimodal AI models on college-level problems spanning 30 subjects, each of which requires understanding both images and text.

Models Tested: 0
Average Score: 0.0
Scale Range: 0–100
Weight: 1x
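
The Weight figure indicates how much this benchmark counts toward a composite score across benchmarks. As a rough illustration only, here is a minimal Python sketch of one plausible aggregation, a weight-scaled average; the benchmark names, scores, and the formula itself are assumptions, not this page's actual method.

```python
def weighted_average(scores: dict, weights: dict) -> float:
    """Combine per-benchmark scores (0-100) into one composite number.

    A weight of 1.0 corresponds to the "1x" shown above. This is an
    illustrative formula, not necessarily the one this leaderboard uses.
    """
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical example: MMMU at 1x alongside a made-up benchmark at 2x.
print(weighted_average({"MMMU": 60.0, "OtherBench": 80.0},
                       {"MMMU": 1.0, "OtherBench": 2.0}))  # ~73.3
```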

How It Works

Models receive questions that include images (diagrams, charts, photos, mathematical figures) and must reason about them. Subjects span art, business, science, engineering, medicine, and humanities.
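
To make the evaluation loop concrete, the Python sketch below scores a model on MMMU-style multiple-choice items by comparing its predicted option letter to the gold answer and reporting accuracy on the 0–100 scale above. The `MMMUItem` fields, the `answer_question` stub, and the two sample questions are illustrative assumptions rather than the benchmark's actual data format or official harness.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MMMUItem:
    """One MMMU-style item: an image plus a question and options (hypothetical schema)."""
    image_path: str     # figure, chart, or diagram the question refers to
    question: str
    options: List[str]  # e.g. ["A) ...", "B) ...", "C) ...", "D) ..."]
    answer: str         # gold option letter, e.g. "C"


def answer_question(item: MMMUItem) -> str:
    """Placeholder for a vision-language model call.

    A real harness would pass the image and the question/options to the
    model and parse the option letter it returns; here we always guess "A"
    so the script runs without any model.
    """
    return "A"


def accuracy(items: List[MMMUItem]) -> float:
    """Percentage of items whose predicted letter matches the gold answer."""
    correct = sum(1 for item in items if answer_question(item) == item.answer)
    return 100.0 * correct / len(items)


if __name__ == "__main__":
    sample = [
        MMMUItem("cell_diagram.png", "Which structure is labeled X in the diagram?",
                 ["A) Mitochondrion", "B) Ribosome", "C) Nucleus", "D) Golgi apparatus"], "C"),
        MMMUItem("line_plot.png", "What is the slope of the plotted line?",
                 ["A) 0.5", "B) 1", "C) 2", "D) 4"], "C"),
    ]
    print(f"Accuracy: {accuracy(sample):.1f} / 100")
```

Swapping the stub for a real model call turns this into a small MMMU-style evaluation.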

Why It Matters

As AI models become multimodal, MMMU provides a rigorous way to test whether they can truly understand and reason about visual information in academic contexts, not just describe images.

Limitations

The benchmark requires image input, so it cannot be used to evaluate text-only models. Image quality and formatting can affect results, and some questions assume a US-centric cultural context.

Leaderboard — MMMU

No model scores recorded yet for this benchmark.