MMMU
multimodal
MMMU (Massive Multi-discipline Multimodal Understanding) tests multimodal AI models on college-level problems that require understanding both images and text across 30 subjects.
Models Tested: 0
Average Score: 0.0
Scale Range: 0–100
Weight: 1x
How It Works
Models receive questions that include images (diagrams, charts, photos, mathematical figures) and must reason about them. Subjects span art, business, science, engineering, medicine, and humanities.
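The flow described above can be sketched as a small scoring routine. This is a minimal illustration, not the benchmark's official harness: the `Question` record, its field names, and the `score` function are hypothetical, and it assumes answers are multiple-choice letters and that the reported score is plain accuracy mapped onto the 0–100 scale.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """Hypothetical record for one image-plus-text question."""
    qid: str
    subject: str
    text: str
    image_paths: tuple[str, ...]  # diagrams, charts, photos, figures
    options: tuple[str, ...]
    answer: str  # assumed to be a choice letter such as "B"


def score(questions, predictions):
    """Return (overall, per_subject) accuracy on an assumed 0-100 scale.

    `predictions` maps question id to the model's chosen letter.
    A missing prediction counts as wrong.
    """
    correct = 0
    by_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for q in questions:
        hit = predictions.get(q.qid, "").strip().upper() == q.answer
        correct += hit
        by_subject[q.subject][0] += hit
        by_subject[q.subject][1] += 1
    overall = 100.0 * correct / len(questions) if questions else 0.0
    per_subject = {s: 100.0 * c / n for s, (c, n) in by_subject.items()}
    return overall, per_subject


if __name__ == "__main__":
    # Placeholder questions; paths and wording are invented for illustration.
    sample = [
        Question("q1", "Art", "Which style is shown?", ("q1.png",),
                 ("A", "B", "C", "D"), "B"),
        Question("q2", "Physics", "What is the net force?", ("q2.png",),
                 ("A", "B", "C", "D"), "C"),
    ]
    overall, per_subject = score(sample, {"q1": "B", "q2": "A"})
    print(overall, per_subject)
```

Keeping a per-subject breakdown alongside the overall number makes it easier to see whether a model is weak in one discipline or across the board.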
Why It Matters
As AI models become multimodal, MMMU provides a rigorous test of whether they can reason about visual information in academic contexts rather than merely describe images.
Limitations
MMMU requires multimodal input, so it cannot be used to evaluate text-only models. Image quality and format can affect results. Some questions assume a US-centric cultural context.