ARC Challenge
reasoning
ARC (AI2 Reasoning Challenge) tests grade-school level science reasoning. The "Challenge" set contains questions that are difficult for retrieval-based and word co-occurrence methods.
Models Tested: 0
Average Score: 0.0
Scale Range: 0–100
Weight: 0.6x
How It Works
Multiple-choice science questions drawn from standardised tests for grades 3 to 9. The Challenge set contains only questions that both a retrieval-based method and a word co-occurrence method answered incorrectly.
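As a rough illustration of how a multiple-choice benchmark like this can be scored, the sketch below computes plain accuracy and scales it to the 0–100 range. It is a minimal sketch under stated assumptions, not this site's scoring code: the `Question` class, the `accuracy_score` function, and the three sample questions are hypothetical and written for this example (they are not taken from the ARC dataset), and one predicted label per question is assumed.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """One multiple-choice item (hypothetical structure for illustration)."""
    stem: str
    choices: dict[str, str]  # label -> answer text
    answer_key: str          # label of the correct choice


def accuracy_score(questions: list[Question], predictions: list[str]) -> float:
    """Return the percentage (0-100) of questions answered correctly."""
    if len(questions) != len(predictions):
        raise ValueError("need exactly one prediction per question")
    if not questions:
        return 0.0
    # Compare labels case-insensitively, ignoring surrounding whitespace.
    correct = sum(
        1
        for q, p in zip(questions, predictions)
        if p.strip().upper() == q.answer_key.strip().upper()
    )
    return 100.0 * correct / len(questions)


# Made-up sample items in the style of grade-school science questions.
questions = [
    Question(
        "Which object is attracted to a magnet?",
        {"A": "plastic spoon", "B": "iron nail", "C": "wooden block", "D": "glass marble"},
        "B",
    ),
    Question(
        "What causes day and night on Earth?",
        {"A": "Earth rotating on its axis", "B": "the Moon's phases",
         "C": "Earth's distance from the Sun", "D": "cloud cover"},
        "A",
    ),
    Question(
        "Which of these is a renewable energy source?",
        {"A": "coal", "B": "natural gas", "C": "wind", "D": "oil"},
        "C",
    ),
]

predictions = ["B", "A", "D"]  # hypothetical model outputs: two right, one wrong
score = accuracy_score(questions, predictions)
print(f"{score:.1f}")  # prints 66.7
```

Exact-match accuracy is the simplest choice here; an evaluation harness may instead handle ties or unparseable answers differently, which this sketch does not attempt.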
Why It Matters
ARC tests fundamental scientific reasoning ability, the kind of common-sense understanding that humans develop early. It helps identify whether models can reason about cause and effect in the physical world.
Limitations
Most modern LLMs now score above 95%, so the benchmark is less useful for differentiating frontier models. The questions are also US-centric.