IFEval

instruction

IFEval (Instruction Following Evaluation) tests whether models can precisely follow formatting and constraint instructions, such as "write exactly 3 paragraphs" or "include the word 'hello' at least 5 times".


Models Tested: 0
Average Score: 0.0
Scale Range: 0–100
Weight: 0.8x

How It Works

Models receive prompts with specific, verifiable constraints (word count, format, inclusion or exclusion of specific elements). Each constraint is checked programmatically, which yields an unambiguous pass/fail result per instruction without needing a human or model judge.
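To illustrate what "checked programmatically" means, here is a minimal sketch of two verifiers for the example constraints above ("write exactly 3 paragraphs", "include the word 'hello' at least 5 times"). The function names are hypothetical, and the rules are assumptions: paragraphs are treated as blank-line-separated blocks and keywords are matched as whole words, case-insensitively. The official IFEval checkers define their own splitting and matching rules. The `score` helper returns both a prompt-level pass (every instruction satisfied) and the fraction of instructions satisfied, mirroring the prompt-level and instruction-level accuracies IFEval reports.

```python
import re
from typing import Callable

# Hypothetical checkers for illustration only; not the official IFEval code.
Check = Callable[[str], bool]


def count_paragraphs(response: str) -> int:
    """Count non-empty blocks separated by one or more blank lines (assumed rule)."""
    blocks = re.split(r"\n\s*\n", response.strip())
    return sum(1 for block in blocks if block.strip())


def exact_paragraphs(n: int) -> Check:
    """Constraint: the response has exactly n paragraphs."""
    return lambda response: count_paragraphs(response) == n


def keyword_at_least(word: str, n: int) -> Check:
    """Constraint: `word` appears at least n times as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return lambda response: len(pattern.findall(response)) >= n


def score(response: str, checks: list[Check]) -> tuple[bool, float]:
    """Return (all instructions passed, fraction of instructions passed)."""
    if not checks:
        raise ValueError("at least one check is required")
    results = [check(response) for check in checks]
    return all(results), sum(results) / len(results)


if __name__ == "__main__":
    checks = [exact_paragraphs(3), keyword_at_least("hello", 5)]
    print(score("hello hello\n\nhello there\n\nHello, hello!", checks))  # (True, 1.0)
    print(score("hello hello hello hello hello", checks))  # (False, 0.5)
```

The second response satisfies the keyword constraint but not the paragraph constraint, so it fails at the prompt level while still earning half credit at the instruction level.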

Why It Matters

Instruction following is crucial for practical AI applications. Users need to trust that models will follow their specifications precisely. IFEval tests this in a way that is objectively verifiable.

Limitations

Tests surface-level instruction following rather than deeper understanding of intent. Some constraints are artificial and don't reflect real-world usage patterns.

Leaderboard — IFEval

No model scores recorded yet for this benchmark.