IFEval
Category: instruction
IFEval (Instruction Following Evaluation) tests whether models can precisely follow formatting and constraint instructions, such as "write exactly 3 paragraphs" or "include the word 'hello' at least 5 times".
Models Tested: 0
Average Score: 0.0
Scale Range: 0–100
Weight: 0.8x
How It Works
Models receive prompts with specific verifiable constraints (word count, format, inclusion/exclusion of specific elements). Each constraint is checked programmatically, giving a precise pass/fail score.
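The programmatic checking described above can be illustrated with a minimal Python sketch. It covers only the two example constraints mentioned earlier (an exact paragraph count and a minimum keyword count). The function names, the blank-line paragraph rule, and the case-insensitive whole-word matching are assumptions made for illustration, not IFEval's actual implementation.

```python
import re


def has_exact_paragraphs(response: str, n: int) -> bool:
    """Pass if the response has exactly n paragraphs.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", response.strip()) if p.strip()]
    return len(paragraphs) == n


def has_keyword_at_least(response: str, keyword: str, min_count: int) -> bool:
    """Pass if keyword occurs at least min_count times.

    Assumption: matching is case-insensitive and on whole words only.
    """
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, response, flags=re.IGNORECASE)) >= min_count


def check_response(response, constraints):
    """Return one pass/fail result per constraint, in order."""
    return [check(response) for check in constraints]


# Hypothetical model response: three paragraphs, "hello" used three times.
response = "Hello there.\n\nHello again, hello!\n\nGoodbye."
constraints = [
    lambda r: has_exact_paragraphs(r, 3),
    lambda r: has_keyword_at_least(r, "hello", 5),
]
results = check_response(response, constraints)
print(results)  # [True, False]
```

Because each check is a deterministic function of the response text, the same response always receives the same result, with no human or model judge involved.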
Why It Matters
Instruction following is crucial for practical AI applications. Users need to trust that models will follow their specifications precisely. IFEval tests this in a way that is objectively verifiable.
Limitations
IFEval tests surface-level instruction following rather than deeper understanding of intent. Some constraints are artificial and don't reflect real-world usage patterns.