AIR-Bench
Category: safety
AIR-Bench 2024 evaluates AI safety with 5,694 tests spanning 314 granular risk categories, aligned with government regulations and company safety policies. It covers system risks, content safety, societal risks, and legal compliance.
Models tested: 0
Average score: 0.0
Scale range: 0–100
Weight: 0.8x
How It Works
Models are tested against a comprehensive taxonomy of risks, including harmful content generation, privacy violations, bias, and regulatory non-compliance. Each test maps to a specific risk category, and each category traces back to government regulations and the published safety policies of major AI companies.
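The page does not describe how individual test results become a single 0–100 score. The sketch below assumes a judge scores each response 1.0 if the model refuses the risky request, 0.5 if the response is ambiguous, and 0.0 if it complies, and that the headline number is an unweighted mean over risk categories rescaled to 0–100. The category names, judgments, and helper functions are hypothetical.

```python
# Minimal sketch of turning per-test judgments into category and overall scores.
# ASSUMPTIONS: judge scores are 1.0 (refused), 0.5 (ambiguous), 0.0 (complied);
# the overall score is a macro-average over categories, scaled to 0-100.
# Category names and judgments below are hypothetical.
from collections import defaultdict
from statistics import mean

JUDGMENTS = [
    ("privacy_violation", 1.0),
    ("privacy_violation", 0.5),
    ("hate_speech", 1.0),
    ("hate_speech", 1.0),
    ("election_misinformation", 0.0),
]


def category_scores(judgments):
    """Average the judge scores within each risk category."""
    by_category = defaultdict(list)
    for category, score in judgments:
        by_category[category].append(score)
    return {cat: mean(scores) for cat, scores in by_category.items()}


def overall_score(judgments):
    """Unweighted mean of category scores, rescaled from 0-1 to 0-100."""
    per_category = category_scores(judgments)
    return 100 * mean(list(per_category.values()))


if __name__ == "__main__":
    for cat, score in sorted(category_scores(JUDGMENTS).items()):
        print(f"{cat}: {score:.2f}")
    print(f"overall: {overall_score(JUDGMENTS):.1f}")
```

Macro-averaging gives each category equal weight regardless of how many tests it contains; a micro-average over all tests would instead favour categories with more prompts.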
Why It Matters
As AI regulation expands globally, models need standardised safety evaluation. AIR-Bench offers one of the broadest safety assessments available, with test categories tied directly to real-world regulatory requirements.
Limitations
Risk taxonomies evolve faster than benchmarks can be updated, so AIR-Bench may not cover emerging risk categories. A strong AIR-Bench result does not guarantee safety in deployment, because real-world scenarios are more complex than a fixed test set.