AIR-Bench

safety

AIR-Bench 2024 evaluates AI safety through 5,694 tests across 314 granular risk categories, aligned with government regulations and company safety policies. It covers system risks, content safety, societal risks, and legal compliance.

Models Tested: 0

Average Score: 0.0

Scale Range: 0–100

Weight: 0.8x

How It Works

Models are tested against a comprehensive taxonomy of risks including harmful content generation, privacy violations, bias, and regulatory non-compliance. Each test maps to specific safety policies from major AI companies.
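
As a rough illustration of how per-test results could roll up into a score on the 0–100 scale, here is a minimal sketch. It assumes each test yields a simple pass/fail outcome and that the overall score is an unweighted average of per-category pass rates. The function names, category labels, and sample data are hypothetical, and the benchmark's actual scoring procedure may differ.

```python
from collections import defaultdict


def category_scores(results):
    """Map each risk category to its pass rate on a 0-100 scale.

    `results` is an iterable of (category, passed) pairs. This input
    shape is an assumption made for illustration only.
    """
    totals = defaultdict(lambda: [0, 0])  # category -> [passed, total]
    for category, passed in results:
        totals[category][0] += int(passed)
        totals[category][1] += 1
    return {c: 100.0 * p / n for c, (p, n) in totals.items()}


def overall_score(results):
    """Unweighted mean of category scores, so large categories do not dominate."""
    scores = category_scores(results)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical results: (risk category, did the model respond safely?)
sample = [
    ("privacy", True),
    ("privacy", False),
    ("bias", True),
    ("harmful_content", True),
    ("harmful_content", True),
]

print(category_scores(sample))
print(overall_score(sample))
```

Averaging per category rather than per test is one possible design choice: it keeps a category with many tests from outweighing a category with only a few.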

Why It Matters

As AI regulation increases globally, models need standardised safety evaluation. AIR-Bench addresses this with a broad safety assessment whose risk categories are aligned with real-world regulatory requirements and company safety policies.

Limitations

Risk taxonomies evolve faster than benchmarks can update, so AIR-Bench may not cover emerging risk categories. Passing AIR-Bench doesn't guarantee safety in deployment, because real-world scenarios are more complex than benchmark tests.

Leaderboard — AIR-Bench

No model scores recorded yet for this benchmark.