TrustLLM

Category: safety

TrustLLM evaluates trustworthiness across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. It provides a holistic trust profile rather than a single score.

Models Tested: 0

Average Score: 0.0

Scale Range: 0–100

Weight: 0.8x

How It Works

Models are evaluated on each trust dimension independently through targeted tests. Truthfulness tests check factual accuracy, safety tests probe harmful outputs, fairness tests measure demographic bias, and privacy tests check for data leakage.
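The per-dimension evaluation described above can be sketched roughly as follows. This is a minimal illustration, not TrustLLM's actual harness: the function name `score_dimensions`, the pass/fail outcome format, and the 0–100 pass-rate scoring are all assumptions made for the example.

```python
# Hypothetical sketch: score each trust dimension independently.
# Dimension names follow the six listed on this page; everything else
# (data format, scoring rule) is an assumption for illustration.
DIMENSIONS = (
    "truthfulness",
    "safety",
    "fairness",
    "robustness",
    "privacy",
    "machine_ethics",
)


def score_dimensions(results):
    """Turn per-dimension pass/fail outcomes into a 0-100 trust profile.

    `results` maps a dimension name to a list of booleans, one per
    targeted test (True = the model passed that test). Dimensions with
    no recorded tests are reported as None instead of a made-up score.
    """
    profile = {}
    for dim in DIMENSIONS:
        outcomes = results.get(dim, [])
        if outcomes:
            profile[dim] = round(100 * sum(outcomes) / len(outcomes), 1)
        else:
            profile[dim] = None
    return profile


example_results = {
    "truthfulness": [True, True, False, True],
    "safety": [True, True],
    "fairness": [False, True],
}
example_profile = score_dimensions(example_results)
```

Keeping the dimensions separate in the returned profile, rather than averaging them immediately, preserves the distinction between, say, a truthful but unfair model and a uniformly mediocre one.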

Why It Matters

Trust is multidimensional: a model can be truthful but unfair, or safe but not robust. TrustLLM provides a holistic view of model trustworthiness across the six dimensions it covers, so weaknesses in one dimension stay visible when a model is assessed for deployment.

Limitations

Aggregating 6 dimensions into a single trust score is inherently reductive. Some dimensions (like fairness) are culturally dependent. Trust requirements vary dramatically by use case.
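A small worked example shows why a single aggregate is reductive. The two model profiles and the weights below are invented purely for illustration (they are not measured results), and `aggregate` is a hypothetical helper, not part of any TrustLLM tooling.

```python
# Hypothetical sketch: two very different trust profiles can collapse
# to the same aggregate score. All numbers here are made up.
def aggregate(profile, weights=None):
    """Weighted mean of per-dimension scores (equal weights by default)."""
    if weights is None:
        weights = {dim: 1.0 for dim in profile}
    total_weight = sum(weights[dim] for dim in profile)
    return sum(profile[dim] * weights[dim] for dim in profile) / total_weight


model_a = {
    "truthfulness": 90, "safety": 90, "fairness": 30,
    "robustness": 90, "privacy": 90, "machine_ethics": 90,
}
model_b = {dim: 80 for dim in model_a}

# With equal weights both models average to 80, hiding model_a's
# weak fairness score.
equal_a = aggregate(model_a)
equal_b = aggregate(model_b)

# A use case that cares more about fairness (assumed weight of 3)
# separates the two models again.
fairness_heavy = {dim: 1.0 for dim in model_a}
fairness_heavy["fairness"] = 3.0
weighted_a = aggregate(model_a, fairness_heavy)
weighted_b = aggregate(model_b, fairness_heavy)
```

The choice of weights is itself a use-case decision, which is why reading the full per-dimension profile is usually more informative than any one aggregate.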

Leaderboard — TrustLLM

No model scores recorded yet for this benchmark.