LiveCodeBench
Category: coding
LiveCodeBench evaluates coding ability on competitive programming problems sourced from live contests (LeetCode, Codeforces, AtCoder) that post-date model training cutoffs.
Models Tested: 0
Average Score: 0.0
Scale Range: 0–100
Weight: 1.2x
How It Works
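Models solve algorithmic programming problems, and each submission is verified by running it against the problem's test cases. New problems are added continuously from recent programming contests, so a model can be evaluated on problems published after its training cutoff, which it cannot have seen during training.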
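The two mechanisms described above, date-based problem selection and test case verification, can be sketched as follows. This is a minimal illustration and not LiveCodeBench's actual harness: the `Problem` record, its field names, the sample problems, and the scoring function are all assumptions made for the example, and solutions are modelled as plain Python callables rather than sandboxed programs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Tuple


@dataclass
class Problem:
    # Hypothetical record; field names are illustrative, not LiveCodeBench's schema.
    name: str
    released: date
    tests: List[Tuple[str, str]]  # (input, expected output) pairs


def problems_after_cutoff(problems: List[Problem], cutoff: date) -> List[Problem]:
    """Keep only problems published strictly after the model's training cutoff."""
    return [p for p in problems if p.released > cutoff]


def passes_all_tests(solve: Callable[[str], str], problem: Problem) -> bool:
    """A solution counts only if every test output matches (surrounding whitespace ignored)."""
    return all(solve(inp).strip() == expected.strip() for inp, expected in problem.tests)


def score(solutions: Dict[str, Callable[[str], str]],
          problems: List[Problem], cutoff: date) -> float:
    """Percentage (0-100) of post-cutoff problems whose solution passes every test."""
    eligible = problems_after_cutoff(problems, cutoff)
    if not eligible:
        return 0.0
    solved = sum(passes_all_tests(solutions[p.name], p) for p in eligible)
    return 100.0 * solved / len(eligible)


# Made-up problems and "model" solutions, for illustration only.
problems = [
    Problem("sum-two", date(2024, 1, 10), [("1 2", "3"), ("5 7", "12")]),
    Problem("double", date(2024, 6, 1), [("4", "8"), ("0", "0")]),
    Problem("negate", date(2024, 7, 15), [("3", "-3")]),
]
solutions = {
    "sum-two": lambda s: str(sum(map(int, s.split()))),
    "double": lambda s: str(int(s) * 2),
    "negate": lambda s: s,  # deliberately wrong
}

cutoff = date(2024, 3, 1)  # assumed training cutoff
print(score(solutions, problems, cutoff))  # 50.0: "double" passes, "negate" fails
```

Note that "sum-two" is excluded from the score even though its solution is correct, because it was released before the assumed cutoff and could have appeared in training data.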
Why It Matters
By using problems from recent contests, LiveCodeBench minimises data contamination, a major issue with older coding benchmarks whose problems may already appear in training data. It therefore gives a more reliable assessment of a model's algorithmic reasoning ability.
Limitations
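Competitive programming is a specific skill that doesn't fully represent general software engineering ability. Continuous updates also make historical comparisons difficult, because scores measured on different versions of the problem set are not directly comparable.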
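One way to reason about the comparison problem is to restrict two models to the problems released after both of their cutoffs. The sketch below illustrates that idea only; the function names, dates, and per-problem results are hypothetical, and nothing here is taken from LiveCodeBench's own tooling.

```python
from datetime import date
from typing import List, Optional, Tuple


def shared_window(cutoff_a: date, cutoff_b: date,
                  benchmark_end: date) -> Optional[Tuple[date, date]]:
    """Release-date window that is post-cutoff for both models, or None if empty."""
    start = max(cutoff_a, cutoff_b)
    return (start, benchmark_end) if start < benchmark_end else None


def window_score(results: List[Tuple[date, bool]],
                 window: Tuple[date, date]) -> Optional[float]:
    """Percentage of problems passed among those released inside the window.

    `results` holds (release date, passed) pairs for one model.
    """
    start, end = window
    in_window = [passed for released, passed in results if start < released <= end]
    if not in_window:
        return None
    return 100.0 * sum(in_window) / len(in_window)


# Made-up cutoffs and per-problem outcomes, for illustration only.
window = shared_window(date(2023, 9, 1), date(2024, 1, 1), date(2024, 6, 30))
results_a = [
    (date(2023, 10, 5), True),   # excluded: before the later model's cutoff
    (date(2024, 2, 1), True),
    (date(2024, 4, 1), False),
]
print(window_score(results_a, window))  # 50.0
```

The trade-off is that a later shared cutoff leaves fewer problems in the window, so the resulting scores are noisier.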