
LiveCodeBench

coding

LiveCodeBench evaluates coding ability on competitive programming problems sourced from live contests (LeetCode, Codeforces, AtCoder) that post-date model training cutoffs.


Models Tested: none recorded yet

Average Score: 0.0

Scale Range: 0–100

Weight: 1.2x

How It Works

Models solve algorithmic programming problems, and their solutions are verified exactly against test cases. Problems are added continuously from recent programming contests, so each model can be evaluated on problems published after its training cutoff, which it is unlikely to have seen during training.
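The verification step can be illustrated with a minimal sketch. It is not LiveCodeBench's actual harness: the `Problem` class, `passes_all_tests`, and `score` are hypothetical names, candidate solutions are modelled as plain Python functions from input text to output text, and outputs are compared after trimming surrounding whitespace, which is an assumption about what "exact" means here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Problem:
    """Hypothetical problem record: a name plus (input, expected output) pairs."""
    name: str
    tests: List[Tuple[str, str]]


def passes_all_tests(solve: Callable[[str], str], problem: Problem) -> bool:
    """Return True only if the candidate matches the expected output on every test."""
    for stdin_text, expected in problem.tests:
        try:
            actual = solve(stdin_text)
        except Exception:
            # A crash on any test counts as a failed problem.
            return False
        # Assumption: compare after trimming leading/trailing whitespace.
        if actual.strip() != expected.strip():
            return False
    return True


def score(solutions: Dict[str, Callable[[str], str]], problems: List[Problem]) -> float:
    """Percentage of problems fully solved, on a 0-100 scale."""
    if not problems:
        return 0.0
    solved = sum(
        1
        for p in problems
        if p.name in solutions and passes_all_tests(solutions[p.name], p)
    )
    return 100.0 * solved / len(problems)


# Illustrative data only; these are not real benchmark problems.
add = Problem("add", [("1 2\n", "3\n"), ("10 -4\n", "6\n")])
echo = Problem("echo", [("hello\n", "hello\n")])


def solve_add(text: str) -> str:
    a, b = map(int, text.split())
    return str(a + b)


def solve_echo_wrong(text: str) -> str:
    return text.upper()


print(score({"add": solve_add, "echo": solve_echo_wrong}, [add, echo]))  # 50.0
```

In this sketch a problem earns credit only when every test passes, so a solution that is right on most inputs but wrong on one edge case scores the same as one that fails outright.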

Why It Matters

By using problems from recent contests, LiveCodeBench minimises data contamination, a major issue with older coding benchmarks whose problems may already appear in training data. Scores are therefore more likely to reflect a model's algorithmic reasoning ability than memorisation of known solutions.
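The contamination safeguard amounts to a date filter, sketched below under stated assumptions. `ContestProblem` and `problems_after_cutoff` are hypothetical names, and the titles and dates are made up for illustration; only the contest sources are taken from the description above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class ContestProblem:
    """Hypothetical record for one contest problem."""
    title: str
    source: str      # e.g. "LeetCode", "Codeforces", "AtCoder"
    released: date   # date the problem was published


def problems_after_cutoff(
    problems: List[ContestProblem], training_cutoff: date
) -> List[ContestProblem]:
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.released > training_cutoff]


# Illustrative data only; titles and dates are invented.
pool = [
    ContestProblem("Problem A", "LeetCode", date(2024, 1, 15)),
    ContestProblem("Problem B", "Codeforces", date(2024, 6, 2)),
    ContestProblem("Problem C", "AtCoder", date(2024, 9, 20)),
]

eligible = problems_after_cutoff(pool, training_cutoff=date(2024, 5, 1))
print([p.title for p in eligible])  # ['Problem B', 'Problem C']
```

Because each model has its own cutoff, two models may end up evaluated on different subsets of the pool under this scheme.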

Limitations

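Competitive programming is a specific skill that doesn't fully represent general software engineering ability. Because the problem set is continuously updated, scores measured at different times may be based on different problems, which complicates historical comparisons.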

Leaderboard — LiveCodeBench

No model scores recorded yet for this benchmark.
