Progress tracker

AGI Progress

Where are we on the path to artificial general intelligence? Every major lab, researcher, and framework defines it differently. This page tracks them all, maps capabilities, and charts the milestones.

Capability Status

Autonomous Task Execution

Emerging

agency

Best: Claude Code / Devin

Tool Use / Function Calling

Achieved

agency

Best: Claude / GPT-4

Multi-step Reasoning

Achieved

cognition

Best: o3 / DeepSeek-R1

Natural Language Understanding

Achieved

cognition

Best: GPT-4o / Claude Opus 4.6

World Models / Physical Understanding

Not Yet

cognition

Genuinely Original Creative Work

Disputed

creativity

Autonomous Scientific Research

Emerging

innovation

Best: FutureHouse / various

Long Context (1M+ tokens)

Achieved

memory

Best: Gemini 2.5 Pro

Persistent Cross-Session Memory

Partial

memory

Best: ChatGPT memory / Claude memory

Self-Correction / Reflection

Emerging

metacognition

Best: o1 / Claude extended thinking

Vision + Audio + Text

Achieved

perception

Best: GPT-4o / Gemini

Code Generation

Achieved

technical

Best: Claude Code / Codex

How the Labs Define AGI

Anthropic

AI Safety Levels (ASL)

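Safety-focused framework tied to deployment policy through Anthropic's Responsible Scaling Policy. Earlier models were deployed under ASL-2. Anthropic activated ASL-3 protections with Claude Opus 4 in May 2025, adding stricter security and deployment safeguards aimed mainly at chemical, biological, radiological, and nuclear (CBRN) misuse.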

L1
ASL-1

No meaningful catastrophic risk

L2
ASL-2

Present-day risks, requires current safeguards

L3
ASL-3

Substantially increased risk, enhanced containment needed

L4
ASL-4+

Potentially catastrophic autonomous capabilities

Google DeepMind

Levels of AGI

Six performance levels crossed with breadth (narrow vs general). Published Nov 2023. Defines AGI as general-purpose AI at Level 3+ (Expert).

L0
No AI

Narrow non-AI tools

L1
Emerging

Equal to or somewhat better than an unskilled human

L2
Competent

At least 50th percentile of skilled adults

L3
Expert

At least 90th percentile of skilled adults

L4
Virtuoso

At least 99th percentile of skilled adults

L5
Superhuman

Outperforms 100% of humans
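
The percentile thresholds above can be read as a simple lookup. The following is a minimal sketch, not DeepMind's own code: the function names are hypothetical, Level 0 (No AI) is left out because the input is assumed to be an AI system, and any score below the 50th percentile is assumed to count as Emerging. The AGI check follows this page's reading of the framework (general-purpose and Level 3 or higher).

```python
from bisect import bisect_right

# Lower percentile bounds (relative to skilled adults) for Levels 2 to 5.
# Anything below the first bound is treated as Level 1 (assumption).
THRESHOLDS = [50.0, 90.0, 99.0, 100.0]
LEVELS = [
    (1, "Emerging"),
    (2, "Competent"),
    (3, "Expert"),
    (4, "Virtuoso"),
    (5, "Superhuman"),
]


def performance_level(percentile: float) -> tuple[int, str]:
    """Map a skilled-adult percentile (0 to 100) to a performance level."""
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    # bisect_right counts how many thresholds the score meets or exceeds.
    return LEVELS[bisect_right(THRESHOLDS, percentile)]


def counts_as_agi(percentile: float, general: bool) -> bool:
    """Apply this page's reading: general-purpose AI at Level 3 or above."""
    level, _ = performance_level(percentile)
    return general and level >= 3


if __name__ == "__main__":
    for score in (30, 50, 90, 99, 100):
        print(score, performance_level(score))
```

Keeping breadth as a separate boolean mirrors the framework's two axes: a narrow system can score at the Superhuman level on one task without counting as AGI under this reading.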

OpenAI

Five Levels of AI

Internal framework leaked July 2024. OpenAI claimed Level 2 reached with o1. Level 3 (agents) is the current frontier as of early 2026.

L1
Chatbots

Conversational AI with natural language

L2
Reasoners

Human-level problem solving

L3
Agents

Systems that can take actions in the world

L4
Innovators

AI that aids in scientific invention

L5
Organisations

AI that can do the work of an entire organisation

Researcher Positions

Gary Marcus

Sceptical Position

Argues AGI requires fundamental breakthroughs in reasoning, reliability, and genuine understanding. Current models are sophisticated pattern matchers that fail on novel situations. Consistent critic of AGI hype.

Mustafa Suleyman

Modern Turing Test

Proposed a practical economic test rather than conversational imitation: give an AI $100,000 in seed capital and see whether it can turn that into $1 million. Focuses on real-world capability and autonomous decision-making.

Shane Legg (DeepMind co-founder)

Original AGI Definition

Defined AGI as "a machine that can do any intellectual task that a human being can." Co-coined the term "Artificial General Intelligence." Has predicted a 50% chance of AGI by 2028.

Yann LeCun

World Models Required

Argues current LLMs cannot achieve AGI because they lack world models, persistent memory, and genuine planning, and that token prediction alone is insufficient. Co-founded AMI Labs in 2026 with $1B+ to pursue his approach.

Our Position

Practical AGI Achieved

Position: AGI was functionally achieved with ChatGPT in November 2022. Since then we have been climbing capability levels. The question is no longer "if AGI" but "what level of AGI and how fast."

L1
Conversational AGI

ChatGPT launch — broad general capability across knowledge domains

L2
Reasoning AGI

o1/o3, DeepSeek-R1 — chain of thought, multi-step problem solving

L3
Agentic AGI

Claude Code, Codex — autonomous task completion with tool use

L4
Collaborative AGI

Multi-agent systems coordinating work across domains

L5
Autonomous AGI

Self-directed systems that can identify and pursue goals independently

Milestones

Jan 2026

Multi-Agent Systems Mature

Claude Code, Codex, and Gemini working together on shared codebases. Agent coordination becomes practical, not theoretical.

Aug 2025

GPT-5 Release

Significant capability jump across all domains. Pushes frontier of what single-model systems can achieve.

May 2025

Claude Opus 4

Extended thinking, agentic capabilities, sustained complex task execution. Powers multi-session autonomous work.

Feb 2025

Claude Code Launch

Anthropic launches agentic coding tool. Autonomous file editing, terminal access, git operations. The agentic era begins.

Jan 2025

DeepSeek-R1

Open-weight reasoning model from China matching o1 performance at a fraction of the cost. Democratises advanced reasoning.

Sep 2024

o1 Release

OpenAI releases o1 with explicit chain-of-thought reasoning. Claims Level 2 (Reasoners) reached. PhD-level science performance.

May 2024

GPT-4o (Omni)

Omni model with native audio/video understanding. Sub-second voice responses. Free tier access.

Mar 2024

Claude 3 Opus

Anthropic releases Claude 3 family. Opus leads multiple benchmarks. First model widely considered to match GPT-4.

Feb 2024

Sora Demonstration

OpenAI demonstrates photorealistic text-to-video generation. Later shut down in 2026 due to cost and moderation challenges.

Dec 2023

Gemini Release

Google releases Gemini, natively multimodal from the ground up.

Jul 2023

Claude 2

Anthropic releases Claude 2 with a 100K-token context window, 10x what was standard at the time.

Mar 2023

GPT-4 Release

Multimodal model that passes the bar exam, the SAT, and various professional benchmarks. Significant quality jump over GPT-3.5.

Nov 2022

ChatGPT Launch

OpenAI releases ChatGPT. Broad conversational AI available to the public for the first time. Reaches 100M users in 2 months.