Live AI intelligence
Choose the right AI model faster.
Start with the current frontier watchlist, cross-check the benchmark-backed composite, then scan what changed today.
- Frontier watchlist: 24
- Weighted benchmarks: 17
- Digest stories: 20
- Latest full refresh: 14 Apr 2026
Latest news
Open the news desk
Anthropic co-founder confirms the company briefed the Trump administration on Mythos
The attacks on Sam Altman are a warning for the AI world
Max Hodak’s Science Corp. is preparing to place its first sensor in a human brain
Chrome now lets you turn AI prompts into repeatable ‘Skills’
Google adds AI Skills to Chrome to help you save favorite workflows
Current frontier
The homepage now starts with the models that are current, not the models with the oldest benchmark advantage.
This lane is a curated frontier watchlist. It mixes benchmark-backed flagships with freshly tracked launches so the homepage answers “what is current?” before it answers “what is heavily evaluated?”
| Model | State | Evidence | Released | Price |
|---|---|---|---|---|
| GPT-5.4 (tracking; API, Vision) | tracking, awaiting score | official release only | 5 Mar 2026 | $2.50 / $15.00 |
| Claude Opus 4.6 (scored; API, Vision) | active, 89 quality | 7 benchmark tracks, 7 benchmark rows | 5 Feb 2026 | $15.00 / $75.00 |
| Gemini 3.1 Pro (partial; API, Vision, Audio) | tracking, 96 quality | 2 public evals, 2 benchmark rows | 19 Feb 2026 | $2.00 / $12.00 |
| Claude Sonnet 4.6 (scored; API, Vision) | active, 86 quality | 4 benchmark tracks, 4 benchmark rows | 17 Feb 2026 | $3.00 / $15.00 |
| Grok 4.20 (tracking; API, Vision) | tracking, awaiting score | official release only | 31 Mar 2026 | $2.00 / $6.00 |
| Qwen 3.6 Plus (tracking; API, Vision) | tracking, awaiting score | official release only | 2 Apr 2026 | $0.33 / $1.95 |
| MiniMax M2.7 (tracking; API) | tracking, awaiting score | official release only | 18 Mar 2026 | $0.30 / $1.20 |
| GLM-5 (tracking; API) | tracking, awaiting score | official release only | 12 Feb 2026 | $0.72 / $2.30 |
| Kimi K2.5 (tracking; API) | tracking, awaiting score | official release only | 28 Mar 2026 | $0.57 / $2.30 |
| Gemma 4 (tracking; Open, API, Vision) | tracking, awaiting score | official release only | 2 Apr 2026 | $0.13 / $0.38 |
What this shows
Frontier first, scoring second.
The homepage no longer treats the benchmark-heavy composite as a “best model right now” answer. This lane is the current model watchlist, and the evaluated composite sits alongside it as a separate scored view.
Top evaluated model
GPT-5.2
The evaluated composite still exists, but it now behaves as a benchmark-backed score rather than a proxy for “latest frontier model.”
- Composite: 68.0
- Benchmarks: 8
- Freshness: 82
Latest tracked release
Claude Opus 4.6 (Fast)
The release desk still covers launches the moment they land, even when benchmark and quality coverage have not caught up yet.
Open release desk
Benchmark-backed ranking
The evaluated composite is now explicitly the scored view, not the homepage’s “latest frontier” answer.
The evaluated composite now blends normalized benchmark results, the quality layer, and a freshness signal. It also penalizes thin evidence, stale provider generations, and beta or compact variants so old benchmark saturation stops dominating the homepage story.
| # | Model | Composite | Bench | Coverage | Price |
|---|---|---|---|---|---|
| 01 | GPT-5.2 (API, Vision) | 68.0 | 77.9 (8 tracks) | 53% weighted | $1.75 / $14.00 |
| 02 | Claude Opus 4.6 (API, Vision) | 66.3 | 74.8 (7 tracks) | 45% weighted | $15.00 / $75.00 |
| 03 | Claude Sonnet 4.6 (API, Vision) | 63.3 | 83.0 (4 tracks) | 28% weighted | $3.00 / $15.00 |
| 04 | Llama 4 Maverick (Open, API, Vision) | 55.1 | 76.9 (5 tracks) | 28% weighted | $0.15 / $0.60 |
| 05 | DeepSeek V3.2 (Open, API) | 48.2 | 87.9 (3 tracks) | 17% weighted | $0.20 / $0.77 |
| 06 | Gemini 3.1 Pro (API, Vision, Audio) | 44.7 | 60.4 (2 tracks) | 15% weighted | $2.00 / $12.00 |
| 07 | O3 (API, Vision) | 43.4 | 86.4 (11 tracks) | 68% weighted | $2.00 / $8.00 |
| 08 | GPT-5 (API, Vision) | 41.1 | 84.5 (3 tracks) | 23% weighted | $1.25 / $10.00 |
| 09 | Gemini 2.5 Pro (API, Vision, Audio) | 39.4 | 81.7 (11 tracks) | 68% weighted | $1.25 / $10.00 |
| 10 | Grok 4 (API, Vision) | 39.0 | 77.4 (7 tracks) | 46% weighted | $3.00 / $15.00 |
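To make the blend described before the table more concrete, here is a minimal sketch of how such a composite could be computed. The weights, the thin-evidence threshold, and every penalty factor are hypothetical placeholders chosen for illustration; the site's actual methodology is not reproduced here, and the two example models are invented inputs, not rows from the ranking.

```python
from dataclasses import dataclass

# Hypothetical weights and penalties, NOT the site's published values.
WEIGHTS = {"bench": 0.5, "quality": 0.3, "freshness": 0.2}
THIN_EVIDENCE_TRACKS = 4         # assumed cutoff for "thin evidence"
THIN_EVIDENCE_PENALTY = 0.85     # assumed multiplier
STALE_GENERATION_PENALTY = 0.6   # assumed multiplier
VARIANT_PENALTY = 0.9            # assumed multiplier for beta or compact variants


@dataclass
class ModelEvidence:
    name: str
    bench: float       # normalized benchmark score, 0-100
    quality: float     # quality layer, 0-100
    freshness: float   # freshness signal, 0-100
    tracks: int        # benchmark tracks with results
    stale_generation: bool = False
    beta_or_compact: bool = False


def composite(m: ModelEvidence) -> float:
    """Weighted blend of the three signals, scaled down by each penalty that applies."""
    base = (
        WEIGHTS["bench"] * m.bench
        + WEIGHTS["quality"] * m.quality
        + WEIGHTS["freshness"] * m.freshness
    )
    factor = 1.0
    if m.tracks < THIN_EVIDENCE_TRACKS:
        factor *= THIN_EVIDENCE_PENALTY
    if m.stale_generation:
        factor *= STALE_GENERATION_PENALTY
    if m.beta_or_compact:
        factor *= VARIANT_PENALTY
    return round(base * factor, 1)


# Invented inputs: a fresh flagship versus an older, heavily benchmarked model.
fresh = ModelEvidence("fresh-flagship", bench=78, quality=85, freshness=82, tracks=8)
older = ModelEvidence("older-generation", bench=86, quality=85, freshness=30,
                      tracks=11, stale_generation=True)

for m in sorted([fresh, older], key=composite, reverse=True):
    print(f"{m.name}: {composite(m)}")
```

Under these assumed numbers the older model has the higher benchmark score and more tracks but still lands below the fresher one, which is the kind of ordering the penalties are meant to produce.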
Top evaluated model
GPT-5.2
OpenAI currently leads the benchmark-backed evaluated set with a composite score of 68.0.
- Benchmark score: 77.9
- Coverage: 53%
- Best for: Chat
Newer tracked launch: Claude Opus 4.6 (Fast). Release coverage is live before it becomes rankable.
Best open model
Llama 4 Maverick
The strongest open-weight entry on the weighted ranking right now, with benchmark coverage baked into the score.
Open source shortlist
Best value
Mistral Nemo
Strongest quality-per-cost ratio in the current leaderboard, useful when performance still has to fit a budget.
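As a rough illustration of a quality-per-cost ratio, the sketch below ranks the three frontier-watchlist models that currently have a quality score. It assumes the two listed prices are input and output prices and blends them with a plain average; both the blend and the function names are assumptions, not the site's formula. Mistral Nemo is not included because its quality score and price do not appear on this page.

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Assumed blend: simple average of the two listed prices."""
    return (input_price + output_price) / 2


def value_ranking(models):
    """Return (name, quality per blended price unit), best value first."""
    scored = [
        (name, quality / blended_price(inp, out))
        for name, quality, inp, out in models
    ]
    return sorted(scored, key=lambda row: row[1], reverse=True)


# (name, quality score, first listed price, second listed price), from the watchlist table.
models = [
    ("Claude Opus 4.6", 89, 15.00, 75.00),
    ("Gemini 3.1 Pro", 96, 2.00, 12.00),
    ("Claude Sonnet 4.6", 86, 3.00, 15.00),
]

ranking = value_ranking(models)
for name, ratio in ranking:
    print(f"{name}: {ratio:.2f} quality points per blended price unit")
```

A different blend, for example one weighted toward output tokens, would change the ratios, so the ordering here only holds under the stated assumption.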
Full value ranking
Frontier signal
The AGI progress view is a compact frontier chart, not just a link.
- May 2024: GPT-4o (extended)
- Dec 2024: DeepSeek V3
- Mar 2025: Gemini 2.5 Pro
- Apr 2025: O3
- Aug 2025: GPT-5
Frontier leader
GPT-5
This is the highest published AGI-style frontier score in the current benchmark set at 84.2.
12 month gain
+19.7
Change in the frontier signal between the current leader and the last comparable point roughly one year earlier.
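The gain calculation can be sketched as follows: take the latest point, find the last point dated at least one year before it, and subtract. Only the 84.2 leader score and the +19.7 gain come from this page; the 64.5 baseline is simply 84.2 minus 19.7, and assigning it to the May 2024 point is an assumption. The March 2025 score is a hypothetical placeholder, and the day-of-month values are invented because the chart lists months only.

```python
from datetime import date


def one_year_before(d: date) -> date:
    """Same calendar day one year earlier (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def twelve_month_gain(points):
    """points: iterable of (date, score). Gain of the latest point over the
    last point dated at least one year before it."""
    ordered = sorted(points)
    latest_date, latest_score = ordered[-1]
    cutoff = one_year_before(latest_date)
    earlier = [score for d, score in ordered if d <= cutoff]
    if not earlier:
        raise ValueError("no comparable point a year or more before the leader")
    return round(latest_score - earlier[-1], 1)


points = [
    (date(2024, 5, 1), 64.5),   # assumed baseline: 84.2 - 19.7
    (date(2025, 3, 1), 75.0),   # hypothetical score, for illustration only
    (date(2025, 8, 1), 84.2),   # frontier leader score from this page
]

print(twelve_month_gain(points))
```

Points that fall inside the trailing twelve months, like the March 2025 entry, are ignored when choosing the baseline.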
Strongest benchmark
Chatbot Arena ELO
The current frontier leader’s strongest normalized result is 92.5 on this track.
Breaking news / daily digest
The current brief.
9 Apr 2026 digest with 20 stories from 677 sources.
Meta Releases Muse Spark - A Natively Multimodal Reasoning model
Muse Spark is a natively multimodal reasoning model with support for tool-use, visual chain of thought, and multi-agent orchestration.
Mamba 1 & 2 to Mamba 3 Architectural Upgrade
This repository contains the methodology and scripts to bypass training from scratch by structurally transplanting weights from the Mamba-1/Mamba-2 architectures directly into Mamba-3 gates.
Finally Abliterated Sarvam 30B and 105B!
I abliterated Sarvam-30B and 105B - India's first multilingual MoE reasoning models - and found something interesting along the way!
Turbo-OCR for high-volume image and PDF processing
I recently had to process ~940,000 PDFs. I started with the standard OCR tools, but the bottlenecking was frustrating. Even on an RTX 5090, I was seeing low speed.
[AutoBe] Qwen 3.5-27B Just Built Complete Backends from Scratch — 100% Compilation, 25x Cheaper
We benchmarked Qwen 3.5-27B against 10 other models on backend generation — including Claude Opus 4.6 and GPT-5.4. The outputs were nearly identical. 25x cheaper.
Updated data
Pipeline freshness.
Evaluated composite
17 benchmark tracks feed the benchmark-backed scored view.
Pricing & value
Official provider pricing and routed API cost references.
Speed measurements
Latency and tokens-per-second snapshots for tracked models.
Jobs market
1,088 live roles across tracked company boards.
Daily digest
20 stories from 677 sources in the latest brief.
Today in AI
The launch birthdays and lab dates that matter.
No exact anniversary lands today. The next one, the Llama 3 release, is in 4 days.
Llama 3 released
Llama 3 strengthened Meta’s position in open-weight models and raised the bar for broadly available open releases.
Google DeepMind formed
Google merged DeepMind and Google Brain into one lab, concentrating one of the largest frontier AI teams under a single brand.
GPT-4o introduced
GPT-4o fused text, image and audio into a single flagship model and reset the baseline for mainstream multimodal products.
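The countdown above can be reproduced with a short sketch. It assumes "today" is the latest full refresh date shown on this page (14 Apr 2026) and uses the public announcement dates for the three events; the function and variable names are illustrative.

```python
from datetime import date


def days_until_anniversary(event: date, today: date) -> int:
    """Days from `today` to the next anniversary of `event` (0 if it is today)."""
    def in_year(year: int) -> date:
        try:
            return event.replace(year=year)
        except ValueError:  # 29 Feb event in a non-leap year
            return event.replace(year=year, day=28)

    upcoming = in_year(today.year)
    if upcoming < today:
        upcoming = in_year(today.year + 1)
    return (upcoming - today).days


today = date(2026, 4, 14)  # assumed: the page's latest full refresh date
events = {
    "Llama 3 released": date(2024, 4, 18),
    "Google DeepMind formed": date(2023, 4, 20),
    "GPT-4o introduced": date(2024, 5, 13),
}

countdown = sorted(
    (days_until_anniversary(d, today), name) for name, d in events.items()
)
for days, name in countdown:
    print(f"{name}: in {days} days")
```

Sorting by the day count puts the nearest anniversary first, which matches the order the three entries are listed in.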
Latest activities
The site changelog, in live form.
Recomputed benchmark-weighted quality scores
Refreshed the model quality layer that feeds ranking and comparison pages.
Updated speed measurements
Refreshed output speed and latency references for tracked models.
Synced Chatbot Arena benchmark track
Updated the frontier conversation signal used in leaderboard weighting.
Validated official pricing snapshots
Rechecked provider pricing pages against the comparison database.
Pulled latest OpenRouter price index
Updated comparison data for providers and routed model endpoints.
Jobs market snapshot refreshed
1,088 open roles across 10 tracked companies.
Mamba 1 & 2 to Mamba 3 Architectural Upgrade
Reddit r/LocalLLaMA featured in the latest daily brief.
Published the 2026-04-09 daily digest
20 stories captured from 677 sources.
The homepage now separates the current frontier watchlist from the evaluated composite. The scored view blends normalized benchmark results, the existing quality layer, and a freshness signal, then penalizes thin evidence, stale provider generations, and beta or compact variants. The AGI panel is a derived frontier signal built from ARC-AGI, GPQA Diamond, Humanity’s Last Exam, MMLU-Pro, SWE-bench Verified, and Chatbot Arena. Read the methodology before treating any ranking as gospel.