References & Sources
Every data point on this site has a source. This page lists all external data sources, academic papers, and benchmark methodologies we reference. See our methodology page for how we score and rank models, and our collection policy for what we will and will not ingest.
Data Sources
We refresh data through a scheduled hourly pipeline, with manual provider-status reruns available when needed. Collection uses an identified User-Agent string and is limited to public APIs, public feeds, official blogs and newsroom pages, documentation pages, and other clearly public endpoints within the scope of this site.
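A minimal sketch of this collection pattern, assuming a Python pipeline. The endpoint list and scheduling are illustrative; only the User-Agent string matches the configuration documented under Compliance & Terms of Service below.

```python
# Minimal sketch of the hourly collection pattern described above.
# Illustrative only: the real pipeline's scheduler, retries, and
# storage layer are not shown.
import requests

HEADERS = {"User-Agent": "The-AI-Resource-Hub-Bot/1.0"}  # identified UA, per policy

# Illustrative subset of clearly public endpoints (see Data Sources below)
PUBLIC_SOURCES = [
    "https://openrouter.ai/api/v1/models",
]

def collect_once() -> dict:
    """Fetch each public source once; a scheduler (e.g. cron) invokes this hourly."""
    results = {}
    for url in PUBLIC_SOURCES:
        resp = requests.get(url, headers=HEADERS, timeout=30)
        resp.raise_for_status()  # surface provider errors so a manual rerun can be triggered
        results[url] = resp.json()
    return results
```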
Model Pricing & Availability
OpenRouter (refreshed hourly). Primary pricing source; 500+ models with 8 pricing dimensions. No authentication required.
OpenAI. Official GPT and o-series model pricing.
Anthropic. Official Claude model pricing.
Google. Official Gemini model pricing.
Mistral AI. Official Mistral model pricing.
DeepSeek. Official DeepSeek model pricing.
xAI. Official Grok model pricing.
Benchmark Scores
LMSYS Chatbot Arena (as published). Crowdsourced Elo ratings from 5M+ human preference votes; a simplified Elo sketch follows this list.
Aggregated benchmark leaderboards across ML tasks.
Hugging Face Open LLM Leaderboard. Standardised evaluations for open-weight models.
HELM (Stanford). Holistic Evaluation of Language Models.
Benchmark results with historical trend data.
LiveBench. Contamination-free benchmark with monthly question refresh.
ARC-AGI. Abstract reasoning benchmark measuring fluid intelligence.
SWE-bench. Real-world GitHub issue resolution benchmark.
SWE-bench Pro. 1,865 long-horizon tasks across 41 repos; a harder successor to SWE-bench.
BigCodeBench. Code generation benchmark with complex instructions.
Artificial Analysis. Independent speed and quality benchmarks; source for our TTFT and output speed data.
Expert-driven evaluations and safety benchmarks.
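The Chatbot Arena entry above mentions Elo ratings; for readers unfamiliar with the mechanism, the sketch below shows the classic online Elo update that turns pairwise preference votes into ratings. The Arena methodology paper (Chiang et al., cited below) describes fitting a Bradley-Terry model over all votes, so treat this as a simplified illustration, not Arena's actual scoring code.

```python
def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0) -> tuple[float, float]:
    """One online Elo update after a single head-to-head preference vote."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))  # P(A beats B) under Elo
    delta = k * ((1.0 if a_wins else 0.0) - expected_a)
    return r_a + delta, r_b - delta

# Example: a 1200-rated model wins a vote against a 1250-rated opponent
print(elo_update(1200.0, 1250.0, a_wins=True))  # A gains ~18.3 points; B loses the same
```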
Speed & Latency
Artificial Analysis (refreshed hourly). Primary source for TTFT (time to first token) and output speed measurements; the sketch below illustrates what a TTFT measurement does.
Official performance data published by OpenAI, Anthropic, Google, and others.
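We republish these sources' measurements rather than running our own benchmarks; the sketch referenced above only illustrates what a TTFT measurement does operationally. The endpoint URL, payload, and headers are hypothetical placeholders, and real harnesses control far more variables (region, concurrency, tokenisation).

```python
import time

import requests

def measure_ttft(url: str, payload: dict, headers: dict) -> float:
    """Seconds from sending a streaming request to receiving the first content bytes."""
    start = time.monotonic()
    with requests.post(url, json=payload, headers=headers, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=None):
            if chunk:  # first non-empty chunk approximates time to first token
                return time.monotonic() - start
    raise RuntimeError("stream ended before any content arrived")
```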
Research & Trend Data
Epoch AI (ongoing). Largest public database of notable ML models (3,200+ from 1950–present). Training compute estimates, parameter counts, training costs. CC-BY licensed.
Stanford AI Index Report. Annual comprehensive report tracking AI across technical, economic, and societal dimensions.
Interactive visualisations of AI model counts, compute growth, and country-level trends.
Safety & Frontier Evaluations
METR, Model Evaluation & Threat Research (as published). Pre-deployment evaluator for frontier models; publishes RE-Bench.
UK AI Security Institute (AISI). Evaluated 30+ frontier models across cyber, biology, and autonomy domains.
Apollo Research. AI safety evaluations focused on scheming and deception detection.
Model Discovery & News
Hugging Face Hub (refreshed hourly). Model cards for open-weight models; parameter counts, licences, release dates.
arXiv. Research papers in cs.AI, cs.LG, and cs.CL.
Industry news and analysis.
AI industry reporting and product coverage.
Consumer and platform coverage from a public AI-specific feed.
Technology reporting filtered for AI-relevant coverage.
Official blogs and newsroom pages from OpenAI, Anthropic, Google, and other major labs.
Academic Papers & Citations
Research papers referenced in our benchmark scores, blog posts, guides, and model evaluations. Sorted by category and year.
Foundational Research
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention Is All You Need. NeurIPS 2017.
The Transformer architecture paper.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI.
GPT-1.
Radford, A., Wu, J., Child, R., et al. (2019). Language Models are Unsupervised Multitask Learners. OpenAI.
GPT-2.
Brown, T., Mann, B., Ryder, N., et al. (2020). Language Models are Few-Shot Learners. NeurIPS 2020.
GPT-3; introduced in-context learning.
Ouyang, L., Wu, J., Jiang, X., et al. (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS 2022.
InstructGPT / RLHF paper.
Benchmark Methodologies
Hendrycks, D., Burns, C., Basart, S., et al. (2021). Measuring Massive Multitask Language Understanding. ICLR 2021.
MMLU benchmark.
Rein, D., Hou, B.L., Stickland, A.C., et al. (2023). GPQA: A Graduate-Level Google-Proof Q&A Benchmark. arXiv.
GPQA benchmark; expert-level questions.
Chen, M., Tworek, J., Jun, H., et al. (2021). Evaluating Large Language Models Trained on Code. arXiv.
HumanEval benchmark for code generation.
Hendrycks, D., Burns, C., Kadavath, S., et al. (2021). Measuring Mathematical Problem Solving With the MATH Dataset. NeurIPS 2021.
MATH benchmark.
Jimenez, C.E., Yang, J., Wettig, A., et al. (2024). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? ICLR 2024.
SWE-bench benchmark for software engineering.
Chiang, W.-L., Zheng, L., Sheng, Y., et al. (2024). Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. ICML 2024.
LMSYS Chatbot Arena methodology.
Zhuo, T.Y., Vu, M.C., Chim, J., et al. (2024). BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions. arXiv.
BigCodeBench methodology.
White, C., Dooley, S., Roberts, M., et al. (2024). LiveBench: A Challenging, Contamination-Free LLM Benchmark. arXiv.
LiveBench methodology.
Domain-Specific Benchmarks
Jin, D., Pan, E., Oufattole, N., et al. (2021). What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams. Applied Sciences.
MedQA benchmark used in our Healthcare leaderboard.
Guha, N., Nyarko, J., Ho, D., et al. (2023). LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models. NeurIPS 2023 Datasets & Benchmarks Track.
LegalBench benchmark used in our Legal leaderboard.
Chen, Z., Chen, W., Smiley, C., et al. (2021). FinQA: A Dataset of Numerical Reasoning over Financial Data. EMNLP 2021.
FinQA benchmark used in our Finance leaderboard.
Islam, P., Kannappan, A., Kiela, D., et al. (2023). FinanceBench: A New Benchmark for Financial Question Answering. arXiv.
FinanceBench used in our Finance leaderboard.
Zhou, S., Xu, F.F., Zhu, H., et al. (2024). WebArena: A Realistic Web Environment for Building Autonomous Agents. ICLR 2024.
WebArena benchmark used in our AI Agents leaderboard.
Mialon, G., Fourrier, C., Swift, C., et al. (2024). GAIA: A Benchmark for General AI Assistants. ICLR 2024.
GAIA benchmark used in our AI Agents leaderboard.
Model Technical Reports
Anthropic (2024). The Claude 3 Model Family: Opus, Sonnet, Haiku. Anthropic.
Claude 3 model card and capabilities.
Google DeepMind (2024). Gemini: A Family of Highly Capable Multimodal Models. arXiv.
Gemini model family.
DeepSeek-AI (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv.
DeepSeek R1 reasoning model.
Safety & Alignment
Bai, Y., Jones, A., Ndousse, K., et al. (2022). Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. arXiv.
Anthropic RLHF methodology.
Christiano, P., Leike, J., Brown, T., et al. (2017). Deep Reinforcement Learning from Human Preferences. NeurIPS 2017.
Foundational RLHF paper.
UK AI Security Institute (2025). AISI.
Evaluations of 30+ frontier models.
Compliance & Terms of Service
API Usage
- OpenRouter API — public endpoint, no authentication required. We use their /api/v1/models endpoint, which is explicitly designed for programmatic access; see the sketch after this list.
- OpenAI API — we optionally use the models list endpoint to verify model availability. Requires an API key when configured.
- HuggingFace Spaces — we access public Gradio API endpoints for Chatbot Arena and Open LLM Leaderboard data.
- News collection — we prioritise public APIs, RSS and Atom feeds, official provider blogs, newsroom pages, and other clearly public source surfaces rather than scraping full article bodies.
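As referenced in the first bullet, a minimal sketch of querying the OpenRouter models endpoint. The response-field names used here (data, id, pricing.prompt, pricing.completion) reflect the public response shape at the time of writing; verify them against OpenRouter's documentation before relying on them.

```python
import requests

# Public endpoint; no API key required
resp = requests.get(
    "https://openrouter.ai/api/v1/models",
    headers={"User-Agent": "The-AI-Resource-Hub-Bot/1.0"},
    timeout=30,
)
resp.raise_for_status()

for model in resp.json()["data"]:
    pricing = model.get("pricing", {})
    # Prices arrive as strings (USD per token) at the time of writing
    print(model["id"], pricing.get("prompt"), pricing.get("completion"))
```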
Web Scraping Practices
- All scrapers use an identified User-Agent: The-AI-Resource-Hub-Bot/1.0
- We respect robots.txt directives on all sites; a minimal sketch follows this list
- Collection runs on a conservative scheduled cadence and stays well below common rate limits
- We only access publicly available pages and API endpoints
- We do not circumvent paywalls, authentication, or access controls
- Pricing data is factual information used for comparison purposes
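A minimal sketch of the robots.txt check referenced in the list above, using Python's standard-library robotparser plus a fixed courtesy delay. The delay value is illustrative, not our production cadence.

```python
import time
from urllib import robotparser

import requests

USER_AGENT = "The-AI-Resource-Hub-Bot/1.0"

def polite_get(url: str, robots_url: str, delay_s: float = 5.0) -> requests.Response | None:
    """Fetch url only if robots.txt permits our User-Agent, then pause."""
    rp = robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()
    if not rp.can_fetch(USER_AGENT, url):
        return None  # disallowed by robots.txt: skip, never circumvent
    resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
    time.sleep(delay_s)  # illustrative courtesy delay, well below typical rate limits
    return resp
```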
Data Licensing
- Epoch AI — data used under CC-BY 4.0 licence. Attribution: epoch.ai/data
- Academic papers — cited under fair use for commentary, comparison, and educational purposes
- Benchmark scores — factual data reported from official sources with full attribution
- Provider logos/names — used nominatively for identification and comparison
Corrections & Takedowns
If you represent a data source listed here and have concerns about how we use your data, please review the repository and contact the site owner via the GitHub profile. We take accuracy and compliance seriously and will review credible requests promptly.
How We Use This Data
For details on our scoring formula, quality metrics, and update frequency, see our Methodology page. For the practical rules behind source collection, routing, and exclusions, see the Collection Policy.