Collection Policy
This page sets out the practical policy behind how the site gathers public information, where we draw the line, and how we review data before it is trusted on the public site.
Default rule
We prefer official sources first. If a provider, benchmark owner, or platform publishes an official API, docs page, pricing page, changelog, blog, or status page, that source takes precedence over third-party summaries wherever possible.
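The precedence rule can be pictured as a ranking over candidate sources for the same claim. This is a minimal sketch, not the site's actual pipeline: the `Source` type, the source-kind labels, and the numeric ranks are hypothetical names chosen for illustration.

```python
# Hypothetical sketch: pick the most authoritative source for a claim.
# Kind labels and ranks are illustrative assumptions, not real config.
from dataclasses import dataclass
from typing import List, Optional

# Lower rank = more authoritative. All official publisher material ranks 0.
SOURCE_RANK = {
    "official_api": 0,
    "official_docs": 0,
    "official_pricing": 0,
    "official_changelog": 0,
    "official_blog": 0,
    "official_status": 0,
    "secondary_reputable": 1,
    "third_party_summary": 2,
}
UNKNOWN_RANK = 99  # unrecognised kinds sort last


@dataclass
class Source:
    url: str
    kind: str


def preferred_source(candidates: List[Source]) -> Optional[Source]:
    """Return the highest-precedence candidate, or None if there are none.

    Ties keep the first candidate in input order (min() is stable).
    """
    if not candidates:
        return None
    return min(candidates, key=lambda s: SOURCE_RANK.get(s.kind, UNKNOWN_RANK))
```

For example, given a third-party recap and the provider's own pricing page for the same price change, `preferred_source` returns the pricing page.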
Allowed collection
- Official APIs, official docs, changelogs, and model/pricing pages
- Official blogs, newsroom pages, release notes, and status pages
- Public RSS or Atom feeds where a source provides them
- Public benchmark endpoints, leaderboards, and research repositories
- Public model hubs and dataset indexes where programmatic access is clearly intended
Restricted or manual-review collection
These sources are not banned outright, but they should not flow silently through the normal automation until a clear compliance and accuracy path exists for them.
- X / Twitter accounts and lists: manual curation or official API only, no unauthorised scraping
- Paywalled or login-gated sources: manual review only unless explicit licensed access exists
- Community forums or user-generated spaces with unclear reuse rights: manual review first
- Any source whose terms, robots.txt rules, or access pattern are unclear enough to create doubt
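One way to keep restricted sources out of the normal automation is a gate that assigns each source a collection path before any fetching happens. This is a minimal sketch under assumptions: the `SourceProfile` fields and the two-way `CollectionPath` outcome are hypothetical, and a real pipeline would need its own way of establishing these facts about each source.

```python
# Hypothetical sketch: choose a collection path for a source.
# SourceProfile fields are illustrative assumptions about what the
# pipeline records for each source.
from dataclasses import dataclass
from enum import Enum


class CollectionPath(Enum):
    AUTOMATED = "automated"
    MANUAL_REVIEW = "manual_review"


@dataclass
class SourceProfile:
    name: str
    is_social_media: bool = False
    uses_official_api: bool = False
    login_or_paywall: bool = False
    has_licensed_access: bool = False
    reuse_rights_clear: bool = True
    terms_clear: bool = True  # includes robots.txt rules


def collection_path(src: SourceProfile) -> CollectionPath:
    # X / Twitter and similar: official API only; otherwise manual curation.
    if src.is_social_media and not src.uses_official_api:
        return CollectionPath.MANUAL_REVIEW
    # Paywalled or login-gated: manual unless explicit licensed access exists.
    if src.login_or_paywall and not src.has_licensed_access:
        return CollectionPath.MANUAL_REVIEW
    # Unclear reuse rights, terms, or robots rules: review before automating.
    if not (src.reuse_rights_clear and src.terms_clear):
        return CollectionPath.MANUAL_REVIEW
    return CollectionPath.AUTOMATED
```

The default is deliberately cautious: a source only reaches the automated path once every doubt listed above has been resolved.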
Disallowed collection
- Bypassing paywalls, authentication, anti-bot checks, or other access controls
- Copying or republishing full third-party articles instead of linking and summarising
- Aggressive crawling that ignores rate limits or source instructions
- Silent reuse of data from sources that explicitly prohibit that collection path
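Respecting source instructions and rate limits can be enforced in code before any request is made. The sketch below uses Python's standard-library `urllib.robotparser` to honour `Disallow` rules and any `Crawl-delay`. The user-agent string and the five-second minimum gap are hypothetical values, and the robots.txt text is passed in directly rather than downloaded, so the example runs offline.

```python
# Sketch: honour robots.txt rules and a crawl delay before fetching.
# USER_AGENT and DEFAULT_DELAY_SECONDS are assumptions for illustration.
import time
import urllib.robotparser

USER_AGENT = "ExampleNewsBot"  # hypothetical user-agent string
DEFAULT_DELAY_SECONDS = 5.0    # assumed minimum gap when the site sets none


def build_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt text that has already been retrieved."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteGate:
    """Decides whether a URL may be fetched, and spaces out requests."""

    def __init__(self, parser: urllib.robotparser.RobotFileParser,
                 user_agent: str = USER_AGENT):
        self.parser = parser
        self.user_agent = user_agent
        self.last_fetch = None  # monotonic time of the previous fetch

    def delay(self) -> float:
        # crawl_delay() returns None when robots.txt does not set one;
        # never go faster than our own assumed minimum.
        site_delay = self.parser.crawl_delay(self.user_agent)
        return max(float(site_delay or 0), DEFAULT_DELAY_SECONDS)

    def may_fetch(self, url: str) -> bool:
        return self.parser.can_fetch(self.user_agent, url)

    def wait_turn(self) -> None:
        # Sleep until the required gap since the last fetch has passed.
        if self.last_fetch is not None:
            remaining = self.delay() - (time.monotonic() - self.last_fetch)
            if remaining > 0:
                time.sleep(remaining)
        self.last_fetch = time.monotonic()
```

A collector would call `may_fetch(url)` and skip the URL if it returns `False`, then call `wait_turn()` before each permitted request.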
Accuracy workflow
Freshness matters, but freshness without verification is not good enough. The target operating model is:
- Automated first pass: collect candidate updates from allowed public sources on a scheduled cadence.
- Canonical source check: prefer the official provider, benchmark owner, or primary documentation when available.
- Cross-check pass: compare against secondary reputable sources when the official source is missing, delayed, or ambiguous.
- Manual review queue: hold or correct items when routing, naming, pricing, benchmark scope, or terms compliance is uncertain.
- Correction loop: update the site and source notes quickly when a verified error is found.
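The decision at the heart of this workflow can be expressed as a small function that sends each candidate item to publication, to a cross-check, or to the manual review queue. This is a hypothetical sketch: the `Candidate` fields, the trigger labels, and the assumption that one reputable secondary confirmation is enough when the official source is missing are illustrative choices, not the site's real thresholds.

```python
# Hypothetical sketch of the accuracy workflow's decision step.
# Field names and the confirmation threshold are assumptions.
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Decision(Enum):
    PUBLISH = "publish"
    CROSS_CHECK = "cross_check"
    MANUAL_REVIEW = "manual_review"


# Areas the policy names as reasons to hold an item for manual review.
REVIEW_TRIGGERS = {"routing", "naming", "pricing", "benchmark_scope", "terms_compliance"}
MIN_SECONDARY_CONFIRMATIONS = 1  # assumed threshold


@dataclass
class Candidate:
    title: str
    official_source_found: bool
    official_source_ambiguous: bool = False
    secondary_confirmations: int = 0
    uncertain_fields: Set[str] = field(default_factory=set)


def decide(item: Candidate) -> Decision:
    # Uncertainty in any sensitive area goes to a person first.
    if item.uncertain_fields & REVIEW_TRIGGERS:
        return Decision.MANUAL_REVIEW
    # Canonical source check: a clear official source is sufficient.
    if item.official_source_found and not item.official_source_ambiguous:
        return Decision.PUBLISH
    # Official source missing or ambiguous: rely on reputable secondaries.
    if item.secondary_confirmations >= MIN_SECONDARY_CONFIRMATIONS:
        return Decision.PUBLISH
    return Decision.CROSS_CHECK
```

Checking review triggers first means a clear official source cannot bypass the manual queue when, for example, pricing is still uncertain.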
Routing and relevance
Shared ingestion is not enough by itself. Each website in the estate should only receive the stories that match its brief. AI Resource Hub should get technical AI coverage, not crypto, photography, or other off-brief material. When that kind of item appears, it should be treated as a routing bug and fixed at the pipeline level.
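Routing by brief can be sketched as matching a story's topic tags against each site's accepted and rejected topics, with a separate check that flags any delivery the brief excludes as a routing bug. This is a minimal sketch: the `SiteBrief` type and every topic tag are hypothetical, and only "AI Resource Hub" comes from this policy.

```python
# Hypothetical sketch: route tagged stories only to on-brief sites.
# Topic tags are illustrative assumptions; real tagging happens upstream.
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class SiteBrief:
    name: str
    accepts: FrozenSet[str]  # topics that are on-brief
    rejects: FrozenSet[str]  # topics that must never reach this site


BRIEFS = [
    SiteBrief(
        "AI Resource Hub",
        accepts=frozenset({"llm", "benchmarks", "ml_research", "ai_tooling"}),
        rejects=frozenset({"crypto", "photography"}),
    ),
]


def route(topics: Set[str], briefs: List[SiteBrief] = BRIEFS) -> List[str]:
    """Return the sites a story should go to; off-brief stories go nowhere."""
    targets = []
    for brief in briefs:
        if topics & brief.rejects:
            continue  # any rejected topic blocks delivery to this site
        if topics & brief.accepts:
            targets.append(brief.name)
    return targets


def check_delivery(site: str, topics: Set[str],
                   briefs: List[SiteBrief] = BRIEFS) -> None:
    """Raise if a story reached a site its brief excludes (a routing bug)."""
    if site not in route(topics, briefs):
        raise ValueError(f"routing bug: {sorted(topics)} delivered to {site}")
```

Running `check_delivery` over what each site actually received turns an off-brief story into a loud pipeline error instead of a quiet content problem.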
The current routing snapshot is visible on the News Pipeline board.
Corrections and takedowns
If a source owner, lab, researcher, or reader spots a compliance problem or a factual error, we should correct it quickly. For data-source and citation context, see References & Sources. For scoring and ranking logic, see the Methodology page.