Data Study · May 2026

How the Web Actually Uses Schema.org

Google tracks how many domains use each Schema.org vocabulary term and publishes the data monthly. I analyzed the May 2026 snapshot — 5,545 terms across 6 adoption tiers — to understand what structured data the web actually implements versus what schema.org defines.

Items
All vocabulary terms combined — the full 5,545 rows in the dataset.
Itemtypes
Schema.org classes — types like schema:Person or schema:Product.
Predicates
Schema.org properties — attributes like schema:name or schema:author.

Data source: Schema.org GitHub — data/public_stats/google · Snapshot: May 2026

5,545 total terms tracked (Itemtypes + Predicates)
77% used by < 1K domains: the long tail of schema.org
43 terms at 10M+ domains: the truly mainstream terms
0.8% of all defined vocabulary reaches 10M+ domains

Key Findings

What the Data Shows

Long Tail

77% of all schema.org terms are used by fewer than 1,000 domains. The vocabulary is vast, but real-world adoption is concentrated in a small fraction of what's defined. Most of schema.org exists in theory only.

Predicates Worse

82% of all schema.org properties (Predicates) are in the <1K bucket, compared to 51% of types (Itemtypes). Properties are far more numerous and far less adopted: most see very little real-world usage.

Types Better Spread

Itemtypes have a more balanced adoption curve. While still heavily skewed toward rare terms, 49% of types are used by 1K or more domains, versus only 18% of predicates. A larger share of types than of predicates falls in every tier above the floor, even though predicates outnumber types in absolute terms at every tier.

The 43

Only 43 terms out of 5,545 reach the 10M+ tier. These are the structured data terms Google sees consistently across the web: 12 Itemtypes and 31 Predicates.

Core Types

The 12 most common Itemtypes are overwhelmingly infrastructure-oriented, including WebSite, WebPage, BreadcrumbList, Organization, Person, and ImageObject. Their prevalence likely reflects the fact that many of these Schema.org types are generated automatically by popular CMS platforms, themes, and SEO plugins.

Core Properties

The 31 mainstream Predicates are equally predictable: name, url, description, image, author, datePublished. These are properties that most CMSs and site templates already emit. Adoption is high because the barrier is near zero.

Adoption Distribution

Where All 5,545 Terms Actually Land

Chart: All Terms by Domain Bucket, May 2026 (n = 5,545; counts in the table below)
Full Distribution Table
All Terms by Bucket: Count and Share

Domain Bucket | Count | % of Total | Itemtypes | Predicates
< 1K | 4,264 | 76.9% | 485 | 3,779
1K – 10K | 560 | 10.1% | 236 | 324
10K – 100K | 420 | 7.6% | 151 | 269
100K – 1M | 158 | 2.8% | 39 | 119
1M – 10M | 100 | 1.8% | 35 | 65
10M+ | 43 | 0.8% | 12 | 31
Total | 5,545 | 100.0% | 958 | 4,587
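As a check on the % of Total column, the shares can be recomputed from the bucket counts alone. The following is a minimal Python sketch with the counts copied from the distribution table. The `BUCKETS` dictionary and the `shares` helper are illustrative names, not part of any published tooling.

```python
# Term counts per domain bucket, copied from the May 2026 distribution table.
BUCKETS = {
    "< 1K": 4264,
    "1K-10K": 560,
    "10K-100K": 420,
    "100K-1M": 158,
    "1M-10M": 100,
    "10M+": 43,
}


def shares(counts):
    """Return each bucket's share of the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {bucket: round(100 * n / total, 1) for bucket, n in counts.items()}


if __name__ == "__main__":
    print(f"total terms: {sum(BUCKETS.values())}")
    for bucket, pct in shares(BUCKETS).items():
        print(f"{bucket:>10}  {BUCKETS[bucket]:>5}  {pct:>5}%")
```

With these counts the shares come out to 76.9% for < 1K and 2.8% for 100K – 1M. The six rounded shares add up to 100%.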

Interpretation

The distribution is steeply skewed in a power-law-like pattern: heavily concentrated at the bottom, with a long, thin tail at the top. Most of the Schema.org vocabulary remains theoretical in the sense that it exists within the standard but sees little real-world adoption. Each step up a tier represents an order-of-magnitude increase in domain reach, and each higher tier contains fewer terms. The steepest drop is from < 1K (4,264 terms) to 1K – 10K (560 terms).
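The uneven step-down between tiers is easier to see as ratios. The following small Python sketch uses the same bucket counts. The `step_ratios` helper is hypothetical; it divides each tier's term count by the count in the next tier up.

```python
# Term counts per bucket, ordered from smallest to largest domain reach.
TIERS = [
    ("< 1K", 4264),
    ("1K-10K", 560),
    ("10K-100K", 420),
    ("100K-1M", 158),
    ("1M-10M", 100),
    ("10M+", 43),
]


def step_ratios(tiers):
    """Return (lower, upper, ratio) with ratio = terms in lower tier / terms in upper tier."""
    return [
        (lower, upper, round(n_lower / n_upper, 2))
        for (lower, n_lower), (upper, n_upper) in zip(tiers, tiers[1:])
    ]


for lower, upper, ratio in step_ratios(TIERS):
    print(f"{lower} -> {upper}: {ratio}x as many terms in the lower tier")
```

The first step is a drop of about 7.6×. Moving from 1K – 10K to 10K – 100K loses only about a quarter of the terms, a ratio of roughly 1.33.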

Class Comparison

Itemtypes vs. Predicates — Adoption by Tier

958 Itemtypes (types like schema:Product) vs. 4,587 Predicates (properties like schema:author). The adoption patterns are notably different.

Charts: Itemtypes bucket distribution (958 total); Predicates bucket distribution (4,587 total)
51% of Itemtypes are in <1K
82% of Predicates are in <1K
12 Itemtypes at 10M+
31 Predicates at 10M+

Interpretation

Types Win Mid-Tier

Itemtypes have significantly stronger mid-tier adoption. 24.6% of types land in the 1K–10K bucket vs. only 7.1% of predicates.
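The per-class comparison normalizes each class by its own size. That is why types can lead in share while predicates lead in raw counts. The following minimal Python sketch uses the Itemtype and Predicate columns of the distribution table; the helper names are illustrative.

```python
# Counts per bucket for each class, from the May 2026 distribution table.
BUCKETS = ["< 1K", "1K-10K", "10K-100K", "100K-1M", "1M-10M", "10M+"]
ITEMTYPES = [485, 236, 151, 39, 35, 12]
PREDICATES = [3779, 324, 269, 119, 65, 31]


def class_shares(counts):
    """Percent of a class's terms that fall in each bucket, one decimal place."""
    total = sum(counts)
    return [round(100 * n / total, 1) for n in counts]


def share_at_or_above_1k(counts):
    """Percent of a class's terms in any bucket from 1K-10K upward."""
    return round(100 * sum(counts[1:]) / sum(counts), 1)


for name, counts in (("Itemtypes", ITEMTYPES), ("Predicates", PREDICATES)):
    print(name, dict(zip(BUCKETS, class_shares(counts))),
          "| 1K+:", share_at_or_above_1k(counts))
```

It reproduces the 24.6% vs. 7.1% split for 1K – 10K. It also shows that 49.4% of itemtypes, versus 17.6% of predicates, reach 1K domains or more.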

Predicates Dominate Rare

Predicates are 4.8× more numerous than types yet overwhelmingly rare. Schema.org defines properties for nearly every conceivable attribute, but most never get implemented. The vocabulary outpaced real-world need.

Predicates Lead at Top

At the 10M+ tier, predicates (31) outnumber types (12) by about 2.6 to 1. This makes sense structurally: a single type declaration like WebSite typically carries several properties at once (name, url, description, publisher).
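To see why properties outnumber types at the top, it helps to look at what a single type declaration carries. The following minimal Python sketch assumes a plugin-style WebSite node with a SearchAction. The site name, URLs, and the `count_terms` helper are hypothetical. The sketch walks the JSON-LD and counts distinct types and properties.

```python
import json

# Hypothetical WebSite node of the kind themes and SEO plugins commonly emit.
# The site name and example.com URLs are placeholders.
website = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    "name": "Example Site",
    "url": "https://example.com/",
    "description": "An example site.",
    "publisher": {"@type": "Organization", "name": "Example Inc."},
    "potentialAction": {
        "@type": "SearchAction",
        "target": {
            "@type": "EntryPoint",
            "urlTemplate": "https://example.com/?s={search_term_string}",
        },
        "query-input": "required name=search_term_string",
    },
}


def count_terms(node, types=None, props=None):
    """Walk a JSON-LD tree, collecting distinct @type values and property names."""
    types = set() if types is None else types
    props = set() if props is None else props
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@type":
                # @type may be a single string or a list of strings.
                types.update(value if isinstance(value, list) else [value])
            else:
                # Keywords such as @context are not schema.org properties.
                if not key.startswith("@"):
                    props.add(key)
                count_terms(value, types, props)
    elif isinstance(node, list):
        for item in node:
            count_terms(item, types, props)
    return types, props


types, props = count_terms(website)
print(json.dumps(website, indent=2))
print(f"{len(types)} types, {len(props)} properties")
```

For this node the walk finds 4 types and 8 distinct properties. Every one of those terms appears on the 10M+ list.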

Itemtype Adoption

Most Used Schema.org Types by Tier

Focusing on Itemtypes only (not properties). Within each tier, exact ranking is unknown — Google publishes bucket ranges, not precise counts. Types are listed alphabetically within their tier.

Adoption Ladder (number of Itemtypes per tier)
10M+: 12 types
1M – 10M: 35 types
100K – 1M: 39 types
10K – 100K: 151 types
1K – 10K: 236 types
< 1K: 485 types

Interpretation

Infrastructure First

Nearly all of the 12 types at 10M+ are structural rather than content types. WebSite, WebPage, BreadcrumbList, ListItem: these describe page architecture. The only entity types are Organization and Person. No Product, Article, or Review made it to this tier.

Rich Results Live at 1M–10M

The types Google recommends for rich results — Product, Review, FAQPage, VideoObject, BlogPosting — are all in the 1M–10M tier, not 10M+.

Event Not in Top 2 Tiers

Event sits in the 100K–1M bucket, despite being a prominent Google rich result type. Structured data adoption for events is notably lower than for products or articles — likely because event sites are a smaller slice of the web.

Mainstream Adoption

The 10M+ Club

These are the schema.org terms that appear on more than 10 million domains. If you're implementing structured data for SEO, this is the list that counts.

Itemtypes — 12 terms at 10M+ domains
BreadcrumbList, EntryPoint, ImageObject, ListItem, Organization, Person, PropertyValueSpecification, ReadAction, SearchAction, Thing, WebPage, WebSite
Predicates — 31 terms at 10M+ domains
name, url, description, image, author, datePublished, dateModified, headline, publisher, logo, sameAs, breadcrumb, itemListElement, item, position, about, potentialAction, target, query-input, urlTemplate, mainEntityOfPage, primaryImageOfPage, isPartOf, inLanguage, contentUrl, thumbnailUrl, caption, height, width, valueName, valueRequired

Interpretation

Infrastructure Types

The 12 mainstream Itemtypes are mostly structural, not semantic. WebSite, WebPage, BreadcrumbList, ListItem: these describe the architecture of a page, not its topic. Organization and Person are the only entity types that made the cut.

CMS-Emitted Properties

The 31 mainstream Predicates are almost all auto-generated by CMSs and templates. name, url, description, image, datePublished: these most likely appear because WordPress, Yoast, and site builders emit them automatically, not because developers chose them deliberately.
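BreadcrumbList, ListItem, itemListElement, position, and item all sit in the 10M+ tier, and they describe navigation rather than topic. The following minimal Python sketch shows how a theme might generate this markup from a page's trail. The `breadcrumb_list` function and the example.com URLs are hypothetical.

```python
import json


def breadcrumb_list(trail):
    """Build a schema.org BreadcrumbList from (name, url) pairs, root first.

    Positions are numbered from 1 in trail order.
    """
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }


# Hypothetical page trail; a real theme would derive it from the site hierarchy.
trail = [
    ("Home", "https://example.com/"),
    ("Blog", "https://example.com/blog/"),
    ("Schema Study", "https://example.com/blog/schema-study/"),
]
print(json.dumps(breadcrumb_list(trail), indent=2))
```

A theme can fill the trail from the page hierarchy it already knows. The markup then appears on every page without a deliberate choice by the site owner.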

What's Missing

Product, Event, Recipe, FAQPage, HowTo, Review: none reached 10M+. These are the types most closely associated with Google rich results, yet each is used by fewer than 10M domains. This suggests that deliberate structured data is adopted less widely than commonly assumed.

The Long Tail

4,264 Terms Used by Fewer Than 1,000 Domains

Long Tail Breakdown by Class
Predicates in <1K: 3,779 terms (82% of all predicates)
Itemtypes in <1K: 485 terms (51% of all itemtypes)
All Terms in <1K: 4,264 terms (77% of total vocabulary)

All Terms at 1M+ Domains: 143 terms (2.6% of total vocabulary)
All Terms at 10M+ Domains: 43 terms (0.8% of total vocabulary)

Interpretation

The long tail is not a failure — it is the nature of a general-purpose vocabulary. Schema.org is designed to describe every conceivable type of thing: medical conditions, academic courses, sports events, music recordings, legislative processes. Most of the vocabulary will always be specialized by design.

But it does have a practical implication: a site that implements a relevant term below the 10K-domain threshold joins a small minority. That can give it an edge over competitors who rely on defaults.

What This Means for Structured Data Strategy

Schema.org, in practice, is shaped far more by implementation defaults than by the full scope of its specification. While the vocabulary contains thousands of terms, real-world usage is heavily concentrated in a small subset.

The data shows a clear long-tail distribution: 77% of all terms are used by fewer than 1,000 domains, and only 0.8% reach the 10M+ tier. This reinforces that Schema.org is not uniformly adopted. Instead, it follows a steep, power-law-like curve in which a small core of terms carries most real-world usage.

At the top of this distribution, adoption is dominated by infrastructure-level constructs (WebSite, WebPage, Organization, Person, BreadcrumbList), along with a small set of universally emitted properties such as name, url, and description. Their prevalence is strongly influenced by CMS defaults, themes, and SEO tooling, which reduce implementation effort to near zero.

Because most Schema.org adoption is passive — driven by tooling rather than deliberate implementation — there is a real opportunity for those who choose to go further. Intentional, well-structured schema markup remains rare enough that implementing it thoughtfully is still a meaningful competitive differentiator.

Methodology

How the Data Was Collected

Data Source
Step 1 — Source

Google publishes monthly snapshots of schema.org vocabulary adoption to the schema.org GitHub repository at data/public_stats/google/. Files follow the pattern YYYY_MM.csv.
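Because the files follow the predictable YYYY_MM.csv naming pattern, a snapshot can be selected and tallied programmatically. The following minimal Python sketch assumes the CSV has one row per term, with columns named `term`, `class`, and `bucket`. Those column names, the helper functions, and the sample rows are hypothetical, so check the real header row before relying on them.

```python
import csv
import io
from collections import Counter


def snapshot_filename(year: int, month: int) -> str:
    """Build a snapshot filename following the YYYY_MM.csv pattern."""
    return f"{year:04d}_{month:02d}.csv"


def tally(csv_text: str) -> Counter:
    """Count terms per (class, bucket) pair.

    ASSUMPTION: the column names "term", "class", and "bucket" are placeholders;
    adjust them to match the header of the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["class"], row["bucket"]) for row in reader)


# Tiny illustrative sample in the assumed layout; not read from the real file.
SAMPLE = """term,class,bucket
WebSite,Itemtype,10M+
Event,Itemtype,100K-1M
name,Predicate,10M+
"""

print(snapshot_filename(2026, 5))
print(tally(SAMPLE))
```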


Step 2 — The Bucket System

Google does not publish exact domain counts. Instead, each term is assigned to a bucket: <1K, 1K–10K, 10K–100K, 100K–1M, 1M–10M, or 10M+. Buckets represent ranges, not exact numbers — analysis is tier-based, not precise.
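Because each bucket spans one order of magnitude, assigning a count to a bucket means finding which power of ten it falls under. The following minimal Python sketch assumes each bucket includes its lower bound and excludes its upper bound. That edge handling is an assumption, not something the data source states. `bucket_for` is a hypothetical helper.

```python
import bisect

# Lower bounds (in domains) of the five buckets above "<1K".
BOUNDS = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
LABELS = ["<1K", "1K-10K", "10K-100K", "100K-1M", "1M-10M", "10M+"]


def bucket_for(domains: int) -> str:
    """Map an exact domain count to its bucket label.

    ASSUMPTION: each bucket includes its lower bound and excludes its upper
    bound, so exactly 1,000 domains counts as 1K-10K.
    """
    if domains < 0:
        raise ValueError("domain count cannot be negative")
    # bisect_right returns how many lower bounds are <= domains.
    return LABELS[bisect.bisect_right(BOUNDS, domains)]


for n in (850, 1_200, 9_800, 250_000, 12_000_000):
    print(f"{n:>12,} domains -> {bucket_for(n)}")
```

Counts of 1,200 and 9,800 domains land in the same bucket even though one is about eight times the other. That is why the analysis has to stay tier-based.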


Step 3 — Snapshot Analyzed

This report uses the May 2026 snapshot (2026_05.csv), downloaded June 2026. The dataset reflects Google's crawl of the public web at that point in time.

Dataset at a Glance
Total Rows: 5,545
Itemtypes: 958 (17.3%)
Predicates: 4,587 (82.7%)

Terms in <1K bucket: 4,264 (77%)
Terms at 10M+: 43 (0.8%)

Snapshot Date: May 2026
Downloaded: June 2026