Auditing Large Websites

This week's newsletter is sponsored by me 😄
Check out my Python for SEO course
A lot of SEOs speak from their own experience and then generalize it, as if there’s a one-size-fits-all approach. For an industry famous for saying “it depends,” that’s kind of a joke 😄
I’ve heard people confidently say things like “a technical SEO audit is a one-and-done thing.” Sure… maybe if you’re working on a site that will NEVER change. But that's not always the case, especially with large websites. And honestly, sometimes I wonder… have those SEOs ever actually worked in-house? 😀
And when I say large, I mean large. Early in my career, I had the chance to work on gigantic websites with millions of pages. Let me tell you, the technical complexity is almost like a different sport, and the skillset you need goes far beyond a typical audit checklist.
So in this blog, I’m going to chit-chat about some of the key differences I’ve noticed when auditing large websites. Buckle up!
TL;DR
- The rules for auditing large websites with millions of pages are different.
- First thing to do when auditing a large website: list all page templates, estimate the number of pages under each template, and assign each template a business value using t-shirt sizes.
- Some audit items, like "orphaned pages", may not be relevant anymore. I mean, if you have 1M pages, ya, some pages may be orphaned, and that's not a problem (unless the page is especially important).
- Obviously, internal linking matters even more for large websites, but because of the website's size, orphaned pages are unavoidable.
- Crawling a large website is a different beast altogether. Below, I discuss three approaches other than "just crawl the entire website".
- Educating your devs about when to involve SEO helps prevent issues before they occur... that advice is valid for all websites, but it becomes even more important for large websites, where changes happen all the time and SEO technical debt can accumulate quickly.
Page templates
The first thing I do when I audit a large website is try to identify the different page templates.
What is a page template?
A page template is like a blueprint for a web page. Instead of building every page one by one, a website can use a template to create many pages that follow the same design and structure.
For example:
- All blog posts on a site might look the same (title at the top, picture, text, and related posts at the bottom) because they share one template.
- All product pages in an online store might follow another template (product image, price, description, and “add to cart” button).
Ok!
Let's say you run a streaming service business and want to do a technical audit of its website. Page templates may look something like this:

As you can see, I added the number of pages for each page template, as well as the business priority for each template in t-shirt sizes: S = Small, M = Medium, L = Large.
We're not looking for accuracy in the page counts; if you can't get the exact number, estimate it.
For business priority, think about the most important pages in terms of business value. The homepage is always important. Next you have "purchasable titles" as these pages directly impact revenue.
Tie-breaker rule
If two or more templates have the same business priority, then look at the number of pages. The template used on more pages gets higher priority, since improving its performance has a bigger overall impact (e.g., on page speed or SEO).
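If you keep your template inventory in a script or spreadsheet, the business priority plus the tie-breaker rule boils down to a two-level sort: priority first, page count second. Here's a minimal Python sketch. The templates and page counts are hypothetical, made up for illustration, and mapping S/M/L to 1/2/3 is my own assumption.

```python
from dataclasses import dataclass

# Assumed mapping of t-shirt sizes to sortable ranks (L = most important).
PRIORITY_RANK = {"L": 3, "M": 2, "S": 1}


@dataclass
class Template:
    name: str
    page_count: int  # an estimate is fine; exact counts aren't required
    priority: str    # "S", "M", or "L"


def rank_templates(templates):
    """Sort by business priority; break ties by page count (more pages first)."""
    return sorted(
        templates,
        key=lambda t: (PRIORITY_RANK[t.priority], t.page_count),
        reverse=True,
    )


# Hypothetical templates with made-up page counts, for illustration only.
templates = [
    Template("Help articles", 1_200, "S"),
    Template("Genre listings", 300, "M"),
    Template("Purchasable titles", 40_000, "L"),
    Template("Cast & crew pages", 250_000, "M"),
]

for t in rank_templates(templates):
    print(f"{t.priority}  {t.page_count:>9,}  {t.name}")
```

Both "Genre listings" and "Cast & crew pages" are M here, so the tie-breaker puts "Cast & crew pages" first because it covers far more pages.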
Not Relevant Anymore
Something I noticed while auditing a gigantic website: some things become irrelevant. As you go through your checklist, some audit items just don't make sense anymore. For example:
- Orphaned pages: pages with no internal links pointing to them. Ya, that can happen when you have millions of pages 😄 Please note, I'm not saying that internal linking doesn't matter for larger websites; on the contrary. I'm just saying we can't expect every page to have an internal link.
- Number of links on a page: there's a rule somewhere (that I'm glad I didn't write 🤣) that says you don't want a lot of links on a page... let me tell you something... if you think this ever mattered, I can confirm it doesn't 😄... what matters is that you're linking for a reason, not just for the sake of linking. Don't overdo it, but be natural about it. Link as much as needed.

- Backlink profile: not really a technical audit item, but large websites tend to attract tons of backlinks, and with that naturally come a lot of low-quality, spammy links. It doesn't really matter.
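If you still want to check orphans, it makes sense to flag only the ones on templates that actually matter to the business. Here's a minimal sketch. It assumes you have two lists of URL paths: every page you know exists (for example, from XML sitemaps or a database export) and every page that receives at least one internal link (for example, from your crawler's inlinks report). The URL patterns and the `HIGH_PRIORITY` set are hypothetical.

```python
import re

# Hypothetical URL patterns for each template; adjust to your own site.
TEMPLATE_PATTERNS = {
    "Purchasable titles": re.compile(r"^/title/\d+"),
    "Cast & crew pages": re.compile(r"^/person/\d+"),
}
# Hypothetical: templates whose orphans are worth fixing.
HIGH_PRIORITY = {"Purchasable titles"}


def template_for(path):
    """Return the name of the first template whose pattern matches, else None."""
    for name, pattern in TEMPLATE_PATTERNS.items():
        if pattern.match(path):
            return name
    return None


def important_orphans(known_paths, linked_paths):
    """Orphans = known pages with no internal links; keep high-priority ones only."""
    orphans = set(known_paths) - set(linked_paths)
    return sorted(p for p in orphans if template_for(p) in HIGH_PRIORITY)


known = ["/title/1", "/title/2", "/person/7", "/person/8"]
linked = ["/title/1", "/person/7"]
print(important_orphans(known, linked))  # → ['/title/2']
```

`/person/8` is orphaned too, but it isn't flagged, because cast and crew pages aren't in the high-priority set.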
Crawling
Crawling a large website is a whole different kettle of fish. Our standard procedure in SEO is to plug the website URL into Screaming Frog (or whatever crawler you're using) and start the crawl...
Well, not so much for a huge website. Try crawling Amazon or Wikipedia 😄😄😄 You'll run out of memory before you're even halfway through. That's why, in this case, you need a cloud crawler.
But let me tell you something...
In my opinion (and this is probably going to get me in hot water, as usual), you rarely need to crawl the entire website when it's that large.
There are a few ways you can go about this:
- Audit page templates only, and rely on Google Search Console to report errors like 404s, canonicalization issues, etc.
- Crawl the website folder by folder. This may or may not be helpful, depending on the size of the smallest folders on the website.
- Run a sampled crawl. What does that mean? You run the crawl until you've gathered enough information on existing issues and most of the page templates have been crawled and included in the report, but not necessarily all pages.
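For the folder and sampled-crawl approaches, you can prepare the URL list before crawling. Bucket URLs by template, falling back to the first folder in the path when no template matches, then randomly pick a handful from each bucket. The sketch below assumes you already have a full URL list (for example, from XML sitemaps). The regex patterns, `per_bucket`, and `seed` values are hypothetical. You'd then feed the sample to a crawler that accepts a URL list (Screaming Frog's list mode, for example).

```python
import random
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical URL patterns that identify page templates on the site.
TEMPLATE_PATTERNS = {
    "title": re.compile(r"^/title/\d+"),
    "person": re.compile(r"^/person/\d+"),
}


def bucket_for(url):
    """Return the template name, falling back to the first path segment."""
    path = urlparse(url).path
    for name, pattern in TEMPLATE_PATTERNS.items():
        if pattern.match(path):
            return name
    first_segment = path.strip("/").split("/")[0]
    return f"/{first_segment}/" if first_segment else "/"


def sample_urls(urls, per_bucket=50, seed=42):
    """Pick up to `per_bucket` random URLs from each template or folder."""
    buckets = defaultdict(list)
    for url in urls:
        buckets[bucket_for(url)].append(url)
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    sample = []
    for bucket_urls in buckets.values():
        k = min(per_bucket, len(bucket_urls))
        sample.extend(rng.sample(bucket_urls, k))
    return sample


urls = [f"https://www.example.com/title/{i}" for i in range(1000)]
urls += [f"https://www.example.com/help/article-{i}" for i in range(10)]
print(len(sample_urls(urls, per_bucket=5)))  # → 10 (5 titles + 5 help articles)
```

Using a fixed seed means you get the same sample every run, so you can re-crawl exactly the same URLs after a fix ships and compare the results.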
Educate Your Devs
The easiest way to fix technical SEO issues on a large website, and keep it in good shape for as long as possible, is to educate your devs.
On large websites, changes are happening ALL THE TIME everywhere. Devs may not know what changes are sensitive to SEO. Your best bet in this situation is to explain when they need to involve SEO.
- Website migration? --> involve SEO
- Launching new pages? --> involve SEO
- Redesign? --> involve SEO
And so forth... by being involved at the right time, you prevent issues before they happen. Obviously, this is true for all websites, but for large websites it becomes even more important, as technical SEO debt can accumulate quickly.
And That’s a Wrap (Almost 😄)
We love checklists, and they work. But when tackling a large website technical audit, you need to have a different mindset to prioritize and execute your audit.
SEO is not one-size-fits-all. Most statements in that format end up failing the test of time and experience.
Thanks for reading and see you next newsletter!
Like what you read and want to support me?
- Sign up for my newsletter if you're not already.
- Share the newsletter and invite your friends to sign up. Help me reach 2k signups by the end of 2025, please 🙂
- Provide feedback on how I can make this newsletter better!!!
- Buy me a coffee.
- If you're an SEO tool or an SEO service provider, consider sponsoring my newsletter. I'm also open to other partnership ideas.
Disclaimer: LLMs were used to assist in wording and phrasing this blog.
The SEO Riddler Newsletter