Google Updates Indexation Policy; Limits Crawler To 15 MB HTML Content
In a surprising development, Google has updated its Googlebot help document and announced that Googlebot will crawl only the first 15 MB of a web page. Anything on the page beyond that 15 MB cut-off point will not be crawled, indexed, or included in ranking calculations.
Here is what Google’s updated help document states:
“Any resources referenced in the HTML such as images, videos, CSS, and JavaScript are fetched separately. After the first 15 MB of the file, Googlebot stops crawling and only considers the first 15 MB of the file for indexing. The file size limit is applied on the uncompressed data.”
Google’s John Mueller briefly elaborated on the change and confirmed that it applies only to the HTML file itself.
“It’s specific to the HTML file itself like it’s written. Embedded resources/content pulled in with IMG tags is not a part of the HTML file,” clarified John Mueller on Twitter.
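Because the limit applies to uncompressed data, the transfer size shown by network tools can understate how close a page is to the cut-off. The following is a minimal illustration, not Google's measurement method; the sample page is hypothetical and deliberately repetitive.

```python
import gzip

# Hypothetical page body, built only to illustrate the size difference.
html = (
    "<html><body>"
    + "<p>Example paragraph.</p>" * 10_000
    + "</body></html>"
).encode("utf-8")

compressed = gzip.compress(html)

uncompressed_size = len(html)    # the figure the 15 MB limit is applied to
transfer_size = len(compressed)  # roughly what a gzip-enabled server would send

print(f"uncompressed: {uncompressed_size} bytes")
print(f"gzip-compressed: {transfer_size} bytes")
```

Real pages compress far less dramatically than this repetitive sample, but the point holds: judge a page against the limit by its uncompressed size.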
What is the potential impact of this change?
In theory, this change should not affect the vast majority of websites, because SEO best practices recommend keeping HTML pages to 100 KB or less. If a website follows those practices and its HTML file size is around that 100 KB mark, it will be unaffected by this change.
However, not all websites follow the recommended practices to a T. If important content on a web page sits beyond the 15 MB cut-off point, Googlebot will not consider it for indexing.
The limit may sound worrisome, but 15 MB is generous for an HTML file, roughly 150 times the 100 KB guideline. So most websites should be fine.
Google PageSpeed Insights
The easiest way to find out whether this will be an issue for you is to use a free tool like Google PageSpeed Insights and check the HTML size of your web pages.
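If you need to check many pages, the same measurement can be scripted. The sketch below uses only the Python standard library. The names `fetch_html_bytes` and `size_report` are hypothetical, and it assumes 1 MB means 1024 × 1024 bytes, which Google's document does not specify.

```python
import gzip
import urllib.request

# Assumption: 15 MB is read as 15 * 1024 * 1024 bytes.
GOOGLEBOT_HTML_LIMIT = 15 * 1024 * 1024


def fetch_html_bytes(url: str) -> bytes:
    """Download a page and return its uncompressed HTML body."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
        # Decompress so the size matches what the limit is applied to.
        if response.headers.get("Content-Encoding", "").lower() == "gzip":
            body = gzip.decompress(body)
    return body


def size_report(html: bytes, limit: int = GOOGLEBOT_HTML_LIMIT) -> dict:
    """Summarise how large the HTML is relative to the limit."""
    size = len(html)
    return {
        "bytes": size,
        "percent_of_limit": round(100 * size / limit, 2),
        "over_limit": size > limit,
    }
```

For example, `size_report(fetch_html_bytes("https://example.com/"))` returns the byte count of that page's HTML and whether it exceeds the assumed limit.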
What SEOs and webmasters could do
There isn’t much to do except follow SEO best practices and keep the HTML file size as small as possible.
If you run into the aforementioned issue, there are two steps you can take:
- First, make sure that all the important content that you want Google to crawl and index is included near the top of the web page.
- Second, do not encode images and videos directly into the HTML (for example, as base64 data URIs), because inline data counts toward the size of the HTML file. As John Mueller clarified, content that is pulled in with IMG tags is not a part of the HTML file and, therefore, won’t count toward the 15 MB limit.
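Both steps can be checked roughly with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, the regular expression matches only simple base64 data URIs (no extra parameters such as a charset), and 1 MB is taken to be 1024 × 1024 bytes.

```python
import re

# Assumption: 15 MB is read as 15 * 1024 * 1024 bytes.
GOOGLEBOT_HTML_LIMIT = 15 * 1024 * 1024

# Simplified pattern for base64 data URIs, e.g. data:image/png;base64,iVBORw0...
DATA_URI = re.compile(rb"data:[\w.+-]+/[\w.+-]+;base64,[A-Za-z0-9+/=]+")


def inline_payload_bytes(html: bytes) -> int:
    """Total bytes taken up by base64 data URIs embedded in the HTML."""
    return sum(len(match.group(0)) for match in DATA_URI.finditer(html))


def is_within_limit(
    html: bytes, marker: bytes, limit: int = GOOGLEBOT_HTML_LIMIT
) -> bool:
    """True if `marker` appears in full within the first `limit` bytes."""
    return marker in html[:limit]
```

Here `marker` would be a distinctive snippet of your important content, such as a heading, and `html` the uncompressed page source. A large result from `inline_payload_bytes` suggests moving those assets to separate files referenced by URL.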
If you have any questions about this update or how to ensure your website is in proper shape for the best possible SEO results, feel free to contact us.