A technical SEO published a case study of how he solved a curious Crawled – Currently Not Indexed problem on his site. His fix may not apply to everyone who sees this status, but his method for identifying and resolving the problem is a useful walkthrough for troubleshooting technical SEO issues.
What happened to his site's indexing was genuinely strange, but his solution was straightforward and makes sense.
I discovered a description of this problem in a tweet by Adam Gent (@Adoubleagent):
A little blog post about a technical SEO issue I had on my tiny website.
A Curious Case of Canonicalization –> https://t.co/pC2QAYLjq9
TL; DR – Google can get canonicalization very wrong which can impact SEO traffic.
— Adam Gent (@Adoubleagent) November 3, 2021
Crawled – Currently Not Indexed
There are many anecdotal reports of Crawled – Currently Not Indexed on Facebook, Twitter and even in John Mueller’s Office-hours hangouts.
In a recent Office-hours hangout someone asked why Google Search Console (GSC) was reporting pages as Crawled – Currently Not Indexed when, on clicking through, those pages turned out to be indexed. John Mueller answered that it’s just a lag between reports.
And in another Office-hours hangout John Mueller pointed out that it’s entirely normal for a site to have many pages that are not indexed.
“…if you have a smaller site and you’re seeing a significant part of your pages are not being indexed, then I would take a step back and try to reconsider the overall quality of the website and not focus so much on technical issues for those pages.
The other thing to keep in mind with regards to indexing, is it’s completely normal that we don’t index everything off of the website.
And over time, when you get to like 200 pages on your website and we index 180 of them, then that percentage gets a little bit smaller.”
While both of those are good explanations for why some people see Crawled – Currently Not Indexed, neither is the cause Adam Gent found.
Adam Gent discovered an entirely different problem that appeared to be an algorithm issue at Google itself. There was nothing wrong with the site; the problem was with Google’s indexing.
Why Crawled – Currently Not Indexed
Adam reviewed the GSC Index Coverage report and discovered that Google was crawling and indexing his feeds as if they were HTML pages.
He then took random words from those feed pages, ran a site: search with them, and confirmed that the feed content was indeed indexed.
To make matters worse, Google had apparently canonicalized the content on the RSS feed over the actual web page, accounting for why the real web pages were crawled but not indexed.
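If you want to run the same kind of check on your own site, one way to start is to separate feed URLs from regular page URLs in a list exported from the Index Coverage report. The snippet below is only a minimal sketch: it assumes WordPress's default feed URL patterns (a `feed` path segment at or near the end of the URL, or a `?feed=` query parameter), the function names are made up for illustration, and the example.com URLs are placeholders. A site with custom feed paths would need different rules.

```python
from urllib.parse import urlparse, parse_qs


def is_wordpress_feed_url(url: str) -> bool:
    """Return True if the URL matches a default WordPress feed pattern."""
    parsed = urlparse(url)
    # Path-based feeds such as /feed/, /my-post/feed/ or /feed/atom/.
    segments = [s for s in parsed.path.split("/") if s]
    if "feed" in segments[-2:]:
        return True
    # Query-based feeds on sites without pretty permalinks: ?feed=rss2
    return "feed" in parse_qs(parsed.query)


def split_feed_urls(urls):
    """Split a list of URLs into (feed_urls, page_urls)."""
    feeds = [u for u in urls if is_wordpress_feed_url(u)]
    pages = [u for u in urls if not is_wordpress_feed_url(u)]
    return feeds, pages


if __name__ == "__main__":
    # Placeholder URLs standing in for a GSC export.
    exported = [
        "https://example.com/my-post/",
        "https://example.com/my-post/feed/",
        "https://example.com/?feed=rss2",
    ]
    feeds, pages = split_feed_urls(exported)
    print("Feed URLs:", feeds)
    print("Page URLs:", pages)
```

Any feed URL that shows up as indexed while its matching page sits under Crawled – Currently Not Indexed is worth a closer look with a site: search.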
The RSS Feed Was Generated by WordPress
An odd thing about this case is that when you look at the feed page it renders like a web page and not how an XML file usually renders.
Screenshot of Cache of RSS Feed
I might be wrong but that doesn’t look like a normal RSS feed. It looks like an HTML page.
Although the underlying code really is XML, that’s not how most feeds look.
Could that have played a role in why Google chose to canonicalize the feed?
It’s hard to understand how that could happen because there are so many signals, like internal linking, that under usual circumstances would cause Google to favor the HTML pages as canonical.
How Adam Fixed the Problem
After Adam figured out what had happened, he removed those WordPress-generated feed pages, submitted the feed URLs for a crawl and then 404’d the pages.
After those pages were dropped from the index, he submitted the correct URLs to Google, and within a few days the problem was fixed.
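Before resubmitting anything, it can help to confirm that the removed feed URLs really do answer with a 404. Here is a minimal sketch of that check using only the Python standard library. The helper names are hypothetical and this is not a script from Adam's post. The status lookup is passed in as a parameter so the logic can be tried without touching the network.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a URL (makes a real request)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we want.
        return err.code


def feeds_still_live(feed_urls, get_status=fetch_status):
    """Return the feed URLs that do not yet respond with a 404."""
    return [url for url in feed_urls if get_status(url) != 404]


if __name__ == "__main__":
    # Placeholder URL; replace with the feed URLs you removed.
    remaining = feeds_still_live(["https://example.com/my-post/feed/"])
    print("Feeds not yet returning 404:", remaining)
```

An empty list means every feed URL you checked is returning a 404. Whether and when Google drops those URLs from its index is still up to Google.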
What Caused the Problem?
Adam wrote that the problem appears to be on Google’s side.
I asked around, and someone told me that Google apparently started indexing feeds a few years ago, but he thought the problem had since been fixed.
I’m not an expert on XML, but it seems unusual that the feed resembles an HTML page instead of the normal XML layout that shows up without HTML styling.
The feed doesn’t look normal, so it seems that whatever is making it look like that might be an underlying cause.
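One thing that can make an XML feed display like a styled web page in a browser is an `xml-stylesheet` processing instruction near the top of the file. I don't know whether that is what was going on with Adam's feed, so treat the snippet below as a sketch for checking your own feed, not as a diagnosis. The function name and the sample feed are made up for illustration.

```python
import re

# Matches processing instructions such as:
# <?xml-stylesheet type="text/xsl" href="/feed.xsl"?>
STYLESHEET_PI = re.compile(
    r"<\?xml-stylesheet\b.*?\?>", re.IGNORECASE | re.DOTALL
)


def find_stylesheet_instructions(feed_xml: str):
    """Return every xml-stylesheet processing instruction in the feed source."""
    return STYLESHEET_PI.findall(feed_xml)


if __name__ == "__main__":
    # Made-up feed source; paste your own feed's source here instead.
    sample_feed = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<?xml-stylesheet type="text/xsl" href="/feed.xsl"?>\n'
        '<rss version="2.0"><channel></channel></rss>'
    )
    for instruction in find_stylesheet_instructions(sample_feed):
        print("Found:", instruction)
```

If nothing is found, the feed's appearance is probably coming from somewhere else, such as how the browser or cache viewer chooses to display it.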
Regardless, if you’re having Crawled Currently Not Indexed problems, this is one more thing to check in case it’s also happening to you.
Read the original post that walks through solving the problem:
A Curious Case of Canonicalization