Google indexes iFrame content

August 20, 2021

I always assumed that web crawlers were either unable or intentionally instructed to avoid indexing iFrame content. Turns out I’ve been wrong all along.

Over at CSS-Tricks, we embed CodePen demos on, like, practically every post we publish. Those are iFrames that contain content hosted on CodePen. Search engines won’t crawl cross-origin content, right? Something something same-origin policy something something.

Assumptions and ignorance are apparently a bad mix because this is what we saw popping up in a Google search for “css backdrop-filter”:

Showing a Google search result. The page title is backdrop-filter, blur, 5 pixels.

But is this correct?

The problem is that this page title is nowhere in the post content. It’s certainly not in some landmark element for web crawlers, like an <h1>… or any other heading for that matter. Except that it is exactly that in one of the post’s embedded iFrames.

A screenshot of the embedded iframe with Safari's developer tools open to the right of it. A line of code is highlighted in the developer tools showing the heading 1 element that contains the content being pulled into the search engine results.
There it is! 🕵️

And this is only happening with Google.

Big deal, right? The bigger news is that… Google indexes iFrames. No, it’s not big news. I mean, it’s news to my 2015 understanding of SEO. Here’s what one (top ranking) search result says on the matter:

The fact that the search engine can crawl website content of iFrames indicates that Google has taken great strides to mitigate detriments caused by inline frames. Although Google’s ability to crawl iFrames is limited, Webmasters can also take advantage of this action by managing inbound links from search engine optimization (SEO) advertising blogs or other content.

Timothy Carter, “Google Is Known to Crawl iFrames on Your Website”

Good, bad… meh?

The title of that post is spot on because Google might crawl an iFrame. It also might not. It’s Google’s choice and they apparently decide based on surrounding content and backlinks. I assume (there we go again) this is because iFrames may contain genuinely relevant content and Google wants to consider every bit of valuable content it can to return the most accurate search results.

And like any software or machine, Google’s not always going to be right as far as what it returns. It’s not that it was terribly wrong in our specific case, but it definitely overlooked the most relevant <h1> in the document (iFrame included) in favor of another one it found.

I suppose the harm, as that article continues, is the possibility of trouncing the authority of source content:

The paradox of Google’s crawling capabilities is that the search engine now penalizes over-optimized websites. Contrary to marketing strategy, the effect of over-optimization can lead to Penguin penalties. This includes iFrames content sharing embed with other sites. This is the danger of embedding content from another unknown website. Without continuous monitoring of backlink detriments, and request for link disavowal from Google, a site can by default get swept under the rug with nearly zero top level results in the future.

That’s a debate for folks with a better understanding of SEO. I’ve clearly shown my ineptitude on the subject. But, if you just so happen to be the owner of the content that is contained in an iFrame and you embed that iFrame on your own site, you might be able to chuck some sort of nofollow tag in there or something to prevent that?

<meta name="robots" content="noindex, nofollow">
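If you go that route, the tag presumably belongs in the <head> of the page that loads inside the iFrame, not in the post that embeds it. Here’s a minimal sketch, in Python, of how you might double-check that a framed page actually carries those directives. The names (RobotsMetaParser, robots_directives, framed_page) and the sample markup are made up for illustration, and this only checks that the tag is there. It says nothing about whether Google will honor it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute names arrive lowercased; values do not, so normalize them.
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical markup for the page that gets loaded inside the iFrame.
framed_page = """<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex, nofollow">
  <title>Demo</title>
</head>
<body><h1>backdrop-filter: blur(5px)</h1></body>
</html>"""

directives = robots_directives(framed_page)
print("noindex" in directives, "nofollow" in directives)  # prints: True True
```

In practice you would feed it the HTML of the framed document itself (the embed URL), since that is where the tag would have to live.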

This just in…

Of course, as I’m writing this, Chris found some news from a super good source that is super new (two days old). It doesn’t provide any definitive answer, but does observe Google re-writing <title> tags:

In theory, it sounds like Google may choose to grab any relevant text from a page and display it as the title in SERPs. That’s long been the case for meta descriptions, as Google can dynamically adjust the description in search snippets to better match a user’s query.

And it may just wind up being some sort of experiment.

It’s impossible to draw any conclusions about Google rewriting title tags at this time. Google is known to run A/B tests in live search results, so it’s possible what we’re seeing will go back to normal in the near future.

So for now, it’s sort of a wait-and-see approach.