Between a quarter and a third of everything on the web is copied from somewhere else

There’s a lot of junk on the web. There is also a lot of good stuff on the web. And then there is the stuff that’s been lifted from the good and dropped amid the dross—the aggregation, the block-quotes, the straight-off copy-paste jobs.

By Leo Mirani. Updated July 21, 2022

The extent of that duplication now has a number: according to Matt Cutts, a longtime Google search engineer who developed Google’s family-friendly “SafeSearch” filter and who now leads Google’s web spam team, “something like 25% or 30% of the web’s content is duplicate content.”

Nonetheless, if search engines didn’t have a way to detect duplications, the internet would be almost unnavigable. Google’s approach, as you’ll almost certainly have noticed when you use it, is to omit pages that have very similar content, but to offer users the ability to see the similar results if they’re really interested. Things that are auto-created, however, like a blog that’s made up entirely of feeds from other blogs, might be treated as spam, Cutts says. That means most people will never encounter this large chunk of the internet. It also means there’s that much less you need to get through before you finish reading the entire internet this morning.
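For the curious, here is a rough idea of how a program might spot near-identical pages and tuck them away. This is a minimal sketch of one textbook approach (comparing overlapping word sequences, or "shingles," using Jaccard similarity), not a description of what Google actually does. The function names, the three-word shingle size, and the 0.8 threshold are all assumptions chosen for illustration.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def collapse_near_duplicates(pages, threshold=0.8, k=3):
    """Split pages into those shown and those omitted as near-duplicates.

    A page is omitted when it is at least `threshold` similar to a page
    that has already been kept. The threshold is an illustrative guess.
    """
    shown, omitted = [], []
    kept_shingles = []
    for page in pages:
        s = shingles(page, k)
        if any(jaccard(s, t) >= threshold for t in kept_shingles):
            omitted.append(page)  # still retrievable, just hidden by default
        else:
            shown.append(page)
            kept_shingles.append(s)
    return shown, omitted


if __name__ == "__main__":
    pages = [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumps over the lazy dog",
        "an entirely different sentence about search engines",
    ]
    shown, omitted = collapse_near_duplicates(pages)
    print(len(shown), "shown;", len(omitted), "omitted")
```

Keeping the omitted pages in a separate list, rather than discarding them, mirrors the behavior described above: similar results are hidden by default but remain available to anyone who asks for them.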

Note: This post is based on a YouTube video posted by Cutts, which means that, in a manner of speaking, this too is duplicate content.