There are few subjects in the world of search engine optimization (SEO) that cause more fear and confusion than the dreaded “duplicate content penalty”.
According to popular legend, if you post content to your site that too closely resembles articles on other web pages (whether unintentionally or deliberately, through the misuse of PLR content, manufacturers’ product descriptions, or other stock text), the Googlebot will tag your site as a scammer, assess a penalty, and downgrade your rankings in the organic search results.
But with the amount of content on the web today, is this really something you need to worry about? Let’s take a closer look at what duplicate content is and whether your site could actually be penalized for including repetitive text…
First of all, it’s worth pointing out that duplicate content can arise from a number of different scenarios:
Really, only a few of these scenarios involve malicious intent (as in the case of spamming). Most of the time, duplicate content on a website is harmless in nature – if the site owner even knows it’s occurring at all!
And given that there’s rarely negative intent behind instances of duplicate content, would it really be fair to punish webmasters who are doing their best to supply a feature-rich environment for their users?
It wouldn’t – and fortunately, it doesn’t happen. The Google Webmaster Central Blog sums this up nicely, saying:
“Let’s put this to bed once and for all, folks: There’s no such thing as a ‘duplicate content penalty.’ At least, not in the way most people mean when they say that.”
Essentially, the article confirms that while no penalty is assessed automatically for the benign instances of duplicate content described above, the search giant does reserve the right to penalize sites that violate its Webmaster Guidelines. The post references three specific situations in which this may occur:
This makes sense. Google’s primary goal is to provide its users with the best possible search results, and it stands to reason that a group of several sites all displaying the same content can’t meet this need. To keep people coming back (and to keep its advertising revenue up), Google must filter out these lower-quality results, and picking up on duplicated content is one of the ways it’s able to do so.
However, the fact that Google doesn’t automatically issue a duplicate content penalty to repetitive articles doesn’t tell us how it does handle these situations. And indeed, the second half of the Google Webmaster Central blog quote referenced above (“At least, not in the way most people mean when they say that”) seems to leave open the possibility that Google does, in fact, factor content uniqueness into its algorithms somehow.
A post in the Google Webmaster Tools help section regarding duplicate content issues arising from malicious scraper sites does a good job of clarifying how the search engine handles these instances:
“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don’t follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.”
The key to understanding how duplicate content is processed lies in the final sentence of this quote. When Google’s spiders encounter repetitive text, instead of automatically issuing a penalty or demoting the site, they run a separate process to determine which version of the duplicate content should be displayed in the results. As a result, most instances of duplicate content are filtered out in favor of one (or more) selected versions that appear in the SERPs.
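To make that “choose a version” idea more concrete, here is a deliberately simplified toy sketch in Python. It is not Google’s actual algorithm, and the sample documents and scores are invented purely for illustration; it simply groups pages with identical text and keeps a single version from each group.

```python
# Toy model of "filtering" duplicate results: cluster documents whose text is
# effectively identical, then surface only one representative per cluster.
# NOT Google's algorithm - the data, scores, and selection rule are invented.

documents = [
    {"url": "http://original-site.example/article", "text": "How to pot a cactus ...", "score": 0.9},
    {"url": "http://scraper-one.example/copy",      "text": "How to pot a cactus ...", "score": 0.2},
    {"url": "http://scraper-two.example/copy",      "text": "How to pot a  cactus ...", "score": 0.1},
    {"url": "http://another-site.example/roses",    "text": "Caring for rose bushes ...", "score": 0.7},
]

def normalize(text):
    """Collapse whitespace and case so trivially different copies compare equal."""
    return " ".join(text.split()).lower()

# Group documents by their normalized text.
clusters = {}
for doc in documents:
    clusters.setdefault(normalize(doc["text"]), []).append(doc)

# For each cluster of identical text, keep only the highest-scoring version.
results = [max(group, key=lambda doc: doc["score"]) for group in clusters.values()]

for doc in results:
    print(doc["url"])
```

The real system obviously weighs far more signals than a single made-up score, which is where the filters described next come in.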
To determine which instances should be filtered out and which pages should remain, the Googlebot applies a series of filters that mimic its overall ranking algorithms. For example, this process might look at:
Google claims that these filters do a good job of prioritizing original content creators over malicious spammers, but mistakes do occur. Even if your SERP rankings aren’t suffering as a result of scraper sites, be aware that benign instances of duplicate content (for example, multiple cached page versions or duplicate links) can lead to increased bandwidth usage, possibly slowing down or even crashing your site.
For this reason, it’s important to be aware of any potential duplicate content issues that may exist on your site, as well as to remedy them as quickly as possible to prevent any negative effects from occurring. Even if you aren’t actively scraping content from other websites, you might be surprised to find that the structure of your website is causing issues in this regard.
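If you want a quick, informal way to check whether your site’s structure is serving the same content at more than one address, a short script can fetch a handful of URL variants and compare a fingerprint of what each one returns. This is only a rough sketch (the URLs below are hypothetical placeholders), not a substitute for the tools mentioned next.

```python
# Rough, illustrative check: fetch a few URL variants of the same page and
# compare a hash of the (roughly normalized) raw markup to see whether the
# site serves identical content at multiple addresses.
# The URLs below are hypothetical placeholders - substitute your own.

import hashlib
import urllib.request

URL_VARIANTS = [
    "http://www.example.com/widgets",
    "http://www.example.com/widgets/",
    "http://www.example.com/widgets?sessionid=123",
    "http://example.com/widgets",
]

def page_fingerprint(url):
    """Download a page and return a hash of its roughly normalized body."""
    with urllib.request.urlopen(url) as response:
        body = response.read().decode("utf-8", errors="ignore")
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

fingerprints = {}
for url in URL_VARIANTS:
    try:
        fingerprints.setdefault(page_fingerprint(url), []).append(url)
    except OSError as error:
        print(f"Could not fetch {url}: {error}")

for digest, urls in fingerprints.items():
    if len(urls) > 1:
        print("These URLs return identical content and may be treated as duplicates:")
        for url in urls:
            print("  ", url)
```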
To identify any instances of duplicate content that exist on your site, use tools like Google’s Webmaster Central or Blekko.com. Then, if you do come across any issues, take the following steps to minimize any negative impact these instances could be having on your site’s performance or rankings:
The Google Webmaster Tools help section offers more tips for managing potential duplicate content filtering, which can be useful if your site is experiencing negative consequences as a result of similar text. For most webmasters, however, these tips won’t be necessary. Following website architecture best practices and publishing good, unique content should be enough to avoid any negative impact from duplicate content filtering.
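As one example of the architecture best practices mentioned above, many sites consolidate duplicate URL variants with a permanent (301) redirect so that only a single canonical address serves each page. The sketch below is purely hypothetical, using the Flask framework with invented routes; on a real site this is more commonly configured at the web server or CMS level.

```python
# Hypothetical sketch: consolidating duplicate URL variants with a permanent
# (301) redirect so that one canonical address serves the content.
# The app and routes are invented for illustration only.

from flask import Flask, redirect

app = Flask(__name__)

@app.route("/widgets/")
def widgets_trailing_slash():
    # Send the trailing-slash variant to the canonical URL permanently.
    return redirect("/widgets", code=301)

@app.route("/widgets")
def widgets():
    return "The one canonical widgets page."

if __name__ == "__main__":
    app.run()
```

Canonicalizing URLs this way keeps both visitors and crawlers pointed at one definitive version of each page.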
Image: miguelavg