Should You Be Concerned About Duplicate Content Penalties?

There are few subjects in the world of search engine optimization (SEO) that cause more fear and confusion than the dreaded “duplicate content penalty”.

According to popular legend, if you post content to your site that too closely resembles articles on other web pages (whether unintentionally or purposefully, through the misuse of PLR content, manufacturer’s product descriptions and other stock text), the Googlebot will tag your site as a scammer, assess a penalty, and downgrade your rankings in the natural search results.

But with the amount of content on the web today, is this really something you need to worry about?  Let’s take a closer look at what duplicate content is and whether your site could be subject to penalties from including this repetitive text…

First of all, it’s worth pointing out that duplicate content can be caused by a number of different scenarios:

  • Copying and pasting content from another person’s site to your own (as described above)
  • Having another person copy your unique content and paste it to his own site (called “scraping”)
  • Copying manufacturer product descriptions “as is” when reselling products from a third-party merchant
  • Repeating product description section headings across multiple website pages (for example, “Specifications” or “Sizing Information”)
  • Using forum software that renders multiple versions of your pages (particularly in mobile environments)
  • Having a “printer friendly” version of your content pages available for users
  • Using an ecommerce site structure that allows the same product to be displayed at multiple URLs
  • Making use of proxy services that store and serve multiple versions of your site

Really, only a few of these scenarios involve malicious intent (that is, deliberate spamming).  Most of the time, the inclusion of duplicate content on websites is harmless in nature – if the site owner even knows it’s occurring at all!
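
To see how easily this can happen without anyone copying a single word, consider how one product page can end up living at several different web addresses.  Here’s a minimal Python sketch of the idea (the URLs and parameter names are hypothetical, invented purely for illustration): a handful of “display-only” URL variations such as print versions, tracking tags and sort orders all collapse back down to one address.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Query parameters that change the URL but not the content the visitor sees
# (hypothetical names -- your own site's parameters will differ)
IGNORED_PARAMS = {"print", "sessionid", "utm_source", "sort"}

def canonical_form(url: str) -> str:
    """Strip display-only parameters so duplicate URLs collapse to one address."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

variants = [
    "https://example-shop.com/widgets/blue-widget?utm_source=newsletter",
    "https://example-shop.com/widgets/blue-widget?print=1",
    "https://example-shop.com/widgets/blue-widget?sort=price",
]

# All three variants boil down to the same page
print({canonical_form(u) for u in variants})   # prints one address, not three
```

If your site serves the same product description at all three of those addresses, you’ve created duplicate content without writing a single duplicated sentence – which is why most of it is accidental rather than deceptive.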

And given that there’s rarely a negative intent behind instances of duplicate content, would it really be fair to punish webmasters who are doing their best to supply a feature rich environment for their users?

It wouldn’t – and fortunately, it doesn’t happen.  The Google Webmaster Central Blog sums this up nicely, saying:

“Let’s put this to bed once and for all, folks: There’s no such thing as a “duplicate content penalty.”  At least, not in the way most people mean when they say that.”

Essentially, the article confirms that while there is no penalty assessed automatically for the benign instances of duplicate content described above, the search giant does reserve the right to penalize sites that violate its Webmaster Guidelines.  There are three specific instances referenced where this may occur:

  • The creation of multiple pages, subdomains or domains with substantially duplicate content,
  • The use of “cookie cutter” affiliate pages provided by affiliate programs and used by multiple program participants, and
  • Sites that do not add value beyond promoting affiliate products.

This makes sense.  Google’s primary goal is to provide its users with the best possible search results, and it stands to reason that a group of several sites all displaying the same content can’t meet this need.  To keep people coming back (and to keep its advertising revenue up), Google must filter out these lower-quality results, and picking up on duplicated content is one of the ways it’s able to do so.

However, the fact that Google doesn’t automatically issue a duplicate content penalty to repetitive articles doesn’t tell us how it does handle these situations.  And indeed, the second half of the Google Webmaster Central blog quote referenced above (“At least, not in the way most people mean when they say that”) seems to leave open the possibility that Google does, in fact, factor content uniqueness into its algorithms somehow.

A post in the Google Webmaster Tools help section regarding duplicate content issues arising from malicious scraper sites does a good job of clarifying how the search engine handles these instances:

“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don’t follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.”

The key to understanding how duplicate content is processed lies in the final sentence of this quote.  When the Google search engine spiders are confronted with repetitive text, instead of automatically issuing a penalty or decreasing a site’s rankings, they use a separate process to determine which version of the duplicate content should be displayed in the results.  This causes most of the duplicate content instances to be filtered out in favor of one (or more) selected version(s) which will appear in the SERPs.

To determine which instances should be filtered out and which pages should remain, the Googlebot applies a series of filters that mimic its overall ranking algorithms.  For example, this process might look at:

  • The number of backlinks pointing at each instance of duplicate content
  • The overall authority of each domain hosting instances of duplicate content
  • The amount of unique content residing on each site besides the duplicate content
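
To make that idea concrete, here’s a deliberately simplified sketch of this kind of filtering.  To be clear, this is not Google’s actual algorithm (the signals, weights and URLs below are invented purely for illustration), but it shows the general shape of the process: score each copy of the duplicated text against a few quality signals, then display only the winner.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    url: str
    backlinks: int           # links pointing at this copy of the content
    domain_authority: float  # rough 0-100 authority score for the hosting domain
    unique_words: int        # amount of original content elsewhere on the site

def score(candidate: Candidate) -> float:
    # Arbitrary example weights; real ranking systems are far more complex
    return (0.5 * candidate.backlinks
            + 2.0 * candidate.domain_authority
            + 0.01 * candidate.unique_words)

def pick_version_to_show(candidates: list) -> Candidate:
    """Filter the duplicates down to the single copy that appears in the results."""
    return max(candidates, key=score)

duplicates = [
    Candidate("https://original-author.example/post", backlinks=120,
              domain_authority=55.0, unique_words=40_000),
    Candidate("https://scraper.example/copied-post", backlinks=3,
              domain_authority=12.0, unique_words=500),
]

print(pick_version_to_show(duplicates).url)  # the original author's page wins
```

In this toy example the original author’s page wins easily – the outcome Google says its filters are designed to produce most of the time.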

Google claims that these filters do a good job of prioritizing content creators over malicious spammers, but mistakes do occur.  Even if your SERP rankings aren’t suffering as a result of scraper sites, be aware that even benign instances of duplicate content (for example, multiple cached page versions or duplicate links) can lead to increased bandwidth usage, possibly slowing down or crashing your site.

For this reason, it’s important to be aware of any potential duplicate content issues that may exist on your site, as well as to remedy them as quickly as possible to prevent any negative effects from occurring.  Even if you aren’t actively scraping content from other websites, you might be surprised to find that the structure of your website is causing issues in this regard.

To identify any instances of duplicate content that exist on your site, use tools like Google’s Webmaster Central or Blekko.com.  Then, if you do come across any issues, take the following steps to minimize any negative impact these instances could be having on your site’s performance or rankings:

  • Re-write any content on your site that’s been tagged as unoriginal.  This is especially important if you’re using manufacturer stock product descriptions – since only a few instances of this content will be allowed in the search results pages, you limit your ability to get ranked by including this text.
  • Use appropriate redirection codes.  Moving content on your site can lead to unintended duplicate content filtering, so be sure to implement proper 301 redirects when changing article locations across your pages.
  • Be consistent in your linking.  Because link structures play a tremendous role in how users access your content and how the search engines navigate your pages, pay special attention to the way you build your links to be sure they’re consistent across your site.
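
If you’d rather not wait for a tool to flag problems, a quick script can give you a rough first pass over your own pages.  The Python sketch below is one hypothetical approach (the page text is placeholder copy – in practice you’d feed it your real crawled or exported pages): it compares pages using a simple word-shingle overlap and flags any pair that looks suspiciously similar.

```python
def shingles(text: str, size: int = 5) -> set:
    """Break a page's text into overlapping five-word chunks."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}

def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the two pages' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

# Placeholder page text -- replace with the actual copy from your own URLs
pages = {
    "/widgets/blue": "Our blue widget ships in three sizes and includes free returns on every order",
    "/widgets/blue?print=1": "Our blue widget ships in three sizes and includes free returns on every order",
    "/about": "We have been selling widgets to happy customers since 2005 from our family workshop",
}

# Flag any pair of pages whose text overlaps heavily (the 0.8 threshold is arbitrary)
urls = list(pages)
for i, first in enumerate(urls):
    for second in urls[i + 1:]:
        if similarity(pages[first], pages[second]) > 0.8:
            print(f"Possible duplicate content: {first} and {second}")
```

Any pair the script flags is a candidate for the fixes above: rewrite one version, redirect it, or make sure your internal links all point at a single preferred URL.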

The Google Webmaster Tools help section offers more tips for managing potential duplicate content filtering, which can be useful if your site is experiencing negative consequences as a result of similar text.  However, for most webmasters, these tips won’t be necessary.  Following website architecture best practices and publishing good, unique content should be enough for most people to avoid any negative impact from duplicate content filtering.

Image: miguelavg
