One of the major penalty issues we work with is Google's Panda update and duplicate content. Duplicate content generally presents itself in two ways:
- Duplicate content within your own website – the exact same text appearing on two or more pages.
- Duplicate content between two different domains – the exact same content appearing on two different sites without attribution to the original.
When the Panda update was first released in February 2011, there was some misconception about the type of duplicate content Google would penalize. First, duplicate content is not the only thing Panda targets. Panda targets content that's “thin” and provides very little value to human readers. The big grey area here is what counts as “valuable.”
Just because Panda targets thin and duplicate content does not mean that long articles that pass a Copyscape (plagiarism) check will win the optimization game every time. If that were the case, we could use tools that scrape the internet for content and assemble articles that rank for everything. No, the key here is value! Value is what differentiates “thin” content from great content.
You will find examples of short articles that provide good value on press-release and news sites. Many short news articles earn excellent rankings because they attract a lot of human interaction through social media, comments, and backlinks. This is a good example of Google's algorithm rewarding value rather than length.
So, understanding the basics of how Panda penalizes on-site content helps us understand when and how we should use the rel="canonical" tag on our website.
When and How Should You Be Using a Canonical Tag?
We should not use rel="canonical" on pages with duplicate content that merely seek to “game” the search engines. The rel="canonical" tag is meant for situations where duplicate content is necessary to better explain a product on a website.
For example, let's say I have a website that sells basketballs. These basketballs generally differ in size: children's, women's, and men's. Going a step further, each size comes in different brands that have to be organized. Going further still, each of these basketballs comes in different colors… uh oh… now we are going to have some serious issues with duplicate content.
Here's the problem: writing content for thousands of very similar pages is not only difficult, it can also incur a Panda penalty for having “similar duplicate spun content” describing each type of basketball. A more logical approach is to focus on ranking category landing pages for “theme keywords.”
Let’s Skitch a visual of what I’m talking about:
In the example above, there's a general category page about basketballs. This is the page we want to rank for our key terms. We will use a rel="canonical" tag on the sub-pages to tell Google that they are thin, near-duplicate versions of the category page, and that the category page, which provides the most value, is the one we want to rank.
Here’s another Skitch:
We will place <link rel="canonical" href="http://url.com/basketballs" /> in the <head> section of each sub-page to show Google, Bing, and Yahoo that 'basketballs' is the page we want to rank.
This is a classic example of a category URL creating multiple versions of the same page.
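As a rough sketch of where the tag lives, here is the `<head>` of one hypothetical variant page. The variant URL `http://url.com/basketballs/mens/brand-x/orange`, the page title, and the heading are made up for illustration; only the canonical target comes from the example above. Note the straight ASCII quotes around attribute values and the absolute URL.

```html
<!-- Hypothetical variant page: http://url.com/basketballs/mens/brand-x/orange -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Men's Brand X Basketball (Orange)</title>
  <!-- Tells search engines the category page is the preferred URL to rank.
       Must sit inside <head>; use an absolute URL and straight quotes. -->
  <link rel="canonical" href="http://url.com/basketballs" />
</head>
<body>
  <h1>Men's Brand X Basketball (Orange)</h1>
</body>
</html>
```

Every variant page would carry the same `<link>` line. Keep in mind that search engines treat rel="canonical" as a strong hint rather than a binding directive, so the variant pages should genuinely be near-duplicates of the category page for the signal to be honored.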
3 Things Happen When the Canonical Tag is Not Used:
- The site's linking power is diluted across duplicate URLs
- Google has no clear signal about which page to rank
- Google sees your website as less relevant to searchers
Not using the canonical tag can cause a significant loss of rankings for site owners. If your product website is experiencing a drop in rankings, it could stem from poor site structure and a lack of proper canonical tags.