Duplicate content refers to substantive blocks of content, within or across domains, that either completely match each other or are very similar. Search engines will index the one version of the content they judge to be the original and most authoritative, and will disregard the others. Duplicate content is sometimes malicious, as when someone illegally copies another site’s content and hurts the victim’s ranking.
However, most duplicate content on the Web exists for technical, non-malicious reasons:
- Country-specific content stored on one domain;
- Inconsistent internal linking (you link to the same page as http://www.example.com/page/, http://www.example.com/page, and http://www.example.com/page/index.htm);
- Ecommerce sites that show or link to products via multiple distinct URLs (session IDs, URL parameters used for tracking and sorting);
- Archived Web pages that are created by your content management system (CMS);
- Printer-only versions of Web pages;
- Comment pagination (each comment gets its own page when you click on it, and spiders follow every link in the comment section);
- Forums that generate both regular and stripped-down mobile-targeted pages.
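As an illustration of how the URL variants listed above collapse to a single page, here is a minimal Python sketch of URL normalization. The parameter names in `IGNORED_PARAMS` and the file names in `INDEX_FILES` are assumptions chosen for the example, not a standard list; which parameters are safe to ignore depends on your own site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that change the URL but not the content.
IGNORED_PARAMS = {"sessionid", "sort", "utm_source", "utm_medium", "utm_campaign"}
# Hypothetical default document names served for a directory URL.
INDEX_FILES = {"index.htm", "index.html"}


def normalize_url(url: str) -> str:
    """Reduce a URL to one representative form for duplicate detection."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Drop a trailing index file: /page/index.htm -> /page/
    if segments[-1].lower() in INDEX_FILES:
        segments[-1] = ""
    # Treat /page and /page/ as the same page by removing the trailing slash.
    path = "/".join(segments).rstrip("/") or "/"
    # Keep only parameters that affect content, in a stable order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


variants = [
    "http://www.example.com/page/",
    "http://www.example.com/page",
    "http://www.example.com/page/index.htm",
    "http://www.example.com/page?sessionid=abc123",
]
# All four variants normalize to the same string.
print({normalize_url(u) for u in variants})
```

Running a crawl of your own site through a function like this and grouping by the result is one way to spot pages that are reachable under several addresses.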
To ensure that visitors see the content you want them to, do the following:
- Use top-level domains whenever possible to handle country-specific content; this helps search engines serve the most appropriate version of a document. For example, http://www.example.es that contains Spain-oriented content is much better than http://www.example.com/es or http://es.example.com.
- Keep your internal linking consistent. For example, don’t link to http://www.example.com/page/ and http://www.example.com/page and http://www.example.com/page/index.htm. A search engine’s spider can treat these as three different pages.
- Research how your content management system displays the content of your site. Blogs, forums and related systems often show the same content in multiple formats. For example, a blog entry may appear on the home page of a blog, in an archive page and in a page of other entries with the same label or tag.
- If you’ve restructured your site, use 301 redirects (“Permanent Redirect”) to send users and search engine bots from the old URLs to the new ones. (In Apache, you can do this with an .htaccess file; in IIS, you can do this through the administrative console.)
- Avoid using similar content on different pages. If you have many pages that are similar, consider expanding each page or consolidating the pages into one.
- Use the rel="canonical" link element inside the less important pages that have similar content to an important one (the canonical one). You can specify a canonical page to search engines by adding a <link> element with the attribute rel="canonical" to the <head> section of the non-canonical version of the page. Adding this link and attribute lets you identify sets of identical content and suggest to Google which page among them is the important one. For example, if your important page URL is http://example.com/page1.html, add <link rel="canonical" href="http://example.com/page1.html"/> to the <head> section of all non-canonical versions of the page.
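If your pages are generated by a template or script, the canonical link element described above can be produced programmatically. The following is a minimal Python sketch; the function name `canonical_link` is hypothetical, and it assumes you already know which URL you want to treat as canonical.

```python
from html import escape


def canonical_link(canonical_url: str) -> str:
    """Build the <link> element to place in the <head> of non-canonical pages."""
    # Escape the URL so characters such as & and " cannot break the attribute.
    href = escape(canonical_url, quote=True)
    return f'<link rel="canonical" href="{href}"/>'


print(canonical_link("http://example.com/page1.html"))
# prints: <link rel="canonical" href="http://example.com/page1.html"/>
```

Note the straight quotation marks in the output: curly quotes pasted from a word processor are not valid attribute delimiters in HTML.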
Note: If you find that another site is duplicating your content by scraping, you may file a DMCA request with Google, Yahoo, and Bing to claim ownership of the content and request removal of the other site from the search engines’ indexes.
Google’s Panda update was originally designed to rid the search results of duplicate, spun, and otherwise low-quality content.