Fixing duplicate content caused by sessions


As mentioned in the Google Webmaster Central blog post "Google, duplicate content caused by URL parameters, and you":

URL parameters, like session IDs or tracking IDs, cause duplicate content, because the same page is accessible through numerous URLs.
More information on how dynamic URLs affect search engines, and on how to manage them using Yahoo's Site Explorer, can be found here:
http://help.yahoo.com/l/us/yahoo/search/siteexplorer/dynamic/index.html

So if your CMS, blog or e-commerce site has AJAX back-button navigation, or just a login system with session IDs appended to the URL, it causes duplicate content! The same is true for frames with tracking IDs attached.

After a long search, and following a technique from WebmasterWorld member JDmorgan, I succeeded in getting ~90% of my website content fully spidered.

Here is how to implement this technique in practice for dynamic content using .htaccess:
Just put the following lines in your .htaccess file and test.

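First, we redirect any .html request that carries a query string to the same URL without it, so that only the clean .html URLs get spidered. Note that this rule applies to every visitor, not just to bots.
# allow only clean .html requests (no query string)
RewriteEngine On
RewriteCond %{QUERY_STRING} .
RewriteRule ^([^.]+)\.html$ http://your_web_site.com/$1.html? [R=301,L]
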
Then we remove the session ID parameter when the home page is requested by a bot:
# remove URL session IDs
RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} Teoma
RewriteCond %{QUERY_STRING} ^(([^&]+&)+)*PHPSESSID=[0-9a-f]*&(.*)$ [NC]
RewriteRule ^$ http://your_web_site.com/?%1%3 [R=301,L]
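
The session ID rule above only fires on the site root (pattern ^$) and expects another parameter after the session ID. As a rough sketch, assuming the same your_web_site.com placeholder and PHP's default PHPSESSID parameter name, a variant that covers every path and any parameter position might look like the following. It is an untested adaptation and not part of the original technique, so try it on a staging copy first:

```apache
RewriteEngine On

# Assumption: same crawler list as above, folded into one case-insensitive pattern
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot|Teoma) [NC]
# %1 = parameters before the session ID (with trailing &), %2 = parameters after it
RewriteCond %{QUERY_STRING} ^(.*&)?PHPSESSID=[^&]*&?(.*)$ [NC]
# Redirect to the same path with the session ID removed from the query string
RewriteRule ^(.*)$ http://your_web_site.com/$1?%1%2 [R=301,L]
```

When the session ID is the last of several parameters, this leaves a trailing & in the redirected URL, which is harmless but not pretty.
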
Additional information:
A chain of 301 redirects could cost you PageRank.
So please check that your 301 redirects are final, i.e. that they point to an end page and not to another redirect. You can use the LiveHTTPHeaders extension for Firefox to do this kind of check.
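
To illustrate what a "final" redirect means, here is a small sketch using mod_alias. The page names are hypothetical and only the your_web_site.com placeholder comes from the rules above. The idea is that every old URL points straight at the end page instead of hopping through another redirect:

```apache
# Chained (avoid): /old-page.html -> /interim-page.html -> /final-page.html
# Redirect 301 /old-page.html http://your_web_site.com/interim-page.html
# Redirect 301 /interim-page.html http://your_web_site.com/final-page.html

# Direct: each old URL reaches the end page in a single 301
Redirect 301 /old-page.html http://your_web_site.com/final-page.html
Redirect 301 /interim-page.html http://your_web_site.com/final-page.html
```
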

Next, you can read about more methods of escaping the sandbox: Deoptimizing - a new way of SEO

Enjoy, and feel free to share your experience!
