
As mentioned in the Webmaster Central blog post "Google, duplicate content caused by URL parameters, and you", URL parameters such as session IDs or tracking IDs cause duplicate content, because the same page is accessible through numerous URLs. More information on how dynamic URLs affect search engines, and on how to manage them using Yahoo's Site Explorer, can be found here:
http://help.yahoo.com/l/us/yahoo/search/siteexplorer/dynamic/index.html
So if your CMS, blog or e-commerce site has AJAX back-button navigation, or just a login system that appends session IDs to the URL, it causes duplicate content! The same is true for frames with tracking IDs attached.
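To make the problem concrete, here is a minimal Python sketch of how several session-tagged URLs collapse to a single page once the session and tracking parameters are ignored. The parameter names in SESSION_PARAMS and the example.com URLs are my own assumptions for illustration; replace them with whatever your site actually appends.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names; adjust to what your site appends.
SESSION_PARAMS = {"phpsessid", "sid", "trackid"}

def canonical_url(url):
    """Return the URL with session/tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Rebuild the URL from the remaining parameters, dropping any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Three URLs a crawler might discover for what is really one page.
urls = [
    "http://example.com/page.html",
    "http://example.com/page.html?PHPSESSID=0a1b2c",
    "http://example.com/page.html?trackid=42&PHPSESSID=ffee00",
]
unique_pages = {canonical_url(u) for u in urls}
print(len(urls), "URLs ->", len(unique_pages), "distinct page(s)")
```

A search engine that does not do this kind of normalization sees three documents with identical content, which is exactly the duplicate content problem described above.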
After a long search, and by following a technique from WebmasterWorld member JDMorgan, I managed to get about 90% of my website's content fully spidered.
Here is how to implement this technique in practice with dynamic content using .htaccess:
Just put the following lines in your .htaccess file and test them.
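First, we allow only the plain .html pages to be spidered: any request for an .html page that carries a query string is redirected to the same page without it.
# enable mod_rewrite (harmless if it is already enabled elsewhere in the file)
RewriteEngine On
# allow only .html requests: strip the query string from any .html URL
RewriteCond %{QUERY_STRING} .
RewriteRule ^([^.]+)\.html$ http://your_web_site.com/$1.html? [R=301,L]
Then we remove the session ID parameter when a page is requested by bots:
# remove URL session IDs for the listed crawlers
# note: ^$ matches only the site root; widen the pattern to cover other pages
# PHPSESSID is PHP's default session parameter name; [NC] matches it in any letter case
RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} Teoma
RewriteCond %{QUERY_STRING} ^(([^&]+&)+)*PHPSESSID=[0-9a-f]*&?(.*)$ [NC]
RewriteRule ^$ http://your_web_site.com/?%1%3 [R=301,L]
Additional information:
A chain of 301 redirects could cost you PageRank.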
So please check that your 301 redirects are final, i.e. that they point to an end page and not to another redirect. You can use Firefox's LiveHTTPHeaders extension for this kind of check.
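If you prefer a script to a browser extension, the same check can be sketched in Python. This is only a rough illustration under my own assumptions: the function names (http_head, redirect_chain, is_final_redirect) are hypothetical, and the example at the bottom uses a fake lookup table instead of a live site so that it runs without network access.

```python
import http.client
from urllib.parse import urlsplit, urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)

def http_head(url):
    """Send one HEAD request; return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def redirect_chain(url, fetch=http_head, max_hops=10):
    """Follow redirects one hop at a time; return a list of (url, status)."""
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # the Location header may be relative
    return chain

def is_final_redirect(url, fetch=http_head):
    """True when the URL redirects at most once before reaching a page."""
    chain = redirect_chain(url, fetch)
    hops = [status for _, status in chain if status in REDIRECT_CODES]
    return len(hops) <= 1

# Offline example: a fake site where one URL redirects straight to the
# end page and another goes through an extra redirect first.
fake_site = {
    "http://example.com/?PHPSESSID=ab": (301, "http://example.com/"),
    "http://example.com/old.html": (301, "/?PHPSESSID=ab"),
    "http://example.com/": (200, None),
}
print(redirect_chain("http://example.com/old.html", fetch=fake_site.get))
print(is_final_redirect("http://example.com/old.html", fetch=fake_site.get))
```

To check a real site, call is_final_redirect(url) without the fetch argument; it then sends real HEAD requests, so run it only against your own pages.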
Next, you can read about more methods of escaping the sandbox: Deoptimizing - a new way of SEO
Enjoy, and feel free to share your experience!
