Let's continue with the steps that I've taken to make Googlebot respider and reindex my website. I hope the advice below will help you achieve better indexing for your website. In the following article I'll assume that you are using the Apache web server.
Requirements: You will need access to your web server's .htaccess file, and the mod_rewrite module must be enabled in your Apache configuration.
STEP 1a: Make static out of dynamic pages
Using mod_rewrite you can rewrite your dynamic pages so that they look like static ones. So if you've got a dynamic .php page with parameters, you can rewrite the URL to look like a normal .html page:
www.your_website.com/look_item.php?item_id=14
for the web surfer will become:
www.your_website.com/item-14.html
HOW:
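You have to add the following lines to your .htaccess file (placed in the root directory of your web server):
RewriteEngine on
RewriteRule ^item-([0-9]+)\.html$ /sub_directory/look_item.php?item_id=$1 [L]
This is an internal rewrite (there is no R flag), so the visitor's address bar keeps showing item-14.html while Apache serves look_item.php behind the scenes. With R=301 the visitor would be sent back to the dynamic .php URL instead.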
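If your dynamic page also accepts optional parameters, a variant of the rule can pass them through. The lines below are only a sketch: they assume the script lives at /sub_directory/look_item.php, they rewrite internally (no R flag), and the "sort" parameter mentioned in the comment is purely hypothetical.

```apache
RewriteEngine on
# Internal rewrite: /item-14.html?sort=price is served by
# /sub_directory/look_item.php?item_id=14&sort=price
# ("sort" is a made-up parameter used only for illustration).
# QSA appends the query string of the request to the rewritten one.
RewriteRule ^item-([0-9]+)\.html$ /sub_directory/look_item.php?item_id=$1 [QSA,L]
```

Without QSA, a query string written in the substitution replaces whatever query string the visitor sent.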
STEP 1b: Transfer the accumulated PR
Next, type into Google's search box: site:your_website.com
This query will show you which pages of your website Google has indexed. The key here is to check whether they have any accumulated PageRank. Since you are moving to search-engine-friendly (static .html) URLs, you'll want to transfer the accumulated PR from the dynamic .php URL to the main page or to the corresponding static .html URL.
Here is an example of transferring a .php URL request with already accumulated PR to the main page URL of the website:
HOW:
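Add the following lines to your .htaccess file:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /sub_directory/look_item\.php
RewriteRule look_item\.php http://website.com/sub_directory/ [L,R=301]
Some explanations:
301 means Moved Permanently: the search engine bot will then treat http://website.com/sub_directory/ instead of look_item.php as the legitimate source of information.
The RewriteCond line limits the redirect to URLs that the visitor actually requested. Without it, the rule would also fire on requests that were internally rewritten to look_item.php (as in Step 1a) and would send every item page to the main page.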
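To send each old dynamic URL to its corresponding static URL rather than to the main page, a rule along these lines could be used. This is a sketch under assumptions: the script is requested as /sub_directory/look_item.php, the static pages are served as /item-N.html at the site root, and your_website.com stands in for your own domain.

```apache
RewriteEngine on
# Look at the original request line, so that only URLs the client really
# asked for are redirected (internal rewrites do not change THE_REQUEST).
# The item id is captured as %1.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /sub_directory/look_item\.php\?item_id=([0-9]+)\ HTTP/
# Redirect permanently to the static URL; the trailing "?" drops the
# old query string from the target.
RewriteRule look_item\.php$ http://your_website.com/item-%1.html? [R=301,L]
```

If you also keep the redirect to the main page, place this pair of lines before it, so that requests carrying an item_id are handled first.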
STEP 1c: Play with Robots.txt
To keep Google from indexing both the .html and the .php pages and treating them as duplicate content, which is bad:
Create a file named robots.txt, put it into the root directory of your website, and list in it all the dynamic .php files that you don't want to be indexed. The reason for doing this is that you want the existing PageRank to move from the dynamic .php URL to the more accessible static .html URL.
HOW: Place in your robots.txt the following lines:
User-agent: *
Disallow: /look_item.php
Important Note: If your website is hosted in a subdirectory and you don't have access to the robots.txt file in the root of the domain, you MUST use the robots meta tag with noindex, nofollow (<meta name="robots" content="noindex, nofollow"> in the page's <head>) to achieve the same result as the robots.txt Disallow directive. Robots only read the robots.txt at the root of the domain and will skip a robots.txt placed in your subdirectory, so without the meta tag you will still be feeding the search engines duplicate content, which is a bad thing.
STEP 2a: Redirect www to non-www urls
Check the PageRank of your web page with and without the preceding "www". If the two values differ, you are losing PageRank and presenting duplicate content to Google. This happens because some sites link to you as http://www.yourwebsite.com and others as http://yourwebsite.com
It's hard to control whether the sites linking to you use the "www" or the "non-www" form. This is where Apache redirect and rewrite rules help: they transfer the www URLs of your website to the non-www URLs. Again, to avoid PR loss and duplicates you will want each page of your website to be reachable at only one URL.
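As a rough sketch of what such a rule can look like without hard-coding the domain name, the lines below capture whatever follows "www." in the requested host name and redirect to it. They assume the .htaccess file sits in the document root and that the site is served over plain http.

```apache
RewriteEngine on
# Match any host name that starts with "www." and capture the rest as %1.
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
# Redirect permanently to the same path ($1) on the bare host name.
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
```

The version with the domain spelled out is easier to read; this form is mainly convenient when the same file is reused for several domains.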
HOW: Place at the end of your .htaccess file the following lines:
RewriteCond %{HTTP_HOST} ^www\.your_website\.com [nc]
RewriteRule (.*) http://your_website.com/$1 [R=301,L]
STEP 2b: Redirect index.php to root website url
There is one more step to achieving non-duplicated content. You must redirect your index.html, index.htm, index.asp or index.php to the ./ or root URL of your website.
HOW: Insert in your .htaccess file the following lines, before the two lines from the previous step:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /subdomain/index\.php\ HTTP/
RewriteRule index\.php http://yourwebsite.com/subdomain/ [R=301,L]
Note: If your website is hosted in a subdirectory, fill in its name in place of /subdomain. If not, just delete /subdomain from both lines. You can replace index.php with index.html, index.asp or whatever suits you.
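If several index file names are reachable on your site, one pair of lines can cover them all. This is only a sketch: /subdomain and yourwebsite.com are the same placeholders as above, and the list of extensions is an assumption that you should adjust to the files your server really has.

```apache
RewriteEngine on
# React only when the client itself asked for an index file; a request
# for /subdomain/ that Apache answers with index.php is left alone.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /subdomain/index\.(php|html?|asp)\ HTTP/
# "html?" matches both index.html and index.htm.
RewriteRule index\.(php|html?|asp)$ http://yourwebsite.com/subdomain/ [R=301,L]
```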
For more information on achieving non-duplicated content and escaping from Google's supplemental index, continue reading here:
STEP 3: Create custom error 404 page
In your .htaccess file type:
ErrorDocument 404 /your_website_dir/eror404.html
Then create a custom web page named eror404.html that tells visitors what to do when they come across a non-existent page.
Congratulations: I hope that by following these steps your website will be re-indexed soon.
Cheers!
