When is this penalty applied?
Search engines such as Google apply this kind of penalty when they detect two identical versions of your site's content.
How can your website become a victim of such a penalty?
Modern content management systems (CMS) and community forums offer numerous ways of managing new content, but because of their deep structure their URLs are very long, so search engines are unable to fully spider the site.
The solution for webmasters was to rewrite the old URL, so index.php?mode=look_article&article_id=12 now becomes just article-12.html. As a first step this serves its purpose, but if left like this both URLs are going to be indexed. Looking through the eyes of a search engine, we see the same content in two instances, and of course the duplicate filter is triggered:
1st instance: index.php?mode=look_article&article_id=12
2nd instance: article-12.html
Easy solution
The solution uses PHP and an Apache .htaccess file.
First we'll rewrite our URLs so that they are search-engine friendly. Let's assume that we have to map index.php?mode=look_article&article_id=... to article-....html
Create an empty .htaccess file and place the following code in it. Edit the code first and fill in your website address. If you don't have a subdomain, remove the subdomain placeholder as well.
RewriteEngine On
# 1. Send www requests to the non-www host
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^www\.yourwebsite\.subdomain [NC]
RewriteRule ^(.*)$ http://yourwebsite/subdomain/$1 [R=301,L]
# 2. Redirect direct requests for index.php to the site root
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /subdomain/index\.php\ HTTP/
RewriteRule ^index\.php$ http://yourwebsite/subdomain/ [R=301,L]
# 3. Serve article-N.html internally from index.php
RewriteRule ^article-([0-9]+)\.html$ index.php?mode=look_article&article_id=$1&rw=on [L]
Explanation:
The two redirects come first, so a request is moved to its canonical address before the internal rewrite runs. The article rule uses a relative target because a full http:// address whose host name differs from the requested one makes Apache issue an external redirect, which would expose the long URL instead of loading it internally.
- RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^www\.yourwebsite\.subdomain [NC]
RewriteRule ^(.*)$ http://yourwebsite/subdomain/$1 [R=301,L]
These lines avoid duplicate URLs such as the www and non-www versions and transfer all the requests and PR to the non-www site.
- RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /subdomain/index\.php\ HTTP/
RewriteRule ^index\.php$ http://yourwebsite/subdomain/ [R=301,L]
These lines stop index.php from being treated as a separate page, which would lower your website's PR, and transfer all the PR from index.php to your domain.
- RewriteRule ^article-([0-9]+)\.html$ index.php?mode=look_article&article_id=$1&rw=on [L]
This line allows article-12.html to be loaded internally as index.php?mode=look_article&article_id=12
The variable &rw=on is important for the PHP code that follows, so don't forget to include it.
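To see what the article rule captures, the same regular expression can be tried in plain PHP. This is only a sketch: the helper name friendly_to_internal is hypothetical, and the pattern is anchored with ^ and $ here so that only the exact file name matches.

```php
<?php
// Hypothetical helper: mirrors the article-N.html rewrite rule in plain PHP.
// Returns the internal URL, or null when the path is not an article URL.
function friendly_to_internal($path)
{
    // Same pattern as the RewriteRule, anchored to the whole file name.
    if (preg_match('/^article-([0-9]+)\.html$/', $path, $m)) {
        return 'index.php?mode=look_article&article_id=' . $m[1] . '&rw=on';
    }
    return null;
}

// Prints: index.php?mode=look_article&article_id=12&rw=on
echo friendly_to_internal('article-12.html'), "\n";
```

Anything that is not a numeric article name, such as article-abc.html, falls through and returns null, just as the rewrite rule would leave such a request untouched.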
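Then create a file named header.php and include it in your website before all other files.
Put this in it:
<?php
// Read the rw flag set by the rewrite rule; default to empty when it is absent
$rw = isset($_GET['rw']) ? $_GET['rw'] : '';
if ($rw == "on") { echo "<meta content=\"index,follow\" name=\"robots\" />"; }
else { echo "<meta content=\"noindex,nofollow\" name=\"robots\" />"; }
?>
This tells the search engine to index only the pages that have the rw flag set to on, that is, the rewritten pages such as article-12.html. Any page requested without the flag, including an article reached through its old index.php URL, is marked noindex.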
Of course, if you have access to the robots.txt file in your domain root, you can simply disallow the look_article file there and you are done:
User-agent: *
Disallow: /look_article.php
Notes: If you use a CMS, check whether your pages are still accessible through different parameters in the URL.
Example: you've deleted the article with id=17, but the empty template is still accessible and returns a 200 OK status code. Google is likely to treat this as spam.
Solution:
1. Find those empty pages and make them return a 404 Not Found status code (the call must run before any output is sent):
header("HTTP/1.1 404 Not Found");
2. Create error404.html file explaining that the user is trying to access a non-existent page.
3. Then add the custom 404 error page to your .htaccess file:
ErrorDocument 404 /your_domain_name/error404.html
This way the search engine spider won't penalize your template for displaying empty information; it will now see those pages as 404 Not Found documents.
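The check from step 1 could look roughly like the sketch below. The function names find_article and status_for_article and the sample $articles array are assumptions made for illustration; in a real CMS the lookup would be a database query.

```php
<?php
// Hypothetical lookup: stands in for the CMS database query.
// Returns the article, or null when the id does not exist (e.g. it was deleted).
function find_article(array $articles, $id)
{
    return isset($articles[$id]) ? $articles[$id] : null;
}

// Decide which HTTP status the page should send for a given article id.
function status_for_article(array $articles, $id)
{
    return find_article($articles, $id) === null ? 404 : 200;
}

// Sample data (assumed): article 12 exists, article 17 has been deleted.
$articles = array(12 => 'Duplicate content explained');
$id = 17;

if (status_for_article($articles, $id) === 404) {
    // The status line can only be set before any output has been sent.
    if (!headers_sent()) {
        header('HTTP/1.1 404 Not Found');
    }
    echo "Page not found\n";
} else {
    echo $articles[$id], "\n";
}
```

Keeping the decision in a small function makes it easy to call at the top of the template, before the empty layout is rendered.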
The next step involves cleaning up website content that is already indexed but duplicated, in order to regain the search engine's trust. Happy reading.
Hope it helps! by Nevyan Neykov
4 comments:
so you wanna say the solution is to show Google a bunch of 404 (not found) pages of your site? you would do more harm than good... even from the point of view of the visitors. nobody wants to find non-working pages. the whole scheme is good but not complete. one should 301 redirect the pages that return a 404 towards the working, short (SEF) URLs. this way you might not even lose rankings.
yes, instead of returning 200 (OK) on an empty page you should return 404 (not found), otherwise you'll end up with too many identical (template-based) empty pages, which is considered spam.
In summary: when a webpage is deleted, return 404; when it has moved to a different URL, redirect it via 301. That way both visitors and search engines will be happy.
i heard that people have designed websites with duplicate urls, like:
googl.com
facbook.com
do you know about it? and do you know of any duplicate site that is still running?
yes, there are lots of scam sites designed just to trick inattentive visitors. In fact some domain companies register such similar names to resell them later, and spyware/rogue-software creators use this method to advertise and sell their products.