BingBot & BaiduSpider do not respect robots.txt

After my CPU usage suddenly went over 400% when bots started crawling my site, I created a robots.txt file with the following content and placed it in my site root (e.g. "www.example.com/"):

User-agent: *
Disallow: /
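To check what this file asks of a crawler that does follow the rules, it can be fed to Python's standard urllib.robotparser module. This is a minimal sketch: the helper name is_allowed is hypothetical, and the check only shows what a compliant bot should do. It cannot make a bot obey.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question: every user agent is barred from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("Googlebot", "bingbot", "Baiduspider"):
        print(bot, is_allowed(bot, "http://www.example.com/index.html"))
```

All three bots should be reported as not allowed. An empty "Disallow:" line, by contrast, allows everything.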

Google now respects this file, and Googlebot no longer appears in my log. However, BingBot and BaiduSpider still show up in my log, and in large numbers.
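One way to track whether the crawlers are backing off after robots.txt goes up is to count their requests in the access log. The sketch below assumes Apache's "combined" log format, where the user agent is the last double-quoted field. The function name and the sample lines (built from documentation IP ranges) are illustrative, not taken from the question's logs. Keep in mind that a user-agent string can be faked.

```python
import re
from collections import Counter

# Assumes Apache "combined" format: the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')
BOT_TOKENS = ("googlebot", "bingbot", "baiduspider")

# Hypothetical sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '198.51.100.7 - - [10/Oct/2013:13:55:40 +0000] "GET /page HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"',
]


def count_bot_hits(log_lines):
    """Count requests per crawler, matching user agents case-insensitively."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not end in a quoted user agent
        agent = match.group(1).lower()
        for token in BOT_TOKENS:
            if token in agent:
                hits[token] += 1
    return hits


if __name__ == "__main__":
    print(count_bot_hits(SAMPLE_LOG))
```

In practice you would pass the lines of your real access log, for example from open("access.log"), and compare the counts day by day.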

Because of this huge increase in CPU usage and bandwidth, my hosting provider was about to suspend my account. So I first deleted all my pages (in case a malicious script was responsible), uploaded blank pages, blocked all the bots by IP address in .htaccess, and then created the robots.txt file.
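IP blocking comes down to testing each client address against a list of networks. The sketch below shows that check with Python's ipaddress module. The two ranges are placeholders from the RFC 5737 documentation blocks, not real Bing or Baidu ranges, and is_blocked is a hypothetical helper name. You would substitute the ranges you actually see in your own logs.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges), NOT real crawler ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    # Membership between IPv4 and IPv6 objects is simply False, not an error.
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("203.0.113.42", "192.0.2.1"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```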

I searched everywhere to confirm that I had taken the right steps (I have not yet tried blocking them with "Rewrite" rules in .htaccess).

Can anyone confirm that what I did should do the job? Since I took these steps, my CPU usage has dropped to 120% over 6 days, but I would have expected the IP blocking alone to bring it back down to my usual 5-10%.


If these are legitimate Bingbot and Baiduspider crawlers, they should both honour your robots.txt file as written. However, it can take some time for them to pick it up and act on it if these pages had previously been indexed, which is probably the case here.
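Whether the traffic really comes from Bing or Baidu can be checked with a reverse DNS lookup followed by a forward lookup that must lead back to the same IP. This is a minimal sketch. The hostname suffixes are assumptions based on what the vendors have published and should be checked against their current documentation. The names CRAWLER_DOMAINS, hostname_matches and is_genuine_crawler are hypothetical, and socket.gethostbyname_ex only returns IPv4 addresses.

```python
import socket

# Assumed crawler domains; verify against each vendor's current documentation.
CRAWLER_DOMAINS = {
    "bingbot": (".search.msn.com",),
    "baiduspider": (".baidu.com", ".baidu.jp"),
}


def hostname_matches(hostname: str, suffixes) -> bool:
    """True if hostname ends with one of the expected crawler domains."""
    hostname = hostname.lower().rstrip(".")
    return any(hostname.endswith(suffix) for suffix in suffixes)


def is_genuine_crawler(ip: str, bot: str) -> bool:
    """Reverse-resolve ip, check its domain, then forward-resolve to confirm."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not hostname_matches(hostname, CRAWLER_DOMAINS[bot]):
        return False
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    # A faked reverse record will not resolve back to the original address.
    return ip in addresses
```

A request whose user agent says bingbot but whose IP fails this check is not Bing, and would have no reason to respect robots.txt.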

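It doesn't apply in this instance, but note that Baiduspider interprets robots.txt a little differently from other mainstream bots (e.g. Googlebot) in some respects. For instance, whereas the standard defines the URL path in a Disallow: line as a simple prefix, Baiduspider only matches whole directory/path names. Where Googlebot would match the URL http://example.com/private/ given the directive Disallow: /priv, Baiduspider would not.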

Reference:
http://www.baidu.com/search/robots_english.html
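The two readings of a Disallow: path described above can be modelled in a few lines. This is an illustrative sketch of the difference as this answer describes it, not Baidu's actual implementation. Both function names are hypothetical. It also shows why the difference doesn't matter for Disallow: /, which blocks everything under either reading.

```python
def prefix_match(rule_path: str, url_path: str) -> bool:
    """Standard (Googlebot-style) reading: the Disallow value is a plain prefix."""
    return url_path.startswith(rule_path)


def whole_segment_match(rule_path: str, url_path: str) -> bool:
    """Stricter reading described for Baiduspider: the rule must end on a
    path boundary, so /priv does not cover /private/."""
    if not url_path.startswith(rule_path):
        return False
    if rule_path.endswith("/") or len(url_path) == len(rule_path):
        return True
    # The next character must start a new segment or a query string.
    return url_path[len(rule_path)] in "/?"


if __name__ == "__main__":
    for rule in ("/priv", "/private", "/"):
        print(rule, prefix_match(rule, "/private/"), whole_segment_match(rule, "/private/"))
```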

