Bad Bot – Google

Over the past week or so, the Internet speed on my TWC RoadRunner connection has taken a strange hit. I just updated the server and moved my GoDaddy-hosted WordPress installation to it, so I have to consider that something could have crept in during the upgrade. Everything looked pretty normal, though, until I logged into MyTWC and looked at the Internet usage stats.
[Images: RR-2013 and RR-May-2013, Internet usage graphs from MyTWC]

As you can see, the spike starts around May 23, 2013. I have a port on the switch configured to mirror all traffic traversing the Internet connection, so I cranked up tshark on it to have a look. It seems that Googlebot had discovered my photo archive and was attempting to cache every image. This led me to the Apache logs, where I saw thousands upon thousands of Google image bot entries. So I tried using Webmaster Tools to slow it down, set a crawl delay in robots.txt (Google ignores those, by the way), uploaded a sitemap, and finally submitted a request to exclude the zp directory.
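For anyone who wants to size up the same problem in their own logs, here is a rough Python sketch of the kind of tally I mean. It assumes Apache's combined log format, where the user agent is the last quoted field on each line. The sample lines, the paths, and the function name count_google_bots are all made up for illustration; they are not taken from my actual logs.

```python
import re
from collections import Counter

# Assumption: combined log format, so the user agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_google_bots(lines):
    """Tally requests whose user agent names a Google crawler."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # skip lines that do not end in a quoted field
        agent = match.group(1)
        # Check the image bot first, since "Googlebot" is a substring of it.
        if "Googlebot-Image" in agent:
            counts["Googlebot-Image"] += 1
        elif "Googlebot" in agent:
            counts["Googlebot"] += 1
    return counts


# Hypothetical sample lines (documentation IP range, invented paths).
sample = [
    '192.0.2.10 - - [23/May/2013:10:00:00 -0400] "GET /zp/albums/a.jpg HTTP/1.1" '
    '200 51234 "-" "Googlebot-Image/1.0"',
    '192.0.2.11 - - [23/May/2013:10:00:01 -0400] "GET / HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.12 - - [23/May/2013:10:00:02 -0400] "GET / HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

counts = count_google_bots(sample)
print(counts)
```

To run it against a real log, pass an open file object instead of the sample list; the log location depends on how Apache is set up on your system.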

What worked? Initially, nothing short of the firewall rule I created to keep the bot at bay overnight. It appears Google is now honoring the requested crawl delay, but not the directory exclusion request. The firewall rule may go back into effect soon, but I really just wish Google would play fair and honor robots.txt settings instead of requiring Webmaster Tools to accomplish the same thing.
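For reference, the robots.txt settings in question look roughly like the file embedded in the sketch below. The 10-second delay, the /zp/ path, and the example.com URLs are assumptions for illustration, not my exact file. The sketch feeds the rules to Python's standard urllib.robotparser to confirm they say what I intended. That only shows how a standards-following parser reads the file; it says nothing about what Google's crawlers actually do with it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the delay value and the /zp/ path are assumptions.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /zp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the parser how a crawler named Googlebot-Image should treat two URLs.
photo_allowed = parser.can_fetch(
    "Googlebot-Image", "http://example.com/zp/albums/photo.jpg"
)
blog_allowed = parser.can_fetch("Googlebot-Image", "http://example.com/blog/")
delay = parser.crawl_delay("Googlebot-Image")

print("photo archive allowed:", photo_allowed)
print("blog allowed:", blog_allowed)
print("crawl delay (seconds):", delay)
```

If the photo archive comes back as disallowed here while the bot keeps fetching from it anyway, the file is fine and the problem is on the crawler's side.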
