Bingbot
Bingbot is a web crawler operated by Microsoft to supply data to its search engine, Bing. It is easily confused: it loves to request lowercase versions of URLs, as well as other pages that nothing links to.
Identification
Bingbot identifies itself in logs as either:
"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
or
"Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
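If you want to pick these requests out of your access logs, a user-agent match is the first filter. The sketch below is a minimal illustration, assuming the user-agent field has already been extracted from each log line; the function names `claims_to_be_bingbot` and `is_mobile_bingbot` are hypothetical. Keep in mind that a user-agent only tells you what a client *claims* to be.

```python
import re

# Both bingbot user-agent strings above contain the token "bingbot/<version>".
BINGBOT_UA = re.compile(r"\bbingbot/\d+\.\d+", re.IGNORECASE)


def claims_to_be_bingbot(user_agent: str) -> bool:
    """True if the user-agent string contains the bingbot token."""
    return BINGBOT_UA.search(user_agent) is not None


def is_mobile_bingbot(user_agent: str) -> bool:
    """True for the mobile (iPhone) variant, which carries a 'Mobile/' token."""
    return claims_to_be_bingbot(user_agent) and "Mobile/" in user_agent
```

Requests that pass this filter still need the PTR check described below before you can trust them.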
All of Bingbot's genuine crawler instances use IP addresses with a PTR (reverse DNS) record that looks something like:
msnbot-40-77-167-105.search.msn.com
A few spiders use the bingbot user-agent without being bingbot. Some of these operate out of Microsoft's own Azure cloud service. Crawlers faking the bingbot user-agent will not have matching PTR records.
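One common way to apply the PTR check is forward-confirmed reverse DNS: look up the PTR name for the client IP, check that it ends in `search.msn.com`, then resolve that name and confirm it points back to the same IP. The second step matters because whoever controls an IP block can set any PTR record they like. This is a sketch under stated assumptions: the `.search.msn.com` suffix is taken from the example record above, it only handles IPv4 (`gethostbyname_ex` returns IPv4 addresses), and `is_real_bingbot` with its injectable resolvers is a hypothetical helper, not an official Microsoft tool.

```python
import socket
from typing import Callable, List

# Assumption: genuine bingbot hosts end in this suffix, as in the
# example PTR record msnbot-40-77-167-105.search.msn.com.
BINGBOT_SUFFIX = ".search.msn.com"


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP (raises OSError if there is none)."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses a hostname resolves to."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def is_real_bingbot(
    ip: str,
    ptr: Callable[[str], str] = reverse_lookup,
    resolve: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a client claiming to be bingbot."""
    try:
        hostname = ptr(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record: cannot be the real bingbot
    if not hostname.endswith(BINGBOT_SUFFIX):
        return False  # e.g. a cloud VM faking the bingbot user-agent
    try:
        addresses = resolve(hostname)
    except OSError:
        return False  # the PTR name does not resolve at all
    # The PTR name must resolve back to the very same address.
    return ip in addresses
```

Calling `is_real_bingbot("40.77.167.105")` performs live DNS queries; the `ptr` and `resolve` parameters exist only so the logic can be exercised without network access.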
Love for lowercase versions of URLs and other confusion
Bingbot's mobile crawler loves to try the lowercase version of every URL it encounters. It will request the F-Droid article as /f-droid even though nothing links to that URL; it simply turns /F-Droid into /f-droid on its own.
This can be solved by adding a redirect from the lowercase version to the actual URL for every URL that contains an uppercase letter. That may be impractical if there are many pages with uppercase letters; don't bother if that's the case. Microsoft has a slice of the desktop search market, but it does not have a measurable share of the mobile search market. Adapting to its bot's massive stupidity is not important.
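If you do want the redirects, the mapping can be generated from a list of existing page paths rather than written by hand. The sketch below assumes you can export such a list; `build_lowercase_redirects` and `handle` are hypothetical names, and how the resulting map is fed into your web server or wiki is left open. Paths that differ only by case (say /Foo and /FOO) are skipped, since their lowercase form cannot redirect to both.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def build_lowercase_redirects(paths: Iterable[str]) -> Dict[str, str]:
    """Map the lowercase form of each mixed-case path to the real path."""
    pages = set(paths)
    redirects: Dict[str, str] = {}
    ambiguous: Set[str] = set()
    for path in sorted(pages):
        lower = path.lower()
        if lower == path or lower in pages:
            continue  # already lowercase, or a real page owns that URL
        if lower in redirects:
            ambiguous.add(lower)  # two pages collapse to the same lowercase URL
        else:
            redirects[lower] = path
    for lower in ambiguous:
        del redirects[lower]
    return redirects


def handle(path: str, pages: Set[str], redirects: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, target path) for a requested path."""
    if path in pages:
        return 200, path
    if path in redirects:
        return 301, redirects[path]  # permanent redirect to the real URL
    return 404, None
```

With pages /F-Droid and /about, a request for /f-droid gets a 301 to /F-Droid, while /about needs no entry at all.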
Bingbot will also try to access every URL on both the www.example.com and the example.com version of a domain, even if one is, and always was, a permanent redirect to the other. It will also keep trying every URL over both http and https, even if http simply redirects to https. This is normal and nothing to be concerned about; it's just bingbot being stupid.
Related information
See Web crawlers for other software crawling around the Internets.