Bingbot

From LinuxReviews

Bingbot is a web crawler operated by Microsoft to provide data to its search engine, Bing. It is easily confused: it loves to request lowercase versions of URLs, as well as other pages that nothing links to.

Identification

Bingbot identifies itself in logs as either:

"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

or

Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

All of its crawler instances use IP addresses with a PTR record that looks something like:

msnbot-40-77-167-105.search.msn.com

There are a few spiders using the bingbot user-agent which are not bingbot. Some of these operate out of Microsoft's own Azure cloud service. Crawlers faking the bingbot user-agent will not have the matching PTR records.
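The PTR check described above can be sketched in Python. This is a minimal sketch, not an official verification procedure: the `.search.msn.com` suffix is taken from the PTR example above, the extra forward lookup (confirming the hostname resolves back to the same IP) is an added assumption borrowed from common practice, and the function names are hypothetical. The resolver functions are passed in as parameters so the logic can be tried without touching the network.

```python
import socket

# Assumption: genuine bingbot hosts end in this suffix, as in the PTR example above.
BINGBOT_PTR_SUFFIX = ".search.msn.com"


def looks_like_bingbot(user_agent):
    """Return True if the user-agent string claims to be bingbot."""
    return "bingbot/" in user_agent.lower()


def is_genuine_bingbot(ip, reverse_lookup=socket.gethostbyaddr,
                       forward_lookup=socket.gethostbyname_ex):
    """Check the PTR record of ip, then confirm the name resolves back to ip."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no PTR record at all
    if not hostname.lower().rstrip(".").endswith(BINGBOT_PTR_SUFFIX):
        return False  # PTR record exists but does not match
    try:
        addresses = forward_lookup(hostname)[2]  # IPv4 addresses only
    except OSError:
        return False
    return ip in addresses
```

A log-processing script could call `looks_like_bingbot()` on the user-agent first and only run the slower `is_genuine_bingbot()` DNS check on requests that claim to be bingbot.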

Love for lowercase versions of URLs and other confusion

Bingbot's mobile crawler loves to try the lowercase version of every URL it encounters. It will try to get the F-Droid article as /f-droid even though nothing links to that. It will simply turn /F-Droid into /f-droid on its own.

It is possible to solve this by adding redirects from the lowercase version to the actual URL in all instances where the URL has an uppercase letter. This may be impractical if there is a large number of pages with uppercase letters. Don't bother if that's the case. Microsoft's got a slice of the desktop search market but they do not have a measurable share of the mobile search market. Adapting to their bot's massive stupidity is not important.
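For a site where the number of affected pages is small enough to bother, the redirect list could be generated rather than written by hand. The following is a rough sketch under assumptions: it presumes the site can supply a list of its real paths, and the function name `build_lowercase_redirects` is made up for illustration. How the resulting mapping is turned into actual redirects depends on the web server or wiki software in use.

```python
def build_lowercase_redirects(paths):
    """Map each lowercase variant to the real path that contains uppercase letters."""
    known = set(paths)
    redirects = {}
    for path in paths:
        lowered = path.lower()
        if lowered == path:
            continue  # already all lowercase, nothing to redirect
        if lowered in known:
            continue  # a real page already lives at the lowercase URL
        # Keep the first match if two paths differ only by case.
        redirects.setdefault(lowered, path)
    return redirects


if __name__ == "__main__":
    # "/F-Droid" comes from the example above; "/bingbot" is a hypothetical path.
    for source, target in build_lowercase_redirects(["/F-Droid", "/bingbot"]).items():
        print(source, "->", target)
```

Each entry in the returned dictionary is one redirect from the lowercase URL the bot requests to the page that actually exists.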

Bingbot will also try to access all URLs on both the www.example.com and the example.com version of a domain, even if one is, and always was, a permanent redirect to the other. It will also keep trying all URLs on both http and https even if http just redirects to https. This is normal and nothing to be concerned about; it's just bingbot being stupid.

Related information

See Web crawlers for other software crawling around the Internets.