Comparison of HOST file blacklists
Linux and other operating systems support using a hosts file for matching domains to IPs. Linux systems use the file /etc/hosts. This functionality can be used to filter out bad hostnames used for malware, spam and other undesirable content by mapping those hostnames to 0.0.0.0. There are many huge lists floating around the Internet which you can download and use. Here's a comparison of some of the more popular ones.
Hosts entries go in the file /etc/hosts. The file can be used for pointing to valid hosts like machines on your LAN. It can also be used as a blacklist. This can be done in two ways, either by an entry with 127.0.0.1 domain.tld or 0.0.0.0 domain.tld. The latter is preferable. Using 127.0.0.1 will result in time-outs if nothing is listening on that address, whereas 0.0.0.0 will not. A locally running webserver will be hit with requests either way. You may want to convert host files from one form to the other using sed or, if you prefer, awk (why not perl, you may wonder. No reason, if that works for you then great!).
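The same conversion can also be sketched in Python for those who prefer it over sed or awk. This is a minimal sketch, not a definitive tool: the function names and the set of names that are left pointing at the loopback address are assumptions made for illustration.

```python
# Names that should keep resolving to the loopback address (an assumption;
# adjust to match the local entries in your own /etc/hosts).
KEEP_ON_LOOPBACK = {"localhost", "localhost.localdomain"}


def to_null_route(line):
    """Rewrite one '127.0.0.1 domain.tld' entry to '0.0.0.0 domain.tld'."""
    fields = line.split()
    if (
        len(fields) >= 2
        and fields[0] == "127.0.0.1"
        and fields[1] not in KEEP_ON_LOOPBACK
    ):
        # fields[0] is the first token on the line, so the first occurrence
        # of the address is the one that gets replaced.
        return line.replace("127.0.0.1", "0.0.0.0", 1)
    return line  # comments, blank lines and other entries pass through


def convert_hosts(text):
    """Convert every 127.0.0.1 blacklist entry in hosts-format text."""
    return "\n".join(to_null_route(line) for line in text.splitlines()) + "\n"


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n# tracker\n127.0.0.1 tracker.example\n"
    print(convert_hosts(sample), end="")
```

Leaving localhost alone matters: rewriting that entry to 0.0.0.0 would break software which expects the name to resolve to the loopback address.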
Host Blacklists Reviewed
The 2o7Net list is rather short (1286 entries). It lists subdomains used by the notorious 2o7 tracking service which has plagued the Internet for more than a decade. It's maintained by GitHub user "FadeMind" and it can be downloaded from https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.2o7Net/hosts
This is a specialized list limited to one tracking service. Still, it's worth adding if you are making your own /etc/hosts using multiple sources.
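Building one file from multiple sources mostly comes down to parsing each list, dropping duplicates and writing uniform entries. The following is a rough sketch under the assumption that every source uses the plain hosts format; the function names are made up for illustration, and downloading the lists is left out.

```python
# Addresses commonly used for blacklist entries, and local names that
# must never end up in a blacklist (both sets are assumptions).
BLOCK_ADDRESSES = {"0.0.0.0", "127.0.0.1"}
LOCAL_NAMES = {"localhost", "localhost.localdomain"}


def blocked_names(text):
    """Yield the hostnames blocked by hosts-format text."""
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) >= 2 and fields[0] in BLOCK_ADDRESSES:
            for name in fields[1:]:
                name = name.lower()
                if name not in LOCAL_NAMES:
                    yield name


def merge_blacklists(texts):
    """Merge several blacklists into sorted, de-duplicated 0.0.0.0 entries."""
    names = set()
    for text in texts:
        names.update(blocked_names(text))
    return "".join(f"0.0.0.0 {name}\n" for name in sorted(names))


if __name__ == "__main__":
    first = "127.0.0.1 localhost\n0.0.0.0 tracker.example # seen in list one\n"
    second = "# another list\n127.0.0.1 Ads.example\n0.0.0.0 tracker.example\n"
    print(merge_blacklists([first, second]), end="")
```

The merged output only contains blacklist entries, so your own local entries (localhost, machines on your LAN) still need to be kept at the top of the final /etc/hosts.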
AdAway is an Android app for blocking advertisements and is, as such, mainly targeting advertisements served to mobile users. It is available for free in the F-Droid app store. This blacklist is rather short with about 500 entries.
This is one of the few blacklists which can safely be used with no issues or risk of false positives. It targets the most common and annoying trackers and advertisement servers and that's it. The raw blacklist can be acquired from https://raw.githubusercontent.com/AdAway/adaway.github.io/master/hosts.txt
hpHosts is a service run by Malwarebytes which offers a rather broad range of small host files covering different classifications. It's no "one stop shop" where you can grab some huge fits-the-bill hosts file. And that is probably a good thing since a lot of Malwarebytes' choices are questionable at best.
hpHosts' advertisement list (ATS) appears to have advertisement-related sites on it. It gets dicey very quickly when moving on to their other lists such as the "warez/piracy sites" (WRZ) blacklist. The site thelinuxcode.com, as an example, does explain "HOW TO DOWNLOAD TORRENT WITH COMMAND LINE IN UBUNTU". An article explaining how one can use BitTorrent software to download Fedora ISOs on Ubuntu Linux does not make a site a "warez site" - but if you think it does then Malwarebytes products and their hpHosts files are for you.
You can download the various hpHosts blacklists from http://hosts-file.net/?s=Download
KADhosts is a Polish blacklist which is supposedly limited to "Fraud/adware/scam websites". It is really a uBlock Origin filter which can be found at https://github.com/PolishFiltersTeam/KAD and its homepage is at https://kadantiscam.netlify.com/. It is available as a host file at https://github.com/PolishFiltersTeam/KADhosts
There are some valid questions worth asking the maintainers of this blacklist. One would be: why is there a section called "fake news" in a list which is supposedly limited to fraud/adware/scam sites?
Steven Black's "Unified hosts file"
Steven Black's hosts files are made using collections of other hosts files such as adaway.org, mvps.org, malwaredomainlist.com and someonewhocares.org. The shortest "adware + malware" list weighs in at 1.2 MB and holds 40k entries. It seems fine. Things get rather strange very quickly when examining the rest of the categories this person provides blacklists for. There is a "fakenews" blacklist which lists just about every single site where real journalists do investigative and objective reporting. Not a single known propaganda site which continuously peddles fake news has made it into this list. This raises questions regarding this person's ability to understand the difference between up and down, left and right as well as good and bad sites.
Using anything from this source without checking every domain is inadvisable, and checking 40-55k domains for false positives would be a full-time job for a month. The better option is to avoid it. The various hosts files can be found at https://github.com/StevenBlack/hosts
The "Ultimate" hosts blacklist is probably the biggest HOST file blacklist on the Internet. It weighs in at 43 MB. Doing a few quick greps on this large file reveals that it's utterly useless as a blacklist. The number of false positives in this collection is probably not that high in terms of percentages since it, as of now, contains 1,860,653 different domains. That's a lot. It also means that there are a lot of entries which could be wrong, and there appear to be a lot of oddities in this file.
The first strange thing we noticed is that this site is on the list for some reason, yet none of the sources the list is supposedly made up of have this site listed. No other blacklist we are aware of does. The second odd thing we noticed is that this is the only blacklist which lists the Free Software Foundation, specifically their Free Software Directory. Linux Today and a lot of other Linux-related sites are only on this very huge blacklist.
There are a lot of other problems with this list beyond just Linux sites being blacklisted. Norway's biggest newspaper VG (vg.no) is blacklisted. So is the Russian newspaper pravda.ru. We could go on but you get the idea: the "Ultimate" list can claim it's the largest blacklist because they apparently took a large list of known domains and just threw it in there without doing any kind of check to see if those domains should go on a blacklist or not. The number of legitimate sites on that list is just too large for there to have been any effort at all; it really does look like they just pulled random lists of domains from search engines and threw them in to make their list as long as possible. It may be the "largest" but it's also the most useless. You won't even be able to read your country's biggest newspaper if you try using that joke.
The hosts file can be downloaded from https://github.com/mitchellkrogza/Ultimate.Hosts.Blacklist if you want to check if your site is on it. It probably is regardless of what kind of site you run.
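Checking a downloaded hosts file for a particular site can be done with grep, but a plain substring match also hits longer names which merely contain the one you are looking for. A small sketch of an exact-match check follows; the function name is hypothetical and the sample entries are invented, not taken from any real blacklist.

```python
def is_listed(hosts_text, domain):
    """Return True if hosts-format text has an entry for exactly this domain."""
    wanted = domain.lower().rstrip(".")
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore trailing comments
        # fields[0] is the IP address; everything after it is a hostname.
        if wanted in (name.lower() for name in fields[1:]):
            return True
    return False


if __name__ == "__main__":
    # Invented sample data, only here to show the matching behaviour.
    sample = "0.0.0.0 notexample.com\n0.0.0.0 www.example.org # some note\n"
    print(is_listed(sample, "www.example.org"))
    print(is_listed(sample, "example.com"))
```

With a file of this size it is worth reading it once and reusing the text for every domain you want to look up.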
Yoyo's Adservers is a small-ish list of about 3000 advertisement servers. That's what it focuses on and that's what it is. There are no useful sites thrown in by mistake as far as we can tell, it's just advertisement servers. This list looks safe to use with no risk of false positives or issues. Its homepage is at pgl.yoyo.org/adservers and the raw list can be acquired from pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext&useip=0.0.0.0 - yes, that's a very long URL. But it's good, it is simply Yoyo's way of telling you that you can change the format by altering the link's variables.
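The variables in that link can be put together programmatically. The sketch below only uses the three parameters and values shown in the URL above; the https:// scheme and the function name are assumptions, and which other values the server accepts is not covered here.

```python
from urllib.parse import urlencode

# Scheme is an assumption; the article gives the address without one.
BASE_URL = "https://pgl.yoyo.org/adservers/serverlist.php"


def yoyo_list_url(hostformat="hosts", mimetype="plaintext", useip="0.0.0.0"):
    """Build the download URL from the query variables seen in the link."""
    query = urlencode(
        {"hostformat": hostformat, "mimetype": mimetype, "useip": useip}
    )
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(yoyo_list_url())
```

Changing useip would, going by the variable's name, select the address each entry points to, so the 0.0.0.0 default here matches the form recommended earlier.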
If you are looking for a general host blacklist to use you're best off sticking with a shorter one which focuses on advertisement and tracking servers and only advertisement and tracking servers. Every blacklist which attempts to go beyond that gets some sites, or in the case of "fake news" all sites, wrong. It should also be mentioned that browser-based content filters are better suited for removing advertisements anyway; they can remove sub-folders like /ads/ without making the whole site unavailable.
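The difference between the two approaches can be illustrated with a toy comparison. This is only a sketch of the idea, not how any particular content filter is implemented: a hosts file can only match the hostname of a request, while a browser filter can also look at the path. The function names and the news.example domain are made up.

```python
from urllib.parse import urlsplit


def blocked_by_hosts_file(url, blocked_hosts):
    """A hosts file only sees the hostname, never the path."""
    return urlsplit(url).hostname in blocked_hosts


def blocked_by_path_filter(url, fragment="/ads/"):
    """A browser-side filter can match on part of the path instead."""
    return fragment in urlsplit(url).path


if __name__ == "__main__":
    article = "https://news.example/world/story.html"
    banner = "https://news.example/ads/banner.png"
    for url in (article, banner):
        print(
            url,
            blocked_by_hosts_file(url, {"news.example"}),
            blocked_by_path_filter(url),
        )
```

Blacklisting news.example in /etc/hosts takes out the article along with the banner, while the path-based rule only removes the banner.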