|Original author(s)||Bradford L. Barrett|
|Latest release||2.23-08 / August 26, 2013|
|Operating system||Linux, Cross-platform|
|Available in||Over 30 languages|
|Type||Web Traffic Analytics|
Webalizer is a log file analyzer, primarily for Apache logs, that creates a simple report with some small graphs and a high-level overview of page views, unique visits, files, kilobytes transferred and the total number of hits. It's fast, and it supports a history file that allows you to simply throw old logs away once Webalizer is done with them. It does have some shortcomings, like the inability to separate browser user-agent strings by device and operating system.
Webalizer has not been updated since 2013. It works fine, but don't expect it to get any new features beyond what the current version has.
Webalizer can create a simple HTML page with some graphs and some data about your website's traffic.
Its main areas are:
- Monthly Statistics (no graphs, just numbers)
- Daily Statistics (with graph) with hits, files, pages, visits, sites and KBytes. It doesn't say how many pages a single visitor viewed or things like that (you can just divide pages by visits to know) but it does give a nice overview.
- Hourly Statistics (with a graph). The statistics it shows you are per-hour for the entire month.
- Top XXX URLs (configurable how many)
- Top XXX URLs by KBytes
- Entry and Exit pages
- Top XXX referrers
- Top search strings (if you configured it so it knows about search-engines)
- Top XXX user-agent strings (you have to manually configure it if you want them grouped)
- Top XXX of total countries (this works fine if you configure it to use a GeoIP database)
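The Daily Statistics entry above notes that pages per visit can be worked out by dividing pages by visits. A minimal sketch of that arithmetic, using made-up numbers and a hypothetical helper name:

```python
def pages_per_visit(pages: int, visits: int) -> float:
    """Average number of pages viewed per visit; 0.0 when there were no visits."""
    if visits == 0:
        return 0.0
    return pages / visits


# Made-up figures standing in for one row of the Daily Statistics table.
print(pages_per_visit(pages=300, visits=120))  # prints 2.5
```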
Webalizer does give an overview of the most important things you need to know about a site if, and only if, you have done a tad more configuration than should be required. Put simply, there are quite a few things that require manual configuration. Bots, web spiders and other things that shouldn't be included as regular visitors have to be manually excluded. User-agents are not grouped by default. They can be grouped, but you have to list all the strings you want to be grouped, and each group can only match on one string. That makes it hard to put those using Chrome on Windows in one group and those who use Chrome on Android in another group. You do have one string, so you could put everyone using Windows in one group and those using Android in another, but then you can't tell how many Windows users use which browser. It would be nice if Webalizer could group user-agents using two strings, but it can't.
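The one-string limitation is easier to see with a small model. The sketch below assumes plain substring matching where the first matching rule wins; that is a simplification for illustration, not Webalizer's actual matching code, and the user-agent strings are illustrative examples rather than lines from a real log.

```python
# Illustrative user-agent strings (examples, not taken from a real log).
CHROME_WINDOWS = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
CHROME_ANDROID = ("Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36")
FIREFOX_WINDOWS = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
                   "Gecko/20100101 Firefox/121.0")


def group_for(agent, rules):
    """Return the label of the first rule whose single substring occurs in agent.

    Assumption: substring matching, first match wins (a simplified model).
    """
    for substring, label in rules:
        if substring in agent:
            return label
    return agent  # ungrouped agents are listed as-is


# One string per group: you can split by browser or by system, not both.
by_browser = [("Chrome/", "Chrome"), ("Firefox/", "Firefox")]
by_system = [("Windows NT", "Windows"), ("Android", "Android")]

for agent in (CHROME_WINDOWS, CHROME_ANDROID, FIREFOX_WINDOWS):
    print(group_for(agent, by_browser), "|", group_for(agent, by_system))
```

Grouping by browser lumps Chrome on Windows together with Chrome on Android, and grouping by system lumps Chrome on Windows together with Firefox on Windows, because "Chrome" and "Windows" are not adjacent in the string.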
Webalizer is better than nothing, and it's something that does work. It's also fairly fast thanks to its support for history files. You can parse a lot with Webalizer, reset the log and keep the historical statistics. This is also nice from a privacy perspective: you can run Webalizer every 6 hours and eradicate the old log while keeping the actual statistics from prior periods intact.
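A rough sketch of that "analyze, then wipe the log" routine is shown below. It assumes history and incremental processing are enabled in the configuration file, that the paths passed in are your own, and that something external (cron, a systemd timer) calls it every 6 hours. The function name and the injectable `run` parameter are hypothetical conveniences, not part of Webalizer.

```python
import os
import subprocess


def analyze_and_truncate(config_file, log_file, run=subprocess.run):
    """Run webalizer with the given configuration, then empty the log on success."""
    result = run(["webalizer", "-c", config_file])
    if result.returncode != 0:
        return False  # keep the log so nothing is lost if the run failed
    # Truncate in place rather than deleting, so a web server that still has
    # the file open keeps writing to the same file.
    os.truncate(log_file, 0)
    return True
```

Requests logged between the end of the webalizer run and the truncation would be lost with this approach, which may or may not matter for your site.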
One big disadvantage of using Webalizer is that it's something that doesn't "just work" unless you spend perhaps too much time configuring it. It will just work for ages once that's done.
Webalizer lets you ignore user-agents, URLs/"pages" and file-types so they don't show up or count as regular page-views (for example, you don't want load.php to be counted as a "page" if you use MediaWiki like this site does).
Webalizer is a nice web log analyzer, but it does require quite a lot of configuration to make the statistics it presents meaningful, and how you should configure it depends on what content management system(s) you use.
Installation And Configuration
All the distributions seem to have webalizer included as a package with that name. The bigger problem you will have once you have it installed will be configuring it.
You should acquire the geodb-latest.tgz file from ftp://ftp.mrunix.net/pub/webalizer/geodb/geodb-latest.tgz before you configure it. Web browsers like Chromium are removing support for the ftp:// protocol, so just use wget in a terminal if your web browser can't fetch it.
webalizer lets you run it with -c configurationfile. You can easily run it with five different configuration files for five different sites.
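As a sketch of the multi-site setup, the helper below builds one `webalizer -c <file>` command per configuration file. It assumes all configuration files live in one directory and end in `.conf`; both the layout and the function name are hypothetical.

```python
from pathlib import Path


def webalizer_commands(config_dir):
    """Build one `webalizer -c <file>` command per *.conf file in config_dir."""
    configs = sorted(Path(config_dir).glob("*.conf"))
    return [["webalizer", "-c", str(path)] for path in configs]
```

Each returned list can be handed to `subprocess.run` as-is, so five configuration files give five independent runs.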
The bare minimum a webalizer configuration file should have is:
    HostName yoursite.tld
    LogFile access_log
    OutputDir /home/httpd/vhosts/yoursite/httpdocs/webstat/
    HistoryName /home/httpd/vhosts/yoursite/statistics/logs/webalizer.hist
    IncrementalName /home/httpd/vhosts/yoursite/statistics/logs/webalizer.current
    Incremental yes
    UseHTTPS yes
    DNSChildren 0
    TopURLs 50
    TopReferrers 50
    TopAgents 40
    TopSites 0
    TopKSites 0
    AllSearchStr yes
    Quiet yes
    FoldSeqErr yes
webalizer needs to be able to write to the OutputDir and to the files named by HistoryName and IncrementalName. The TopKSites statistics are only interesting if you want to make one statistics page for multiple sites.
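Write permissions are an easy thing to get wrong, so a quick pre-flight check can help. The sketch below reads "Keyword value" lines from a configuration file's text and reports directories the current user cannot write to. It assumes absolute paths like the ones in the example above, and both function names are hypothetical.

```python
import os


def parse_directives(text):
    """Parse 'Keyword value' lines into {keyword: [values]}; '#' lines are comments."""
    directives = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        # Keywords such as IgnoreAgent may repeat, so keep every value.
        directives.setdefault(parts[0], []).append(value)
    return directives


def unwritable_paths(directives):
    """Return directories webalizer must write to that the current user cannot."""
    needed = list(directives.get("OutputDir", []))
    for keyword in ("HistoryName", "IncrementalName"):
        # Assumption: absolute file paths, so the parent directory is what matters.
        needed += [os.path.dirname(p) for p in directives.get(keyword, [])]
    return [d for d in needed if not os.access(d, os.W_OK)]
```

Run it as the same user that runs webalizer; an empty list means the output and history locations look writable.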
You will want to extract the geodb-latest.tgz mentioned earlier, place it somewhere webalizer can read it, and tell webalizer to use it in the configuration file:
    GeoDB yes
    GeoDBDatabase /home/httpd/webalizer/GeoDB.dat
You will likely also want quite a few IgnoreURL directives, along with PageType options listing the file-types that count as pages:
    PageType htm*
    PageType cgi
    PageType php
    PageType shtml
Search engines are only recognized as referrers if they are listed. So you need a long list of those:
    SearchEngine 348north.com search=
    SearchEngine abcsearch.com terms=
    SearchEngine alltheweb.com q=
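Each SearchEngine line pairs a host string with the query parameter that carries the search terms. The sketch below is a simplified model of that idea, not Webalizer's own matching code; the referrer URL in the example is made up and the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# (host substring, query parameter) pairs mirroring the SearchEngine lines above.
SEARCH_ENGINES = [
    ("348north.com", "search"),
    ("abcsearch.com", "terms"),
    ("alltheweb.com", "q"),
]


def search_string(referrer, engines=SEARCH_ENGINES):
    """Return the search terms in a referrer URL, or None if no listed engine matches."""
    parts = urlsplit(referrer)
    for host, param in engines:
        if host in parts.netloc:
            values = parse_qs(parts.query).get(param)
            return values[0] if values else None
    return None


# Made-up referrer for illustration.
print(search_string("http://www.alltheweb.com/search?q=webalizer+config"))
```

A referrer from a host that is not in the list returns None, which is why unlisted search engines never show up under search strings.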
Any website that's been around for a few months gets lots and lots of web crawlers visiting. Most of them are worthless, but that's another story. You will want a long list of IgnoreAgent directives so those are excluded from the statistics:
    IgnoreAgent 360Spider
    IgnoreAgent FemtosearchBot
    IgnoreAgent www.semrush.com/bot
    IgnoreAgent www.bing.com/bingbot
    IgnoreAgent www.sogou.com
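Finding out which crawlers to list is the tedious part. One possible approach, sketched below, is to count user-agents in the access log that look like crawlers and review the most frequent ones by hand. It assumes the Apache combined log format, where the user-agent is the last quoted field; the hint words and the function name are hypothetical starting points.

```python
import re
from collections import Counter

# In the combined log format the user-agent is the last double-quoted field.
AGENT_RE = re.compile(r'"([^"]*)"\s*$')
# Hypothetical hints; extend them for the crawlers you actually see.
BOT_HINTS = ("bot", "spider", "crawl")


def crawler_candidates(log_lines, hints=BOT_HINTS):
    """Count user-agents that look like crawlers, most frequent first."""
    counts = Counter()
    for line in log_lines:
        match = AGENT_RE.search(line)
        if match and any(hint in match.group(1).lower() for hint in hints):
            counts[match.group(1)] += 1
    return counts.most_common()
```

The result is a list of (user-agent, hits) pairs; pick a distinctive substring from each one you want excluded and add it as an IgnoreAgent line.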
..and lastly, if you want to group web browser user-agents together so multiple versions of, say, Mozilla Firefox are just listed as Firefox, you'll need a lot of options with both GroupAgent and HideAgent. If you add a GroupAgent without a following HideAgent, you'll get one entry with the user-agents grouped and another with them individually.
Those entries can look like

    GroupAgent "Mozilla/5.0 (X11; CrOS x86_64" ChromiumOS
    HideAgent Mozilla/5.0 (X11; CrOS x86_64
    GroupAgent "Win64; x64; rv:" Firefox on Windows
    HideAgent Win64; x64; rv:
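Since every GroupAgent needs a matching HideAgent, generating the pairs from one table avoids forgetting the second line. A minimal sketch, following the quoting used in the entries above (the helper name is hypothetical):

```python
def group_and_hide(pattern, label):
    """Return a GroupAgent line and the matching HideAgent line for one pattern."""
    return [f'GroupAgent "{pattern}" {label}', f"HideAgent {pattern}"]


# Patterns and labels copied from the entries above.
GROUPS = {
    "Mozilla/5.0 (X11; CrOS x86_64": "ChromiumOS",
    "Win64; x64; rv:": "Firefox on Windows",
}

lines = [line
         for pattern, label in GROUPS.items()
         for line in group_and_hide(pattern, label)]
print("\n".join(lines))
```

The printed lines can be pasted into the configuration file as-is.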
See Webalizer/Configuration file example for an example of a configuration file with very long lists of IgnoreAgent and other configuration directives a webalizer configuration file needs in order to get meaningful statistics from it.
Verdict And Conclusion
A big and clear downside to using Webalizer is that it needs to be properly configured in order to produce anything that's useful. If you, for example, don't give it an increasingly long list of user-agents to ignore with IgnoreAgent directives, you'll get statistics that show double or triple your actual traffic if you run a small website.
If you're willing to spend some time configuring Webalizer then it's kind-of fine as an alternative to things like Google Analytics. You'll get less information, but you will get the most important details like how many people visited your site, how many pages they viewed, what pages are most visited, when people visit and a few other statistics.
- Analog is a similar log analyzer. It's far worse than Webalizer, and it doesn't support partial updates.
The webalizer website is at webalizer.org. It hasn't been updated in a decade, but it's there and it works.