Webalizer

Original author(s): Bradford L. Barrett
Initial release: 1997
Stable release: 2.23-08 (August 26, 2013)
Written in: C
Operating system: Linux, cross-platform
Available in: Over 30 languages
Type: Web traffic analytics
License: GNU GPL
Website: www.webalizer.org

Webalizer is a log file analyzer, primarily for Apache logs, that creates a simple report with some small graphs and a high-level overview of page-views, unique visits, files, kilobytes transferred and the total number of hits. It's fast, and it supports a history file that allows you to simply throw old logs away once webalizer is done with them. It does have some shortcomings, like the inability to separate browser user-agent strings by device and operating system.

Webalizer has not been updated since 2013. It works fine, but don't expect it to get any new features beyond what the current version has.

At a glance

Pros:

  • Analyzes web traffic using logs. Page tags, JavaScript or other elements on web pages are not required.
  • Uses a history file where it stores relevant historical information. You don't need to keep old logs (unlike Analog).

Cons:

  • Setting it up is a bit hard and time-consuming.
  • Quite limited in terms of the reports you can make.
  • Requires manual configuration if you want user-agents grouped.
    • User-agents can only be grouped by one string, making it hard or impossible to group them by both OS and device.

Features

Webalizer can create a simple HTML page with some graphs and some data about your website's traffic.

Its main areas are:

  • Monthly Statistics (no graphs, just numbers)
  • Daily Statistics (with graph) with hits, files, pages, visits, sites and KBytes. It doesn't report figures such as the average number of pages per visit (you can divide pages by visits to get that), but it does give a nice overview.
  • Hourly Statistics (with a graph). The statistics it shows you are per-hour for the entire month.
  • Top XXX URLs (configurable how many)
  • Top XXX URLs by KBytes
  • Entry and Exit pages
  • Top XXX referrers
  • Top search strings (if you configured it so it knows about search-engines)
  • Top XXX user-agent strings (you have to manually configure it if you want them grouped)
  • Top XXX countries (this works fine if you configure it to use a GeoIP database)
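
The pages-per-visit figure that the daily table leaves out is a single division. A minimal Python sketch, using made-up totals rather than real Webalizer output (the function name is just an illustration):

```python
def pages_per_visit(pages: int, visits: int) -> float:
    """Average number of pages viewed per visit; 0.0 when there were no visits."""
    if visits == 0:
        return 0.0
    return pages / visits


# Hypothetical daily totals, copied by hand from a report.
print(pages_per_visit(1200, 400))  # 3.0
```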

Webalizer does give an overview of the most important things you need to know about a site if, and only if, you have done more configuration than should be required. Put simply, quite a few things require manual configuration. Bots, web spiders and other things that shouldn't be counted as regular visitors have to be manually excluded. User-agents are not grouped by default. They can be grouped, but you have to list every string you want to group by, and each group can only match on one string. That makes it hard to put those using Chrome on Windows in one group and those using Chrome on Android in another. Because you only get one string per group, you could put everyone using Windows in one group and everyone using Android in another, but then you can't tell which browsers the Windows users use. It would be nice if Webalizer could group user-agents using two strings, but it can't.
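
The one-string limitation is easier to see with a small model. The Python sketch below is not Webalizer's code. It assumes plain substring matching where the first matching rule wins, and the user-agent strings are only typical examples:

```python
from typing import Optional


def group_for(user_agent: str, rules: list[tuple[str, str]]) -> Optional[str]:
    """Return the label of the first rule whose single string occurs in the user-agent."""
    for pattern, label in rules:
        if pattern in user_agent:
            return label
    return None


# Example user-agent strings (illustrative, not taken from a real log).
CHROME_WINDOWS = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36")
CHROME_ANDROID = ("Mozilla/5.0 (Linux; Android 10; Pixel 3) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/86.0.4240.75 Mobile Safari/537.36")
FIREFOX_WINDOWS = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:81.0) "
                   "Gecko/20100101 Firefox/81.0")

by_browser = [("Chrome/", "Chrome"), ("Firefox/", "Firefox")]
by_os = [("Windows NT", "Windows"), ("Android", "Android")]

# Grouping by browser cannot tell Windows from Android ...
print(group_for(CHROME_WINDOWS, by_browser), group_for(CHROME_ANDROID, by_browser))
# ... and grouping by OS cannot tell Chrome from Firefox.
print(group_for(CHROME_WINDOWS, by_os), group_for(FIREFOX_WINDOWS, by_os))
```

With one string per rule you have to pick one axis, browser or operating system. Getting both would need a rule that matches on two strings at once.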

[Image: An example of statistics created by Webalizer.]

Webalizer is better than nothing, and it does work. It's also fairly fast thanks to its support for history files. You can parse a lot of logs with Webalizer, reset the log and keep the historical statistics. This is also nice from a privacy perspective: you can run Webalizer every 6 hours and delete the old log while keeping the statistics from prior periods intact.

One big disadvantage of Webalizer is that it doesn't "just work" until you have spent perhaps too much time configuring it. Once that's done, it will just work for ages.

Webalizer lets you ignore user-agents, URLs/"pages" and file-types so they don't show up or count as regular page-views (for example, you don't want load.php to be counted as a "page" if you use MediaWiki like this site does).

Webalizer is a nice web log analyzer, but it does require quite a lot of configuration to make the statistics it presents meaningful, and how you should configure it depends on what content management system(s) you use.

Installation And Configuration

All the distributions seem to have webalizer included as a package with that name. Once it is installed, the bigger problem will be configuring it.

You should acquire the geodb-latest.tgz file from ftp://ftp.mrunix.net/pub/webalizer/geodb/geodb-latest.tgz before you configure it. Web browsers like Chromium are removing support for the ftp:// protocol, so use wget in a terminal if your web browser can't handle ftp://.

webalizer lets you run it with -c configurationfile. You can easily run it with five different configuration files for five different sites.
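
If you do run several sites, a small wrapper can loop over the configuration files. This is a minimal Python sketch. The configuration paths and function names are made up for illustration, and the only webalizer option it relies on is -c:

```python
import subprocess

# Hypothetical per-site configuration files; adjust the paths to your setup.
CONFIG_FILES = [
    "/etc/webalizer/site1.conf",
    "/etc/webalizer/site2.conf",
]


def build_command(config_file: str) -> list[str]:
    """Return the argument list for one webalizer run using the -c option."""
    return ["webalizer", "-c", config_file]


def run_all(config_files: list[str]) -> None:
    """Run webalizer once per configuration file, stopping on the first failure."""
    for config_file in config_files:
        subprocess.run(build_command(config_file), check=True)
```

Calling run_all(CONFIG_FILES) from a scheduled job would process each site in turn. Nothing runs until that function is called.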

The bare minimum a webalizer configuration file should have is

HostName yoursite.tld
LogFile   access_log
OutputDir /home/httpd/vhosts/yoursite/httpdocs/webstat/

HistoryName     /home/httpd/vhosts/yoursite/statistics/logs/webalizer.hist
IncrementalName /home/httpd/vhosts/yoursite/statistics/logs/webalizer.current
Incremental     yes

UseHTTPS yes

DNSChildren 0

TopURLs         50
TopReferrers    50
TopAgents       40
TopSites        0
TopKSites       0
AllSearchStr yes

Quiet           yes
FoldSeqErr      yes

webalizer needs to be able to write to the OutputDir and the HistoryName and IncrementalName files.

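TopSites and TopKSites control the tables of top "sites", which in Webalizer's terminology are the client hosts (IP addresses or hostnames) making requests, not websites. Setting both to 0, as above, leaves those tables out of the report.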

You will want to extract the geodb-latest.tgz file mentioned earlier, place its contents somewhere webalizer can read them and tell webalizer to use the database in the configuration file:

GeoDB yes
GeoDBDatabase /home/httpd/webalizer/GeoDB.dat

You will likely also want quite a few IgnoreURL directives like

IgnoreURL /w/

And some PageType options listing file-types that count as pages:

PageType        htm*
PageType        cgi
PageType        php
PageType        shtml

Search engines are only recognized as referrers if they are listed. So you need a long list of those:

SearchEngine    348north.com    search=
SearchEngine    abcsearch.com   terms=
SearchEngine    alltheweb.com   q=

Any website that's been around for a few months gets lots and lots of web crawlers visiting. Most of them are worthless, but that's another story. You will want a long list of IgnoreAgent directives so those are excluded from the statistics:

IgnoreAgent 360Spider
IgnoreAgent FemtosearchBot
IgnoreAgent www.semrush.com/bot
IgnoreAgent www.bing.com/bingbot
IgnoreAgent www.sogou.com
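
One way to find candidates for IgnoreAgent is to count the user-agents in your own log and look at the busiest ones. The Python sketch below is only an illustration. It assumes the combined log format, where the user-agent is the last double-quoted field, and the sample lines and bot name are made up:

```python
import re
from collections import Counter

# In the combined log format the user-agent is the last double-quoted field.
# User-agents that themselves contain escaped quotes are not handled here.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines) -> Counter:
    """Count how often each user-agent string occurs in the given log lines."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines; in practice, pass an open access_log file instead.
sample = [
    '203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [10/Oct/2020:13:55:37 +0000] "GET /a HTTP/1.1" 200 128 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2020:13:56:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for agent, hits in count_user_agents(sample).most_common(10):
    print(hits, agent)
```

Whether a frequent user-agent is a crawler is still a judgment call. The script only shows you where to look.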

Lastly, if you want to group web browser user-agents together so multiple versions of, say, Mozilla Firefox are just listed as Firefox, you'll need a lot of options with both GroupAgent and HideAgent. If you add a GroupAgent without a following HideAgent, you'll get one entry with the user-agents grouped and another with them individually.

Those entries can look like

GroupAgent      "Mozilla/5.0 (X11; CrOS x86_64" ChromiumOS
HideAgent       Mozilla/5.0 (X11; CrOS x86_64

GroupAgent      "Win64; x64; rv:" Firefox on Windows
HideAgent       Win64; x64; rv:
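
Since every GroupAgent needs a matching HideAgent, it can be less error-prone to generate the pairs from one list. This is a minimal Python sketch that simply mirrors the layout of the entries above. The function name and the rule list are made up for illustration:

```python
def agent_group_lines(pattern: str, label: str) -> list[str]:
    """Return a GroupAgent line and its matching HideAgent line for one pattern."""
    return [
        f'GroupAgent      "{pattern}" {label}',
        f"HideAgent       {pattern}",
    ]


# Hypothetical rule list: (string to match, label shown in the report).
RULES = [
    ("Mozilla/5.0 (X11; CrOS x86_64", "ChromiumOS"),
    ("Win64; x64; rv:", "Firefox on Windows"),
]

for pattern, label in RULES:
    print("\n".join(agent_group_lines(pattern, label)))
    print()
```

The output can be pasted into the configuration file, so a group can never be added without its HideAgent line.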

See Webalizer/Configuration file example for an example of a configuration file with very long lists of SearchEngine and IgnoreAgent and other configuration directives a webalizer configuration file needs in order to get meaningful statistics from it.

Verdict And Conclusion

Webalizer can be used to create some basic statistics about a website's traffic using nothing but a web server's log files. It can't produce as much information about a site's visitors as tools using JavaScript can, so the information it can present is somewhat limited. That may or may not be a good thing depending on how you look at it. That it produces what it does using nothing but logs can be an advantage: you don't have to violate people's privacy or run JavaScript in their web browsers.

A big and clear downside to using Webalizer is that it needs to be properly configured in order to produce anything useful. If, for example, you run a small website and don't give it an ever-growing list of user-agents to ignore with IgnoreAgent directives, you'll get statistics showing double or triple your actual traffic.

If you're willing to spend some time configuring Webalizer then it's kind-of fine as an alternative to things like Google Analytics. You'll get less information, but you will get the most important details like how many people visited your site, how many pages they viewed, what pages are most visited, when people visit and a few other statistics.

Alternatives

  • Analog is a similar log analyzer. It's far worse than Webalizer, and it doesn't support partial updates.
  • Open Web Analytics is not a log analyzer. It is a more fully featured web analytics server written in PHP. It can provide more detailed information than logs can, but it's also far more privacy-invasive to your site's visitors as it uses cookies and, depending on how you configure and deploy it, client-side JavaScript.

Links

The webalizer website is at webalizer.org. It hasn't been updated in a decade, but it's there and it works.

