Comparison of Compression Algorithms
GNU/Linux and *BSD have a wide range of compression algorithms available for file archiving purposes. There's gzip, bzip2, xz, lzip, lzma and lzop, as well as less-free tools like rar, zip and arc, to choose from. Knowing which one to use can be confusing. Here's an attempt to give you an idea of how the various choices compare.
Most file archiving and compression on GNU/Linux and BSD is done with the tar utility. Its name is short for tape archiver, which is why every tar command you will ever use has to include the f flag to tell it that you will be working on files, not an ancient tape device. Creating a compressed file with tar is typically done by running tar c with a compression flag and f, followed by the files and/or directories to archive. The compression flag options are:
| short option | long option | algorithm |
| :--- | :--- | :--- |
| -z | --gzip | gzip |
| -j | --bzip2 | bzip2 |
| -J | --xz | xz |
| | --lzip | lzip |
| | --lzma | lzma |
| | --lzop | lzop |
| | --zstd | zstd (GNU tar 1.31+) |
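As a quick sketch of how those flags are used in practice (the directory name example/ and the test file are placeholders, and the non-gzip commands are guarded since tar shells out to the corresponding binary):

```shell
# Build a tiny directory to archive (hypothetical example data)
mkdir -p example
echo "hello" > example/file.txt

# Short-flag form: z = gzip (similarly j = bzip2, J = xz)
tar czf example.tar.gz example/

# Other algorithms, guarded on the binary being installed
command -v bzip2 >/dev/null && tar cjf example.tar.bz2 example/ || true
command -v xz    >/dev/null && tar cJf example.tar.xz  example/ || true
# Algorithms without a short flag use the long option
command -v lzip  >/dev/null && tar c --lzip -f example.tar.lz example/ || true
```

The resulting archives all contain the same file tree; only the compression wrapper differs.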
So which should you use? It depends on the level of compression you want and the speed you desire; you may have to pick just one of the two. Speed will also vary widely depending on which binary you use for the compression algorithm you pick. As you will see below, there is a huge difference between using the standard bzip2 binary most (all?) distributions use by default and the parallel pbzip2, which can take advantage of multiple cores.
Compressing The Kernel (5.1.11)
|Note: These tests were done using a Ryzen 1600X with 2xSamsung SSDs in RAID1. The differences between bzip2 and pbzip2 and xz and pixz will be much smaller on a dual-core. We could test on slower systems if anyone cares, but that seems unlikely.|
These results are what you can expect in terms of relative performance when using tar to compress the kernel with one of:
tar c --algo -f linux-5.1.11.tar.algo linux-5.1.11/ (or
tar cXf linux-5.1.11.tar.algo linux-5.1.11/ where X is the short compression flag, or
tar c -I"programname -options" -f linux-5.1.11.tar.algo linux-5.1.11/)
In the case of bzip2 and pbzip2 the binary was simply switched using a symbolic link.
Ruling out cache impact was done by running sync; echo 3 > /proc/sys/vm/drop_caches between runs.
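A minimal version of such a benchmark run can be sketched like this (the src/ directory and output file names are placeholders standing in for the kernel tree; dropping caches only has an effect when running as root):

```shell
# Create some compressible test data (placeholder for linux-5.1.11/)
mkdir -p src
dd if=/dev/zero of=src/data bs=1M count=4 2>/dev/null

time_one() {  # $1 = tar compression flag, $2 = output file
    sync
    # Cache dropping needs root; skipped silently otherwise
    [ "$(id -u)" -eq 0 ] && echo 3 > /proc/sys/vm/drop_caches 2>/dev/null || true
    start=$(date +%s)
    tar "c$1f" "$2" src/
    echo "$2: $(( $(date +%s) - start ))s, $(du -h "$2" | cut -f1)"
}

time_one z out.tar.gz
command -v xz >/dev/null && time_one J out.tar.xz || true
```

Timing with date +%s is coarse, but for multi-minute runs like the ones above it is good enough to show the relative differences.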
The exact number will vary depending on your CPU, number of cores and SSD/HDD speed but the relative performance differences will be somewhat similar.
| algorithm | time | size | binary | tar flags | notes |
| :--- | :--- | :--- | :--- | :--- | :--- |
| lzip | 5m7.392s | 108M | lzip | c --lzip -f | |
| lzip | 0m55.115s | 109M | plzip | c -Iplzip | Parallel lzip, default level -6 |
| lzip | 2m5.645s | 102M | plzip | c -I"plzip -9" | Parallel lzip at best compression -9 |
| xz | 1m3.713s | 108M | pixz | c -Ipixz -f | Parallel xz |
| xz | 1m42.497s | 103M | pixz | c -I"pixz -9" -f | Parallel xz using best compression |
A few details should be apparent from the numbers above.
- a) Standard xz compression is really slow compared to everything else. It also produces the best compression.
- b) pixz is five times faster than xz unless you only have a core or two, in which case it won't make much of a difference. pixz at its best compression level -9 provides the best combination of speed and compression.
- c) the difference between bzip2 and pbzip2 is huge. It may not appear that way since bzip2 is so much faster than xz, but pbzip2 is actually more than ten times faster than bzip2.
- d) pbzip2's default compression is apparently its best, -9. A close inspection of the output files reveals that they are identical (130,260,727 bytes) with and without -9.
pixz at -9 comes out as a clear winner both when considering compression alone and speed/compression combined. The one huge drawback it has is that pixz is not a drop-in replacement for xz. Simply making xz a symbolic link to pixz won't work; it has to be invoked with -I"pixz -9" to be used as a compressor.
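A sketch of that invocation, guarded since pixz ships as a separate package (the src/ directory and archive name are placeholders):

```shell
if command -v pixz >/dev/null; then
    mkdir -p src && echo data > src/f.txt
    # Compress with pixz at best compression via tar's -I option
    tar -I"pixz -9" -cf src.tar.xz src/
    # The output is ordinary .xz, so decompression auto-detects it:
    tar xf src.tar.xz -C /tmp
else
    echo "pixz not installed"
fi
```

Note that while compression requires the explicit -I invocation, the resulting archive is a standard .xz file that anyone with plain xz can unpack.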
plzip is a real contender to pixz. It is about the same as pixz at default settings but much slower at the highest compression ratio. xz and lzip both use (different implementations of) the Lempel-Ziv-Markov chain algorithm, which is why they perform somewhat similarly.
pbzip2 wins when speed is a consideration and a slight increase in the output size is acceptable.
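A sketch of substituting pbzip2, again guarded on it being installed (file names are placeholders; the symbolic-link trick mentioned earlier works too, since pbzip2 accepts the same flags as bzip2):

```shell
if command -v pbzip2 >/dev/null; then
    mkdir -p src && echo data > src/f.txt
    # Explicitly select the parallel binary with -I
    tar -Ipbzip2 -cf src.tar.bz2 src/
    # pbzip2 output is bzip2-compatible, so plain tar can unpack it
    tar xjf src.tar.bz2 -C /tmp
else
    echo "pbzip2 not installed"
fi
```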
|TIP: xz is just as fast as bzip2 and gzip when it comes to decompression, and its better compression - while more time-consuming - can make a real difference if you are going to distribute a file to hundreds or thousands of users. This is why the Linux kernel is distributed as .tar.xz archives. It is an alternative worth considering if the difference between 120 and 110 MiB matters. You may want to use pixz or plzip with the best compression flag.|