Comparison of Compression Algorithms
GNU/Linux and *BSD have a wide range of compression algorithms available for file archiving purposes. There's gzip, bzip2, xz, lzip, lzma and lzop, as well as less free tools like rar, zip and arc, to choose from. Knowing which one to use can be confusing. Here's an attempt to give you an idea of how the various choices compare.
Introduction
Most file archiving and compression on GNU/Linux and BSD is done with the tar utility. Its name is short for tape archiver, which is why every tar command you will ever use has to include the f flag to tell it that you are working on files rather than an ancient tape device. (Modern tape devices do exist for server backup purposes, but you still need the f flag for them because they are now regular block devices in /dev.) Creating a compressed file with tar is typically done by running tar cf (c for create) with a compression flag, followed by the files and/or directories to archive. The standard compression flags are:
Short Option | Long Option | Algorithm |
---|---|---|
z | --gzip | gzip |
j | --bzip2 | bzip2 |
J | --xz | xz |
Z | --compress | LZW (compress) |
| --lzip | lzip |
| --lzma | lzma |
| --zstd | zstd |
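As a quick illustration, the following creates and then lists a gzip-compressed archive; the project/ directory is a made-up example:

```shell
# Create a small example directory (hypothetical name)
mkdir -p project
echo "hello" > project/notes.txt
# c = create, z = gzip, f = write to the named archive file
tar czf project.tar.gz project/
# t = list the archive's contents
tar tzf project.tar.gz
```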
These are not your only options; there's more. tar also accepts -I to invoke any third-party compression utility:
Short Option | Algorithm |
---|---|
-Iplzip | Parallel lzip |
-Ipigz | Parallel gzip |
-Ipxz | Parallel XZ (LZMA) |
The above arguments will only work if you actually have plzip, pigz and pxz installed. Also note that the operation flag (c or x) must come before -I, and -f must come after it. Example: tar c -I"pigz -9" -f archive.tar.gz folder/
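A sketch of the -I form follows. Since pigz is not part of a default install, the sketch falls back to plain gzip, which understands the same -9 flag; the demo/ directory is a made-up example:

```shell
# Build some compressible test data
mkdir -p demo
seq 1 5000 > demo/numbers.txt
# Pick pigz if available, otherwise fall back to gzip
if command -v pigz >/dev/null 2>&1; then comp=pigz; else comp=gzip; fi
# The operation flag (c) comes before -I, and -f after it
tar c -I"$comp -9" -f demo.tar.gz demo/
tar tf demo.tar.gz
```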
So which should you use? It depends on the level of compression you want and the speed you desire; you may have to pick one or the other. Speed will also depend greatly on which binary you use for the algorithm you pick. As you will see below, there is a huge difference between the standard bzip2 binary most (all?) distributions use by default and the parallel pbzip2, which can take advantage of multi-core computing.
Compressing The Linux Kernel
Note: These tests were done using a Ryzen 2600 with Samsung SSDs in RAID1. The differences between bzip2 and pbzip2, and between xz and pxz, will be much smaller on a dual-core machine. We could test on slower systems if anyone cares, but that seems unlikely given that only 3 people a month read this article and those three people use Windows, macOS and an Android phone respectively.
The following results are what you can expect, in terms of relative performance, when using tar to compress the Linux kernel with tar c --algo -f linux-5.8.1.tar.algo linux-5.8.1/ (or tar cfX linux-5.8.1.tar.algo linux-5.8.1/, or tar c -I"programname -options" -f linux-5.8.1.tar.algo linux-5.8.1/).
Ruling out cache impact was done by running sync; echo 3 > /proc/sys/vm/drop_caches
between runs.
The exact numbers will vary depending on your CPU, number of cores and SSD/HDD speed, but the relative performance differences will be roughly similar. Also keep in mind that when compressing large files from fast sources to slow destinations (e.g. 8 GB from an NVMe drive to an HDD or USB 2.0 drive), I/O bottlenecks actually hinder the "faster" algorithms: slower parallelized algorithms with better compression ratios are more effective in such scenarios, since they spend less time waiting for the destination device to write out the buffered compressed data once the caches fill up.
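The runs can be sketched roughly as the loop below. This is a scaled-down, hedged version: it generates a small test directory instead of the kernel tree, skips algorithms whose binaries are not installed, and only drops the caches when it has the root privileges needed to do so:

```shell
# Generate a small, compressible stand-in for the kernel source tree
mkdir -p src
for i in $(seq 1 20); do seq 1 500 > "src/file$i.txt"; done

for algo in gzip bzip2 xz zstd; do
    # Skip compressors that are not installed
    command -v $algo >/dev/null 2>&1 || continue
    # Drop caches between runs (only possible as root)
    if [ -w /proc/sys/vm/drop_caches ]; then
        sync; echo 3 > /proc/sys/vm/drop_caches
    fi
    time tar c --$algo -f "src.tar.$algo" src/
done
ls -l src.tar.*
```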
Algorithm | Time | Size | Command | Parameters | Comment |
---|---|---|---|---|---|
none | 0m0.934s | 939M | tar | cf | tar itself is an archiving tool; you do not need to compress the archives. |
gzip | 0m23.502s | 177M | gzip | cfz | |
gzip | 0m3.132s | 177M | pigz | c -Ipigz -f | Parallel gzip using pigz 2.4. |
bzip2 | 1m0.798s | 134M | bzip2 | cfj | Standard bzip2 will only use one core (at 100%). |
bzip2 | 0m9.091s | 135M | pbzip2 | c -Ipbzip2 -f | Parallel bzip2. The pbzip2 process used about 900 MiB of RAM at maximum. |
lz4 | 0m3.914s | 287M | lz4 | c -I"lz4" -f | Really fast, but the resulting archive is barely compressed. The worst compression of the bunch. |
lz4 | 0m56.506s | 207M | lz4 -12 | c -I"lz4 -12" -f | Supports levels -1 through -12. Uses one core; there does not appear to be any multi-threaded variant. |
lzip | 4m42.017s | 116M | lzip | c --lzip -f | v1.21. Standard lzip will only use one core (at 100%). Very slow. |
lzip | 0m42.542s | 118M | plzip | c -Iplzip -f | plzip 1.8 (parallel lzip) at its default level (-6). |
lzip | 1m39.697s | 110M | plzip -9 | c -I"plzip -9" -f | Parallel lzip at best compression (-9). The plzip process used 5.1 GiB of RAM at maximum. |
xz | 5m2.952s | 114M | xz | cfJ | Standard xz will only use one core (at 100%). Unbearably slow. |
xz | 0m53.569s | 115M | pxz | c -Ipxz -f | Parallel PXZ 4.999.9beta. The process used 1.4 GiB of RAM at maximum. |
xz | 1m33.441s | 110M | pxz -9 | c -I"pxz -9" -f | Parallel PXZ 4.999.9beta at its best possible compression. The pxz process used 3.5 GiB at maximum. |
zstd | 0m3.034s | 167M | zstd | c --zstd -f | zstd uses one core by default. |
zstd | 1m18.238s | 117M | zstd -19 -T0 | c -I"zstd -19 -T0" -f | -19 gives the best possible compression and -T0 utilizes all cores; a non-zero number uses that many cores. |
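To reproduce the zstd rows, something like the following can be used. zstd is not installed everywhere, so the sketch simply stops if it is missing; the tree/ directory is a made-up stand-in for the kernel source:

```shell
# Bail out quietly if zstd is not installed
command -v zstd >/dev/null 2>&1 || exit 0
# Build some compressible test data
mkdir -p tree
seq 1 100000 > tree/nums.txt
# Default compression (level 3, one core)
tar c --zstd -f tree-default.tar.zst tree/
# Best compression (-19) on all cores (-T0)
tar c -I"zstd -19 -T0" -f tree-max.tar.zst tree/
ls -l tree-default.tar.zst tree-max.tar.zst
```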
Notable Takeaways
A few points should be apparent from the numbers above:
- All the standard binaries GNU/Linux distributions give you by default for the commonly used compression algorithms are extremely slow compared to the parallel implementations that are available, but not defaults.
- This is true for bzip2: there is a huge difference between ten seconds and one minute. And it is especially true for lzip and xz: the difference between one minute and five is significant.
- The difference between the pigz parallel implementation of gzip and regular gzip may appear small since both are very fast, but the difference between 3 and 23 seconds is huge in percentage terms.
- lzip and xz offer the best compression. They are also the slowest alternatives. This is especially true if you do not use the parallel implementations.
- Both plzip (5.1 GiB) and pxz (3.5 GiB at -9) use a lot of memory. Expect much worse performance on memory-constrained machines.
- The difference between bzip2 and pbzip2 is huge. It may not appear that way since bzip2 is so much faster than xz and lzip, but pbzip2 is actually about ten times faster than regular bzip2.
- pbzip2's default compression level is apparently its best (-9). A close inspection of the output files reveals that they are identical (130260727 bytes) with and without -9.
- zstd appears to be the clear winner, with leading compression speed, decompression speed and an acceptable compression ratio.
Decompressing The Linux Kernel
Compression ratio is not the only concern you may want to consider; a well-compressed archive that takes forever to decompress will make end-users unhappy. Thus, it may be worthwhile to look at the respective decompression speeds.
Keep in mind that most people will not use a parallel implementation to decompress their archives; it is much more likely that they will use whatever defaults their distribution provides. And those would be the single-threaded implementations.
We tested decompression using a cheat: tar xf<options> linux-5.8.1.tar.<algo> -C /tmp/ with /tmp being a tmpfs (a RAM-backed file system). The numbers will therefore absolutely not reflect real-world performance. The reason we tested this way is to illustrate the differences in pure decompression time without being bothered by disk I/O limitations.
Algorithm | Time | Command | Parameters | Comments |
---|---|---|---|---|
none | 0m1.204s | tar | xf | Raw tar with no compression. |
gzip | 0m4.232s | gzip | xfz | |
gzip | 0m2.729s | pigz | x -Ipigz -f | gzip is a clear winner if decompression speed is the only consideration. |
bzip2 | 0m20.181s | bzip2 | xfj | |
bzip2 | 0m19.533s | pbzip2 | x -Ipbzip2 -f | The difference between bzip2 and pbzip2 when decompressing is barely measurable. |
lzip | 0m10.590s | lzip | x --lzip -f | |
lz4 | 0m1.873s | lz4 | x -Ilz4 -f | Fastest of them all, but not very impressive considering the compression it offers is almost nonexistent. |
lzip | 0m8.982s | plzip | x -Iplzip -f | |
xz | 0m7.419s | xz | xfJ | xz offers the best decompression speed of all the well-compressed algorithms. |
xz | 0m7.462s | pxz | x -Ipxz -f | |
zstd | 0m3.095s | zstd | x --zstd -f | When compressed with no options (the default compression level is 3). |
zstd | 0m2.556s | zstd | x --zstd -f | When compressed with tar c -I"zstd -19 -T0" (compression level 19). |
TIP: tar is typically able to figure out what kind of archive you are trying to extract. tar xf linux-5.8.1.tar.xz and tar xfJ linux-5.8.1.tar.xz will both work. You need to specify the compression algorithm when you create an archive, but you can omit the algorithm-specific flag and let tar figure it out when you extract.
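A minimal demonstration of that auto-detection (d/ is a made-up example directory):

```shell
# Create a gzip-compressed archive, then remove the original
mkdir -p d
echo "hi" > d/a.txt
tar czf d.tar.gz d/
rm -rf d
# No z flag needed on extraction; tar detects the gzip compression
tar xf d.tar.gz
cat d/a.txt   # prints "hi"
```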
- xz is the fastest-decompressing well-compressed algorithm. gzip decompresses much faster still, but the compression ratio gzip offers is far worse. bzip2 compresses faster than single-threaded xz, but xz decompresses a lot faster than bzip2.
- zstd also looks very good when the best compression level (19) and multiple cores are used. Decompression is very fast, and it is faster, not slower, when higher compression is used.
zram block drive compression
The Linux kernel allows you to create a compressed block device in RAM using the zram module. It is typically used to create compressed RAM-backed swap space, but it does not have to be used for that purpose; you can use it like you would use any block device such as an HDD or an NVMe drive. The Linux kernel supports several compression algorithms for zram devices:
$ cat /sys/block/zram0/comp_algorithm
lzo lzo-rle lz4 lz4hc 842 [zstd]
Benchmarking how these in-kernel compression algorithms perform on block devices in a repeatable way is a bit tricky. Here's what happens if you extract Linux 5.9-rc4 to an uncompressed tmpfs:
tar xf linux-5.9-rc4.tar.gz -C /tmp/
and then create and mount a compressed zram file system using the various compression algorithms:
# Make sure you use zramX, not zram0, if you already
# have a zram device for swap.
#
# Create a new zram device (prints the new device number)
cat /sys/class/zram-control/hot_add
# Select the compression algorithm
echo lzo > /sys/block/zram0/comp_algorithm
# Make it big enough for the 1.1G kernel source tree
echo 2G > /sys/block/zram0/disksize
# Create a file system
mkfs.ext4 /dev/zram0
# Mount it at /mnt/tmp
mount -v /dev/zram0 /mnt/tmp
We repeated the above steps for each of the available compression algorithms (lzo lzo-rle lz4 lz4hc 842 zstd
) and ran the same "benchmark":
time cp -r /tmp/linux-5.9-rc4/ /mnt/tmp/
sync;zramctl
We then used zramctl
to see the compressed and total memory use of the zram device.
In case you want to try yourself, do this between each run:
umount /mnt/tmp
echo 0 > /sys/class/zram-control/hot_remove
These are the results:
Algorithm | cp time | Data | Compressed | Total |
---|---|---|---|---|
lzo | 4.571s | 1.1G | 387.8M | 409.8M |
lzo-rle | 4.471s | 1.1G | 388M | 410M |
lz4 | 4.467s | 1.1G | 403.4M | 426.4M |
lz4hc | 14.584s | 1.1G | 362.8M | 383.2M |
842 | 22.574s | 1.1G | 538.6M | 570.5M |
zstd | 7.897s | 1.1G | 285.3M | 298.8M |
Time, in this case, is mostly irrelevant. There is a practical difference, and the numbers in the above table do vary. However, be aware that kernel write caching was not disabled. The numbers provide an indication, and they are what time
returned; they just don't accurately reflect the total time before all data was actually "written" to the zram device.
It would seem that the zstd compression algorithm is vastly superior when it comes to compressing the Linux kernel in memory. It is also notably slower than lzo-rle, though the times listed above are not very accurate and should merely be taken as an indication.
We are not entirely clear on what compression level the kernel uses for zstd by default. For comparison,
tar c -I"zstd -19 -T0" -f linux-5.9-rc4.tar.zstd linux-5.9-rc4
produces a 117 MiB linux-5.9-rc4.tar.zstd
file, while
tar c -I"zstd -3 -T0" -f linux-5.9-rc4.tar.zstd linux-5.9-rc4
produces a 166 MiB file in 1.389 seconds. Going down to level 1 (-1
) increases the file size to 186M while the time is reduced to 0m1.064s. That's still one hundred megabytes less than what the Linux kernel version 5.9 rc4 uses to store itself on a zram block device. It is safe to say that the compression you can expect from the kernel-provided implementations of the various compression algorithms differs from what you get when you create archives using tar.