Lempel-Ziv-Markov chain algorithm

From LinuxReviews

The Lempel-Ziv-Markov chain algorithm (LZMA) is a compression scheme which lets you compress files immensely by burning hours of raw CPU power and allocating half your system's memory, in exchange for a notable compression gain over both bzip2 and gzip. LZMA is not tied to one concrete tool or file format; several different programs use the encoding scheme. The most notable and standardized ones are xz and lzip.

Technology

Paul Sladen had this to say about our ace algorithm in 2007:

"LZMA is effectively deflate (zlib, gzip, zip) with a larger dictionary size, 32MB instead of 32kB. LZMA stands for Lempel-Ziv-Markov chain-Algorithm, after string back-references have been located, values are reduced using a Markov chain range-encoder (aka arithmetic coding) instead of Huffman coding."

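The effect of the larger dictionary can be seen with a small experiment. The sketch below uses Python's standard zlib and lzma modules as stand-ins for gzip and xz (an assumption made for illustration; the commands in this article use the command-line tools). It builds 64 KiB of random bytes and repeats them once, so the only redundancy sits 64 KiB back: outside deflate's 32 kB window, but well inside LZMA's dictionary.

```python
import lzma
import os
import zlib

# 64 KiB of incompressible random bytes, repeated once. The second copy is an
# exact repeat of the first, but it starts 64 KiB later in the input.
block = os.urandom(64 * 1024)
data = block + block

# deflate (zlib/gzip) can only look back 32 KiB, so it cannot see the repeat.
deflate_size = len(zlib.compress(data, 9))

# LZMA at Python's default preset uses a dictionary of several MiB, so the
# second copy collapses into back-references to the first.
lzma_size = len(lzma.compress(data))

print(f"input:   {len(data)} bytes")
print(f"deflate: {deflate_size} bytes")
print(f"lzma:    {lzma_size} bytes")
```

With this input deflate should end up at roughly the size of the input, while LZMA should land close to half of it. Real files rarely behave this neatly; the point is only that matches further back than 32 kB are invisible to deflate.
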
The most common LZMA implementations

By far the most commonly used LZMA implementation is xz, and it is the compression format used to distribute the Linux kernel. There is also a competing implementation called lzip, which is favored by GNU Guix.

Both the standard xz and lzip binaries you get from your distribution's standard repositories are single-threaded and really slow. Both have a parallel implementation: pixz for xz and plzip for lzip. pixz is in most distributions' repositories; plzip is not. Compiling plzip is fairly easy.

The difference between xz and lzip and their parallel counterparts is huge. As you can see in our Comparison of Compression Algorithms, it's 5 minutes with xz at default compression versus less than two with pixz at its best compression level. The results for lzip are similar.
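
Why splitting the work helps can be sketched in a few lines. The example below is not a description of how pixz or plzip work internally; it is a minimal illustration, assuming Python's standard lzma module, of the general idea of compressing independent chunks side by side. The function name compress_chunks, the chunk size and the worker count are all made up for this sketch.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress fixed-size chunks independently and concatenate the .xz streams."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the streams line up with the chunks.
        streams = list(pool.map(lzma.compress, chunks))
    return b"".join(streams)


sample = b"The quick brown fox jumps over the lazy dog.\n" * 100_000
packed = compress_chunks(sample)

# lzma.decompress() accepts a concatenation of several streams.
assert lzma.decompress(packed) == sample
print(f"{len(sample)} bytes -> {len(packed)} bytes")
```

The trade-off is that each chunk is compressed without knowledge of the others, so matches across chunk boundaries are lost and the result is usually a little larger than a single-stream archive. Whether threads give a real speedup here depends on the interpreter and the machine.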

xz basics

xz can be invoked by tar with the J flag. To create an archive you would use:

tar cfvJ archive.tar.xz folder1/ folder2/

Decompression is done the same way: use tar xf as usual and add the J flag:

tar xfvJ mystuff.tar.xz

You can find more information about xz in xz's fine manual. See the tar manual if you are not familiar with the standard way of compressing and decompressing files.
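
If you would rather script this than call tar, the same kind of archive can be produced from Python. The sketch below assumes Python's standard tarfile module, which can read and write .tar.xz archives through its "w:xz" and "r:xz" modes; the folder and file names are placeholders created in a temporary directory.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Placeholder input folder with one small file in it.
    folder = os.path.join(workdir, "folder1")
    os.mkdir(folder)
    with open(os.path.join(folder, "hello.txt"), "w") as fh:
        fh.write("hello xz\n" * 1000)

    archive = os.path.join(workdir, "archive.tar.xz")

    # Roughly what "tar cfJ archive.tar.xz folder1/" does.
    with tarfile.open(archive, "w:xz") as tar:
        tar.add(folder, arcname="folder1")

    # Roughly what "tar tfJ archive.tar.xz" does, plus reading one member back.
    with tarfile.open(archive, "r:xz") as tar:
        names = tar.getnames()
        content = tar.extractfile("folder1/hello.txt").read()

print(names)
```

The compression here runs inside the Python process through liblzma, so the xz binary itself does not need to be installed for this to work.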

pixz basics

pixz can be used to create archives by running tar with c (create), then -I followed by the name of the compression program, as in -Ipixz, and then -f to specify the archive file. Add the files or folders to be archived as the last arguments. Example:

tar c -Ipixz -f archive.tar.xz folder1/ folder2/

pixz can be used for decompression too,[1] but the difference between using it and standard xz with xfvJ is so small that it's not worth the hassle.

lzip basics

tar has support for lzip and can use it if it is present. tar has NOT given lzip its own one-letter flag like j for bzip2 or J for xz. Instead it is relegated to a --lzip flag. Thus, compressing with it is done like this:

tar c --lzip -f archive.tar.lz folder1/ folder2/

Do note that compressing with lzip is really slow. You want to install plzip if you want to use lzip. Most distributions do not package it, so you will probably have to compile it yourself.

plzip basics

plzip can be invoked by passing -Iplzip to tar. Making an archive is done like this:

tar c -Iplzip -f archive.tar.lz folder1/ folder2/

How do xz and lzip compare to bzip2

You will find that creating LZMA archives with either xz or lzip is slower than creating them with bzip2. This applies both to xz and lzip versus bzip2 and to pixz and plzip versus pbzip2. See Comparison of Compression Algorithms for tests and more detailed information.
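
For a quick feel of the trade-off on your own data, a rough comparison can be scripted. The sketch below assumes Python's standard zlib, bz2 and lzma modules as stand-ins for gzip, bzip2 and xz, and uses a synthetic sample; it is not a substitute for the tests in Comparison of Compression Algorithms, and the numbers will vary with the input and the machine.

```python
import bz2
import lzma
import time
import zlib

# Synthetic, fairly repetitive sample; real results depend heavily on the input.
sample = b"".join(
    b"line %d: the quick brown fox jumps over the lazy dog\n" % i
    for i in range(50_000)
)

# Each entry maps a label to a (compress, decompress) pair.
codecs = {
    "gzip (zlib)": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bzip2": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "xz (lzma)": (lambda d: lzma.compress(d), lzma.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    start = time.perf_counter()
    packed = compress(sample)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == sample  # sanity check: lossless round trip
    results[name] = (len(packed), elapsed)
    print(f"{name:12s} {len(packed):>10d} bytes  {elapsed:6.2f} s")
```

Swap the sample for the contents of a real file to see how the three behave on the kind of data you actually archive.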

notes