Lempel-Ziv-Markov chain algorithm
The Lempel-Ziv-Markov chain algorithm (LZMA) is an encoding scheme which allows you to compress files immensely by using hours of raw CPU power while allocating half your system's memory, all to gain a notable compression increase over both bzip2 and gzip. LZMA is not actually one concrete algorithm; there are several different algorithms using the encoding scheme. The most notable and standardized ones are xz and lzip.
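Part of the gain over gzip comes from how far back the compressor can look for repeated data. Deflate, which gzip uses, is limited to a 32 KiB window, while xz at its default preset works with an 8 MiB dictionary. The sketch below is only an illustration under those assumptions: it builds a made-up sample file in which the same 1 MiB of random bytes appears twice, so the repeat is visible to xz but out of reach for gzip. The file names and the temporary directory are invented for the example.

```shell
# Build a 2 MiB sample: one 1 MiB block of random bytes, written twice.
# The second copy sits 1 MiB behind the first, far outside a 32 KiB window.
workdir=$(mktemp -d)
head -c 1048576 /dev/urandom > "$workdir/block"
cat "$workdir/block" "$workdir/block" > "$workdir/sample"

# Compress the same input with gzip (deflate) and with xz (LZMA2).
gzip -c "$workdir/sample" > "$workdir/sample.gz"
xz -c "$workdir/sample" > "$workdir/sample.xz"

# Record both sizes before cleaning up.
gz_size=$(wc -c < "$workdir/sample.gz" | tr -d ' ')
xz_size=$(wc -c < "$workdir/sample.xz" | tr -d ' ')
echo "gzip: $gz_size bytes, xz: $xz_size bytes"

rm -rf "$workdir"
```

Random bytes do not compress on their own, so gzip should end up close to the full 2 MiB while xz should land near 1 MiB, because it can encode the second copy as a back-reference to the first.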
Paul Sladen had this to say about our ace algorithm in 2007:
"LZMA is effectively deflate (zlib, gzip, zip) with a larger dictionary size, 32MB instead of 32kB. LZMA stands for Lempel-Ziv-Markov chain-Algorithm, after string back-references have been located, values are reduced using a Markov chain range-encoder (aka arithmetic coding) instead of Huffman coding."
The most common LZMA implementations
By far the most commonly used LZMA implementation is xz, which is the compression used to distribute the Linux kernel. There is also a competing implementation called lzip, which is favored by GNU Guix.
Both the standard xz and lzip binaries you get from your distribution's standard repositories are single-threaded and really slow. Both have a parallel implementation: pixz for xz and plzip for lzip. pixz is in most distributions' repositories, plzip is not. Compiling plzip is fairly easy.
The difference between xz and lzip and their parallel counterparts is huge. As you can see in our Comparison of Compression Algorithms, it's 5 minutes with xz at default compression vs less than two minutes with pixz at its best compression level. The results for lzip are similar.
xz can be invoked by tar with the J flag. To create an archive you would use:
tar cfvJ archive.tar.xz folder1/ folder2/
Decompression is done the same way: use tar xf as usual and add the J flag:
tar xfvJ mystuff.tar.xz
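If you want to convince yourself that what goes in comes back out, a quick round trip looks roughly like this. It is a minimal sketch: the folder, file and archive names are made up, and everything happens in a temporary directory so nothing of yours gets overwritten.

```shell
# Create a throwaway folder with one file in it (names invented for this example).
workdir=$(mktemp -d)
mkdir "$workdir/folder1"
echo "hello lzma" > "$workdir/folder1/note.txt"

# c = create, J = filter through xz, f = archive file name.
# -C makes tar change into the given directory first.
tar -C "$workdir" -cJf "$workdir/archive.tar.xz" folder1/

# t = list the archive contents without extracting anything.
listing=$(tar -tJf "$workdir/archive.tar.xz")
echo "$listing"

# x = extract, here into a separate directory.
mkdir "$workdir/out"
tar -C "$workdir/out" -xJf "$workdir/archive.tar.xz"
restored=$(cat "$workdir/out/folder1/note.txt")
echo "$restored"

rm -rf "$workdir"
```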
pixz can be used to create archives by running tar with -Iarchiveprogram, as in -Ipixz, and then -f to specify a file. Add the files or folders to be archived as the last arguments. Example:
tar c -Ipixz -f archive.tar.xz folder1/ folder2/
pixz can be used for decompression too, but the difference between using it and standard xz with xfvJ is so small it's not worth the hassle.
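In a script you may not know whether pixz is installed. One way to handle that, sketched here with made-up folder and archive names and assuming GNU tar's -I option, is to pick pixz when it is on your PATH and quietly fall back to plain xz otherwise:

```shell
# Pick pixz when it is on PATH, otherwise fall back to single-threaded xz.
if command -v pixz >/dev/null 2>&1; then
    compressor=pixz
else
    compressor=xz
fi
echo "compressing with: $compressor"

# Throwaway input so the example is self-contained.
workdir=$(mktemp -d)
mkdir "$workdir/folder1"
echo "sample data" > "$workdir/folder1/data.txt"

# -I hands compression over to the chosen program.
tar -C "$workdir" -c -I"$compressor" -f "$workdir/archive.tar.xz" folder1/

# tar's built-in xz support (J) is used to read the result back.
listing=$(tar -tJf "$workdir/archive.tar.xz")
echo "$listing"

rm -rf "$workdir"
```

Reading the archive back with J rather than with pixz matches the advice above: decompression gains little from the parallel tool.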
tar has support for lzip and can use it if it is present.
tar has NOT given lzip its own one-letter flag like the J used for xz. Instead it's delegated to a --lzip flag. Thus, compressing with it is done like this:
tar c --lzip -f archive.tar.lz folder1/ folder2/
Do note that compressing with lzip is really slow. You want to install plzip if you want to use lzip. Most distributions do not package it, so you will probably have to compile it yourself.
plzip can be invoked by adding -Iplzip to tar. Making an archive is done like this:
tar c -Iplzip -f archive.tar.lz folder1/ folder2/
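Since plzip usually has to be compiled by hand, a script that wants .tar.lz output might check what is actually available first. The following is a rough sketch with invented folder and archive names: it prefers plzip, falls back to lzip, and skips compression entirely if neither is installed. It also runs the compressor's -t option afterwards to test the integrity of the archive.

```shell
# Prefer the parallel plzip, fall back to lzip, remember if neither exists.
if command -v plzip >/dev/null 2>&1; then
    lz_program=plzip
elif command -v lzip >/dev/null 2>&1; then
    lz_program=lzip
else
    lz_program=""
fi

# Throwaway input so the example is self-contained.
workdir=$(mktemp -d)
mkdir "$workdir/folder1"
echo "sample data" > "$workdir/folder1/data.txt"

lz_status="skipped"
if [ -n "$lz_program" ]; then
    tar -C "$workdir" -c -I"$lz_program" -f "$workdir/archive.tar.lz" folder1/
    # -t checks the compressed file's integrity without writing any output.
    if "$lz_program" -t "$workdir/archive.tar.lz"; then
        lz_status="ok"
    else
        lz_status="corrupt"
    fi
else
    echo "neither plzip nor lzip is installed; nothing was compressed" >&2
fi
echo "program: ${lz_program:-none}, status: $lz_status"

rm -rf "$workdir"
```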
How do xz and lzip compare to bzip2?
You will find that creating LZMA archives with either xz or lzip is slower than creating them with bzip2. This applies both to xz and lzip vs bzip2 and to pixz and plzip vs pbzip2. See Comparison of Compression Algorithms for tests and more detailed information.
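If you would rather measure this on your own data, something along these lines gives a first impression. It is only a sketch: the measure helper and the generated test file are made up for the example, the timing is in whole seconds, and the tiny stand-in input will not show a meaningful difference. Point src at a real folder to get numbers worth looking at.

```shell
workdir=$(mktemp -d)
src="$workdir/src"
mkdir "$src"
# Stand-in test data; replace src with a real folder for a useful comparison.
seq 1 200000 > "$src/numbers.txt"

# measure FLAG SUFFIX: archive $src with the given tar compression flag and
# print the elapsed whole seconds plus the archive size in bytes.
measure() {
    start=$(date +%s)
    tar -C "$src" -c "$1" -f "$workdir/test.tar.$2" .
    end=$(date +%s)
    size=$(wc -c < "$workdir/test.tar.$2" | tr -d ' ')
    echo "$2: $((end - start)) s, $size bytes"
}

bz_line=$(measure -j bz2)   # j = bzip2
xz_line=$(measure -J xz)    # J = xz
echo "$bz_line"
echo "$xz_line"

rm -rf "$workdir"
```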