tar is the standard tool for creating and extracting compressed file archives on Linux systems, and it is the tool to use regardless of what kind of compression is used: compressing and decompressing file archives with gzip, bzip2, xz, lzip, lzma and a few more is typically done through tar.
tar's name is short for tape archiver, which is why every tar command you will ever use has to include the f flag to tell it that you will be working on files, not an ancient tape device.
tar is used to collate collections of files into one larger file container for archiving and distribution while preserving file system information such as user and group permissions, dates, and directory structures. Containers created by tar are typically filtered through a compression algorithm to make the archive's size smaller. tar does not do any compression on its own. tar's file format is standardized by POSIX.1-1988 and later POSIX.1-2001.
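As a minimal sketch of what "preserving file system information" looks like in practice, the following builds a small directory tree and lists the resulting archive. The directory and file names are hypothetical, and the t (list) flag is not covered elsewhere in this article; it is assumed here only to inspect the archive.

```shell
set -eu
work=$(mktemp -d)            # scratch directory; remove it afterwards with rm -rf
cd "$work"

# Hypothetical sample tree with one file that has a distinctive mode
mkdir -p project/docs
echo "hello" > project/docs/readme.txt
printf '#!/bin/sh\necho hi\n' > project/run.sh
chmod 750 project/run.sh

tar cf project.tar project   # bundle the tree, no compression

# t = list, v = verbose: prints mode, owner/group, size, date and path per entry
listing=$(tar tvf project.tar)
echo "$listing"
```

The listing should show the rwxr-x--- mode on run.sh and the docs/ subdirectory path, which is the information tar restores on extraction.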
Creating a compressed file with tar is typically done by running tar with the c (create) and f (file) flags and a compression algorithm's flag, followed by the archive name and the files and/or directories to include. Decompression is done by running tar with the x (extract) and f (file) flags and a compression flag.
tar itself will just bundle a bunch of files together without compressing them, so simply creating an archive archive.tar will make that file at least as big as the sum of the files put in it. tar archives which have been filtered through a compression algorithm are typically named .tar.compressiontype, so an xz archive created with tar cfJ would be named archive.tar.xz.
The common compression types and the flags to use them are:
|short option||long option||file extension|
|z||--gzip||.tar.gz|
|j||--bzip2||.tar.bz2|
|J||--xz||.tar.xz|
|(none)||--lzip||.tar.lz|
Brief Comparison of Compression Algorithms
- gzip is the fastest of these compression tools. It is designed for speed.
- bzip2 is slightly slower and compresses slightly better. However, bzip2 has a multi-core implementation called pbzip2, and on a multi-core machine this can be a lot faster than both the standard gzip and xz binaries.
- xz is extremely slow when it comes to compression and takes ages compared to gzip. It does compress significantly better than gzip and also better than bzip2. Decompression, on the other hand, is on par with gzip and bzip2. xz uses the Lempel-Ziv-Markov chain algorithm (LZMA).
- lzip is another implementation of the Lempel-Ziv-Markov chain algorithm. It does not have its own short-hand switch. It is gaining popularity, so it could get one.
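The comparison above can be checked on your own data. The sketch below is only an illustration: it assumes gzip, bzip2 and xz are installed, and it uses highly repetitive generated text (hypothetical sample data), so the ratios it prints will not be representative of real files.

```shell
set -eu
work=$(mktemp -d)            # scratch directory; remove it afterwards with rm -rf
cd "$work"

# Generate repetitive sample text so every algorithm has something to shrink
mkdir data
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > data/sample.txt

tar cf  sample.tar     data   # no compression
tar cfz sample.tar.gz  data   # gzip
tar cfj sample.tar.bz2 data   # bzip2
tar cfJ sample.tar.xz  data   # xz

# Print the size in bytes of each archive for comparison
for f in sample.tar sample.tar.gz sample.tar.bz2 sample.tar.xz; do
    printf '%-16s %s bytes\n' "$f" "$(($(wc -c < "$f")))"
done
```

Prefixing each tar command with time gives a rough feel for the speed differences as well.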
HOWTO use tar to create archives
Pure tar archives are created with the cf switch.
- -c create a new archive
- -f use archive file or device F (default "-", meaning stdin/stdout)
While the manual and many examples write the switches with a leading -, and the - can be used, it is actually not needed. There is no difference between tar cf, tar -cf and tar -c -f.
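A quick way to convince yourself that the three spellings are equivalent is to create the same archive three ways and compare the listings. This is a small sketch with hypothetical file names; the t (list) flag is used here only for inspection.

```shell
set -eu
work=$(mktemp -d)            # scratch directory; remove it afterwards with rm -rf
cd "$work"
echo "sample" > note.txt

tar cf    a.tar note.txt     # old style: no dash
tar -cf   b.tar note.txt     # short options bundled behind one dash
tar -c -f c.tar note.txt     # each short option given separately

# List each archive; all three should contain the same single entry
for t in a.tar b.tar c.tar; do
    echo "$t: $(tar tf "$t")"
done
```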
This will create a pure container with the contents of files/*.txt and stuff/*.jpg without doing any kind of compression:
tar cf container.tar files/*.txt stuff/*.jpg
You can add the fine v flag for verbose output of what is going on:
tar cfv container.tar files/*.txt stuff/*.jpg
You will want to (ab)use some kind of compression for your archive. This is done by adding the right switch and the right extension to the archive you create. That would be j and the extension .bz2 for bzip2. So we add that to the above example and get:
tar cfvj container.tar.bz2 files/*.txt stuff/*.jpg
Similarly, creating a gzip archive, where the switch is z and the extension is .gz, is done with:
tar cfvz container.tar.gz files/*.txt stuff/*.jpg
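To check that the extension matches what is actually inside the file, you can ask the compressor itself to test the archive. The following sketch assumes gzip and bzip2 are installed and uses hypothetical placeholder files; the -t (test integrity) option belongs to gzip and bzip2, not to tar.

```shell
set -eu
work=$(mktemp -d)            # scratch directory; remove it afterwards with rm -rf
cd "$work"

# Hypothetical placeholder files matching the globs used in the examples
mkdir files stuff
echo "first"  > files/a.txt
echo "second" > files/b.txt
echo "not really a jpeg" > stuff/pic.jpg

tar cfz container.tar.gz  files/*.txt stuff/*.jpg
tar cfj container.tar.bz2 files/*.txt stuff/*.jpg

# -t makes each compressor verify the file without extracting anything
gzip  -t container.tar.gz  && echo "container.tar.gz is valid gzip data"
bzip2 -t container.tar.bz2 && echo "container.tar.bz2 is valid bzip2 data"

# tar can list the members when given the matching compression flag
tar tfz container.tar.gz
```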
Non-shorthand compression algorithms
You have to spell out the entire --switch if you want to use a "special" algorithm without its own short-hand flag. In that case put the -f flag last, directly before the archive's name, and give it the - in front to indicate it's a flag.
For example, creating an lzip archive is done with:
tar cv --lzip -f container.tar.lz files/*.txt stuff/*.jpg
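The lzip program is often not installed by default, and GNU tar calls it as an external program, so a script may want to check for it first. This is a hedged sketch: the file names and the result variable are hypothetical, and the check only looks for an lzip binary on the PATH.

```shell
set -eu
work=$(mktemp -d)            # scratch directory; remove it afterwards with rm -rf
cd "$work"
mkdir files
echo "sample" > files/a.txt

if command -v lzip >/dev/null 2>&1; then
    # long option first, then -f immediately followed by the archive name
    tar cv --lzip -f container.tar.lz files/*.txt
    result="created"
else
    echo "lzip is not installed, skipping the lzip archive" >&2
    result="skipped"
fi
echo "$result"
```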
HOWTO extract tar archives
Extraction of tar archives is done by running tar with the x (extract) flag, the f (file) flag since we are using files not a tape drive like it's the 1980s, and a flag for the compression that was used, followed by the filename. An uncompressed archive will obviously not need a compression flag. Extracting one is done with:
tar xf archive.tar
tar has a handy -C flag to extract files into a specific folder. Using it is as simple as adding -C folder/. Using -C is completely optional; files are simply extracted to the folder you are in if you omit it.
tar xf archive.tar -C ~/stuff/new/software
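One detail worth knowing when using -C: tar changes into the target folder but does not create it, so the folder has to exist beforehand. A minimal sketch, with a hypothetical destination path under a scratch directory:

```shell
set -eu
work=$(mktemp -d)            # scratch directory; remove it afterwards with rm -rf
cd "$work"
echo "sample" > note.txt
tar cf archive.tar note.txt

dest="$work/stuff/new/software"   # hypothetical target folder
mkdir -p "$dest"                  # create it first; tar -C will not
tar xf archive.tar -C "$dest"

ls "$dest"
```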
Most archives you encounter will be compressed with some kind of compression. Look at the file's extension and identify its compression by its name. archive.tar.xz would be using xz. The flag for that is J. archive.tar.xz can be extracted into ~/stuff/new/ with:
tar xfJ archive.tar.xz -C ~/stuff/new/
gzip archives are decompressed with
tar xfz archive.tar.gz
and bzip2 archives are decompressed with
tar xfvj archive.tar.bz2
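The create and extract commands can be put together into a round trip that confirms each compressed archive restores the original files. This sketch assumes gzip, bzip2 and xz are installed; the directory names are hypothetical.

```shell
set -eu
work=$(mktemp -d)            # scratch directory; remove it afterwards with rm -rf
cd "$work"
mkdir src
echo "payload" > src/data.txt

# Create one archive per compression type
tar cfz archive.tar.gz  src
tar cfj archive.tar.bz2 src
tar cfJ archive.tar.xz  src

# Extract each one into its own folder with the matching flag
mkdir out-gz out-bz2 out-xz
tar xfz archive.tar.gz  -C out-gz
tar xfj archive.tar.bz2 -C out-bz2
tar xfJ archive.tar.xz  -C out-xz

# diff -r prints nothing and succeeds when the two trees are identical
for d in out-gz out-bz2 out-xz; do
    diff -r src "$d/src" && echo "$d matches the original"
done
```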
Use the whole --switch if there is no short-hand for the algorithm and add -f last, directly before the archive's name. Extracting lzip archives is done with
tar xv --lzip -f archive.tar.lz
|TIP: All the distributions (we are aware of) use a binary called bzip2 for bzip2 compression and decompression by default. There is also a version called |
We hope you learned something. Good luck.