tar

From LinuxReviews

tar is the standard tool for bundling files into archives on Linux systems, and it is the tool to use regardless of which kind of compression is applied: creating and unpacking file archives compressed with gzip, bzip2, xz, lzip, lzma and a few more is typically done through tar.

Introduction

tar's name is short for tape archiver, which is why practically every tar command you will ever use includes the f flag to tell it that you are working on a file, not an ancient tape device.

tar is used to collect files into one larger container file for archiving and distribution while preserving file system information such as ownership, permissions, dates, and directory structures. Containers created by tar are typically filtered through a compression algorithm to make the archive smaller. tar does not do any compression on its own; it hands that job to an external compressor. tar's file format is standardized by POSIX.1-1988 (the ustar format) and later POSIX.1-2001 (the pax format).

Creating a compressed archive with tar is typically done by running tar with the create flag (c), the file flag (f) and a compression algorithm's flag, followed by the archive name and the files and/or directories to include. Decompression is done by running tar with the extract flag (x), the file flag and a compression flag.

tar itself will just bundle a bunch of files together and not compress them, so simply creating an archive archive.tar will result in a file slightly bigger than the sum of the files inside it. tar archives which have been filtered through a compression algorithm are typically named .tar.compressiontype, so a bzip2 archive created with tar cfj would be named archive.tar.bz2.
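To make the size difference concrete, here is a minimal sketch that builds the same directory into a plain and a gzip-filtered archive and prints both sizes. The scratch directory, the file names and the repetitive sample text are made up for the demonstration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)              # scratch directory, hypothetical demo location
cd "$work"
mkdir files

# Highly repetitive text, so the compressed result is clearly smaller
for i in 1 2 3; do
    yes "the same line over and over" | head -n 5000 > "files/note$i.txt"
done

tar cf  archive.tar    files   # plain container, no compression
tar cfz archive.tar.gz files   # same container filtered through gzip

plain_size=$(wc -c < archive.tar)
gzip_size=$(wc -c < archive.tar.gz)
echo "archive.tar:    $plain_size bytes"
echo "archive.tar.gz: $gzip_size bytes"
```

The plain archive should come out a little larger than the input files combined, because tar adds a header per file and pads the end of the archive.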

The common compression types and the flags to use them are:

short option   long option   file extension
z              --gzip        .tar.gz
j              --bzip2       .tar.bz2
J              --xz          .tar.xz
Z              --compress    .tar.Z
(none)         --lzip        .tar.lz
(none)         --lzma        .tar.lzma
(none)         --zstd        .tar.zst
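If you script around tar, the mapping from file extension to long option can be written as a small lookup. The function name flag_for is hypothetical and only covers the extensions listed here; treat it as a sketch, not a complete list of what tar supports.

```shell
#!/bin/sh
# flag_for NAME: print the tar long option matching NAME's extension.
# Hypothetical helper; prints nothing for a plain .tar and fails otherwise.
flag_for() {
    case "$1" in
        *.tar.gz)   echo "--gzip" ;;
        *.tar.bz2)  echo "--bzip2" ;;
        *.tar.xz)   echo "--xz" ;;
        *.tar.Z)    echo "--compress" ;;
        *.tar.lz)   echo "--lzip" ;;
        *.tar.lzma) echo "--lzma" ;;
        *.tar.zst)  echo "--zstd" ;;
        *.tar)      echo "" ;;
        *)          return 1 ;;
    esac
}

for name in backup.tar.gz backup.tar.xz backup.tar.zst; do
    echo "$name -> $(flag_for "$name")"
done
```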

Brief Comparison of Compression Algorithms

  • gzip is the fastest of the algorithms compared here. It is designed for speed rather than for the smallest possible output.
  • bzip2 is noticeably slower than gzip and usually compresses somewhat better. However, bzip2 has a multi-core implementation called pbzip2, and on a machine with several cores this can be a lot faster than the standard single-threaded gzip and xz binaries.
  • xz is very slow when it comes to compression and takes ages compared to gzip. It does compress significantly better than gzip and usually better than bzip2. Decompression, on the other hand, is much quicker than compression: typically faster than bzip2, though slower than gzip. xz uses LZMA2, a variant of the Lempel-Ziv-Markov chain algorithm.
  • lzip is another compressor built on the Lempel-Ziv-Markov chain algorithm. It does not have its own short-hand switch in tar, though it's gaining popularity so it could get one.
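The trade-offs above are easy to check on your own data. The following sketch archives one sample directory with each compressor that happens to be installed and prints the resulting sizes; the sample data and file names are invented, and real-world ratios depend heavily on what you are compressing. Prefix the tar line with time if you also want to compare speed.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)                  # scratch directory for the comparison
cd "$work"
mkdir data
seq 1 200000 > data/numbers.txt    # compressible sample text

tar cf data.tar data
echo "none: $(wc -c < data.tar) bytes"

# Each entry is tool:tar-flag:extension
for entry in gzip:z:gz bzip2:j:bz2 xz:J:xz; do
    tool=${entry%%:*}
    rest=${entry#*:}
    flag=${rest%%:*}
    ext=${rest#*:}
    if command -v "$tool" >/dev/null 2>&1; then
        tar "cf$flag" "data.tar.$ext" data
        echo "$tool: $(wc -c < "data.tar.$ext") bytes"
    else
        echo "$tool: not installed, skipped"
    fi
done
```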

HOWTO use tar to create archives

Pure tar archives are created with the cf switch.

  • -c create a new archive
  • -f F use archive file or device F (on many builds the default is "-", meaning stdin/stdout)

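While the manual and many examples write the switches with a leading -, it is not actually needed. tar cf, tar -cf and tar -c -f all do the same thing. There is one catch: when you do use the -, f has to come last in the group because the archive name is read right after it. tar -cvf container.tar works, while tar -cfv container.tar would treat v as the archive name. Without the dash, tar cfv container.tar is fine.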
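A quick way to convince yourself that the three spellings are equivalent is to build the same archive three ways and compare the member lists. This sketch uses throwaway file names and the t (list) flag, which prints an archive's contents without extracting anything.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
mkdir files
echo "hello" > files/a.txt
echo "world" > files/b.txt

tar cf    old-style.tar files/*.txt   # old style, no dash
tar -cf   dashed.tar    files/*.txt   # short options in one group, f last
tar -c -f separate.tar  files/*.txt   # one short option at a time

tar tf old-style.tar > old.list
tar tf dashed.tar    > dashed.list
tar tf separate.tar  > separate.list

cmp old.list dashed.list && cmp old.list separate.list && echo "same contents"
```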

This will create a pure container with the contents of files/*.txt and stuff/*.jpg without doing any kind of compression:

tar cf container.tar files/*.txt stuff/*.jpg

You can add the fine v flag for verbose output of what is going on:

tar cfv container.tar files/*.txt stuff/*.jpg

You will want to (ab)use some kind of compression for your archive. This is done by adding the right switch and the right extension to the archive you create. That would be j and the extension bz2 for bzip2. So we add that to the above example and get:

tar cfvj container.tar.bz2 files/*.txt stuff/*.jpg

Similarly, creating a gzip archive, where the switch is z and the extension is .gz, is done with:

tar cfvz container.tar.gz files/*.txt stuff/*.jpg
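After creating a compressed archive it is worth checking that it is intact and contains what you expect. A minimal sketch, assuming gzip is installed and using made-up sample files: gzip -t tests the compressed stream, and tar tfz lists the members without extracting them.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
mkdir files stuff
echo "some text" > files/notes.txt
echo "not really a jpeg" > stuff/photo.jpg    # placeholder content

tar cfvz container.tar.gz files/*.txt stuff/*.jpg

gzip -t container.tar.gz    # exits non-zero if the gzip stream is damaged
tar tfz container.tar.gz    # list the members without extracting
members=$(tar tfz container.tar.gz | wc -l)
echo "archive holds $members files"
```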

Non-shorthand compression algorithms

You have to spell out the entire --switch if you want to use a "special" algorithm without its own short-hand flag. The least error-prone way to do that is to put the f flag last, written as -f with the - in front, directly before the archive name.

For example, creating an lzip archive is done with:

tar cv --lzip -f container.tar.lz files/*.txt stuff/*.jpg
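The same pattern works for any long compression option. The sketch below shows it with --gzip, since gzip is nearly always installed, and only attempts the lzip variant when an lzip binary is found; the sample file is invented.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
mkdir files
echo "sample" > files/a.txt

# Long option first, -f last, directly before the archive name
tar -c -v --gzip -f container.tar.gz files/*.txt

if command -v lzip >/dev/null 2>&1; then
    tar -c -v --lzip -f container.tar.lz files/*.txt
else
    echo "lzip not installed, skipping the .tar.lz example"
fi
```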

HOWTO extract tar archives

Extraction of tar archives is done by running tar with the extract flag (x), the file flag (f), since we are using files and not a tape drive like it's the 1980s, and a flag for the compression that was used, followed by the filename. An uncompressed archive will obviously not need a compression flag. Extracting one is done with:

tar xf archive.tar

tar has a handy -C flag to extract files into a specific folder. Using it is as simple as adding -C folder/; the folder has to exist already. Using -C is completely optional: files are simply extracted into the folder you are in if you omit it.

tar xf archive.tar -C ~/stuff/new/software
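Since tar will not create the target folder for you, scripts usually run mkdir -p first. A minimal sketch with a throwaway archive and a hypothetical destination path under a scratch directory:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
mkdir src
echo "payload" > src/readme.txt
tar cf archive.tar src

dest="$work/stuff/new/software"   # hypothetical target folder
mkdir -p "$dest"                  # -C expects the folder to exist
tar xf archive.tar -C "$dest"

ls -R "$dest"
```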

Most archives you encounter will be compressed in some way. Look at the file's extension to identify which compression it uses. archive.tar.xz would be using xz. The flag for that is J, so archive.tar.xz can be extracted into ~/stuff/new/ with:

tar xfJ archive.tar.xz -C ~/stuff/new/

gzip archives are decompressed with tar xfz archive.tar.gz

and bzip2 archives with tar xfvj archive.tar.bz2

Use the whole --switch if there is no short-hand for the algorithm and add -f last before the archive's name. Extracting --lzip archives is done with

tar xv --lzip -f archive.tar.lz
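To check that an archive really round-trips, extract it somewhere else and compare the result with the original using diff -r. The sketch below does that with a gzip archive and made-up sample files. It also tries extraction without any compression flag: GNU tar and bsdtar normally detect the compression by themselves when reading, but that is an assumption about your tar, so the sketch only reports whether it worked.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p project/docs
echo "first"  > project/docs/one.txt
echo "second" > project/two.txt

tar cfz archive.tar.gz project

mkdir explicit auto
tar xfz archive.tar.gz -C explicit
diff -r project explicit/project && echo "explicit flag: round trip OK"

# Assumption: this tar detects the compression on its own when reading
if tar xf archive.tar.gz -C auto 2>/dev/null; then
    echo "auto-detection worked"
else
    echo "this tar needs the compression flag spelled out"
fi
```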

TIP: All the distributions (we are aware of) use a binary called bzip2 for bzip2 compression and decompression by default, and bzip2 is single-threaded. There is also a version called pbzip2 which can use multiple cores. The performance difference can be huge, depending on how many cores you have. If you have more than a couple of cores you should absolutely install pbzip2 and, as root, run mv /usr/bin/bzip2 /usr/bin/bzip2.bak; ln -s /usr/bin/pbzip2 /usr/bin/bzip2
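If you would rather not replace the system bzip2 binary, GNU tar can be told which compressor to run through its --use-compress-program option. This is a sketch of that alternative, not part of the tip itself; it falls back to plain bzip2 when pbzip2 is not installed, and the sample data is invented.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
mkdir data
seq 1 100000 > data/numbers.txt    # sample data to compress

if command -v pbzip2 >/dev/null 2>&1; then
    # tar pipes the archive through pbzip2 instead of bzip2
    tar --use-compress-program=pbzip2 -cf data.tar.bz2 data
    used=pbzip2
elif command -v bzip2 >/dev/null 2>&1; then
    tar -cjf data.tar.bz2 data
    used=bzip2
else
    used=none
fi
echo "compressor used: $used"
```

The resulting file is an ordinary .tar.bz2 either way, so anyone can extract it with the regular j flag.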

We hope you learned something. Good luck.
