Multiples of bytes

From LinuxReviews

A byte is a byte, but a megabyte may not be the size you think it is. There have historically been different standards: a megabyte could be 1 000 000 bytes (10⁶ or 1000² B) or 1 048 576 bytes (2²⁰ or 1024² B). The modern IEC standard refers to 1 048 576 bytes as a "mebibyte" (MiB) to differentiate it from the decimal megabyte (MB) of 1 000 000 bytes (10⁶ B). The IEC definitions are widely accepted as modern standards. Linux tools use K, M and G to specify what IEC defines as kibibytes, mebibytes and gibibytes.

Historical confusion

Operating systems began describing 1024 bytes as a kilobyte in the 1970s. Computers were 8-bit at the time and there was no simple term to describe 1024 bytes. Kilo was picked because it was close enough. The kilo prefix means 1000 (a kilogram is 1000 grams), not 1024, but 1K or KB was used to describe 1024 bytes anyway. The terms megabyte and gigabyte became commonly used to describe larger sizes. JEDEC formalized these commonly used terms in standards like 100B.01.

Current Standards

The International Electrotechnical Commission (IEC) re-defined kilobytes, megabytes and gigabytes in "Amendment 2 to IEC International Standard IEC 60027-2" in 1998. Their standard requires one kilobyte to strictly mean 1000 bytes. They introduced new words to describe the old kilobyte, megabyte and gigabyte as well as new terms to describe larger sizes. The traditional kilobyte (KB) became a kibibyte (KiB), a megabyte (MB) became a mebibyte (MiB) and a gigabyte (GB) became a gibibyte (GiB). A vast majority of the corporations in the computer industry went along with IEC's standard.

Quantities of bytes

Decimal
Value    Metric           Linux
1000     kB  kilobyte     KB
1000²    MB  megabyte     MB
1000³    GB  gigabyte     GB
1000⁴    TB  terabyte     TB
1000⁵    PB  petabyte     PB
1000⁶    EB  exabyte      EB
1000⁷    ZB  zettabyte    ZB
1000⁸    YB  yottabyte    YB

Binary prefix
Value    IEC 80000-13     JEDEC           Linux
1024     KiB kibibyte     KB kilobyte     K, KiB
1024²    MiB mebibyte     MB megabyte     M, MiB
1024³    GiB gibibyte     GB gigabyte     G, GiB
1024⁴    TiB tebibyte     -               T, TiB
1024⁵    PiB pebibyte     -               P, PiB
1024⁶    EiB exbibyte     -               E, EiB
1024⁷    ZiB zebibyte     -               Z, ZiB
1024⁸    YiB yobibyte     -               Y, YiB
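
The gap between the decimal and the binary unit grows with every prefix step. The following is a minimal Python sketch of that arithmetic; the names PREFIXES and binary_excess_percent are made up for this illustration and do not come from any tool or standard.

```python
# Prefix letters in ascending order; position 1 corresponds to 1000^1 and 1024^1.
PREFIXES = ["K", "M", "G", "T", "P", "E", "Z", "Y"]


def binary_excess_percent(power):
    """Return how much larger 1024**power is than 1000**power, in percent."""
    return (1024 ** power / 1000 ** power - 1) * 100


if __name__ == "__main__":
    for power, letter in enumerate(PREFIXES, start=1):
        print(f"{letter}iB vs {letter}B: 1024^{power} is "
              f"{binary_excess_percent(power):.1f}% larger than 1000^{power}")
```

A kibibyte is only about 2.4% larger than a kilobyte, but the difference is close to 10% at the tebibyte level, which is why mixing up the two conventions matters more as sizes grow.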

GNU/Linux

GNU/Linux tools are typically able to accept powers of both 1000 (KB) and 1024 (K or KiB) when sizes are specified.

fallocate, from util-linux, can be used to preallocate space for a file (create a file of a given size). The fallocate manual[1] describes its length argument as:

"The length and offset arguments may be followed by the multiplicative suffixes KiB (=1024), MiB (=1024*1024), and so on for GiB, TiB, PiB, EiB, ZiB, and YiB (the "iB" is optional, e.g., "K" has the same meaning as "KiB") or the suffixes KB (=1000), MB (=1000*1000), and so on for GB, TB, PB, EB, ZB, and YB."

fallocate manual

fallocate test.file -l 1K will create a 1024 byte file (specifying with lowercase, as in fallocate test.file -l 1k, will also work).

fallocate test.file -l 1KB will create a 1000 byte file.
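
The suffix rules quoted from the fallocate manual can be sketched as a small Python function. This is an illustration of the rules only, not fallocate's actual parser: the function name parse_size is hypothetical, and accepting every suffix in any letter case is an assumption (the text above only confirms that a lowercase k works).

```python
import re

# Suffix letters in ascending order; the position gives the exponent.
_LETTERS = "KMGTPEZY"


def parse_size(text):
    """Convert a size such as '1K', '1KiB' or '1KB' to a number of bytes.

    A bare letter or a letter followed by 'iB' means a power of 1024;
    a letter followed by 'B' means a power of 1000.
    Case-insensitive matching is an assumption of this sketch.
    """
    match = re.fullmatch(r"(\d+)\s*(?:([KMGTPEZY])(iB|B)?)?",
                         text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, letter, tail = match.groups()
    if letter is None:
        return int(number)  # plain byte count, no suffix
    exponent = _LETTERS.index(letter.upper()) + 1
    base = 1000 if tail is not None and tail.upper() == "B" else 1024
    return int(number) * base ** exponent


if __name__ == "__main__":
    for example in ("1K", "1k", "1KiB", "1KB", "2MiB", "3GB"):
        print(example, "=", parse_size(example), "bytes")
```

With these rules, parse_size("1K") and parse_size("1KiB") both give 1024, while parse_size("1KB") gives 1000, matching the two fallocate examples above.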

du, from coreutils, is commonly used to show a close estimate of how much space files or directories use. du will show powers of 1024 when the -h option is used. --si can be used to show powers of 1000. Running du -h on a file which is 8689264 bytes will show 8.3M, while du --si on the same file will claim it is 8.7M. It is interesting to note that du will use the M suffix in both cases.
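
The two numbers can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not du's implementation: the function name human is hypothetical, it formats a plain byte count instead of measuring disk usage, the suffix letters are simplified, and rounding up to one decimal place is an assumption chosen because it matches the 8.3M and 8.7M figures above.

```python
def human(n, base=1024):
    """Format a byte count with a one-letter suffix, rounding up.

    base=1024 mimics the -h style, base=1000 mimics the --si style.
    Values below 10 units get one decimal place (assumed behaviour).
    """
    suffixes = ["", "K", "M", "G", "T", "P", "E", "Z", "Y"]
    if n < base:
        return str(n)
    power, unit = 0, 1
    # Find the largest unit that does not exceed n.
    while power < len(suffixes) - 1 and n >= unit * base:
        unit *= base
        power += 1
    tenths = -(-n * 10 // unit)  # ceiling of n/unit, counted in tenths
    if tenths < 100:
        return f"{tenths // 10}.{tenths % 10}{suffixes[power]}"
    whole = -(-n // unit)  # ceiling of n/unit
    if whole >= base and power < len(suffixes) - 1:
        return f"1.0{suffixes[power + 1]}"  # rounding spilled into the next unit
    return f"{whole}{suffixes[power]}"


if __name__ == "__main__":
    size = 8689264
    print(human(size))        # powers of 1024
    print(human(size, 1000))  # powers of 1000
```

For the 8689264 byte file, 8689264 / 1024² is about 8.29 and 8689264 / 1000² is about 8.69, which round up to 8.3M and 8.7M respectively.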

Beware Of And Be Prepared For Some Confusion

mdadm, which is used to manage Linux software RAID arrays, has a manual which, as of v4.1-rc2, states[2]:

"-z, --size=

A suffix of 'K', 'M' or 'G' can be given to indicate Kilobytes, Megabytes or Gigabytes respectively."

mdadm manual page as of v4.1-rc2

K, M and G will always (in any Linux tool we checked, anyway) use sizes in powers of 1024, and mdadm is no exception. The manual describes the M suffix, which means 1024² bytes, as "Megabytes". This is correct according to the historical JEDEC standard, where one megabyte is 1 048 576 (1024²) bytes, but it is incorrect according to the IEC standard, where a megabyte is defined as 1000² bytes and 1024² bytes is called a mebibyte.

There is a lot of confusion like that in GNU/Linux manual pages as well as HOWTOs and other documentation. As a general rule of thumb: K, M, G, T and so on will always mean kibibyte (1024), mebibyte (1024²), gibibyte (1024³), tebibyte (1024⁴) and so on, even if the manual says M = megabyte. Adding a B, as in KB, MB, GB, TB, will make command line tools use the IEC power-of-ten sizes: an IEC kilobyte (1000 B), megabyte (1000² B), gigabyte (1000³ B), terabyte (1000⁴ B) and so on.

Notes

