HOWTO Test Disk I/O Performance

From LinuxReviews

Here are several methods for testing I/O performance on GNU/Linux, ranging from tools like dd that are close to worthless for this purpose to actually useful ways of determining a drive's real-world performance.

Almost Useless Methods

These methods are highly recommended on a lot of pages on the Internet even though they are close to useless. They will, however, provide some very basic data that can help you troubleshoot whether a drive is connected at a lower SATA version than expected (you should really be checking that with smartctl --all /dev/sda), and they give a very rough idea of how a drive performs.
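
If you want to verify the negotiated SATA link speed with smartctl, a quick check could look like the one below; /dev/sda is just an example device name, and the line to look for in the full output is the one starting with "SATA Version is:".

# smartctl --all /dev/sda | grep -i 'sata version'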

hdparm

hdparm can give an indication of a drive's sequential read speed and the read speed from its cache. The typical way of using it is hdparm --direct -t -T /dev/sdX where

  • --direct means we bypass the kernel's page cache and use O_DIRECT so reads go straight to the drive and its cache[1]
  • -T tests read speed from the cache (the kernel's, or the drive's if --direct is used)[1]
  • -t tests the drive's read speed.[1]
  • /dev/sdX would be your SSD or HDD.

Its output will look like:

# The Seagate Momentus 5400.6, ST9160301AS
# hdparm  --direct -t -T /dev/sda

/dev/sda:
 Timing O_DIRECT cached reads:   122 MB in  2.02 seconds =  60.31 MB/sec
 Timing O_DIRECT disk reads: 184 MB in  3.02 seconds =  61.02 MB/sec

or

# Western Digital Red (WDC WD30EFRX-68EUZN0)
# hdparm  --direct -t -T /dev/sda

/dev/sda:
 Timing O_DIRECT cached reads:   946 MB in  2.00 seconds = 472.43 MB/sec
 Timing O_DIRECT disk reads: 464 MB in  3.01 seconds = 154.26 MB/sec

or

# Samsung SSD 750 EVO 500GB (S36SNWAH582670L)
# hdparm  --direct -t -T /dev/sde

/dev/sde:
 Timing O_DIRECT cached reads:   984 MB in  2.00 seconds = 491.87 MB/sec
 Timing O_DIRECT disk reads: 1470 MB in  3.00 seconds = 489.76 MB/sec

You may notice that there's quite a difference between the three outputs above. hdparm can give you a general idea of how a drive performs, but that's all you get. Two drives with similar numbers could perform very differently in situations with many random reads and writes; large sequential disk reads are not a very typical load. Then again, if the numbers are wildly different - like the numbers above are - and one drive does O_DIRECT disk reads at 60 MB/sec while another does 490 MB/sec, it's very likely that the one capable of 490 MB/sec is faster in just about every workload.

dd

A lot of pages recommend using dd to test disk performance. Its manual page clearly states that its purpose is to "convert and copy a file"[2]. dd will report how much time it takes to complete an operation, which does give a general idea of how a drive performs. But that's all you get.

You can play with five dd parameters, two of which (bs= and count=) you will want to vary, to get various performance estimates:[2]

  • if= specifies an input device or file, like /dev/zero or /dev/random
  • of= specifies a file to write to, like test.file
  • bs= is important for "benchmarking". It specifies how many bytes are written per operation. It can be specified using k, M, G, etc.
  • count= specifies how many operations to do. bs=2M count=10 writes 10*2 MB = 20 MB of data.
  • oflag=dsync is something you always want to include when doing "benchmarks" with dd. oflag= specifies option flags, and you want dsync (use synchronized I/O for data). You would not want nonblock (use non-blocking I/O) or other flags.

dd if=/dev/zero of=test.file bs=64M count=1 oflag=dsync

would output something like

# dd if=/dev/zero of=test.file bs=64M count=1 oflag=dsync
1+0 records in
1+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.285204 s, 235 MB/s

while increasing the count to 16 changes the above command so it writes 64 MB 16 times = 1 GB:

# dd if=/dev/zero of=test.file bs=64M count=16 oflag=dsync
16+0 records in
16+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.45435 s, 241 MB/s

The output of this test on the Seagate Momentus 5400.6 (ST9160301AS) from the hdparm example above shows that dd can be useful for getting a rough idea of how a drive performs:

# dd if=/dev/zero of=test.file bs=64M count=16 oflag=dsync
16+0 records in
16+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 20.4171 s, 52.6 MB/s

Keep in mind that a high bs= combined with a high count= results in a very large amount of data being written; bs=1G count=64 will write 64 GB of data.

A low bs= number as in bs=512 count=1000 will write 512 bytes 1000 times, which amounts to a mere 512 KB. However, the throughput will be much lower since a disk sync is done every time 512 bytes are written. And there will be a difference between newer and older machines.

Old hardware:

$ dd if=/dev/zero of=test.file bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 71.6597 s, 7.1 kB/s

New hardware:

$ dd if=/dev/zero of=test.file bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 5.80012 s, 88.3 kB/s

The above outputs are very different and they do give an indication of how these machines perform - but it's not all that useful.

Methods of testing I/O performance which give useful information reflecting real-world use

Two small programs for disk I/O testing stand out as more useful than most other methods: fio and bonnie++.

fio - The "flexible I/O tester"

fio is available on most distributions as a package with that name. It won't be installed by default; you will need to install it yourself. You can click apt://fio (Ubuntu) or appstream://fio (Plasma Discover) to install it (on some distributions, anyway).
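
If those links do not work on your distribution, installing the package from the command line is usually all it takes. The package name is assumed to be fio on all of these:

# Debian/Ubuntu
sudo apt install fio
# Fedora
sudo dnf install fio
# Arch Linux
sudo pacman -S fio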

fio is not at all straightforward or easy to use. It requires quite a lot of parameters. The ones you want are:

  • --name names your test run's "job". It is required.
  • --eta-newline= forces a new line every 't' time period. You may want --eta-newline=5s.
  • --filename= specifies the file to read from or write to.
  • --rw= specifies whether you want a read (--rw=read) or write (--rw=write) test.
  • --size= decides how big of a test-file it should use. --size=2g may be a good choice. A file (specified with --filename=) this size will be created so you will need to have free space for it. Increasing to --size=20g or more may give a better real-world result for larger HDDs.
    • A small 200 MB file on a modern HDD won't make the read/write heads move very far. A very big file will.
  • --io_size= specifies how much I/O fio will do. Setting it to --io_size=10g will make it do 10 GB worth of I/O even if --size specifies a (much) smaller file.
  • --blocksize= specifies the block-size it will use, --blocksize=1024k may be a good choice.
  • --ioengine= specifies the I/O engine to use. There are a lot to choose from; run fio --enghelp for a long list. fio is a very versatile tool; whole books could be, and probably have been, written about it. libaio, as in --ioengine=libaio, is a good choice and it is what we use in the examples below.
  • --fsync= tells fio to issue an fsync, which writes kernel-cached pages to disk, after the specified number of blocks have been written.
    • --fsync=1 is useful for testing random reads and writes.
    • --fsync=10000 can be used to test sequential reads and writes.
  • --iodepth= specifies a number of I/O units to keep in-flight.
  • --direct= specifies if direct I/O, which means O_DIRECT on Linux systems, should be used. You want --direct=1 to do disk performance testing.
  • --numjobs= specifies the number of jobs. One is enough for disk testing. Increasing this is useful if you want to test how a drive performs when many parallel jobs are running.
  • --runtime= makes fio terminate after a given amount of time. This overrides other values specifying how much data should be read or written. Setting --runtime=60 means that fio will exit and show results after 60 seconds even if it's not done reading or writing all the specified data. One minute is typically enough to gather useful data.
  • --group_reporting makes fio group its reporting, which makes the output easier to understand.

Put all the above together and we have some long commands for testing disk I/O in various ways.

Note: The file named by --filename= will be created with the specified --size= on the first run. It is filled with random data because of the way some drives handle zeros. The file can be re-used in later runs if you specify the same filename and size each time.


Testing sequential read speed with very big blocks

fio --name TEST --eta-newline=5s --filename=temp.file --rw=read --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting

The resulting output will have a line under "Run status group 0 (all jobs):" which looks like:

  • WD Blue 500 GB SSD (WDC WDS500G2B0A-00SM50): bw=527MiB/s (552MB/s), 527MiB/s-527MiB/s (552MB/s-552MB/s), io=10.0GiB (10.7GB), run=19442-19442msec
  • The Seagate Momentus 5400.6: READ: bw=59.0MiB/s (62.9MB/s), 59.0MiB/s-59.0MiB/s (62.9MB/s-62.9MB/s), io=3630MiB (3806MB), run=60518-60518msec

The results should be close to what the drive manufacturer advertises, and they won't be far off the guesstimates hdparm provides with its -t option. Testing this on a two-drive RAID1 array will result in both drives being utilized:

  • Two Samsung SSDs: READ: bw=1037MiB/s (1087MB/s), 1037MiB/s-1037MiB/s (1087MB/s-1087MB/s), io=10.0GiB (10.7GB), run=9878-9878msec

Testing sequential write speed with very big blocks

fio --name TEST --eta-newline=5s --filename=temp.file --rw=write --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting

This will produce a line under "Run status group 0 (all jobs):" like

  • WRITE: bw=55.8MiB/s (58.5MB/s), 55.8MiB/s-55.8MiB/s (58.5MB/s-58.5MB/s), io=3378MiB (3542MB), run=60575-60575msec

Note: Many modern SSDs with TLC (Triple-Level Cell) NAND have a potentially large SLC (Single-Level Cell) area used to cache writes. The drive's firmware moves that data to the TLC area when the drive is otherwise idle. Doing 10 GB of I/O to a 2 GB file during 60 seconds - what the above example does - is nowhere near enough to account for the SLC cache on such drives.

You will probably not be copying 100 GB to a 240 GB SSD on a regular basis, so this may have little to no practical significance. However, do know that if you test a WD Green SSD by doing 100 GB of I/O to a 60 GB file (assuming you have that much free space) with a 5 minute (60*5=300 second) runtime limit, you will get much lower results than if you write 10 GB to a 2 GB file. To test this yourself, try

fio --name TEST --eta-newline=5s --filename=temp.file --rw=write --size=60g --io_size=100g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=300 --group_reporting

You need to increase size (the test file size), io_size (the amount of I/O done) and runtime (how long the test is allowed to run) to bypass a drive's caches.

Testing random 4K reads

Testing random reads is best done with a queue-depth of just one (--iodepth=1) and 32 concurrent jobs (--numjobs=32).

This will reflect real-world read performance.

fio --name TEST --eta-newline=5s --filename=temp.file --rw=randread --size=2g --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=32 --runtime=60 --group_reporting

Some example results:

  • The Seagate Momentus 5400.6: READ: bw=473KiB/s (484kB/s), 473KiB/s-473KiB/s (484kB/s-484kB/s), io=27.9MiB (29.2MB), run=60334-60334msec
  • WD Blue 500 GB SSD (WDC WDS500G2B0A-00SM50): READ: bw=284MiB/s (297MB/s), 284MiB/s-284MiB/s (297MB/s-297MB/s), io=16.6GiB (17.8GB), run=60001-60001msec

As these example results show: the difference between an older 5400 RPM HDD and an average low-end SSD is staggering when it comes to random I/O. There is a world of difference between half a megabyte and 284 megabytes per second.

Mixed random 4K read and write

The --rw option randrw tells fio to do both reads and writes. Again, a queue-depth of just one (--iodepth=1) and 32 concurrent jobs (--numjobs=32) reflect a high real-world load. This test will show the absolute worst I/O performance you can expect. Don't be shocked if a HDD shows performance numbers that are in the low percentages of what its specifications claim it can do.

fio --name TEST --eta-newline=5s --filename=temp.file --rw=randrw --size=2g --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=32 --runtime=60 --group_reporting

bonnie++

bonnie++ is a disk and filesystem benchmark suite. Installing it (the package is typically named bonnie++) and running it with no parameters will start a benchmark of the disk and file system in the current working directory.

bonnie++ takes care of caching and syncing, and tests random reads and writes as well as small and large file system updates. Its tests do take some time, and by "some time" we mean hours on an old machine with an older HDD.

bonnie++ provides more real-world-like testing than hdparm and dd do. However, it does have some flaws: it is single-threaded, which means that some operations will appear slower than they actually are on any multi-core machine.

bonnie++ is a nice tool if you just want to install something, run it without having to think about parameters, wait, and get usable results.
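
A minimal run only needs a directory to test in; when started as root you also have to tell it which user to drop privileges to. The directory /mnt/test and the user nobody below are just examples; -d (test directory) and -u (user) are standard bonnie++ options:

$ bonnie++ -d /mnt/test
# when running as root, a user to drop privileges to is required:
# bonnie++ -d /mnt/test -u nobody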

Note: bonnie++ writes to a test file named Bonnie.$pid. This file is left behind if you abort the run by pressing Ctrl-C, and it can be many gigabytes in size.
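
If a run was aborted, the leftover file can be removed with something like the command below; double-check that no other files in the test directory match the pattern first:

rm -f Bonnie.*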

Drives With Special Characteristics

Some HDDs and storage solutions have special properties which should be accounted for.

"Shingled magnetic recording" (SMR) drives

Seagate and other manufacturers have a technology called "shingled magnetic recording" (SMR) which crams tracks closer together by partially overlapping them. Writing to a track on an SMR drive forces the drive to re-write nearby tracks too. These drives have a large on-board memory buffer and a "normal", non-shingled area on the platters for caching writes that need to be done the "SMR way" later on. This area is typically 20-40 GB depending on the drive's size. The result is that SMR drives behave in a way regular drives don't: the first 20 GB or so written to an SMR drive is written at speeds that are fairly normal for a modern HDD. Data written after that brings write speeds to a crawling halt, as in near-zero, while the drive flushes its write buffer and re-writes the tracks near those where the new data is placed.

SMR drives can only be accurately benchmarked by writing a really large amount of data to them (60 GB or so). What you'll find is that read and write speeds are absolutely dismal once the buffer is full. This is why it's best to simply avoid shingled magnetic recording drives.
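
A sequential write test along the lines of the fio examples above, with the file size and the amount of I/O raised well beyond the drive's persistent cache and with no runtime limit, should expose this behaviour. The 80 GB figure below is just an assumption meant to comfortably exceed a typical 20-40 GB cache area; adjust it to the drive you are testing:

fio --name SMRTEST --eta-newline=5s --filename=temp.file --rw=write --size=80g --io_size=80g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --group_reporting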

Multi-NAND SSDs

Most modern consumer SSDs have slower TLC (triple-level cell) NAND and a small area of SLC (single-level cell) NAND which is used to cache incoming writes. The drive's firmware moves the data in the SLC area to the TLC area when the drive is mostly idle. What this means, in practical terms, is that a 1 GB write test, be it sequential or random writes, will indicate a performance level far higher than what you get once you exceed the SLC area. If the SLC area is 20 GB and you copy 40 GB, you'll find that write performance drops noticeably. Account for this if you will be using an SSD to copy many very large files on a regular basis.
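
One way to see where the SLC cache ends is to run the sequential write test from earlier with a file size well beyond the suspected cache size and compare the result to a short 10 GB run. The 40 GB figure below is just an assumption matching the 20 GB cache example above:

fio --name TEST --eta-newline=5s --filename=temp.file --rw=write --size=40g --io_size=40g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --group_reporting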

Enterprise-grade SSDs mostly do not have this problem - which is something their price reflects. You can be sure that the cheaper consumer-grade SSDs like Kingston's A400 and A1000 series, WD Green and WD Blue and similarly priced drives do behave this way.

Benchmarking Cloud/VPS storage solutions

It is actually quite hard to benchmark the performance you can expect from a cloud provider or a virtual private server provider. You can run benchmarks and get results which may or may not mean something when you deploy real-world applications. Your VPS instance could be writing to the host OS's cache when you think it's doing actual disk writes.

References

  1. The hdparm(8) manual page
  2. The dd(1) manual page