HOWTO Test Disk I/O Performance
Here are several methods for testing I/O performance on GNU/Linux, ranging from irrelevant tools like dd, which are utterly worthless for this purpose, to actually useful ways to determine a drive's real-world performance.
- 1 Almost Useless Methods
- 2 Methods of testing I/O performance which give useful information reflecting real-world use
- 3 Drives With Special Characteristics
- 4 References
Almost Useless Methods
These methods are highly recommended on a lot of pages on the Internet even though they are close to utterly useless. However, they will provide you with some very basic data which can help troubleshoot if a drive is connected to a lower SATA version than expected (you should really be checking that with smartctl --all /dev/sda) and they give you a very rough idea of how a drive performs.
hdparm can give an indication of a drive's sequential read speed and of the read speed from its cache. The typical way of using it is
hdparm --direct -t -T /dev/sdX where
- --direct means we bypass the kernel's cache and use O_DIRECT to the drive's cache
- -T tests read speed from the cache (either the kernel's or, if --direct is used, the drive's)
- -t indicates a drive's read speed.
- /dev/sdX would be your SSD or HDD.
Its output will look like:
# The Seagate Momentus 5400.6, ST9160301AS
# hdparm --direct -t -T /dev/sda
/dev/sda:
 Timing O_DIRECT cached reads: 122 MB in 2.02 seconds = 60.31 MB/sec
 Timing O_DIRECT disk reads: 184 MB in 3.02 seconds = 61.02 MB/sec

# Western Digital Red (WDC WD30EFRX-68EUZN0)
# hdparm --direct -t -T /dev/sda
/dev/sda:
 Timing O_DIRECT cached reads: 946 MB in 2.00 seconds = 472.43 MB/sec
 Timing O_DIRECT disk reads: 464 MB in 3.01 seconds = 154.26 MB/sec

# Samsung SSD 750 EVO 500GB (S36SNWAH582670L)
# hdparm --direct -t -T /dev/sde
/dev/sde:
 Timing O_DIRECT cached reads: 984 MB in 2.00 seconds = 491.87 MB/sec
 Timing O_DIRECT disk reads: 1470 MB in 3.00 seconds = 489.76 MB/sec
You may notice that there's quite the difference between the three outputs above.
hdparm can give you a general idea of how a drive performs. But that's all you get. Two drives with similar numbers could perform very differently in situations where there are many random reads and writes; large sequential disk reads are not a very typical load. Then again, if the numbers are wildly different - like the above numbers are - and you have one drive with O_DIRECT disk reads at 60 MB/sec and another at 490 MB/sec, it's very likely that the one capable of doing 490 MB/sec is faster in just about every single workload.
A lot of pages will recommend using dd to test disk performance. Its manual page clearly indicates that its purpose is to "convert and copy a file".
dd will output how much time it takes to complete an operation - which does give a general idea of how a drive performs. But that's all you get.
You can play with six dd parameters, two of which you should change, to get various performance estimates:
- if= is an input device like /dev/zero
- of= is a file to write like test.file
- bs= is important for "benchmarking". It specifies how many bytes are written per operation. It can be specified using k, M, G, etc.
- count= specifies how many operations to do. bs=2M count=10 writes 2*10 MB = 20 MB of data.
- oflag=dsync is something you want to always include when doing "benchmarks" with dd. oflag= specifies option flags and you want dsync (use synchronized I/O for data). You wouldn't want nonblock (use non-blocking I/O) or other flags.
dd if=/dev/zero of=test.file bs=64M count=1 oflag=dsync
would output something like
# dd if=/dev/zero of=test.file bs=64M count=1 oflag=dsync
1+0 records in
1+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.285204 s, 235 MB/s
while increasing the count to 16 changes the above command so it writes 64MB 16 times = 1 GB:
# dd if=/dev/zero of=test.file bs=64M count=16 oflag=dsync
16+0 records in
16+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.45435 s, 241 MB/s
The output of this test on the Seagate Momentus 5400.6 (ST9160301AS) from the hdparm example above shows that dd can be useful to get an idea of how a drive performs:
# dd if=/dev/zero of=test.file bs=64M count=16 oflag=dsync
16+0 records in
16+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 20.4171 s, 52.6 MB/s
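The amount of data a given bs=/count= pair writes is simply their product, which is worth checking before starting a run. The following is a minimal sketch of that arithmetic in the shell. dd_total_bytes is a hypothetical helper name, and it assumes the block size is given in plain bytes rather than with a k/M/G suffix.

```shell
#!/bin/sh
# Hypothetical helper (not part of dd): total bytes written for a
# block size in bytes ($1) and an operation count ($2).
dd_total_bytes() {
    echo $(( $1 * $2 ))
}

MiB=$((1024 * 1024))
dd_total_bytes $((2 * MiB)) 10    # bs=2M  count=10
dd_total_bytes $((64 * MiB)) 16   # bs=64M count=16
dd_total_bytes 512 1000           # bs=512 count=1000
```

The results for bs=64M count=16 and bs=512 count=1000 are 1073741824 and 512000 bytes, the same byte counts dd itself prints for those parameters.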
Keep in mind that a high bs= combined with count= quickly adds up to a very large amount of data: bs=1G count=64 will write 64 GB.
A low bs= number, as in bs=512 count=1000, will write 512 bytes 1000 times, which amounts to a mere 512 KB. However, the throughput will be much lower since a disk sync is done each time 512 bytes are written. And there will be a difference between newer and older machines:
$ dd if=/dev/zero of=test.file bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 71.6597 s, 7.1 kB/s

$ dd if=/dev/zero of=test.file bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 5.80012 s, 88.3 kB/s
The above outputs are very different and they do give an indication how these machines perform - but it's not that useful.
Methods of testing I/O performance which give useful information reflecting real-world use
Two small programs for disk I/O testing stand out as more useful than most other methods:
fio - The "flexible I/O tester"
fio is available on most distributions as a package with that name. It won't be installed by default, so you will need to get it. You can click apt://fio (Ubuntu) or appstream://fio (Plasma Discover) to install it (on some distributions, anyway).
fio is not at all straightforward or easy to use. It requires quite a lot of parameters. The ones you want are:
- --name names your test run's "job". It's required.
- --eta-newline= forces a new line for every 't' period. You may want --eta-newline=5s
- --filename= specifies the file to read from or write to.
- --rw= specifies if you want to do a read (--rw=read) or write (--rw=write) test.
- --size= decides how big a test file it should use. --size=2g may be a good choice. A file (specified with --filename=) of this size will be created, so you will need to have free space for it. Increasing to --size=20g or more may give a better real-world result for larger HDDs.
  - A small 200 MB file on a modern HDD won't make the read/write heads move very far. A very big file will.
- --io_size= specifies how much I/O fio will do. Setting it to --io_size=10g will make it do 10 GB worth of I/O even if --size specifies a (much) smaller file.
- --blocksize= specifies the block size it will use; --blocksize=1024k may be a good choice.
- --ioengine= specifies an I/O test method to use. There are a lot to choose from. Run fio --enghelp for a long list. fio is a very versatile tool; whole books can be, and probably are, written about it. libaio, as in --ioengine=libaio, is a good choice and it is what we use in the examples below.
- --fsync= tells fio to issue an fsync command, which writes kernel-cached pages to disk, every number of blocks specified.
  - --fsync=1 is useful for testing random reads and writes.
  - --fsync=10000 can be used to test sequential reads and writes.
- --iodepth= specifies a number of I/O units to keep in flight.
- --direct= specifies if direct I/O, which means O_DIRECT on Linux systems, should be used. You want --direct=1 to do disk performance testing.
- --numjobs= specifies the number of jobs. One is enough for disk testing. Increasing this is useful if you want to test how a drive performs when many parallel jobs are running.
- --runtime= makes fio terminate after a given amount of time. This overrides other values specifying how much data should be read or written. Setting --runtime=60 means that fio will exit and show results after 60 seconds even if it's not done reading or writing all the specified data. One minute is typically enough to gather useful data.
- --group_reporting makes fio group its reporting, which makes the output easier to understand.
Put all the above together and we have some long commands for testing disk I/O in various ways.
|Note: A file |
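Since the commands below differ in only a handful of options, it can help to assemble them from one template. The following is a minimal sketch that only prints the command line instead of running it. fio_cmd is a hypothetical helper name, and the fixed values (temp.file, 2g, 10g, 60 seconds) are simply the ones used in this article's examples.

```shell
#!/bin/sh
# Hypothetical helper (not part of fio): print, but do not run, a fio
# command line built from the options described above.
# Arguments: rw mode, block size, fsync interval, I/O depth, number of jobs.
fio_cmd() {
    printf 'fio --name TEST --eta-newline=5s --filename=temp.file'
    printf ' --rw=%s --size=2g --io_size=10g --blocksize=%s' "$1" "$2"
    printf ' --ioengine=libaio --fsync=%s --iodepth=%s --direct=1' "$3" "$4"
    printf ' --numjobs=%s --runtime=60 --group_reporting\n' "$5"
}

fio_cmd read 1024k 10000 32 1     # sequential read with large blocks
fio_cmd randread 4k 1 1 32        # random 4K reads
```

Printing first makes it easy to review the options before pasting the line into a terminal on the drive you actually want to test.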
Testing sequential read speed with very big blocks
fio --name TEST --eta-newline=5s --filename=temp.file --rw=read --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
The resulting output will have a line under Run status group 0 (all jobs): which looks like:
- WD Blue 500 GB SSD (WDC WDS500G2B0A-00SM50): bw=527MiB/s (552MB/s), 527MiB/s-527MiB/s (552MB/s-552MB/s), io=10.0GiB (10.7GB), run=19442-19442msec
- The Seagate Momentus 5400.6: READ: bw=59.0MiB/s (62.9MB/s), 59.0MiB/s-59.0MiB/s (62.9MB/s-62.9MB/s), io=3630MiB (3806MB), run=60518-60518msec
The result should be close to what the drive manufacturer advertises, and it won't be that far off the guesstimates hdparm provides with the -t option. Testing this on a two-drive RAID1 array will result in both drives being utilized:
- Two Samsung SSDs: READ: bw=1037MiB/s (1087MB/s), 1037MiB/s-1037MiB/s (1087MB/s-1087MB/s), io=10.0GiB (10.7GB), run=9878-9878msec
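When running several tests it can be handy to pull just the summary bandwidth out of fio's output. The following is a minimal sketch that assumes the plain-text summary format shown in the results above; other fio versions or output formats may look different. fio_summary_bw is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of fio): read fio's text output on stdin
# and print the direction and aggregate bandwidth from the READ:/WRITE:
# summary lines.
fio_summary_bw() {
    awk '$1 == "READ:" || $1 == "WRITE:" { sub(/^bw=/, "", $2); print $1, $2 }'
}

# Sample input copied from the Momentus result above.
printf '%s\n' \
    'Run status group 0 (all jobs):' \
    '   READ: bw=59.0MiB/s (62.9MB/s), 59.0MiB/s-59.0MiB/s (62.9MB/s-62.9MB/s), io=3630MiB (3806MB), run=60518-60518msec' \
    | fio_summary_bw
```

In real use the fio command itself would be piped into fio_summary_bw in place of the sample text.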
Testing sequential write speed with very big blocks
fio --name TEST --eta-newline=5s --filename=temp.file --rw=write --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
This will produce a line under "Run status group 0 (all jobs):" like
- WRITE: bw=55.8MiB/s (58.5MB/s), 55.8MiB/s-55.8MiB/s (58.5MB/s-58.5MB/s), io=3378MiB (3542MB), run=60575-60575msec
|Note: Many modern SSDs with TLC (Triple Level Cell) NAND will have a potentially large SLC (Single Level Cell) area used to cache writes. The drive's firmware moves that data to the TLC area when the drive is otherwise idle. Doing 10 GB of I/O to a 2 GB file during 60 seconds - what the above example does - is not anywhere near enough to account for the SLC cache on such drives.
You will probably not be copying 100 GB to a 240 GB SSD on a regular basis so that may have little to no practical significance. However, do know that if you do a test (assuming you have 80 GB free) of a WD Green SSD with 100 GB of I/O to an 80 GB file with a 5 minute (60*5=300) limit you'll get a lot lower results than you get if you write 10 GB to a 2 GB file. To test this yourself, try the write test above with --size=80g --io_size=100g --runtime=300.
You need to increase size (files used for testing), io_size (amount of I/O done) and runtime (length the test is allowed to run) to bypass a drive's caches.|
Testing random 4K reads
Testing random reads is best done with a queue-depth of just one (--iodepth=1) and 32 concurrent jobs (--numjobs=32). This will reflect real-world read performance.
fio --name TEST --eta-newline=5s --filename=temp.file --rw=randread --size=2g --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=32 --runtime=60 --group_reporting
Some example results:
- The Seagate Momentus 5400.6: READ: bw=473KiB/s (484kB/s), 473KiB/s-473KiB/s (484kB/s-484kB/s), io=27.9MiB (29.2MB), run=60334-60334msec
- WD Blue 500 GB SSD (WDC WDS500G2B0A-00SM50): READ: bw=284MiB/s (297MB/s), 284MiB/s-284MiB/s (297MB/s-297MB/s), io=16.6GiB (17.8GB), run=60001-60001msec
As these example results show, the difference between an older 5400 RPM HDD and an average low-end SSD is staggering when it comes to random I/O. There is a world of difference between half a megabyte and 284 megabytes per second.
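Dividing the bandwidth by the 4 KiB block size turns these figures into operations per second, which makes the gap easier to picture. The following is a minimal sketch of that arithmetic. iops_from_bw is a hypothetical helper name, and the integer division rounds down. fio's full output also reports IOPS directly, so this is only a cross-check.

```shell
#!/bin/sh
# Hypothetical helper: approximate operations per second from a bandwidth
# in KiB/s ($1) and a block size in KiB ($2). Integer division rounds down.
iops_from_bw() {
    echo $(( $1 / $2 ))
}

iops_from_bw 473 4                # Seagate Momentus: 473 KiB/s
iops_from_bw $((284 * 1024)) 4    # WD Blue SSD: 284 MiB/s
```

With the numbers above this works out to roughly 118 random reads per second for the HDD against roughly 72704 for the SSD.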
Mixed random 4K read and write
--rw=randrw tells fio to do both reads and writes. And again, a queue-depth of just one (--iodepth=1) and 32 concurrent jobs (--numjobs=32) will reflect high real-world load. This test will show the absolute worst I/O performance you can expect. Don't be shocked if an HDD shows performance numbers that are in the low percentages of what its specifications claim it can do.
fio --name TEST --eta-newline=5s --filename=temp.file --rw=randrw --size=2g --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=32 --runtime=60 --group_reporting
bonnie++ is a disk and filesystem benchmark suite. Installing it (the package is typically named bonnie++) and running it with no parameters will start a benchmark of the disk and file system in the current working directory.
bonnie++ will take care of caching and syncing, and it tests random reads and writes as well as small and large file system updates. Its tests do take some time, and by "some time" we mean hours on an old machine with an older HDD.
bonnie++ provides more real-world-like testing than hdparm and dd do. However, it does have some flaws: it is single-threaded, which means that some operations will appear to be slower than they actually are on any multi-core machine.
bonnie++ is a nice tool if you just want to install something and run it without having to think about adding parameters and wait and get usable results.
|Note: bonnie++ will write to a test file which will be named Bonnie.$pid. This file is left behind if you abort it by pressing ctrl-c. It can be many gigabytes large.|
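Leftover test files from an aborted run can be removed with a small helper. The following is a minimal sketch that assumes the files follow the Bonnie.$pid naming described in the note above and only touches regular files in the directory it is given. bonnie_cleanup is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of bonnie++): remove leftover bonnie++
# test files from the directory given as $1.
# Assumes the files are named Bonnie.<pid>, as described in the note above.
bonnie_cleanup() {
    for f in "$1"/Bonnie.*; do
        # Skip the unexpanded pattern and anything that is not a regular file.
        [ -f "$f" ] || continue
        echo "removing $f"
        rm -f -- "$f"
    done
}

# Example: clean the current directory after an aborted run.
# bonnie_cleanup .
```

The example call is left commented out so that nothing is deleted until a directory has been chosen deliberately.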
Drives With Special Characteristics
Some HDDs and storage-solutions have special properties which should be accounted for.
"Shingled magnetic recording" (SMR) drives
Seagate has a technology called "Shingled magnetic recording" (SMR) which crams tracks closer together than they should be. Writing to a track on an SMR drive makes the drive re-write nearby tracks too. These drives will have a large on-board memory buffer and a "normal" area on the platters for caching writes that need to be done the "SMR-way" later on. This area is typically 20-40 GB depending on the drive's size. The result is that SMR drives behave in a way regular drives don't: the first 20 GB written to an SMR drive will be written at speeds that are fairly normal for a modern HDD. Additional data written after that will bring write speeds to a crawl, as in near zero, while the drive writes out the data in its "write-buffer" and re-writes the tracks near those where the new data is placed.
SMR drives can be accurately benchmarked by writing a really large amount of data to them (60 GB or so). What you'll find is that read and write speeds are absolutely dismal once the buffer is full. This is why it's best to simply avoid Shingled magnetic recording drives.
Most modern consumer SSDs have slower TLC (triple-level cell) NAND and a small area of SLC (single-level cell) which is used to cache immediate writes. The drive's firmware will move the data in the SLC area to the TLC area when the drive is mostly idle. What this means, in practical terms, is that a 1 GB write test, be it sequential or random writes, will indicate a performance level which is far higher than what you get if you exceed the SLC area. If the SLC area is 20 GB and you copy 40 GB you'll find that write performance drops by a noticeable amount. Account for this if you will be using an SSD to copy many very large files on a regular basis.
Enterprise grade SSDs will mostly not have this problem - which is something their price will reflect. You can be sure all the cheaper consumer grade SSDs like Kingston's A400 and A1000 series, WD Green and WD Blue and similarly priced drives do have this kind of behaviour.
Benchmarking Cloud/VPS storage solutions
It is actually quite hard to benchmark the performance you can expect from a cloud provider or a virtual private server provider. You can run benchmarks and get results which may or may not mean something when you deploy real-world applications. Your VPS instance could be writing to the host OS's cache when you think it's doing actual disk writes.