HOWTO Test Disk I/O Performance
Here are several methods for testing I/O performance on GNU/Linux, ranging from nearly worthless tools like dd to genuinely useful ways of determining a drive's real-world performance.
Almost Useless Methods
These methods are highly recommended on a lot of pages on the Internet even though they are close to useless. They will, however, provide some very basic data which can help you troubleshoot a drive that is connected at a lower SATA speed than expected (you should really be checking that with smartctl --all /dev/sda), and they can give you a very rough idea of how a drive performs.
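If all you want to check is which SATA speed a drive negotiated, you can filter the smartctl output for it. A minimal example (it assumes smartmontools is installed and the drive is /dev/sda; the exact wording of the line varies a little between drives):

smartctl --all /dev/sda | grep -i 'sata version'

On most SATA drives this prints both the highest link speed the drive supports and the speed currently in use, which is enough to spot a drive stuck on an older SATA version.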
hdparm
hdparm can give an indication of a drive's sequential read speed and of reads from the drive's cache. The typical way of using it is hdparm --direct -t -T /dev/sdX where

- --direct means we by-pass the kernel's cache and use O_DIRECT to the drive's cache[1]
- -T tests read speed from the cache (either the kernel's, or the drive's if --direct is used)[1]
- -t indicates a drive's read speed[1]
- /dev/sdX would be your SSD or HDD.

Its output will look like:
# The Seagate Momentus 5400.6, ST9160301AS
# hdparm --direct -t -T /dev/sda

/dev/sda:
 Timing O_DIRECT cached reads: 122 MB in 2.02 seconds = 60.31 MB/sec
 Timing O_DIRECT disk reads: 184 MB in 3.02 seconds = 61.02 MB/sec
or
# Western Digital Red (WDC WD30EFRX-68EUZN0)
# hdparm --direct -t -T /dev/sda

/dev/sda:
 Timing O_DIRECT cached reads: 946 MB in 2.00 seconds = 472.43 MB/sec
 Timing O_DIRECT disk reads: 464 MB in 3.01 seconds = 154.26 MB/sec
or
# Samsung SSD 750 EVO 500GB (S36SNWAH582670L)
# hdparm --direct -t -T /dev/sde

/dev/sde:
 Timing O_DIRECT cached reads: 984 MB in 2.00 seconds = 491.87 MB/sec
 Timing O_DIRECT disk reads: 1470 MB in 3.00 seconds = 489.76 MB/sec
You may notice that there is quite a difference between the three outputs above. hdparm can give you a general idea of how a drive performs, but that is all you get. Two drives with similar numbers could perform very differently in situations with many random reads and writes; large sequential disk reads are not a very typical load. Then again, if the numbers are wildly different - like the numbers above are - and you have one drive doing O_DIRECT disk reads at 60 MB/sec and another at 490 MB/sec, it is very likely that the one capable of 490 MB/sec is faster in just about every single workload.
dd
A lot of pages will recommend using dd to test disk performance. Its manual page clearly indicates that its purpose is to "convert and copy a file"[2]. dd will output how much time it takes to complete an operation, which does give a general idea of how a drive performs. But that is all you get.
You can play with six dd parameters, two of which you should change, to get various performance-estimates:[2]

- if= an input device like /dev/zero or /dev/random
- of= a file to write to, like test.file
- bs= is important for "benchmarking". It specifies how many bytes are written per operation. It can be specified using k, M, G, etc.
- count= specifies how many operations to do. bs=2M count=10 writes 10*2 MB = 20 MB of data.
- oflag=dsync is something you always want to include when doing "benchmarks" with dd.
- oflag= specifies option flags and you want dsync (use synchronized I/O for data). You would not want nonblock, which specifies use non-blocking I/O, or other flags.
dd if=/dev/zero of=test.file bs=64M count=1 oflag=dsync
would output something like
# dd if=/dev/zero of=test.file bs=64M count=1 oflag=dsync
1+0 records in
1+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.285204 s, 235 MB/s
while increasing the count to 16 changes the above command so it writes 64MB 16 times = 1 GB:
# dd if=/dev/zero of=test.file bs=64M count=16 oflag=dsync
16+0 records in
16+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.45435 s, 241 MB/s
The output of this test on the Seagate Momentus 5400.6 (ST9160301AS) from the hdparm example above shows that dd can be useful for getting an idea of how a drive performs:
# dd if=/dev/zero of=test.file bs=64M count=16 oflag=dsync
16+0 records in
16+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 20.4171 s, 52.6 MB/s
Keep in mind that a high bs= combined with a high count= will write a very large amount of data; bs=1G count=64 will write 64 GB of data.
A low bs= value, as in bs=512 count=1000, will write 512 bytes 1000 times, which amounts to a mere 512 KB. However, the throughput will be much lower since a disk sync is done each time 512 bytes are written. And there will be a difference between newer and older machines.
Old hardware:
$ dd if=/dev/zero of=test.file bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 71.6597 s, 7.1 kB/s
New hardware:
$ dd if=/dev/zero of=test.file bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 5.80012 s, 88.3 kB/s
The above outputs are very different and they do give an indication of how these machines perform - but it's not all that useful.
Methods of testing I/O performance which give useful information reflecting real-world use
Two small programs for disk I/O testing stand out as more useful than most other methods: fio and bonnie++.
fio - The "flexible I/O tester"
fio is available on most distributions as a package with that name. It won't be installed by default; you will need to install it yourself. You can click apt://fio (Ubuntu) or appstream://fio (Plasma Discover) to install it (on some distributions, anyway).
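If you prefer the command line, installing it through the package manager also works; the package is simply named fio on the major distributions:

sudo apt install fio
sudo dnf install fio

The first line is for Debian/Ubuntu-based systems, the second for Fedora and derivatives.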
fio is not at all straightforward or easy to use; it requires quite a lot of parameters. The ones you want are:
- --name names your test-run's "job". It is required.
- --eta-newline= forces a new line for every 't' period. You may want --eta-newline=5s.
- --filename= specifies the file to read from or write to.
- --rw= specifies whether you want a read (--rw=read) or a write (--rw=write) test.
- --size= decides how big a test file it should use. --size=2g may be a good choice. A file of this size (specified with --filename=) will be created, so you will need to have free space for it. Increasing to --size=20g or more may give a better real-world result for larger HDDs: a small 200 MB file on a modern HDD won't make the read/write heads move very far, a very big file will.
- --io_size= specifies how much I/O fio will do. Setting it to --io_size=10g will make it do 10 GB worth of I/O even if --size= specifies a (much) smaller file.
- --blocksize= specifies the block-size it will use. --blocksize=1024k may be a good choice.
- --ioengine= specifies the I/O test method to use. There is a lot to choose from; run fio --enghelp for a long list. fio is a very versatile tool, whole books could be (and probably are) written about it. libaio, as in --ioengine=libaio, is a good choice and it is what we use in the examples below.
- --fsync= tells fio to issue an fsync command, which writes kernel cached pages to disk, every given number of blocks. --fsync=1 is useful for testing random reads and writes. --fsync=10000 can be used to test sequential reads and writes.
- --iodepth= specifies the number of I/O units to keep in-flight.
- --direct= specifies whether direct I/O, which means O_DIRECT on Linux systems, should be used. You want --direct=1 when doing disk performance testing.
- --numjobs= specifies the number of jobs. One is enough for disk testing. Increasing this is useful if you want to test how a drive performs when many parallel jobs are running.
- --runtime= makes fio terminate after a given amount of time. This overrides other values specifying how much data should be read or written. Setting --runtime=60 means that fio will exit and show results after 60 seconds even if it is not done reading or writing all the specified data. One minute is typically enough to gather useful data.
- --group_reporting makes fio group its reporting, which makes the output easier to understand.
Put all the above together and we have some long commands for testing disk I/O in various ways.
Note: The file specified with --filename= will be created with the specified --size= on the first run. The file is created using random data due to the way some drives handle zeros. It can be re-used in later runs if you specify the same filename and size each run.
Testing sequential read speed with very big blocks
fio --name TEST --eta-newline=5s --filename=temp.file --rw=read --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
The resulting output will have a line under "Run status group 0 (all jobs):" which looks like:
- WD Blue 500 GB SSD (WDC WDS500G2B0A-00SM50): bw=527MiB/s (552MB/s), 527MiB/s-527MiB/s (552MB/s-552MB/s), io=10.0GiB (10.7GB), run=19442-19442msec
- The Seagate Momentus 5400.6: READ: bw=59.0MiB/s (62.9MB/s), 59.0MiB/s-59.0MiB/s (62.9MB/s-62.9MB/s), io=3630MiB (3806MB), run=60518-60518msec
The result should be close to what the hard drive manufacturer advertises, and it won't be that far off the guesstimates hdparm provides with the -t option. Testing this on a two-drive RAID1 array will result in both drives being utilized:
- Two Samsung SSDs: READ: bw=1037MiB/s (1087MB/s), 1037MiB/s-1037MiB/s (1087MB/s-1087MB/s), io=10.0GiB (10.7GB), run=9878-9878msec
Testing sequential write speed with very big blocks
fio --name TEST --eta-newline=5s --filename=temp.file --rw=write --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
This will produce a line under "Run status group 0 (all jobs):" like:
- WRITE: bw=55.8MiB/s (58.5MB/s), 55.8MiB/s-55.8MiB/s (58.5MB/s-58.5MB/s), io=3378MiB (3542MB), run=60575-60575msec
Note: Many modern SSDs with TLC (Triple-Level Cell) NAND have a potentially large SLC (Single-Level Cell) area used to cache writes. The drive's firmware moves that data to the TLC area when the drive is otherwise idle. Doing 10 GB of I/O to a 2 GB file during 60 seconds - what the above example does - is not anywhere near enough to account for the SLC cache on such drives.
You will probably not be copying 100 GB to a 240 GB SSD on a regular basis, so that may have little to no practical significance. However, do know that if you test a WD Green SSD with 100 GB of I/O to an 80 GB file (assuming you have 80 GB free) with a 5 minute (60*5=300 seconds) limit, you will get much lower results than you get when writing 10 GB to a 2 GB file. To test this yourself you need to increase --size= (the file used for testing), --io_size= (the amount of I/O done) and --runtime= (the length of time the test is allowed to run) enough to by-pass the drive's caches.
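A sketch of such a test, built from the sequential write command above with the larger values from this note - an 80 GB file, 100 GB of I/O and a 300 second limit (make sure you actually have the free space, and lower --size= if you don't):

fio --name TEST --eta-newline=5s --filename=temp.file --rw=write --size=80g --io_size=100g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=300 --group_reporting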
Testing random 4K reads
Testing random reads is best done with a queue-depth of just one (--iodepth=1) and 32 concurrent jobs (--numjobs=32). This will reflect real-world read performance.
fio --name TEST --eta-newline=5s --filename=temp.file --rw=randread --size=2g --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=32 --runtime=60 --group_reporting
Some example results:
- The Seagate Momentus 5400.6: READ: bw=473KiB/s (484kB/s), 473KiB/s-473KiB/s (484kB/s-484kB/s), io=27.9MiB (29.2MB), run=60334-60334msec
- WD Blue 500 GB SSD (WDC WDS500G2B0A-00SM50): READ: bw=284MiB/s (297MB/s), 284MiB/s-284MiB/s (297MB/s-297MB/s), io=16.6GiB (17.8GB), run=60001-60001msec
As these example results show, the difference between an older 5400 RPM HDD and an average low-end SSD is staggering when it comes to random I/O. There is a world of difference between half a megabyte and 284 megabytes per second.
Mixed random 4K read and write
The --rw option randrw tells fio to do both reads and writes. And again, a queue-depth of just one (--iodepth=1) and 32 concurrent jobs (--numjobs=32) will reflect a high real-world load. This test will show the absolute worst I/O performance you can expect. Don't be shocked if an HDD shows performance numbers that are in the low percentages of what its specifications claim it can do.
fio --name TEST --eta-newline=5s --filename=temp.file --rw=randrw --size=2g --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=32 --runtime=60 --group_reporting
bonnie++
bonnie++ is a disk and filesystem benchmark suite. Installing it (the package is typically named bonnie++) and running it with no parameters will start a benchmark of the disk and file system in the current working directory.
bonnie++ will take care of caching and syncing, test random reads and writes, and test small and large file system updates. Its tests do take some time, and by "some time" we mean hours on an old machine with an older HDD.
bonnie++ provides more real-world-like testing than hdparm and dd do. However, it does have some flaws: it is single-threaded, which means that some operations will appear slower than they actually are on any multi-core machine.
bonnie++ is a nice tool if you just want to install something, run it without having to think about parameters, wait, and get usable results.
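If you want a little more control, one option worth knowing about is -d, which picks the directory to test (a minimal sketch; /mnt/test is just an example, use any directory on the drive you want to test):

bonnie++ -d /mnt/test

Run it as a regular user with write access to that directory. If you run it as root you have to add -u to tell it which user to run the tests as, for example -u nobody, and that user needs write access to the test directory.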
Note: bonnie++ will write to a test file named Bonnie.$pid. This file is left behind if you abort the run by pressing ctrl-c. It can be many gigabytes large.
Drives With Special Characteristics
Some HDDs and storage-solutions have special properties which should be accounted for.
"Shingled magnetic recording" (SMR) drives[edit]
Seagate and other manufacturers have a technology called "shingled magnetic recording" (SMR) which crams tracks closer together than they should be. Writing to a track on an SMR drive makes the drive re-write nearby tracks too. These drives have a large on-board memory buffer and a "normal" area on the platters for caching writes that need to be done the "SMR way" later on. This area is typically 20-40 GB depending on the drive's size. The result is that SMR drives behave in a way regular drives don't: the first 20 GB or so written to an SMR drive will be written at speeds that are fairly normal for a modern HDD. Additional data written after that will bring write speeds to a crawling halt - as in near-zero - while the drive writes out the data in its "write-buffer" and re-writes the tracks near those where the new data is placed.
SMR drives can be accurately benchmarked by writing a really large amount of data (60 GB or so) to them. What you'll find is that read and write speeds become absolutely dismal once the buffer is full. This is why it is best to simply avoid shingled magnetic recording drives.
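One way to do that with fio is to re-use the sequential write test from earlier in this article, drop the time limit and raise the amount of data past the drive's cache area (a sketch; the 60 GB figure is taken from the paragraph above and the job name SMRTEST is arbitrary):

fio --name SMRTEST --eta-newline=5s --filename=temp.file --rw=write --size=60g --io_size=60g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --group_reporting

Watch the bandwidth figures fio prints while it runs: they will typically start out at normal HDD speeds and collapse once the drive's persistent cache area is full.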
Multi-NAND SSDs
Most modern consumer SSDs have slower TLC (triple-level cell) NAND and a small area of SLC (single-level cell) NAND which is used to cache immediate writes. The drive's firmware will move the data in the SLC area to the TLC area when the drive is mostly idle. What this means, in practical terms, is that a 1 GB write test, be it sequential or random writes, will indicate a performance level far higher than what you get if you exceed the SLC area. If the SLC area is 20 GB and you copy 40 GB you'll find that write performance drops by a noticeable amount. Account for this if you will be using an SSD to copy many very large files on a regular basis.
Enterprise grade SSDs mostly do not have this problem - which is something their price will reflect. You can be sure that cheaper consumer grade SSDs like Kingston's A400 and A1000 series, WD Green and WD Blue and similarly priced drives do have this kind of behaviour.
Benchmarking Cloud/VPS storage solutions
It is actually quite hard to benchmark the performance you can expect from a cloud provider or a virtual private server provider. You can run benchmarks and get results which may or may not mean something when you deploy real-world applications. Your VPS instance could be writing to the host OS's cache when you think it's doing actual disk writes.
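If you do benchmark a VPS, the mixed random read/write test above, which already uses --direct=1 and --fsync=1, is probably the most honest number you can get, and running it for longer can reduce the influence of any host-side caching. A sketch (the 300 second runtime is an arbitrary choice):

fio --name TEST --eta-newline=5s --filename=temp.file --rw=randrw --size=2g --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=32 --runtime=300 --group_reporting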