Memory benchmarking

From LinuxReviews

Linux distributions appear to lack very simple tools for doing simple memory benchmarks of computers and VPS nodes. However, there are some tools which can be easily compiled and used to get a rough yet useful estimate of a computer's memory performance.

STREAM

The Department of Computer Science at the University of Virginia in the US has a very simple memory benchmarking tool called STREAM. It is written with "supercomputers" in mind but it works just fine on any computer.

STREAM's homepage has little but links to benchmark results, which are less interesting. However, the FAQ's getting started page has a link to the raw source code and instructions for compiling it. You only need to download one file and compile it with GCC using standard options to use STREAM on a GNU/Linux machine. Download stream.c:

wget http://www.cs.virginia.edu/stream/FTP/Code/stream.c

(stream.c 5.10, copy in case it gets removed)

and compile it:

gcc -O stream.c -o stream

then run your fine stream binary:

./stream

Results

AMD A8-7600 Radeon R7 / DDR3 1600

Running ./stream produces the following on an AMD A8-7600 Radeon R7 with DDR3 memory running at 1600 MT/s (the speed can be checked with dmidecode | grep MT):

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            6762.8     0.024147     0.023659     0.025088
Scale:           6749.1     0.024086     0.023707     0.025246
Add:             7896.6     0.030838     0.030393     0.032152
Triad:           7896.3     0.030804     0.030394     0.032172
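
The "Best Rate" column can be reproduced from the "Min time" column, which helps when reading these tables. The following is a minimal sketch, assuming the default array size of 10000000 elements and 8-byte doubles; STREAM counts two arrays per element for Copy and Scale and three for Add and Triad. The function name stream_rate is made up for this example.

```shell
#!/bin/sh
# stream_rate BYTES_PER_ELEMENT ARRAY_SIZE MIN_TIME
# Prints the rate in MB/s (10^6 bytes per second), the unit STREAM reports.
stream_rate() {
    awk -v b="$1" -v n="$2" -v t="$3" 'BEGIN { printf "%.1f\n", b * n / t / 1e6 }'
}

# Copy and Scale touch two arrays of 8-byte doubles per element (16 bytes),
# Add and Triad touch three (24 bytes).
stream_rate 16 10000000 0.023659   # Copy row of the A8-7600 table
stream_rate 24 10000000 0.030394   # Triad row of the A8-7600 table
```

This prints 6762.8 and 7896.3, matching the table. It also shows why Add and Triad report higher rates than Copy and Scale even though they take longer: they move more bytes per element.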

AMD Opteron(tm) Processor 3365 / DDR3 1333

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            6238.6     0.026221     0.025647     0.028108
Scale:           6035.5     0.027099     0.026510     0.028166
Add:             7016.9     0.035991     0.034203     0.038751
Triad:           6783.5     0.036459     0.035380     0.039577

AMD Ryzen 5 2400G / DDR4 3000

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           15299.3     0.010567     0.010458     0.010715
Scale:          15714.2     0.010439     0.010182     0.010942
Add:            19301.9     0.012571     0.012434     0.012680
Triad:          19060.0     0.012695     0.012592     0.012823

AMD Ryzen 5 1600X / DDR4 2666

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           14815.0     0.011304     0.010800     0.012203
Scale:          15555.3     0.011094     0.010286     0.012718
Add:            18744.1     0.013326     0.012804     0.014540
Triad:          18873.8     0.013266     0.012716     0.014157

Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz / DDR3 1600

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           13963.0     0.011535     0.011459     0.011761
Scale:          13701.0     0.012012     0.011678     0.012899
Add:            15690.4     0.015560     0.015296     0.016412
Triad:          15439.2     0.015704     0.015545     0.016123

Intel(R) Pentium(R) CPU N4200 @ 1.10GHz / DDR3 1600

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            6326.4     0.026008     0.025291     0.027652
Scale:           6189.1     0.028329     0.025852     0.033533
Add:             6569.6     0.036891     0.036532     0.037660
Triad:           6708.0     0.036325     0.035778     0.037086

Testing larger amounts

Compiling stream with -DSTREAM_ARRAY_SIZE=100000000 increases its memory use from a default of just 0.2 GB to 2.2 GB, which is nothing on modern machines. It may be wise to give the binary a name reflecting the array size used to compile it:

gcc -O -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream.100M

The results are very similar with the larger array size. The memory test does take longer.
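
The memory figures for the two array sizes follow from simple arithmetic: STREAM allocates three arrays of 8-byte doubles. A rough sketch of that calculation, with stream_footprint_mib being a name made up for this example:

```shell
#!/bin/sh
# stream_footprint_mib ARRAY_SIZE
# Three arrays of 8-byte doubles, reported in whole MiB (rounded down).
stream_footprint_mib() {
    echo $(( $1 * 8 * 3 / 1048576 ))
}

stream_footprint_mib 10000000    # default array size
stream_footprint_mib 100000000   # -DSTREAM_ARRAY_SIZE=100000000
```

This prints 228 and 2288, that is roughly 0.2 and 2.2 when expressed in units of 1024 MiB. The same function can be used to pick an array size that comfortably fits in a small VPS.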

sysbench

sysbench is a command-line benchmarking suite which is available in most distributions under the package name sysbench. It comes with several built-in tests:

  • fileio - File I/O test
  • cpu - CPU performance test
  • memory - Memory functions speed test
  • threads - Threads subsystem performance test
  • mutex - Mutex performance test

The memory test will, as the name implies, benchmark memory performance. It has several options which can be listed by running sysbench memory help:

  • --memory-block-size=SIZE size of memory block for test [1K]
  • --memory-total-size=SIZE total size of data to transfer [100G]
  • --memory-scope=STRING memory access scope {global,local} [global]
  • --memory-hugetlb[=on|off] allocate memory from HugeTLB pool [off]
  • --memory-oper=STRING type of memory operations {read, write, none} [write]
  • --memory-access-mode=STRING memory access mode {seq,rnd} [seq]

Running sysbench with the default options for a test is done by sysbench testname run. To run a write memory test, which is the default, type:

sysbench memory run

You may also want to check sysbench --memory-access-mode=rnd memory run to see the differences between sequential (the default test) and random memory writes.
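
Running both access modes back to back and keeping only the throughput figure makes the comparison easier. The script below is a sketch, not part of sysbench: the function names and the "run" argument are made up for this example, and it assumes the throughput appears in the output as a number followed by "MiB/sec", which is the unit used in the results below.

```shell
#!/bin/sh
# Read sysbench memory output on stdin and print only the first
# throughput figure, e.g. "5979.86 MiB/sec".
extract_rate() {
    grep -o '[0-9.][0-9.]* MiB/sec' | head -n 1
}

# Run the default write test once per access mode and label each result.
compare_modes() {
    for mode in seq rnd; do
        printf '%s: ' "$mode"
        sysbench --memory-access-mode="$mode" memory run | extract_rate
    done
}

# Only start the benchmark when the script is called with "run".
if [ "${1:-}" = "run" ]; then
    compare_modes
fi
```

Saved under a name of your choice, for example membench.sh, it is started with sh membench.sh run and prints one seq line and one rnd line.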

AMD A8-7600 Radeon R7 / DDR3 1600 (4x4 GB)

  • seq: 2767.71 MiB/sec
  • rnd: 538.37 MiB/sec

AMD Opteron(tm) Processor 3365 / DDR3 1333

  • seq: 2229.98 MiB/sec

AMD Ryzen 5 2400G / DDR4 2800 (2x16 GB)

  • seq: 5979.86 MiB/sec
  • rnd: 1872.83 MiB/sec

AMD Ryzen 5 1600X / DDR4 2666 (4x8 GB)

  • seq: 6197.66 MiB/sec
  • rnd: 1729.55 MiB/sec

Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz / DDR3 1600

  • seq: 4936.25 MiB/sec
  • rnd: 1533.87 MiB/sec

Intel(R) Pentium(R) CPU N4200 @ 1.10GHz / DDR3 1600

  • seq: 3400.94 MiB/sec

Takeaways

Memory performance depends at least as much on the CPU as on the memory type and speed. This is especially true of DDR3 1600 on both older AMD processors and low-powered Intel processors like the N4200. You would expect the same type of memory stick to perform similarly in two different machines, but that is simply not the case. The CPU and platform (and perhaps other factors like the motherboard) make a huge difference.
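
One way to put numbers on this is to compare the measured STREAM rate with the theoretical peak of the memory itself. A DDR channel is 64 bits wide, so each transfer moves 8 bytes and DDR3 1600 peaks at 12800 MB/s per channel. The sketch below assumes two memory channels for every machine, which is a guess since the channel count was not recorded with the results; the function names are made up for this example.

```shell
#!/bin/sh
# peak_mbs MT_PER_S CHANNELS
# Theoretical peak in MB/s: each transfer moves 8 bytes (64-bit channel).
peak_mbs() {
    echo $(( $1 * 8 * $2 ))
}

# percent_of_peak MEASURED_MBS MT_PER_S CHANNELS (whole percent, rounded down)
percent_of_peak() {
    echo $(( $1 * 100 / $(peak_mbs "$2" "$3") ))
}

# STREAM Triad results from the tables above, truncated to whole MB/s.
# Two channels is an assumption, not something the results state.
percent_of_peak 7896 1600 2    # AMD A8-7600
percent_of_peak 15439 1600 2   # Intel Core i7-5500U
percent_of_peak 6708 1600 2    # Intel Pentium N4200
```

Under that assumption the three DDR3 1600 machines reach about 30, 60 and 26 percent of the same theoretical peak. A single-threaded run is not expected to reach the peak on any machine, so only the relative difference is meaningful here.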


Anonymous (409cac0fc2)

$ dmesg | grep -i CPU0
[ 0.316999] smpboot: CPU0: AMD Phenom(tm) II X4 965 Processor (family: 0x10, model: 0x4, stepping: 0x3)
$ ./stream.100M
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            6064.5     0.026591     0.026383     0.027162
Scale:           5908.2     0.027230     0.027081     0.027543
Add:             6920.2     0.034789     0.034681     0.035098
Triad:           6864.2     0.035369     0.034964     0.036059