Talk:Memory benchmarking
From LinuxReviews
The results are being pessimized by not compiling with -O2 or -O3. Two runs built with gcc:

% ./streamOO   # compiled with gcc -O0 (the default)
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 80000000 (elements), Offset = 0 (elements)
Memory per array = 610.4 MiB (= 0.6 GiB).
Total memory required = 1831.1 MiB (= 1.8 GiB).
Each kernel will be executed 20 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 102530 microseconds.
   (= 102530 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           78073.3     0.025361     0.016395     0.043100
Scale:          46793.9     0.039523     0.027354     0.071281
Add:            59053.5     0.053389     0.032513     0.131950
Triad:          59999.9     0.047117     0.032000     0.083272
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

% ./stream3   # compiled with gcc -O3
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 80000000 (elements), Offset = 0 (elements)
Memory per array = 610.4 MiB (= 0.6 GiB).
Total memory required = 1831.1 MiB (= 1.8 GiB).
Each kernel will be executed 20 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 27529 microseconds.
   (= 27529 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           91584.9     0.016514     0.013976     0.025457
Scale:          66969.9     0.023857     0.019113     0.041276
Add:            74011.0     0.037041     0.025942     0.118478
Triad:          76138.9     0.035763     0.025217     0.092531
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
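To make the comparison easier to read, here is a small sketch that recomputes the "Best Rate MB/s" column from the reported Min time and prints the speedup of the -O3 build over the default build. It assumes STREAM's usual byte accounting (Copy and Scale touch two arrays per iteration, Add and Triad touch three); the function and variable names are illustrative only and are not part of STREAM.

```python
# Sketch: recompute STREAM "Best Rate MB/s" from the reported Min time.
# Assumption: Copy and Scale touch two arrays per iteration, Add and
# Triad touch three, which is how stream.c counts bytes moved.

ARRAY_ELEMENTS = 80_000_000   # "Array size" from the runs above
BYTES_PER_ELEMENT = 8         # "This system uses 8 bytes per array element"

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column in seconds, copied from the two runs above.
MIN_TIME_DEFAULT = {"Copy": 0.016395, "Scale": 0.027354,
                    "Add": 0.032513, "Triad": 0.032000}
MIN_TIME_O3 = {"Copy": 0.013976, "Scale": 0.019113,
               "Add": 0.025942, "Triad": 0.025217}


def best_rate_mb_s(kernel, min_time):
    """Bandwidth in MB/s (10^6 bytes per second) for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return 1.0e-6 * bytes_moved / min_time


def speedup(kernel):
    """How many times faster the -O3 build is than the default build."""
    return MIN_TIME_DEFAULT[kernel] / MIN_TIME_O3[kernel]


if __name__ == "__main__":
    for kernel in ARRAYS_TOUCHED:
        default_rate = best_rate_mb_s(kernel, MIN_TIME_DEFAULT[kernel])
        o3_rate = best_rate_mb_s(kernel, MIN_TIME_O3[kernel])
        print(f"{kernel:6s} default {default_rate:9.1f} MB/s   "
              f"-O3 {o3_rate:9.1f} MB/s   speedup {speedup(kernel):.2f}x")
```

The recomputed rates agree with the reported ones to within the rounding of the six-digit Min time, and the gain from -O3 is largest for Scale and smallest for Copy.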