The Benefits Of Having A Compressed zram Swap Device On Linux

From LinuxReviews

The zram Linux kernel module lets you create a compressed block device in memory that can be used as a compressed, memory-backed swap device. There are trade-offs: compile times and similar workloads can finish faster with a zram swap device, but it can also cause a large performance penalty. In our tests, a bigger zram swap was better on real hardware with a slow HDD, while a smaller one was better in a virtual machine.

Written by 윤채경 (Yoon Chae-kyung). Published 2020-09-19, last edited 2020-09-20.

vmstat showing a machine swapping in and out due to memory over-commitment. Failure to deal with intensive swapping on a high-value production server could get you fired.

Linux, and other operating systems, tend to slow to a crawl in low-memory situations. Hard drives, SSDs and even NVMe drives are orders of magnitude slower than RAM. This leads to a huge performance drop when a machine runs out of physical memory and starts using a spinning hard drive or an SSD as if it were fast random access memory. This is where a compressed-memory swap device could, and should, be beneficial, but sometimes it isn't.

How much performance benefit, if any, does a zram compressed RAM swap device provide? We decided to find out by measuring how long it takes to compile Chromium 85. We did that:

  • In a virtual machine, comparing how long it takes with no practical memory constraints to how long it takes with 8 GiB memory and various-sized zram swap devices.
  • On real, old and really weak hardware: an AMD AM1 quad-core Athlon 5350 at an impressive 2.05 GHz with a really slow old HDD.

The Fedora 33 operating system, scheduled for release towards the end of October, has zram swap enabled by default using a tool called zram-generator. The new default Fedora configuration is to have no real disk-backed swap (which rules out suspending to disk) and instead have a zram compressed memory swap device half the size of the total system memory on machines with 8 GiB RAM or less. There is an upper limit of 4 GiB for machines with more RAM, so it will never be bigger than 4 GiB even if you have 128 GiB RAM.
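
The sizing rule described above can be written down in a few lines. The following is a minimal Python sketch of that rule as this article states it (half of RAM, capped at 4 GiB). The function name `fedora33_zram_size` is made up for illustration, and this is not the actual zram-generator code.

```python
GIB = 1024 ** 3  # bytes in one GiB


def fedora33_zram_size(total_ram_bytes: int) -> int:
    """Return the zram swap size in bytes: half of RAM, never more than 4 GiB.

    Hypothetical helper that only mirrors the rule as described in the text.
    """
    return min(total_ram_bytes // 2, 4 * GIB)


for ram_gib in (4, 8, 16, 128):
    size = fedora33_zram_size(ram_gib * GIB)
    print(f"{ram_gib:>3} GiB RAM -> {size / GIB:.1f} GiB zram swap")
```

A machine with 8 GiB RAM ends up with the 4 GiB zram swap used in the failing test below, and so does a machine with 128 GiB RAM.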

Chromium 85 failing to compile in a VM with 8 GiB RAM, no real swap and a 4 GiB zram swap device. This is the default configuration on Fedora 33.

We first timed the Chromium 85 compile on Fedora 32 in a QEMU virtual machine with 20 GiB RAM to get an example of how long it takes when memory is not a restriction. We then checked the compile time with 8 GiB RAM, with either no disk swap or a 20 GiB hard drive swap, and with various zram configurations. 8 GiB memory is far less than Chromium requires to compile, which means that the system did a lot of swapping throughout the build process.

Compiling Chromium with 8 GiB of system RAM and a zram swap but no HDD-backed swap device failed with an out-of-memory error. Chromium simply cannot be compiled using the new default Fedora 33 swap configuration on machines with 8 GiB RAM or less, so the Fedora 33 out-of-the-box configuration gets a fail. Having a long compile, or some other long-running task, die instead of finishing after swapping for a while is annoying, so we do not recommend using a zram swap device as a substitute for a disk swap, as Fedora 33 does by default. A zram swap is better used as an addition to a disk-backed swap.

The results of our test were not what we expected. A 2 GiB zram swap device had a very small benefit, a 1 GiB zram swap had an even smaller benefit, while a larger 4 GiB zram swap resulted in a large and very noticeable performance penalty. We used zstd compression in most of the tests because it results in vastly better compression ratios than the default lzo-rle algorithm.

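Chromium 85 Compile times
Ryzen 2600, 6 Cores/Threads (QEMU VM), 8 GiB RAM

| RAM | ZRAM SWAP Size | Compression | Disk SWAP | Total compile time | ZRAM "Benefit" |
|-----|----------------|-------------|-----------|--------------------|----------------|
| 20G | None           |             | 20G       | 4h 30m (270m)      |                |
| 8G  | None           |             | 20G       | 5h 33m (333m)      | Baseline       |
| 8G  | 4G             |             | None      | FAIL               |                |
| 8G  | 1G             | zstd        | 20G       | 5h 26m (326m)      | -7m            |
| 8G  | 2G             | zstd        | 20G       | 5h 19m (319m)      | -14m           |
| 8G  | 2G             | lzo-rle     | 20G       | 5h 24m (324m)      | -9m            |
| 8G  | 4G             | zstd        | 20G       | 6h 21m (381m)      | +48m           |
| 8G  | 4G             | lzo-rle     | 20G       | 6h 3m (363m)       | +30m           |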
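
The compression ratio a zram device actually achieves can be checked while it is in use. The sketch below assumes the kernel's `/sys/block/zram0/mm_stat` file, whose first two fields are the uncompressed and the compressed data size in bytes. The device name `zram0`, the helper names and the sample line are assumptions for illustration, not measurements from these tests.

```python
from pathlib import Path


def zram_compression_ratio(mm_stat_line: str) -> float:
    """Return orig_data_size / compr_data_size from one mm_stat line."""
    fields = mm_stat_line.split()
    orig_data_size = int(fields[0])   # bytes of uncompressed data stored
    compr_data_size = int(fields[1])  # bytes after compression
    if compr_data_size == 0:
        return 0.0  # nothing has been swapped to the device yet
    return orig_data_size / compr_data_size


def read_mm_stat(device: str = "zram0") -> str:
    """Read the live statistics line (assumes the device exists)."""
    return Path(f"/sys/block/{device}/mm_stat").read_text()


# Made-up sample line: 2 GiB of pages stored in 512 MiB of compressed data.
SAMPLE_MM_STAT = "2147483648 536870912 553648128 0 553648128 0 0 0"
print(f"{zram_compression_ratio(SAMPLE_MM_STAT):.2f}:1")
```

On a machine with an active zram swap, `zram_compression_ratio(read_mm_stat())` would give the ratio for the running workload, which makes it possible to compare zstd and lzo-rle directly.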

Ramlets never learn

The difference between the QEMU virtual machine run with enough RAM and the best scenario with a zram swap device is huge. Having enough RAM to begin with is clearly better than trying to solve a severe lack of RAM by using a compressed-memory swap.
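
To put a number on "huge", the compile times from the virtual machine table can be expressed relative to the 20 GiB run. This is a small sketch using only the minute values from that table; the dictionary labels and the function name `slowdown_percent` are ours.

```python
# Total compile times in minutes, copied from the QEMU results table.
COMPILE_MINUTES = {
    "20G RAM, no zram": 270,
    "8G RAM, no zram": 333,
    "8G RAM, 1G zstd zram": 326,
    "8G RAM, 2G zstd zram": 319,
    "8G RAM, 2G lzo-rle zram": 324,
    "8G RAM, 4G zstd zram": 381,
    "8G RAM, 4G lzo-rle zram": 363,
}


def slowdown_percent(minutes: float, reference: float) -> float:
    """Percentage by which a run is slower than the reference run."""
    return (minutes - reference) / reference * 100


reference = COMPILE_MINUTES["20G RAM, no zram"]
for label, minutes in COMPILE_MINUTES.items():
    percent = slowdown_percent(minutes, reference)
    print(f"{label:<24} {minutes:>4}m  {percent:+6.1f}%")
```

Even the best 8 GiB run (2 GiB zstd zram, 319 minutes) is still about 18% slower than the run with 20 GiB RAM.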

Fedora 32 compiling Chromium in a QEMU virtual machine. Memory is constrained to 8 GiB and zram is configured on a 2 GiB zstd-compressed drive. Notice how vmstat, in the lower window, is showing a lot of data being read from and written to swap.
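
The swap activity that vmstat reports can also be sampled from a script. This is a rough sketch that assumes the `pswpin` and `pswpout` counters in `/proc/vmstat` (pages swapped in and out since boot). The helper names and the sample numbers are made up for illustration.

```python
from pathlib import Path


def parse_swap_counters(vmstat_text: str) -> dict:
    """Pick the swap-in and swap-out page counters out of /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name in ("pswpin", "pswpout"):
            counters[name] = int(value)
    return counters


def swap_rates(before: dict, after: dict, seconds: float) -> dict:
    """Pages per second swapped in and out between two samples."""
    return {name: (after[name] - before[name]) / seconds for name in before}


def read_live_counters() -> dict:
    """Read the current counters (Linux only; not called in this demo)."""
    return parse_swap_counters(Path("/proc/vmstat").read_text())


# Two made-up samples, taken 5 seconds apart.
first = parse_swap_counters("nr_free_pages 1000\npswpin 1200\npswpout 5000\n")
second = parse_swap_counters("nr_free_pages 900\npswpin 1700\npswpout 9000\n")
print(swap_rates(first, second, 5.0))
```

Calling `read_live_counters()` twice with a pause in between gives the same kind of picture as the swap columns in vmstat, only in pages per second.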

It does seem a bit odd that a 2 GiB zram swap performs better than a 1 GiB zram swap, yet a 4 GiB zram swap is far worse than having none at all. The difference between a 2 and a 4 GiB zram swap is, in this case, huge. The obvious question is why. There could be other factors at play here. Remember, the above tests were done in a virtual machine. Disk swap in a virtual machine may not behave as it does on a real machine, since the underlying host will do a lot of caching. The above tests say something, but they are in effect comparing a zram swap with a disk swap that is cached in the host's memory. There is no such thing as an uncompressed-memory-cached swap on real hardware, but there can be if you are renting a cloud provider instance. A look at real hardware swapping to a real, and really slow, HDD may provide a useful comparison.

The Weakest Hardware Money Could Buy In 2014

This fine white case from Swedish Högdata is the pinnacle of Swedish computer case innovation. It currently houses an AMD Athlon 5350 Quad-Core APU with a max speed of a whopping 2.05 GHz. It has 8 GiB RAM and a 250 GiB HDD of unknown origin.

The AMD AM1 quad-core Athlon 5350 APU was not exactly powerful when it was launched in 2014. It is a rather weak APU with four slow cores and a top clock speed of 2.05 GHz. It doesn't get faster if you pair it with 8 GiB DDR3 1600 MHz RAM and a 250 GiB Seagate Barracuda 7200 RPM HDD from who knows when. Compiling Chromium 85 on such a system is a futile exercise and a complete waste of time. We did so over and over again anyway, just because.

Chromium 85 Compile times
Athlon 5350 APU, 8 (7) GiB RAM

| RAM   | zram swap                      | zswap (cache)           | compression | time                   | difference          |
|-------|--------------------------------|-------------------------|-------------|------------------------|---------------------|
| 7 GiB | none                           | none                    | none        | 29h 9m (1748m48.292s)  | Baseline            |
| 7 GiB | GiB                            | none                    | lzo-rle     | 29h 14m (1753m57.703s) | +5 minutes          |
| 7 GiB | 1 GiB                          | none                    | zstd        | 30h 23m (1823m56.761s) | +1 hour 15 minutes  |
| 7 GiB | 2 GiB                          | none                    | zstd        | 28h 46m (1725m45.160s) | -23 minutes         |
| 7 GiB | 3.5 GiB (50% of physical RAM)  | none                    | lzo-rle     | 27h 25m (1645m13.886s) | -1 hour 44 minutes  |
| 7 GiB | 3.5 GiB (50% of physical RAM)  | none                    | zstd        | 27h 28m (1647m58.215s) | -1 hour 41 minutes  |
| 7 GiB | none                           | 10% (700 MiB), zbud     | zstd        | 29h 42m (1782m4.096s)  | +33 minutes         |
| 7 GiB | none                           | 10% (700 MiB), z3fold   | zstd        | 30h 18m (1818m52.867s) | +1 hour 10 minutes  |
| 7 GiB | none                           | 10% (700 MiB), zsmalloc | zstd        | 28h 43m (1723m9.704s)  | -26 minutes         |
| 7 GiB | none                           | 20% (1400 MiB), z3fold  | zstd        | 30h 18m (1818m55.137s) | +1 hour 10 minutes  |
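
The difference column can be recomputed from the raw timings in parentheses, which are in the `NNNNmSS.SSSs` form that the shell's `time` prints. Below is a minimal sketch of that arithmetic; the function names `parse_minutes` and `format_hm` are made up for this example, and only a few rows are shown.

```python
import re


def parse_minutes(elapsed: str) -> float:
    """Convert a string such as '1748m48.292s' to minutes."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", elapsed)
    if match is None:
        raise ValueError(f"unexpected time format: {elapsed!r}")
    return int(match.group(1)) + float(match.group(2)) / 60


def format_hm(minutes: float) -> str:
    """Format a difference in minutes as a signed 'Xh Ym' string."""
    total = round(minutes)
    sign = "-" if total < 0 else "+"
    hours, mins = divmod(abs(total), 60)
    return f"{sign}{hours}h {mins}m"


baseline = parse_minutes("1748m48.292s")  # no zram, no zswap
runs = [
    ("1 GiB zram, zstd", "1823m56.761s"),
    ("2 GiB zram, zstd", "1725m45.160s"),
    ("3.5 GiB zram, zstd", "1647m58.215s"),
]
for label, elapsed in runs:
    print(f"{label:<20} {format_hm(parse_minutes(elapsed) - baseline)}")
```

Rounding to whole minutes reproduces the table, for example -1 hour 41 minutes for the 3.5 GiB zstd run.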

There is no "enough RAM" baseline to be had on this machine. It's got 8 GiB RAM. Well, actually, it's 7 since 1 GiB is reserved for the GPU. We could have reduced the memory reserved by the GPU in BIOS but we thought it would be better to give the compile slightly less memory in order to put more pressure on the Chromium 85 compilation process. ninja launches one thread per core, and this machine's only got four of them.

The results on the AMD Athlon 5350, real hardware, are the exact opposite of what the QEMU virtual machine test showed: more zram swap is better. Adding a 1 GiB zstd zram swap increased the compilation time by over an hour. Using a 2 GiB zram swap helped, but not by much. 23 minutes isn't all that much when you're looking at more than a day of total compile time.

A 3.5 GiB zstd-compressed zram swap, half the size of the machine's available memory, shaved almost two hours off the Chromium 85 compile and brought it down to about 27.5 hours. More zram swap is apparently better on real hardware (in this particular test, anyway).

We can only speculate as to why a 1 GiB zram swap results in a performance penalty on real hardware when it provides a benefit in a virtual machine. The same goes for having a 3.5 or 4 GiB zram swap: the bigger device gave the best result on the AMD Athlon 5350 and the worst result in the QEMU virtual machine.

The results are what they are; we'll leave it to you to try to make sense of them.

How does this translate to other everyday workloads? We don't really know. Compiling Chromium over and over under the exact same constraints on either the same virtual machine or the same hardware is an easy way to produce repeatable results. Measuring what happens when you run Chromium or Firefox and open too many tabs isn't as easy.

The Linux kernel zram module is not new; it has been around for quite some time. You can use it to create a compressed RAM swap device on any Linux distribution as long as it isn't ancient. Just don't make it too big and use it as an addition to, not a substitute for, a disk-backed swap device.
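
For reference, setting up such a device by hand comes down to a handful of commands. The sketch below only builds and prints them; it assumes the util-linux tools `zramctl`, `mkswap` and `swapon` are installed and that `modprobe zram` creates `/dev/zram0`. The 2G size, the zstd algorithm and the priority of 100 are example values chosen for this illustration, not recommendations.

```python
import shlex


def zram_swap_commands(device="/dev/zram0", size="2G",
                       algorithm="zstd", priority=100):
    """Return the commands (as argv lists) that set up a zram swap device.

    Hypothetical helper: it only plans the commands and runs nothing.
    """
    return [
        ["modprobe", "zram"],  # load the kernel module
        ["zramctl", device, "--algorithm", algorithm, "--size", size],
        ["mkswap", device],    # write a swap signature to the device
        # A priority above the disk swap's makes the kernel fill zram first.
        ["swapon", "--priority", str(priority), device],
    ]


for argv in zram_swap_commands():
    print(shlex.join(argv))
```

To actually apply the plan, each list could be passed to `subprocess.run(argv, check=True)` as root. An existing disk-backed swap stays active alongside the zram device.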

Anonymous user #1

one month ago
Very interesting article. Thank you for your work!
Anonymous user #2

one month ago
For using compressed memory alongside disk-backed swap, zswap should work better than zram, because it avoids the priority inversion problem. With zram+swap, the stuff that gets swapped first (which is more stale) goes into the zram, and the stuff that gets swapped later (which is more active) goes onto the disk.

The downsides are:

1. It doesn't work without a disk-backed swap.

2. The compression ratio isn't quite as good, because you're stuck with the zbud or z3fold allocators, which max out at 2-fold and 3-fold compression respectively. Zram uses zsmalloc.