Linux Kernel 5.5 Will Not Fix The Frequent Intel GPU Hangs In Recent Kernels

From LinuxReviews

Linux users running machines with Intel integrated graphics have been struggling for quite some time with frequent system hangs and other problems caused by a buggy i915 kernel module for Intel iGPUs. 5.3 series kernels went from being completely useless to problematic as of 5.3.14, while 5.4 series kernels remain utterly broken. Several fixes attempting to solve some of the more common problems with Intel graphics chips have been merged into the Linux kernel mainline git tree in the last few days. Problems with frequent hangs remain, and it looks like Linux Kernel 5.5 will be as problematic as previous kernels for those using Intel integrated graphics.

Written by 林慧 (Wai Lin). Published 2020-01-12, last edited 2020-09-14.

Intel DG1 dedicated graphics card prototype launched at CES 2020.

The freedesktop issue tracker for Intel graphics is riddled with complaints from an increasing number of frustrated users who experience GPU hangs and total system freezes due to bugs in the i915 kernel driver for Intel graphics chips. The woes began with kernel 5.1 and they have only gotten worse, not better, with each new 5.x series kernel release. One of the more scandalous bugs, where the i915 driver would drop writes to pages it did not own, was fixed in kernel 5.3.14, but other problems remain in 5.3-series kernels and new ones were introduced in 5.4 series kernels.

The bug reports are piling up:

There have been dozens and dozens of other bug reports filed. Most of those have been closed as duplicates of bug #673 with little to no effort to clarify whether they are related or just exhibit the same kinds of hangs and problems.

The high number of problems reported on the issue tracker during the last few months may not reflect the real number of affected systems and users. The Intel GPU bug tracker is subject to the freedesktop organization's authoritarian and draconian 白左 ("white left") "Code of Conduct", which applies to posts on their bug tracker and everything else posted anywhere on the Internet. People with a basic understanding of right and wrong will naturally refuse to participate under those terms. The new GitLab issue tracker that freedesktop recently started using requires JavaScript, which makes the site hard to use and ensures that searching with Ctrl+F is impossible. This may turn another group of privacy-aware users away.

From Bad To Worse

Users of 5.4 series kernels report that it is getting worse, not better, with each minor version.

"I can confirm that it is now much worse with 5.4.10-arch1-1. It usually froze for me once every 2 days (I think with 5.4.8-arch1-1), but today it froze 5 times already (and it's 3pm where I am). Just browser, discord, spotify and arduino IDE."

zjeffer

"It does seem to have (subjectively) gotten worse with 5.4.10-arch1-1 somehow, even light loads (browser) trigger the condition now."

Yann

"Problem still exists on latest arch with plain 5.4.10-arch1-1 I experience a crash approx once per day."

Tom Schlenkhoff

"I used to have the same issue where I received Resetting rcs0 for hang on rcs0 warnings but it rarely happens recently. Instead, my laptop sometimes completely freezes and requires hard reboot, so no log can be recovered. This usually happens after I unplug the HDMI cable connected to a monitor. I don't know if this information is useful."

Junnan Zhang

Firmware Woes

Some of the problems with the i915 kernel module are related to Intel's firmware, which is causing issues with older kernels as far back as 4.19. Bugs like "i915 8086:5917 subsystem 1028:0817 System suspend hang when i915/kbl_dmc_ver1_*.bin installed" (933) describe issues where removing the firmware is the only known solution.

Which firmware is loaded is indicated in the kernel ring buffer (viewable with dmesg) by a message like:

[drm] Finished loading DMC firmware i915/bxt_dmc_ver1_07.bin (v1.7)
Initialized i915 1.6.0 20191101 for 0000:00:02.0 on minor 0
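The firmware file and version can also be pulled out of that message programmatically. What follows is a minimal Python sketch, not an official tool: the function name find_dmc_firmware is made up for this example, and the pattern is only assumed to match the message format shown above, which may differ between kernel versions.

```python
import re

# Pattern for the "Finished loading DMC firmware" message quoted above.
# Assumption: the wording of this message is the same on the kernel in use.
DMC_LOADED = re.compile(
    r"Finished loading DMC firmware (?P<path>\S+) \(v(?P<version>[\d.]+)\)"
)


def find_dmc_firmware(dmesg_text):
    """Return (path, version) from the first DMC load message, or None."""
    for line in dmesg_text.splitlines():
        match = DMC_LOADED.search(line)
        if match:
            return match.group("path"), match.group("version")
    return None


# The two lines from the article, used here as sample input.
sample = (
    "[drm] Finished loading DMC firmware i915/bxt_dmc_ver1_07.bin (v1.7)\n"
    "Initialized i915 1.6.0 20191101 for 0000:00:02.0 on minor 0\n"
)
print(find_dmc_firmware(sample))
```

On a real system the input would be the output of dmesg, which may require root privileges depending on how the distribution is configured. Logging the result before and after a linux-firmware upgrade is one way to notice that the firmware changed underneath an otherwise unchanged kernel.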

The i915 module's use of binary blob firmware means that a kernel which has worked fine on a machine could suddenly have major issues if a system upgrade has updated the firmware files (provided by the linux-firmware package).

It is possible to avoid the Intel iGPU firmware by removing the i915 module's firmware folder /lib/firmware/i915/. The i915 module will give a warning if the firmware is missing:

i915 0000:00:02.0: Direct firmware load for i915/bxt_dmc_ver1_07.bin failed with error -2
i915 0000:00:02.0: Failed to load DMC firmware i915/bxt_dmc_ver1_07.bin. Disabling runtime power management.
i915 0000:00:02.0: DMC firmware homepage: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
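The "error -2" in the first warning is a negated errno value, and errno 2 is ENOENT ("No such file or directory"), which is what one would expect after the folder has been removed. A small Python sketch for spotting such lines in a saved dmesg log could look like this; the function name failed_firmware_loads is hypothetical and the pattern is only assumed to match the warning format quoted above.

```python
import errno
import re

# Matches the "Direct firmware load ... failed with error -N" warning above.
# Assumption: other kernel versions word this warning the same way.
DIRECT_LOAD_FAILED = re.compile(
    r"Direct firmware load for (?P<path>\S+) failed with error -(?P<code>\d+)"
)


def failed_firmware_loads(dmesg_text):
    """Return a list of (firmware path, errno name) for failed loads."""
    failures = []
    for match in DIRECT_LOAD_FAILED.finditer(dmesg_text):
        code = int(match.group("code"))
        # Fall back to the raw number if the code has no symbolic name.
        name = errno.errorcode.get(code, str(code))
        failures.append((match.group("path"), name))
    return failures


# The first two warning lines from the article, used as sample input.
sample = (
    "i915 0000:00:02.0: Direct firmware load for i915/bxt_dmc_ver1_07.bin failed with error -2\n"
    "i915 0000:00:02.0: Failed to load DMC firmware i915/bxt_dmc_ver1_07.bin. Disabling runtime power management.\n"
)
print(failed_firmware_loads(sample))
```

An empty list means no such warning was found in the text that was passed in. It does not prove that the firmware was loaded.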

Intel's "FIRMWARE" page states that

"DMC provides additional graphics low-power idle states. It provides capability to save and restore display registers across these low-power states independently from the OS/Kernel."

Intel's "FIRMWARE" page
as of January 11th, 2020

Temperatures on older Intel chips are, oddly, lower without the firmware which should provide "low-power idle states".

Skylake and newer processors require GuC/HuC firmware for certain video decoding features. Older chips do not.

Not loading the Intel firmware should not be seen as a solution to anything. It is a factor one should be aware of: A given kernel version will behave differently depending on the firmware version being loaded. Avoiding the firmware should only be seen as a temporary solution if it is known to avoid machine-specific problems.

High Temperatures

Hangs are not the only problem with the last few major Linux kernel versions. High temperatures are also a problem.

Temperature problems are somewhat related to "HD Graphics 620: VAAPI performs poorly" (956), which outlines how Intel iGPUs run at housefire temperatures during video playback.

Kernel 5.5 Will Solve Some Issues But Problems Remain

The Acer Swift SF113-31 has an Intel "Apollo Lake" Goldmont N4200 SoC with an iGPU using the i915 kernel module. The latest kernels do not provide a problem-free experience on that notebook computer.

Bug "i915 0000:00:02.0: GPU HANG: ecode 7:0:0x00000000, hang on rcs0" (446) was closed with git commit "drm/i915/gt: Do not restore invalid RS state". "NULL pointer dereference in i915_active_acquire since Linux 5.4" (827) is fixed by commit "drm/i915: Hold reference to intel_frontbuffer as we track activity". Those are steps in the right direction.

However, Intel employees are very quick to mark any and all bugs related to GPU hangs as duplicates of "GPU hang on transition to idle" (673). A closer inspection of several of the supposed duplicates suggests that they are distinct cases. It is hard to say exactly how many different unsolved issues there still are with Intel's i915 kernel GPU driver. Bug 673 remains open.

"[drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}": The current state of Intel's i915 kernel driver. What did they mean by this?

5.5.0-rc5-Hyemii-00257-gac61145a725a - git as of January 11th - appears to be slightly better than 5.3 and 5.4 series kernels, but it is far from problem-free and random system freezes remain a problem. It is therefore almost guaranteed that the final 5.5 Linux kernel will have most of the same frustrating problems previous kernels have had. That's a bit sad; it is not like those who are stuck with an Intel-powered laptop can work around the problems with Intel's i915 kernel driver by sticking an AMD GPU in it. There simply isn't room.

Perhaps Linux Kernel 5.6 will work fine with Intel iGPUs. Perhaps not. It is impossible to guess before the merge window, which has yet to open, closes. The only thing we can say for sure is that Kernel 5.5 will NOT be great for Intel iGPU users.

Updated: Kernel 5.5 rc7 appears to be problem-free on low-powered Intel chips with the kernel parameters intel_idle.max_cstate=1 i915.enable_dc=0. You may not need both: Goldmont chips only need i915.enable_dc=0, Bay Trail chips need intel_idle.max_cstate=1, and some chips need both. See Intel graphics for additional tips.
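Whether those parameters actually reached the running kernel can be checked against the kernel command line, which Linux exposes in /proc/cmdline. The following Python sketch is only an illustration: the helper names parse_cmdline and missing_workarounds are made up, the sample command line is invented, and the simple whitespace split does not handle quoted values that contain spaces.

```python
# The two workaround parameters named in the update above.
WORKAROUNDS = {"intel_idle.max_cstate": "1", "i915.enable_dc": "0"}


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_workarounds(cmdline):
    """Return the workaround parameters that are absent or set differently."""
    params = parse_cmdline(cmdline)
    return sorted(
        name for name, value in WORKAROUNDS.items() if params.get(name) != value
    )


# Invented sample command line; a real one would come from /proc/cmdline.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 quiet i915.enable_dc=0"
print(missing_workarounds(sample))
```

On a live system the input would be open("/proc/cmdline").read(). A parameter being reported as missing is not necessarily a problem, since not every chip needs both.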



Anonymous user #1

8 months ago

After trying to install several Linux distros (they all froze), I found the solution below posted on the web. It got rid of the freezing on my Ubuntu laptop and my Debian desktop.

Recent Linux kernels and Intel Bay Trail CPUs are not really good friends. A nasty bug has been around for several years and no permanent fix has been found yet. But, there is a workaround that prevents frequent crashes as explained here. You can also apply it on your Asterisk PBX whether it is on a physical Bay Trail based machine or on a VirtualBox virtual machine running on a Bay Trail based host.

Simply edit the file /etc/default/grub with your favorite editor as below:

nano /etc/default/grub

Inside that file, modify the value of the variable called GRUB_CMDLINE_LINUX_DEFAULT so that it includes the parameters "intel_idle.max_cstate=1" and "consoleblank=0". The first is to ease up the Bay Trail bug issue, the second is to prevent the monitor from blanking. After editing, the line might look like this:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_idle.max_cstate=1 consoleblank=0 acpi=force"

Then, save the file and exit. Next, run the command:

sudo update-grub

to regenerate the grub configuration file updated with your new line parameters. You could verify the result of the proper update by checking the content of the file /boot/grub/grub.cfg. Finally reboot your computer so that the new configuration is put to use. Hope this helps improving the reliability of your FreePBX/Asterisk system.

© Copyright 1991-2019 – cilicia.us & The Cilician Gazette – All rights reserved
Anonymous user #3

5 months ago
'consoleblank=0' seems to help my Intel J3710 CPU, fingers crossed...
Anonymous user #2

8 months ago
Intel - driving more users back to 4.x kernels, AMD or other OSes with their kernel regressions, lol.
Anonymous user #5

2 months ago
same issue on kernel 4.15.0-112!
Anonymous user #1

8 months ago

My Tumbleweed got updated last Monday to kernel 5.5.1:

Linux xxxx 5.5.1-1-default #1 SMP Tue Feb 4 06:56:24 UTC 2020 (1d61c83) x86_64 x86_64 x86_64 GNU/Linux

Since then, I have had two days without a frozen KDE, after having problems almost every day for several weeks, most of the time (or always?) in combination with using VMs in VirtualBox.

I keep my fingers crossed ...
Anonymous user #2

7 months ago
Still works proper for 2 weeks now, after update to 5.5.1 on Tumbleweed.
Anonymous user #1

8 months ago
Constant system freezing made me replace the "linux" kernel arch package by "linux-lts" which was using the 4.19 kernel. Btw, linux-lts was just updated to 5.4 some days ago, what brought the freezing back. Just downgraded to prior version (4.19.101-1) and everything is ok. A "linux-lts419" is also available at AUR.
Anonymous user #3

7 months ago
Had freeze issues with a Bay Trail laptop. Now with an i7 8550. Both software issues. I am pretty confident my next laptop comes Intel free.
Anonymous user #3

7 months ago

Intel Celeron G3900 HD510. I was using Fedora 31. When the kernel was updated from 5.3 to 5.4, the system started freezing completely, and then the bootloader died. I had to switch to Debian 10 with kernel 4.19, and the problems ended.
Anonymous user #1

7 months ago
Thinking that LTS kernels should be very stable, I installed 5.4 a week after it was released. With the fourth freeze, enough was enough: went back to 4.19 under Manjaro.
Gbaconniere

6 months ago

Thanks for this great article!

I referenced it in:

Ubuntu Bug 1868551 Screen freezes : NULL pointer dereference i915_active_acquire since Linux 5.4

Debian Bug #954817 linux-image-5.4.0-4-amd64: NULL pointer dereference in i915_active_acquire since Linux 5.4

Debian Bug #949369 i915: kernel crash in i915_active_acquire()
Anonymous user #4

6 months ago

i915.enable_dc=0 and intel_idle.max_cstate=1 cannot help on Debian linux-image-5.4.0-4-amd64/5.4.19-1,

just install buster kernel linux-image-4.19.0-8-amd64/4.19.98-1
Anonymous user #4

6 months ago
linux-drm-tip fixes the problem entirely, but i have absolutely no idea which one of the hundreds of commits actually fixes it. also, i still can't get a straight answer as to how many of the drm-tip commits are going to make it into 5.7.
Anonymous user #1

6 months ago
5.5 (>5.5.12) is more stable than 5.4 with Intel Core i7-10510U CPU (Comet Lake)
Anonymous user #4

5 months ago

5.7-rc3 seems to have fixed most issues for me.

i7-8550U, Intel UHD 620, openSUSE Leap 15.2
Anonymous user #5

5 months ago
Apparently the latest updates to the 4.19 kernel are adding bugs, my computer has frozen 2 times. That is little but before it did not happen, I do not know if the best option is not to update the kernel any more or try 5.6
Anonymous user #5

5 months ago
I just stuck with my 2013-era Radeon. never had any trouble.
Anonymous user #3

5 months ago
with the last update the same thing started happening in kernel 4.19
Anonymous user #1

5 months ago
The problem is still not solved I'm stuck on 5.2.9
Anonymous user #4

4 months ago
Still not solved. Stuck on 5.6.15.
Anonymous user #5

3 months ago
Stuck on 5.6.19: discord seems to trigger freezes very often. Optimus when shutting down nvidia card too.
Anonymous user #1

3 months ago
Problem not solved, the version that works better for me is 5.6.4-050604.202004131234, but it still freezes on certain programs specially when using scrolling features (like on Chrome for example)
Anonymous user #1

3 months ago
Problem still here...
Anonymous user #1

3 months ago

On my Fedora 32 system, I solved this problem temporarily (5.6.19-300 kernel) by doing the following as root:

1) cd /usr/lib
2) mkdir removed_firmware
3) mv firmware/i915 removed_firmware
4) (crossing my fingers) shutdown -r now

The reboot went smoothly and the computer has been running problem-free for a few days. Fedora updated the kernel to 5.7.6, but I haven't put the firmware back yet. I might wait for a firmware update.
Anonymous user #6

2 months ago

I've been getting a compositor issue with 5.7.14 on my 2 lower end systems (Cherrytrail & Baytrail) while the same Devuan Beowulf install seems to work fine on a not-so-recent-anymore i3-based HP laptop: https://unix...th-linux-5-7

Devuan's own 5.6 kernel package seems to work fine, but I haven't yet been able to run any of the machines for a longer period of time under 5.x (they do fine under 4.19.118).
Anonymous user #7

one month ago

Same here: even without any i915 firmware, the GPU hangs again and again. It seems NOT to be a firmware bug (or not only), but looks like planned obsolescence

(see https://bugz...i?id=1843274 for details)
Anonymous user #8

one month ago
Still waiting for a fix, any news someone ?
Anonymous user #9

29 days ago

I thought i was the only one with this problem (Linux Mint with Kernel 5.4.0.47.51) Going to Kernel 4 seemed to have solved the issue.

Trying now GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_idle.max_cstate=1 consoleblank=0 acpi=force" with Kernel 5.4. Fingers crossed
Anonymous user #10

28 days ago
Nope, hung itself up yesterday
Anonymous user #11

5 days ago

After much time having this problem, today after one week can say (time of tests)... I can say I have a "new" computer. It's working very well, this problem seems to have disappeared.

My current configuration:

- OS: GNU/Debian Testing
- Kernel version: 5.8.0
- i915 Firmware version: kbl_dmc_ver1_04.bin
Anonymous user #12

3 days ago
What have you done ? Just install 5.8 ?
Anonymous user #12

3 days ago
I will try linux zen 5.9 and report if it was a success (from linux 5.2.9)
Anonymous user #12

3 days ago

RESULT:

It still freeze on 5.9.1 WTF !!!