Linux Kernel 5.5 Will Not Fix The Frequent Intel GPU Hangs In Recent Kernels

From LinuxReviews

Linux users running machines with Intel integrated graphics have been struggling for quite some time with frequent system hangs and other problems caused by a buggy i915 kernel module for Intel iGPUs. 5.3 series kernels went from completely useless to merely problematic as of 5.3.14, while 5.4 series kernels remain utterly broken. Over the last few days, several fixes attempting to solve some of the more common problems with Intel graphics chips have been merged into the Linux Kernel mainline git tree. Problems with frequent hangs remain, and it looks like Linux Kernel 5.5 will be as problematic as previous kernels for those using Intel integrated graphics.

written by 林慧 (Wai Lin) 2020-01-12 - last edited 2020-12-05. © CC BY

Intel DG1 dedicated graphics card prototype launched at CES 2020.

The freedesktop issue tracker for Intel graphics is riddled with complaints from an increasing number of frustrated users who experience GPU hangs and total system freezes. These are caused by bugs in the i915 kernel driver for Intel graphics chips. The woes began with kernel 5.1 and have only gotten worse, not better, with each new 5.x series kernel release. One of the more scandalous bugs, where the i915 driver would drop writes to pages it did not own, was fixed in kernel 5.3.14. Other problems remain in 5.3 series kernels, and new ones were introduced in 5.4 series kernels.

The bug reports are piling up:

There have been dozens and dozens of other bug reports filed. Most of those have been closed as duplicates of bug #673. Little to no effort was made to clarify whether they are actually related or merely exhibit the same kinds of hangs and problems.

The high number of problems reported on the issue tracker during the last few months may not reflect the real number of affected systems and users. The Intel GPU bug tracker is subject to the freedesktop organization's authoritarian and draconian "baizuo" ("white left") "Code of Conduct". It applies to posts on their bug tracker and to everything else posted anywhere on the Internet. People with a basic understanding of right and wrong will naturally refuse to participate under those terms. The new GitLab issue tracker that freedesktop recently switched to requires JavaScript. This makes the site hard to use and makes searching with Ctrl+F impossible. It may turn away another group of privacy-aware users.

From Bad To Worse

Users of 5.4 series kernels report that it is getting worse, not better, with each minor version.

"I can confirm that it is now much worse with 5.4.10-arch1-1. It usually froze for me once every 2 days (I think with 5.4.8-arch1-1), but today it froze 5 times already (and it's 3pm where I am). Just browser, discord, spotify and arduino IDE."

zjeffer

"It does seem to have (subjectively) gotten worse with 5.4.10-arch1-1 somehow, even light loads (browser) trigger the condition now."

Yann

"Problem still exists on latest Arch with plain 5.4.10-arch1-1. I experience a crash approx. once per day."

Tom Schlenkhoff

"I used to have the same issue where I received Resetting rcs0 for hang on rcs0 warnings but it rarely happens recently. Instead, my laptop sometimes completely freezes and requires hard reboot, so no log can be recovered. This usually happens after I unplug the HDMI cable connected to a monitor. I don't know if this information is useful."

Junnan Zhang

Firmware Woes

Some of the problems with the i915 kernel module are related to Intel's firmware, which causes issues with older kernels as far back as 4.19. Some bugs describe issues where removing the firmware is the only known solution. One example is "i915 8086:5917 subsystem 1028:0817 System suspend hang when i915/kbl_dmc_ver1_*.bin installed" (933).

The kernel ring buffer (viewable with dmesg) shows which firmware is loaded, with a message like:

[drm] Finished loading DMC firmware i915/bxt_dmc_ver1_07.bin (v1.7)
Initialized i915 1.6.0 20191101 for 0000:00:02.0 on minor 0
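To check which DMC blob a given boot actually loaded, you can search the kernel log for the message shown above. The Python sketch below is a minimal illustration, not an official tool. It assumes the exact message format quoted in this article, which other kernel versions may word differently. The function name loaded_dmc_firmware is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed format, taken from the log excerpt above:
# "[drm] Finished loading DMC firmware i915/bxt_dmc_ver1_07.bin (v1.7)"
DMC_LOADED = re.compile(
    r"Finished loading DMC firmware (?P<path>\S+) \(v(?P<version>[\d.]+)\)"
)


def loaded_dmc_firmware(log_text: str) -> List[Tuple[str, str]]:
    """Return (firmware path, version) pairs found in kernel log text."""
    return [(m.group("path"), m.group("version")) for m in DMC_LOADED.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "[drm] Finished loading DMC firmware i915/bxt_dmc_ver1_07.bin (v1.7)\n"
        "Initialized i915 1.6.0 20191101 for 0000:00:02.0 on minor 0\n"
    )
    print(loaded_dmc_firmware(sample))  # [('i915/bxt_dmc_ver1_07.bin', '1.7')]
```

On a live system, the text would come from the output of dmesg. Reading it may require root if the kernel.dmesg_restrict sysctl is enabled.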

The i915 module uses binary blob firmware. As a result, a kernel that has worked fine on a machine could suddenly have major issues if a system upgrade has updated the firmware files (provided by the linux-firmware package).
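The blobs can change underneath an unchanged kernel, so it can help to record what is installed before an upgrade and compare afterwards. Below is a minimal sketch of that idea, with a few assumptions:

- The firmware lives as uncompressed *.bin files in a single directory.
- Some distributions ship the blobs compressed (for example with xz), and this sketch would not see those files.
- fingerprint_firmware and diff_fingerprints are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def fingerprint_firmware(directory: str) -> Dict[str, str]:
    """Map each *.bin file name in `directory` to its SHA-256 hex digest."""
    return {
        blob.name: hashlib.sha256(blob.read_bytes()).hexdigest()
        for blob in sorted(Path(directory).glob("*.bin"))
    }


def diff_fingerprints(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """List blobs added, removed or modified between two snapshots."""
    common = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(name for name in common if before[name] != after[name]),
    }
```

You could take a snapshot with fingerprint_firmware("/lib/firmware/i915") before running a system upgrade and save it, for example as JSON. Comparing it with a fresh snapshot afterwards shows whether the blobs the kernel loads have changed.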

It is possible to avoid the Intel iGPU firmware by eradicating the i915 module's firmware folder, /lib/firmware/i915/. The i915 module will give a warning if the firmware is missing:

i915 0000:00:02.0: Direct firmware load for i915/bxt_dmc_ver1_07.bin failed with error -2
i915 0000:00:02.0: Failed to load DMC firmware i915/bxt_dmc_ver1_07.bin. Disabling runtime power management.
i915 0000:00:02.0: DMC firmware homepage: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
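The "error -2" in the first line is a negated errno value. Error 2 is ENOENT ("No such file or directory"), which is what one would expect after the firmware folder has been removed. The sketch below pulls failed firmware loads out of log text and translates the error code using Python's errno module. It is a hypothetical helper, not part of any kernel tooling, and it assumes the message format quoted above.

```python
import errno
import os
import re
from typing import List, Tuple

# Assumed format, taken from the warning quoted above:
# "Direct firmware load for i915/bxt_dmc_ver1_07.bin failed with error -2"
LOAD_FAILED = re.compile(
    r"Direct firmware load for (?P<path>\S+) failed with error -(?P<code>\d+)"
)


def failed_firmware_loads(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (firmware path, errno name, description) for each failed load."""
    failures = []
    for m in LOAD_FAILED.finditer(log_text):
        code = int(m.group("code"))  # the kernel prints the errno negated
        name = errno.errorcode.get(code, f"errno {code}")
        failures.append((m.group("path"), name, os.strerror(code)))
    return failures


if __name__ == "__main__":
    sample = (
        "i915 0000:00:02.0: Direct firmware load for i915/bxt_dmc_ver1_07.bin "
        "failed with error -2\n"
    )
    for path, name, text in failed_firmware_loads(sample):
        print(f"{path}: {name} ({text})")
```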

Intel's FIRMWARE page states that

"DMC provides additional graphics low-power idle states. It provides capability to save and restore display registers across these low-power states independently from the OS/Kernel."

Intel's FIRMWARE page
as of January 11th, 2020

Oddly, temperatures on older Intel chips are lower without the firmware, even though it is supposed to provide "low-power idle states".

Skylake and newer processors require GuC/HuC firmware for certain video decoding features. Older chips do not.

Not loading the Intel firmware should not be seen as a solution to anything. It is a factor one should be aware of: A given kernel version will behave differently depending on the firmware version being loaded. Avoiding the firmware should only be seen as a temporary solution if it is known to avoid machine-specific problems.

High Temperatures

Hangs are not the only problem with the last few major Linux kernel versions. High temperatures are also a problem.

Temperature problems are somewhat related to "HD Graphics 620: VAAPI performs poorly" (956). That bug outlines how Intel iGPUs run at housefire temperatures during video playback.

Kernel 5.5 Will Solve Some Issues But Problems Remain

The Acer Swift SF113-31 has an Intel "Apollo Lake" Goldmont N4200 SoC with an iGPU that uses the i915 kernel module. The latest kernels do not provide a problem-free experience on that notebook computer.

Bug "i915 0000:00:02.0: GPU HANG: ecode 7:0:0x00000000, hang on rcs0" (446) was closed with git commit "drm/i915/gt: Do not restore invalid RS state". "NULL pointer dereference in i915_active_acquire since Linux 5.4" (827) is fixed by commit "drm/i915: Hold reference to intel_frontbuffer as we track activity". Those are steps in the right direction.

However, Intel employees are very quick to mark any and all bugs related to GPU hangs as duplicates of "GPU hang on transition to idle" (673). A closer inspection of several supposed duplicates suggests that they are distinct cases. It is hard to say exactly how many different unsolved issues there still are with Intel's i915 kernel GPU driver. Bug 673 remains open.
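One rough way to see how many distinct hang signatures a machine produces is to group the kernel's GPU HANG reports by error code and engine. This is only a triage aid, under some assumptions:

- It matches the message format in bug title 446 above; newer kernels word hang reports differently.
- The helper name hang_signatures is hypothetical.
- Two reports with the same signature do not necessarily share a root cause.

```python
import re
from collections import Counter

# Assumed format, taken from the bug title quoted above:
# "i915 0000:00:02.0: GPU HANG: ecode 7:0:0x00000000, hang on rcs0"
GPU_HANG = re.compile(r"GPU HANG: ecode (?P<ecode>[^,\s]+), hang on (?P<engine>\w+)")


def hang_signatures(log_text: str) -> Counter:
    """Count GPU hang reports per (ecode, engine) pair in kernel log text."""
    return Counter(
        (m.group("ecode"), m.group("engine")) for m in GPU_HANG.finditer(log_text)
    )


if __name__ == "__main__":
    sample = (
        "i915 0000:00:02.0: GPU HANG: ecode 7:0:0x00000000, hang on rcs0\n"
        "i915 0000:00:02.0: GPU HANG: ecode 7:0:0x00000000, hang on rcs0\n"
    )
    for (ecode, engine), count in hang_signatures(sample).most_common():
        print(f"{count}x ecode {ecode} on {engine}")  # 2x ecode 7:0:0x00000000 on rcs0
```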

"[drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}": The current state of Intel's i915 kernel driver. What did they mean by this?

5.5.0-rc5-Hyemii-00257-gac61145a725a (git as of January 11th) appears to be slightly better than 5.3 and 5.4 series kernels. However, it is far from problem-free, and random system freezes remain a problem. It is therefore almost guaranteed that the final 5.5 Linux kernel will have most of the same frustrating problems previous kernels have had. That's a bit sad. Those who are stuck with an Intel-powered laptop cannot work around the problems with Intel's i915 kernel driver by sticking an AMD GPU in it. There simply isn't room.

Perhaps Linux Kernel 5.6 will work fine with Intel iGPUs. Perhaps not. The 5.6 merge window has yet to open, and it is impossible to tell until it has closed. The only thing we can say for sure is that Kernel 5.5 will NOT be great for Intel iGPU users.

Update: Kernel 5.5 rc7 appears to be problem-free on low-powered Intel chips with the kernel parameters intel_idle.max_cstate=1 i915.enable_dc=0. You may not need both. Goldmont chips only need i915.enable_dc=0, while Bay Trail chips need intel_idle.max_cstate=1. Some chips need both. See Intel graphics for additional tips.
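The update names two kernel parameters, and which of them a machine needs depends on the chip. To confirm what the running kernel was actually booted with, you can inspect its command line (on Linux, /proc/cmdline). The Python sketch below is a minimal, hypothetical helper with two known limitations:

- It splits the command line on whitespace, so it does not handle quoted values containing spaces.
- It ignores the fact that the kernel treats - and _ in parameter names as equivalent.

```python
from typing import Dict, Optional

# Parameters named in the update above. Which of them a machine needs
# depends on the chip, so treat this set as an example, not a rule.
WORKAROUNDS = {"i915.enable_dc": "0", "intel_idle.max_cstate": "1"}


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_workarounds(cmdline: str, wanted: Dict[str, str] = WORKAROUNDS) -> Dict[str, str]:
    """Return the wanted parameters that are absent or set to a different value."""
    params = parse_cmdline(cmdline)
    return {name: value for name, value in wanted.items() if params.get(name) != value}


if __name__ == "__main__":
    # On a running system the string would come from /proc/cmdline.
    example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet i915.enable_dc=0"
    print(missing_workarounds(example))  # {'intel_idle.max_cstate': '1'}
```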



Anonymous (1fdb390b)

49 months ago

Thanks for this great article!

I referenced it in Ubuntu Bug 1868551 "Screen freezes: NULL pointer dereference i915_active_acquire since Linux 5.4" and in Debian Bug #954817 "linux-image-5.4.0-4-amd64: NULL pointer dereference in i915_active_acquire since Linux 5.4".

Debian Bug #949369 i915: kernel crash in i915_active_acquire()

TmTFx

39 months ago

My laptop has an Intel(R) Pentium(R) CPU N3700. I experienced a lot of hangs/freezes, so I added these 2 parameters to GRUB, and it seems it won't freeze again:

processor.max_cstate=1 intel_idle.max_cstate=0

(system Fedora 33)

Anonymous (774a614d19)

33 months ago
I tried that on Debian (Dell 11-3162, Intel(R) Pentium(R) CPU N3710 @ 1.60GHz) and did not notice a difference :(. I added it to the one that starts with 20_c inside /etc/grub.d. Did I add it to the wrong place, or is it just helpful on your PC?

Anonymous (cf7ecb02)

35 months ago
Where are all the comments?

Anonymous (cf7ecb02)

35 months ago
We were discussing it. Why did everything disappear?!! The problem is still not solved for a lot of people!

Anonymous (774a614d19)

33 months ago
RIGHT!!!!!!!!!!!!!!!!!!!!!!!

Anonymous (7e0a1a60)

35 months ago
Yep, this is still an issue. It is slightly better with the current 5.12.7 release. However, the problems might reappear!

Anonymous (774a6c2bb2)

33 months ago

My Dell 11-3162 is buggy af. I found I had the "modesetting" display driver. Adding a directory and bash file in the right spot changed the driver to "intel" which made the device useable; but still very buggy (over half my screen is random black squares right now)

processor is Intel(R) Pentium(R) CPU N3710 @ 1.60GHz

Anonymous (774a614d19)

33 months ago
It didn't fix mine completely (maybe bc touch screen or something, maybe bug, idk), but it did help a lot and may fix it for others. I'm talking about Arid's reply to my thread in the Debian forums http://forum...8718#p738718

AbstractApproach

33 months ago
I am 774a614d19 btw, made an account

Anonymous (b865270314)

30 months ago
Just for anyone passing through, this problem may have been solved in Kernel 5.11. I was having this problem with kernel 5.4 on two HP Stream laptops from 2014 (not sure of the specific Pentium chipset off the top of my head) and they are both now working fine. On one machine I have started using Fedora 34 - supplied with 5.11, but now up to kernel 5.13 - and on the other I manually upgraded Ubuntu 20.04 to kernel 5.11. There have been no freezes since.

Anonymous (677751bf07)

24 months ago
The issues were mostly solved with 5.11, but came back in 5.12! :((((

Anonymous (fb60bd67a7)

20 months ago
intel is shit

Anonymous (f4e0b1ebe1)

20 months ago

look up "intel me" or "intel management engine" for more reasons to hate this company. mental outlaw has a good video on this (not him, btw, just a fan of his channel).

although amd isn't much better in this regard, they have their own "amd psp" (amd platform "security" processor (security, yeah right)).

who knows what apple and google and those other cpu manufacturers include in their own shit.


on a similar note, it would be pretty cool if one of the LinuxReviews staff could create an article covering intel me/amd psp and other surveillance chips, and also an article on the new M1 macs that everyone's all excited about (including asahi linux - the only reason to use those M1 macs anyway loooool).