Bug 1176646 - Latest Kernel has i915 GPU Hang
Status: NEW
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Priority: P5 - None
Severity: Critical (6 votes)
Assigned To: openSUSE Kernel Bugs
Reported: 2020-09-16 22:51 UTC by Emr Rec
Modified: 2022-01-19 18:41 UTC (History)

Attachments
dmesg output (19.63 KB, application/x-bzip)
2020-09-16 22:51 UTC, Emr Rec

Description Emr Rec 2020-09-16 22:51:03 UTC
Created attachment 841741 [details]
dmesg output

Kernel 5.6.14-1 works fine with my i915 GPU. However, kernels from 5.7 onward (currently kernel-default-5.8.7-1.2.x86_64) cause the GPU to hang: the screen flickers and I can never log in. Switching back to 5.6 makes it work again.

There have been other reports of the same problem on Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1843274

I have attached my dmesg output. I use the zap option to kill the Xorg server, and KDM comes up just fine most of the time, but trying to log in is no good. I can also press Ctrl-Alt-F2 to get to the command line just fine, so the problem seems to be with Xorg on the new kernel.

[  163.236358] ------------[ cut here ]------------
[  163.236365] WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:4576 default_wake_function+0x16/0x30
[  163.236366] Modules linked in: xt_MASQUERADE xt_addrtype iptable_nat nf_nat nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_state xt_LOG xt_multiport xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c af_packet iscsi_ibft iscsi_boot_sysfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter rfkill tun binfmt_misc dmi_sysfs msr crypto_simd glue_helper dm_crypt intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic coretemp ledtrig_audio kvm_intel snd_hda_codec_hdmi snd_hda_intel kvm snd_intel_dspcfg irqbypass snd_hda_codec at24 mei_hdcp iTCO_wdt snd_hda_core intel_pmc_bxt crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_vendor_support cryptd snd_hwdep pcspkr snd_pcm snd_timer fan thermal snd e1000e soundcore i2c_i801 tiny_power_button i2c_smbus button mei_me lpc_ich mei hid_generic usbhid i915 i2c_algo_bit drm_kms_helper xhci_pci xhci_pci_renesas syscopyarea sysfillrect
[  163.236389]  sysimgblt fb_sys_fops cec xhci_hcd rc_core ehci_pci ehci_hcd drm crc32c_intel usbcore video sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
[  163.236396] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.7-1-default #1 openSUSE Tumbleweed
[  163.236397] Hardware name: Gigabyte Technology Co., Ltd. B85M-Gaming 3/B85M-Gaming 3, BIOS F1 09/05/2014
[  163.236399] RIP: 0010:default_wake_function+0x16/0x30
[  163.236401] Code: e8 df 53 45 00 eb 99 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f7 c2 fe ff ff ff 75 09 48 8b 7f 08 e9 da f9 ff ff <0f> 0b 48 8b 7f 08 e9 cf f9 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
[  163.236401] RSP: 0018:ffffa51080003eb8 EFLAGS: 00010086
[  163.236402] RAX: ffffffffb0cf0e30 RBX: ffffa51080203db0 RCX: ffffa51080003ed0
[  163.236403] RDX: 00000000fffffffb RSI: 0000000000000003 RDI: ffffa51080203db0
[  163.236404] RBP: ffff980009218568 R08: 0000000000000000 R09: ffffffffc0321600
[  163.236404] R10: ffff97ffdbd86280 R11: 0000000000000001 R12: 0000000000000046
[  163.236405] R13: ffff980009218560 R14: ffffa51080003ed0 R15: ffffa51080003f60
[  163.236406] FS:  0000000000000000(0000) GS:ffff980017200000(0000) knlGS:0000000000000000
[  163.236406] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  163.236407] CR2: 00007f8ff0b07000 CR3: 000000003f80a001 CR4: 00000000000606f0
[  163.236408] Call Trace:
[  163.236409]  <IRQ>
[  163.236414]  autoremove_wake_function+0xe/0x30
[  163.236460]  __i915_sw_fence_complete+0x160/0x1c0 [i915]
[  163.236483]  dma_i915_sw_fence_wake_timer+0x2c/0x50 [i915]
[  163.236506]  signal_irq_work+0x21c/0x310 [i915]
[  163.236510]  irq_work_single+0x2c/0x40
[  163.236511]  irq_work_run_list+0x2d/0x40
[  163.236512]  irq_work_run+0x14/0x30
[  163.236514]  __sysvec_irq_work+0x2d/0xb0
[  163.236517]  asm_call_on_stack+0x12/0x20
[  163.236518]  </IRQ>
[  163.236520]  sysvec_irq_work+0x6f/0x90
[  163.236522]  asm_sysvec_irq_work+0x12/0x20
[  163.236524] RIP: 0010:cpuidle_enter_state+0xb6/0x3f0
[  163.236525] Code: 10 ee c0 4e e8 eb 52 8d ff 49 89 c7 0f 1f 44 00 00 31 ff e8 cc 69 8d ff 80 7c 24 0f 00 0f 85 d4 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 e0 01 00 00 49 63 d4 4c 2b 7c 24 10 48 8d 04 52 48
[  163.236526] RSP: 0018:ffffffffb2403e58 EFLAGS: 00000246
[  163.236527] RAX: ffff98001722e680 RBX: ffff980017239600 RCX: 000000000000001f
[  163.236527] RDX: 0000000000000000 RSI: 000000002817b731 RDI: 0000000000000000
[  163.236528] RBP: ffffffffb24f6260 R08: 0000002601a4c721 R09: 00000000000002ee
[  163.236528] R10: ffff98001722d444 R11: 00000000000005f4 R12: 0000000000000004
[  163.236529] R13: 0000000000000004 R14: 0000000000000004 R15: 0000002601a4c721
[  163.236531]  ? cpuidle_enter_state+0xa4/0x3f0
[  163.236533]  cpuidle_enter+0x29/0x40
[  163.236535]  cpuidle_idle_call+0x13f/0x210
[  163.236536]  do_idle+0x73/0xd0
[  163.236538]  cpu_startup_entry+0x19/0x20
[  163.236541]  start_kernel+0x485/0x4a4
[  163.236544]  secondary_startup_64+0xb6/0xc0
[  163.236546] ---[ end trace 5ea2057b70f281c8 ]---
[  163.236548] ------------[ cut here ]------------

[  574.899156] i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc79, in ksplashqml [3389]
[  574.899728] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  575.003261] i915 0000:00:02.0: [drm] ksplashqml[3389] context reset due to GPU hang
[  580.786335] i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc79, in ksplashqml [3389]
[  580.786882] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  580.890229] i915 0000:00:02.0: [drm] ksplashqml[3389] context reset due to GPU hang
[  583.858065] i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc7b, in plasmashell [3461]
[  583.858294] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  583.961652] i915 0000:00:02.0: [drm] plasmashell[3461] context reset due to GPU hang
[  586.929534] i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc7b, in kwin_x11 [3427]
[  586.929859] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  587.033116] i915 0000:00:02.0: [drm] kwin_x11[3427] context reset due to GPU hang
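
For anyone comparing their own logs against the ones above, here is a minimal sketch that pulls the "GPU HANG" lines out of dmesg text and counts them per process. It assumes the line layout shown in this report; the regular expression and the function names (parse_hangs, hangs_per_process) are illustrative and not part of any i915 or dmesg tooling.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log excerpt in this report:
# [  574.899156] i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc79, in ksplashqml [3389]
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i915\s+(?P<dev>\S+):\s+\[drm\]\s+GPU HANG:\s+"
    r"ecode\s+(?P<ecode>[0-9a-f:]+),\s+in\s+(?P<proc>\S+)\s+\[(?P<pid>\d+)\]"
)


def parse_hangs(text):
    """Return one dict per 'GPU HANG' line found in the given dmesg text."""
    hangs = []
    for line in text.splitlines():
        match = HANG_RE.search(line)
        if match:
            hangs.append({
                "timestamp": float(match.group("ts")),
                "device": match.group("dev"),
                "ecode": match.group("ecode"),
                "proc": match.group("proc"),
                "pid": int(match.group("pid")),
            })
    return hangs


def hangs_per_process(hangs):
    """Count how many hangs were attributed to each process name."""
    return Counter(h["proc"] for h in hangs)


# Sample input copied from the log lines in this report.
SAMPLE = """\
[  574.899156] i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc79, in ksplashqml [3389]
[  574.899728] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  580.786335] i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc79, in ksplashqml [3389]
[  583.858065] i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc7b, in plasmashell [3461]
[  586.929534] i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc7b, in kwin_x11 [3427]
"""

if __name__ == "__main__":
    for proc, count in hangs_per_process(parse_hangs(SAMPLE)).items():
        print(f"{proc}: {count} hang(s)")
```

To run it against a real log, save the dmesg output to a file and pass the file contents to parse_hangs instead of SAMPLE.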
Comment 1 Emr Rec 2020-09-26 16:53:50 UTC
I tried the latest kernel, 5.8.10, to no avail. I switched to a virtual terminal and uninstalled it, which reverts me to "Linux moz 5.6.14-1-default #1 SMP Wed May 20 08:32:48 UTC 2020 (b0ab48a) x86_64 x86_64 x86_64 GNU/Linux", which works just fine.

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd Device d000
        Flags: bus master, fast devsel, latency 0, IRQ 26
        Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at f000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [a4] PCI Advanced Features
        Kernel driver in use: i915
        Kernel modules: i915
Comment 2 Emr Rec 2020-09-26 16:58:51 UTC
Adding more information from lspci:

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd Device d000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 26
        Region 0: Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at f000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee01004  Data: 4021
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a4] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: i915
        Kernel modules: i915
Comment 3 Takashi Iwai 2020-09-30 16:59:38 UTC
I believe only upstream can fix this fully. Would you care to report it to the upstream bug tracker, e.g. gitlab.freedesktop.org issues?

It's an old chip (Haswell), though, so the Intel people might not be very interested in fixing it.
Comment 4 Felix Miata 2020-10-05 08:19:39 UTC
Could this be DE/WM or DDX related? I've yet to experience this on either of my Haswells, e.g. with TDE/TDM:

# inxi -SGay
System:
  Host: ab85m Kernel: 5.7.11-1-default x86_64 bits: 64 compiler: gcc v: 10.1.1
  parameters: BOOT_IMAGE=/boot/vmlinuz root=LABEL=redacted noresume
  ipv6.disable=1 net.ifnames=0 mitigations=auto consoleblank=0
  video=1440x900@60 3
  Desktop: Trinity R14.0.8 tk: Qt 3.5.0 info: kicker wm: Twin 3.0 dm: TDM
  Distro: openSUSE Tumbleweed 20201002
Graphics:
  Device-1: Intel Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics
  vendor: ASUSTeK driver: i915 v: kernel bus ID: 00:02.0 chip ID: 8086:0402
  Display: x11 server: X.Org 1.20.9 driver: modesetting unloaded: fbdev,vesa
  alternate: intel display ID: :0 screens: 1
  Screen-1: 0 s-res: 2560x1440 s-dpi: 120 s-size: 541x304mm (21.3x12.0")
  s-diag: 621mm (24.4")
  Monitor-1: DP-1 res: 2560x1440 hz: 60 dpi: 109 size: 598x336mm (23.5x13.2")
  diag: 686mm (27")
  OpenGL: renderer: Mesa DRI Intel HD Graphics (HSW GT1) v: 4.5 Mesa 20.1.8
  compat-v: 3.0 direct render: Yes

or KDE3/KDM3:
# inxi -Gay
Graphics:
  Device-1: Intel 4th Generation Core Processor Family Integrated Graphics
  vendor: Micro-Star MSI driver: i915 v: kernel bus ID: 00:02.0 chip ID: 8086:041e...

On both I use only the modesetting DDX.
Comment 5 Emr Rec 2020-10-10 16:34:45 UTC
Please see the following Red Hat Bugzilla entry. I have reverted my kernel back to 5.6 many times after running zypper dup because of this. I also set the kernel history in zypp.conf to 10 in case I miss it one day. :)

http://bugzilla.opensuse.org/show_bug.cgi?id=1176646
Comment 6 Emr Rec 2020-10-10 16:35:31 UTC
(In reply to Emr Rec from comment #5)
> Please see the following redhat bugzilla. I have reverted my kernel back to
> 5.6 many times after zypper dup's because of this. I also added a kernel
> history in zypp.conf to 10 in case I miss it one day. :)
> 
> http://bugzilla.opensuse.org/show_bug.cgi?id=1176646

Whoops, wrong link: https://bugzilla.redhat.com/show_bug.cgi?id=1843274
Comment 7 Miroslav Beneš 2022-01-07 13:16:23 UTC
Forgotten bug...

Emr, has the situation improved in TW? There has been a lot of development on the i915 front since then.
Comment 8 k ts 2022-01-19 18:41:40 UTC
(In reply to Miroslav Beneš from comment #7)
> Forgotten bug...
> 
> Emr, has the situation improved in TW? There has been a lot of development
> on i915 front since then?

What's your CPU model? I have a Pentium G3250 (Haswell) and the problem is still there.
https://gitlab.freedesktop.org/drm/intel/-/issues/2024
https://gitlab.freedesktop.org/drm/intel/-/issues/3123