Bug 1179092 - [i915] Screen occasionally goes blank, sometimes staying that way until I press a key
Status: NEW
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: Other Other
Importance: P5 - None : Normal with 5 votes
Target Milestone: ---
Assigned To: openSUSE Kernel Bugs
Reported: 2020-11-23 10:12 UTC by Tristan Miller
Modified: 2021-07-29 08:25 UTC (History)
psychonaut: needinfo? (tiwai)


Attachments
dmesg.log from shortly after screen blanks (996.31 KB, text/plain)
2020-11-26 11:48 UTC, Tristan Miller

Description Tristan Miller 2020-11-23 10:12:10 UTC
I'm running openSUSE Tumbleweed with an Intel UHD Graphics 630 (Desktop) and the i915 driver.  I use Plasma as my desktop environment, and this had been working without significant problems since I first set up the machine around April 2019.  But for the last month or two, I've been experiencing a problem whereby my entire desktop (both screens of a dual-monitor setup) goes completely black.  I can't reliably reproduce the problem, though it happens several times a day, usually (but not always) when I'm using the mouse to drag-select.  Sometimes the display stays blank for only a second, but sometimes it stays that way until I press a key on the keyboard.

On the assumption that this is a graphics driver issue, here's the output of hwinfo --gfxcard:

20: PCI 02.0: 0300 VGA compatible controller (VGA)              
  [Created at pci.386]
  Unique ID: _Znp.gLFUXSVXjU5
  SysFS ID: /devices/pci0000:00/0000:00:02.0
  SysFS BusID: 0000:00:02.0
  Hardware Class: graphics card
  Device Name: "Onboard - Video"
  Model: "Intel UHD Graphics 630 (Desktop)"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x3e92 "UHD Graphics 630 (Desktop)"
  SubVendor: pci 0x1043 "ASUSTeK Computer Inc."
  SubDevice: pci 0x8694 
  Driver: "i915"
  Driver Modules: "i915"
  Memory Range: 0xa0000000-0xa0ffffff (rw,non-prefetchable)
  Memory Range: 0x90000000-0x9fffffff (ro,non-prefetchable)
  I/O Ports: 0x4000-0x403f (rw)
  Memory Range: 0x000c0000-0x000dffff (rw,non-prefetchable,disabled)
  IRQ: 125 (372520 events)
  Module Alias: "pci:v00008086d00003E92sv00001043sd00008694bc03sc00i00"
  Driver Info #0:
    Driver Status: i915 is active
    Driver Activation Cmd: "modprobe i915"
  Config Status: cfg=new, avail=yes, need=no, active=unknown

Primary display adapter: #20
Comment 1 Stefan Dirsch 2020-11-23 11:55:26 UTC
Thanks. This sounds like a kernel regression, introduced in kernel 5.9 or possibly already in 5.8. I suggest setting

  drm.debug=0xe

as a kernel boot parameter and then running the 'dmesg' command after the issue has happened, so we may see the culprit.
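
For readers unfamiliar with the parameter: drm.debug is a bitmask, and 0xe switches on three debug categories at once. The sketch below decodes a mask into category names. It assumes the bit assignments from the kernel's DRM debug categories (CORE=0x01, DRIVER=0x02, KMS=0x04, PRIME=0x08, and so on), and decode_drm_debug is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Decode a drm.debug bitmask into the debug categories it enables.
# Bit order is assumed from the kernel's DRM debug categories
# (include/drm/drm_print.h); decode_drm_debug is an illustrative helper.
decode_drm_debug() {
    mask=$(( $1 ))
    out=""
    bit=1
    for name in CORE DRIVER KMS PRIME ATOMIC VBL STATE LEASE DP; do
        if [ $(( mask & bit )) -ne 0 ]; then
            # Append the category name, space-separated.
            out="${out:+$out }$name"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%s\n' "$out"
}

decode_drm_debug 0xe
```

After rebooting with the parameter set, `cat /proc/cmdline` should list drm.debug=0xe, and `dmesg > dmesg.log` saves the log to a file for attaching.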
Comment 2 Takashi Iwai 2020-11-23 13:25:17 UTC
Might this be related to bug 1178474?
Comment 3 Michael Hirmke 2020-11-23 19:48:47 UTC
Same problem here.

 $ inxi -G
Graphics:  Device-1: Intel UHD Graphics 620 driver: i915 v: kernel 
           Display: x11 server: X.Org 1.20.9 driver: modesetting resolution: 2560x1440~60Hz 
           OpenGL: renderer: Mesa DRI Intel UHD Graphics 620 (KBL GT2) v: 4.6 Mesa 20.2.2
Comment 4 Thomas Zimmermann 2020-11-24 13:28:21 UTC
All reports here and in bug 1178474 are against Intel.

(In reply to Stefan Dirsch from comment #1)
> Thanks. This sounds like a kernel regression. Since kernel 5.9 or already

In the other bug, it works with 5.8.

> 5.8. I suggest to set
> 
>   drm.debug=0xe
> 
> as kernel boot parameter and then run 'dmesg' command after this issue
> happened, so we may see the culprit.
Comment 5 Tristan Miller 2020-11-24 13:46:59 UTC
(In reply to Stefan Dirsch from comment #1)
> Thanks. This sounds like a kernel regression. Since kernel 5.9 or already
> 5.8. I suggest to set
> 
>   drm.debug=0xe
> 
> as kernel boot parameter and then run 'dmesg' command after this issue
> happened, so we may see the culprit.

I've got this parameter set and will report back with dmesg output once the problem recurs.  (In the meantime, maybe Michael wants to do the same.)
Comment 6 Thomas Zimmermann 2020-11-25 09:17:37 UTC
(In reply to Stefan Dirsch from comment #1)
> Thanks. This sounds like a kernel regression. Since kernel 5.9 or already
> 5.8. I suggest to set
> 
>   drm.debug=0xe
> 
> as kernel boot parameter and then run 'dmesg' command after this issue
> happened, so we may see the culprit.

Assuming that bug 1178474 is related, please see https://bugzilla.suse.com/show_bug.cgi?id=1178474#c4
Comment 7 Felix Miata 2020-11-25 18:30:18 UTC
If Intel UHD Graphics 630 8086:3e92 is Haswell, this might be https://gitlab.freedesktop.org/drm/intel/-/issues/2024

# inxi -Gxx | grep -A3 "ip ID"
           v: kernel bus ID: 00:02.0 chip ID: 8086:041e
           Display: server: X.Org 1.20.3 driver: modesetting unloaded: fbdev,vesa alternate: intel resolution: 1920x1200~60Hz
           s-dpi: 120
           OpenGL: renderer: Mesa DRI Intel Haswell v: 4.5 Mesa 18.3.2 compat-v: 3.0 direct render: Yes
Comment 8 Tristan Miller 2020-11-26 11:48:33 UTC
Created attachment 843898 [details]
dmesg.log from shortly after screen blanks

Attached is the output of dmesg from a couple seconds after I experienced the bug.  (In this case, my screen went blank for about a second, but it came back to normal without me having to press a key.) drm.debug=0xe had been set as a kernel boot parameter.
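
Since a log captured with drm.debug set is large (this one is close to 1 MB), a small filter can help when skimming it. This is only a sketch: drm_lines is a hypothetical helper, and the patterns assume that the relevant messages are tagged with "[drm" or "i915", which may miss related lines from other subsystems.

```shell
#!/bin/sh
# Print only the DRM/i915 lines from a saved dmesg log, newest last.
# drm_lines is an illustrative helper; the patterns are an assumption about
# how DRM debug messages are usually tagged ("[drm" or "i915").
drm_lines() {
    # $1 = log file, $2 = how many trailing lines to keep (default 50)
    grep -E '\[drm|i915' "$1" | tail -n "${2:-50}"
}

# Usage (after saving the log with: dmesg > dmesg.log):
#   drm_lines dmesg.log 100
```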
Comment 9 Thomas Zimmermann 2020-11-27 08:14:15 UTC
(In reply to Tristan Miller from comment #8)
> Created attachment 843898 [details]
> dmesg.log from shortly after screen blanks
> 
> Attached is the output of dmesg from a couple seconds after I experienced
> the bug.  (In this case, my screen went blank for about a second, but it
> came back to normal without me having to press a key.) drm.debug=0xe had
> been set as a kernel boot parameter.

Just like in the other bug's dmesg, there's nothing really sticking out. :/
Comment 10 Bengt Gördén 2020-11-30 09:29:17 UTC
Some new findings. I saw an email about Fedora 33 working with 5.9, so I tried a live version of Rawhide with 5.10.0. No lockup. One difference was Wayland. So I rebooted my openSUSE TW with kernel 5.9.8 and switched to Wayland in SDDM. No lockups for 1h last night and no lockups for 1h this morning.

Does anyone have suggestions on how to proceed with the fault isolation with kernel 5.9, i915 and X?
Comment 11 Bengt Gördén 2020-11-30 19:00:13 UTC
(In reply to Bengt Gördén from comment #10)

Sorry. This was meant to go in the bug that I filed (#1178474). Will update that one too.
Comment 12 Patrik Jakobsson 2020-12-04 14:55:53 UTC
I suspect this is related to either PSR or DC sleep states.

Can you try setting i915.enable_dc=0 and i915.enable_psr=0? Try them one at a time so we know which one (if any) helps.
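
When testing the options one at a time, it is easy to lose track of which one the running kernel was actually booted with. A minimal sketch for checking, assuming the options were passed on the kernel command line; cmdline_value is a hypothetical helper name, and the example command line is made up.

```shell
#!/bin/sh
# Print the value of a kernel command-line parameter, or nothing if unset.
# cmdline_value is an illustrative helper, not part of any existing tool.
cmdline_value() {
    # $1 = parameter name, $2 = command line (defaults to the running kernel's)
    line=${2:-$(cat /proc/cmdline)}
    for word in $line; do
        case $word in
            "$1"=*) printf '%s\n' "${word#*=}" ;;
        esac
    done
}

# Example with a made-up command line, testing one option at a time:
cmdline_value i915.enable_psr "quiet splash i915.enable_psr=0"
```

Called with only a parameter name, the helper reads /proc/cmdline. The driver also exposes its parameters under /sys/module/i915/parameters/, though reading them may require root.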
Comment 13 Tristan Miller 2020-12-04 19:52:38 UTC
(In reply to Patrik Jakobsson from comment #12)
> Can you try setting i915.enable_dc=0 and i915.enable_psr=0. Try them one at
> a time so we know which one (if any) helps.

OK, will do.  I'll report back next week.
Comment 14 Tristan Miller 2020-12-10 12:46:55 UTC
I can now report that i915.enable_dc=0 does not solve the problem.  I'll try again with i915.enable_psr=0, though as I will be out of the office, I might not be able to report whether this works until next week.
Comment 15 Tristan Miller 2020-12-14 11:18:10 UTC
Patrik, i915.enable_psr=0 doesn't work either.

Anything else I can try?
Comment 16 Takashi Iwai 2020-12-17 09:11:41 UTC
Could you check whether 5.10.x kernel still shows the problem?  Try the kernel in OBS Kernel:stable repo, for example.

If the problem persists, try the kernel in OBS home:tiwai:kernel:drm-tip repo.  It's built from the drm-tip git branch and updated daily.  This is the code that upstream devs usually ask you to test first.
Comment 17 Tristan Miller 2020-12-21 20:12:16 UTC
(In reply to Takashi Iwai from comment #16)
> Could you check whether 5.10.x kernel still shows the problem?  Try the
> kernel in OBS Kernel:stable repo, for example.

Confirming problem still occurs with 5.10.1-2.g8f3d468-default from Kernel:stable.

> If the problem persists, try the kernel in OBS home:tiwai:kernel:drm-tip
> repo.  It's a built from drm-tip git branch and updated daily.  This is the
> code usually upstream devs ask at first.

I'll try this next.
Comment 18 Tristan Miller 2020-12-30 09:38:37 UTC
The problem is still reproducible with kernel-vanilla-5.10.0-9.1.g5079778.x86_64 from home:tiwai:kernel:drm-tip.

Takashi and Patrik, anything else you want me to try?
Comment 19 Tristan Miller 2021-01-04 12:32:16 UTC
(In reply to Tristan Miller from comment #18)
> The problem is still reproducible with
> kernel-vanilla-5.10.0-9.1.g5079778.x86_64 from home:tiwai:kernel:drm-tip.

Also reproducible with kernel-vanilla-5.11.rc1-7.1.g444acc7.x86_64.
Comment 20 Gerhard Lorbeer 2021-01-05 14:34:44 UTC
Same hardware (UHD Graphics 630). One UHD monitor attached via DP.
Kernel 5.10.3-1.2-X86_64: no problems.
Since today's update to 5.10.4-1.1-X86_64, the monitor goes black after a while.
No reaction to mouse or keyboard.
Same behaviour in console mode (Ctrl-Alt-F1).

My workaround is to boot the old kernel 5.10.3.
Comment 21 Takashi Iwai 2021-01-05 14:37:15 UTC
The 5.10.4 kernel has another problem, caused by a bug in the Intel HDMI HD-audio code (see bug 1180563), and you might have hit that one.  The fix for HDMI audio is on its way.
Comment 22 Gerhard Lorbeer 2021-01-13 11:17:51 UTC
(In reply to Gerhard Lorbeer from comment #20)
> Same Hardware (UHD Graphics 630). One UHD Monitor attached via DP.
> Kernel 5.10.3-1.2-X86_64 no problems.
> Since update today to 5.10.4-1.1-X86_64 Monitor goes black after a time.
> No reaction on mouse or keyboard.
> Same behaviour in console-mode (Strg-Alt-F1).
> 
> My workaround to boot in old kernel 5.10.3

kernel 5.10.5 fixes my problem
thanks
Comment 23 Tristan Miller 2021-01-13 13:51:08 UTC
(In reply to Gerhard Lorbeer from comment #22)
> kernel 5.10.5 fixes my problem

But not mine. :(
Comment 24 Tristan Miller 2021-02-01 09:58:00 UTC
Is there an upstream bug report for this issue?  (I assume it isn't openSUSE-specific, since it's reproducible with the vanilla kernel packages.)
Comment 25 Thomas Zimmermann 2021-02-08 09:20:13 UTC
FYI, the very similar bug 1178474 can be avoided by using GNOME in Wayland mode. I suspect the issue lies in how the kernel and the X server interact with each other.
Comment 26 Tristan Miller 2021-07-29 08:25:34 UTC
Problem is still reproducible with kernel-default-5.13.3.  Takashi, is there anything else you want me to try?