Bug 1178474 - [i915] Display freezes for up to tens of seconds with kernel 5.9
Status: RESOLVED WORKSFORME
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Priority: P5 - None
Severity: Major
Target Milestone: ---
Assigned To: openSUSE Kernel Bugs
Reported: 2020-11-05 13:33 UTC by Bengt Gördén
Modified: 2022-02-15 08:30 UTC (History)



Attachments
Dmesg after the issue occured (497.48 KB, text/plain)
2020-11-25 09:15 UTC, Thomas Zimmermann
dmesg output with drm.debug=0xe shortly after the problem appears (183.92 KB, text/plain)
2020-11-25 09:35 UTC, Bengt Gördén

Description Bengt Gördén 2020-11-05 13:33:28 UTC
Since I zypper dup'ed to TW 20201030 on my laptop[1], I've been getting freezes lasting from a few seconds up to tens of seconds. There is nothing conclusive in the logs and nothing weird in dmesg. I suspect the kernel, because when I rebooted with the old kernel, 5.8.12-1-default, everything worked as before.

A weird workaround for the freezes is that if I hit Alt-Tab (in Plasma), the system becomes responsive again, but only until the next short freeze. Still nothing conclusive in the logs.

[1] https://linux-hardware.org/?probe=543c444fdf
Comment 1 Thomas Zimmermann 2020-11-23 13:05:14 UTC
I have had this bug since TW 20201119. I run GNOME 3.38 with kernel 5.9:

> Linux linux-uq9g 5.9.8-2-default #1 SMP Thu Nov 12 07:43:32 UTC 2020 (ea93937) x86_64 x86_64 x86_64 GNU/Linux
Comment 2 Thomas Zimmermann 2020-11-23 13:08:10 UTC
I have a T450 with Intel chipset:

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 5500 (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 5036
        Flags: bus master, fast devsel, latency 0, IRQ 47
        Memory at e0000000 (64-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=512M]
        I/O ports at 3000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [a4] PCI Advanced Features
        Kernel driver in use: i915
        Kernel modules: i915

I'll try to boot with the old kernel 5.8.
Comment 3 Thomas Zimmermann 2020-11-23 13:10:20 UTC
Quoting myself from the email to opensuse-factory

> In regular intervals, gvim stops updating the screen and freezes. I have to flip to a separate workspace, which un-freezes gvim. All my key strokes were processed, as I can see the characters after unfreezing. Also, sometimes the overview screen doesn't update the small workspace previews.

> I suspect this is a bug in GNOME shell 3.38, which maybe doesn't compose the updated windows correctly.

I suspected a GNOME issue, although the bug report points in a different direction.
Comment 4 Thomas Zimmermann 2020-11-25 09:15:43 UTC
Created attachment 843857 [details]
Dmesg after the issue occured

This is the dmesg output with drm.debug=0x1f enabled, shortly after the problem occurred.
Comment 5 Bengt Gördén 2020-11-25 09:35:27 UTC
Created attachment 843858 [details]
dmesg output with drm.debug=0xe shortly after the problem appears

This is a dmesg (booted with drm.debug=0xe) shortly after the problem appears.
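
For reference, the two drm.debug values used for the attached logs (0x1f and 0xe) select different sets of DRM debug categories. A minimal sketch of decoding the bitmask, assuming the bit assignments documented for the drm.debug module parameter in kernels of this era; the function name decode_drm_debug is made up for illustration:

```shell
#!/bin/sh
# Decode a drm.debug bitmask into DRM debug category names.
# Assumption: bit values as documented for the drm.debug module parameter
# (0x01 CORE ... 0x100 DP); newer kernels may define additional bits.
decode_drm_debug() {
    mask=$1
    names=""
    for entry in 0x01:CORE 0x02:DRIVER 0x04:KMS 0x08:PRIME \
                 0x10:ATOMIC 0x20:VBL 0x40:STATE 0x80:LEASE 0x100:DP; do
        bit=${entry%%:*}     # part before the colon: the bit value
        name=${entry#*:}     # part after the colon: the category name
        if [ $(( $mask & $bit )) -ne 0 ]; then
            names="${names:+$names }$name"
        fi
    done
    echo "$names"
}

decode_drm_debug 0x1f
decode_drm_debug 0xe
```

Under these assumptions, 0x1f covers CORE, DRIVER, KMS, PRIME and ATOMIC, while 0xe covers only DRIVER, KMS and PRIME, so the two logs are not equally verbose.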
Comment 6 Bengt Gördén 2020-11-30 19:01:21 UTC
Some new findings. I saw an email about Fedora 33 working with 5.9, so I tried a live version of Rawhide with 5.10.0. No lockup. One difference was Wayland. So I rebooted my openSUSE TW with kernel 5.9.8 and switched to Wayland in SDDM. No lockups for 1h last night and no lockups for 1h this morning.

Anyone got some suggestions how to proceed with the fault isolation with kernel 5.9, i915 and X?
Comment 7 Thomas Zimmermann 2020-12-01 08:41:31 UTC
(In reply to Bengt Gördén from comment #6)
> Some new findings. I saw an email about fedora 33 working with 5.9 and so
> tried a live version with Rawhide and 5.10.0. No lockup. One difference was
> Wayland. So I rebooted my Opensuse TW with kernel 5.9.8 and switched to
> Wayland in SDDM. No lockups for 1h last night and no lockups for 1h this
> morning.

Great find. So it's probably in how the Xorg driver interacts with the kernel.

> Anyone got some suggestions how to proceed with the fault isolation with
> kernel 5.9, i915 and X?

Ideally, you could bisect the issue with the upstream kernel. I'm trying this myself, but it is going really slowly as I can only reproduce the bug on my daily work machine.

What I found so far is that the issue got introduced somewhere between v5.8 and v5.9. Although others reported problems with v5.8, I can't reproduce them.
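
A rough sketch of how such a bisection could be started, assuming an upstream kernel git checkout and the v5.8/v5.9 tags named above as the initial good and bad points. The function name bisect_begin, the tree path and the GIT override are hypothetical helpers for illustration, not part of any existing tooling:

```shell
#!/bin/sh
# Start a bisection in a kernel git tree between a known-good and a
# known-bad revision. Setting GIT=echo prints the command instead of
# running it (dry run); by default the real git binary is used.
bisect_begin() {
    tree=$1   # path to the kernel git checkout (hypothetical)
    good=$2   # revision where the freeze does not occur
    bad=$3    # revision where the freeze occurs
    ${GIT:-git} -C "$tree" bisect start "$bad" "$good"
}

# Example (not executed here):
#   bisect_begin "$HOME/src/linux" v5.8 v5.9
# After building and booting each candidate kernel, record the result:
#   git -C "$HOME/src/linux" bisect good   # no freeze observed
#   git -C "$HOME/src/linux" bisect bad    # freeze reproduced
# and finish with:
#   git -C "$HOME/src/linux" bisect reset
```

Since the freeze only shows up intermittently, each "good" verdict needs a long enough test run to be trustworthy, which is what makes this approach slow.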
Comment 8 Thomas Zimmermann 2020-12-01 09:18:58 UTC
(In reply to Thomas Zimmermann from comment #7)
> What I found so far is that the issue got introduced somewhere between v5.8
> and v5.9. Although others reported problems with v5.8, I can't reproduce
> them.

I think I just saw one of these errors on v5.8. It's just a lot less often. I have to verify. :/
Comment 9 Bengt Gördén 2020-12-01 12:50:48 UTC
Yesterday I upgraded to TW 20201129 and there were still lockups. I googled around and found this link:

https://linuxreviews.org/Linux_5.9_Is_Released_With_New_Drivers,_Improved_AMD_GPU_Support,_And_Support_The_x86-64_FSGSBASE_CPU_Instructions#Intel's_Also_In_The_GPU_Game_Now

I started reading and found the kernel boot parameters ahci.mobile_lpm_policy=1 and intel_idle.max_cstate=1. I rebooted with both set. I've now been running 5.9.10 for 1.5h with no lockups. I'm not sure whether both parameters are needed. Will try later when everything seems stable enough.
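
To confirm that boot parameters like these are actually active after a reboot, the running kernel's command line can be checked. A minimal sketch, assuming the parameters appear as whole space-separated tokens in /proc/cmdline; the function name has_kernel_param is made up for illustration:

```shell
#!/bin/sh
# Succeed if the exact "name=value" token is present on a kernel command
# line. The second argument defaults to the running kernel's /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline 2>/dev/null)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

for p in ahci.mobile_lpm_policy=1 intel_idle.max_cstate=1; do
    if has_kernel_param "$p"; then
        echo "$p: active"
    else
        echo "$p: not set"
    fi
done
```

Matching on whole tokens avoids false positives such as intel_idle.max_cstate=10 being mistaken for intel_idle.max_cstate=1.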
Comment 10 Bengt Gördén 2020-12-01 16:09:20 UTC
(In reply to Bengt Gördén from comment #9)
> Will try later when everything seems stable enough.

Unfortunately it isn't. Uptime is 4h51m and I've had one short lockup so far. It's much better than before but not gone.
Comment 11 Thomas Zimmermann 2020-12-02 10:57:40 UTC
I found the issue in v5.7 as well.
Comment 12 Martin Wilck 2020-12-02 12:40:05 UTC
(In reply to Bengt Gördén from comment #9)

> I started reading and there is ahci.mobile_lpm_policy=1 and
> intel_idle.max_cstate=1.

What does this do to your battery life?
Comment 13 Bengt Gördén 2020-12-02 13:26:51 UTC
(In reply to Martin Wilck from comment #12)
> (In reply to Bengt Gördén from comment #9)
> 
> > I started reading and there is ahci.mobile_lpm_policy=1 and
> > intel_idle.max_cstate=1.
> 
> What does this do to your battery life?

Not sure right now, but I will measure it after the weekend. I need to have this laptop running without a hiccup for a few days. But I suspect it's going to decrease the battery life. At least, that's what I've read so far.

I ran 5.9.10 overnight and when I woke up this morning it was completely frozen. So the boot parameters didn't help at all, except that they seem to have put off the inevitable for some time. Last night I ran 5.9.10 for about 6h with just one temporary lockup (around 10 sec).
Comment 14 Mark Draheim 2020-12-04 18:49:10 UTC
I wondered if I am the only one experiencing this. I have no hard lockups, but I have keystrokes appearing after pauses and YT video freezing and unfreezing at regular intervals while audio keeps playing. The delayed display of the keys I typed is endlessly irritating.

This started with kernel 5.9. Luckily, I pinned kernel 5.8.15, which shows none of these problems, and am now waiting for a fix.

Laptop is a Lenovo Thinkbook 13s-IML
Comment 15 Patrik Jakobsson 2020-12-06 16:49:54 UTC
As requested in bsc#1179092

Can you try setting i915.enable_dc=0 and i915.enable_psr=0? Try them one at a time so we know which one (if any) helps.
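
When testing one option at a time, it helps to verify which value the loaded i915 module is actually using. A minimal sketch, assuming the parameters are exposed under /sys/module/i915/parameters/ (reading them may require root); the function name module_param and the optional sysfs-root argument are made up for illustration:

```shell
#!/bin/sh
# Print the current value of a kernel module parameter as exposed in sysfs,
# or "unavailable" if the file does not exist or is not readable.
# The optional third argument overrides the sysfs root (useful for testing).
module_param() {
    module=$1
    param=$2
    root=${3:-/sys}
    file="$root/module/$module/parameters/$param"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unavailable"
    fi
}

for p in enable_dc enable_psr; do
    echo "i915.$p = $(module_param i915 "$p")"
done
```

Recording this output alongside each test run makes it clear which of the two settings was in effect when a freeze did or did not occur.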
Comment 16 Mark Draheim 2020-12-06 17:26:17 UTC
(In reply to Patrik Jakobsson from comment #15)

> Can you try setting i915.enable_dc=0 and i915.enable_psr=0. Try them one at
> a time so we know which one (if any) helps.

For me, neither has any positive effect. Disabling DC made the video playback stuttering a tiny bit less annoying, but it came with visual artefacts like old window decorations blinking in. Disabling PSR got me a hard lockup 30 seconds into the session.

On a general note, I occasionally had screen lockups with kernels before 5.9 on this laptop (10th-gen i7), but not on an old Skylake laptop with integrated Intel graphics. But the video stuttering definitely started with kernel 5.9. It shows up as video playback freeze-framing every few seconds. On the desktop I have window-fade-on-close enabled, and quite often the window stops at half transparency and then simply disappears: the fade starts, the screen stops updating, and then the window is gone. It is probably the same with the typing lag, i.e. the screen simply does not update for a second, and then the characters I typed appear all at once.
Comment 17 Takashi Iwai 2020-12-06 19:27:52 UTC
If the bug is related to i915 power management: the i915 firmware files were updated last week, and they might be worth trying.
The updated kernel-firmware-* packages are found in OBS home:tiwai:branches:Kernel:HEAD/kernel-firmware repo,
  http://download.opensuse.org/repositories/home:/tiwai:/branches:/Kernel:/HEAD/standard/

Can anyone test this?
Comment 18 Thomas Zimmermann 2020-12-07 08:11:49 UTC
Hi

(In reply to Mark Draheim from comment #14)
> I wondered if I am the only one experiencing this. I have no hard lockups
> but I have keystrokes appearing after pauses and YT video freezing and
> unfreezing at regular intervals while audio keeps playing. The delayed
> showing of keys I typed is irritating without end.
> 
> This started with kernel 5.9. Luckily, I pinned kernel 5.8.15, that shows
> none of these problems, and am now waiting for a fix.

I did some testing of older kernels and was able to see this issue with kernels at least as old as 5.7. It seems to have increased in frequency with 5.9, though.
Comment 19 Takashi Iwai 2020-12-07 08:19:18 UTC
Then we might be looking at multiple issues that show similar behavior: the screen lockup.  One case is a complete screen freeze (supposedly often happening after system resume), while another is a temporary freeze that lasts until a key stroke or some other action.  But who knows...
Comment 20 Mark Draheim 2020-12-09 21:44:18 UTC
For lack of ideas, I tried the old disable-vsync trick, and it does make a difference for me. I had the Plasma compositor's vsync set to auto and have now switched it to never. YT video now plays without freeze-framing every few seconds. Typing seems fine, too. What puzzles me is that the micro freezes are not present with kernel 5.8 but do show up with every 5.9 kernel when vsync is set to auto. Anyway, works for me and I am happy for now.
Comment 21 Takashi Iwai 2020-12-17 09:12:20 UTC
Could you check whether 5.10.x kernel still shows the problem?  Try the kernel in OBS Kernel:stable repo, for example.

If the problem persists, try the kernel in OBS home:tiwai:kernel:drm-tip repo.  It's built from the drm-tip git branch and updated daily.  This is the code upstream devs usually ask to have tested first.
Comment 22 Bengt Gördén 2020-12-17 14:19:52 UTC
(In reply to Takashi Iwai from comment #21)
> Could you check whether 5.10.x kernel still shows the problem?  Try the
> kernel in OBS Kernel:stable repo, for example.

Just rebooted (6 minutes ago) my machine (lenovo x1 yoga with i915 with KDE/plasma) and so far so good. I'll test this during the day/evening and report back.

# uname -a
Linux linux-jrxm 5.10.1-2.g8f3d468-default #1 SMP Tue Dec 15 06:32:57 UTC 2020 (8f3d468) x86_64 x86_64 x86_64 GNU/Linux

# uptime
 15:18:32  up   0:06,  18 users,  load average: 0.63, 0.61, 0.34
Comment 23 Bengt Gördén 2020-12-17 15:00:38 UTC
(In reply to Takashi Iwai from comment #21) 
> If the problem persists, 

The problem started to appear again after 17-18 minutes and kept on coming quite often.

> try the kernel in OBS home:tiwai:kernel:drm-tip
> repo.  It's built from the drm-tip git branch and updated daily.  This is the
> code upstream devs usually ask to have tested first.

# uname -a
Linux linux-jrxm 5.10.0-5.g33cc490-vanilla #1 SMP Thu Dec 17 00:00:08 UTC 2020 (33cc490) x86_64 x86_64 x86_64 GNU/Linux

# uptime
 15:55:02  up   0:05,  18 users,  load average: 0.39, 0.80, 0.44


Unfortunately same problem with this kernel. One thing though. The kernel in home:tiwai:kernel:drm-tip was slightly older than the one in OBS Kernel:stable, as you can see from my "uname".
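
The ordering of the two kernel builds can also be checked mechanically from the release strings shown by uname. A small sketch, assuming a sort implementation that supports version ordering (sort -V, as in GNU coreutils); the function name older_kernel is made up for illustration:

```shell
#!/bin/sh
# Print the older of two kernel release strings, using version-aware
# ordering so that e.g. 5.9.8 sorts before 5.9.10.
older_kernel() {
    printf '%s\n' "$1" "$2" | sort -V | head -n 1
}

older_kernel 5.10.1-2.g8f3d468-default 5.10.0-5.g33cc490-vanilla
```

For the two release strings reported in this thread, this prints the drm-tip build, 5.10.0-5.g33cc490-vanilla, as the older one.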
Comment 24 Bengt Gördén 2020-12-17 15:11:17 UTC
(In reply to Takashi Iwai from comment #19)
> Then we might be looking for multiple issues that appear as the similar
> behavior: the screen lockup.  But one case is (supposedly happening often
> after the system resume) a complete screen freeze, while another one is a
> temporary freeze until the key stroke or some other action.  But who knows...

I should have added that my problem is, as I described in the bug, temporary freezes. I did actually have one complete freeze during my first tests, but it hasn't happened again, so I assumed it was a fluke.
Comment 25 Takashi Iwai 2020-12-17 15:25:16 UTC
(In reply to Bengt Gördén from comment #23)
> (In reply to Takashi Iwai from comment #21) 
> > If the problem persists, 
> 
> The problem started to appear again after 17-18 minutes and kept on coming
> quite often.

OK, then this problem still persists.  I had hoped that 5.10 would work better (at least it works stably on my machines), but apparently not in all cases.

> > try the kernel in OBS home:tiwai:kernel:drm-tip
> > repo.  It's built from the drm-tip git branch and updated daily.  This is the
> > code upstream devs usually ask to have tested first.
> 
> # uname -a
> Linux linux-jrxm 5.10.0-5.g33cc490-vanilla #1 SMP Thu Dec 17 00:00:08 UTC
> 2020 (33cc490) x86_64 x86_64 x86_64 GNU/Linux
> 
> # uptime
>  15:55:02  up   0:05,  18 users,  load average: 0.39, 0.80, 0.44
> 
> 
> Unfortunately same problem with this kernel. One thing though. The kernel in
> home:tiwai:kernel:drm-tip was slightly older than the one in OBS
> Kernel:stable, as you can see from my "uname".

This is intentional.  It's based on 5.10.0 without stable updates, and it stays that way until the drm-tip branch itself moves on to a newer base.  The kernel is provided primarily for testing graphics changes, so all other bugs (and fixes) are ignored.
Comment 26 Thomas Zimmermann 2021-02-08 09:18:19 UTC
Some more information and a workaround:

I tried booting with i915.mitigations=off in the hope that this would fix the issue, but it doesn't help.

What *does* work is to use Gnome in Wayland mode. During login, pick 'Gnome' instead of 'Gnome under X11'. I strongly suspect that the issue is in the interaction of kernel driver and X server.
Comment 27 Miroslav Beneš 2022-02-11 12:47:25 UTC
It has been a while. Is there anything new? TW has v5.16 kernel now, so the situation might be better.
Comment 28 Bengt Gördén 2022-02-14 21:11:32 UTC
(In reply to Miroslav Beneš from comment #27)
> It has been a while. Is there anything new? TW has v5.16 kernel now, so the
> situation might be better.

Sorry about this. This can be closed as far as I'm concerned. It has been working for a few kernel versions now. I don't know what made it work.
Comment 29 Miroslav Beneš 2022-02-15 08:30:13 UTC
Thanks for the feedback. Closing.

Thomas, I suppose it also works for you. If not, please reopen.