Bug 1213787 - Plasma desktop is entirely grey with 64+GB of RAM
Summary: Plasma desktop is entirely grey with 64+GB of RAM
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: openSUSE Kernel Bugs
QA Contact: E-mail List
Reported: 2023-07-30 20:17 UTC by W
Modified: 2024-06-25 17:52 UTC
CC List: 3 users

Attachments
Photo of the screen when bug occurs (466.51 KB, image/jpeg)
2023-07-30 20:17 UTC, W
dmesg and hwinfo output (425.32 KB, application/x-gzip)
2023-08-02 00:02 UTC, W

Description W 2023-07-30 20:17:20 UTC
Created attachment 868516
Photo of the screen when bug occurs

Problem: openSUSE Tumbleweed boots to an entirely grey desktop when using KDE Plasma with 64GB or more of RAM.

How to reproduce: Install 64GB or more of RAM, boot normally.

What I expect to happen: the normal KDE Plasma desktop displays.

What happens: other than the cursor, which appears normal, the entire screen displays grey. The cursor will change to, e.g., the text entry cursor when over the location of a text entry field, but that field will not be visible. It's like everything loads properly, then is covered with a layer of grey. When using ctrl-alt-F1/F2 to switch to and from a GUI, the correct desktop will display briefly.


Other notes:
- Problem occurs with 64 or 96GB of RAM installed.
- Problem occurs both with the version of KDE installed from the installation media snapshot from July 14, 2023, and with the version of KDE installed after running sudo zypper update today (July 30, 2023).
- System is AMD-based, and using the integrated GPU of a Ryzen 9 7900.
- The console works as expected.
- Problem does not occur with XFCE as the desktop environment.
- Problem does not occur when using Manjaro's KDE Plasma installation.
- Problem does not occur with 16, 32, or 48GB of memory installed in the same system, using any combination of modules available to me.
- There are no errors (lines marked with EE) in Xorg.0.log.

Screenshot attached is from when Night Mode is on. During the day the red tint is gone and it is grey.
Comment 1 Fabian Vogt 2023-07-31 06:16:05 UTC
Sounds like some resource conflicts with the graphics driver. Reassigning to kernel.
Comment 2 Takashi Iwai 2023-07-31 07:33:57 UTC
Is it a regression from the early releases?

You can try the older kernels from my archives in OBS, e.g. home:tiwai:kernel:6.3, home:tiwai:kernel:6.2, etc.
Comment 3 W 2023-08-01 01:03:33 UTC
(In reply to Takashi Iwai from comment #2)
> Is it a regression from the early releases?
> 
> You can try the older kernels from my archives in OBS, e.g.
> home:tiwai:kernel:6.3, home:tiwai:kernel:6.2, etc.

It appears to be. kernel-default 6.1.12 does not exhibit the problem, but all of 6.2.12, 6.3.9, and 6.4.2 do.

Manjaro was running 6.1 (and, again, did not have the problem).

openSUSE Leap 15.5 with 5.14 also has the problem.
Comment 4 Takashi Iwai 2023-08-01 05:27:13 UTC
OK, if it started after 6.1, the earlier releases of the Leap 15.4 kernel might still work.  Could you verify the behavior with the older kernels of Leap 15.4?
The GM kernel is found at
  http://download.opensuse.org/distribution/leap/15.4/repo/oss/
and update kernels are found at
  http://download.opensuse.org/update/leap/15.4/sle/

Also, please provide the hwinfo output from the working case, and the dmesg outputs from both the working and non-working cases.
Comment 5 W 2023-08-02 00:02:40 UTC
Created attachment 868588
dmesg and hwinfo output
Comment 6 W 2023-08-02 00:03:59 UTC
I've attached the dmesg output from both 6.4 (non-working) and 6.1 (working), as well as the hwinfo output from 6.1. Apologies for making this two comments.

One thing I noticed is that in the *working* dmesg output, the following line appears four times:
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

I have not yet tried the kernel from Leap 15.4. I'll update here when I do.
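
To compare the two dmesg captures, the pin failures can be counted and the error code translated to its symbolic name. The following is only a minimal Python sketch: the function name count_pin_errors is hypothetical, and the pattern assumes the message text is exactly as quoted above. On Linux, error -12 corresponds to ENOMEM (out of memory).

```python
import errno
import re

# Matches the amdgpu message quoted in this comment; the group captures the
# negative error number at the end of the line.
PIN_ERROR = re.compile(r"Failed to pin framebuffer with error (-\d+)")

def count_pin_errors(dmesg_text):
    """Return a dict mapping errno name -> number of matching dmesg lines."""
    counts = {}
    for line in dmesg_text.splitlines():
        match = PIN_ERROR.search(line)
        if match:
            code = -int(match.group(1))  # "-12" -> 12
            # Fall back to the raw number if the code has no symbolic name.
            name = errno.errorcode.get(code, str(code))
            counts[name] = counts.get(name, 0) + 1
    return counts
```

It could be called on the contents of a saved dmesg file, e.g. count_pin_errors(open("dmesg-6.1.txt").read()), where the file name is a placeholder.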
Comment 7 W 2023-08-02 05:16:54 UTC
Using the kernel from Leap 15.4 (5.14.21-150400.24.69.1), Leap 15.5 does not exhibit the grey screen, but doesn't appear to detect the monitor correctly: resolution is 1024x768 rather than 3840x2160. This is in both X and the terminals.
Comment 8 Takashi Iwai 2023-08-02 07:20:43 UTC
Then the old Leap 15.4 kernel worked only by chance, because it failed to get the full resolution for some reason...

Could you report this upstream, at gitlab.freedesktop.org Issues (DRM/AMD)?
Comment 9 Takashi Iwai 2023-08-02 13:40:00 UTC
Also, does the problem go away when you reduce the memory size with the mem= boot option, e.g. mem=32G, on the recent 6.4.x kernel?  If so, please get the dmesg outputs from both the working and non-working cases.  It might help to identify the problem.

In addition, it's worth booting with the "drm.debug=0x1e" and "log_buf_len=16M" options, waiting until the screen shows up, and getting the dmesg outputs in both the working and non-working cases, too.
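
For reference, drm.debug is a bitmask of debug categories. The sketch below decodes a mask into category names; the bit assignments are an assumption based on the DRM_UT_* values in the kernel's include/drm/drm_print.h and may differ between kernel versions, and the function name decode_drm_debug is hypothetical.

```python
# Debug category bits, assumed to match DRM_UT_* in include/drm/drm_print.h.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
    0x100: "DP",
}

def decode_drm_debug(mask):
    """Return the names of the categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
```

Under that assumption, decode_drm_debug(0x1e) yields DRIVER, KMS, PRIME and ATOMIC, i.e. everything from bit 1 to bit 4 but not the very verbose CORE category.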
Comment 10 Takashi Iwai 2023-08-12 07:56:51 UTC
I guess it's the bug that upstream has just fixed with commit 08fffa74d9772d9538338be3f304006c94dde6f0
  drm/amd: Disable S/G for APUs when 64GB or more host memory

I'm building a kernel with the fix backported in the OBS home:tiwai:bsc1213787 repo.
After the build finishes (takes an hour or so), the test package will be available at
  http://download.opensuse.org/repositories/home:/tiwai:/bsc1213787/standard/
Please give it a try.
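
To check from user space whether a machine falls into the "64GB or more host memory" range named in the commit subject, MemTotal from /proc/meminfo can be compared against a threshold. This is only a rough sketch and not the kernel's logic: the function names are hypothetical, and the default threshold of 64,000,000 kB is an illustrative assumption, chosen slightly below 64 GiB because MemTotal is normally somewhat lower than the installed RAM.

```python
import re

def mem_total_kib(meminfo_text):
    """Extract the MemTotal value (in kB) from /proc/meminfo contents."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1))

def likely_affected(total_kib, threshold_kib=64_000_000):
    """True if total memory is at or above the (assumed) threshold."""
    return total_kib >= threshold_kib
```

On a running system this could be used as likely_affected(mem_total_kib(open("/proc/meminfo").read())). A True result only indicates the memory size is in the affected range; the system must also be using an AMD APU for display.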
Comment 11 W 2023-08-12 18:18:57 UTC
(In reply to Takashi Iwai from comment #10)
> I guess it's the bug the upstream fixed right now by the commit
> 08fffa74d9772d9538338be3f304006c94dde6f0
>   drm/amd: Disable S/G for APUs when 64GB or more host memory
> 
> I'm building a kernel with the fix backport in OBS home:tiwai:bsc1213787
> repo.
> After the build finishes (takes an hour or so), the test package will be
> available at
>   http://download.opensuse.org/repositories/home:/tiwai:/bsc1213787/standard/
> Please give it a try.

Sorry for the delay, I haven't had time to work on this until today.

mem=32G did indeed work to prevent the bug with the 6.4.x kernel, but of course halved the available memory.

The new kernel you built also results in a correct display at idle. I saw in the patch notes that it may result in problems under memory pressure, but it sounds like there's work ongoing on the problem.

I wonder why the problem doesn't occur when using XFCE window manager if it's a kernel/display driver issue at root.

Thank you very much for your help. Do you want to collect any more information about this from my system?
Comment 12 W 2023-08-12 18:30:28 UTC
Two notes:
- 6.4.x in my previous comment should be 6.4.2; I did not try mem=32G with 6.4.9.
- With kernel 6.4.2, if I use a monitor with 1920x1080 resolution rather than 3840x2160, the problem does not occur with 64GB of memory.
Comment 13 Takashi Iwai 2023-08-13 07:06:04 UTC
(In reply to W from comment #11)
> I wonder why the problem doesn't occur when using XFCE window manager if
> it's a kernel/display driver issue at root.

It's a good question.  Might it be the way it's rendering, e.g. with compositing or not?

Anyway, it's confirmed that the problem is tied to the commit, and a workaround is available.  Let's close for now.