Bug 1188954 - black screen because dm tries to start before /dev/dri/card0 has been created during init
Status: IN_PROGRESS
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Version: Current
Hardware: Other Other
Priority: P5 - None; Severity: Normal
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 1206316
Reported: 2021-08-01 07:00 UTC by Felix Miata
Modified: 2024-03-27 22:41 UTC (History)
CC List: 5 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Xorg.0.log and dmesg (110.42 KB, text/plain)
2021-08-01 07:00 UTC, Felix Miata
Xorg.0.log booted without radeon.cik_support=0 amdgpu.cik_support=1 (49.23 KB, text/plain)
2021-08-01 21:28 UTC, Felix Miata
Xorg.0.log & dmesg after enabling IOMMU in BIOS; with radeon.cik_support=0 amdgpu.cik_support=1; without drm.debug=0x1e log_buf_len=1M (97.86 KB, text/plain)
2021-08-01 21:28 UTC, Felix Miata
Xorg.0.log.old; journalctl -b; dmesg; Xorg.0.log (518.04 KB, text/plain)
2021-08-05 01:53 UTC, Felix Miata
Xorg.0.log.old; dmesg; journalctl -b (936.70 KB, text/plain)
2021-08-07 07:05 UTC, Felix Miata
systemd-analyze blame (30.38 KB, text/plain)
2022-10-24 03:28 UTC, Felix Miata
first and second and third Xorg.0.logs from a fresh boot (fail=1, fail=2, success=3) (44.48 KB, text/plain)
2022-12-21 05:19 UTC, Felix Miata
Xorg.0.log from TW20230412 w/ SDDM/Plasma (33.46 KB, text/plain)
2023-04-14 05:52 UTC, Felix Miata
Xorg.0.log from the initial post-boot X failure on KBL GT2 Slowroll host ab250 (4.67 KB, text/plain)
2023-12-14 03:31 UTC, Felix Miata

Description Felix Miata 2021-08-01 07:00:27 UTC
Created attachment 851460
Xorg.0.log and dmesg

Initial summary:
black screen on initial X open on Kaveri Radeon R7, /dev/dri/card0: No such file or directory with radeon.cik_support=0 amdgpu.cik_support=1

To reproduce:
1-boot with radeon.cik_support=0 amdgpu.cik_support=1 included on the kernel cmdline
2-wait for X to fail to start
3-login on a tty
4-systemctl restart xdm

Actual behavior:
1-X fails to start, finding no /dev/dri/card0, leaving the tty1 login prompt on the display screen
2-after systemctl restart xdm, X starts normally using amdgpu DDX driver

Expected behavior:
1-X starts normally using amdgpu DDX driver

Tested with kernels 5.12.13, 5.11.16, 5.7.11
# inxi -SGIay
System:
  Host: asa88 Kernel: 5.12.13-1-default x86_64 bits: 64 compiler: gcc
  v: 11.1.1
  parameters: BOOT_IMAGE=/boot/vmlinuz root=LABEL=tvgp07stw noresume
  ipv6.disable=1 net.ifnames=0 mitigations=auto consoleblank=0
  radeon.cik_support=0 amdgpu.cik_support=1 video=1024x768@60
  video=1440x900@60 drm.debug=0x1e log_buf_len=1M
  Console: tty pts/0 DM: TDM Distro: openSUSE Tumbleweed 20210730
Graphics:
  Device-1: AMD Kaveri [Radeon R7 Graphics] vendor: ASUSTeK driver: amdgpu
  v: kernel alternate: radeon bus-ID: 00:01.0 chip-ID: 1002:130f
  class-ID: 0300
  Display: server: X.org 1.20.12 driver: loaded: vesa
  unloaded: fbdev,modesetting alternate: ati
  Message: Advanced graphics data unavailable for root.
Info:...inxi: 3.3.06

Comments:
1-without radeon.cik_support=0 amdgpu.cik_support=1 on cmdline, X cannot be coaxed into using amdgpu DDX driver via /etc/X11/xorg.conf.d/*conf
2-without radeon.cik_support=0 amdgpu.cik_support=1, greeter startup completes (normally, on first try) using modesetting DIX driver
Comment 1 Felix Miata 2021-08-01 07:06:22 UTC
Behavior remains the same with 5.13.4 kernel, and with omission of non-essential cmdline options.
Comment 2 Stefan Dirsch 2021-08-01 08:15:48 UTC
Not sure why you want to enable Sea Islands (CIK) support in the amdgpu kernel driver. I doubt it gets sufficient testing.

I suggest using the radeon kernel module (the default), possibly with the "radeon" DDX, though "modesetting" should be fine as well, using the Mesa driver for acceleration via Glamor. If this fails as well, we can discuss again.
Comment 3 Felix Miata 2021-08-01 21:28:02 UTC
Created attachment 851463
Xorg.0.log booted without radeon.cik_support=0 amdgpu.cik_support=1

(In reply to Stefan Dirsch from comment #2)
> Not sure why you want to enable Sea Islands (CIK)  support in amdgpu kernel
> driver. I doubt it gets sufficient testing. 

10 months ago it (A10-7850K) worked at each boot just fine, as it does now once xdm is restarted.  I have another Kaveri that still does work just fine, without need to restart xdm first thing after booting:
# cat inxi-tw20210730.txt
# pinxi -SCzy
System:
  Kernel: 5.12.13-1-default x86_64 bits: 64 Desktop: Trinity R14.0.10
  Distro: openSUSE Tumbleweed 20210730
Machine:
  Type: Desktop Mobo: ASRock model: FM2A88X Extreme6+ serial: <filter>
  UEFI: American Megatrends v: P4.20 date: 01/13/2016
CPU:
  Info: Quad Core model: AMD PRO A8-8650B R7 10 Compute Cores 4C+6G bits: 64
  type: MCP cache: L2: 2 MiB
  Speed: 1396 MHz min/max: 1400/3200 MHz Core speeds (MHz): 1: 1396 2: 1392
  3: 1397 4: 1381
# pinxi -Gazy
Graphics:
  Device-1: AMD Kaveri [Radeon R7 Graphics] vendor: ASRock driver: amdgpu
  v: kernel alternate: radeon bus-ID: 00:01.0 chip-ID: 1002:1313
  class-ID: 0300
  Display: x11 server: X.Org 1.20.12 driver: loaded: amdgpu
  unloaded: fbdev,modesetting,vesa alternate: ati display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1200 s-dpi: 120 s-size: 406x254mm (16.0x10.0")
  s-diag: 479mm (18.9")
  Monitor-1: DisplayPort-0 res: 1920x1200 hz: 60 dpi: 94
  size: 519x324mm (20.4x12.8") diag: 612mm (24.1")
  OpenGL: renderer: AMD KAVERI (DRM 3.40.0 5.12.13-1-default LLVM 12.0.1)
  v: 4.6 Mesa 21.1.5 direct render: Yes
# systemd-analyze
Startup finished in 9.750s (firmware) + 23.795s (loader) + 1.952s (kernel) + 2.480s (initrd) + 3.207s (userspace) = 41.185s
multi-user.target reached after 3.145s in userspace
# dmesg | grep amdgpu
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz root=LABEL=zd8p07stw noresume ipv6.disable=1 net.ifnames=0 mitigations=auto consoleblank=0 radeon.cik_support=0 amdgpu.cik_support=1 video=1024x768@60 video=1440x900@60 5
[    0.019636] Kernel command line: BOOT_IMAGE=/boot/vmlinuz root=LABEL=zd8p07stw noresume ipv6.disable=1 net.ifnames=0 mitigations=auto consoleblank=0 radeon.cik_support=0 amdgpu.cik_support=1 video=1024x768@60 video=1440x900@60 5
[    6.559662] [drm] amdgpu kernel modesetting enabled.
[    6.559836] amdgpu: Topology: Add APU node [0x0:0x0]
[    6.559894] fb0: switching to amdgpudrmfb from EFI VGA
[    6.560540] amdgpu 0000:00:01.0: vgaarb: deactivate vga console
[    6.560775] amdgpu 0000:00:01.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    6.577972] amdgpu 0000:00:01.0: amdgpu: Fetched VBIOS from ROM BAR
[    6.577978] amdgpu: ATOM BIOS: 113-SPEC-102
[    6.578227] amdgpu 0000:00:01.0: amdgpu: VRAM: 1024M 0x000000F400000000 - 0x000000F43FFFFFFF (1024M used)
[    6.578231] amdgpu 0000:00:01.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
[    6.578343] [drm] amdgpu: 1024M of VRAM memory ready
[    6.578347] [drm] amdgpu: 3072M of GTT memory ready.
[    6.586607] [drm] amdgpu: dpm initialized
[    6.861924] amdgpu 0000:00:01.0: amdgpu: SE 1, SH per SE 1, CU per SH 8, active_cu_number 6
[    7.037007] fbcon: amdgpudrmfb (fb0) is primary device
[    7.490778] amdgpu 0000:00:01.0: [drm] fb0: amdgpudrmfb frame buffer device
[    7.525260] [drm] Initialized amdgpu 3.40.0 20150101 for 0000:00:01.0 on minor 0

> I suggest to use radeon kernel module (default). Then possibly with "radeon"
> DDX, but "modesetting" should be fine as well, then using Mesa driver for
> acceleration via Glamor. If this fails as well, we can discuss again.

In comment #0 my last sentence would seem to have covered this. Anyway:
# inxi -SCMzy
System:
  Kernel: 5.13.4-1-default x86_64 bits: 64 Desktop: Trinity R14.0.10
  Distro: openSUSE Tumbleweed 20210730
Machine:
  Type: Desktop Mobo: ASUSTeK model: A88X-PRO v: Rev X.0x serial: <filter>
  UEFI: American Megatrends v: 2603 date: 03/10/2016
CPU:
  Info: Quad Core model: AMD A10-7850K Radeon R7 12 Compute Cores 4C+8G
  bits: 64 type: MCP cache: L2: 2 MiB
  Speed: 1689 MHz min/max: 1700/3700 MHz Core speeds (MHz): 1: 1689 2: 1700
  3: 1699 4: 1695
# inxi -Gazy
Graphics:
  Device-1: AMD Kaveri [Radeon R7 Graphics] vendor: ASUSTeK driver: radeon
  v: kernel alternate: amdgpu bus-ID: 00:01.0 chip-ID: 1002:130f
  class-ID: 0300
  Display: x11 server: X.Org 1.20.12 driver: loaded: modesetting
  unloaded: fbdev,vesa alternate: ati display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1200 s-dpi: 120 s-size: 406x254mm (16.0x10.0")
  s-diag: 479mm (18.9")
  Monitor-1: DP-1 res: 1920x1200 hz: 60 dpi: 94 size: 519x324mm (20.4x12.8")
  diag: 612mm (24.1")
  OpenGL: renderer: AMD KAVERI (DRM 2.50.0 5.13.4-1-default LLVM 12.0.1)
  v: 4.5 Mesa 21.1.5 direct render: Yes
# systemd-analyze
Startup finished in 5.901s (firmware) + 22.155s (loader) + 1.645s (kernel) + 2.238s (initrd) + 3.324s (userspace) = 35.265s
multi-user.target reached after 3.306s in userspace

I checked BIOS and found IOMMU was disabled. After enabling, dmesg got very noisy using drm.debug=0x1e log_buf_len=1M, with the following repeated about 20 times:
[    8.376985] [drm:amdgpu_atombios_encoder_dpms [amdgpu]] encoder dpms 37 to mode 3, devices 00000001, active_devices 00000000
[    8.387861] [drm:dce_v8_0_program_watermarks [amdgpu]] force priority to high
[    8.388035] [drm:dce_v8_0_program_watermarks [amdgpu]] force priority to high
[    8.388330] [drm:dce_v8_0_program_watermarks [amdgpu]] force priority to high
[    8.388484] [drm:dce_v8_0_program_watermarks [amdgpu]] force priority to high
[    8.388663] [drm:dce_v8_0_program_watermarks [amdgpu]] force priority to high
[    8.388817] [drm:dce_v8_0_program_watermarks [amdgpu]] force priority to high
Comment 4 Felix Miata 2021-08-01 21:28:11 UTC
Created attachment 851464
Xorg.0.log & dmesg after enabling IOMMU in BIOS; with radeon.cik_support=0 amdgpu.cik_support=1; without drm.debug=0x1e log_buf_len=1M

After enabling IOMMU in BIOS, no improvement was apparent. I contacted ASUS about this via browser chat. It was escalated. I'm supposed to hear back in 24-48 hours via email.
Comment 5 Stefan Dirsch 2021-08-01 21:47:12 UTC
So things are simply working with default "radeon" kernel driver. No idea what's remaining here.
Comment 6 Felix Miata 2021-08-02 04:43:46 UTC
(In reply to Stefan Dirsch from comment #5)
> So things are simply working with default "radeon" kernel driver. No idea
> what's remaining here.

The amdgpu DDX driver behaved totally as expected on TW until relatively recently. I have no recollection of when it stopped behaving normally, but I'm guessing that when it first occurred I hoped it was temporary or a fluke and would disappear on its own. I cannot reproduce on the same PC using Debian 10 or 11, Fedora 34 or Leap 15.3. They all work perfectly fine at the outset with amdgpu kernel driver, radeon.cik_support=0 amdgpu.cik_support=1 and the amdgpu DDX. TW works fine too, except on initial X start at boot.

There is one other openSUSE quirk, and that is that 15.3 will not load the amdgpu DDX unless explicitly directed in /etc/X11/xorg.conf.d/ via Driver "amdgpu". All others need no help from /etc/X11/xorg.con*.

There is a Fedora quirk too, which I doubt is connected, because it also happens with kernel-radeon/X-modesetting. When X starts SDDM, X immediately crashes and restarts, with this Xorg.0.log.old tail:
[    12.301] (II) UnloadModule: "libinput"
[    12.303] (WW) xf86OpenConsole: VT_ACTIVATE failed: Input/output error
[    12.303] (EE)
Fatal server error:
[    12.303] (EE) xf86OpenConsole: Switching VT failed
...
[    12.303] (WW) xf86CloseConsole: KDSETMODE failed: Input/output error
[    12.303] (WW) xf86CloseConsole: VT_GETMODE failed: Input/output error
[    12.303] (WW) xf86CloseConsole: VT_ACTIVATE failed: Input/output error
[    12.303] (EE) Server terminated with error (1). Closing log file.

Maybe it just needs more time to go away magically, or show up somewhere else. :P

I have lots of freespace on the SSD, so when I get the urge or need to do a fresh install of TW I'll see if it reproduces.
Comment 7 Stefan Dirsch 2021-08-02 11:18:54 UTC
Ok. If it fails only during initial X startup, this looks like a timing issue, i.e. the kernel module is not initialized in time before X gets started. Maybe the amdgpu kernel module is missing from the initrd while radeon is included (since it's the default driver), so adding amdgpu to the initrd may help (if it's really missing). I can't say anything about other Linux distros. They may use completely different kernel versions and patches for them.
On openSUSE the amdgpu DDX is used when the "amdgpu" kernel module is loaded. There should be no need to configure it when the package xf86-video-amdgpu is installed. I still don't understand why you want to use the "amdgpu" driver with your hardware though.
Comment 8 Felix Miata 2021-08-02 15:24:18 UTC
(In reply to Stefan Dirsch from comment #7)
> Ok. If it fails only during initial X startup, this looks like a timing
> issue, i.e. kernel module is not being initialized in time before X gets
> started. Maybe amdgpu kernel module is missing from initrd, but radeon is
> (since it's the default driver), i.e. adding amdgpu to initrd may help (if
> it's really missing).

asa88:/boot # head -n2 /etc/os-release
NAME="openSUSE Tumbleweed"
# VERSION="20210730"
asa88:/boot # lsinitrd initrd-5.13.4-1-default | grep AMD
-rw-r--r--   1 root     root         7876 Aug  1 02:08 kernel/x86/microcode/AuthenticAMD.bin
asa88:/boot # lsinitrd initrd-5.13.4-1-default | grep -i radeon
asa88:/boot # lsinitrd initrd-5.13.4-1-default | grep -i amdgpu
asa88:/boot # lsinitrd initrd-5.12.13-1-default | grep -i amdgpu
asa88:/boot # lsinitrd initrd-5.11.16-1-default | grep -i amdgpu
asa88:/boot # lsinitrd initrd-5.10.16-1-default | grep -i amdgpu
asa88:/boot # lsinitrd initrd-5.9.14-1-default | grep -i amdgpu
asa88:/boot # lsinitrd initrd-5.8.15-1-default | grep -i amdgpu
asa88:/boot # lsinitrd initrd-5.7.11-1-default | grep -i amdgpu
asa88:/boot # 

On Intel Haswell lsinitrd /boot/initrd | grep i915 returns null too.
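
A minimal sketch for checking an initrd for several KMS modules in one pass, assuming the lsinitrd listing format shown above. The helper name kms_modules_in_listing and the module list are hypothetical, not part of dracut:

```shell
# Print the names of KMS modules found in an lsinitrd listing read from stdin.
# Hypothetical helper; the module list below is an assumption, extend as needed.
kms_modules_in_listing() {
    # grep exits non-zero when nothing matches; "|| true" keeps that from
    # being treated as an error, so an empty result is simply empty output.
    { grep -Eo '/(amdgpu|radeon|i915|nouveau)\.ko[^[:space:]]*' || true; } \
        | sed -e 's|^/||' -e 's|\.ko.*$||' \
        | sort -u
}
```

Typical use would be "lsinitrd /boot/initrd-5.13.4-1-default | kms_modules_in_listing". Empty output corresponds to the null greps above: none of the listed modules is in that initrd.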

>  I still don't understand why you want
> to use "amdgpu" driver with your hardware though.

I started to when I learned it was possible. Until this, I had only seen one reason not to (unique set of connector names is a nuisance). I have to wonder how much testing following refactoring the radeon gets by developers, given how old the hardware is that isn't supported by the modesetting DIX.
Comment 9 Stefan Dirsch 2021-08-02 16:25:29 UTC
I don't know why neither amdgpu, nor radeon, nor i915 is being added to the initrd of your system. It looks like there is no timing issue with radeon when it is missing from the initrd, in contrast to the amdgpu driver.

"modesetting" X driver is supposed to work with any hardware with a working DRM/modesetting kernel driver.
Comment 10 Felix Miata 2021-08-05 01:53:35 UTC
Created attachment 851529
Xorg.0.log.old; journalctl -b; dmesg; Xorg.0.log

I have 15.3 freshly updated on an old GeForce PC doing essentially the same thing. On boot, X fails to start. After login and systemctl restart xdm, normalcy begins. Booting older kernels is no help. Switching from TDM to XDM directly using update-alternatives --configure default-displaymanager didn't help either.

# inxi -CSy
System:
  Host: g5eas Kernel: 5.3.18-59.16-default x86_64 bits: 64 Desktop: Trinity
  Distro: openSUSE Leap 15.3
CPU:
  Info: Single Core model: Intel Pentium 4 bits: 64 type: MT cache: L2: 2 MiB
  Speed: 3200 MHz min/max: N/A Core speeds (MHz): 1: 3200 2: 3200
# inxi -Gayz
Graphics:
  Device-1: XGI Z7/Z9 vendor: Gigabyte driver: N/A bus-ID: 0a:03.0
  chip-ID: 18ca:0020 class-ID: 0300
  Device-2: NVIDIA G98 [GeForce 8400 GS Rev. 2] vendor: PNY driver: nouveau
  v: kernel bus-ID: 0b:00.0 chip-ID: 10de:06e4 class-ID: 0300
  Display: x11 server: X.Org 1.20.3 driver: loaded: modesetting
  unloaded: fbdev,vesa alternate: nouveau,nv,nvidia display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1200 s-dpi: 120 s-size: 406x254mm (16.0x10.0")
  s-diag: 479mm (18.9")
  Monitor-1: DVI-I-1 res: 1920x1200 hz: 60 dpi: 94
  size: 519x324mm (20.4x12.8") diag: 612mm (24.1")
  OpenGL: renderer: NV98 v: 3.3 Mesa 20.2.4 direct render: Yes
#
Note that there is no way via PC BIOS to make the XGI GPU on the motherboard disappear.

I found all this out after trying to reproduce this on NVidia with TW20210803 on this PC, but since today's zypper dup, it refuses to mount / RW until I login and execute mount -o remount,rw /.
Comment 11 Stefan Dirsch 2021-08-05 13:05:33 UTC
Ok. So even more confusing details from a completely different system. Thanks!
Comment 12 Felix Miata 2021-08-07 07:05:37 UTC
Created attachment 851587
Xorg.0.log.old; dmesg; journalctl -b

Happens in Intel Kaby Lake too:
# lspci -nnk | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 630 [8086:5912] (rev 04)
# inxi -Sy | grep istro
  Distro: openSUSE Tumbleweed 20210805
Again, no automatic restart was attempted. I must run systemctl restart xdm to get the greeter open. 30 of these are in dmesg:
[    5.131236] i915 0000:00:02.0: [drm:intel_hdmi_set_edid [i915]] HDMI GMBUS EDID read failed, retry using GPIO bit-banging
[  143.509712] i915 0000:00:02.0: [drm:intel_hdmi_set_edid [i915]] HDMI GMBUS EDID read failed, retry using GPIO bit-banging
That made me think maybe hardware issue, so I connected a different display, but no joy.
Comment 13 Felix Miata 2021-08-13 01:55:44 UTC
(In reply to Stefan Dirsch from comment #7)
> Ok. If it fails only during initial X startup, this looks like a timing
> issue, i.e. kernel module is not being initialized in time before X gets
> started.

As previously noted, this has turned out *not* to be limited to AMD. After reproducing this on a second Kaby Lake (gb250) I did some experimenting.

This is typical content of my grub.cfg linu lines where this reproduces:
	mitigations=auto consoleblank=0 video=1440x900@60 5
Changing it on gb250 to:
	mitigations=auto video=1440x900@60 5
*sometimes* avoids "(EE) open /dev/dri/card0: No such file or directory", resulting in expected startup.

Other times, the screen is black as noted in comment 0, while other times, focus returns to the login prompt on tty1.

I switched back to the comment #12 Kaby Lake (host ab250), tried the same things, and cannot get X to start on first try no matter what. There's also this:
# systemd-analyze
Startup finished in 17.795s (firmware) + 5.243s (loader) + 1.458s (kernel) + 1.406s (initrd) + 2.985s (userspace) = 28.889s
graphical.target reached after 2.961s in userspace

It seems consoleblank=0 on kernel command line *can* somehow affect timing of the KMS kernel module loading, whether it be i915, amdgpu or radeon, but the real or main problem must be that each KMS module simply is not getting loaded soon enough.

Is there anything that can advance KMS module loading?
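
One way to see how late the node appears on a given boot is to poll for it. A minimal sketch; the helper name wait_for_node and the default timeout are assumptions, and it only observes the race rather than fixing it:

```shell
# Wait until a device node exists, polling once per second.
# Usage: wait_for_node PATH [TIMEOUT_SECONDS]   (hypothetical helper)
# Returns 0 as soon as PATH exists, 1 if it is still missing after the timeout.
wait_for_node() {
    node=$1
    timeout=${2:-10}   # default of 10 seconds is an arbitrary assumption
    waited=0
    while [ ! -e "$node" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, "wait_for_node /dev/dri/card0 30 && echo present" run from a tty right after a failed start shows whether the node turns up shortly after the display manager gave up.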
Comment 14 Felix Miata 2021-08-14 07:43:09 UTC
Since hosts ab250, asa88 & gb250 where this reproduces are all running TDM, which a forum responder suggested might be at fault, on comment #0 host asa88 I did a fresh minimal/KDE (no-recommends) install of TW20210810, without NetworkManager, without Wicked, with systemd-networkd, without sddm, without lightdm, with xdm. This bug reproduces randomly on it. In 11 successive reboots:
1-success
2-success
3-failure
4-success
5-success
6-success
7-failure
8-failure
9-failure
10-failure
11-failure

Example failure stats (from #3 above):
# systemd-analyze
Startup finished in 5.768s (firmware) + 7.100s (loader) + 1.846s (kernel) + 2.102s (initrd) + 3.486s (userspace) = 20.304s
graphical.target reached after 3.477s in userspace
# systemd-analyze critical-chain
...
graphical.target @3.477s
└─multi-user.target @3.477s
  └─kbdsettings.service @1.007s +2.469s
    └─basic.target @996ms
      └─sockets.target @996ms
        └─telnet.socket @995ms
          └─sysinit.target @987ms
            └─systemd-update-utmp.service @958ms +28ms
              └─systemd-tmpfiles-setup.service @927ms +28ms
                └─local-fs.target @924ms
                  └─usr-local.mount @876ms +48ms
                    └─systemd-fsck@dev-disk-by\x2dlabel-tvgp04usrlcl.service @663ms +204ms
                      └─local-fs-pre.target @650ms
                        └─systemd-tmpfiles-setup-dev.service @630ms +17ms
                          └─kmod-static-nodes.service @556ms +55ms
                            └─systemd-journald.socket
                              └─system.slice
                                └─-.slice

The following is from #11 above:
# systemd-analyze blame | head -n22
2.521s kbdsettings.service
1.902s systemd-random-seed.service
1.541s sshd.service
1.140s dracut-initqueue.service
 659ms initrd-switch-root.service
 638ms chronyd.service
 471ms smartd.service
 396ms issue-generator.service
 304ms user@0.service
 303ms initrd-parse-etc.service
 182ms systemd-fsck@dev-disk-by\x2dlabel-tvgp04usrlcl.service
 179ms systemd-fsck@dev-disk-by\x2dlabel-tvgp06pub.service
 179ms systemd-fsck@dev-disk-by\x2dlabel-tvgp05home.service
 178ms systemd-udevd.service
 137ms systemd-networkd.service
 100ms display-manager.service
 100ms systemd-logind.service
  92ms systemd-udev-trigger.service
  82ms apparmor.service
  63ms sound-extra.service
  62ms systemd-tmpfiles-clean.service
  61ms modprobe@drm.service

Without radeon.cik_support=0 amdgpu.cik_support=1 on cmdline I managed success on 11 straight boots, so it seems amdgpu.ko.xz simply doesn't get loaded as fast/soon as radeon.ko.xz. 

I started a forum thread about this in June:
https://forums.opensuse.org/showthread.php/555514-boots-too-fast-for-Xorg-to-run
Comment 15 Felix Miata 2021-08-14 22:15:24 UTC
Using systemd-networkd instead of wicked or networkmanager makes the (~6s) difference between KMS module finishing loading soon enough or not.

This is from an xdm startup failing using systemd-networkd:
# journalctl -b -o short-monotonic -u display-manager.service -u systemd-modules-load.service -g St
-- Journal begins at Sat 2021-01-16 18:26:29 EST, ends at Sat 2021-08-14 16:50:47 EDT. --
[    3.849113] asa88 systemd[1]: Stopped Load Kernel Modules.
[    6.231843] asa88 systemd[1]: Starting X Display Manager...
[    7.383982] asa88 display-manager[651]: Starting service tdm
[    7.384359] asa88 systemd[1]: Started X Display Manager.

This is from an xdm startup succeeding using wicked:
# journalctl -b -o short-monotonic -u display-manager.service -u systemd-modules-load.service -g St
-- Journal begins at Fri 2021-08-13 23:33:06 EDT, ends at Sat 2021-08-14 17:38:24 EDT. --
[    3.731005] localhost systemd[1]: Stopped Load Kernel Modules.
[   13.312481] asa88 systemd[1]: Starting X Display Manager...
[   13.411758] asa88 display-manager[1013]: Starting service xdm
[   13.412175] asa88 systemd[1]: Started X Display Manager.
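
The two excerpts can also be compared mechanically. A rough sketch, assuming journalctl -o short-monotonic output as shown above; the helper names are hypothetical, and treating the kernel's "Initialized <driver>" message as the moment /dev/dri/card0 becomes usable is an assumption:

```shell
# Print the monotonic timestamp (seconds) of the first stdin line matching the
# extended regex in $1. Expected input: "[   13.312481] host unit: message".
first_timestamp() {
    awk -v pat="$1" '!found && $0 ~ pat {
        sub(/^\[ */, "")   # drop the opening bracket and padding
        sub(/\].*/, "")    # drop everything from the closing bracket on
        print
        found = 1
    }'
}

# Succeed (exit 0) if the display manager start time $1 is earlier than the
# DRM driver initialization time $2.
dm_started_before_drm() {
    awk -v dm="$1" -v drm="$2" 'BEGIN { exit !(dm + 0 < drm + 0) }'
}

# Example use on a live system (not run here):
#   dm=$(journalctl -b -o short-monotonic | first_timestamp 'Starting X Display Manager')
#   drm=$(journalctl -b -k -o short-monotonic | first_timestamp 'Initialized (amdgpu|radeon|i915|nouveau) ')
#   dm_started_before_drm "$dm" "$drm" && echo "display manager started first"
```

Reading everything from the journal keeps both timestamps on the same clock, which avoids mixing dmesg and journal time bases.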

All my systems are configured with static IP, without IPV6 enabled, and without resolvconf of any kind enabled.

It looks like the summary should be:

		display-manager.service starts too soon when using systemd-networkd
or
		using systemd-networkd instead of wicked or networkmanager results in
		systemd-modules-load.service functionally finishing after display-manager.service starts
Comment 16 Dean Martin 2021-08-14 22:54:09 UTC
(In reply to Felix Miata from comment #15)

That is what using early KMS (module added to initrd) may help with, as Stefan Dirsch already mentioned back in comment #7.
Comment 17 Felix Miata 2021-08-16 04:50:36 UTC
I don't want to force early KMS by including graphics modules in the initrd.

One of the Shaman Penguins on the forum thread provided a workaround I can use until the real offender can be isolated and possibly dealt with:

# systemctl cat display-manager.service
...
# /etc/systemd/system/display-manager.service.d/override.conf
[Unit]
After= systemd-udev-settle.service
Requires=systemd-udev-settle.service
Comment 18 Dean Martin 2021-08-17 01:08:55 UTC
(In reply to Felix Miata from comment #17)

Your choice of display-manager is TDM. Have you tested other display-managers for this behaviour?
Comment 19 Felix Miata 2021-08-18 02:11:50 UTC
One only. On the fresh TW20210810 installation on host asa88 from which comment #17 resulted, no DM other than XDM is or was installed.

I found another AMD host with TW20210810/TDM that requires this workaround, ara88, CPU/APU: AMD PRO A8-8650B R7.

As deano noted in the forum thread, https://adamsdesk.com/blog/2021/02/15/gdm-no-longer-starts-automatically/ explains that this had apparently been surfacing on Arch and Fedora more than 6 months ago. It points out that the likelihood of this happening was discovered over 9 years ago: https://gitlab.gnome.org/GNOME/gdm/-/issues/103.

In the adamsdesk article one workaround (out of several) was suggested, and repeated in my forum thread, which I initially found not to work on asa88 with TDM: Wants=dev-dri-card0.device & After=dev-dri-card0.device instead of After=systemd-udev-settle.service & Requires=systemd-udev-settle.service. I later discovered that I hadn't coupled the dev-dri-card0.device method with a required /etc/udev/rules.d/99-make-udev-drm-aware.rules containing 'SUBSYSTEM=="drm", TAG+="systemd"'. With the udev rule, dev-dri-card0.device also works for hosts asa88, ab250 and gb250, but not host ara88.
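
The dev-dri-card0.device method described above needs two files. A minimal sketch that writes both, assuming the paths and directives quoted in this comment; the function name install_drm_wait and the drop-in file name wait-for-drm.conf are hypothetical:

```shell
# Write the udev rule and the display-manager drop-in under the root given as
# $1 (use "/" on a live system; a scratch directory allows a dry run).
install_drm_wait() {
    root=${1%/}
    mkdir -p "$root/etc/udev/rules.d" \
             "$root/etc/systemd/system/display-manager.service.d"
    # Tag DRM devices so systemd tracks them as .device units
    # such as dev-dri-card0.device.
    printf '%s\n' 'SUBSYSTEM=="drm", TAG+="systemd"' \
        > "$root/etc/udev/rules.d/99-make-udev-drm-aware.rules"
    # Order the display manager after the first DRM card device.
    printf '%s\n' '[Unit]' 'Wants=dev-dri-card0.device' 'After=dev-dri-card0.device' \
        > "$root/etc/systemd/system/display-manager.service.d/wait-for-drm.conf"
}
```

On a live system this would be run as root with "/" as the argument, followed by "systemctl daemon-reload" and a reboot to test.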

I need to note too that systemd-udev-settle.service as shipped, which ends with "ExecStart=udevadm settle", does not work. In order to work it needs the version shipped in 15.2, which ends with "ExecStart=/usr/bin/udevadm settle".
Comment 20 Felix Miata 2022-02-01 04:52:09 UTC
Still happening on the comment #17 TW host running Tiwai's simpledrm kernel, and no workarounds engaged:
# rpm -qa | egrep 'gdm|sddm|lightdm|kdm|tdm|xdm'
xdm-1.1.12-17.2.x86_64
# inxi -Sy
System:
  Host: asa88 Kernel: 5.16.3-4.gc7377e3-default x86_64 bits: 64
    Desktop: KDE Plasma 5.23.5 Distro: openSUSE Tumbleweed 20220130
# inxi -Gayz
Graphics:
  Device-1: AMD Kaveri [Radeon R7 Graphics] vendor: ASUSTeK driver: amdgpu
    v: kernel alternate: radeon bus-ID: 00:01.0 chip-ID: 1002:130f
    class-ID: 0300
  Display: x11 server: X.Org 1.21.1.3 compositor: kwin_x11 driver:
    loaded: amdgpu unloaded: modesetting alternate: ati,fbdev,vesa
    display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1200 s-dpi: 120 s-size: 406x254mm (16.0x10.0")
    s-diag: 479mm (18.9")
  Monitor-1: HDMI-A-0 res: 1920x1200 hz: 60 dpi: 94
    size: 519x324mm (20.4x12.8") diag: 612mm (24.1")
  OpenGL:
    renderer: AMD KAVERI (DRM 3.44.0 5.16.3-4.gc7377e3-default LLVM 13.0.0)
    v: 4.6 Mesa 21.3.4 direct render: Yes
#
Comment 21 Felix Miata 2022-04-27 05:20:51 UTC
Continues on host asa88 with TW20220425, and others I haven't kept track of.
Comment 22 Felix Miata 2022-04-29 08:15:33 UTC
Happens on fresh installation of 15.4 beta with KDM3 on Kaby Lake host gb250:
# pinxi -GISaz
System:
  Kernel: 5.14.21-150400.19-default arch: x86_64 bits: 64 compiler: gcc
    v: 7.5.0 parameters: BOOT_IMAGE=/boot/vmlinuz root=LABEL=<filter>
    noresume ipv6.disable=1 net.ifnames=0 mitigations=auto consoleblank=0
    video=1440x900@60 5
  Desktop: KDE v: 3.5.10 tk: Qt v: 3.3.8c info: kicker wm: kwin vt: 7
    dm: KDM Distro: openSUSE Leap 15.4 Beta
Graphics:
  Device-1: Intel HD Graphics 630 vendor: Gigabyte driver: i915 v: kernel
    ports: active: HDMI-A-1 empty: DP-1, DP-2, HDMI-A-2, HDMI-A-3
    bus-ID: 00:02.0 chip-ID: 8086:5912 class-ID: 0300
  Display: x11 server: X.Org v: 1.20.3 driver: X: loaded: modesetting
    unloaded: fbdev,vesa alternate: intel gpu: i915 display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1200 s-dpi: 120 s-size: 406x254mm (15.98x10.00")
    s-diag: 479mm (18.85")
  Monitor-1: HDMI-A-1 mapped: HDMI-1 model: NEC EA243WM serial: <filter>
    built: 2011 res: 1920x1200 hz: 60 dpi: 94 gamma: 1.2
    size: 519x324mm (20.43x12.76") diag: 612mm (24.1") ratio: 16:10 modes:
    max: 1920x1200 min: 640x480
  OpenGL: renderer: Mesa Intel HD Graphics 630 (KBL GT2) v: 4.6 Mesa 21.2.4
    direct render: Yes
Info:...Shell: Bash v: 4.4.23 running-in: konsole pinxi: 3.3.15-3
# ls -1 /sys/class/drm/
card0
card0-DP-1
card0-DP-2
card0-HDMI-A-1
card0-HDMI-A-2
card0-HDMI-A-3
renderD128
version
# systemd-analyze critical-chain
...
graphical.target @2.993s
└─multi-user.target @2.993s
  └─kbdsettings.service @866ms +2.126s
    └─basic.target @853ms
      └─sockets.target @852ms
        └─telnet.socket @852ms
          └─sysinit.target @842ms
            └─systemd-update-utmp.service @829ms +9ms
              └─systemd-tmpfiles-setup.service @716ms +110ms
                └─local-fs.target @709ms
                  └─usr-local.mount @692ms +14ms
                    └─systemd-fsck@dev-disk-by\x2dlabel-pi3p04usrlcl.service @648ms +37ms
                      └─local-fs-pre.target @625ms
                        └─systemd-tmpfiles-setup-dev.service @590ms +33ms
                          └─kmod-static-nodes.service @511ms +51ms
                            └─systemd-journald.socket
                              └─system.slice
                                └─-.slice
Comment 23 Felix Miata 2022-09-19 07:03:57 UTC
Both 15.4 & 15.3 using TDM suffer this with their latest kernels 24.21 & 59.93 on comment 10 host g5eas with chip-ID: 10de:06e4 & kernel module nouveau. SDDM in current TW does not have this problem.

I wonder if this is aggravated on this host because the 18ca:0020 XGI IGP cannot be disabled, and doesn't seem to be by the presence of the ancient PCI GPU. I don't recognize existence of any module for it among kernel/drivers/gpu/drm.

Comment #0 host asa88's motherboard died, so I moved its APU to host ara88 in the process of confirming the death. Checking ara88's status re this is yet todo....
Comment 24 Felix Miata 2022-10-24 03:28:22 UTC
Created attachment 862373 [details]
systemd-analyze blame

This 475 line attachment is from today's fresh installation. I spent all day yesterday and the first hours of today beating my head against this wall with this PC's previous 15.4, an upgrade from 15.3 many moons ago then ending with kernel-default-5.14.21-150400.22.1 as latest kernel, then three separate attempts to upgrade from 15.3 again, before deciding on a fresh start.

(In reply to Stefan Dirsch from comment #7)
> Ok. If it fails only during initial X startup, this looks like a timing
> issue, i.e. kernel module is not being initialized in time before X gets
> started. Maybe amdgpu kernel module is missing from initrd, but radeon is
> (since it's the default driver), i.e. adding amdgpu to initrd may help (if
> it's really missing).

On 15.4 post-kernel-default-5.14.21-150400.22.1, the Kaveri [Radeon R7 Graphics] chip-ID: 1002:130f PC is stubbornly refusing to load a kernel graphics module before X tries its first start on each boot. XDM won't auto restart, so I get either a solid black screen or a tty1 login prompt, depending on which kernel command line parameters are used, and/or which display drivers are configured, and/or whether I have validly reconfigured dracut for graphics module loading, and/or whether I have blacklisted radeon in /etc/modulesload.d/.

# systemd-analyze critical-chain
...
graphical.target @4.227s
└─multi-user.target @4.226s
  └─kbdsettings.service @2.171s +2.055s
    └─basic.target @2.154s
      └─sockets.target @2.154s
        └─telnet.socket @2.154s
          └─sysinit.target @2.148s
            └─systemd-backlight@backlight:acpi_video0.service @3.476s +7ms
              └─system-systemd\x2dbacklight.slice @3.271s
                └─system.slice
                  └─-.slice

Questions:

1-How do I guarantee earliest possible loading of whatever kernel graphics module is needed or wanted, whether radeon or amdgpu? Is force_drivers+=" amdgpu " in a file of any name in /etc/dracut.conf.d/ sufficient to ensure loading amdgpu comes first? Solely? Is omit_drivers+=" radeon " needed as well?

2-Does initrd by default include whatever blacklisting is contained in /etc/modprobe.d/? Is "blacklist radeon" in any file in this directory sufficient? Do filenames here need to end in .conf to be utilized?

3-Could delaying the first X start work, via a custom Wants= or After= on the existence of /dev/dri/card0 in /usr/lib/systemd/system/display-manager.service? Shouldn't that already be happening?

Googling early graphics loading has been getting me nothing but *NVidia* and/or Youtubes. :( 15.3 & TW are behaving perfectly using only amdgpu, without any heroics, on same PC.
Comment 25 Stefan Dirsch 2022-10-24 09:53:30 UTC
You could check which DRM drivers have been added to initrd by running

  lsinitrd /boot/initrd | grep drm

I believe only the needed (default) driver is being added to initrd. So, yes it would make sense to force dracut to add "amdgpu" driver and save space and not include "radeon" driver. As you suggested

force_drivers+=" amdgpu"
omit_drivers+="radeon"

Of course you need to regenerate the initrd afterwards.

Xserver is being started by the displaymanager. So if anyone can/should wait for existence of /dev/dri it would be the DM. But I'm afraid this won't help. As you noticed the load of the module is being triggered by Xserver, the device is just not available in time.
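
The lsinitrd check above can be wrapped in a small helper when it has to be repeated after every initrd rebuild. This is only a sketch: the function names list_initrd_drm_modules and has_drm_module are hypothetical, and it assumes the listing uses the usual lib/modules/.../kernel/drivers/gpu/drm/ paths with .ko files that may carry a compression suffix.

```shell
#!/bin/sh
# Hypothetical helpers (not part of dracut): read an `lsinitrd` listing on
# stdin and work out which DRM kernel modules the initrd contains.

# Print the module name of every .ko file found under drivers/gpu/drm,
# one per line, in the order the listing gives them.
list_initrd_drm_modules() {
    awk -F/ '
        /\/drivers\/gpu\/drm\// && /\.ko(\.[a-z0-9]+)?$/ {
            name = $NF                          # last path component
            sub(/\.ko(\.[a-z0-9]+)?$/, "", name) # drop .ko and .zst/.xz etc.
            print name
        }
    '
}

# Succeed if the listing on stdin contains the DRM module named in $1.
has_drm_module() {
    list_initrd_drm_modules | grep -qx -- "$1"
}
```

Assumed usage would be something like `lsinitrd /boot/initrd | has_drm_module amdgpu && echo "amdgpu is in the initrd"`, run once before and once after regenerating the initrd.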
Comment 26 Stefan Dirsch 2022-10-25 11:34:32 UTC
(In reply to Stefan Dirsch from comment #25)
> You could check which DRM drivers have been added to initrd by running
> 
>   lsinitrd /boot/initrd | grep drm
> 
> I believe only the needed (default) driver is being added to initrd. So, yes
> it would make sense to force dracut to add "amdgpu" driver and save space
> and not include "radeon" driver. As you suggested
> 
> force_drivers+=" amdgpu"
> omit_drivers+="radeon"
> 
> Of course you need to regenerate the initrd afterwards.
> 
> Xserver is being started by the displaymanager. So if anyone can/should wait
> for existence of /dev/dri it would be the DM. But I'm afraid this won't
> help. As you noticed the load of the module is being triggered by Xserver,
> the device is just not available in time.

So does this help?
Comment 27 Felix Miata 2022-10-26 05:02:13 UTC
I gave this more thought and reached the conclusion that the KMS switch point is when a kernel graphics module should load, but it does not. When the screen goes black as 1024x768 mode ends, it should then enable either the native mode, or the cmdline video= mode. That's not been happening.

Between comments #24 & #25, I took a trip through BIOS setup and found IOMMU disabled. I switched it to enabled, and found the bad behavior seemed to be gone. However, after a break for sleep, the bad behavior was back over a considerable number of subsequent boots. So I decided to forget about it and do other things that needed doing. During that lull, I stumbled onto rd.driver.pre=amdgpu, so on returning to follow up here I added it. It didn't seem to make any difference.

Today is a new day. So far, every boot has behaved as expected with an initrd built to include:
# lsinitrd /boot/initrd | grep drm | grep -v ^d
-rw-r--r--   1 root     root      3417336 Sep  8 13:12 lib/modules/5.14.21-150400.24.21-default/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.zst
-rw-r--r--   1 root     root       301834 Sep  8 13:12 lib/modules/5.14.21-150400.24.21-default/kernel/drivers/gpu/drm/drm.ko.zst
-rw-r--r--   1 root     root       182288 Sep  8 13:12 lib/modules/5.14.21-150400.24.21-default/kernel/drivers/gpu/drm/drm_kms_helper.ko.zst
-rw-r--r--   1 root     root         8183 Sep  8 13:12 lib/modules/5.14.21-150400.24.21-default/kernel/drivers/gpu/drm/drm_ttm_helper.ko.zst
-rw-r--r--   1 root     root        22889 Sep  8 13:12 lib/modules/5.14.21-150400.24.21-default/kernel/drivers/gpu/drm/scheduler/gpu-sched.ko.zst
-rw-r--r--   1 root     root        47953 Sep  8 13:12 lib/modules/5.14.21-150400.24.21-default/kernel/drivers/gpu/drm/ttm/ttm.ko.zst
As a result of
force_drivers+=" amdgpu"
omit_drivers+="radeon"

But, a boot without such an initrd, and without rd.driver.pre=amdgpu, following a reboot from same initrd but with rd.driver.pre=amdgpu, just behaved as expected. The following boot, also without rd.driver.pre=amdgpu, black screened with (EE) open /dev/dri/card0: No such file or directory yet again. Another, with rd.driver.pre=amdgpu, was also bad. Later, with a no driver "forcing" initrd, I got a black screen boot with rd.driver.pre=amdgpu, followed by a normal boot without rd.driver.pre=amdgpu, followed by a black screen boot with rd.driver.pre=amdgpu, so I took rd.driver.pre=amdgpu out of the default boot stanza. The next boot was black, followed by an almost black one, both using an initrd explicitly omitting radeon but not mentioning amdgpu. Then again, without changing anything, the next boot was normal/as expected. I switched back to the initrd with force amdgpu & omit radeon (initrd for .24.21 #7), and got good boots >5X in a row. There's just no rhyme or reason to whether /dev/dri/card0 appears soon enough without force-amd/omit-radeon; with it, the result is better but still less than a 100% guarantee.

I think I want to see what the long-awaited next kernel version brings, but ATM, the answer to "does this help" seems to be something like ~97% yes.

On occasion, the "black" screen turns out to be not 100% black. Occasionally a tty1 screen will have output, but with the brightness turned down to something in the neighborhood of 10% or less.

FWIW, same machine has no direct I/O at all on Fedora 36 except with 5.18 kernel instead of 5.19.17, but is fine on 37:
https://bugzilla.redhat.com/show_bug.cgi?id=2130843#c1 That, this, and threads in various forums in recent weeks make me think something is going wrong upstream in the kernel with AMD/ATI graphics.
Comment 28 Felix Miata 2022-12-21 05:19:26 UTC
Created attachment 863603 [details]
first and second and third Xorg.0.logs from a fresh boot (fail=1, fail=2, success=3)

Yet another host suffering this without having i915 included in the initrd. There are three things to distinguish this one from previous comments. First, this one locks up at the black screen, with no response to the keyboard. Remote login enables xdm to be restarted, after which operation is normal. Second, KDM3 is the DM. Third, when X attempts to start automatically for the second time, it uses /dev/dri/card1 instead of /dev/dri/card0. The manual xdm start also uses /dev/dri/card1. This /dev/dri/card1 usage on #2+ starts is not unique. I've seen it with other problem hosts not already mentioned here.
# inxi -SG
System:
  Host: gx280 Kernel: 6.0.12-1-default arch: i686 bits: 32 Desktop: KDE
    v: 3.5.10 Distro: openSUSE Tumbleweed 20221219
Graphics:
  Device-1: Intel 82915G/GV/910GL Integrated Graphics driver: i915 v: kernel
  Display: x11 server: X.Org v: 21.1.4 driver: X: loaded: intel
    unloaded: fbdev,modesetting,vesa dri: i915 gpu: i915
    resolution: 1680x1050~60Hz
  API: OpenGL v: 2.1 Mesa 22.2.4 renderer: i915 (: 915G)
# dmesg | grep ailed
[   62.345100] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[   62.345122] cfg80211: failed to load regulatory.db
[   80.820683] simple-framebuffer simple-framebuffer.0: [drm:drm_atomic_helper_check_planes] [CRTC:34:crtc-0] atomic driver check failed
[   80.820690] simple-framebuffer simple-framebuffer.0: [drm:drm_atomic_check_only] atomic driver check for 963f11c1 failed: -22
[  145.537805] i915 0000:00:02.0: [drm:i915_gem_execbuffer2_ioctl [i915]] copy 1 exec entries failed
[  201.786429] i915 0000:00:02.0: [drm:i915_gem_execbuffer2_ioctl [i915]] copy 1 exec entries failed
[ 2583.712955] i915 0000:00:02.0: [drm:i915_gem_execbuffer2_ioctl [i915]] copy 1 exec entries failed
Comment 29 Felix Miata 2023-04-14 05:52:59 UTC
Created attachment 866300 [details]
Xorg.0.log from TW20230412 w/ SDDM/Plasma

Host fi965 is another victim, currently TW20230412 on an old PCIe Radeon HD 8570 / R5 430 OEM R7 240/340 Radeon 520 OEM 1002:6611. It has two discrete installations, one with KDM3/KDE3, the other with SDDM/Plasma, both suffering. From the SDDM installation:
# dmesg | grep aile
[   30.152899] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[   30.152906] cfg80211: failed to load regulatory.db
# journalctl -b | grep aile
Apr 13 21:37:28 fi965 systemd-vconsole-setup[167]: Failed to import credentials, ignoring: No such file or directory
Apr 14 01:38:21 fi965 kernel: platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
Apr 14 01:38:21 fi965 kernel: cfg80211: failed to load regulatory.db
Apr 14 01:38:48 fi965 nscd[790]: 790 stat failed for file `/etc/services'; will try again later: No such file or directory
# grep /dev/dr /var/log/Xorg.0.log
[  1486.092] (II) xfree86: Adding drm device (/dev/dri/card1)
[  1486.098] (II) Applying OutputClass "AMDgpu" to /dev/dri/card1
[  1486.106] (EE) open /dev/dri/card0: No such file or directory
[  1486.106] (II) Applying OutputClass "AMDgpu" options to /dev/dri/card1
# grep /dev/dr /var/log/Xorg.0.log.old
[   619.557] (II) xfree86: Adding drm device (/dev/dri/card1)
[   619.697] (II) Applying OutputClass "AMDgpu" to /dev/dri/card1
[   619.795] (EE) open /dev/dri/card0: No such file or directory
[   619.806] (II) Applying OutputClass "AMDgpu" options to /dev/dri/card1
#
Absent vtty or remote login for systemctl restart xdm, post-grub activity ends with login prompt on vtty1.
Comment 30 Stefan Dirsch 2023-04-17 11:07:05 UTC
Hmm. For some reason the amdgpu driver takes /dev/dri/card1. I suggest trying again without the xf86-video-amdgpu package installed, i.e. let the modesetting X driver take over the device.

Apart from that this all looks like a timing issue, i.e. driver not being initialized in time before X gets started.
Comment 31 Felix Miata 2023-12-14 03:31:02 UTC
Created attachment 871337 [details]
Xorg.0.log from the initial post-boot X failure on KBL GT2 Slowroll host ab250

Host ab250 here is Intel Kaby Lake with Tumbleweed, Slowroll, 15.4, 15.5 and 15.5, among other distros, installed on it. As with several other openSUSE installations here on various hosts (TW, SR & Leap), X will fail to start, claiming /dev/dri/card0 does not exist, but instead of a black screen, it exhibits multi-user.target behavior. When I log in and run systemctl restart xdm, the greeter starts, and X is running using /dev/dri/card1, with /dev/dri/card0 nonexistent.
# systemd-analyze
Startup finished in 13.602s (firmware) + 9.196s (loader) + 1.801s (kernel) + 2.357s (initrd) + 4.994s (userspace) = 31.953s
graphical.target reached after 4.993s in userspace
# systemd-analyze critical-chain
...
graphical.target @4.993s
└─multi-user.target @4.993s
  └─kbdsettings.service @2.865s +2.126s
    └─systemd-vconsole-setup.service @2.655s +175ms
      └─systemd-journald.socket
        └─system.slice
          └─-.slice
# dmesg | grep aile
[    6.200638] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    6.200641] cfg80211: failed to load regulatory.db
[    6.538658] i915 0000:00:02.0: [drm] [ENCODER:94:DDI A/PHY A] failed to retrieve link info, disabling eDP
# journalctl -b --no-hostname | grep aile
Dec 13 21:43:06 kernel: platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
Dec 13 21:43:06 kernel: cfg80211: failed to load regulatory.db
Dec 13 21:43:06 kernel: i915 0000:00:02.0: [drm] [ENCODER:94:DDI A/PHY A] failed to retrieve link info, disabling eDP
Dec 13 21:43:06 systemd[1]: kbdsettings.service: Failed with result 'signal'.
#
Comment 32 Stefan Dirsch 2024-01-06 18:40:38 UTC
Hmm. So it seems we are seeing a timing issue on Intel as well. This is the first time I have heard about this.
Comment 33 Michal Suchanek 2024-01-06 18:47:55 UTC
Maybe it would make sense to wait for /dev/dri/card0 to appear for a few seconds before starting the display manager. Likely that timeout will need to be reached on systems that do not have 3D acceleration support, only modesetting.
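
A minimal sketch of such a bounded wait, assuming it would run before the display manager is started. The function name wait_for_drm_card and its two parameters are hypothetical. It accepts any cardN node rather than only card0, because several hosts in this bug end up with /dev/dri/card1 and no card0.

```shell
#!/bin/sh
# Hypothetical helper: poll a directory until a DRM card node appears or
# the timeout expires. This is an illustration, not an openSUSE tool.
# Usage: wait_for_drm_card [DIR] [TIMEOUT_SECONDS]
# Prints the first node found and returns 0, or returns 1 on timeout.
wait_for_drm_card() {
    dir=${1:-/dev/dri}
    timeout=${2:-5}
    elapsed=0
    while :; do
        for node in "$dir"/card[0-9]*; do
            # An unmatched glob stays literal, so check that it exists.
            if [ -e "$node" ]; then
                printf '%s\n' "$node"
                return 0
            fi
        done
        # Give up once the timeout has been used up.
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

The loop polls once per second, so a system whose card node is already present pays no delay, while a system without any DRM device pays the full timeout before the display manager is started anyway.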
Comment 34 Felix Miata 2024-03-21 03:10:09 UTC
What's apparently been happening in most installations for a while is that the DM retries, so it eventually launches:

# inxi -CGSz --hostname
System:
  Host: ab250 Kernel: 6.4.0-150600.9-default arch: x86_64 bits: 64
  Desktop: TDE (Trinity) v: R14.1.1 Distro: openSUSE Leap 15.6 Beta
CPU:
  Info: quad core model: Intel Core i5-7500T bits: 64 type: MCP cache:
    L2: 1024 KiB
  Speed (MHz): avg: 800 min/max: 800/3300 cores: 1: 800 2: 800 3: 800 4: 800
Graphics:
  Device-1: Intel HD Graphics 630 driver: i915 v: kernel
  Display: x11 server: X.Org v: 1.21.1.11 driver: X: loaded: modesetting
    unloaded: fbdev,vesa dri: iris gpu: i915 resolution: 1: 2560x1440~60Hz
    2: 1920x1200~60Hz 3: 1680x1050~60Hz
  API: OpenGL v: 4.6 vendor: intel mesa v: 23.3.4 renderer: Mesa Intel HD
    Graphics 630 (KBL GT2)
# lsinitrd /boot/initrd | grep 915
# systemd-analyze critical-chain
...
graphical.target @4.586s
└─multi-user.target @4.586s
  └─kbdsettings.service @2.422s +2.163s
    └─systemd-vconsole-setup.service @2.102s +299ms
      └─systemd-journald.socket
        └─system.slice
          └─-.slice
# grep dev/dr /var/log/Xorg.0*
/var/log/Xorg.0.log:[    20.631] (II) xfree86: Adding drm device (/dev/dri/card0)
/var/log/Xorg.0.log:[    20.666] (II) modeset(0): using drv /dev/dri/card0
/var/log/Xorg.0.log.old:[     5.606] (EE) open /dev/dri/card0: No such file or directory
/var/log/Xorg.0.log.old:[     5.606] (EE) open /dev/dri/card0: No such file or directory
#

# inxi -CGSz --hostname
System:
  Host: ab250 Kernel: 6.6.21-1-longterm arch: x86_64 bits: 64
  Desktop: KDE v: 3.5.10 Distro: openSUSE Tumbleweed-Slowroll 20240213
CPU:
  Info: quad core model: Intel Core i5-7500T bits: 64 type: MCP cache:
    L2: 1024 KiB
  Speed (MHz): avg: 800 min/max: 800/3300 cores: 1: 800 2: 800 3: 800 4: 800
Graphics:
  Device-1: Intel HD Graphics 630 driver: i915 v: kernel
  Display: x11 server: X.Org v: 21.1.11 driver: X: loaded: modesetting
    dri: iris gpu: i915 resolution: 1: 2560x1440~60Hz 2: 1920x1200~60Hz
    3: 1680x1050~60Hz
  API: EGL v: 1.5 drivers: iris,swrast platforms: x11,surfaceless,device
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 23.3.6
    renderer: Mesa Intel HD Graphics 630 (KBL GT2)
# lsinitrd /boot/initrd | grep i915
# systemd-analyze critical-chain
...
graphical.target @5.686s
└─multi-user.target @5.686s
  └─kbdsettings.service @3.561s +2.122s
    └─systemd-vconsole-setup.service @3.351s +152ms
      └─systemd-journald.socket
        └─system.slice
          └─-.slice
# grep dev/dr /var/log/Xorg.0*
/var/log/Xorg.0.log:[    22.161] (II) xfree86: Adding drm device (/dev/dri/card1)
/var/log/Xorg.0.log:[    22.207] (II) modeset(0): using drv /dev/dri/card1
/var/log/Xorg.0.log.old:[     7.135] (EE) open /dev/dri/card0: No such file or directory
/var/log/Xorg.0.log.old:[     7.135] (EE) open /dev/dri/card0: No such file or directory
#
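
The grep output above can be condensed into a per-node summary, which makes it easier to compare hosts that end up on card0 with hosts that end up on card1. This is a sketch only: the function name summarize_drm_nodes is hypothetical, and it assumes the two Xorg message formats shown in this report.

```shell
#!/bin/sh
# Hypothetical helper: read Xorg log lines on stdin (with or without a
# leading "file:" prefix from grep) and report, per /dev/dri/cardN node,
# how often opening it failed and which nodes X added as DRM devices.
summarize_drm_nodes() {
    awk '
        # Matches: (EE) open /dev/dri/cardN: No such file or directory
        /\(EE\) open \/dev\/dri\/card[0-9]+: No such file/ {
            node = $0
            sub(/.*open /, "", node)   # strip everything up to the path
            sub(/:.*/, "", node)       # strip the error text after it
            missing[node]++
        }
        # Matches: (II) xfree86: Adding drm device (/dev/dri/cardN)
        /Adding drm device \(\/dev\/dri\/card[0-9]+\)/ {
            node = $0
            sub(/.*\(/, "", node)      # keep text after the last "("
            sub(/\).*/, "", node)      # drop the closing ")"
            added[node] = 1
        }
        END {
            for (n in missing) printf "missing %s %d\n", n, missing[n]
            for (n in added) printf "added %s\n", n
        }
    ' | sort
}
```

Assumed usage: `grep dev/dr /var/log/Xorg.0* | summarize_drm_nodes`. For the Slowroll output above, that reports card1 as added and two failed opens of card0.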
Comment 35 Michal Suchanek 2024-03-21 09:02:47 UTC
So this is the result of trying to boot as fast as possible, and the display manager starts before the system is ready.

Systemd maintainers, please advise how to delay display manager startup until all drivers are loaded.
Comment 36 Franck Bui 2024-03-21 09:25:22 UTC
(In reply to Michal Suchanek from comment #35)
> So this is the result of trying to boot as fast as possible, and the display
> manager starts before the system is ready.
> 
> Systemd maintainers, please advise how to delay display manager startup
> until all drivers are loaded.

I'm not aware of any way to do that (and that would be ugly to do so).

When a process needs a device, the traditional way is to rely on udev for waiting for the device before accessing it.
Comment 37 Takashi Iwai 2024-03-21 09:34:54 UTC
You should include amdgpu driver in initrd.  That's done in the early stage, hence such a timing problem can be avoided in most cases.  And that's the openSUSE default behavior.

It worked with radeon (casually) likely because it can initialize the device much quicker than amdgpu that needs the firmware loading and more complex tasks.

There is no general solution if the device is being initialized at a late stage, AFAIK.  You can tweak the stuff that matches with your hardware setup, but it can't be applied generically.
Comment 38 Michal Suchanek 2024-03-21 09:39:11 UTC
We don't know what device is needed, only that if some devices are not initialized the display server fails to start.
Comment 39 Stefan Dirsch 2024-03-21 10:03:58 UTC
Just to confirm the observation. The displaymanager tries 3 times to start Xserver before it fails in a fatal way. It wasn't meant as a workaround for timing issues though. 

Like Takashi, I see no generic approach to address this. Of course you could add
some weird sleep loops until you run into a predefined timeout, before you start
the displaymanager. But even this may fail in the end. And it makes booting
slower for everyone ...
Comment 40 Michal Suchanek 2024-03-21 10:14:40 UTC
And we do not know what driver is needed, it varies depending on hardware.

We also do not know which one is needed if more than one graphics card is present or if all are needed.
Comment 41 Stefan Dirsch 2024-03-21 10:51:35 UTC
(In reply to Michal Suchanek from comment #40)
> And we do not know what driver is needed, it varies depending on hardware.

Usually it would be /dev/dri/card0 (of course only on systems with VGA device).

> We also do not know which one is needed if more than one graphics card is
> present or if all are needed.

Yeah. That's true. So on some systems things will still fail miserably and on many systems it will make the startup slower.
Comment 42 Felix Miata 2024-03-21 15:02:56 UTC
(In reply to Takashi Iwai from comment #37)
> You should include amdgpu driver in initrd.  That's done in the early stage,
> hence such a timing problem can be avoided in most cases.  And that's the
> openSUSE default behavior.

Forcing i915 apparently caused bug 1206316 so I stopped force including it.

The following comes from the two comment #34 installations, while the same host's TW uses the same configuration and suffers the same delay:
# cat /etc/dracut.conf.d/*conf /disks/sslo/etc/dracut.conf.d/*conf | egrep -v '^\#|^$'
persistent_policy="by-uuid"
persistent_policy="by-label"
compress="xz"
hostonly="yes"
omit_drivers+=" btrfs crypto dmraid encryptfs i18n iscsi lvm lvm2 plymouth raid1 md_mod resume sata_sil uefi-lib usb_storage watchdog "
omit_dracutmodules+=" resume "
persistent_policy="by-uuid"
persistent_policy="by-label"
compress="xz"
hostonly="yes"
omit_drivers+=" btrfs crypto dmraid encryptfs i18n iscsi lvm lvm2 plymouth raid1 md_mod resume sata_sil uefi-lib usb_storage watchdog "
omit_dracutmodules+=" resume "
# ls -gG /etc/dracut.conf.d/
-rw-r--r-- 1 894 Feb 27 12:59 10-persistent_policy.conf
-rw-r--r-- 1  29 Nov 10  2022 13-persistent-local.conf
-rw-r--r-- 1 366 Aug 13  2023 90-local.conf
-rw-r--r-- 1 491 Feb 27 12:59 99-debug.conf
# rpm -qf 99-debug.conf 10-persistent_policy.conf
dracut-059+suse.506.gd33b6bef-150600.1.32.x86_64
dracut-059+suse.506.gd33b6bef-150600.1.32.x86_64
# cat /etc/dracut.conf.d/10-persistent_policy.conf | egrep -v '^\#|^$'
persistent_policy="by-uuid"
Comment 43 Felix Miata 2024-03-27 22:41:25 UTC
Back to the comment #0 PC, no amount of waiting gets X going. Running systemctl restart xdm is required:
# inxi -GSz
System:
  Kernel: 6.6.22-1-longterm arch: x86_64 bits: 64
  Console: pty pts/0 Distro: openSUSE Tumbleweed-Slowroll 20240213
Graphics:
  Device-1: AMD Kaveri [Radeon R7 Graphics] driver: amdgpu v: kernel
  Display: server: X.org v: 1.21.1.11 driver: X: loaded: vesa
    unloaded: fbdev,modesetting gpu: amdgpu resolution: 1: 2560x1440
    2: 1680x1050 3: 1920x1200 4: 1680x1050
  API: EGL v: 1.5 drivers: radeonsi,swrast platforms: surfaceless,device
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: mesa v: 23.3.6 note: incomplete
    (EGL sourced) renderer: AMD Radeon R7 Graphics (radeonsi kaveri LLVM
    17.0.6 DRM 3.54 6.6.22-1-longterm), llvmpipe (LLVM 17.0.6 256 bits)
  API: Vulkan Message: No Vulkan data available.
# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz root=LABEL=zd8p19sslo noresume ipv6.disable=1 net.ifnames=0 radeon.cik_support=0 amdgpu.cik_support=1 consoleblank=0 preempt=full mitigations=off
# ls -gGh /dev/dri
total 0
drwxr-xr-x 2       80 Mar 27 18:26 by-path
crw-rw---- 1 226,   1 Mar 27 18:26 card1
crw-rw---- 1 226, 128 Mar 27 18:26 renderD128
# grep /dev/dr /var/log/Xorg.0.log*
/var/log/Xorg.0.log:[    11.244] (EE) open /dev/dri/card0: No such file or directory
/var/log/Xorg.0.log:[    11.244] (EE) open /dev/dri/card0: No such file or directory
/var/log/Xorg.0.log.old:[   869.564] (II) xfree86: Adding drm device (/dev/dri/card1)
/var/log/Xorg.0.log.old:[   869.570] (II) Applying OutputClass "AMDgpu" to /dev/dri/card1
/var/log/Xorg.0.log.old:[   869.576] (EE) open /dev/dri/card0: No such file or directory
/var/log/Xorg.0.log.old:[   869.577] (II) Applying OutputClass "AMDgpu" options to /dev/dri/card1
# systemd-analyze critical-chain
...
graphical.target @10.040s
└─multi-user.target @10.040s
  └─kbdsettings.service @7.945s +2.093s
    └─systemd-vconsole-setup.service @7.315s +599ms
      └─systemd-journald.socket
        └─system.slice
          └─-.slice