Bugzilla – Bug 1227363
PM1743 VROC RAID1: SLE-15-SP6 OS boots into emergency mode after installation
Last modified: 2024-07-04 08:28:28 UTC
Reproduce steps:
1. Use two PM1743 1.92TB drives to create a VROC RAID1 volume.
2. Install the SLE-15-SP6 system.
3. After the installation completes, the system reboots into emergency mode.

Investigation:
We added "rd.udev.debug" to the kernel command line and found the following log:

  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: Unable to get real path for '/sys/bus/pci/drivers/vmd/0000:c7:00.5/domain/device''
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: /dev/nvme1n1 is not attached to Intel(R) RAID controller.'
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: No OROM/EFI properties for /dev/nvme1n1'
  (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: no RAID superblock on /dev/nvme1n1.'
  (udev-worker)[2149]: nvme1n1: Process '/sbin/mdadm -I /dev/nvme1n1' failed with exit code 1.

According to our analysis, the root cause is as follows. After an NVMe disk is probed and added by the nvme driver, udevd runs rule scripts that invoke mdadm to detect whether an mdraid array is associated with that disk. mdadm determines whether an NVMe device is connected to a particular VMD domain by checking the VMD "domain" symlink. The race looks like this:

  Thread A                          Thread B                      Thread mdadm
  vmd_enable_domain
    pci_bus_add_devices
      __driver_probe_device
      ...
      work_on_cpu
        schedule_work_on
        : wakeup Thread B
                                    nvme_probe
                                    : wakeup scan_work to scan
                                      nvme disk and add nvme disk,
                                      then wakeup udevd
                                                                  : udevd executes mdadm command
        flush_work                                                main
        : wait for nvme_probe done                                ...
      __driver_probe_device                                       find_driver_devices
      : probe next nvme device                                    : 1) Detect the domain symlink;
      ...                                                           2) Find the domain symlink
                                                                       from vmd sysfs;
      ...                                                           3) The domain symlink is not
                                                                       created yet, failed
    sysfs_create_link
    : create domain symlink

sysfs_create_link() is invoked at the end of vmd_enable_domain().
Creating the symlink only at the end of vmd_enable_domain(), after pci_bus_add_devices() has already triggered the udev events, introduces a timing issue: mdadm may fail to resolve the VMD domain symlink path because the symlink has not been created yet. Please refer to the following thread:
https://lore.kernel.org/linux-pci/20240603140329.7222-1-sjiwei@163.com/t/#u

Could you please backport the following patch into the SLE 15 SP6 kernel?
https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=vmd&id=7a13782e6150154abdf34ced3b733502275a16d1
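The ordering problem can be modeled with two Python threads, along with a fix that assumes the patch works by creating the domain symlink before pci_bus_add_devices() (the natural remedy for the sequence in the diagram). Everything here is a hypothetical simulation: a set stands in for sysfs, and threading.Event objects force the losing interleaving from the diagram instead of relying on real timing, so the result is deterministic.

```python
import threading


def run_scenario(create_link_first: bool) -> bool:
    """Return True if the simulated mdadm finds the "domain" symlink.

    Hypothetical model: a set stands in for sysfs, and Events force the
    interleaving from the diagram (the udev-triggered mdadm runs before
    vmd_enable_domain() reaches its final step).
    """
    sysfs = set()
    lock = threading.Lock()
    device_added = threading.Event()  # stands in for the udev "add" event
    mdadm_done = threading.Event()
    result = {}

    def vmd_enable_domain() -> None:
        if create_link_first:
            # Assumed fixed order: symlink created before pci_bus_add_devices().
            with lock:
                sysfs.add("domain")
        # pci_bus_add_devices(): the NVMe probe adds the disk and udev is notified.
        device_added.set()
        # Worst case from the diagram: mdadm runs before this thread continues.
        mdadm_done.wait()
        if not create_link_first:
            # Original order: sysfs_create_link() at the end of vmd_enable_domain().
            with lock:
                sysfs.add("domain")

    def mdadm_incremental() -> None:
        device_added.wait()
        with lock:
            result["found"] = "domain" in sysfs
        mdadm_done.set()

    threads = [
        threading.Thread(target=vmd_enable_domain),
        threading.Thread(target=mdadm_incremental),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["found"]


if __name__ == "__main__":
    print("original order:", run_scenario(create_link_first=False))  # False
    print("fixed order:   ", run_scenario(create_link_first=True))   # True
```

With the original order, the simulated mdadm never sees the symlink. With the reordered version it always does, because the symlink already exists before any device is announced.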
Thanks for the report. I have now backported the PCI fix to the SLE15-SP6 branch. It likely missed the upcoming July update, but it will be included in a later one.