Bugzilla – Bug 1051354
"zypper up" after installing 42.3 causes boot to fail (dracut-initqueue timeout due to missing LVM devices)
Last modified: 2017-09-07 18:36:39 UTC
After installing Leap 42.3 I ran "zypper up" and got the following updates (there are a few others, but they aren't relevant):

The following 10 packages are going to be upgraded:
  dracut                   044-21.7 -> 044.1-23.2
  libsystemd0              228-27.2 -> 228-29.1
  libsystemd0-32bit        228-27.2 -> 228-29.1
  libudev1                 228-27.2 -> 228-29.1
  systemd                  228-27.2 -> 228-29.1
  systemd-32bit            228-27.2 -> 228-29.1
  systemd-bash-completion  228-27.2 -> 228-29.1
  systemd-logger           228-27.2 -> 228-29.1
  systemd-sysvinit         228-27.2 -> 228-29.1
  udev                     228-27.2 -> 228-29.1

There's something in there that causes the next boot to fail because dracut-initqueue cannot find /dev/mapper/system-root. Looking at /dev/mapper, the only file in there is 'control'. Running "lvm_scan" causes the volumes to be found and /sys-root to be mounted.

The install is on a single 4TB SATA disk with 3 LVM volumes: system-root (Btrfs), system-swap and system-home (XFS).
Trying dracut maintainers, cc'ing systemd maintainers.
Probably systemd bug 1051465. Can you check if the patch provided there fixes the problem for you?
(In reply to Daniel Molkentin from comment #2)
> Probaby systemd bug 1051465. Can you check if the patch provided there
> fixes the problem for you?

The reporter does not have access to this package.
I also have the same problem. Applying the 2 patches from bug 1051465 doesn't solve the problem, so my system is still unbootable. For me this is the perfect definition of a blocking problem.
just documenting the work-around / downgrade:

zypper in --oldpackage `\
  zypper info -t patch --conflicts openSUSE-2017-847 | \
  grep " < " | while read NAME C VERSION; do \
    rpm --quiet -q --queryformat "%{name}\n" $NAME && echo "${NAME}<${VERSION}"; \
  done`
zypper al -t patch openSUSE-2017-847
In response to comment in bug 1051465, I don't end up in a dracut shell. Instead, the system is completely stuck. The last messages I see are the following:

[    2.768085] clocksource: Switched to clocksource tsc
[    2.853288] ata4: SATA link down (SStatus 0 SControl 300)
[    3.157443] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.159398] ata5.00: ATAPI: TSSTcorp CDDVDW SU-208DB, TF01, max UDMA/100
[    3.162229] ata5.00: configured for UDMA/100
[    3.165837] scsi 4:0:0:0: CD-ROM    TSSTcorp CDDVDW SU-208DB  TF01 PQ: 0 ANSI: 5
[    3.196244] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[    3.196247] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    3.196257] sd 0:0:0:0: [sda] Write Protect is off
[    3.196259] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    3.196274] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    3.199155] sda: sda1 sda2 sda3
[    3.199416] sd 0:0:0:0: [sda] Attached SCSI disk
[    3.219578] sr 4:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[    3.219581] cdrom: Uniform CD-ROM driver Revision: 3.20

Afterwards, nothing happens.
(In reply to François Valenduc from comment #6)
> In response to comment in bug 1051465, I don't end up in a dracut shell.
> Instead, the system is completely stuck. The last messages I see are the
> following.
> [...]
> Afterwards, nothing happens.

Please reboot, then in grub remove the "quiet" flag from the grub entry you are trying to boot (press 'e', edit the line starting with 'linux', then hit Ctrl+x/F10). This should provide more details.

Also note that sometimes systemd/dracut times out waiting for a particular device. It might be worth waiting, but not more than, say, 10 minutes.
Here is what cat /proc/cmdline shows:

BOOT_IMAGE=/boot/vmlinuz-4.12.4 root=UUID=56d42695-8175-4b18-b0e6-1d4891e2f386 ro BOOT_IMAGE=/boot/x86_64/loader/linux ramdisk_size=512000 ramdisk_blocksize=4096 resume=/dev/system/swap

I had already removed the quiet flag; the system has just been rebooted with the problematic update of udev. Let's see what happens in 15 minutes...
Reassigning to systemd maintainers meanwhile, I'll stay on CC.
After waiting some more time, I got extra info:

A lot of lines with "dracut initqueue starting timeout scripts", then "could not boot, /dev/disk/by-uuid/.... not found". So it seems these symlinks in /dev/disk/by-uuid are not created any more, which is the cause of the problem.
The problem also occurs with root=/dev/system/opensuse. Then the error message is slightly different: it complains that /dev/system/opensuse does not exist.
(In reply to François Valenduc from comment #10)
> A lot of lines with "dracut initqueue starting timeout scripts"
> Then "could not boot, /dev/disk/by-uuid/...." not found.
> So it seems these symlinks in /dev/disk/by-uuid" are not created any more,
> which is the cause of the problem.

Hm, normally you should get a shell when the dracut initqueue times out. Do you get one when you boot with the parameter:

rd.shell
I don't get a shell even with rd.shell. After the errors about the disks not being found, I get this:

"Failed to start dracut-emergency.service: transaction is destructive
Not all disks have been found
You might want to regenerate your initramfs"

But the initramfs images were already rebuilt after the update of udev.
(In reply to François Valenduc from comment #13)
> I don't get a shell even with rd.shell.
> [...]
> But initramfs images were already rebuild after update of udev.

This is bad; without a shell it will be harder to debug. Can you boot with the parameter:

rd.break=initqueue

and see whether this gives you a shell? If so, please attach the output of:

systemctl status systemd-udevd.service
systemctl list-jobs
journalctl -axb

If not, please boot with the additional options:

debug rd.systemd.unit=sysinit.target

and check whether you see something like:

Started udev Kernel Device Manager

If you see an error instead, please attach the error message.
(In reply to Thomas Blume from comment #14)
> debug rd.systemd.unit=sysinit.target

Sorry, there is a mistake; the command should be:

debug rd.systemd.unit=systemd-udevd.service
I got a shell with rd.break=initqueue. As suspected, logical volumes are not present in /dev/mapper, but I can activate them with lvm vgchange -a y. Here are the requested infos:

systemctl status systemd-udevd.service gives this:

● system-udevd.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

systemctl list-jobs gives this:

JOB UNIT              TYPE  STATE
 60 emergency.target  start waiting
 61 emergency.service start running

2 jobs listed.
Created attachment 735347 [details]
rdsosreport file
Created attachment 735349 [details] output of journalctl -axb
Any news on this annoying bug? Meanwhile, I am forced to use the workaround described in comment #5 to boot my computer.
The problem still occurs with udev 228-32.2. I don't think my system is totally out of the ordinary: it has a SATA drive using the ahci driver, and the root partition is on LVM. Does nobody have an idea about my problem yet? Meanwhile, I continue to stick to an older version of udev so that my system boots.
No update and no new idea have been recorded in this bug.
What do you mean? I have provided all the logs and other requested information. Since then, nothing happens. Is this bug going to stay forever?
Yes, you provided the requested information. Nevertheless, nobody has solved the bug yet, which is why there was no update in the bug. Yes, some bugs never get solved. If there is an update, it will be noted here.
Reporter or François, please test with udev-228-32.2.x86_64 and the latest systemd updates.
As I said in comment #20, the problem still occurs with the latest updates of systemd and udev.
Hi, I have the same issue on my system with an NVMe drive, and for me at least the cause is the recent changes made in /usr/lib/udev/rules.d/60-persistent-storage.rules, which gets packed into the initrd. Specifically, I am comparing udev-228-27.2, which worked fine, with udev-228-32.2, which causes my system not to boot in the same way as described in comment #10.

In my case, the nvme entries in the "SCSI devices" section have disappeared completely.

udev-228-27.2:

KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $tempnode", ENV{ID_BUS}="nvme"
KERNEL=="sd*|sr*|cciss*|nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
KERNEL=="sd*|cciss*|nvme*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"

udev-228-32.2:

KERNEL=="sd*|sr*|cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"

For me, manually adding "nvme*" back into the above lines was a workaround, but of course it needs to be fixed properly so I don't have problems on the next update.
For others, who maybe do not have NVMe, please note the following other lines which are missing in 228-32.2:

# scsi compat links for ATA devices
KERNEL=="sd*[!0-9]", ENV{ID_BUS}=="ata", PROGRAM="scsi_id --whitelisted --replace-whitespace -p0x80 -d $devnode", RESULT=="?*", ENV{ID_SCSI_COMPAT}="$result", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}"
KERNEL=="sd*[0-9]", ENV{ID_SCSI_COMPAT}=="?*", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}-part%n"

# scsi compat links for ATA devices (for compatibility with udev < 184)
KERNEL=="sd*[!0-9]", ENV{ID_BUS}=="ata", PROGRAM="scsi_id --truncated-serial --whitelisted --replace-whitespace -p0x80 -d$tempnode", RESULT=="?*", ENV{ID_SCSI_COMPAT_TRUNCATED}="$result", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT_TRUNCATED}"
KERNEL=="sd*[0-9]", ENV{ID_SCSI_COMPAT_TRUNCATED}=="?*", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT_TRUNCATED}-part%n"

# by-path (parent device path, compat version, only for ATA/NVMe/SAS bus)
ENV{DEVTYPE}=="disk", ENV{ID_BUS}=="ata|nvme|scsi", DEVPATH!="*/virtual/*", IMPORT{program}="path_id_compat %p"
ENV{DEVTYPE}=="disk", ENV{ID_PATH_COMPAT}=="?*", SYMLINK+="disk/by-path/$env{ID_PATH_COMPAT}"
ENV{DEVTYPE}=="partition", ENV{ID_PATH_COMPAT}=="?*", SYMLINK+="disk/by-path/$env{ID_PATH_COMPAT}-part%n"

I cannot say if adding these lines may fix other non-NVMe systems, but it is worth a try. I suggest you add those lines to /usr/lib/udev/rules.d/60-persistent-storage.rules and report your findings back to bugzilla.

@SUSE colleagues: can you please check my analysis and hopefully fix this.

Thanks,
Lee
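The comparison Lee describes can be reproduced without a reboot by grepping the two rule sets. A minimal sketch, using the snippets quoted above as here-strings (on a live system you would instead extract 60-persistent-storage.rules from each initrd with lsinitrd and diff the results):

```shell
#!/bin/sh
# Rule snippets copied verbatim from the two udev versions compared above.
old_rules='KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $tempnode", ENV{ID_BUS}="nvme"
KERNEL=="sd*|sr*|cciss*|nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
KERNEL=="sd*|cciss*|nvme*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"'
new_rules='KERNEL=="sd*|sr*|cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"'
# Count rule lines that still match NVMe device nodes (literal "nvme*").
old_count=$(printf '%s\n' "$old_rules" | grep -c 'nvme\*')
new_count=$(printf '%s\n' "$new_rules" | grep -c 'nvme\*')
echo "rules matching nvme devices: old=$old_count new=$new_count"
```

The drop from three matching rules to none is exactly why the nvme-* by-id symlinks disappear from the initrd.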
Adding these lines doesn't change anything. In fact, the disk and the partition are detected. The problem is rather that the LVM volumes are not found.
I suffer from the same problem. :(
(In reply to François Valenduc from comment #27) > Adding these lines doesn't change anything. In fact, the disk and the > partition are detected. The problem is rather that the LVM volumes are not > found. In my case the difference between working and bad initrd is: diff -Nur good/usr/lib/udev/rules.d/61-persistent-storage-compat.rules bad/usr/lib/udev/rules.d/61-persistent-storage-compat.rules --- good/usr/lib/udev/rules.d/61-persistent-storage-compat.rules 2017-08-20 20:58:53.723996905 +0200 +++ bad/usr/lib/udev/rules.d/61-persistent-storage-compat.rules 2017-08-20 20:59:04.775996863 +0200 @@ -37,10 +37,11 @@ # NVMe links were introduced first via a SUSE specific commit # (bsc#944132) and upstream gained support later but of course using a -# different scheme. -KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $tempnode", ENV{ID_BUS}="nvme" -KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}" -KERNEL=="nvme*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n" +# different scheme. Also note that ID_SERIAL is already used by the +# contemporary rules, see bsc#1048679 for details. 
+KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_NVME_SERIAL_COMPAT}!="?*", PROGRAM="scsi_id --whitelisted --replace-whitespace -d $devnode", RESULT=="?*", ENV{ID_NVME_SERIAL_COMPAT}="$result" +KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_NVME_SERIAL_COMPAT}=="?*", SYMLINK+="disk/by-id/nvme-$env{ID_NVME_SERIAL_COMPAT}" +KERNEL=="nvme*", ENV{DEVTYPE}=="partition", ENV{ID_NVME_SERIAL_COMPAT}=="?*", SYMLINK+="disk/by-id/nvme-$env{ID_NVME_SERIAL_COMPAT}-part%n" # SCSI compat links for ATA devices, removed by f6ba1a468cea (boo#769002) KERNEL=="sd*[!0-9]", ENV{ID_BUS}=="ata", PROGRAM="scsi_id --whitelisted --replace-whitespace -p0x80 -d $devnode", RESULT=="?*", ENV{ID_SCSI_COMPAT}="$result", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}" So, Lee might be correct.
(In reply to Lee Martin from comment #26)

Following the proposal in comment #14, I reverted to udev-228-32.2, rebuilt an initrd and dropped into dracut at boot, so I did some checking.

dracut:# ls -l /dev/disk/by-id/
lrwxrwxrwx 1 root 0 15 Aug 20 23:57 -20025385b61502108-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root 0 15 Aug 20 23:57 -20025385b61502108-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root 0 15 Aug 20 23:57 -20025385b61502108-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root 0 13 Aug 20 23:57 nvme-20025385b61502108 -> ../../nvme0n1
lrwxrwxrwx 1 root 0 13 Aug 20 23:57 nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A -> ../../nvme0n1
lrwxrwxrwx 1 root 0 15 Aug 20 23:57 nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root 0 15 Aug 20 23:57 nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root 0 15 Aug 20 23:57 nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root 0 13 Aug 20 23:57 nvme-eui.0025385b61502108 -> ../../nvme0n1
lrwxrwxrwx 1 root 0 15 Aug 20 23:57 nvme-eui.0025385b61502108-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root 0 15 Aug 20 23:57 nvme-eui.0025385b61502108-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root 0 15 Aug 20 23:57 nvme-eui.0025385b61502108-part3 -> ../../nvme0n1p3

My LVM partition is LUKS encrypted, and after upgrading to udev-228-32.2 I never got the LUKS password prompt, but instead the error described in this bug. I only installed my 42.3 a few days ago using the initial udev-228-27.2 from the ISO, which created the following /etc/crypttab:

# cat /etc/crypttab
cr_-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 /dev/disk/by-id/-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 none none

Now, if I compare that crypttab to the disk IDs I see in dracut, I notice that the device name is missing completely; well, more exactly, all the NVMe devices now have an "nvme" prefix, whereas during install with udev-228-27.2 they apparently did not have the "nvme" prefix.
So, now with a corrected crypttab and the standard udev-228-32.2 I'm fine.

HOWEVER, this indicates to me that this Leap 42.3 udev update is changing many device names at boot versus the ISO installation, so anything dependent on specific device names (like LUKS) is at risk of not working after this update. Therefore a fix of some kind is necessary, since I imagine LUKS setups to be relatively common on the desktop.

Regarding LVM on a non-encrypted system, like François, I wondered if LVM maybe has a fixed list of device names somewhere, and maybe the device name changes in dracut are causing LVM some grief? For LVM, I came across /etc/lvm/archive/*.vg, which lists the LVM configuration, and guess what: there is a physical_volumes section which contains specific device names.

François, maybe you want to check/list the device names you see in dracut with different versions of udev, and then compare them with the LVM configuration files I mention above. Since my LVM sits on top of LUKS, the physical volume is the same, but assuming your device names at boot in dracut have now changed, then that might explain why your VG is not activating automatically?

Richard, maybe you can also check your LVM config and dracut device names and give feedback.

Hope that helps.
Lee
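The check Lee proposes can be scripted. A minimal sketch, using a hypothetical /tmp/sample.vg with the physical_volumes layout he describes (on a real system you would run the grep over the files under /etc/lvm/archive/ and compare the hints by eye with what exists under /dev/disk/ in the dracut shell):

```shell
#!/bin/sh
# Hypothetical miniature of an /etc/lvm/archive/*.vg file; the real files on
# the affected system are the ones to inspect.
cat > /tmp/sample.vg <<'EOF'
system {
	physical_volumes {
		pv0 {
			device = "/dev/sda3"	# Hint only
		}
	}
}
EOF
# Extract the device hints LVM recorded for its physical volumes.
devs=$(grep -o 'device = "[^"]*"' /tmp/sample.vg)
echo "$devs"
```

If the recorded device path no longer matches any node udev creates at boot, that would fit the "VG not activating" symptom (note, though, that LVM normally scans by PV UUID, so these hints are only a starting point).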
(In reply to Richard Weinberger from comment #29)
> (In reply to François Valenduc from comment #27)
> > Adding these lines doesn't change anything. In fact, the disk and the
> > partition are detected. The problem is rather that the LVM volumes are
> > not found.
>
> In my case the difference between working and bad initrd is:
> diff -Nur good/usr/lib/udev/rules.d/61-persistent-storage-compat.rules
> bad/usr/lib/udev/rules.d/61-persistent-storage-compat.rules
> [...]
> So, Lee might be correct.

Sorry for the delay, I'm back from vacation and will continue processing.

Can you please test whether the packages from bug 1051465 comment#9 fix it? Please note that the fix is only for NVMe disks.
(In reply to Lee Martin from comment #30)
> I only installed my 42.3 a few days ago using the initial udev-228-27.2
> from the ISO, which created the following /etc/crypttab:
>
> # cat /etc/crypttab
> cr_-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 /dev/disk/by-id/-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 none none

That is actually bug 1048679.

> So, now with a corrected crypttab and the standard udev-228-32.2 I'm fine.
>
> HOWEVER this indicates to me that this Leap 42.3 update around udev is
> changing many device names at boot versus the ISO installation, so anything
> dependent on specific device names (like LUKS) is at risk of not working
> after this update. Therefore a fix of some kind is necessary since I imagine
> LUKS setups to be relatively common on the desktop.

Hm, the only fix I can imagine with an update is to re-add the broken symlinks. The consequence would be to carry something that is broken for quite a long time. I'd prefer that we document this behaviour and provide a Driver Update for the 42.3 installation system instead. Steffen, would this be feasible?
(In reply to Thomas Blume from comment #31)
> Can you please test whether the packages from bug 1051465 comment#9 fix it?
> Please note that the fix is only for nvme disks.

Sure, since I have an NVMe disk they might help. So, installing these packages followed by a reboot should work? Or is there some other action needed?
(In reply to Richard Weinberger from comment #33)
> So, installing these packages followed by a reboot should work?
> Or is there some other action needed?

It won't help if your system is using the broken symlinks from the installation system somewhere. Hence, please check /etc/fstab, /etc/crypttab and the filter setting in /etc/lvm/lvm.conf for device names with a dash as first character. For example, if you have something like Lee in /etc/crypttab:

cr_-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 /dev/disk/by-id/-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 none

you will need to add 'nvme' before the dash so that it looks like:

cr_nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 /dev/disk/by-id/nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 none

After changing any of the above files, please run mkinitrd.
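The crypttab fix above can be sketched as a sed one-liner. This example deliberately works on a throw-away copy in /tmp so nothing on the system is touched; on a real system you would edit /etc/crypttab itself and then run mkinitrd as described:

```shell
#!/bin/sh
# Throw-away copy of the broken crypttab line quoted above.
cat > /tmp/crypttab <<'EOF'
cr_-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 /dev/disk/by-id/-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 none none
EOF
# Insert the missing 'nvme' prefix in both the mapping name and the by-id path.
sed -i 's|cr_-|cr_nvme-|; s|by-id/-|by-id/nvme-|' /tmp/crypttab
cat /tmp/crypttab
```

After verifying the rewritten line, apply the same edit to the real file and regenerate the initrd.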
Created attachment 737849 [details] output of ls -laR /dev/disk with a bad initrd In response to comment #30, here is the output of ls -laR /dev/disk with a bad initrd obtained with rd.break=initqueue.
Created attachment 737850 [details]
output of ls -laR /dev/disk with a good initrd

Here is the output of ls -laR /dev/disk with a good initrd after the system has booted. Could somebody explain to me how to get a dracut shell? With rd.shell, the boot process doesn't end up in a shell.
(In reply to François Valenduc from comment #36)
> Here is the output of ls -laR /dev/disk with a good initrd after the system
> has booted. If somebody can explains me how to get a dracut shell, because
> with rd.shell, the boot process doesn't end up in a shell.

The problem is that dracut sees multiple root= and resume= boot parameters:

-->
Aug 04 20:09:14 pc-francois dracut-cmdline[704]: Using kernel command line parameters: rd.lvm.lv=system/swap rd.lvm.lv=system/opensuse resume=/dev/mapper/system-swap resume=/dev/mapper/system-swap root=/dev/mapper/system-opensuse rootfstype=ext4 rootflags=rw,noatime,data=ordered BOOT_IMAGE=/boot/vmlinuz-4.12.4 root=UUID=56d42695-8175-4b18-b0e6-1d4891e2f386 ro BOOT_IMAGE=/boot/x86_64/loader/linux ramdisk_size=512000 ramdisk_blocksize=4096 resume=/dev/system/swap rd.break=initqueue
--<

Please reboot your machine and when you see the bootloader screen go into the grub2 editor. From there, remove the entries:

root=UUID=56d42695-8175-4b18-b0e6-1d4891e2f386 ro
resume=/dev/system/swap

and then boot your machine. Does it come up? If so, please edit /etc/default/grub and remove the problematic entries above. Afterwards run:

grub2-mkconfig -o /boot/grub2/grub.cfg

in order to update your boot configuration.
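The /etc/default/grub cleanup could look roughly like this. The sed patterns are assumptions matching the exact entries quoted above, and the sketch operates on a copy in /tmp rather than the real file:

```shell
#!/bin/sh
# Throw-away copy; the line mirrors the duplicate parameters seen in the
# dracut-cmdline log above (hypothetical GRUB_CMDLINE_LINUX_DEFAULT content).
cat > /tmp/grub <<'EOF'
GRUB_CMDLINE_LINUX_DEFAULT="resume=/dev/system/swap root=UUID=56d42695-8175-4b18-b0e6-1d4891e2f386 ro quiet"
EOF
# Drop the stale resume= entry and the duplicate root=...ro entry.
sed -i -e 's|resume=/dev/system/swap ||' -e 's|root=UUID=[^ "]* ro ||' /tmp/grub
cat /tmp/grub
```

Only after checking the result would you apply the edit to /etc/default/grub and run grub2-mkconfig -o /boot/grub2/grub.cfg.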
So, I don't need to indicate the root device? And resume is no longer allowed?
The system still doesn't boot without the root and resume parameters. What I also find strange is that if I run lvm vgchange -a y in the shell I get with rd.break=initqueue, all the LVM volumes are found. So why aren't they detected in a normal boot?
(In reply to François Valenduc from comment #38)
> So, I don't need to indicate the root device ?

Yes, dracut is able to automatically determine the root device and will write the settings into:

etc/cmdline.d/95root-dev.conf

in the initrd.

> And resume is not more allowed ?

Sure it is, but the resume parameter should only be given once.
Maybe it was a bad copy-paste from me, but the resume and root parameters were given only once. And the problem continues.
(In reply to François Valenduc from comment #39)
> The system still doesn't boot without root and resume parameters. What I
> also find strange is that if I run lvm vgchange -a y in the shell I get
> with rd.break=initqueue, all the LVM volumes are found. So why aren't they
> detected in a normal boot ?

lvm activation is done in the initqueue. If you break before dracut-initqueue is finished, it is normal that lvm is not active. You can try:

rd.break=pre-mount

instead, which hopefully gives you a dracut shell too. If so, please provide a new rdsosreport.
Unfortunately, there is no shell with rd.break=pre-mount.
(In reply to François Valenduc from comment #43)
> unfortunately, there is no shell with rd.break=pre-mount

Then I guess the only chance to get more info is to start with the additional boot parameter:

rd.debug

and to capture the boot log (e.g. via serial console). Can you please try to do so and attach the log?
How can I use a serial console? My computer is way too recent to have a serial port... With rd.debug, I can see that it repeatedly tries to find the root partition in the initqueue, without finding it. Is there a git tree of udev or systemd in openSUSE? Then I could use git bisect to try to find the problematic change.
(In reply to François Valenduc from comment #45)
> How can I use a serial console ? My computer is way too recent to have a
> serial port...
> With rd.debug, I can see that it repeatedly tries to find the root
> partition in the initqueue, without finding it.

And is the pv (sda3) already present then? Does it look for the logical volume name or the UUID or both? Can you see any hint that the logical volumes get activated?

> Is there a git tree of udev or systemd in opensuse ? Then I could use git
> bisect to try to find the problematic change.

I don't think this is a bug in systemd. It rather looks like a setup problem in dracut. But sure, you are welcome to look into the code. You can find the git at:

https://github.com/openSUSE/systemd.git

udev is part of the systemd sources.
The LVM volume is on sda3, which is detected. It repeatedly tries to find /dev/mapper/system-opensuse, but in the end it complains that it doesn't find /dev/system/opensuse.
The problem is indeed in dracut and not in udev or systemd. If I revert to the older packages as explained in comment #5 and then update everything (thus systemd and udev) except dracut, it works without problem.
I was a bit too fast. If I lock dracut, udev stays at the older version too.
(In reply to François Valenduc from comment #49)
> I was a bit too fast. If I lock dracut, udev stays at the older version too.

Can you please attach the output of:

lsinitrd -f etc/udev/rules.d/64-lvm.rules /boot/$YOUR_INITRD

where $YOUR_INITRD is the initrd that fails to boot?
Here is the requested info:

# hacky rules to try to activate lvm when we get new block devs...
#
# Copyright 2008, Red Hat, Inc.
# Jeremy Katz <katzj@redhat.com>

SUBSYSTEM!="block", GOTO="lvm_end"
ACTION!="add|change", GOTO="lvm_end"
# Also don't process disks that are slated to be a multipath device
ENV{DM_MULTIPATH_DEVICE_PATH}=="?*", GOTO="lvm_end"
KERNEL=="dm-[0-9]*", ACTION=="add", GOTO="lvm_end"
ENV{ID_FS_TYPE}!="LVM?_member", GOTO="lvm_end"

PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/dm-[0-9]*; do [ -e $$i ] && exit 0; done; exit 1;' ", \
    GOTO="lvm_end"

RUN+="/sbin/initqueue --settled --onetime --unique /sbin/lvm_scan"
RUN+="/sbin/initqueue --timeout --name 51-lvm_scan --onetime --unique /sbin/lvm_scan --partial"
RUN+="/bin/sh -c '>/tmp/.lvm_scan-%k;'"

LABEL="lvm_end"
Here is the output with a working initrd. To me, it is exactly the same:

# hacky rules to try to activate lvm when we get new block devs...
#
# Copyright 2008, Red Hat, Inc.
# Jeremy Katz <katzj@redhat.com>

SUBSYSTEM!="block", GOTO="lvm_end"
ACTION!="add|change", GOTO="lvm_end"
# Also don't process disks that are slated to be a multipath device
ENV{DM_MULTIPATH_DEVICE_PATH}=="?*", GOTO="lvm_end"
KERNEL=="dm-[0-9]*", ACTION=="add", GOTO="lvm_end"
ENV{ID_FS_TYPE}!="LVM?_member", GOTO="lvm_end"

PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/dm-[0-9]*; do [ -e $$i ] && exit 0; done; exit 1;' ", \
    GOTO="lvm_end"

RUN+="/sbin/initqueue --settled --onetime --unique /sbin/lvm_scan"
RUN+="/sbin/initqueue --timeout --name 51-lvm_scan --onetime --unique /sbin/lvm_scan --partial"
RUN+="/bin/sh -c '>/tmp/.lvm_scan-%k;'"

LABEL="lvm_end"
(In reply to François Valenduc from comment #52)
> Here is the output with a working initrd. To me, it is exactly the same;
> [...]

Ok, so the udev rule for activating the lvm device is there. Still, rdsosreport shows that it doesn't get activated:

-->
+ lvm vgdisplay
  --- Volume group ---
  VG Name               system
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2933
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                15
  Open LV               0
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
--<

Maybe there is an error when the rule is executed. Please go again to the dracut shell and run:

udevadm test /block/sda

and attach the output. The output of:

udevadm info -e

would also be helpful.
Created attachment 738360 [details] output of udevadm test /block/sda
Created attachment 738361 [details] output of udevadm info -e
(In reply to François Valenduc from comment #55)
> Created attachment 738361 [details]
> output of udevadm info -e

The pv of your lvm device is misidentified by udev:

-->
P: /devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda3
N: sda3
[...]
E: ID_FS_TYPE=iso9660
[...]
E: ID_FS_VERSION=Joliet Extension
E: ID_MODEL=ST1000LM014-1EJ1
--<

Normally, for an lvm pv it should look like this:

-->
E: ID_FS_TYPE=LVM2_member
[...]
E: ID_FS_VERSION=LVM2 001
E: ID_MODEL=LVM PV GplkpP-Ovcs-w2SQ-H31f-hOFe-ztuO-SlLCGH on /dev/sda2
--<

This is similar to bug 1046268. Can you please test whether the workaround from bug 1046268 comment#29 fixes it?
Indeed, commenting out ENV{DEVTYPE}=="partition", IMPORT{parent}="ID_*" in /usr/lib/udev/rules.d/61-persistent-storage-compat.rules and regenerating the initramfs solves the problem.
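For reference, the workaround François applied can be scripted. This sketch demonstrates the edit on a throw-away copy containing just the offending rule; on a real system you would edit /usr/lib/udev/rules.d/61-persistent-storage-compat.rules itself and then regenerate the initramfs with mkinitrd:

```shell
#!/bin/sh
# Throw-away copy containing only the rule to be disabled.
cat > /tmp/compat.rules <<'EOF'
ENV{DEVTYPE}=="partition", IMPORT{parent}="ID_*"
EOF
# Comment the rule out, as in the workaround from bug 1046268 comment#29.
sed -i 's|^ENV{DEVTYPE}=="partition", IMPORT{parent}="ID_\*"$|# &|' /tmp/compat.rules
cat /tmp/compat.rules
```

Note that this is a local workaround, not the packaged fix: a later package update will replace the edited rules file, so the change may need to be re-applied until a fixed udev is installed.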
The correction of the rules for NVMe devices is processed in bug 1051465. Closing this one as a duplicate.

*** This bug has been marked as a duplicate of bug 1051465 ***
In my case, this problem has nothing to do with rules for NVMe devices; I have a SATA disk.
(In reply to François Valenduc from comment #59)
> In my case, this problem has nothing to do with rules for NVMe devices, I
> have a SATA disk.

Yes, your case is not covered within this bug. Yours is a duplicate of bug 1046268, see comment#56. However, I've referenced the wrong duplicate for the NVMe issue, sorry. The right one is: 1048679

*** This bug has been marked as a duplicate of bug 1048679 ***
The problem is indeed solved with the latest version of udev and systemd (228-35.1) available in openSUSE.