Bug 1051354

Summary: "zypper up" after installing 42.3 causes boot to fail (dracut-initqueue timeout due to missing LVM devices)
Product: [openSUSE] openSUSE Distribution Reporter: Forgotten User IZlMt4-xuB <forgotten_IZlMt4-xuB>
Component: BasesystemAssignee: systemd maintainers <systemd-maintainers>
Status: RESOLVED DUPLICATE QA Contact: E-mail List <qa-bugs>
Severity: Critical    
Priority: P3 - Medium CC: astieger, daniel, forgotten_IZlMt4-xuB, kresten, lee.martin, richard, snwint, systemd-maintainers, thomas.blume
Version: Leap 42.3   
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE 42.3   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: rdsosreport file
output of journalctl -axb
output of ls -laR /dev/disk with a bad initrd
output of ls -laR /dev/disk with a good initrd
output of udevadm test /block/sda
output of udevadm info -e

Description Forgotten User IZlMt4-xuB 2017-07-30 05:32:51 UTC
After installing Leap 42.3 I ran "zypper up" and got the following updates (there are a few others, but aren't relevant):

The following 10 packages are going to be upgraded:
  dracut                   044-21.7 -> 044.1-23.2
  libsystemd0              228-27.2 -> 228-29.1
  libsystemd0-32bit        228-27.2 -> 228-29.1
  libudev1                 228-27.2 -> 228-29.1
  systemd                  228-27.2 -> 228-29.1
  systemd-32bit            228-27.2 -> 228-29.1
  systemd-bash-completion  228-27.2 -> 228-29.1
  systemd-logger           228-27.2 -> 228-29.1
  systemd-sysvinit         228-27.2 -> 228-29.1
  udev                     228-27.2 -> 228-29.1

Something in there causes the next boot to fail because dracut-initqueue cannot find /dev/mapper/system-root. Looking at /dev/mapper, the only file in there is 'control'.
Running "lvm_scan" causes the volumes to be found and /sysroot to be mounted.

The install is on a single 4TB SATA disk with 3 LVM volumes: system-root (Btrfs), system-swap and system-home (xfs).
Comment 1 Andreas Stieger 2017-07-30 12:35:49 UTC
Trying dracut maintainers, cc'ing systemd maintainers.
Comment 2 Daniel Molkentin 2017-08-01 11:17:29 UTC
Probably systemd bug 1051465. Can you check if the patch provided there fixes the problem for you?
Comment 3 Andreas Stieger 2017-08-01 11:27:40 UTC
(In reply to Daniel Molkentin from comment #2)
> Probably systemd bug 1051465. Can you check if the patch provided there fixes
> the problem for you?

Reporter does not have access to this package.
Comment 4 Forgotten User lX4JxJ-D8z 2017-08-01 19:14:13 UTC
I have the same problem as well. Applying the two patches from bug 1051465 doesn't solve it, so my system is still unbootable. For me this is the perfect definition of a blocking problem.
Comment 5 Andreas Stieger 2017-08-01 22:29:51 UTC
just documenting the work-around / downgrade:

zypper in --oldpackage ` \
zypper info -t patch --conflicts openSUSE-2017-847 | \
grep " < " | while read NAME C VERSION; do \
rpm --quiet -q --queryformat "%{name}\n" $NAME && echo "${NAME}<${VERSION}"; \
done`

zypper al -t patch openSUSE-2017-847
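To unpack what the one-liner above does: it parses the patch's conflicts list (lines of the form "name < version") into "name<version" arguments for zypper. A self-contained sketch of just that parsing step, using canned sample output instead of a live zypper call (package names here are illustrative, and the rpm installed-check is skipped):

```shell
# Sketch of the parsing step only: canned sample lines stand in for the
# output of "zypper info -t patch --conflicts openSUSE-2017-847".
printf '%s\n' \
  'udev.x86_64 < 228-29.1' \
  'systemd.x86_64 < 228-29.1' |
grep " < " | while read NAME C VERSION; do
  # Turn "name < version" into the "name<version" form zypper expects.
  echo "${NAME}<${VERSION}"
done
```

The real command additionally filters out packages that are not installed via `rpm -q` before building the argument list.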
Comment 6 Forgotten User DjcOQHkgwr 2017-08-03 11:46:07 UTC
In response to the comment in bug 1051465, I don't end up in a dracut shell. Instead, the system is completely stuck. The last messages I see are the following.

[    2.768085] clocksource: Switched to clocksource tsc
[    2.853288] ata4: SATA link down (SStatus 0 SControl 300)
[    3.157443] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    3.159398] ata5.00: ATAPI: TSSTcorp CDDVDW SU-208DB, TF01, max UDMA/100
[    3.162229] ata5.00: configured for UDMA/100
[    3.165837] scsi 4:0:0:0: CD-ROM            TSSTcorp CDDVDW SU-208DB  TF01 PQ: 0 ANSI: 5
[    3.196244] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[    3.196247] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    3.196257] sd 0:0:0:0: [sda] Write Protect is off
[    3.196259] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    3.196274] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    3.199155]  sda: sda1 sda2 sda3
[    3.199416] sd 0:0:0:0: [sda] Attached SCSI disk
[    3.219578] sr 4:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[    3.219581] cdrom: Uniform CD-ROM driver Revision: 3.20

Afterwards, nothing happens.
Comment 7 Daniel Molkentin 2017-08-03 13:23:09 UTC
(In reply to François Valenduc from comment #6)
> In response to comment in bug 1051465 , I don't end up in a dracut shell.
> Instead, the system is completely stuck. The last messages I see are the
> following.
> 
> [    2.768085] clocksource: Switched to clocksource tsc
> [    2.853288] ata4: SATA link down (SStatus 0 SControl 300)
> [    3.157443] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [    3.159398] ata5.00: ATAPI: TSSTcorp CDDVDW SU-208DB, TF01, max UDMA/100
> [    3.162229] ata5.00: configured for UDMA/100
> [    3.165837] scsi 4:0:0:0: CD-ROM            TSSTcorp CDDVDW SU-208DB 
> TF01 PQ: 0 ANSI: 5
> [    3.196244] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00
> TB/932 GiB)
> [    3.196247] sd 0:0:0:0: [sda] 4096-byte physical blocks
> [    3.196257] sd 0:0:0:0: [sda] Write Protect is off
> [    3.196259] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [    3.196274] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
> [    3.199155]  sda: sda1 sda2 sda3
> [    3.199416] sd 0:0:0:0: [sda] Attached SCSI disk
> [    3.219578] sr 4:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer dvd-ram
> cd/rw xa/form2 cdda tray
> [    3.219581] cdrom: Uniform CD-ROM driver Revision: 3.20
> 
> Afterwards, nothing happens.

Please reboot, then in grub remove the "quiet" flag from the grub entry you are trying to boot (by pressing 'e', editing the line starting in 'linux' and then hitting Ctrl+x/F10). This should provide more details.

Also note that sometimes systemd/dracut time out waiting for a particular device. It might be worth to wait, but not more than say 10 minutes.
Comment 8 Forgotten User DjcOQHkgwr 2017-08-03 13:27:26 UTC
Here is the output of cat /proc/cmdline:

BOOT_IMAGE=/boot/vmlinuz-4.12.4 root=UUID=56d42695-8175-4b18-b0e6-1d4891e2f386 ro BOOT_IMAGE=/boot/x86_64/loader/linux ramdisk_size=512000 ramdisk_blocksize=4096 resume=/dev/system/swap

I had already removed the quiet flag; the system has just been rebooted with the problematic update of udev. Let's see what happens in 15 minutes...
Comment 9 Daniel Molkentin 2017-08-03 13:29:46 UTC
Reassigning to systemd maintainers meanwhile, I'll stay on CC.
Comment 10 Forgotten User DjcOQHkgwr 2017-08-03 13:39:11 UTC
After waiting some more time, I got extra info:

A lot of lines with "dracut initqueue starting timeout scripts".
Then "could not boot, /dev/disk/by-uuid/...." not found.
So it seems the symlinks in /dev/disk/by-uuid are not created any more, which is the cause of the problem.
Comment 11 Forgotten User DjcOQHkgwr 2017-08-03 13:45:31 UTC
The problem also occurs with root=/dev/system/opensuse. The error message is then slightly different: it complains that /dev/system/opensuse does not exist.
Comment 12 Thomas Blume 2017-08-03 14:08:08 UTC
(In reply to François Valenduc from comment #10)
> After waiting some time more, I got extra info:
> 
> A lot of lines with "dracut initqueue starting timeout scripts"
> Then "could not boot, /dev/disk/by-uuid/...." not found.
> So it seems these symlinks in /dev/disk/by-uuid" are not created any more,
> which is the cause of the problem.

Hm, normally you should get a shell when the dracut initqueue times out.
Do you get one when you boot with the parameter:

rd.shell
Comment 13 Forgotten User DjcOQHkgwr 2017-08-03 14:50:44 UTC
I don't get a shell even with rd.shell. After the errors about the disks not found, I get this:

"Failed to start dracut-emergency.service: transaction is destructive
Not all disks have been found
You might want to regenerate your initramfs"

But the initramfs images were already rebuilt after the update of udev.
Comment 14 Thomas Blume 2017-08-04 08:16:51 UTC
(In reply to François Valenduc from comment #13)
> I don't get a shell even with rd.shell. After the errors about the disks not
> found, I get this:
> 
> "Failed to start dracut-emergency.service: transaction is destructive
> Not all disks have been found
> You might want to regenerate your initramfs"
> 
> But initramfs images were already rebuild after update of udev.

This is bad, without a shell it will be harder to debug.
Can you boot with the parameter:

rd.break=initqueue

and see whether this gives you a shell?
If so, please attach the output of:

systemctl status systemd-udevd.service
systemctl list-jobs
journalctl -axb

if not, please boot with the additional options:

debug rd.systemd.unit=sysinit.target

and check whether you see something like:

Started udev Kernel Device Manager

if you see an error instead, please attach the error message.
Comment 15 Thomas Blume 2017-08-04 08:38:50 UTC
(In reply to Thomas Blume from comment #14)
 
> debug rd.systemd.unit=sysinit.target

Sorry, there is a mistake, the command should be:

debug rd.systemd.unit=systemd-udevd.service
Comment 16 Forgotten User lX4JxJ-D8z 2017-08-04 18:01:21 UTC
I got a shell with rd.break=initqueue. As suspected, the logical volumes are not present in /dev/mapper, but I can activate them with lvm vgchange -a y. Here is the requested info:

systemctl status systemd-udevd.service gives this:

● system-udevd.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

systemctl list-jobs gives this:

JOB UNIT              TYPE  STATE  
 60 emergency.target  start waiting
 61 emergency.service start running

2 jobs listed.
Comment 17 Forgotten User lX4JxJ-D8z 2017-08-04 18:02:42 UTC
Created attachment 735347 [details]
rdsosreport file
Comment 18 Forgotten User lX4JxJ-D8z 2017-08-04 18:16:23 UTC
Created attachment 735349 [details]
output of journalctl -axb
Comment 19 Forgotten User lX4JxJ-D8z 2017-08-15 09:31:26 UTC
Any news on this annoying bug? Meanwhile, I am forced to use the workaround described in comment #5 to boot my computer.
Comment 20 Forgotten User lX4JxJ-D8z 2017-08-17 19:00:46 UTC
The problem still occurs with udev 228-32.2. I don't think my system is totally out of the ordinary: it has a SATA drive using the ahci driver, and the root partition is on LVM. Does nobody have an idea about my problem yet?
Meanwhile, I continue to stick to an older version of udev so that my system boots.
Comment 21 Andreas Stieger 2017-08-17 21:15:04 UTC
No update, no idea was recorded in this bug.
Comment 22 Forgotten User lX4JxJ-D8z 2017-08-17 21:20:53 UTC
What do you mean? I have provided all the logs and other requested information. Since then, nothing happens.
Is this bug going to stay forever?
Comment 23 Andreas Stieger 2017-08-17 21:33:17 UTC
Yes, you provided the requested information. Nevertheless nobody has solved the bug yet, which is why there was no update in the bug. Yes, some bugs never get solved. If there is an update on this bug, it will be noted here.
Comment 24 Andreas Stieger 2017-08-17 21:34:29 UTC
Reporter or François, please test with udev-228-32.2.x86_64 and the latest systemd updates.
Comment 25 Forgotten User lX4JxJ-D8z 2017-08-17 21:39:23 UTC
As I said in comment #20, the problem still occurs with the latest updates of systemd and udev.
Comment 26 Lee Martin 2017-08-19 23:52:34 UTC
Hi,

I have the same issue on my system with an NVMe drive, and for me at least the cause is the recent changes made in /usr/lib/udev/rules.d/60-persistent-storage.rules, which gets packed into the initrd.

Specifically, I am comparing udev-228-27.2, which worked fine, with udev-228-32.2, which causes my system not to boot in the same way as described in comment #10.

In my case, the nvme entries in the SCSI device rules have disappeared completely.

udev-228-27.2:
KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $tempnode", ENV{ID_BUS}="nvme"
KERNEL=="sd*|sr*|cciss*|nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
KERNEL=="sd*|cciss*|nvme*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"

udev-228-32.2:
KERNEL=="sd*|sr*|cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"


For me, manually adding "nvme*" back into the above lines was a workaround, but of course it needs to be fixed properly so I don't have problems on the next update.
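For anyone who wants to script that temporary edit rather than doing it by hand, something along these lines should work. This is only a sketch against a copy of the two 228-32.2 rule lines quoted above; the exact contents of 60-persistent-storage.rules on your system may differ, so check the result before regenerating the initrd:

```shell
# Sketch: re-add "nvme*" to the two by-id rules in a *copy* of the rules
# text (never edit /usr/lib/udev/rules.d in place without a backup).
rules=$(mktemp)
cat > "$rules" <<'EOF'
KERNEL=="sd*|sr*|cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"
EOF
# Append |nvme* inside the KERNEL match list of both rules.
sed -i 's/^KERNEL=="sd\*|sr\*|cciss\*"/KERNEL=="sd*|sr*|cciss*|nvme*"/; s/^KERNEL=="sd\*|cciss\*"/KERNEL=="sd*|cciss*|nvme*"/' "$rules"
grep -c 'nvme\*' "$rules"   # count of rules that now match nvme devices
rm -f "$rules"
```

After applying the same edit to the real file, the initrd has to be rebuilt (mkinitrd) for it to take effect at boot.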


For others, who maybe do not have NVMe, please note the following other lines which are missing in 228-32.2:

"
# scsi compat links for ATA devices
KERNEL=="sd*[!0-9]", ENV{ID_BUS}=="ata", PROGRAM="scsi_id --whitelisted --replace-whitespace -p0x80 -d $devnode", RESULT=="?*", ENV{ID_SCSI_COMPAT}="$result", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}"
KERNEL=="sd*[0-9]", ENV{ID_SCSI_COMPAT}=="?*", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}-part%n"
 
# scsi compat links for ATA devices (for compatibility with udev < 184)
KERNEL=="sd*[!0-9]", ENV{ID_BUS}=="ata", PROGRAM="scsi_id --truncated-serial --whitelisted --replace-whitespace -p0x80 -d$tempnode", RESULT=="?*", ENV{ID_SCSI_COMPAT_TRUNCATED}="$result", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT_TRUNCATED}"
KERNEL=="sd*[0-9]", ENV{ID_SCSI_COMPAT_TRUNCATED}=="?*", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT_TRUNCATED}-part%n"
# by-path (parent device path, compat version, only for ATA/NVMe/SAS bus)
ENV{DEVTYPE}=="disk", ENV{ID_BUS}=="ata|nvme|scsi", DEVPATH!="*/virtual/*", IMPORT{program}="path_id_compat %p"
ENV{DEVTYPE}=="disk", ENV{ID_PATH_COMPAT}=="?*", SYMLINK+="disk/by-path/$env{ID_PATH_COMPAT}"
ENV{DEVTYPE}=="partition", ENV{ID_PATH_COMPAT}=="?*", SYMLINK+="disk/by-path/$env{ID_PATH_COMPAT}-part%n"
"

I cannot say whether adding these lines will fix other non-NVMe systems, but it's worth a try. I suggest you add those lines to /usr/lib/udev/rules.d/60-persistent-storage.rules and report your findings back to Bugzilla.


@SUSE colleagues: Can you please check my analysis and hopefully fix this?

Thanks
Lee
Comment 27 Forgotten User lX4JxJ-D8z 2017-08-20 05:51:47 UTC
Adding these lines doesn't change anything. In fact, the disk and the partition are detected. The problem is rather that the LVM volumes are not found.
Comment 28 Richard Weinberger 2017-08-20 18:09:34 UTC
I suffer from the same problem. :(
Comment 29 Richard Weinberger 2017-08-20 19:04:23 UTC
(In reply to François Valenduc from comment #27)
> Adding these lines doesn't change anything. In fact, the disk and the
> partition are detected. The problem is rather that the LVM volumes are not
> found.

In my case the difference between working and bad initrd is:
diff -Nur good/usr/lib/udev/rules.d/61-persistent-storage-compat.rules bad/usr/lib/udev/rules.d/61-persistent-storage-compat.rules
--- good/usr/lib/udev/rules.d/61-persistent-storage-compat.rules        2017-08-20 20:58:53.723996905 +0200
+++ bad/usr/lib/udev/rules.d/61-persistent-storage-compat.rules 2017-08-20 20:59:04.775996863 +0200
@@ -37,10 +37,11 @@

 # NVMe links were introduced first via a SUSE specific commit
 # (bsc#944132) and upstream gained support later but of course using a
-# different scheme.
-KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $tempnode", ENV{ID_BUS}="nvme"
-KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
-KERNEL=="nvme*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"
+# different scheme. Also note that ID_SERIAL is already used by the
+# contemporary rules, see bsc#1048679 for details.
+KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_NVME_SERIAL_COMPAT}!="?*", PROGRAM="scsi_id --whitelisted --replace-whitespace -d $devnode", RESULT=="?*", ENV{ID_NVME_SERIAL_COMPAT}="$result"
+KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_NVME_SERIAL_COMPAT}=="?*", SYMLINK+="disk/by-id/nvme-$env{ID_NVME_SERIAL_COMPAT}"
+KERNEL=="nvme*", ENV{DEVTYPE}=="partition", ENV{ID_NVME_SERIAL_COMPAT}=="?*", SYMLINK+="disk/by-id/nvme-$env{ID_NVME_SERIAL_COMPAT}-part%n"

 # SCSI compat links for ATA devices, removed by f6ba1a468cea (boo#769002)
 KERNEL=="sd*[!0-9]", ENV{ID_BUS}=="ata", PROGRAM="scsi_id --whitelisted --replace-whitespace -p0x80 -d $devnode", RESULT=="?*", ENV{ID_SCSI_COMPAT}="$result", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}"


So, Lee might be correct.
Comment 30 Lee Martin 2017-08-20 23:07:26 UTC
(In reply to Lee Martin from comment #26)

Following the proposal in comment #14, I went back to udev-228-32.2, rebuilt the initrd and dropped into dracut at boot, so I did some checking.

dracut:
# ls -l /dev/disk/by-id/
lrwxrwxrwx 1 root 0    15 Aug 20 23:57 -20025385b61502108-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root 0    15 Aug 20 23:57 -20025385b61502108-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root 0    15 Aug 20 23:57 -20025385b61502108-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root 0    13 Aug 20 23:57 nvme-20025385b61502108 -> ../../nvme0n1
lrwxrwxrwx 1 root 0    13 Aug 20 23:57 nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A -> ../../nvme0n1
lrwxrwxrwx 1 root 0    15 Aug 20 23:57 nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root 0    15 Aug 20 23:57 nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root 0    15 Aug 20 23:57 nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root 0    13 Aug 20 23:57 nvme-eui.0025385b61502108 -> ../../nvme0n1
lrwxrwxrwx 1 root 0    15 Aug 20 23:57 nvme-eui.0025385b61502108-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root 0    15 Aug 20 23:57 nvme-eui.0025385b61502108-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root 0    15 Aug 20 23:57 nvme-eui.0025385b61502108-part3 -> ../../nvme0n1p3


My LVM partition is LUKS encrypted, and after upgrading to udev-228-32.2 I never got the LUKS password entry, but instead the error described in this bug.

I only installed my 42.3 a few days ago using the initial udev-228-27.2 from the ISO, which created the following /etc/crypttab:

 # cat /etc/crypttab
cr_-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 /dev/disk/by-id/-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 none       none


Now, if I compare that crypttab to the disk ids I see in dracut, I notice that the device name has changed: all the NVMe devices now have an "nvme" prefix, whereas during install with udev-228-27.2 they apparently did not.

So, now with a corrected crypttab and the standard udev-228-32.2 I'm fine. 

HOWEVER, this indicates to me that this Leap 42.3 udev update is changing many device names at boot versus the ISO installation, so anything dependent on specific device names (like LUKS) is at risk of not working after this update. Therefore a fix of some kind is necessary, since I imagine LUKS setups are relatively common on the desktop.

Regarding LVM on a non-encrypted system, like Francois's, I wondered if LVM maybe has a fixed list of device names somewhere, and maybe the device name changes in dracut are causing LVM some grief. For LVM, I came across /etc/lvm/archive/*.vg, which lists the LVM configuration, and guess what: there is a physical_volumes section which contains specific device names.

Francois, maybe you want to check/list the device names you see in dracut with different versions of udev, and then compare them with the LVM configuration files I mention above. Since my LVM sits on top of LUKS, the physical volume is the same, but assuming your device names at boot in dracut have now changed, that might explain why your VG is not activating automatically.
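To make that comparison concrete, here is a small sketch that extracts the physical-volume device names from an archive file. The file layout below is the usual one in /etc/lvm/archive/*.vg, but treat it as an assumption and adapt the path to your system:

```shell
# Sketch: pull the "device = ..." entries out of an LVM archive file so
# they can be compared against what dracut shows under /dev/disk.
# A canned sample stands in for a real /etc/lvm/archive/*.vg file here.
vg=$(mktemp)
cat > "$vg" <<'EOF'
physical_volumes {
        pv0 {
                device = "/dev/sda3"    # Hint only
        }
}
EOF
sed -n 's/.*device = "\(.*\)".*/\1/p' "$vg"
rm -f "$vg"
```

If the names printed for a good boot differ from what you see in the dracut shell, that mismatch would be worth reporting here.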

Richard, maybe you can also check your LVM config and dracut device names and give feedback.

Hope that helps.
Lee
Comment 31 Thomas Blume 2017-08-21 07:20:23 UTC
(In reply to Richard Weinberger from comment #29)
> (In reply to François Valenduc from comment #27)
> > Adding these lines doesn't change anything. In fact, the disk and the
> > partition are detected. The problem is rather that the LVM volumes are not
> > found.
> 
> In my case the difference between working and bad initrd is:
> diff -Nur good/usr/lib/udev/rules.d/61-persistent-storage-compat.rules
> bad/usr/lib/udev/rules.d/61-persistent-storage-compat.rules
> --- good/usr/lib/udev/rules.d/61-persistent-storage-compat.rules       
> 2017-08-20 20:58:53.723996905 +0200
> +++ bad/usr/lib/udev/rules.d/61-persistent-storage-compat.rules 2017-08-20
> 20:59:04.775996863 +0200
> @@ -37,10 +37,11 @@
> 
>  # NVMe links were introduced first via a SUSE specific commit
>  # (bsc#944132) and upstream gained support later but of course using a
> -# different scheme.
> -KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*",
> IMPORT{program}="scsi_id --export --whitelisted -d $tempnode",
> ENV{ID_BUS}="nvme"
> -KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*",
> SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
> -KERNEL=="nvme*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*",
> SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"
> +# different scheme. Also note that ID_SERIAL is already used by the
> +# contemporary rules, see bsc#1048679 for details.
> +KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_NVME_SERIAL_COMPAT}!="?*",
> PROGRAM="scsi_id --whitelisted --replace-whitespace -d $devnode",
> RESULT=="?*", ENV{ID_NVME_SERIAL_COMPAT}="$result"
> +KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_NVME_SERIAL_COMPAT}=="?*",
> SYMLINK+="disk/by-id/nvme-$env{ID_NVME_SERIAL_COMPAT}"
> +KERNEL=="nvme*", ENV{DEVTYPE}=="partition",
> ENV{ID_NVME_SERIAL_COMPAT}=="?*",
> SYMLINK+="disk/by-id/nvme-$env{ID_NVME_SERIAL_COMPAT}-part%n"
> 
>  # SCSI compat links for ATA devices, removed by f6ba1a468cea (boo#769002)
>  KERNEL=="sd*[!0-9]", ENV{ID_BUS}=="ata", PROGRAM="scsi_id --whitelisted
> --replace-whitespace -p0x80 -d $devnode", RESULT=="?*",
> ENV{ID_SCSI_COMPAT}="$result",
> SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}"
> 
> 
> So, Lee might be correct.

Sorry for the delay, I'm back from vacation and will continue processing.

Can you please test whether the packages from bug 1051465 comment#9 fix it?
Please note that the fix is only for nvme disks.
Comment 32 Thomas Blume 2017-08-21 07:43:26 UTC
(In reply to Lee Martin from comment #30)

> I only installed my 42.3 a few days ago using the initial udev-228-27.2 from
> the ISO, which created the following /etc/crypttab:
> 
>  # cat /etc/crypttab
> cr_-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3
> /dev/disk/by-id/-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 none      
> none

That is actually bug 1048679.

> So, now with a corrected crypttab and the standard udev-228-32.2 I'm fine. 
> 
> HOWEVER this indicates to me that this Leap 42.3 update around udev is
> changing many device names at boot versus the ISO installation, so anything
> dependent on specific device names (like LUKS) is at risk of not working
> after this update. Therefore a fix of somekind is necessary since imagine
> LUKs setups to be relatively common on the desktop.

Hm, the only fix I can imagine via an update is to re-add the broken symlinks.
The consequence would be carrying something broken for quite a long time.
I'd prefer that we document this behaviour and provide a Driver Update for the 42.3 installation system instead.

Steffen, would this be feasible?
Comment 33 Richard Weinberger 2017-08-21 11:37:43 UTC
(In reply to Thomas Blume from comment #31)
> > 
> > So, Lee might be correct.
> 
> Sorry for the delay, I'm back from vacation and will continue processing.
> 
> Can you please test whether the packages from bug 1051465 comment#9 fix it?
> Please note that the fix is only for nvme disks.

Sure, since I have a NVMe disk they might help.
So, installing these packages followed by a reboot should work?
Or is there some other action needed?
Comment 34 Thomas Blume 2017-08-21 12:37:56 UTC
(In reply to Richard Weinberger from comment #33)
> (In reply to Thomas Blume from comment #31)
> > > 
> > > So, Lee might be correct.
> > 
> > Sorry for the delay, I'm back from vacation and will continue processing.
> > 
> > Can you please test whether the packages from bug 1051465 comment#9 fix it?
> > Please note that the fix is only for nvme disks.
> 
> Sure, since I have a NVMe disk they might help.
> So, installing these packages followed by a reboot should work?
> Or is there some other action needed?

It won't help if your system is using the broken symlinks from the installation system somewhere.
Hence, please check that /etc/fstab, /etc/crypttab, and the filter setting in /etc/lvm/lvm.conf contain no device names with a dash as the first character.

For example, if you have something like Lee in /etc/crypttab:

cr_-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3
/dev/disk/by-id/-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 none 

you will need to add 'nvme' before the dash so that it looks like:

cr_nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3
/dev/disk/by-id/nvme-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 none

After changing any of the above files, please run mkinitrd.
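The rename can also be done mechanically. A hedged sketch, operating on a throwaway copy rather than the real /etc/crypttab (your entries may of course look different from Lee's, so inspect the result before rebooting):

```shell
# Sketch: prefix bare-dash by-id names with "nvme" in a copy of crypttab.
# After applying the same change to the real files, run mkinitrd.
ct=$(mktemp)
cat > "$ct" <<'EOF'
cr_-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 /dev/disk/by-id/-Samsung_SSD_960_PRO_2TB_S3EXNCAHB01257A-part3 none none
EOF
sed -i 's|cr_-|cr_nvme-|g; s|/dev/disk/by-id/-|/dev/disk/by-id/nvme-|g' "$ct"
cat "$ct"   # both the mapping name and the device path now carry the nvme prefix
rm -f "$ct"
```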
Comment 35 Forgotten User lX4JxJ-D8z 2017-08-22 18:20:58 UTC
Created attachment 737849 [details]
output of ls -laR /dev/disk with a bad initrd

In response to comment #30, here is the output of ls -laR /dev/disk with a bad initrd obtained with rd.break=initqueue.
Comment 36 Forgotten User lX4JxJ-D8z 2017-08-22 18:22:33 UTC
Created attachment 737850 [details]
output of ls -laR /dev/disk with a good initrd

Here is the output of ls -laR /dev/disk with a good initrd, after the system has booted. Could somebody explain to me how to get a dracut shell? With rd.shell, the boot process doesn't end up in a shell.
Comment 37 Thomas Blume 2017-08-23 11:32:00 UTC
(In reply to François Valenduc from comment #36)
> Created attachment 737850 [details]
> output of ls -laR /dev/disk with a good initrd
> 
> Here is the output of ls -laR /dev/disk with a good initrd after the system
> has booted. If somebody can explains me how to get a dracut shell, because
> with rd.shell, the boot process doesn't end up in a shell.

The problem is that dracut sees multiple root= and resume= boot parameters:

-->
Aug 04 20:09:14 pc-francois dracut-cmdline[704]: Using kernel command line parameters: rd.lvm.lv=system/swap rd.lvm.lv=system/opensuse resume=/dev/mapper/system-swap resume=/dev/mapper/syste
m-swap root=/dev/mapper/system-opensuse rootfstype=ext4 rootflags=rw,noatime,data=ordered BOOT_IMAGE=/boot/vmlinuz-4.12.4 root=UUID=56d42695-8175-4b18-b0e6-1d4891e2f386 ro BOOT_IMAGE=/boot/x
86_64/loader/linux ramdisk_size=512000 ramdisk_blocksize=4096 resume=/dev/system/swap rd.break=initqueue
--<

Please reboot your machine and when you see the bootloader screen go into the grub2 editor.
From there, remove the entries:

root=UUID=56d42695-8175-4b18-b0e6-1d4891e2f386 
ro
resume=/dev/system/swap

and then boot your machine.
Does it come up?
If so, please edit /etc/default/grub and remove the problematic entries above.
Afterwards run:

grub2-mkconfig -o /boot/grub2/grub.cfg

in order to update your boot configuration.
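The edit to /etc/default/grub can be sketched as follows. The GRUB_CMDLINE_LINUX_DEFAULT value below is assumed for illustration, not taken from the reporter's machine, and the script works on a temp copy:

```shell
# Sketch: strip the stray root=/ro/resume= entries from a copy of
# /etc/default/grub; afterwards grub2-mkconfig regenerates grub.cfg.
g=$(mktemp)
cat > "$g" <<'EOF'
GRUB_CMDLINE_LINUX_DEFAULT="root=UUID=56d42695-8175-4b18-b0e6-1d4891e2f386 ro resume=/dev/system/swap quiet"
EOF
sed -i -E 's/root=UUID=[0-9a-f-]+ ?//; s/\bro ?//; s|resume=/dev/system/swap ?||' "$g"
cat "$g"
rm -f "$g"
```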
Comment 38 Forgotten User DjcOQHkgwr 2017-08-23 11:48:41 UTC
So, I don't need to indicate the root device?
And resume is no longer allowed?
Comment 39 Forgotten User DjcOQHkgwr 2017-08-23 11:57:08 UTC
The system still doesn't boot without the root and resume parameters. What I also find strange is that if I run lvm vgchange -a y in the shell I get with rd.break=initqueue, all the LVM volumes are found. So why aren't they detected in a normal boot?
Comment 40 Thomas Blume 2017-08-23 11:59:35 UTC
(In reply to François Valenduc from comment #38)
> So, I don't need to indicate the root device ? 

Yes, dracut is able to automatically determine the root device and will write the settings into: etc/cmdline.d/95root-dev.conf in the initrd.

> And resume is not more allowed ?

Sure it is, but the resume parameter should only be given once.
Comment 41 Forgotten User DjcOQHkgwr 2017-08-23 12:04:48 UTC
Maybe it was a bad copy-paste from me, but the resume and root parameters were given only once. And the problem continues.
Comment 42 Thomas Blume 2017-08-23 12:13:11 UTC
(In reply to François Valenduc from comment #39)
> The system still doesn't boot whitout root and resume parameters. What I
> also find strange is that if I run lvm vgchange -a y in the shell I get with
> rd.break=initqueue, all the LVM volumes are found. So why aren't they
> detected in a normal boot ?

lvm activation is done in the initqueue.
If you break before dracut-initqueue is finished, it is normal that lvm is not active.

You can try:

rd.break=pre-mount

instead, which hopefully gives you a dracut shell too.
If so, please provide a new rdsosreport.
Comment 43 Forgotten User DjcOQHkgwr 2017-08-23 12:55:52 UTC
Unfortunately, there is no shell with rd.break=pre-mount.
Comment 44 Thomas Blume 2017-08-23 13:07:19 UTC
(In reply to François Valenduc from comment #43)
> unfortunately, there is a no shell with rd.break=pre-mount

Then, I guess the only chance to get more info is to start with the additional boot parameter:

rd.debug

and to capture the boot log (e.g. via serial console).
Can you please try to do so and attach the log?
Comment 45 Forgotten User DjcOQHkgwr 2017-08-23 13:32:03 UTC
How can I use a serial console? My computer is way too recent to have a serial port...
With rd.debug, I can see that it repeatedly tries to find the root partition in the initqueue, without finding it.
Is there a git tree of udev or systemd in openSUSE? Then I could use git bisect to try to find the problematic change.
Comment 46 Thomas Blume 2017-08-23 13:56:55 UTC
(In reply to François Valenduc from comment #45)
> How can I use a serial console ? My computer is way to recent to have a
> serial port...
> With rd.debug, I can see that it repeatedly tries to find the root partition
> in the initqueue, without finding it. 

And is the pv (sda3) already present then?
Does it look for the logical volume name or the UUID or both?
Can you see any hint that the logical volumes get activated?

> Is there a git tree of udev of systemd in opensuse ? Then I could use git
> bisect to try to find the problematic change.

I don't think this is a bug in systemd. It rather looks like a setup problem in dracut.
But sure, you are welcome to look into the code.
You can find the git at:

https://github.com/openSUSE/systemd.git

udev is part of the systemd sources.
Comment 47 Forgotten User DjcOQHkgwr 2017-08-23 14:06:44 UTC
The LVM volumes are on sda3, which is detected. It repeatedly tries to find /dev/mapper/system-opensuse, but in the end it complains that it doesn't find /dev/system/opensuse.
Comment 48 Forgotten User DjcOQHkgwr 2017-08-23 14:20:08 UTC
The problem is indeed in dracut and not udev or systemd. If I revert to the older packages as explained in comment #5 and then update everything (thus systemd and udev) except dracut, it works without problems.
Comment 49 Forgotten User DjcOQHkgwr 2017-08-23 14:39:14 UTC
I was a bit too fast. If I lock dracut, udev stays at the older version too.
Comment 50 Thomas Blume 2017-08-24 06:21:40 UTC
(In reply to François Valenduc from comment #49)
> I was a bit too fast. If I lock dracut, udev stays at the older version too.

Can you please attach the output of:

lsinitrd -f etc/udev/rules.d/64-lvm.rules /boot/$YOUR_INITRD

where $YOUR_INITRD is the initrd that fails to boot?
Comment 51 Forgotten User lX4JxJ-D8z 2017-08-24 18:02:28 UTC
Here is the requested info:

# hacky rules to try to activate lvm when we get new block devs...
#
# Copyright 2008, Red Hat, Inc.
# Jeremy Katz <katzj@redhat.com>


SUBSYSTEM!="block", GOTO="lvm_end"
ACTION!="add|change", GOTO="lvm_end"
# Also don't process disks that are slated to be a multipath device
ENV{DM_MULTIPATH_DEVICE_PATH}=="?*", GOTO="lvm_end"
KERNEL=="dm-[0-9]*", ACTION=="add", GOTO="lvm_end"
ENV{ID_FS_TYPE}!="LVM?_member", GOTO="lvm_end"

PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/dm-[0-9]*; do [ -e $$i ] && exit 0; done; exit 1;' ", \
    GOTO="lvm_end"

RUN+="/sbin/initqueue --settled --onetime --unique /sbin/lvm_scan"
RUN+="/sbin/initqueue --timeout --name 51-lvm_scan --onetime --unique /sbin/lvm_scan --partial"
RUN+="/bin/sh -c '>/tmp/.lvm_scan-%k;'"

LABEL="lvm_end"
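[Editor's note: the PROGRAM line in the rule above succeeds (exit 0), and thereby skips queuing lvm_scan, only when the block device already has a device-mapper holder under its sysfs path. A minimal sketch of that holders check, run against a scratch directory instead of a real /sys devpath (the directory layout here is an assumption mirroring /sys/<devpath>/holders/):]

```shell
# Sketch of the holders check from 64-lvm.rules, against a scratch dir.
holders_check() {
    # Succeed (exit 0) if any dm-N holder entry exists, else fail.
    for i in "$1"/holders/dm-[0-9]*; do
        [ -e "$i" ] && return 0    # a dm-N holder exists: LV already active
    done
    return 1                       # no holder yet: lvm_scan should be queued
}

dev=$(mktemp -d)
mkdir -p "$dev/holders"

holders_check "$dev" || echo "no holder -> queue lvm_scan"

touch "$dev/holders/dm-0"
holders_check "$dev" && echo "holder present -> skip"

rm -rf "$dev"
```

So the rule only ever queues lvm_scan for a PV whose logical volumes are not yet active; if udev misidentifies the PV earlier in the rule chain, the GOTO on ID_FS_TYPE skips all of this.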
Comment 52 Forgotten User lX4JxJ-D8z 2017-08-24 18:14:41 UTC
Here is the output with a working initrd. To me, it is exactly the same:

# hacky rules to try to activate lvm when we get new block devs...
#
# Copyright 2008, Red Hat, Inc.
# Jeremy Katz <katzj@redhat.com>


SUBSYSTEM!="block", GOTO="lvm_end"
ACTION!="add|change", GOTO="lvm_end"
# Also don't process disks that are slated to be a multipath device
ENV{DM_MULTIPATH_DEVICE_PATH}=="?*", GOTO="lvm_end"
KERNEL=="dm-[0-9]*", ACTION=="add", GOTO="lvm_end"
ENV{ID_FS_TYPE}!="LVM?_member", GOTO="lvm_end"

PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/dm-[0-9]*; do [ -e $$i ] && exit 0; done; exit 1;' ", \
    GOTO="lvm_end"

RUN+="/sbin/initqueue --settled --onetime --unique /sbin/lvm_scan"
RUN+="/sbin/initqueue --timeout --name 51-lvm_scan --onetime --unique /sbin/lvm_scan --partial"
RUN+="/bin/sh -c '>/tmp/.lvm_scan-%k;'"

LABEL="lvm_end"
Comment 53 Thomas Blume 2017-08-25 09:03:37 UTC
(In reply to François Valenduc from comment #52)
> Here is the output with a working initrd. To me, it is exactly the same;
> 
> # hacky rules to try to activate lvm when we get new block devs...
> #
> # Copyright 2008, Red Hat, Inc.
> # Jeremy Katz <katzj@redhat.com>
> 
> 
> SUBSYSTEM!="block", GOTO="lvm_end"
> ACTION!="add|change", GOTO="lvm_end"
> # Also don't process disks that are slated to be a multipath device
> ENV{DM_MULTIPATH_DEVICE_PATH}=="?*", GOTO="lvm_end"
> KERNEL=="dm-[0-9]*", ACTION=="add", GOTO="lvm_end"
> ENV{ID_FS_TYPE}!="LVM?_member", GOTO="lvm_end"
> 
> PROGRAM=="/bin/sh -c 'for i in $sys/$devpath/holders/dm-[0-9]*; do [ -e $$i
> ] && exit 0; done; exit 1;' ", \
>     GOTO="lvm_end"
> 
> RUN+="/sbin/initqueue --settled --onetime --unique /sbin/lvm_scan"
> RUN+="/sbin/initqueue --timeout --name 51-lvm_scan --onetime --unique
> /sbin/lvm_scan --partial"
> RUN+="/bin/sh -c '>/tmp/.lvm_scan-%k;'"
> 
> LABEL="lvm_end"

Ok, so the udev rule for activating the lvm device is there.
Still rdsosreport shows that it doesn't get activated:

-->
+ lvm vgdisplay
  --- Volume group ---
  VG Name               system
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2933
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                15
  Open LV               0
^^^^^^^^^^^^^^^^^^^^^^^^^^^
--<

Maybe there is an error when the rule is executed.
Please go again to the dracut shell and run:

udevadm test /block/sda

and attach the output.
The output of:

udevadm info -e

would also be helpful.
Comment 54 Forgotten User lX4JxJ-D8z 2017-08-25 18:27:04 UTC
Created attachment 738360 [details]
output of udevadm test /block/sda
Comment 55 Forgotten User lX4JxJ-D8z 2017-08-25 18:27:38 UTC
Created attachment 738361 [details]
output of udevadm info -e
Comment 56 Thomas Blume 2017-08-28 08:38:44 UTC
(In reply to François Valenduc from comment #55)
> Created attachment 738361 [details]
> output of udevadm info -e

The pv of your lvm device is misidentified by udev:

-->
P: /devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda3
N: sda3
[...]
E: ID_FS_TYPE=iso9660
[...]
E: ID_FS_VERSION=Joliet Extension
E: ID_MODEL=ST1000LM014-1EJ1
--<

Normally for an lvm pv, it should look like this:

-->
E: ID_FS_TYPE=LVM2_member
[...]
E: ID_FS_VERSION=LVM2 001
E: ID_MODEL=LVM PV GplkpP-Ovcs-w2SQ-H31f-hOFe-ztuO-SlLCGH on /dev/sda2
--<

This is similar to bug 1046268.
Can you please test whether the workaround from bug 1046268 comment#29 fixes it?
Comment 57 Forgotten User lX4JxJ-D8z 2017-08-28 18:05:37 UTC
Indeed, commenting out ENV{DEVTYPE}=="partition", IMPORT{parent}="ID_*" in /usr/lib/udev/rules.d/61-persistent-storage-compat.rules and regenerating the initramfs solves the problem.
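[Editor's note: for anyone scripting this workaround, a minimal sketch below operates on a scratch copy of the rule; the rule text is taken verbatim from the comment above. On a real system you would apply the same sed to /usr/lib/udev/rules.d/61-persistent-storage-compat.rules (after backing it up) and then regenerate the initramfs with dracut -f:]

```shell
# Comment out the IMPORT{parent} rule, here in a scratch file.
rules=$(mktemp)
cat > "$rules" <<'EOF'
ENV{DEVTYPE}=="partition", IMPORT{parent}="ID_*"
EOF

# Prefix the offending line with '#' so udev ignores it:
sed -i 's/^ENV{DEVTYPE}=="partition", IMPORT{parent}="ID_\*"$/#&/' "$rules"

grep -c '^#ENV{DEVTYPE}' "$rules"    # 1: the rule is now commented out
rm -f "$rules"
```

Note that package updates may silently restore the stock rules file, so the edit (and the dracut -f rebuild) would need to be repeated until a fixed udev lands.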
Comment 58 Thomas Blume 2017-09-05 12:38:05 UTC
The correction of the rules for NVMe devices is handled in bug 1051465.
Closing this one as duplicate.

*** This bug has been marked as a duplicate of bug 1051465 ***
Comment 59 Forgotten User DjcOQHkgwr 2017-09-05 12:41:07 UTC
In my case, this problem has nothing to do with rules for NVMe devices; I have a SATA disk.
Comment 60 Thomas Blume 2017-09-05 13:50:55 UTC
(In reply to François Valenduc from comment #59)
> In my case, this problem has nothing to do with rules for NVMe devices, I
> have a SATA disk.

Yes, your case is not covered within this bug.
Yours is a duplicate of bug 1046268, see comment#56.

However, I've referenced the wrong duplicate for the NVMe issue, sorry.
The right one is: 1048679

*** This bug has been marked as a duplicate of bug 1048679 ***
Comment 61 Forgotten User lX4JxJ-D8z 2017-09-07 18:36:24 UTC
The problem is indeed solved with the latest version of udev and systemd (228-35.1) available in openSUSE.