Bug 1063249 - Encrypted LVM password no longer requested after systemd update --> boot fails completely
Status: RESOLVED FIXED
Duplicates: 1060156 1060226
Classification: openSUSE
Product: openSUSE Distribution
Component: Basesystem
Version: Leap 42.3
Hardware: x86-64 openSUSE 42.3
Priority: P5 - None
Severity: Critical (5 votes)
Assigned To: systemd maintainers
Reported: 2017-10-13 12:51 UTC by Stefan Dirsch
Modified: 2018-01-09 20:19 UTC (History)

Attachments
lvm-encryption-timeout.jpg (97.43 KB, image/jpeg), 2017-10-13 12:54 UTC, Stefan Dirsch
rdsosreport.txt (84.82 KB, text/plain), 2017-10-13 12:55 UTC, Stefan Dirsch
journalctl.log (82.98 KB, text/plain), 2017-10-13 12:55 UTC, Stefan Dirsch
Photo (124.80 KB, image/jpeg), 2017-10-20 14:46 UTC, Stefan Dirsch
rdsosreport.txt (85.55 KB, text/plain), 2017-10-20 14:51 UTC, Stefan Dirsch
journalctl.log (945.40 KB, text/plain), 2017-10-20 15:13 UTC, Stefan Dirsch
/etc/crypttab (125 bytes, text/plain), 2017-10-23 10:29 UTC, Stefan Dirsch
udevadm info /dev/nvme0n1p2 (1.53 KB, text/plain), 2017-10-23 10:29 UTC, Stefan Dirsch
journalctl.log.gz (182.53 KB, application/octet-stream), 2017-10-23 13:04 UTC, Stefan Dirsch
udevadm_test_block_nvme0n1_nvme0n1p2.txt (9.31 KB, text/plain), 2017-10-23 14:55 UTC, Stefan Dirsch
udevadm_info_e.txt (152.11 KB, text/plain), 2017-10-23 14:57 UTC, Stefan Dirsch

Description Stefan Dirsch 2017-10-13 12:51:57 UTC
After updating systemd, the system no longer asks for the LVM encryption password during boot. Instead it runs into a timeout and drops to an emergency shell. :-(

I made sure that neither a kernel update nor a dracut update is responsible for this. Both updates regenerate the initrds, and I rebooted successfully afterwards.

Still working systemd was:
(latest changelog entry)
  * Fri Jun 23 2017 fbui@suse.com
     - Import commit 6c14b00040edfe5b60aadb6a195955990e8a423f
       e2026f234 core:execute: fix fork() fail handling in exec_spawn()
       (bsc#1040258)

The broken systemd update, which also regenerated initrds is:
(added changelog entries since the last known working systemd version, see above)

* Thu Aug 31 2017 fbui@suse.com
- Import commit 533ab326a0d1826c14a87a765087b926e944c867
  289949a42 device: make sure to remove all device units sharing the same sysfs path (#6679)
  3b81de623 coredumpctl: fix handling of files written to fd
  ef76ac9da udev/path_id: introduce support for NVMe devices (#4169) (bsc#1045987)
  341e240ce core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification (v228) (bsc#1045384 bsc#1047379)

* Wed Aug 30 2017 fbui@suse.com
- Add 0001-Revert-core-device-Use-JobRunningTimeoutSec-for-devi.patch (bsc#1048605)
  It's a temporary but urgent fix for a regression discovered in bug
  1048605. The fix is still under discussion with upstream but we need
  to make progress here and limit the number of affected users.
  Consequently this fix reintroduces bsc#1004995 (the bug report has
  been re-opened) but this one is far less critical and a workaround
  was provided.
  The final solution will fix both bugs.

* Wed Aug 30 2017 fbui@suse.com
- Import commit 9a04d42dd9e2f9035f79952b2d173a7b3af7fb2f
  7a4935268 compat-rules: drop the boggus 'import everything' rule (bsc#1046268)

* Wed Jul 26 2017 fbui@suse.com
- Import commit 506ef1c91d97cfa4c1e321f57dbf71c7fc42d422
  8ea065d44 compat-rules: don't rely on ID_SERIAL when generating 'by-id' symlinks for NVMe devices (bsc#1048679)
  ecc54d349 timesyncd: don't use compiled-in list if FallbackNTP has been configured explicitly
  1142bd715 fstab-generator: fix new NULL dereference. (#6296)
  a2c8f9032 basic/strv: add STRPTR_IN_SET
  3a80fbf4a fstab-generator: handle NFS "bg" mounts correctly. (#6103) (bnc#874665 fate#323464)
  3a09ebb0b Revert "fstab-generator: add support for the nfs mount option bg"
  946e3c60c fstab-generator: add x-systemd.mount-timeout (#4603)

* Tue Jul 11 2017 fbui@suse.com
- Make sure dracut (if installed) will embed the new compat rule
  The new compat rules (as well as the compat generation number) must
  be embedded in the initramfs so make sure that the installed dracut
  supports it.

* Wed Jul 05 2017 fbui@suse.com
- Add minimal support for boot.d/* scripts in systemd-sysv-convert (boo#1046750)

* Mon Jul 03 2017 fbui@suse.com
- Import commit 642a5846a465085dc0af184e38a79e1be25080bd
  Here is a special import: the udev rules kept for generating some
  old/deprecated persistent symlinks have been moved into a separate
  rule file rules/61-persistent-storage-compat.rules. This has been
  done mainly to prevent generating them for new installations.
  642a5846a rules: move the rules dealing with SCSI truncated serials for ATA device to the compat persistent storage file
  2c4fe971f Revert "udev add path_compat_id to provide backwards compatibility with SLE11"
  63da94fcc Revert "udev: rules persistent device names for NVMe devices"
  d420489d8 Revert "udev: re-enable creation of by-id scsi links for ATA devices"
  eae935ef8 Revert "udev: add old fashion phy SAS disk enumeration"
  e010132ed automount: don't lstat(2) upon umount request (#6086) (bsc#1040968)
  d6dbfd264 udev: re-add back SAS addr by-path symlinks (bsc#1040153)
  0861598de udev: move compat rules in a dedicated rule file
  59e43084b udev: add old fashion phy SAS disk enumeration
  40e6c18e5 udev: re-enable creation of by-id scsi links for ATA devices
  7026e63a0 udev: rules persistent device names for NVMe devices
  70441d4fb udev add path_compat_id to provide backwards compatibility with SLE11

* Mon Jul 03 2017 fbui@suse.com
- Import commit 3c369f1c4a4931c3bd807413859fb1967d269e12
  3bf83e8bf resolved: simplify alloc size calculation (bsc#1045290 CVE-2017-9445)
  bd7b84227 build-sys: add check for gperf lookup function signature (#5055)


I will attach a screenshot, journalctl log and rdsosreport.txt
Comment 1 Stefan Dirsch 2017-10-13 12:54:09 UTC
Created attachment 744336 [details]
lvm-encryption-timeout.jpg
Comment 2 Stefan Dirsch 2017-10-13 12:55:15 UTC
Created attachment 744337 [details]
rdsosreport.txt
Comment 3 Stefan Dirsch 2017-10-13 12:55:52 UTC
Created attachment 744338 [details]
journalctl.log
Comment 4 Andreas Stieger 2017-10-13 14:14:14 UTC
reported as working: systemd-228-32.2, openSUSE:Leap:42.3:Update/systemd.7140
reported as broken:  systemd-228-35.1, openSUSE:Leap:42.3:Update/systemd.7229
source diff: osc rdiff openSUSE:Leap:42.3:Update/systemd.{7140,7229}

Bug list of this update:

bug 1045384: systemd high number of scopes, 2359, with 15 shown in systemctl status - possible scope leak
bug 1046268: USB Keys are mis-identified by device notifier
bug 1048605: regression in initrd dracut or systemd causing boot failure - timeout detecting LVM - suspect systemd / udev MU
bug 1047379: systemd-run Create a Large Number of run-.scope Directories
bug 1045987: [Intel SLES12SP3 Bug] RAID auto-rebuild does not work for NVMe bare-spare

This came from SUSE:Maintenance:5535
Comment 5 Stefan Dirsch 2017-10-16 09:03:48 UTC
(In reply to Andreas Stieger from comment #4)
> reported as working: systemd-228-32.2, openSUSE:Leap:42.3:Update/systemd.7140
> reported as broken:  systemd-228-35.1, openSUSE:Leap:42.3:Update/systemd.7229
> source diff: osc rdiff openSUSE:Leap:42.3:Update/systemd.{7140,7229}
> 
> Bug list of this update:
> 
> bug 1045384: systemd high number of scopes, 2359, with 15 shown in systemctl
> status - possible scope leak
> bug 1046268: USB Keys are mis-identified by device notifier
> bug 1048605: regression in initrd dracut or systemd causing boot failure -
> timeout detecting LVM - suspect systemd / udev MU
> bug 1047379: systemd-run Create a Large Number of run-.scope Directories
> bug 1045987: [Intel SLES12SP3 Bug] RAID auto-rebuild does not work for NVMe
> bare-spare
> 
> This came from SUSE:Maintenance:5535

Andreas, what is this? Some automatic comment written in your name or what? It isn't really helpful for me as reporter I'm afraid.
Comment 6 Andreas Stieger 2017-10-16 10:09:57 UTC
(In reply to Stefan Dirsch from comment #5)
> Andreas, what is this? Some automatic comment written in your name or what?
> It isn't really helpful for me as reporter I'm afraid.

The comment is manual and augments your initial report. You mentioned changelog dates, and I took the time to match that with the binary and source versions. Also since you reported a regression, I noted the source diff that produced the update, along with a list of bugs that are tracked for it.
Comment 7 Stefan Dirsch 2017-10-17 13:23:16 UTC
Ok. Thanks.
Comment 8 Franck Bui 2017-10-20 09:19:24 UTC
Hi Stefan,

(In reply to Stefan Dirsch from comment #0)
> After updating systemd, the system no longer asks for the LVM encryption
> password during boot. Instead it runs in a timeout and an emergency shell.
> :-(
> 
> I made sure, that neither a kernel update nor a dracut update is responsible
> for this. Both updates regenerate the initrds and I rebooted afterwards
> successfully.

Hm the update of systemd came with the following change:

  * Tue Jul 11 2017 fbui@suse.com
  - Make sure dracut (if installed) will embed the new compat rule
    The new compat rules (as well as the compat generation number) must
    be embedded in the initramfs so make sure that the installed dracut
    supports it.

This is implemented with the following conflict directive in the spec file:

  Conflicts:      dracut < 044.1

So you're supposed to update dracut to its latest version as well, which is dracut-044.1-26.1.x86_64 but according to your logs your version is only at 044-26.1 ...

How is this possible ?
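
The mismatch described above can be checked directly on an installed system. The following is only a minimal sketch: the helper name version_lt is made up for illustration, it assumes GNU "sort -V" is available, and it compares only the version part (not the RPM release), which approximates but does not replace RPM's own version comparison.

```shell
#!/bin/sh
# version_lt A B: succeed if version A sorts strictly before version B.
# Assumption: GNU `sort -V` is close enough to RPM ordering for simple
# dotted versions such as 044 and 044.1; it is not a full rpmvercmp.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Minimum dracut version required by the Conflicts: directive quoted above.
required="044.1"

# Query the installed version only if rpm and the dracut package are present.
if command -v rpm >/dev/null 2>&1 && rpm -q dracut >/dev/null 2>&1; then
    installed=$(rpm -q --queryformat '%{VERSION}' dracut)
    if version_lt "$installed" "$required"; then
        echo "dracut $installed is older than $required"
    else
        echo "dracut $installed satisfies >= $required"
    fi
fi
```

Note that the initrd may still have been built by an older dracut even when the installed package is new enough, so this check alone does not prove the initrd is current.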
Comment 9 Stefan Dirsch 2017-10-20 14:42:22 UTC
This time I made sure that dracut is on 044.1 after online update. Still the same issue.

Maybe last time I tried the online update, dracut updates came in the wrong order. I think there were two systemd update patterns (at least one also containing dracut) and one extra dracut update pattern (containing only the dracut package).

This time I just selected all online updates.
Comment 10 Stefan Dirsch 2017-10-20 14:46:18 UTC
Created attachment 745343 [details]
Photo
Comment 11 Stefan Dirsch 2017-10-20 14:51:33 UTC
Created attachment 745344 [details]
rdsosreport.txt
Comment 12 Stefan Dirsch 2017-10-20 14:56:19 UTC
(In reply to Stefan Dirsch from comment #11)
> Created attachment 745344 [details]
> rdsosreport.txt

Again this looks like 044-26.1 instead of 044.1-26.1, but I double checked after the online update that dracut is on 044.1 level with latest changelog entry

-------------------------------------------------------------------
Tue Aug 29 13:46:38 UTC 2017 - daniel.molkentin@suse.com

- Don't detect crc32.ko as built-in (bsc#1054538)
 * adds 0537-dracut-init.sh-ignore-crc32.ko-in-builtin-test.patch

- Enable systemd-based core dumps for initrd (bsc#1054809)
 * adds 0538-Enable-core-dumps-with-systemd-from-initrd.patch

Also I even manually ran mkinitrd afterwards.
Comment 13 Franck Bui 2017-10-20 15:01:32 UTC
could you try to run with the debug logs enabled and show the content of the journal ?

To do so, append "debug printk.devkmsg=on" to the kernel command line.

Thanks.
Comment 14 Stefan Dirsch 2017-10-20 15:13:08 UTC
Created attachment 745346 [details]
journalctl.log
Comment 15 Franck Bui 2017-10-20 15:22:04 UTC
sorry for asking again but could you show the same debug logs but with a working systemd ?

Thanks !
Comment 16 Stefan Dirsch 2017-10-21 10:44:07 UTC
(In reply to Franck Bui from comment #15)
> sorry for asking again but could you show the same debug logs but with a
> working systemd ?

Sure, can do this on monday. Be aware, that this means reinstalling the system. So always takes some time. Actually, this is the bugreport (+ the data loss you have, when you've observed this bug the first time). ;-)
Comment 17 Franck Bui 2017-10-23 06:18:00 UTC
Ok, maybe before re-installing, could you show:

 - the content of /etc/crypttab

 - the output of "udevadm info /dev/nvme0n1p2" from within the emergency shell ?

Thanks.
Comment 18 Stefan Dirsch 2017-10-23 10:29:03 UTC
Created attachment 745451 [details]
/etc/crypttab
Comment 19 Stefan Dirsch 2017-10-23 10:29:49 UTC
Created attachment 745455 [details]
udevadm info /dev/nvme0n1p2
Comment 20 Stefan Dirsch 2017-10-23 13:04:58 UTC
Created attachment 745493 [details]
journalctl.log.gz

journalctl with debug options enabled (with still working systemd)
Comment 21 Franck Bui 2017-10-23 14:22:13 UTC
Thanks.

Just to make sure the installation was done from the Leap 42.3 ISO, right ? IOW udev-228-25.6.1.x86_64 was the version used during the installation until you updated and broke your system.

So basically during the installation, /dev/disk/by-id/cr_-LENSE20512GMSP34MEAT2TA_1142267006586-part2 was used for initializing /etc/crypttab.

If so, I currently don't see how this could have happened and I would be interested in seeing the output of "udevadm test /block/nvme0n1p2". Also "udevadm info -e" might help.

I'm afraid you need to reinstall your system, sorry.
Comment 22 Stefan Dirsch 2017-10-23 14:45:22 UTC
(In reply to Franck Bui from comment #21)
> Thanks.
> 
> Just to make sure the installation was done from the Leap 42.3 ISO, right ?

Yes, yes, yes. ISO written on a USB-stick.

> IOW udev-228-25.6.1.x86_64 was the version used during the installation
> until you updated and broke your system.

In the reinstalled system I see udev-228-27.2.x86_64 (without any Online updates)

> So basically during the installation,
> /dev/disk/by-id/cr_-LENSE20512GMSP34MEAT2TA_1142267006586-part2 was used for
> initializing /etc/crypttab.

No, there is no /dev/disk/by-id/cr_-LENSE20512GMSP34MEAT2TA_1142267006586-part2, it is /dev/disk/by-id/-LENSE20512GMSP34MEAT2TA_1142267006586-part2 (without cr_).

> If so, I currently don't see how this could have happened and I would be
> interested in seeing the output of "udevadm test /block/nvme0n1p2". Also
> "udevadm info -e" might help.

Before I do this wrong. Should I do this after breaking system, i.e. in the emergency shell or in the working system? Currently I have a working system (before online update).

> I'm afraid you need to reinstall your system, sorry.

This is no problem, if this can be of any help.
Comment 23 Stefan Dirsch 2017-10-23 14:48:21 UTC
Ah. And there is no /block/nvme0n1p2. Should I use /sys/block/nvme0n1/nvme0n1p2/ instead? This matches best I believe.
Comment 24 Franck Bui 2017-10-23 14:48:59 UTC
(In reply to Stefan Dirsch from comment #22)
> 
> Before I do this wrong. Should I do this after breaking system, i.e. in the
> emergency shell or in the working system? Currently I have a working system
> (before online update).

in the working system please !
Comment 25 Franck Bui 2017-10-23 14:49:22 UTC
(In reply to Stefan Dirsch from comment #23)
> Ah. And there is no /block/nvme0n1p2. Should I use
> /sys/block/nvme0n1/nvme0n1p2/ instead? This matches best I believe.

yep sorry for the typo.
Comment 26 Stefan Dirsch 2017-10-23 14:55:59 UTC
Created attachment 745518 [details]
udevadm_test_block_nvme0n1_nvme0n1p2.txt

udevadm test /block/nvme0n1/nvme0n1p2 (in working system)
Comment 27 Stefan Dirsch 2017-10-23 14:57:34 UTC
Created attachment 745519 [details]
udevadm_info_e.txt

udevadm info -e (in working system)
Comment 28 Franck Bui 2017-10-23 15:14:27 UTC
Thanks Stefan.

So apparently in the following rules:

> KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $tempnode", ENV{ID_BUS}="nvme"
> KERNEL=="sd*|sr*|cciss*|nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}"
> KERNEL=="sd*|cciss*|nvme*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"

ID_BUS is empty although it should be set to "nvme" by the first rule.

I don't see how this can happen...

Could you show the output of "/usr/lib/udev/scsi_id --export --whitelisted -d  /dev/nvme0n1" ?
Comment 29 Stefan Dirsch 2017-10-23 15:21:34 UTC
# /usr/lib/udev/scsi_id --export --whitelisted -d  /dev/nvme0n1
ID_SCSI=1
ID_VENDOR=NVMe
ID_VENDOR_ENC=NVMe\x20\x20\x20\x20
ID_MODEL=LENSE20512GMSP34
ID_MODEL_ENC=LENSE20512GMSP34
ID_REVISION=8341
ID_TYPE=disk
ID_SERIAL=2a03299dce10c6586
ID_SERIAL_COMPAT=
ID_SERIAL_SHORT=a03299dce10c6586
ID_SCSI_SERIAL=1142267006586
Comment 30 Franck Bui 2017-10-23 15:58:48 UTC
Thanks Stefan.

Actually it was easy: in 60-persistent-storage.rules line 23:

> KERNEL=="nvme*[0-9]n*[0-9]", ENV{DEVTYPE}=="disk", ATTRS{model}=="?*", ENV{ID_SERIAL_SHORT}=="?*", ENV{ID_SERIAL}="$attr{model}_$env{ID_SERIAL_SHORT}", ...

so 'ID_SERIAL' is set and therefore rule line 46 is skipped.

> KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $tempnode", ENV{ID_BUS}="nvme"

and that explains why brand new systems still continue to use such broken by-id symlinks (where the "nvme-" prefix is missing).

Ludwig, couldn't we make the latest version of systemd/udev part of the ISO instead of shipping a broken version ?

Aren't the ISOs regularly updated ?

Thanks.
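
To illustrate why the resulting name starts with a dash: the partition rule quoted in comment 28 expands "$env{ID_BUS}-$env{ID_SERIAL}-part%n", so an empty ID_BUS leaves only the separator. The sketch below mimics just that string substitution in plain shell; it does not involve udev, and the function name by_id_name is hypothetical.

```shell
#!/bin/sh
# by_id_name ID_BUS ID_SERIAL PARTNUM
# Mimics the expansion of disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n.
# Illustration only: real udev applies many more rules than this.
by_id_name() {
    printf 'disk/by-id/%s-%s-part%s\n' "$1" "$2" "$3"
}

serial="LENSE20512GMSP34MEAT2TA_1142267006586"

# ID_BUS unset because the scsi_id import rule was skipped:
by_id_name "" "$serial" 2

# ID_BUS set to "nvme":
by_id_name "nvme" "$serial" 2
```

The first call produces the dash-prefixed name that ended up in /etc/crypttab, the second the name with the "nvme-" prefix.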
Comment 31 Franck Bui 2017-10-23 16:14:34 UTC
Stefan, I think the best fix in your case is to update /etc/crypttab in order to get rid of the use of the broken symlink.

So in your case:

> cr_-LENSE20512GMSP34MEAT2TA_1142267006586-part2 /dev/disk/by-id/-LENSE20512GMSP34MEAT2TA_1142267006586-part2 none       none

becomes

> cr_nvme-LENSE20512GMSP34MEAT2TA_1142267006586-part2 /dev/disk/by-id/nvme-LENSE20512GMSP34MEAT2TA_1142267006586-part2 none none

This way you will use the symlinks shipped by upstream and should be "safe" during the next updates.

The symlink should work even if you don't update, and of course will hopefully work after updating.

Thanks.
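
The edit above could also be done mechanically. What follows is a hedged sketch, not a tested tool: the function name fix_crypttab is invented here, it assumes every dash-prefixed by-id path in the file belongs to an NVMe device, and it deliberately rewrites only the device path, leaving the mapping name (first column) untouched because other files may refer to it.

```shell
#!/bin/sh
# fix_crypttab: read crypttab text on stdin, write corrected text on stdout.
# Only the device path is changed: /dev/disk/by-id/-X becomes
# /dev/disk/by-id/nvme-X. Assumption: all such entries are NVMe devices.
fix_crypttab() {
    sed -e 's|/dev/disk/by-id/-|/dev/disk/by-id/nvme-|g'
}

# Demonstration on the line quoted in this comment (no file is modified).
printf '%s\n' \
  'cr_-LENSE20512GMSP34MEAT2TA_1142267006586-part2 /dev/disk/by-id/-LENSE20512GMSP34MEAT2TA_1142267006586-part2 none       none' \
  | fix_crypttab
```

To apply it for real, one would write the output to a new file (for example "fix_crypttab < /etc/crypttab > /etc/crypttab.new"), review it, verify the new symlink exists under /dev/disk/by-id/, and only then replace the original and regenerate the initrd.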
Comment 32 Stefan Dirsch 2017-10-24 09:14:57 UTC
Wow! Indeed this change fixes the issue for me! Thanks a lot!

About updating the ISO. Sure it would be appreciated. Unfortunately we've given out promotion DVDs, which we obviously cannot update. Also ISOs are meanwhile on media, which we do not control.

Since this issue means complete data loss for the customer (I was happy that I've updated my system that early), I would like to know if you can think of any workaround, which the customer can apply in the emergency shell (or by other means). Then we could describe this below "Known issues for Leap 42.3".

And now I'm wondering, whether we see this issue also on sle12-sp3. I haven't tested this yet.
Comment 33 Stefan Dirsch 2017-10-24 09:25:06 UTC
Just to confirm.

Reboot worked fine after doing the proposed change in crypttab. And this way reboot also works after a complete online update including systemd. And after this online update indeed

  /dev/disk/by-id/-LENSE20512GMSP34MEAT2TA_1142267006586-part2

no longer exists. Only

  /dev/disk/by-id/nvme-2a03299dce10c6586-part2

Before the online update both existed.

Can't we update systemd in a way, that we don't break customer's system that horribly including complete data loss?
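
For anyone wanting to repeat the before/after comparison of symlinks, the names pointing at a given partition can be listed with a small loop. This is a sketch under assumptions: the function name links_to is hypothetical, and it relies on "readlink -f", which GNU coreutils and busybox provide but which is not guaranteed everywhere.

```shell
#!/bin/sh
# links_to DIR TARGET: print every symlink in DIR that resolves to TARGET.
# Assumes `readlink -f` is available.
links_to() {
    target=$(readlink -f "$2") || return 1
    for link in "$1"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Example: list the by-id names of the partition discussed in this bug,
# if that device exists on the machine running the script.
if [ -e /dev/nvme0n1p2 ]; then
    links_to /dev/disk/by-id /dev/nvme0n1p2
fi
```

Running this before and after the online update shows which names disappear and which remain usable in /etc/crypttab.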
Comment 34 Franck Bui 2017-10-24 09:44:24 UTC
(In reply to Stefan Dirsch from comment #32)
> Wow! Indeed this change fixes the issue for me! Thanks a lot!

Thanks a lot for helping me debugging this.

> 
> About updating the ISO. Sure it would be appreciated. Unfortunately we've
> given out promotion DVDs, which we obviously cannot update. Also ISOs are
> meanwhile on media, which we do not control.

Indeed but at least we could limit the number of impacted users.

> 
> Since this issue means complete data loss for the customer (I was happy that
> I've updated my system that early), I would like to know if you can think of
> any workaround, which the customer can apply in the emergency shell (or by
> other means). Then we could describe this below "Known issues for Leap 42.3".
> 

Maybe from the emergency shell: /etc/crypttab may be fixed to use the correct symlink, then "systemctl daemon-reload" and then exit from the shell.

Could you try that ?


> And now I'm wondering, whether we see this issue also on sle12-sp3. I
> haven't tested this yet.

It depends on the version of udev shipped by the ISO... do you know where I could find such information ?
Comment 35 Franck Bui 2017-10-24 09:48:15 UTC
(In reply to Stefan Dirsch from comment #33)
> 
> Can't we update systemd in a way, that we don't break customer's system that
> horribly including complete data loss?

We will certainly do that... however this means that we still continue to generate and use broken symlinks by default on systems having NVMe devices which we will need to support forever.
Comment 36 Stefan Dirsch 2017-10-24 10:22:10 UTC
I could successfully change /etc/crypttab from emergency. Unfortunately this alone didn't help. Most likely one also needs to run dracut/mkinitrd to recreate initrd.
Comment 37 Stefan Dirsch 2017-10-24 10:42:35 UTC
(In reply to Stefan Dirsch from comment #36)
> I could successfully change /etc/crypttab from emergency. Unfortunately this
> alone didn't help. Most likely one also needs to run dracut/mkinitrd to
> recreate initrd.

Tried that by using cryptsetup, lvm, mount, etc. but failed in the end. I'm afraid that's something, we cannot describe below "Known issues". We should really provide another systemd update, which fixes this issue. Seriously.
Comment 38 Stefan Dirsch 2017-10-24 15:41:19 UTC
(In reply to Franck Bui from comment #35)
> (In reply to Stefan Dirsch from comment #33)
> > 
> > Can't we update systemd in a way, that we don't break customer's system that
> > horribly including complete data loss?
> 
> We will certainly do that... however this means that we still continue to
> generate and use broken symlinks by default on systems having NVMe devices
> which we will need to support forever.

Please let me know, once you have a fix available. I want to give it a try before
I reinstall the machine for real, so things can be well tested.
Comment 39 Franck Bui 2017-10-25 07:34:31 UTC
(In reply to Stefan Dirsch from comment #37)
> Tried that by using cryptsetup, lvm, mount, etc. but failed in the end. I'm
> afraid that's something, we cannot describe below "Known issues". We should
> really provide another systemd update, which fixes this issue. Seriously.

hmm this sounds like you tried to update /etc/crypttab located in your rootfs. If so I was suggesting to update the one in the initramfs which should be directly accessible at /etc/crypttab from the emergency shell.

The recipe is quite simple:

  - edit /etc/crypttab and add the "nvme" prefix if it's missing.
    you can figure out it's missing if the by-id path starts with
    '-'

  - systemctl daemon-reload

  - exit from the emergency shell

If that works you still need to update /etc/crypttab in your rootfs and run "mkinitrd".
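
The first step of the recipe (spotting a by-id path that starts with '-') can be sketched in plain POSIX shell, which matters because tools like awk may be missing from a minimal initramfs. The function name list_broken_crypttab is made up for this example, and it only inspects the second column of each entry.

```shell
#!/bin/sh
# list_broken_crypttab FILE: print "name device" for every entry whose
# by-id path starts with '-', i.e. lacks a bus prefix such as "nvme".
# Uses only shell builtins so it can run in a minimal environment.
list_broken_crypttab() {
    while read -r name dev _rest; do
        case "$name" in
            '#'*|'') continue ;;   # skip comments and blank lines
        esac
        case "$dev" in
            /dev/disk/by-id/-*) printf '%s %s\n' "$name" "$dev" ;;
        esac
    done < "$1"
}

# Check the crypttab visible from the current environment, if readable.
if [ -r /etc/crypttab ]; then
    list_broken_crypttab /etc/crypttab
fi
```

No output means no dash-prefixed by-id entries were found in that file; remember that the initramfs and the rootfs each carry their own copy.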
Comment 40 Franck Bui 2017-10-25 08:48:32 UTC
(In reply to Stefan Dirsch from comment #38)
> 
> Please let me know, once you have a fix available. I want to give it a try
> before
> I reinstall the machine for real, so things can be well tested.

Here we go:

https://build.opensuse.org/package/show/home:fbui:systemd:next:openSUSE-Leap42.3/systemd
Comment 41 Stefan Dirsch 2017-10-25 11:47:53 UTC
(In reply to Franck Bui from comment #39)
> hmm this sounds like you tried to update /etc/cryptsetup located in your
> rootfs. 

That's correct.

> If so I was suggesting to update the one in the initramfs which
> should be directly accessible at /etc/crypttab from the emergency shell.
> 
> The recipe is quite simple:
> 
>   - edit /etc/crypttab and and the "nvme" prefix if it's missing.
>     you can figure out it's missing if the by-id path starts with
>     '-'
> 
>   - systemctl daemon-reload
> 
>   - exit from the emergency shell

Did this. Then I see two lines

dracut-initqueue Warning: Not all disks have been found.
dracut-initqueue Warning: You might want to regenerate the initramfs

Nothing more happens then.
Comment 42 Stefan Dirsch 2017-10-25 12:44:37 UTC
(In reply to Franck Bui from comment #40)
> (In reply to Stefan Dirsch from comment #38)
> > 
> > Please let me know, once you have a fix available. I want to give it a try
> > before
> > I reinstall the machine for real, so things can be well tested.
> 
> Here we go:
> 
> https://build.opensuse.org/package/show/home:fbui:systemd:next:openSUSE-
> Leap42.3/systemd

Hooray! I can confirm, that this update fixes the issue. :-) I believe we want to see this update available ASAP. ;-)
Comment 43 Franck Bui 2017-10-25 13:09:45 UTC
(In reply to Stefan Dirsch from comment #41)
> 
> dracut-initqueue Warning: Not all disks have been found.
> dracut-initqueue Warning: You might want to regenerate the initramfs
> 
> Nothing more happens then.

Hmm you probably need to start a couple services by hand before exiting the shell then...

Maybe "systemctl restart cryptsetup.target" ?
Comment 44 Franck Bui 2017-10-25 13:11:07 UTC
(In reply to Stefan Dirsch from comment #42)
> Hooray! I can confirm, that this update fixes the issue. :-) I believe we
> want to see this update available ASAP. ;-)

Thanks for the testing.

I'll do my best to submit a new update, but unfortunately the last submission was declined because we're waiting for another fix to be submitted correctly...
Comment 45 Franck Bui 2017-10-25 13:12:33 UTC
BTW, AFAIK only leap 42.3 is affected (not SLE12).

The fix will be released but it would be really nice if we could update the version of udev shipped by the ISO.
Comment 46 Stefan Dirsch 2017-10-25 15:03:03 UTC
(In reply to Franck Bui from comment #43)
> (In reply to Stefan Dirsch from comment #41)
> > 
> > dracut-initqueue Warning: Not all disks have been found.
> > dracut-initqueue Warning: You might want to regenerate the initramfs
> > 
> > Nothing more happens then.
> 
> Hmm you probably need to start a couple services by hand before exiting the
> shell then...
> 
> Maybe "systemctl restart cryptsetup.target" ?

Ok. This opens the device successfully after I provided the password. Other than that the result is the same. System doesn't start. Same two messages as above.
Comment 47 Ludwig Nussel 2017-10-27 09:10:01 UTC
We cannot really update the ISO at this point. The issue needs to be documented in the release notes https://github.com/openSUSE/release-notes-openSUSE/tree/Leap_42.3
Feel free to file a pull request yourself or open bug here for the release notes component.

Would it help to install with online updates? If so documenting that would also help.
Comment 48 Franck Bui 2017-10-27 09:18:15 UTC
Well I think it's better to release the fix otherwise the user experience would be quite bad for users having NVMe devices.

It's sad that those users will use a broken symlink by default and that we will have to maintain them forever.
Comment 49 Franck Bui 2017-10-27 09:57:53 UTC
*** Bug 1060226 has been marked as a duplicate of this bug. ***
Comment 50 Stefan Dirsch 2017-10-27 10:47:04 UTC
(In reply to Franck Bui from comment #49)
> *** Bug 1060226 has been marked as a duplicate of this bug. ***

OMG! This issue is known since 2017-09-25 and the culprit found just a few days later by Jonathan Cottrill <novell@jonathancottrill.net>. And the bugowner of that bug is in Cc of this bug ...
(JFYI, the workaround in this bugreport is not really a workaround; it can only
be applied after a new installation, i.e. after you had the complete data loss)

Ludwig, I'm not sure how another entry in our release notes would help here. One of the systemd updates was the culprit here. What we need here is another systemd Online update. ASAP.
Comment 51 Ludwig Nussel 2017-10-27 11:50:46 UTC
(In reply to Stefan Dirsch from comment #50)
> [...]
> Ludwig, I'm not sure what another entry in our release notes would help
> here.

There is no reference in release notes atm. People hit by the
problem will find information about it in internet search engines
if we put something in the release notes.

> One of the systemd update was the culprit here. What we need here is
> another systemd Online update. ASAP.

Agreed.
Comment 52 Thomas Altrock 2017-10-27 12:03:35 UTC
I wasn't in the CC List of THIS bug until it was marked as duplicate by Franck Bui! (also i think THIS bug is a duplicate of 1060226 - not the other way around, but doesn't matter)

1060226 is assigned to Daniel Molkentin. I think it's his job to assign the bug correctly. But unfortunately I didn't get any response from him.

The workaround provided by Jonathan Cottrill was the solution for my problem. The installer makes a wrong entry in /etc/crypttab, and with the patches the wrong link disappears, but with no correction in /etc/crypttab. After correcting this (e.g. with a booted rescue system) all works fine.

I think the corresponding systemd Patch must be revoked and a new (fixed one) released. A correction of a broken entry for nvme-devices in /etc/crypttab shouldn't be too complicated.
Comment 53 Franck Bui 2017-10-27 15:46:19 UTC
OK I've submitted a new update which contains the fix.

I still need to document this shortcoming in the release notes so Leap 42.3 users are aware that /etc/crypttab uses a bogus symlink and therefore the file should be updated.
Comment 54 Franck Bui 2017-10-27 15:46:50 UTC
Oh and before you ask, it's SR#145031
Comment 55 Jonathan Cottrill 2017-10-27 15:48:42 UTC
@Stefan Dirsch, I was only in the CC list of the older bug that was opened a month ago, 1060226, so I only became aware of this bug ID today, when the older one was marked as a duplicate of the newer one. In any case, I spent some time today creating a workaround for existing systems, in case that's helpful to people. Obviously, there are too many system variations for this to be guaranteed to work for everyone, but there's a good chance you can get the general idea to work for a given system with some slight variations.

Unfortunately, it's fairly involved, so only worth it for systems you really care about. :-)

Steps:

1. Boot from an openSUSE Leap 42.3 installation disc/USB drive (just as if you were going to install it).

2. On the first page, select "More...".

3. On the next page, select "Rescue System".

4. At the "rescue login:" prompt, enter root for the username (there is no password).

5. Decrypt *all* encrypted drives/partitions (you quite possibly only have one) that are part of your system. The lsblk command may be helpful, or you may find these under paths like /dev/nvme*, or it may be easier to look at the symlinks under /dev/disk/by-id/ (be aware there are likely lots of duplicates, pointing to the same drive/partition, in this location). Run cryptsetup for each one:

cryptsetup open <device> <some-unique-name>

For example:

cryptsetup open /dev/nvme0n1p2 name1

<some-unique-name> needs to be different for each device, obviously. :-)

Enter the normal passphrase for each device when prompted.

6a. If you are using LVM, run:

lvdisplay

Verify you can see all your logical volumes. If not, run:

vgscan
lvscan
lvdisplay

Note the "LV Path" of each logical volume.

Mount the logical volume that corresponds to the / filesystem for your system on /mnt. It may have an "LV Name" of root, or it may not. Use the "LV Path" to mount it:

mount <lv-path> /mnt

For example:

mount /dev/system/root /mnt

6b. If you ARE NOT using LVM, mount the drive/partition that corresponds to the / filesystem for your system on /mnt:

mount <device> /mnt

For example:

mount /dev/nvme0n1p2 /mnt

7. (probably unneeded!) If you have a very unusual setup where /etc is on a separate filesystem from /, mount it:

mount <device/lv-path> /mnt/etc

For example:

mount /dev/system/etc /mnt/etc

8. Prepare and enter a chroot environment for your system:

mount -t proc none /mnt/proc
mount --rbind /dev /mnt/dev
mount --rbind /sys /mnt/sys
chroot /mnt /bin/bash

You should now be "inside" your actual system (although you are running the kernel, etc., from the rescue system). (This step and some of the following steps are taken from https://doc.opensuse.org/documentation/leap/startup/html/book.opensuse.startup/cha.trouble.html#sec.trouble.data.recover.rescue.access, and you can read additional information at that location.)

9. Mount the remaining filesystems (if any) using information from your system's /etc/fstab file:

mount -a

10. Edit /etc/crypttab to fix the entry that references a symlink no longer created by udev because of this bug. This will probably mean changing an entry like this:

/dev/disk/by-id/-xxxxxxxxx

...to add nvme before the dash:

/dev/disk/by-id/nvme-xxxxxxxxx

11. Regenerate the initramfs boot images with dracut. First, determine which images are present (these normally correspond to kernel versions):

ls /boot/initrd-*

If you know which image you will be booting, run dracut for just that image; otherwise, run dracut for each image:

dracut -f <image-path> <kernel-version>

For example:

dracut -f /boot/initrd-4.4.76-1-default 4.4.76-1-default
dracut -f /boot/initrd-4.4.92-31-default 4.4.92-31-default

12. Unmount the filesystems, exit the chroot, and reboot:

umount -a
exit
reboot
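
As a rough sketch of step 11, the per-image dracut commands could be generated rather than typed by hand. The helper name list_dracut_commands is made up for illustration, and it assumes the kernel version is whatever follows "initrd-" in the file name, which may not hold for unusual image names. It only prints the commands so they can be reviewed before being run inside the chroot:

```shell
# Print (do not run) one "dracut -f" command per initrd image in a boot directory.
# list_dracut_commands is a hypothetical helper; the directory defaults to /boot.
list_dracut_commands() {
    boot_dir=${1:-/boot}
    for img in "$boot_dir"/initrd-*; do
        # Skip the literal pattern when no image matches the glob.
        [ -f "$img" ] || continue
        # Assumption: the kernel version is the file name without the "initrd-" prefix.
        name=${img##*/}
        kver=${name#initrd-}
        echo "dracut -f $img $kver"
    done
}
```

Calling list_dracut_commands with no argument inspects /boot; check the printed kernel versions against the directories under /lib/modules before running any of the lines.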
Comment 56 Franck Bui 2017-10-27 15:59:23 UTC
*** Bug 1060156 has been marked as a duplicate of this bug. ***
Comment 58 Stefan Dirsch 2017-10-27 17:14:09 UTC
@Jonathan, you're my personal hero for figuring out the culprit that early! It's definitely not your fault that this information didn't get through to the systemd developers. :-( We lost about 4 weeks this way, and I don't know how many users ran into this issue during that time ...

Thanks for the instructions to work around the issue. I had basically already tried this, i.e. editing crypttab and recreating the initrd in a chroot environment after mounting the encrypted partitions, but it failed for me in the end. I have now finally installed my system for real and don't want to do any more testing with an unknown outcome.
Comment 59 Franck Bui 2017-10-30 09:04:40 UTC
(In reply to Jonathan Cottrill from comment #55)
> 
> Unfortunately, it's fairly involved, so only worth it for systems you really
> care about. :-)
> 

Thanks a lot for writing this.

There might be an easier way to boot a broken system, though. I think the following steps should work:

 1. Boot the broken system and append the following options to the kernel
    command line:

      rdinit=/bin/sh systemd.default_timeout_start_sec=10

 2. A very early shell should be started instead of systemd. Now we should
    be able to fix the /etc/crypttab contained in the initrd:

      sed -Ei 's,(cr_|by-id/)-,\1nvme-,g' /etc/crypttab

 3. And finally boot systemd:

      exec /sbin/init

During the boot process, there will be some errors because the /etc/crypttab in the rootfs still references the broken symlinks (step 2 only fixed the crypttab in the initrd). Those errors shouldn't be fatal, but the system will wait until a timeout expires (1 min 30 s). That's the reason why we added the "systemd.default_timeout_start_sec=10" option in step 1.

Once the system has booted, /etc/crypttab (from the rootfs) still needs to be fixed. The following steps should do that:

 1. sed -Ei 's,(cr_|by-id/)-,\1nvme-,g' /etc/crypttab

 2. mkinitrd
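
Before editing the file in place, it may be worth previewing what the substitution would change. The sketch below runs the same sed expression without -i, so nothing is modified; preview_crypttab_fix is a hypothetical helper name, and GNU sed is assumed for the -E option:

```shell
# Show the result of the nvme- substitution without modifying the file.
# preview_crypttab_fix is a hypothetical helper; the path defaults to /etc/crypttab.
preview_crypttab_fix() {
    crypttab=${1:-/etc/crypttab}
    # Same expression as the in-place fix: insert "nvme" between
    # "cr_" or "by-id/" and the dash that follows it.
    sed -E 's,(cr_|by-id/)-,\1nvme-,g' "$crypttab"
}
```

Comparing the output with the original file (for example with diff) shows which entries would be rewritten before committing to sed -Ei.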
Comment 60 Franck Bui 2017-10-30 09:23:50 UTC
BTW, I think I'll add a sanity check to the systemd package that detects whether the broken symlink is in use in /etc/crypttab and warns if that's the case.

We'll also ask users to replace it before upgrading the system, so hopefully once support for 42.3 has ended, we can get rid of it.
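
For anyone who wants to test their own system in the meantime, a check along these lines might look roughly like the sketch below. This is only an illustration, not the check that goes into the package; has_broken_nvme_link is a made-up name, and the pattern is borrowed from the sed expression in comment 59:

```shell
# Return success (0) if a crypttab still references the broken "-xxx" names.
# has_broken_nvme_link is a hypothetical helper; the path defaults to /etc/crypttab.
has_broken_nvme_link() {
    crypttab=${1:-/etc/crypttab}
    # Drop comment lines, then look for "cr_-" or "by-id/-",
    # i.e. a name where the "nvme" prefix before the dash is missing.
    grep -v '^[[:space:]]*#' "$crypttab" | grep -Eq '(cr_|by-id/)-'
}
```

It could be used as: has_broken_nvme_link && echo "crypttab references the broken NVMe symlink".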
Comment 61 Nicolas Rochard 2017-10-30 14:31:40 UTC
Your fixed package works for me without editing any file manually.
Comment 67 Swamp Workflow Management 2017-11-30 17:11:47 UTC
SUSE-RU-2017:3163-1: An update that has 11 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1004995,1035386,1039099,1040800,1045472,1048605,1050152,1053137,1053595,1055641,1063249
CVE References: 
Sources used:
SUSE Linux Enterprise Software Development Kit 12-SP3 (src):    systemd-228-150.22.1
SUSE Linux Enterprise Software Development Kit 12-SP2 (src):    systemd-228-150.22.1
SUSE Linux Enterprise Server for Raspberry Pi 12-SP2 (src):    systemd-228-150.22.1
SUSE Linux Enterprise Server 12-SP3 (src):    systemd-228-150.22.1
SUSE Linux Enterprise Server 12-SP2 (src):    systemd-228-150.22.1
SUSE Linux Enterprise Desktop 12-SP3 (src):    systemd-228-150.22.1
SUSE Linux Enterprise Desktop 12-SP2 (src):    systemd-228-150.22.1
SUSE Container as a Service Platform ALL (src):    systemd-228-150.22.1
OpenStack Cloud Magnum Orchestration 7 (src):    systemd-228-150.22.1
Comment 68 Swamp Workflow Management 2017-12-02 17:16:13 UTC
openSUSE-RU-2017:3197-1: An update that has 11 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1004995,1035386,1039099,1040800,1045472,1048605,1050152,1053137,1053595,1055641,1063249
CVE References: 
Sources used:
openSUSE Leap 42.3 (src):    systemd-228-38.1, systemd-mini-228-38.1
openSUSE Leap 42.2 (src):    systemd-228-25.18.1, systemd-mini-228-25.18.1
Comment 69 Franck Bui 2017-12-07 16:55:05 UTC
The fix has been released, it's time to close this bug.

Thanks all for helping me sort this out.
Comment 70 Swamp Workflow Management 2018-01-09 20:19:17 UTC
SUSE-SU-2018:0053-1: An update that solves 29 vulnerabilities and has 57 fixes is now available.

Category: security (moderate)
Bug References: 1003846,1004995,1009966,1022404,1025282,1025891,1026567,1029907,1029908,1029909,1029995,1030623,1035386,1036619,1039099,1039276,1039513,1040800,1040968,1041090,1043059,1043590,1043883,1043966,1044016,1045472,1045522,1045732,1047178,1047233,1048605,1048861,1050152,1050258,1050487,1052503,1052507,1052509,1052511,1052514,1052518,1053137,1053347,1053595,1053671,1055446,1055641,1055825,1056058,1056312,1056381,1057007,1057139,1057144,1057149,1057188,1057634,1057721,1057724,1058480,1058695,1058783,1059050,1059065,1059075,1059292,1059723,1060599,1060621,1061241,1061384,1062561,1063249,1063269,1064571,1064999,1065363,1066242,1066371,1066500,1066611,1067891,1070878,1070958,1071905,1071906
CVE References: CVE-2014-3710,CVE-2014-8116,CVE-2014-8117,CVE-2014-9620,CVE-2014-9621,CVE-2014-9653,CVE-2017-12448,CVE-2017-12450,CVE-2017-12452,CVE-2017-12453,CVE-2017-12454,CVE-2017-12456,CVE-2017-12799,CVE-2017-12837,CVE-2017-12883,CVE-2017-13757,CVE-2017-14128,CVE-2017-14129,CVE-2017-14130,CVE-2017-14333,CVE-2017-14529,CVE-2017-14729,CVE-2017-14745,CVE-2017-14974,CVE-2017-3735,CVE-2017-3736,CVE-2017-3737,CVE-2017-3738,CVE-2017-6512
Sources used:
SUSE CaaS Platform ALL (src):    sles12-caasp-dex-image-2.0.0-3.3.11, sles12-dnsmasq-nanny-image-2.0.1-2.3.15, sles12-haproxy-image-2.0.1-2.3.16, sles12-kubedns-image-2.0.1-2.3.11, sles12-mariadb-image-2.0.1-2.3.15, sles12-openldap-image-2.0.0-2.3.11, sles12-pause-image-2.0.1-2.3.9, sles12-pv-recycler-node-image-2.0.1-2.3.10, sles12-salt-api-image-2.0.1-2.3.10, sles12-salt-master-image-2.0.1-2.3.10, sles12-salt-minion-image-2.0.1-2.3.14, sles12-sidecar-image-2.0.1-2.3.11, sles12-tiller-image-2.0.0-2.3.11, sles12-velum-image-2.0.1-2.3.13