Bug 1060226 - Boot-Failure after applying Patch openSUSE-2017-847 / 950 / 1005 on LUKS encrypted NVMe devices
Status: RESOLVED DUPLICATE of bug 1063249
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Maintenance
Version: Leap 42.3
Hardware: x86-64 openSUSE 42.3
Importance: P5 - None : Critical with 15 votes
Target Milestone: ---
Assignee: Daniel Molkentin
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-25 12:57 UTC by Forgotten User XlNtqid6F5
Modified: 2017-10-27 09:57 UTC
CC List: 4 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---


Attachments
output of journalctl (96.50 KB, text/plain)
2017-09-25 12:57 UTC, Forgotten User XlNtqid6F5
output of systemctl (16.76 KB, text/plain)
2017-09-25 12:58 UTC, Forgotten User XlNtqid6F5
output of 'systemctl status' (831 bytes, text/plain)
2017-09-25 12:59 UTC, Forgotten User XlNtqid6F5
zypper history (127.16 KB, text/plain)
2017-09-25 12:59 UTC, Forgotten User XlNtqid6F5
udev file 60-persistent-storage.rules used by the YaST installer (7.04 KB, text/plain)
2017-09-30 01:43 UTC, Jonathan Cottrill
Screenshot from YaST installer showing Device IDs of NVMe partition (140.74 KB, image/png)
2017-09-30 01:44 UTC, Jonathan Cottrill
/etc/crypttab created by YaST installer (94 bytes, text/plain)
2017-09-30 01:45 UTC, Jonathan Cottrill
/dev/disk/by-id/ listing while running YaST installer (1.22 KB, text/plain)
2017-09-30 01:45 UTC, Jonathan Cottrill

Description Forgotten User XlNtqid6F5 2017-09-25 12:57:44 UTC
Created attachment 741780 [details]
output of journalctl

We did a fresh install of Leap 42.3 to a Samsung NVMe SSD (2TB 960 Pro M.2). The first boot after installation works fine. Applying all patches with 'zypper patch' results in a boot failure at the next startup: the prompt for the LUKS/crypt password never appears.
After a timeout of approx. 180 seconds the system comes up in emergency mode with the following output:

 dracut-initqueue[307]: Warning: dracut-initqueue timeout - starting timeout scripts
 Sep 22 10:41:18 linux-6bio dracut-initqueue[307]: Warning: Could not boot.
 Sep 22 10:41:18 linux-6bio dracut-initqueue[307]: Warning: /dev/mapper/system-root does not exist
 Sep 22 10:41:18 linux-6bio dracut-initqueue[307]: Warning: /dev/system/root does not exist
 Sep 22 10:41:18 linux-6bio dracut-initqueue[307]: Warning: /dev/system/swap does not exist
 Sep 22 10:41:18 linux-6bio systemd[1]: Starting Setup Virtual Console...
 Sep 22 10:41:18 linux-6bio systemd[1]: Started Setup Virtual Console.
 Sep 22 10:41:18 linux-6bio systemd[1]: Starting Dracut Emergency Shell...


some debugging:

1) New installation (Server / Text-Mode), no changes to default configuration / package selection

2a) Partitioning: LVM-based proposal, ext4, no separate home >> applying patch 'openSUSE-2017-847' >> boot OK
2b) Partitioning: LVM-based proposal WITH CRYPT/LUKS, ext4, no separate home >> applying patch 'openSUSE-2017-847' >> boot FAILS!

3) Applying patch 'openSUSE-2017-847' also installs 'openSUSE-2017-950' and 'openSUSE-2017-1005'.

4) The boot fails only on encrypted NVMe drives. SATA drives are not affected.

5) Two different hardware systems (notebook / desktop PC) have been tested.


I have attached some files from the emergency shell.
Comment 1 Forgotten User XlNtqid6F5 2017-09-25 12:58:35 UTC
Created attachment 741781 [details]
output of systemctl
Comment 2 Forgotten User XlNtqid6F5 2017-09-25 12:59:05 UTC
Created attachment 741782 [details]
output of 'systemctl status'
Comment 3 Forgotten User XlNtqid6F5 2017-09-25 12:59:33 UTC
Created attachment 741783 [details]
zypper history
Comment 4 Forgotten User XlNtqid6F5 2017-09-27 12:11:38 UTC
more debugging:

1) downgrade dracut to 044.1-23.2

2017-09-27 11:32:05|install|dracut|044.1-23.2|x86_64|root@linux-210y|repo-update|e3d230b5e79de0a603d1f5e4916760c965c908cf0d890fc1ebeb944ef1fb6c33|

1b) mkinitrd + reboot
   
    >> boot FAILS / no improvement


2) downgrade dracut to 044-21.7 (with dependencies)

2017-09-27 11:39:52|install|systemd|228-27.2|x86_64|root@linux-210y|repo-oss|4bb2106ddadc9a02cbf5bc41db0cb23f936d4db0|
2017-09-27 11:39:52|install|udev|228-27.2|x86_64|root@linux-210y|repo-oss|6387f8b3aeb6926614d1d678823e25e1266b5076|
2017-09-27 11:39:52|install|systemd-sysvinit|228-27.2|x86_64|root@linux-210y|repo-oss|189fc4dccc70dd02414a4f6df6b3deac13af1587|
2017-09-27 11:39:53|install|dracut|044-21.7|x86_64|root@linux-210y|repo-oss|36de01742836205d426e946744525d96ab399b2b|

2b) mkinitrd + reboot
   
    >> boot OK / Password query appears
Comment 5 Jonathan Cottrill 2017-09-27 23:12:36 UTC
I'm also experiencing this issue. I've traced it specifically to an update from udev-228-29.1 to udev-228-32.2 (with corresponding systemd update). Updating to udev-228-35.1 also causes the issue, but the breakage first appears with udev-228-32.2.

This appears to be caused by a udev rules change. I notice that rules affecting NVMe were modified by the update. Here are steps showing that the udev rules are the culprit:

* Install Leap 42.3
* Install all updates EXCEPT udev/systemd
* Reboot, confirming normal system operation
* Save old version of /usr/lib/udev/rules.d/60-persistent-storage.rules
* Update udev/systemd to 228-32.2 (or 228-35.1) with zypper
* Reboot, confirming boot failure (with no crypto password prompt after Grub)
* Use rescue system to manually tweak /boot/initrd-4.4.87-25-default:
  * Remove /usr/lib/udev/rules.d/61-persistent-storage-compat.rules file added by update
  * Replace /usr/lib/udev/rules.d/60-persistent-storage.rules with saved copy
* Reboot, confirming normal system operation (crypto password prompt restored, normal boot occurs, etc.)

In other words, reverting only the two udev rules files in the initrd image is sufficient to "fix" the problem.

I'll add that this system is using UEFI w/Secure Boot and GPT partitioning, if that makes a difference.

Please let me know if I can provide any more information; I'm happy to help with udev logs or whatever is needed.
Comment 6 Jonathan Cottrill 2017-09-30 01:42:03 UTC
More information:

LVM isn't actually involved; a system with a simple partition for the root filesystem will have the same issue if you check the box in the YaST installer to encrypt the partition. I've edited the title of the bug accordingly.

Here's the full cause of the issue:

udev file 60-persistent-storage.rules that ships with the YaST installer for 42.3 (attached to this bug as yast-installer-60-persistent-storage-rules.txt) has a rule (under the SCSI devices section, bizarrely) that creates an improperly-named symlink to each NVMe partition:

KERNEL=="sd*|cciss*|nvme*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"

The flaw in the rule is the $env{ID_BUS} token; the variable ID_BUS has never been set for NVMe partitions (for disks, yes; for *partitions*, no). This results in a symlink with a leading hyphen, like -TRNSN34098GGX_NVMe_TOSHIBA_1024GB_10AZR11Z5QADR-part2, being created under /dev/disk/by-id.
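
To illustrate how the leading hyphen arises, here is a minimal Python sketch of the substitution. It is not udev's actual code: the helper expand_symlink is hypothetical, and it assumes that an unset property expands to an empty string and that %n is the partition number.

```python
import re

def expand_symlink(template, env, partition_number):
    """Roughly mimic how a udev SYMLINK value is expanded (simplified)."""
    # Assumption: an unset property such as ID_BUS expands to "".
    expanded = re.sub(r"\$env\{(\w+)\}",
                      lambda m: env.get(m.group(1), ""),
                      template)
    # Assumption: %n stands for the partition number of the device.
    return expanded.replace("%n", str(partition_number))

TEMPLATE = "disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"

# ID_BUS is missing for the NVMe partition, as described in this comment.
nvme_partition = {"ID_SERIAL": "TRNSN34098GGX_NVMe_TOSHIBA_1024GB_10AZR11Z5QADR"}

print(expand_symlink(TEMPLATE, nvme_partition, 2))
# prints: disk/by-id/-TRNSN34098GGX_NVMe_TOSHIBA_1024GB_10AZR11Z5QADR-part2
```

With ID_BUS set (for example to "ata" on a SATA partition), the same template yields a properly prefixed name, which matches the observation that SATA drives are not affected.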

There are two other symlinks created for the NVMe partition by other udev rules; unfortunately, the YaST installer picks up the misnamed one as Device ID 1, and this is what it puts in /etc/crypttab. (See yast-installer-device-id.png and etc-crypttab.txt, attached to this bug. A full listing of /dev/disk/by-id/ as it exists during the YaST installer's execution is attached as dev-disk-by-id.txt.)

Commit 63da94f (https://github.com/openSUSE/systemd/commit/63da94fcce059ac153be9b20657ae9fcc3b61e06) on July 3 modified the udev rule such that it no longer applied to NVMe devices:

KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n"

This means installing builds of udev containing this commit will cause the symlink placed in /etc/crypttab by the YaST installer to no longer be created, and the system will fail to boot.
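
The effect of the changed KERNEL match can be sketched in Python. This is only an approximation: it assumes udev treats the value as shell-style glob alternatives separated by "|", and the helper kernel_matches and the device names sda2 and nvme0n1p2 are illustrative.

```python
from fnmatch import fnmatchcase

def kernel_matches(pattern, kernel_name):
    """Return True if kernel_name matches any '|'-separated glob alternative."""
    return any(fnmatchcase(kernel_name, alt) for alt in pattern.split("|"))

OLD_PATTERN = "sd*|cciss*|nvme*"   # rule shipped with the 42.3 installer
NEW_PATTERN = "sd*|cciss*"         # rule after commit 63da94f

for name in ("sda2", "nvme0n1p2"):
    print(name, kernel_matches(OLD_PATTERN, name), kernel_matches(NEW_PATTERN, name))
# prints:
# sda2 True True
# nvme0n1p2 True False
```

An NVMe partition no longer matches the updated rule, so the hyphen-prefixed symlink recorded in /etc/crypttab is never created.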

Workaround:

After installing openSUSE Leap 42.3, but before installing updates, edit /etc/crypttab and fix /dev/disk/by-id/-<whatever> to be /dev/disk/by-id/nvme-<whatever>. This change needs to be added to the initrd image, as well; either run dracut -f, or just install updates (when the new udev/systemd package is installed, zypper will automatically run dracut for you).
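
The crypttab edit could be scripted along these lines. This is a hedged sketch rather than a tested tool: the function name fix_crypttab and the sample entry are made up for illustration, and it assumes the device path is the second whitespace-separated field of each non-comment line.

```python
BROKEN_PREFIX = "/dev/disk/by-id/-"
FIXED_PREFIX = "/dev/disk/by-id/nvme-"

def fix_crypttab(text):
    """Rewrite hyphen-prefixed by-id device paths to the nvme- form."""
    fixed_lines = []
    for line in text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        # Assumption: field 2 of a crypttab entry is the source device.
        if not is_comment and len(fields) >= 2 and fields[1].startswith(BROKEN_PREFIX):
            line = line.replace(BROKEN_PREFIX, FIXED_PREFIX, 1)
        fixed_lines.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(fixed_lines) + ("\n" if text.endswith("\n") else "")

# Hypothetical entry; the mapping name and serial are placeholders.
sample = "cr_part2 /dev/disk/by-id/-EXAMPLESERIAL-part2 none none\n"
print(fix_crypttab(sample), end="")
# prints: cr_part2 /dev/disk/by-id/nvme-EXAMPLESERIAL-part2 none none
```

Before writing the result back to /etc/crypttab, it is worth checking that the nvme- symlink actually exists under /dev/disk/by-id/, and keeping a backup of the original file. The initrd still has to be regenerated afterwards, as described above.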

Thomas Altrock, can you please confirm that this workaround works for you?
Comment 7 Jonathan Cottrill 2017-09-30 01:43:18 UTC
Created attachment 742641 [details]
udev file 60-persistent-storage.rules used by the YaST installer
Comment 8 Jonathan Cottrill 2017-09-30 01:44:22 UTC
Created attachment 742642 [details]
Screenshot from YaST installer showing Device IDs of NVMe partition
Comment 9 Jonathan Cottrill 2017-09-30 01:45:20 UTC
Created attachment 742643 [details]
/etc/crypttab created by YaST installer
Comment 10 Jonathan Cottrill 2017-09-30 01:45:52 UTC
Created attachment 742644 [details]
/dev/disk/by-id/ listing while running YaST installer
Comment 11 Forgotten User XlNtqid6F5 2017-10-02 09:23:34 UTC
I tested the workaround a few minutes ago and I can CONFIRM that it works!

@Jonathan: thanks for debugging and support!
Comment 12 Franck Bui 2017-10-27 09:57:53 UTC
@Jonathan thanks for sorting this out.

It's actually a duplicate of bug 1063249.

*** This bug has been marked as a duplicate of bug 1063249 ***