Bug 1198095 - System won't boot -- hibernate/resume/crypto issue
Status: NEW
Classification: openSUSE
Product: openSUSE Distribution
Component: Basesystem
Version: Leap 15.4
Hardware: x86-64 openSUSE Leap 15.4
Priority: P5 - None  Severity: Normal
Target Milestone: ---
Assigned To: YaST Team
QA Contact: E-mail List
Depends on:
Blocks:
Reported: 2022-04-05 13:39 UTC by Neil Rickert
Modified: 2022-04-28 08:05 UTC (History)
CC List: 5 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Yast logs from an install of Build208.2 (Leap 15.4 Beta) (1.71 MB, application/x-xz)
2022-04-11 18:59 UTC, Neil Rickert

Description Neil Rickert 2022-04-05 13:39:16 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
Build Identifier: 

I installed Leap 15.4 last November in a libvirt virtual machine.  I have been regularly updating.

Yesterday I updated (with "zypper dup") to the latest build.  And the system would not boot.

The last two lines that I see are:
 -----
[  OK  ] Reached target Remote File Systems
[ ***  ] A start job is running for /dev/mapper/cr_swap (26s / no limit)
 -----
The time keeps counting up.  I have tried letting it go to 10 minutes.  But nothing more ever happens.  (It does recognize CTRL-ALT-DEL to reboot).

This used to work fine until the recent update.

Swap is randomly encrypted:
  -- cat /etc/crypttab --
cr_swap  /dev/sda1  /dev/urandom  swap
  --

The grub2 boot kernel line contains: resume=/dev/mapper/cr_swap

If I change that to: noresume

it then boots fine.  I'm not sure what changed, but it seems that the crypto is being delayed until after the resume from hibernation.  And that's not going to work.

As an additional test, I cloned the VM, and did a clean install with the same use of randomly encrypted swap.  And the failure was the same.  The kernel boot parameters set by install do not work in this case.

I also tested another similar install, using LUKS encrypted swap.  And that worked just fine.



Reproducible: Always
Comment 1 Neil Rickert 2022-04-05 15:52:30 UTC
I just tried a test install of Tumbleweed 20220404

I set it up the same way (with randomly encrypted swap).

Tumbleweed boots up fine this way.  So the problem may be limited to Leap 15.4
Comment 2 Michael Chang 2022-04-08 10:08:31 UTC
Hi Antonio and Thomas

Does this ring a bell for either of you? Thanks.
Comment 3 Antonio Feijoo 2022-04-08 10:49:14 UTC
Now the initrd handles the resume kernel argument in more cases. In your system the boot process hangs until the device specified with the resume parameter is available. Does crypttab contain an entry to decrypt cr_swap at boot time?

If you don't want to use suspend/hibernation, you should remove the resume argument from the grub entry (/etc/default/grub : GRUB_CMDLINE_LINUX_DEFAULT) and regenerate the grub config file.
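
The edit described above could be scripted roughly as follows. This is only a sketch: `strip_resume_arg` is a hypothetical helper name, and it assumes the resume argument appears on the GRUB_CMDLINE_LINUX_DEFAULT line as a plain `resume=...` token. It keeps a `.bak` copy of the file it edits.

```shell
#!/bin/sh
# Hypothetical helper (not part of any openSUSE tool): remove a resume=...
# token from the GRUB_CMDLINE_LINUX_DEFAULT line of a grub defaults file.
# $1 is the file to edit, normally /etc/default/grub.
strip_resume_arg() {
    file=$1
    # Keep a backup, then rewrite the original file from it.
    cp -- "$file" "$file.bak" || return 1
    # Only touch the GRUB_CMDLINE_LINUX_DEFAULT line; drop the resume= token
    # and the whitespace after it, keeping the character that preceded it.
    sed -E '/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/(^|[[:space:]"])resume=[^[:space:]"]*[[:space:]]*/\1/' \
        "$file.bak" > "$file"
}
```

After running `strip_resume_arg /etc/default/grub` as root, the grub config still has to be regenerated, for example with `grub2-mkconfig -o /boot/grub2/grub.cfg` (the output path is an assumption and may differ on your system).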
Comment 4 Neil Rickert 2022-04-08 14:39:19 UTC
Responding to Antonio (at c#3 ):

I included the content of "/etc/crypttab" in my initial report on this.  It is only one line, so perhaps you skipped past it.

On the question of GRUB_CMDLINE_LINUX_DEFAULT (in "/etc/default/grub") -- yes, I have already changed that on my system.  And if that were the only issue, then a comment in release-notes would be sufficient.  The problem, however, is that the installer sets it up that way.  So it is already broken as installed.  I did a clean install into a VM to test this.
Comment 5 Antonio Feijoo 2022-04-08 14:54:45 UTC
(In reply to Neil Rickert from comment #4)
> Responding to Antonio (at c#3 ):
> 
> I included the content of "/etc/crypttab" in my initial report on this.  It
> is only one line, so perhaps you skipped past it.

Yes, sorry. Did you regenerate the initrd after changing the crypttab file (dracut -f)?

What's the output of:

# lsinitrd -f etc/crypttab
Comment 6 Neil Rickert 2022-04-08 17:30:41 UTC
> Did you regenerate the initrd after changing the crypttab file (dracut -f)?

I did not change "crypttab".  It is still the same as at install.

> What's the output of:

> # lsinitrd -f etc/crypttab

I just get empty output.  It seems that "crypttab" is not in the initrd.  I guess that's the problem.  Regenerating the "initrd" with "mkinitrd" does not change that.
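
For anyone who wants to repeat this check, here is a minimal sketch. `crypttab_has_mapping` is a hypothetical helper name; it reads crypttab-formatted text on standard input, so it can be fed from `lsinitrd` or from the installed file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the crypttab text on stdin contains
# an entry whose mapping name (first field) equals $1.
crypttab_has_mapping() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }        # ignore comment lines
        $1 == name       { found = 1 }   # first field is the mapping name
        END              { exit found ? 0 : 1 }
    '
}

# Example usage (commented out; needs lsinitrd and a readable initrd):
#   lsinitrd -f etc/crypttab | crypttab_has_mapping cr_swap \
#       || echo "cr_swap is not in the initrd crypttab"
#   crypttab_has_mapping cr_swap < /etc/crypttab \
#       && echo "cr_swap is in /etc/crypttab"
```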
Comment 7 Sebastian Wagner 2022-04-09 06:38:33 UTC
I have had the same issue with Tumbleweed (TW) since Thursday. It happens with both 5.16.15-1 and 5.17.1-1. I can't say exactly which package update changed on that day or the day before.

The initrd's crypttab does not contain the encrypted swap entry, although /etc/crypttab does, and running mkinitrd does not add it.
Comment 8 Michael Chang 2022-04-11 04:21:30 UTC
Hi Antonio,

Since this looks like a dracut-related issue so far, I am reassigning it to you. I hope that is OK with you. Thanks.
Comment 9 Antonio Feijoo 2022-04-11 15:06:04 UTC
(In reply to Michael Chang from comment #8)
> Hi Antonio,
> 
> Since this looks like a dracut-related issue so far, I am reassigning it to
> you. I hope that is OK with you. Thanks.

Thanks Michael, I forgot to take it.

(In reply to Neil Rickert from comment #6)
> > Did you regenerate the initrd after changing the crypttab file (dracut -f)?
> 
> I did not change "crypttab".  It is still the same as at install.
> 
> > What's the output of:
> 
> > # lsinitrd -f etc/crypttab
> 
> I just get empty output.  It seems that "crypttab" is not in the initrd.  I
> guess that's the problem.  Regenerating the "initrd" with "mkinitrd" does
> not change that.

I missed one thing: the system cannot hibernate (suspend to disk) using a swap partition encrypted with a volatile random key, because a new key is generated on every boot and the saved image can no longer be decrypted. So dracut is not at fault here: the initrd is not supposed to contain that crypttab entry. However, the installer should not add a resume= parameter to the grub command line in this case. I cannot reproduce it with Leap 15.4 Build208.2, so I must assume that the installer issue is already fixed.

(In reply to Neil Rickert from comment #1)
> I just tried a test install of Tumbleweed 20220404
> 
> I set it up the same way (with randomly encrypted swap).
> 
> Tumbleweed boots up fine this way.  So the problem may be limited to Leap
> 15.4

Indeed. I cannot reproduce it with TW.


Thanks to this bug report we found a couple of issues related to the inclusion of the dracut resume module:
- The resume module is being added even when there is no suitable swap device.
- The sanity check of the resume module can be improved by checking whether the kernel command line contains a resume= argument pointing to a volatile swap.
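
To illustrate the second point, a check of this kind could look roughly like the sketch below. This is not the code from the PR; `resume_on_volatile_swap` is a hypothetical function name. It assumes the resume target has the form /dev/mapper/NAME and that a volatile swap is recognizable by /dev/urandom or /dev/random in the key field of its crypttab entry.

```shell
#!/bin/sh
# Hypothetical sketch, not the dracut implementation.
# $1: kernel command line as a string, $2: path to a crypttab file.
# Returns 0 if resume= points to a mapped device with a random key.
resume_on_volatile_swap() {
    cmdline=$1
    crypttab=$2
    # Take the value of the last resume= argument on the command line.
    target=$(printf '%s\n' "$cmdline" | tr -s ' \t' '\n\n' \
        | sed -n 's/^resume=//p' | tail -n 1)
    case $target in
        /dev/mapper/*) name=${target#/dev/mapper/} ;;
        *) return 1 ;;   # no resume= argument, or not a mapped device
    esac
    # A key field of /dev/urandom or /dev/random means the key is volatile.
    awk -v name="$name" '
        /^[[:space:]]*#/ { next }
        $1 == name && ($3 == "/dev/urandom" || $3 == "/dev/random") { hit = 1 }
        END { exit hit ? 0 : 1 }
    ' "$crypttab"
}

# Example usage (commented out):
#   resume_on_volatile_swap "$(cat /proc/cmdline)" /etc/crypttab \
#       && echo "warning: resume= points to a volatile swap"
```

With the crypttab from the description and resume=/dev/mapper/cr_swap on the command line, the function returns success, which is the case that should be flagged. A LUKS entry with a persistent key would not match.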

TW PR: https://github.com/openSUSE/dracut/pull/171
Comment 10 Neil Rickert 2022-04-11 18:59:31 UTC
Created attachment 858046 [details]
Yast logs from an install of Build208.2 (Leap 15.4 Beta)

>I cannot reproduce
>it with Leap 15.4 Build208.2, so I must assume that the installer issue is
>already fixed.

No, the installer is not fixed.  I did test this with Build208.2.

I just did another test, again with Build208.2.  And again, the problem occurred.  My install took the defaults for booting, and these included the "resume=" parameter.  I am attaching the Yast logs for this install.
Comment 11 Antonio Feijoo 2022-04-12 06:31:08 UTC
(In reply to Neil Rickert from comment #10)
> Created attachment 858046 [details]
> Yast logs from an install of Build208.2 (Leap 15.4 Beta)
> 
> >I cannot reproduce
> >it with Leap 15.4 Build208.2, so I must assume that the installer issue is
> >already fixed.
> 
> No, the installer is not fixed.  I did test this with Build208.2.
> 
> I just did another test, again with Build208.2.  And again, the problem
> occurred.  My install took the defaults for booting, and these included the
> "resume=" parameter.  I am attaching the Yast logs for this install.

I searched the YaST logs and can't find any reference to resume=/dev/mapper/cr_swap on the kernel command line. Anyway, there is no dracut issue here, so I'll pass this bug to the YaST team so they can verify whether the installer part is solved and close it.