Bug 1213227 - Recent update broke boot for me, raid not assembled, probably initramfs issue, very custom disk/partition layout
Summary: Recent update broke boot for me, raid not assembled, probably initramfs issue...
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Priority: P5 - None    Severity: Normal
Target Milestone: ---
Assignee: dracut maintainers
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-11 20:06 UTC by Eric van Blokland
Modified: 2024-05-17 20:50 UTC
CC: 6 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
antonio.feijoo: SHIP_STOPPER-
antonio.feijoo: CCB_Review-


Attachments
Output from lsblk (3.14 KB, text/plain)
2023-07-11 20:06 UTC, Eric van Blokland

Description Eric van Blokland 2023-07-11 20:06:15 UTC
Created attachment 868152 [details]
Output from lsblk

After I updated last week I can no longer boot my Tumbleweed. Switching kernels does not appear to help. Something is wrong with the initramfs, but I've been unable to determine the exact cause.

Issues started after updating on 2023-07-04.
Previous update at 2023-06-24, after which everything was fine.

Booting any kernel drops to the emergency shell because of missing disks. I've determined that the main culprit is an md RAID1 array not being started. The array itself is fine; in the emergency shell I can start it manually using "mdadm --assemble --scan" or "mdadm --assemble /dev/md/system".

Simplified disk layout (lsblk attached)

sda1 -> vfat/efi boot  
sda2 -> raid1 /dev/md/boot
sda3 -> raid1 /dev/md/system

sdb2 -> raid1 /dev/md/boot
sdb3 -> raid1 /dev/md/system

/dev/md/system -> luks cr-system

cr-system -> volume group

-----------

I've noticed a change in the dracut kernel cmdline before and after the update: the entry for the RAID containing the encrypted file system is dropped, and instead I get two extra duplicates of the entry for the LUKS device.
I've tried to manually add the missing RAID back to the dracut kernel cmdline, but then I get an explicit error that there is no device with that UUID. I can still start the array manually.

Before: 

rd.driver.pre=btrfs
 rd.luks.uuid=luks-630b0167-da92-4009-ac8f-92f8f48c1206
 rd.lvm.lv=system/os   rd.lvm.lv=system/swap  
 rd.md.uuid=eeba5271:9fe5c2a5:4817b844:89ba3077  rd.md.uuid=49ed23b9:3bc8323b:7588fac8:34380f75 
 resume=UUID=141d070b-87ac-4770-9130-7d5dfbdfbdd1
 root=UUID=5373d6d7-8a50-4e03-a066-186f575de289 rootfstype=btrfs rootflags=rw,relatime,ssd,space_cache,subvolid=264,subvol=/@/.snapshots/1/snapshot,subvol=@/.snapshots/1/snapshot

After:

rd.driver.pre=btrfs
 rd.luks.uuid=luks-630b0167-da92-4009-ac8f-92f8f48c1206 rd.luks.uuid=luks-630b0167-da92-4009-ac8f-92f8f48c1206 rd.luks.uuid=luks-630b0167-da92-4009-ac8f-92f8f48c1206
 rd.lvm.lv=system/swap   rd.lvm.lv=system/os  
 rd.md.uuid=eeba5271:9fe5c2a5:4817b844:89ba3077 
 resume=UUID=141d070b-87ac-4770-9130-7d5dfbdfbdd1
 root=UUID=5373d6d7-8a50-4e03-a066-186f575de289 rootfstype=btrfs rootflags=rw,relatime,ssd,space_cache,subvolid=264,subvol=/@/.snapshots/1/snapshot,subvol=@/.snapshots/1/snapshot
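The duplication in the "After" line is easy to confirm mechanically. A quick sketch (the cmdline string below reproduces just the rd.* options from the "After" output above):

```shell
# Count how often each rd.* option occurs in the dracut-generated cmdline.
# The string below copies the rd.* options from the "After" output.
cmdline='rd.driver.pre=btrfs rd.luks.uuid=luks-630b0167-da92-4009-ac8f-92f8f48c1206 rd.luks.uuid=luks-630b0167-da92-4009-ac8f-92f8f48c1206 rd.luks.uuid=luks-630b0167-da92-4009-ac8f-92f8f48c1206 rd.lvm.lv=system/swap rd.lvm.lv=system/os rd.md.uuid=eeba5271:9fe5c2a5:4817b844:89ba3077'

# One option per line, rd.* only, counted per unique value:
echo "$cmdline" | tr ' ' '\n' | grep '^rd\.' | sort | uniq -c
```

On a running system the same check works against the live cmdline with `tr ' ' '\n' < /proc/cmdline | grep '^rd\.' | sort | uniq -c`; here it shows the LUKS UUID three times and only one of the two expected rd.md.uuid entries.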

----------

Finally, I've noticed something strange in the output from blkid. The members of the RAID that won't start are reported as TYPE="crypto_LUKS" instead of "linux_raid_member". Using an old Tumbleweed Live image (at least a year old) I was able to determine that on that system blkid does report these disks as TYPE="linux_raid_member". Could there be a udev-related issue?

blkid /dev/sda2

/dev/sda2: UUID="eeba5271-9fe5-c2a5-4817-b84489ba3077" UUID_SUB="f1377829-2a10-4ef8-142b-631fc5b07525" LABEL="any:boot" TYPE="linux_raid_member" PARTUUID="7f45f070-708e-4e8a-8f60-1cbd8dfca603"

blkid /dev/sda3

/dev/sda3: UUID="630b0167-da92-4009-ac8f-92f8f48c1206" TYPE="crypto_LUKS" PARTUUID="4ee33cec-5d0c-4d52-984a-cc624cb7752e"

fdisk -l /dev/sda

Disk /dev/sda: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: Samsung SSD 870 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: B24C1F5A-FB28-4894-8009-1429D0973EBD

Device       Start        End    Sectors  Size Type
/dev/sda1     2048    1050623    1048576  512M EFI System
/dev/sda2  1050624    2099199    1048576  512M Linux RAID
/dev/sda3  2099200 3438071807 3435972608  1.6T Linux RAID
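When blkid's answer is in doubt like this, pulling TYPE out of the blkid line programmatically, and cross-checking with tools that read the md metadata directly, helps pin it down. A sketch over the two lines quoted above (the helper name is made up for illustration; UUIDs shortened here):

```shell
# Extract the TYPE="..." value from one line of blkid output.
blkid_type() {
    printf '%s\n' "$1" | sed -n 's/.* TYPE="\([^"]*\)".*/\1/p'
}

# The two blkid lines quoted above (UUIDs shortened):
sda2='/dev/sda2: UUID="eeba5271-..." LABEL="any:boot" TYPE="linux_raid_member" PARTUUID="7f45f070-..."'
sda3='/dev/sda3: UUID="630b0167-..." TYPE="crypto_LUKS" PARTUUID="4ee33cec-..."'

blkid_type "$sda2"   # linux_raid_member
blkid_type "$sda3"   # crypto_LUKS  <- should be linux_raid_member

# On the real device, `mdadm --examine /dev/sda3` reads the md superblock
# directly, and `wipefs /dev/sda3` lists every signature with its offset,
# which can show both the LUKS header (at the start) and an md superblock
# at the end of the same partition.
```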

I've been unable to fix this for a week, please help...
Comment 1 Eric van Blokland 2023-07-11 20:35:31 UTC
Regarding the output from blkid:

os-release snippet from my Live Tumbleweed:
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20220124"

Output for blkid /dev/sda3
/dev/sda3: UUID="49ed23b9-3bc8-323b-7588-fac834380f75" UUID_SUB="84d7f60a-4f10-a12a-1a6e-8f78a0777203" LABEL="any:system" TYPE="linux_raid_member" PARTUUID="4ee33cec-5d0c-4d52-984a-cc624cb7752e"

Now using the Live environment I unlock the encrypted disks, start the volume group, mount the root file system and chroot. 

Output for blkid /dev/sda3
/dev/sda3: UUID="630b0167-da92-4009-ac8f-92f8f48c1206" TYPE="crypto_LUKS" PARTUUID="4ee33cec-5d0c-4d52-984a-cc624cb7752e"

I finally figured out how to temporarily boot this system, so for whoever has a similar issue:

1. Edit the kernel options in grub and append "rd.break=initqueue"; this will drop you to the emergency shell early.
2. Log in with your root password.
3. Make the missing disk(s) available (in my case: mdadm --assemble --scan).
4. Type "systemctl default"; if you are fast enough and the timers haven't expired, you might actually get to unlock your LUKS disk.
Comment 2 Antonio Feijoo 2023-07-12 10:48:03 UTC
(In reply to Eric van Blokland from comment #0)

> After I updated last week I can no longer boot my Tumbleweed. Switching
> kernels does not appear to help. Something is wrong with the initramfs, but
> I've been unable to determine the exact cause.
> 
> Issues started after updating on 2023-07-04.
> Previous update at 2023-06-24, after which everything was fine.
> 
> Booting any kernel drops to the emergency shell because of missing disks.
> I've determined that the main culprit is a md raid(1) not being started. The
> raid itself is fine. In the emergency shell I can start the raid manually
> using "mdadm --assemble --scan" or "mdadm --assemble /dev/md/system"

If you cannot boot with an old grub entry using the previous kernel that worked with the 20230624 snapshot, then it must be a dracut issue. But, the latest dracut changes were introduced with the 20230621 snapshot, and you said 20230624 snapshot was working for you, right?

> Finally I've noticed something strange with the output from blkid. The
> members for the raid that wont start get reported as  TYPE="crypto_LUKS"
> instead of "linux_raid_member". Using an old Tumbleweed Live (at least a
> year old) I was able to determine that on that system blkid do report these
> disk as TYPE="linux_raid_member". Could there be an udev related issue?

dracut adds a `rd.md.uuid=` cmdline option for each fs type matching *_raid_member, and it uses `blkid` internally to get that, so maybe the latest util-linux update (20230624 snapshot) has something to do with this issue. Fabian, what do you think?

Could you add `rd.debug rd.udev.log_priority=debug printk.devkmsg=on` to the kernel command line and attach the output of `/run/initramfs/rdsosreport.txt` and `journalctl -b -o short-monotonic`?
Comment 3 Eric van Blokland 2023-07-12 11:08:34 UTC
(In reply to Antonio Feijoo from comment #2)
> (In reply to Eric van Blokland from comment #0)
> 
> > After I updated last week I can no longer boot my Tumbleweed. Switching
> > kernels does not appear to help. Something is wrong with the initramfs, but
> > I've been unable to determine the exact cause.
> > 
> > Issues started after updating on 2023-07-04.
> > Previous update at 2023-06-24, after which everything was fine.
> > 
> > Booting any kernel drops to the emergency shell because of missing disks.
> > I've determined that the main culprit is a md raid(1) not being started. The
> > raid itself is fine. In the emergency shell I can start the raid manually
> > using "mdadm --assemble --scan" or "mdadm --assemble /dev/md/system"
> 
> If you cannot boot with an old grub entry using the previous kernel that
> worked with the 20230624 snapshot, then it must be a dracut issue. But, the
> latest dracut changes were introduced with the 20230621 snapshot, and you
> said 20230624 snapshot was working for you, right?
> 

Yes, 20230624 was working. I am pretty sure I am facing two issues though. 

1. The dracut cmdline is missing the raid, possibly due to the blkid output

2. During boot/initramfs there is a complaint about a missing disk/by-id dev node for the raid uuid (after including it manually in the cmdline)

But I guess these two issues could have the same underlying cause.

> > Finally I've noticed something strange with the output from blkid. The
> > members for the raid that wont start get reported as  TYPE="crypto_LUKS"
> > instead of "linux_raid_member". Using an old Tumbleweed Live (at least a
> > year old) I was able to determine that on that system blkid do report these
> > disk as TYPE="linux_raid_member". Could there be an udev related issue?
> 
> dracut add a `rd.md.uuid=` cmdline option for each fs type with
> *_raid_member, and it uses `blkid` internally to get that, so maybe the
> latest util-linux update (20230624 snapshot) has anything to do with this
> issue. Fabian, what do you think?
> 
> Could you add `rd.debug rd.udev.log_priority=debug printk.devkmsg=on` to the
> kernel command line and attach the output of
> `/run/initramfs/rdsosreport.txt` and `journalctl -b -o short-monotonic`?

I will be able to provide the additional logs later this evening.
Comment 4 Fabian Vogt 2023-07-12 14:13:55 UTC
I created a similar setup in a VM, which ends up using a similar dracut cmdline with no MD UUIDs but multiple LUKS ones. It still boots fine though, probably because the MD array has auto assembly enabled or something. Do you have that disabled?

In any case, the blkid misdetection needs to be fixed. That turned out to be a regression upstream, I created a revert PR: https://github.com/util-linux/util-linux/pull/2373
Comment 5 Eric van Blokland 2023-07-12 14:49:01 UTC
(In reply to Fabian Vogt from comment #4)
> I created a similar setup in a VM, which ends up using a similar dracut
> cmdline with no MD UUIDs but multiple LUKS ones. It still boots fine though,
> probably because the MD array has auto assembly enabled or something. Do you
> have that disabled?

If I've read the dracut manual correctly, having rd.md.uuid in the cmdline enables assembly for only the specified UUID(s). I believe that without specific UUIDs, auto-assembly is enabled unless rd.md=0 is specified.
Since I have another RAID that is detected correctly, rd.md.uuid is added to the cmdline. It doesn't appear to be possible to discard the automatically generated cmdline, so I always end up with at least one rd.md.uuid.
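The rule described above can be sketched as a small shell function (an illustration of the documented rd.md behaviour, not dracut's actual code; the function name is made up):

```shell
# Would the initramfs try to assemble the array with the given UUID,
# per the rd.md rules as read from the dracut manual?
should_assemble() {
    cmdline="$1"; uuid="$2"
    case " $cmdline " in
        *" rd.md=0 "*) return 1 ;;                 # md assembly disabled
    esac
    if printf '%s\n' "$cmdline" | grep -q 'rd\.md\.uuid='; then
        # At least one UUID listed: only listed arrays are assembled.
        printf '%s\n' "$cmdline" | grep -q "rd\.md\.uuid=$uuid"
    else
        return 0                                   # no list: auto-assemble
    fi
}

# The "After" cmdline lists only the boot array's UUID, so the system
# array is skipped while the boot array is assembled:
should_assemble "rd.md.uuid=eeba5271:9fe5c2a5:4817b844:89ba3077" \
                "49ed23b9:3bc8323b:7588fac8:34380f75" \
    && echo assembled || echo skipped              # -> skipped
```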

> In any case, the blkid misdetection needs to be fixed. That turned out to be
> a regression upstream, I created a revert PR:
> https://github.com/util-linux/util-linux/pull/2373

Nice find. I'm pretty sure my superblock is at the end of the device.

Is it correct to assume that, due to the blkid behaviour, udev no longer automatically creates the disk/by-id device node? If so, all my issues are explained and I guess you no longer need the logs.

Regarding the discussion in the pull request: 

I guess probing beyond the start header should be disallowed for certain devices. Is the BLKID_FL_OPAL_LOCKED flag available in the blkid_probe parameter? If so, reading beyond the header could be skipped in linux_raid.c when that flag is set.
Comment 6 Fabian Vogt 2023-07-13 06:39:16 UTC
(In reply to Eric van Blokland from comment #5)
> (In reply to Fabian Vogt from comment #4)
> > I created a similar setup in a VM, which ends up using a similar dracut
> > cmdline with no MD UUIDs but multiple LUKS ones. It still boots fine though,
> > probably because the MD array has auto assembly enabled or something. Do you
> > have that disabled?
> 
> If I've read the dracut manual correctly, having rd.md.uuid in the cmdline
> enables assembly for only the specified uuid(s). I believe without specific
> uuids auto assembly is enabled unless rd.md=0 is specified.
> Since I have another raid that is detected correctly, rd.md.uuid is added to
> cmdline. It doesn't appear to be possible to discard the automatically
> generated cmdline, so I always end up with at least one rd.md.uuid.
> 
> > In any case, the blkid misdetection needs to be fixed. That turned out to be
> > a regression upstream, I created a revert PR:
> > https://github.com/util-linux/util-linux/pull/2373
> 
> Nice find. I'm pretty sure my superblock is at the end of the device.
> 
> Is it correct to assume that due to the blkid behaviour udev does no longer
> automatically create the disk/by-id device node? If so all my issues are
> explained and I guess you no longer need the logs.

A test package should be available soon; you can try https://download.opensuse.org/repositories/home:/favogt:/boo1213227/openSUSE_Tumbleweed (add it as a repo, then use zypper dup --from <repo>)

> Regarding the discussion in the pull request: 
> 
> I guess probing beyond the start header should be disallowed for certain
> devices. Is the flag BLKID_FL_OPAL_LOCKED in the blkid_probe parameter? If
> so reading beyond the header could be skipped in linux_raid.c if that flag
> is set.

It looks like the code already handles OPAL detection so it should "just work" (tm) with the change reverted.
Comment 7 Eric van Blokland 2023-07-13 06:59:16 UTC
(In reply to Fabian Vogt from comment #6)
> (In reply to Eric van Blokland from comment #5)
> > (In reply to Fabian Vogt from comment #4)
> > > I created a similar setup in a VM, which ends up using a similar dracut
> > > cmdline with no MD UUIDs but multiple LUKS ones. It still boots fine though,
> > > probably because the MD array has auto assembly enabled or something. Do you
> > > have that disabled?
> > 
> > If I've read the dracut manual correctly, having rd.md.uuid in the cmdline
> > enables assembly for only the specified uuid(s). I believe without specific
> > uuids auto assembly is enabled unless rd.md=0 is specified.
> > Since I have another raid that is detected correctly, rd.md.uuid is added to
> > cmdline. It doesn't appear to be possible to discard the automatically
> > generated cmdline, so I always end up with at least one rd.md.uuid.
> > 
> > > In any case, the blkid misdetection needs to be fixed. That turned out to be
> > > a regression upstream, I created a revert PR:
> > > https://github.com/util-linux/util-linux/pull/2373
> > 
> > Nice find. I'm pretty sure my superblock is at the end of the device.
> > 
> > Is it correct to assume that due to the blkid behaviour udev does no longer
> > automatically create the disk/by-id device node? If so all my issues are
> > explained and I guess you no longer need the logs.
> 
> A test package should be available soon, you can try
> https://download.opensuse.org/repositories/home:/favogt:/boo1213227/
> openSUSE_Tumbleweed (add as repo, use zypper dup --from <repo>)
> 

This resolves both issues.

1. Both raids are automatically added to the cmdline when regenerating the initramfs
2. The device node for the raid in disk/by-id is now present during boot

Lovely, thanks a lot for looking into this.
Comment 8 t neo 2023-07-17 13:19:54 UTC
I want to try the provided fix, following my ticket here: https://bugzilla.opensuse.org/show_bug.cgi?id=1213361

Am I doing something wrong? The URL reports "Resource is no longer available!".
Comment 9 Antonio Feijoo 2023-07-17 13:23:13 UTC
(In reply to t neo from comment #8)
> I want to try the provided fix, following my ticket here:
> https://bugzilla.opensuse.org/show_bug.cgi?id=1213361
> 
> Am I doing something wrong as the URL reports "Resource is no longer
> available!"?

FWIW, snapshot 20230716 already ships util-linux with the fix.
Comment 10 t neo 2023-07-17 13:37:32 UTC
Moved to snapshot 20230716, and my issue is not resolved.
Comment 11 Antonio Feijoo 2023-07-17 14:26:23 UTC
(In reply to t neo from comment #10)
> Moved to snapshot 20230716. And my issue is not resolved.

Hmm, although the patch was already applied, after checking http://download.opensuse.org/tumbleweed/repo/oss/x86_64/ it looks like util-linux is still at the previous 2.39-2.1 version (only util-linux-systemd was upgraded to 2.39-3.1), and its x86_64 build is failing in https://build.opensuse.org/package/show/openSUSE:Factory/util-linux
Comment 12 Fabian Vogt 2023-07-17 15:06:32 UTC
(In reply to Antonio Feijoo from comment #11)
> (In reply to t neo from comment #10)
> > Moved to snapshot 20230716. And my issue is not resolved.
> 
> Hmm, although the patch was already applied, after checking
> http://download.opensuse.org/tumbleweed/repo/oss/x86_64/ it looks like
> util-linux is still in the previous 2.39-2.1 version (only
> util-linux-systemd was upgraded to 2.39-3.1), and its build for x86_64 is
> failing in
> https://build.opensuse.org/package/show/openSUSE:Factory/util-linux

There was indeed a random build failure; it's green now. Snapshot 20230717 will have it.
Comment 13 t neo 2023-07-18 22:00:07 UTC
Works again. Thank you!