Bugzilla – Full Text Bug Listing
| Summary: | kernel: EXT4-fs error (device dm-0): ext4_mark_recovery_complete:6245: comm mount: Orphan file not empty on read-only fs. | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Jonas Kvinge <jonaski> |
| Component: | Kernel:Filesystems | Assignee: | Luis Henriques <lhenriques> |
| Status: | NEW --- | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | jonaski |
| Version: | Current | ||
| Target Milestone: | --- | ||
| Hardware: | aarch64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
|
Description
Jonas Kvinge
2024-06-06 12:32:08 UTC
This seems to point at an issue with e2fsck, which reports a successful check:
> ROOT: clean, 65957/323840 files, 1043456/1306624 blocks
but it then fails mounting.
I guess you may have tried this already but it's worth asking: have you tried to manually rerun e2fsck against that file system? Did it still produce similar output (e.g. that it cleaned inode 18323, 48587, etc.)?
(There's a newer upstream release of e2fsprogs that seems to have a few fixes related to orphan list processing, but unfortunately it's not on TW yet.)
I have run fsck in rescue mode when the system didn't boot automatically, and no issues were found, so I just rebooted the system from dracut-ssh with reboot -f and it boots. I've also added fsck.mode=force so that fsck is run on each boot, and no filesystem issues are found. But the issue has occurred several times on 2 different RPis; it doesn't boot even if filesystem errors are corrected.

Awesome, thanks for confirming. So, here's my theory: e2fsck _claims_ to have fixed the issue. Then, when the filesystem is being mounted, the kernel detects inconsistencies (because e2fsck didn't fix them). And, because the filesystem is being mounted read-only, the kernel cannot fix it either and simply aborts the operation. It looks like, by re-running e2fsck, the filesystem is finally fixed, so it might be something "simple", but I'll need to dig deeper and try to reproduce it. Thank you for reporting, I'll see what I can find and report back here.

Here's my understanding of what happens: there was an application (or applications) that had files open, and those files were deleted while still open. This means that these files were added to the 'orphan files' list, and the power outage happened while this list still had files in it. After a reboot, e2fsck tries to clear this list but doesn't clean the inode that manages the list itself. When the kernel tries to mount the filesystem, it correctly detects that the filesystem still requires recovery and, because it is being mounted read-only, it cannot do the recovery itself and fails. Later, when you manually run e2fsck, it will finally do the full recovery and the filesystem can then be mounted.

All this to say that this seems to be a bug in e2fsck, for which I've sent out a fix upstream[1]. Hopefully it won't take long for a fix to be merged. Again, thank you for your report.

[1] https://lore.kernel.org/all/20240611142704.14307-1-luis.henriques@linux.dev/

Thank you.
After I added "fsck.mode=force" I could no longer reproduce the issue when cutting power, as soon as I removed "fsck.mode=force" it's stuck almost every time the power is cut.

Even if you fixed the issue, is it a good idea to keep fsck.mode=force for machines that are often shut down by cutting power? I assume it does not hurt, since it only takes a few seconds more when booting.

This is what I'm seeing now when running fsck manually:

initramfs-ssh:/root# fsck -f -y /dev/mmcblk0p1
fsck from util-linux 2.40.1
Cannot initialize conversion from codepage 850 to ANSI_X3.4-1968: Invalid argument
Cannot initialize conversion from ANSI_X3.4-1968 to codepage 850: Invalid argument
Using internal CP850 conversion table
fsck.fat 4.2 (2021-01-31)
Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
Automatically removing dirty bit.
*** Filesystem was changed ***
Writing changes.
/dev/mmcblk0p1: 360 files, 3261/40915 clusters

initramfs-ssh:/root# fsck -f -y /dev/mmcblk0p2
fsck from util-linux 2.40.1
e2fsck 1.47.0 (5-Feb-2023)
/dev/mmcblk0p2: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Feature orphan_present is set but orphan file is clean.
Clear? yes
/dev/mmcblk0p2: ***** FILE SYSTEM WAS MODIFIED *****
/dev/mmcblk0p2: 666/131072 files (0.5% non-contiguous), 53121/524288 blocks

initramfs-ssh:/root# fsck -f -y /dev/mapper/rpi_rootfs
fsck from util-linux 2.40.1
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Orphan file (inode 12) block 13 is not clean.
Clear? yes
ROOT: ***** FILE SYSTEM WAS MODIFIED *****
ROOT: 344398/1748736 files (0.2% non-contiguous), 4462948/7064315 blocks

initramfs-ssh:/root# reboot
Failed to set wall message, ignoring: The name org.freedesktop.login1 was not provided by any .service files
Call to Reboot failed: The name org.freedesktop.login1 was not provided by any .service files

(In reply to Jonas Kvinge from comment #5)
> Thank you. After I added "fsck.mode=force" I could no longer reproduce the
> issue when cutting power, as soon as I removed "fsck.mode=force" it's stuck
> almost every time the power is cut.
>
> Even if you fixed the issue, is it a good idea to keep fsck.mode=force for
> machines that are often shut down by cutting power? I assume it does not
> hurt, since it only takes a few seconds more when booting.

I guess that what 'fsck.mode=force' will do is to run e2fsck with '-f' even if the filesystem seems to be clean (but I'm not familiar with the details on how this parameter is handled, probably by systemd). In general, forcing the filesystem check shouldn't be needed in most cases, but it should also be harmless.

On the other hand, my patch to e2fsck is to actually force it to do a full check in the presence of the orphan files inode. So using that kernel parameter seems to be a good workaround to the issue.

(In the meantime, I haven't had any feedback on my patch yet.)
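
For anyone who wants to check an affected machine for the two conditions discussed in this thread, here is a minimal shell sketch. It is not taken from e2fsprogs or systemd: the helper names `cmdline_forces_fsck` and `has_fs_feature` are hypothetical, and the second helper assumes that `dumpe2fs -h` lists the feature on its "Filesystem features:" line under the same name (`orphan_present`) that e2fsck prints in the output above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of any existing tool.

# Succeeds if the kernel command line given as $1 contains the
# fsck.mode=force workaround as a separate, space-delimited argument.
cmdline_forces_fsck() {
    case " $1 " in
        *" fsck.mode=force "*) return 0 ;;
    esac
    return 1
}

# Reads `dumpe2fs -h`-style text on stdin and succeeds if the
# "Filesystem features:" line lists the feature named in $1.
# Assumption: the feature name matches the one e2fsck reports.
has_fs_feature() {
    while IFS= read -r line; do
        case $line in
            "Filesystem features:"*)
                rest=${line#Filesystem features:}
                case " $rest " in
                    *" $1 "*) return 0 ;;
                esac
                return 1
                ;;
        esac
    done
    return 1
}

# Possible usage (run as root; the device name is the one from this report):
#   cmdline_forces_fsck "$(cat /proc/cmdline)" && echo "fsck is forced on boot"
#   dumpe2fs -h /dev/mapper/rpi_rootfs 2>/dev/null \
#       | has_fs_feature orphan_present && echo "orphan_present is still set"
```

Both helpers only read text and never touch the filesystem, so they cannot replace a forced e2fsck run. They only indicate whether the workaround is active and whether the flag is still set.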