Bug 144753

Summary: With SATA modules builtin, boot fails
Product: [openSUSE] SUSE Linux 10.1
Reporter: Jens Axboe <axboe>
Component: Basesystem
Assignee: Kay Sievers <kasievers>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Critical
Priority: P5 - None
CC: hare
Version: Beta 2
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Jens Axboe 2006-01-23 10:47:52 UTC
The fsck script complains about stat() failing on /dev/sda2. This does not happen when libata is loaded as a module.
Comment 1 Jens Axboe 2006-01-23 11:44:41 UTC
Werner, I don't think this is a kernel issue. SUSE 10.0 works just fine with the same kernel.

Can you please at least give a reason when you assign a bug away? Thanks.
Comment 2 Jens Axboe 2006-01-23 11:53:45 UTC
One more data point - /dev/sda2 _does_ show up, just only right after fsck has attempted to open it. Assigning back to basesystem/werner.
Comment 3 Dr. Werner Fink 2006-01-23 12:06:05 UTC
Why is this basesystem? Please tell me the name of the
maintainer of libata.
Comment 4 Jens Axboe 2006-01-23 12:10:52 UTC
Because I think it's a basesystem bug? The device node does show up, it just appears that fsck check happens right before it's there.
Comment 5 Dr. Werner Fink 2006-01-23 12:43:40 UTC
Jens, you do know that there is no devs package anymore and
how large the basesystem is, don't you?
Wild guess: udev problem, but this is only a guess. I have no
idea when and what is loading the libata kernel module,
or whether the device nodes are static entries or rules of
the udev configuration. Maybe this should also be done by
mkinitrd because of the kernel events for this disk,
or by any other boot script. Besides this, if the kernel
does not trigger events for the devices, this would also
be a bug, but in that case it would be a kernel bug.
Comment 6 Jens Axboe 2006-01-23 13:02:25 UTC
I thought that udev and such would fall under the base system umbrella. Dunno if this is a udev bug, or a bad interaction between udev and init scripts. You mention modules, but there are no modules involved; the driver and libata are statically built in. The device entry does show up, as mentioned.
Comment 7 Hannes Reinecke 2006-01-23 13:30:21 UTC
Which fsck? Called from where? /var/log/messages output?

Jens, tststs.
Comment 8 Jens Axboe 2006-01-23 13:34:20 UTC
Hannes, fsck called from boot.localfs. fsck fails with stat() failing to stat /dev/sda2. If I put this in boot.localfs _right_ before fsck is run on non-root devices (/dev/sda2 is /home):

# Create the node by hand if it is not there yet (sda2 is block device 8:2)
if [ ! -b /dev/sda2 ]; then
    mknod /dev/sda2 b 8 2
fi

I get an error from mknod saying the device exists and fsck works fine after that.  A sleep 1 would likely fix it too, haven't tried it though. So it looks like tight timing.
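
The timing theory above could be tested with a short polling loop instead of a fixed sleep. This is only a sketch: the function name wait_for_blockdev and the 10-second default are assumptions for illustration, not something taken from boot.localfs.

```shell
# Hypothetical helper: wait up to $2 seconds (default 10) for $1 to appear
# as a block device. Returns 0 once the node exists, 1 on timeout.
wait_for_blockdev() {
    dev=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        [ -b "$dev" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    # One last check after the final sleep
    [ -b "$dev" ]
}

# Possible use right before the fsck call (placement is an assumption):
# wait_for_blockdev /dev/sda2 10 || echo "/dev/sda2 did not appear" >&2
```

If the node shows up within the timeout, that would support the race explanation. It would only be a workaround, though, not a fix for the ordering problem itself.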

Let me know if you need more info!
Comment 9 Kay Sievers 2006-01-23 13:49:38 UTC
In:
  /etc/init.d/boot.localfs
is:
  # Required-Start: boot.rootfsck

Does changing that to:
  # Required-Start: boot.udev
help?
Comment 10 Kay Sievers 2006-01-23 13:51:59 UTC
Dunno if the depends get updated automatically. Just run:
  insserv
after that.
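The header change can also be scripted, which helps when trying it on several machines. A minimal sketch, assuming the Required-Start line has the form quoted above; set_required_start is a hypothetical helper name, and it only prints the modified script so the result can be reviewed before it is installed.

```shell
# Hypothetical helper: print the init script $1 with its LSB
# "# Required-Start:" line replaced by the dependency list in $2.
set_required_start() {
    sed "s|^# Required-Start:.*|# Required-Start: $2|" "$1"
}

# Possible use (run as root, then refresh the boot order with insserv):
# set_required_start /etc/init.d/boot.localfs boot.udev > /tmp/boot.localfs.new
# cp /tmp/boot.localfs.new /etc/init.d/boot.localfs && insserv
```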
Comment 11 Jens Axboe 2006-01-23 14:13:00 UTC
Seems so, at least the first boot worked fine. I tried perhaps 2-3 boots before that, and all of them failed. Let me try a few more just to be on the safe side.
Comment 12 Jens Axboe 2006-01-23 14:15:58 UTC
Did 2 more, both worked great. Thanks Kay! Closing this one as fixed.
Comment 13 Jens Axboe 2006-01-24 11:37:24 UTC
I'm afraid it happened again, and now it seems to happen consistently on every boot.
Comment 14 Kay Sievers 2006-01-24 19:53:57 UTC
I get something similar with today's update to autobuild. Does commenting out the rule in /etc/udev/rules.d/85-mount-fstab.rules help?
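A sketch of one way to disable the rule without deleting the file. The contents of 85-mount-fstab.rules are not shown in this report, so this simply comments out every active line; comment_out_rules is a hypothetical helper name.

```shell
# Hypothetical helper: read a udev rules file on stdin and print it with
# every active line (not blank, not already a comment) prefixed by "#".
comment_out_rules() {
    sed -e '/^[[:space:]]*$/b' -e '/^[[:space:]]*#/b' -e 's/^/#/'
}

# Possible use (keep a copy of the original so the change is easy to revert):
# f=/etc/udev/rules.d/85-mount-fstab.rules
# cp "$f" /root/85-mount-fstab.rules.orig
# comment_out_rules < /root/85-mount-fstab.rules.orig > "$f"
```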
Comment 15 Jens Axboe 2006-01-26 19:43:58 UTC
Made the change, will try and reboot the laptop 10 times :-)
Comment 16 Jens Axboe 2006-01-26 19:53:48 UTC
Seems to happen on every boot with that rule commented out, but I switched to beta2 meanwhile so that could mix things up. beta2 continues the boot when fsck errors on /dev/sda2 not being there which does help me, but I'm not sure it's a great idea :)
Comment 17 Jens Axboe 2006-02-01 10:34:20 UTC
Kay, same thing happens on an install of SLES10 beta3 I just did. This one uses the megaraid kernel module, and it fails fsck of /dev/sda1 (the root fs) and thus boots with / RO and nothing works.

Any ideas? This really hinders testing of kernels, so I'd say it's quite important that we find a fix for this ASAP.
Comment 18 Kay Sievers 2006-02-01 10:58:12 UTC
This is a kernel without any initramfs, right?

/dev/sda1 is the rootfs that the kernel itself has mounted, right?

fsck /dev/sda1 fails, cause it misses the device node or something else?
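To tell a missing device node apart from some other failure, a small check run just before fsck could log the state of the node. This is an illustrative sketch; node_state is a hypothetical helper and not part of any boot script.

```shell
# Hypothetical helper: report whether $1 is a block device node,
# exists as something else, or is missing entirely.
node_state() {
    if [ -b "$1" ]; then
        echo block
    elif [ -e "$1" ]; then
        echo other
    else
        echo missing
    fi
}

# Possible use in a boot script:
# echo "/dev/sda1 is: $(node_state /dev/sda1)"
```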
Comment 19 Jens Axboe 2006-02-01 11:03:23 UTC
Yes on all three questions. mount/fsck complains about /dev/sda1 not being there. Which one depends on whether I touch'ed /fastboot or not.
Comment 20 Jens Axboe 2006-02-01 12:29:08 UTC
Kay, do you have a quick work-around I can use until you get this fixed? I badly need to test stuff today, and so far I wasted half a day on this already...
Comment 21 Jens Axboe 2006-02-01 12:35:17 UTC
BTW, I already tried the suggestions listed here for 10.1, none of them work for me (add boot.udev as required start for boot.rootfs and commenting out that line in /etc/udev/rules.d/85-mount-fstab.rules).
Comment 22 Kay Sievers 2006-02-08 19:15:23 UTC
Jens, this is fixed, right?
Comment 23 Jens Axboe 2006-02-13 10:47:45 UTC
Yes, I think so! Feel free to close it.
Comment 24 Kay Sievers 2006-02-13 12:22:06 UTC
Closing.