Bug 325552

Summary: system unbootable (attempt to access beyond end of device)
Product: [openSUSE] openSUSE 10.3 Reporter: Miquel A. Noguera <ibz>
Component: Installation Assignee: Tejun Heo <teheo>
Status: RESOLVED FIXED QA Contact: Jiri Srain <jsrain>
Severity: Blocker    
Priority: P5 - None CC: asklein, coolo, forgotten_eTk6BCeiKJ, forgotten_xI2C5NvggO, michel.munnix, mike, vetter
Version: RC 1 Flags: coolo: SHIP_STOPPER+
Target Milestone: ---   
Hardware: i686   
OS: openSUSE 10.3   
Whiteboard:
Found By: Beta-Customer Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: hwinfo --all
YaST log files
fdisk -l
tar -cvzf kmsg.tar.gz kmsg.log
libata-fix-set_max_sectors
tar -cvzf boot_msg.tar.gz boot.msg

Description Miquel A. Noguera 2007-09-16 09:34:06 UTC
After a clean Beta3-DVD installation, my system is unbootable from the hard disk.

The boot process shows a lot of "attempt to access beyond end of device" lines and finishes with:

Waiting for device /dev/disk/by-id/scsi-SATA_WDC_WD800JB-00CWD-WCA8E6665404-part5 to appear........................Could not find /dev/disk/by-id/scsi-SATA_WDC_WD800JB-00CWD-WCA8E6665404-part5
Want me to fall back to /dev/sda5? /Y/n)
y
Waiting for device /dev/sda5 to appear............ not found - exiting to /bin/sh




Booting from DVD, launching installed system is possible.

I activated my network connection during the installation phase and the installer picked some packages from the internet.
Comment 1 Miquel A. Noguera 2007-09-16 09:34:57 UTC
Created attachment 172693 [details]
hwinfo --all
Comment 2 Miquel A. Noguera 2007-09-16 09:35:34 UTC
Created attachment 172694 [details]
YaST log files
Comment 3 Stephan Kulow 2007-09-16 09:43:42 UTC
What does fdisk -l /dev/sda say?
Is this the first beta you have tried, or did you have problems before?
Comment 4 Stephan Kulow 2007-09-16 09:44:22 UTC
don't mess around with priorities, please!
Comment 6 Miquel A. Noguera 2007-09-16 10:05:49 UTC
Created attachment 172695 [details]
fdisk -l

Disk /dev/sda: 80.0 GB, 80025280000 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x6f99b258

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        3264    26218048+   7  HPFS/NTFS
/dev/sda2            3265        5875    20972857+  83  Linux
/dev/sda3            5876        8486    20972857+  83  Linux
/dev/sda4            8487        9729     9984397+   f  W95 Ext'd (LBA)
/dev/sda5            8487        9729     9984366   83  Linux

Disk /dev/sdb: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00017d2c

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         523     4200966   82  Linux swap / Solaris
/dev/sdb2             524        9729    73947195   83  Linux
Comment 7 Miquel A. Noguera 2007-09-16 10:15:32 UTC
I have tested all releases since alpha5 on 4 different PCs with no boot problems.

Beta3-DVD is the first beta I installed in this problematic box.

Comment 8 Miquel A. Noguera 2007-09-16 10:22:14 UTC
Installation on my laptop (from the same media) was successful.

Comment 9 Miquel A. Noguera 2007-09-16 12:52:09 UTC
Since the problematic box has a sis5513 chipset, maybe this bug is related to 308384.

BTW, I'm running another Beta3 installation on a different box with the same mobo and it boots fine (I don't remember which media I used to install).

https://bugzilla.novell.com/show_bug.cgi?id=308384

Comment 10 Stephan Kulow 2007-09-16 17:18:58 UTC
ok, it doesn't sound too severe then. Even though we still may have a driver issue
Comment 11 Tejun Heo 2007-09-17 08:41:02 UTC
Can you please post kernel dmesg of the failing boot?  You can either use netconsole or serial console.  Thanks.
Comment 12 Miquel A. Noguera 2007-09-17 09:31:56 UTC
Not familiar with netconsole/serial console but with keyboard ;-)

This is a manually copied version:

preping 03-storage.sh
running 03-storage.sh
preping 04-udev.sh
preping 04-udev.sh
Creating device nodes with udev
preping 05-blogd.sh
running 05-blogd.sh
preping 11-block.sh
running 11-block.sh
preping 11-usb.sh
running 11-usb.sh
preping 21-devinit_done.sh
running 21-devinit_done.sh
preping 81-kdump.sh
running 81-kdump.sh
preping 82-resume-userspace.sh
running 82-resume-userspace.sh
Trying manual resume from /dev/sdb1
Invoking userspace resume from /dev/sdb1
resume: could not stat configuration file
resume: libcrypt version: 1.2.4
preping 83-resume.kernel.sh
running 83-resume.kernel.sh
Trying manual resume from /dev/sdb1
preping 84-mount.sh
running 84-mount.sh
Waiting for device /dev/sda5 to appear..........Could not find /dev/sda5
Want me to fall back /dev/sda5 ?
Waiting for device /dev/sda5 to appear..........not found -- exiting to /bin/sh
sh: no job control in this shell
Comment 13 Tejun Heo 2007-09-17 09:42:59 UTC
Is the log from booting the installation media or from after installation?  If that happens while booting from the installation media, please switch to command console (ctrl-alt-f9), plug in a usb memory stick, mount it to /mnt and run the following commands.

# cp /var/log/boot.msg /mnt
# hwinfo --all > /mnt/hwinfo.log

And post the resulting files here.  Also, digital cameras are good enough and much less painful when you can't record kernel log remotely.
Comment 14 Thomas Fehr 2007-09-17 10:07:38 UTC
I had a look at the y2log files and could not see any problem there.
Parted correctly detected partition sizes.
Formatting and mounting of sda5 and sda2 succeeded without problems.
/etc/fstab also looks fine:

/dev/disk/by-id/scsi-SATA_WDC_WD800JB-00CWD-WCA8E6665404-part5 /                ext3 acl,user_xattr 1 1
/dev/disk/by-id/scsi-SATA_WDC_WD800JB-00CWD-WCA8E6665404-part2 /home                ext3 acl,user_xattr 1 2
/dev/disk/by-id/scsi-SATA_WDC_WD800JB-00CWD-WCA8E6665404-part1 /windows/C           ntfs-3g users,gid=users,fmask=133,dmask=022,locale=es_ES.UTF-8 0 0
/dev/disk/by-id/scsi-SATA_WDC_WD800BB-00FWD-WMAJD2057107-part1 swap                 swap defaults 0 0
proc                 /proc                proc       defaults 0 0
sysfs                /sys                 sysfs      noauto   0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
usbfs                /proc/bus/usb        usbfs      noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0
/dev/fd0             /media/floppy        auto       noauto,user,sync      0 0

So this should be either an initrd or kernel issue.
Comment 15 Miquel A. Noguera 2007-09-18 19:10:38 UTC
I have downgraded to 2.6.22.3-7-bigsmp (from Beta2 DVD) and system works again.
Comment 16 Miquel A. Noguera 2007-09-18 19:24:41 UTC
With the installation media, system boots fine too
Comment 17 Tejun Heo 2007-09-19 04:53:04 UTC
On beta3, HPA is unlocked by default.  That could be causing problems.  Miquel, can you please post boot log from the installation media?  Also, if you enter partitioning menu during installation, can you see all the partitions okay?
Comment 18 Thomas Fehr 2007-09-19 09:19:40 UTC
From what I can see in the y2log files partitions are detected fine.
The last partition on each disk ends on the last disk cylinder, as one would
expect in a standard setup.
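
The geometry check described here can be redone by hand from the fdisk -l output in comment 6. The following is a minimal Python sketch, not anything YaST or parted actually runs; the dictionary layout and function names are made up for illustration, and only the numbers come from the report.

```python
# Values copied from the fdisk -l output in comment 6.
# One cylinder = 255 heads * 63 sectors/track * 512 bytes.
CYLINDER_BYTES = 16065 * 512

# Hypothetical structure: disk size in bytes and the end cylinder of the
# last partition, as listed by fdisk.
DISKS = {
    "sda": {"bytes": 80025280000, "last_partition_end_cyl": 9729},
    "sdb": {"bytes": 80026361856, "last_partition_end_cyl": 9729},
}


def whole_cylinders(disk_bytes):
    """Number of complete cylinders that fit on a disk of this size."""
    return disk_bytes // CYLINDER_BYTES


def table_fits(disk):
    """True if the last partition ends at or before the last whole cylinder."""
    return disk["last_partition_end_cyl"] <= whole_cylinders(disk["bytes"])


for name, disk in DISKS.items():
    print(name, whole_cylinders(disk["bytes"]), table_fits(disk))
```

With the sizes seen at installation time, both partition tables fit inside their disks, which matches the reading of the y2log files.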
Comment 19 Tejun Heo 2007-09-19 09:36:16 UTC
Thanks, Thomas.  It's not really a driver problem either then.  The installation media and installed system use the same kernel.  Only initrd is different.  If the kernel can detect the device fine when booted from installation media, it should do fine from installed system too.  I'd really like to see the failing boot log.  Miquel, can you please remove "splash=silent" from kernel boot parameter and take a picture of screen during the failing boot?
Comment 20 Thomas Fehr 2007-09-19 09:47:26 UTC
As I already said, it could also be an initrd issue. This should be decidable
by looking into /proc/partitions from the emergency shell after a failed boot.
If the partitions are present in /proc/partitions, it could be a udev issue or a
broken initrd.
It could also be that the initrd loads different drivers than were loaded
during installation.

y2logmkinitrd contains the following:
Kernel image:   /boot/vmlinuz-2.6.22.5-16-bigsmp
Initrd image:   /boot/initrd-2.6.22.5-16-bigsmp
Root device:    /dev/disk/by-id/scsi-SATA_WDC_WD800JB-00CWD-WCA8E6665404-part5 (/dev/sda5) (mounted on / as ext3)
Kernel Modules: processor thermal scsi_mod libata pata_sis fan jbd mbcache ext3 edd sd_mod usbcore ohci-hcd uhci-hcd ehci-hcd ff-memless hid usbhid 
Features:       block usb resume.userspace resume.kernel
Bootsplash:     SuSE (1280x1024)
17880 blocks
ERROR: Bootloader::Library::SetLoaderType: Initializing for unknown bootloader 
ERROR: Bootloader::Core::ListFiles: Running generic function, it should never be called
ERROR: Bootloader::Core::ParseLines: Running generic function, it should never be called

No idea if the lines starting with ERROR (that are normally not there) have 
anything to do with this problem.
Comment 21 Tejun Heo 2007-09-19 09:50:50 UTC
Thanks again, Thomas.  Miquel, from the emergency shell, please run

# cat /proc/partitions
# ls /sys/bus/pci/drivers/
# dmesg

and report the result.  Thanks.
Comment 22 Miquel A. Noguera 2007-09-19 11:44:15 UTC
with 2.6.22.3-7 (from media Beta2-dvd, manually installed)

   * Everything ok


with 2.6.22.5-10 (from media Beta3-dvd, manually installed)

   * Everything ok (but a problem accessing the DVD drive after the media
     check screen)


with 2.6.22.5-21 (Factory update)

   * Kernel panic


with 2.6.22.5-16 (picked automatically from online repositories during installation)

   * cat /proc/partitions
     
     major  minor  #blocks   name
        8      0    2653272  sda
        8      1   26218048  sda1
        8      2   20972857  sda2
        8      3   20972857  sda3
        8      4          1  sda4
        8     16   78150744  sdb
        8     17    4200966  sdb1
        8     18   73947195  sdb2

   * ls /sys/bus/pci/drivers
    
     ehci_hcd imsttfb ohci_hcd pata_sis pcieport-driver serial

   * dmesg
 
     sh: dmesg: command not found

   * cat /var/log/boot.msg -> as described in comment #12
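
The /proc/partitions listing above already shows the inconsistency: sda is reported with fewer blocks than its own partitions. A small Python sketch of that comparison follows. The parsing helper is hypothetical and assumes the simple sdX/sdXN naming seen here; only the figures are taken from the listing.

```python
# Listing copied from the 2.6.22.5-16 emergency shell output above.
PROC_PARTITIONS = """\
major  minor  #blocks   name
   8      0    2653272  sda
   8      1   26218048  sda1
   8      2   20972857  sda2
   8      3   20972857  sda3
   8      4          1  sda4
   8     16   78150744  sdb
   8     17    4200966  sdb1
   8     18   73947195  sdb2
"""


def parse_partitions(text):
    """Map device name -> size in 1 KiB blocks, skipping the header line."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0].isdigit():
            sizes[fields[3]] = int(fields[2])
    return sizes


def oversized_partitions(sizes):
    """Return partitions that are larger than the disk they belong to."""
    result = []
    for name, blocks in sizes.items():
        disk = name.rstrip("0123456789")  # "sda1" -> "sda" (assumed naming)
        if disk != name and disk in sizes and blocks > sizes[disk]:
            result.append(name)
    return sorted(result)


print(oversized_partitions(parse_partitions(PROC_PARTITIONS)))
```

For this listing the sketch flags sda1, sda2 and sda3, while sdb looks sane. sda5 does not appear in the listing at all.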

Comment 23 Tejun Heo 2007-09-19 12:32:44 UTC
I see, the kernel is updated during installation.  Yeah, this looks like a driver problem.

In the emergency shell, please run...

# (while read line; do echo $line; sleep .1; done) < /proc/kmsg &

This will give you slowly scrolling kernel boot messages.  You can also create a directory (/mnt), mount a partition there (probably /dev/sdb2) and redirect the output to a file but due to lack of job control and because /proc/kmsg is emptied once read, it can be a bit tricky.

While scrolling, you can pause the messages by pressing the "Pause/Break" key and resume them by pressing it again.  If copying doesn't work, just pick up a digital camera and take shots of the logs and post them here.

Thanks.
Comment 24 Miquel A. Noguera 2007-09-19 16:06:16 UTC
Created attachment 173402 [details]
tar -cvzf kmsg.tar.gz kmsg.log

(while read line; do echo $line >> kmsg.log ; done) < /proc/kmsg &
Comment 25 Miquel A. Noguera 2007-09-21 08:44:36 UTC
I have two boxes with the same motherboard, but with different BIOS versions, and the hard disks are also different.

In one of them, no kernel > 2.6.22.5-10 works.

In the other one, however, 2.6.22.5-25 works again.

On the problematic machine kernel 2.6.22.5-10 works well, but zypper wants to remove it on each update :-(

I have worked around the problem by recompiling this kernel on my machine and installing it with make install.



Comment 26 Tejun Heo 2007-09-21 09:03:46 UTC
*** Bug 326887 has been marked as a duplicate of this bug. ***
Comment 27 Tejun Heo 2007-09-21 09:04:53 UTC
This bug is also present on RC1 and we can't release SL103 with this bug.  Bumping up to BLOCKER.
Comment 28 Stephan Kulow 2007-09-21 09:08:56 UTC
do you have a fix?
Comment 29 Tejun Heo 2007-09-21 09:19:53 UTC
Not yet.  Still looking into the problem.
Comment 30 Tejun Heo 2007-09-21 09:36:03 UTC
Okay, got it.  Will soon post a patched kernel for testing.
Comment 31 Tejun Heo 2007-09-21 10:23:52 UTC
Please test the following kernel and report the result.

  http://htj.dyndns.org/kernel-default-2.6.22.5-HPA_debug.i586.rpm

Thanks.
Comment 32 Tejun Heo 2007-09-21 10:24:35 UTC
Created attachment 173823 [details]
libata-fix-set_max_sectors

This is the patch to fix the problem (included in the debug kernel).
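
The patch itself is not reproduced in this report, so the following is only an observation drawn from the numbers posted in comments 6 and 22, not a description of what the patch changes. The truncated sda size seen with the failing kernel happens to equal the low 24 bits of the sector count of the other 80 GB disk (sdb). The sketch assumes that sda's native size matches sdb's, which the report does not state.

```python
SECTOR_BYTES = 512
KIB = 1024


def blocks_to_sectors(blocks_1k):
    """Convert /proc/partitions 1 KiB blocks to 512-byte sectors."""
    return blocks_1k * KIB // SECTOR_BYTES


# From comment 22 (failing kernel 2.6.22.5-16).
sda_reported_sectors = blocks_to_sectors(2653272)
sdb_sectors = blocks_to_sectors(78150744)

# From comment 6 (fdisk -l under a working kernel).
sda_fdisk_sectors = 80025280000 // SECTOR_BYTES

# Assumption: sda's native capacity equals sdb's sector count.
low_24_bits = sdb_sectors & 0xFFFFFF

print(hex(sdb_sectors), hex(sda_reported_sectors))
print(low_24_bits == sda_reported_sectors)
print(sdb_sectors - sda_fdisk_sectors)  # sectors not visible on sda in fdisk
```

If that assumption holds, the failing kernel saw a disk size with everything above bit 23 dropped, which would fit a problem in how the maximum sector address is set when the HPA is unlocked.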
Comment 33 Miquel A. Noguera 2007-09-21 11:02:11 UTC
kernel-default-2.6.22.5-HPA_debug.i586.rpm boots fine in my problematic box :-)

Great job. Thanks.
Comment 34 Tejun Heo 2007-09-21 11:05:28 UTC
Thanks for testing.  I'll forward the patch to mainline and commit it to kernel CVS after getting ACK from internal patch review.  Final SL103 will install and run fine on your machine.  I will close the bug once the patch is committed.  Thanks.
Comment 35 Tejun Heo 2007-09-21 11:08:37 UTC
Miquel, can you please post /var/log/boot.msg from the debug kernel?
Comment 36 Miquel A. Noguera 2007-09-21 11:14:47 UTC
Created attachment 173833 [details]
tar -cvzf boot_msg.tar.gz boot.msg
Comment 37 Tejun Heo 2007-09-21 22:07:06 UTC
Patch committed.  Resolving as FIXED.  Thanks.
Comment 38 Jiri Kosina 2007-09-24 11:50:43 UTC
*** Bug 327518 has been marked as a duplicate of this bug. ***
Comment 39 Tejun Heo 2007-09-24 22:11:16 UTC
*** Bug 327612 has been marked as a duplicate of this bug. ***
Comment 40 Jiri Kosina 2007-09-26 09:20:44 UTC
*** Bug 327513 has been marked as a duplicate of this bug. ***
Comment 41 Jiri Kosina 2007-09-26 09:32:51 UTC
*** Bug 327846 has been marked as a duplicate of this bug. ***
Comment 42 Jiri Kosina 2007-10-02 16:22:02 UTC
*** Bug 329878 has been marked as a duplicate of this bug. ***
Comment 43 Christopher Coray 2008-03-16 08:34:36 UTC
The link to http://htj.dyndns.org/kernel-default-2.6.22.5-HPA_debug.i586.rpm is producing an "object not found" message. I'm experiencing the same problem with openSUSE 10.3 on a Compaq Armada m700 (coincidentally, an old Novell laptop that was refurbished/resold).

I'm not sure about regulations on putting kernel fixes on the Novell server, but it should probably be available for a while. I tried searching for the file on Google, but apparently there were no mirrors.

Other than this one problem, openSUSE would have been the absolute easiest OS install I've ever done.
Comment 44 Tejun Heo 2008-03-16 08:44:37 UTC
If you haven't installed yet, the ISOs in the following directories will help.

  http://htj.dyndns.org/export/kiso/

If you already have installed, the fix will be delivered to you when you update the kernel.