Bugzilla – Full Text Bug Listing

| Summary: | 11.0 beta3 freeze 75% of the time at start of boot | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.0 | Reporter: | Lee Matheson <lee_matheson> |
| Component: | Kernel | Assignee: | Greg Kroah-Hartman <gregkh> |
| Status: | RESOLVED INVALID | QA Contact: | Jiri Srain <jsrain> |
| Severity: | Blocker | ||
| Priority: | P5 - None | CC: | kdmoeller, lee_matheson, nikok79 |
| Version: | RC 1 | Flags: | coolo: SHIP_STOPPER- |
| Target Milestone: | --- | ||
| Hardware: | i686 | ||
| OS: | openSUSE 11.0 | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
Description
Lee Matheson
2008-05-19 17:28:16 UTC
When exactly does the boot freeze? Does it load the bootloader? Could you tell what is the last thing the boot process reports (you can add splash=verbose to the kernel command line to disable the graphical splash screen)?

Per my original post, it freezes IMMEDIATELY after:

"[Linux-initrd @ 0x37b59000,0x496337 bytes]"
"Probing EDD (edd=off to disable)..."

When it freezes the entire screen goes black. This "Probing EDD (edd=off to disable)" scrolls by so fast that I was only able to capture it by taking a video of the boot and then playing it back. When it freezes, there is no grub splash. Just a black screen.

Is there a boot code I can apply (to have things stored in a log file)? Or after a failed boot, would it help if I then do a hardware reset, re-boot to a live CD/DVD (such as knoppix), and then go into the hard drive and copy some openSUSE log file? (If so, which file?)

Some more information on this bug. I am using 11.0 beta3 with kernel (from "uname -a"):

Linux linux 2.6.25.3-2-pae #1 SMP 2008-05-10 07:46:36 +0200 i686 athlon i386 GNU/Linux

The selection of pae was done automatically by the openSUSE installer, which was surprising to me, as this old athlon-1100 PC has only 1 GByte of RAM.

I tried taking a look at the log files after a failed boot, by booting to openSUSE-10.3 (which successfully dual boots 100% of the time on this PC), mounting the 11.0 beta3 / partition, and checking /var/log on the 11.0 beta3 partition. Both "boot.msg" and "messages" had no entries for the failed boot, as their last entries were from the previous shutdown. I checked the time stamps of all log messages in /var/log and there were in fact no log messages with a time stamp that corresponded to the failed boot.

I then tried rebooting back to 11.0 beta3, using the boot code "acpi=off". This made no difference, and the boot still failed. After repeated attempts at a normal boot (without the "acpi=off" boot code), the boot finally did succeed.
Again, openSUSE-10.3 reliably boots 100% of the time. I'll try installing an updated 11.0 beta3 kernel (using the "rpm -ihv" command, and hand editing the /boot/grub/menu.lst file to provide a kernel boot selection), in order to see if an updated kernel makes any difference.

Ok, I went here:
http://download.opensuse.org/distribution/SL-OSS-factory/inst-source/suse/i586/
and downloaded kernel-default-2.6.25.4-2.i586.rpm and also kernel-pae-2.6.25.4-2.i586.rpm. I installed both rpms with "rpm -ivh" such that both kernels were available, in addition to my original kernel. I checked /boot/grub/menu.lst to confirm it allowed me to boot to each of the available kernels.

Some comments: my first attempt to boot to my baseline kernel-pae-2.6.25.3-2 took 14 efforts before the kernel would boot. On 13 attempts it froze as described in a previous post, i.e. the problem is "highly" repeatable. Once it succeeded in booting (on the 14th attempt), I installed the two kernels.

I then rebooted to the "kernel-default-2.6.25.4-2" kernel. It froze just like the "kernel-pae-2.6.25.3-2". I then tried a reboot of the "kernel-default-2.6.25.4-2" with "failsafe" settings. It also froze just like the "kernel-pae-2.6.25.3-2" kernel. I then tried a reboot to the kernel-pae-2.6.25.4-2 kernel. It booted successfully on the 1st attempt, but then failed a reboot test on the 2nd through 5th attempts, and then booted successfully on the 6th attempt. I.e. the 2.6.25.4-2 kernel has similar buggy behaviour to the 2.6.25.3-2 kernel.

I do not know what I can do next. openSUSE-10.3 boots with no problem on this PC. I suppose I could go into the PC BIOS and try to change settings such as Internal Cache (currently writeback), or System BIOS cacheable (currently Enabled), or C000-32k-shadow (currently cached) or APIC Function (currently enabled), etc., but I would like some feedback and suggestions to help me focus on this.
And again, openSUSE-10.3 has no problems with these BIOS settings (neither did 9.3, 10.0, 10.1, nor 10.2). This is time consuming, and if this is a waste of effort, it would be useful if I could be advised of that. Thank you.

Should be fixed in next release.

Thank you, I am glad to read that there is believed to be a resolution planned. I also tried the kernel-pae-2.6.25.4-8.1.i586.rpm from here:
http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_Factory/
and I obtained the same anomalous behaviour, i.e. 4 failed boots (with an identical freeze early in the boot process) followed by one successful boot on the 5th attempt. I'm looking forward to the updated kernel with the fix.

kernel-default-2.6.25.4-9.1 did not work for me either (Vaio vgn-fs740); I hope next time will be better. Can somebody tell me where to find a kernel with sources? I'm on 2.6.25.3-2-default now and I need the kernel sources but don't know where to find them. Thanks

I tried the kernel-pae-2.6.25.4-11, and also kernel-default-2.6.25.4-11, from here:
http://download.opensuse.org/distribution/SL-OSS-factory/inst-source/suse/i586/
They both exhibited the same anomalous behaviour as above, booting successfully only once out of every approximately 5 attempts. They froze during boot at the exact same place every time. I also tried the kernel-pae-2.6.25.4-11 (with acpi=off), the kernel-default-2.6.25.4-11 (with safe settings) and the kernel-vanilla-2.6.25.4-11, and in all cases the same anomalous behaviour was observed, with a freeze in the exact same place. I'm looking forward to RC1 later this week, which I hope has a kernel build that contains the fix.

Can you please try the kernel from ftp://ftp.suse.com/pub/projects/kotd/HEAD/ instead?

Ok, after reading Comment#9 (above) I went to this URL:
ftp://ftp.suse.com/pub/projects/kernel/kotd/HEAD/i386/
and downloaded and installed with "rpm -ivh --oldpackage <kernel-as-below.rpm>":

kernel-default-2.6.25.4-HEAD_20080526132305.i586.rpm 27-May-2008 08:39 21.7M
kernel-pae-2.6.25.4-HEAD_20080526132305.i586.rpm 26-May-2008 16:15 21.8M
kernel-vanilla-2.6.25.4-HEAD_20080526132305.i586.rpm 26-May-2008 16:16 21.7M

installing one at a time, and then checking menu.lst to ensure it was updated to allow a boot. I obtained the same freeze symptoms as before, where I tried booting:

1st: 1 x kernel-default-2.6.25.4-HEAD_20080526132305.i586.rpm
2nd: 1 x kernel-default-2.6.25.4-HEAD_20080526132305.i586.rpm [fail safe settings]
3rd: 1 x kernel-pae-2.6.25.4-HEAD_20080526132305.i586.rpm
4th: 1 x kernel-vanilla-2.6.25.4-HEAD_20080526132305.i586.rpm
6th-9th: 4 x kernel-default-2.6.25.4-HEAD_20080526132305.i586.rpm [ie 4 more attempts of this build]
10th-11th: 2 x kernel-pae-2.6.25.4-HEAD_20080526132305.i586.rpm [ie 2 more attempts of this build]
12th: 1 x kernel-pae-2.6.25.4-HEAD_20080526132305.i586.rpm

... and on this 12th attempt the PC did not freeze in the same repeatable place, but booted successfully (finally). But that's rather unsatisfactory, having to boot 12 times for 1 success. This suggests to me the "HEAD_20080526132305.i586" version of the kernel has the same problem.

I hope I grabbed the correct kernel to test from ftp://ftp.suse.com/pub/projects/kernel/kotd/HEAD/i386/. There were a lot of kernels on that site. I noted there was the newer -2.6.25.4-HEAD_20080528142504 build of kernel-default, kernel-pae, and kernel-vanilla. So I installed those with "rpm -ivh --oldpackage <kernel-as-below>":

kernel-default-2.6.25.4-HEAD_20080528142504.i586.rpm 28-May-2008 18:10 21.7M
kernel-pae-2.6.25.4-HEAD_20080528142504.i586.rpm 28-May-2008 17:26 21.8M
kernel-vanilla-2.6.25.4-HEAD_20080528142504.i586.rpm 28-May-2008 18:11 21.7M

...
all of those kernels froze 16 out of 18 attempts (total) during boot at exactly the same place as all the previous attempts in this thread. Specifically attempted was:

1st: kernel-default - froze
2nd: kernel-failsafe - booted
3rd-4th: kernel-failsafe - froze both occasions
5th: kernel-vanilla - froze
6th: kernel-pae - froze
7th-15th: kernel-default - froze all 9 attempts
16th-17th: kernel-pae - froze both attempts
18th: kernel-pae - booted

Stopped the test as the behaviour is the same as the previous anomalous behaviour. This was the last attempt I will make on 11.0 beta3. I am now downloading 11.0 RC1 via bittorrent, and I will install that and give that a try. I hope it will work, albeit I confess my confidence is low on this being fixed (as previously reported).

I just finished installing 11.0 RC1. The anomalous behaviour is still present. And it made the installation unpleasant. During the installation, after the software install, and as part of the 1st boot before the "configuration" step (where X is configured, etc.), the PC froze immediately upon reboot, with the IDENTICAL freeze characteristics to the above. I had to hit the hardware reset button 6 times before I obtained a successful reboot. Normally, I would NEVER do that, but my experience with this BUG to date suggested to me that if I kept hitting reset the PC might eventually not freeze upon boot, but finish the boot, and eventually it did.

I have selected to "reopen" this Bug, placed it as a blocker, and marked it as being found in RC1. IMHO this BUG would have blocked most users. Note, 10.3 does not have this problem booting from this PC (neither did 9.x, 10.0, 10.1, nor 10.2).

Is there anything else I can do to help? There is a lot of information about my PC in the report from when I first raised this bug as part of the alpha releases:
https://bugzilla.novell.com/show_bug.cgi?id=373012

This is very odd. Have you run memtest to verify that there are no problems with your hardware?
How about trying to boot with "nohz=off"?

I tried booting 11.0 RC1 with "nohz=off". Same freeze.

I'm currently running a hardware memory test. Thus far no errors after 32 minutes.

Earlier this PM, after reading comment#13, I opened the case and checked all connectors, cards, etc. Everything is properly seated. There are no visible problems.

As a test I rebooted to openSUSE-10.3 from the grub text menu. It booted successfully 6 out of 6 tries (100% of the time). openSUSE-11.0 alpha/beta/RC1, on the other hand, freezes 4 out of every 5 boot tries. Sometimes worse.

Possibly (?) related is a problem with Grub, in that most of the time (90% approx) it boots with:

(1) a text display indicating an error, then
http://picpaste.com/1grub-initial-view.jpg
http://picpaste.com/pics/1grub-initial-view.1212176826.jpg

(2) an error message "graphics initialization failed", followed by
http://picpaste.com/2grub-graphic-initialization-failed.jpg
http://picpaste.com/pics/2grub-graphic-initialization-failed.1212176919.jpg

(3) the typical grub text menu.
http://picpaste.com/3grub-text-boot-menu.jpg
http://picpaste.com/pics/3grub-text-boot-menu.1212176968.jpg

Could this be a symptom of a graphic card failure, that is in turn affecting 11.0, but does not affect 10.3?

The other 10% of the time grub boots with a proper graphic menu selection. As opposed to vga=normal or vga=031a, is it possible that trying different VGA codes in the grub menu would make a difference to both the grub boot and the kernel freeze/boot?

*** Bug 396200 has been marked as a duplicate of this bug. ***

I do not believe it is clear that https://bugzilla.novell.com/show_bug.cgi?id=396200 is a duplicate bug. While both PCs (in Bug 396200 and 392198) have nvidia cards, I note Bug 396200 was able to boot with the vanilla kernel. That has never been the case with bug 392198, as I have tried a number of different vanilla kernels, all of which exhibited the same identical failure on 11.0 beta3.
I have not tried a vanilla kernel with RC1, but given my (lack of) success with the previous ftp://ftp.suse.com/pub/projects/kernel/kotd/HEAD/i386/ kernels, I have no reason to believe the RC1 vanilla kernel will make any difference. If a URL/directory is pointed out to me where I can find a vanilla kernel for 11.0 RC1, I am willing to try that (or any other video). Also, if there are any other tests/configurations that are recommended for me to try, please advise, and I will attempt them. Thank you.

You can install the vanilla kernel with:

zypper install kernel-vanilla

Try running that kernel and let us know what happens.

I installed the kernel-vanilla, and it also still freezes. However I did not install it via "zypper install kernel-vanilla". Instead, between Comment #16 and now (prior to my reading Comment#17) I had updated 11.0 RC1 with:

zypper refresh
zypper dist-upgrade

That installed kernel-pae-2.6.25.4-10. I obtained a freeze 4 out of 5 times with kernel-pae-2.6.25.4-10. I also installed kernel-default-2.6.25.4-10 with "rpm -ivh <package>" and I still obtained a freeze 4 out of 5 times. I then noticed Comment#17, and installed kernel-vanilla-2.6.25.4-10, also with "rpm -ivh <package>". It also has the same freeze behavior (although after a few on-boot freezes, I gave up trying to boot with that vanilla kernel).

As an aside (likely unrelated), after the "zypper dist-upgrade", Grub (grub-0.97-126) now starts with a graphical boot menu every time (as opposed to before, when I would obtain a "graphic initialization error" and grub would go to the grub text menu 9 out of 10 times). However the grub menu freezes (ie keyboard freezes) 50% of the time upon a mouse press. I am also using a KVM switch. I may take that out (and connect directly to keyboard/video/mouse) to see if that makes any difference to the Grub boot.

After selection by grub (successfully 50% of the attempts), I can still boot successfully 100% of the time to openSUSE-10.3.
So that still makes me think the problem is 11.0 related.

The sentence in post#18:
"However the grub menu freezes (ie keyboard freezes) 50% of the time upon a mouse press."
should read:
"However the grub menu freezes (ie keyboard freezes) 50% of the time upon a key press."

I removed the KVM switch, and connected Keyboard, Monitor, and Mouse directly to the PC. I then rebooted dozens of times, trying to boot the kernel-pae-2.6.25.4-10 kernel. The behaviour was identical (with the freeze in the exact same place with the same characteristics), i.e. 50% of the time grub will freeze, and when grub does not freeze, 80% of the time the kernel boot will freeze. That suggests to me that the KVM switch is not a factor in this problem.

One note: I do not have a PS2 mouse, but rather I have been using a Logitech USB mouse through the KVM (and for the direct tests without the KVM). I used a USB-to-PS2 adapter to connect the mouse to the KVM. For the direct "Keyboard, Monitor, and Mouse" to PC tests, I tried testing with the USB mouse connected directly to a USB port, and also tested with the USB mouse connected directly to a PS2 mouse port (via the USB-to-PS2 adapter). It made no difference. The behaviour was identical (with the freeze in the exact same place with the same characteristics) in all cases.

When the 11.0 grub menu lets me (and does not freeze), openSUSE-10.3 still boots every time, with no problem. This is in contrast to 11.0: when the grub menu lets me (and does not freeze), the 11.0 kernel will freeze 80% of the time.

I may replace the 11.0 installed grub menu with a 10.3 installed grub menu, to see if I can take "grub" out of the picture. Other than that, I am pretty much out of ideas for testing.

I've been wondering if this bug report could be related to a hard drive problem (and hence not an openSUSE-11.0 fault), with a problem with the hard drive's MBR?
I plan to test that possibility, either this weekend or next, by changing out the hard drive on this PC for a spare drive I have, and installing 11.0 on the spare drive. Then I will compare the boot behaviour. [... or does someone know an easier way to check this ... ]?

As this is a problem of locking up _before_ the kernel runs, I really don't expect it to be a kernel issue itself, but rather a hardware issue. We are going to need some kind of kernel log message before I can have any chance of fixing it in the kernel.

As the initiator, I changed this bug report to "INVALID". And my apologies for this bug report. This is a PC hardware problem. My suspicions in Comment #21 proved correct: when I changed out the 80 GByte hard drive for a 40 GByte hard drive, and installed openSUSE-11.0 GM on the 40 GByte drive, the problem disappeared. I performed 10 reboots, with no problem in any of the 10 reboots. Grub came up and performed correctly, and the kernel booted properly.

My view now is that the MBR in the previous 80 GByte hard drive had an "intermittent" fault that occurred 90% of the time. My guess is that the reason openSUSE-10.3 booted ok is that its code was located in a "healthier" location of the hard drive. Perhaps the only puzzle is why the hardware health checks of the hard drive did not pick up anything.

Thank you to the openSUSE developers for implementing a faster installation and also retaining the faster reboots on openSUSE-11.0, else this test would have been unbearable. If nothing else, I have now learned how to characterize a hard drive with a failing MBR. But I think I wasted far too much of everyone's time, and my sincere apologies for this.

This BUG report is withdrawn by initiator (or called invalid).
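
The figures quoted in this thread can be sanity-checked with a little arithmetic. The sketch below is not part of the original report. It assumes that the roughly 50% GRUB freeze rate and the roughly 80% kernel freeze rate reported after the KVM test apply independently on every power-on (the independence is an assumption), and the function names are made up for illustration.

```python
# Rates quoted in the thread; treating each power-on as independent is an assumption.
P_GRUB_FREEZE = 0.5    # GRUB menu froze on about half of the key presses
P_KERNEL_FREEZE = 0.8  # kernel boot froze about 80% of the time once GRUB got through


def overall_success(p_grub_freeze: float, p_kernel_freeze: float) -> float:
    """Chance that one power-on gets past both GRUB and the early kernel boot."""
    return (1.0 - p_grub_freeze) * (1.0 - p_kernel_freeze)


def clean_run_probability(p_success: float, attempts: int) -> float:
    """Chance of `attempts` consecutive good boots at a fixed per-boot success rate."""
    return p_success ** attempts


p_ok = overall_success(P_GRUB_FREEZE, P_KERNEL_FREEZE)
p_ten = clean_run_probability(p_ok, 10)

print(f"per-boot success on the old drive: {p_ok:.0%}")
print(f"chance of 10 clean boots in a row at that rate: {p_ten:.1e}")
```

Under those assumptions the old drive would get through a full boot about one time in ten, which lines up with the 90% fault estimate, and ten clean boots in a row at that rate would be roughly a one-in-ten-billion fluke. That makes it very unlikely that the clean run on the replacement drive was a coincidence.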