Bugzilla – Full Text Bug Listing

| Summary: | HS40 resets on initial welcome and/or startup screen | | |
|---|---|---|---|
| Product: | [openSUSE] SUSE LINUX 10.0 | Reporter: | Murlin Wenzel <mwenzel> |
| Component: | Installation | Assignee: | Steffen Winterfeldt <snwint> |
| Status: | RESOLVED FIXED | QA Contact: | Klaus Kämpf <kkaempf> |
| Severity: | Major | | |
| Priority: | P5 - None | CC: | kstansel, patrick.donckers, Richard.Beal, trenn |
| Version: | Beta 4 | | |
| Target Milestone: | --- | | |
| Hardware: | i686 | | |
| OS: | All | | |
| Whiteboard: | | | |
| Found By: | Third Party Developer/Partner | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | test iso, test iso 2, test iso 3, test iso 4, test iso 5 | | |

Description
Murlin Wenzel
2005-09-01 17:39:26 UTC
Might be problem of linuxrc, or kernel...

You're talking about the boot loader? Maybe related to bug 81046.

I'd say that is a hardware/BIOS problem. What are IERR errors and what do they indicate?

*** Bug 118024 has been marked as a duplicate of this bug. ***

This should be the open bug. My bugzilla account died a horrible death. This does indeed sound very similar to #81046, but with entirely different hardware. Is there a way to do the initial boot in non-GUI mode, or would I need a custom CD? I'd like to try and narrow this down.

Hold down SHIFT when isolinux starts. But then the problem will go away. What I suspect _could_ cause this is a nonblocked interrupt (like NMI) because the graphics code runs without a properly set up IDT for some time. Hence my question what these IERRs are.

I did verify that I can hold down SHIFT and get the install started. Of course, everything crashes as expected on a reboot. Has the startup code changed that much since SLES9? That works fine on this same hardware.

I did find out that the IERRs are either CPU faults (not likely) or PCI device timeouts/errors (most likely). Keep in mind that these blades are completely USB based (floppy, CD-ROM, keyboard, mouse). Until an OS stack is loaded and running, the system is running legacy emulation, which generally requires SMI handlers or INT handlers.

Yes, the code has changed a lot. What CPU is running there? Could you attach /proc/cpuinfo?

Created attachment 51038 [details]
test iso

Please run the attached test iso on that machine. It (hopefully) will print a colored stripe on the screen when it hangs. What color is it?

Sorry I couldn't get to this sooner. After a couple of tries, here is what I got.

1. Welcome screen appeared with all languages... Second screen appeared (should be install menu) with just the SuSE logo in the upper right corner. System did hard reset with IERR.
2. Welcome screen started to appear (background and 2 languages). System did hard reset with IERR.

I never did see any stripe. I can tell you the system is running 4 2.8 GHz Xeon MP processors. If you still need the cpuinfo, I'll have to install something else to get the files.

That would indicate that there are no unexpected interrupts or faults occurring. Is there any way to get the instruction pointer where the IERR happens?

Created attachment 51228 [details]
test iso 2

In any case, here's another try: This ISO will show a debug window in the upper left corner. You can single step by pressing Enter or Space (or about any other key except Esc, which would end single step mode). What's the number after 'ip' when it crashes?

I managed to get at least one crash with the single step code: ip 47d: 4e.7. The second time I hit a reset, the number right after ip changed (couldn't get it fast enough) but the 2nd number was still 4e.7. I hope this means something to you. Could we get any more useful info if I installed the box in text mode (shift key) and somehow set up the bootloader to run in text mode? I'll try the single step again and see what I can get.

Created attachment 51371 [details]
test iso 3

Please try this one. Any better or does it at least crash at a different place?

I was holding down the spacebar to single step. This one appeared to get further. The latest IP when it reset was 1bd5: 4e.7. It appeared to be starting to build the menu on the second screen. This has got to be some type of ugly SMI/protected mode timing issue.

I just rebooted several times and hit <ESC> to get out of single step mode. 2 times the startup succeeded and I was able to interact with the boot menu/options. Of course I couldn't really do anything else. I was getting excited until I rebooted 2 more times and the system just reset both times.

Created attachment 51483 [details]
test iso 4

The welcome texts were displayed in random order. I replaced the PRNG with a dummy in this iso. Does it still crash at different places?
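
To illustrate the debugging step just described (swapping the PRNG for a dummy so the welcome texts are drawn in a fixed order and crash locations become comparable between runs), here is a minimal C sketch. It is not the code from the test ISO: `rand_fn`, `lcg_rand`, `dummy_rand`, and `pick_index` are hypothetical names, and the small LCG only stands in for whatever generator the boot graphics code really uses.

```c
#include <stdint.h>

/* Hypothetical PRNG interface; all names here are illustrative only. */
typedef uint32_t (*rand_fn)(void);

static uint32_t lcg_state = 1;

/* A small linear congruential generator standing in for the real PRNG. */
static uint32_t lcg_rand(void)
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return lcg_state;
}

/* Dummy replacement: always returns 0, so every run takes the same path. */
static uint32_t dummy_rand(void)
{
    return 0;
}

/* Choose which of `count` items to show next, using the supplied generator.
 * `count` must be non-zero. */
static unsigned pick_index(rand_fn next, unsigned count)
{
    return (unsigned)(next() % count);
}
```

With `dummy_rand` passed in, `pick_index` always returns 0. If the reset then still moves around, the randomised drawing order is unlikely to be the cause.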

After reading some processor docs I no longer think an SMI could cause this. I'm trapping all interrupts now and you would see the mentioned colored stripe whenever an unexpected int or trap happens. There is a tiny window of a few instructions where an NMI could cause a CPU reset. I'm going to address this but it needs some work. My favorite theory at the moment is that some BIOS functions destroy registers they should not. I started to save regs around some of them in test iso 3 (and hoped it would help a bit more).

Murlin, in view of bug 81046 comment 20, are we talking about an ATI graphics card here?

My BIOS setup reports this as an ATI embedded RADEON 7000. I haven't seen an IBM server in ages that didn't have some type of embedded adapter. I know I can use the 'shift' key during initial boot to start the install in text mode; is there something similar or a way to configure grub to boot in text mode? I could at least get the latest code installed that way and look for other problems.

Sure, just remove the gfxmenu line in /boot/grub/menu.lst. According to the mentioned report it should be avoidable by doing dword aligned accesses. I'll see what I can do.

Using some of the various tricks you showed me, I was able to get the initial install to complete. I had to edit menu.lst before the reboot or I just kept hitting the same GUI welcome screen restart. After I got the bootloader working I ended up with the same problem described in bugzilla #114800. I guess it shouldn't really have surprised me that the same problem showed up on this hardware since it has the same CSB6 IDE chipset.

Created attachment 53644 [details]
test iso 5

Please test this iso. It uses only dword aligned video memory reads. Hopefully this helps.
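
For readers unfamiliar with the idea, the following C sketch shows one way to fetch a byte or a 16-bit value while only ever issuing aligned 32-bit loads. It is an assumption-laden illustration, not the code in the test ISO: the helper names `read8_via_dword` and `read16_via_dword` are hypothetical, `base` is assumed to be dword aligned, and bytes are numbered from the least significant end of each dword.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one byte at byte offset `off` from a region that should only be
 * touched with aligned 32-bit loads. `base` is assumed dword aligned. */
static uint8_t read8_via_dword(const volatile uint32_t *base, size_t off)
{
    uint32_t word = base[off / 4];             /* one aligned 32-bit load */
    unsigned shift = (unsigned)(off % 4) * 8;  /* select the byte lane */
    return (uint8_t)(word >> shift);
}

/* Read a 16-bit value at any byte offset, again using only aligned
 * dword loads (two of them here, one per byte; a real implementation
 * could reuse the first load when both bytes share a dword). */
static uint16_t read16_via_dword(const volatile uint32_t *base, size_t off)
{
    return (uint16_t)(read8_via_dword(base, off) |
                      ((uint16_t)read8_via_dword(base, off + 1) << 8));
}
```

A value that straddles a dword boundary, such as `read16_via_dword(buf, 3)`, becomes two aligned loads instead of one misaligned access.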

IT HELPS. So far 10 reboots, 10 successful boots to menu screen. Now I just have to get past bugzilla #114800

Great! And it even speeds things up considerably.

How hard would it be to come up with a SL10 cd1 with these updates?

10.0 should be no problem (comment 24 is 10.1). I'll put something together tomorrow.

Here is an updated 10.0 boot & rescue ISO: ftp://ftp.suse.com/pub/people/snwint/10.0/SUSE-10.0-boot.iso

*** Bug 116934 has been marked as a duplicate of this bug. ***

read more about this in bug 129724

*** Bug 133280 has been marked as a duplicate of this bug. ***

*** Bug 158061 has been marked as a duplicate of this bug. ***