Bugzilla – Bug 148343
Random system freezes with activated Xen and some running domUs
Last modified: 2008-06-25 09:53:32 UTC
I installed changeset 8259 for SuSE 10.0 because the 6xxx changeset that ships with SuSE 10.0 has a bug that prevents domUs from rebooting: they never came back after a reboot (see also #143266). Changeset 8259 fixed that problem. However, we are now experiencing occasional system freezes. The system locks up completely, no log messages are generated, and the only way to get it running again is to hit the reset button. We had read about problems with Hyper-Threading, so we disabled it in the BIOS, without success. We then disabled USB because we had read about USB-related problems. That worked for about 1.5 weeks, but then the system locked up again. I'll try to compile a newer Xen version (3.0.1) myself and report back on whether that fixes it. However, since the system only locks up about once a week, this may take some time. Is this problem known, and is it perhaps fixed in a newer changeset? Our Xen host is an IBM xSeries 306 with a P4 3.20GHz and 3GB of memory. We already swapped the RAM modules, since faulty hardware could be the cause, but that did not help. The RAM module we tested definitely works fine in another system. Any help, ideas, or a working solution would be appreciated.
Have you tried the latest available beta of SL (10.1 Beta 3)? If you boot from the CD, the boot menu offers a diagnostic tool (memtest86) under 'memory test'; it should also be available on an installed system. It tests memory extensively. Let it run for at least 2 days; if it reports no errors, the hardware is most likely OK. Before we do anything else, please try the latest version of SL. Fixing bugs that cannot be reproduced is almost impossible.
You may also want to try running the debug version of Xen (which is included in our Xen RPM). It will log messages to the console if the dom0 kernel tries to do something stupid. Here's an excerpt from the latest README explaining this:

  To debug Xen or dom0 Linux crashes or hangs, it may be useful to use the debug-enabled hypervisor, and to prevent automatic rebooting. Change your Grub configuration from something like this:

      kernel (hd0,5)/xen.gz

  To something like this:

      kernel (hd0,5)/xen-dbg.gz noreboot

  After rebooting, the Xen hypervisor will write any error messages directly to the text console.
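The Grub change described in the README excerpt above can also be applied with a small script. The following is a minimal sketch, not part of the Xen RPM or any SUSE tool: `enable_xen_debug` is a hypothetical helper that assumes a GRUB legacy `menu.lst` whose hypervisor line has the form `kernel <path>/xen.gz [args]`. It only transforms text; reading and writing the actual file (and keeping a backup) is left to the caller.

```python
import re

# Hypothetical pattern: a GRUB legacy "kernel" line that loads the non-debug
# hypervisor, e.g. "kernel (hd0,5)/xen.gz", optionally followed by arguments.
_XEN_KERNEL = re.compile(r"^(\s*kernel\s+\S*/)xen\.gz((?:\s.*)?)$")


def enable_xen_debug(menu_lst: str) -> str:
    """Swap xen.gz for xen-dbg.gz and append 'noreboot' (assumed menu.lst layout)."""
    out = []
    for line in menu_lst.splitlines():
        m = _XEN_KERNEL.match(line)
        if m:
            prefix, rest = m.groups()
            args = rest.split()
            # Add 'noreboot' only once so the helper is safe to run repeatedly.
            if "noreboot" not in args:
                args.append("noreboot")
            line = prefix + "xen-dbg.gz " + " ".join(args)
        # Other lines (e.g. "module" lines for the dom0 kernel) pass through unchanged.
        out.append(line)
    return "\n".join(out)
```

For example, `enable_xen_debug("kernel (hd0,5)/xen.gz")` returns `"kernel (hd0,5)/xen-dbg.gz noreboot"`, matching the README's example. Lines that already load `xen-dbg.gz` are left alone.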
Jan: Please reopen this bug if you can provide more information.
Mass-reopening all SuSE Linux bugs that are set to REMIND+LATER to change the resolution to WONTFIX (adapting to new policy).
Closing old LATER+REMIND bugs as WONTFIX. If you still plan to work on one, feel free to reopen it and set it to ASSIGNED. If this report shows repeated reopen comments, that is because Bugzilla timed out on the huge request ;(