Bugzilla – Bug 148719
xen 64bit dom0 crashes at kernel startup
Last modified: 2006-09-07 20:42:08 UTC
for my 2*2 Opteron, dom0 crashes immediately at startup. the kernel trace says:
<NMI> profile_task_exit+21 do_exit+32 ... do_IRQ+64 default_idle+0 ret_from_intr+0 <EOI> thread_return+0 default_idle+43 cpu_idle+151 start_kernel+460 _sinittext+664
a screen shot is available (4 Mpixel); maybe I can get a serial console log this evening, if necessary?!
FYI: the xen kernel from beta-2 at least booted, but it broke eth0 (mac 00:0:00)
Please add all information you can offer.
Created attachment 66834
that's the screen shot of the crash...
*all* information won't fit on your disk ;-) which other information do you need? hardware setup? what else?
since it crashes at kernel startup, my chances to provide much more information are somewhat limited, sorry.
(In reply to comment #2)
> since it crashes at kernel startup, my chances to provide much more information
> are somewhat limited, sorry.
update: I installed KOTD kernel-xen-2.6.16_rc2_git2-20060206180157.x86_64.rpm.
the good news: now the system boots into dom0 again!
the bad news: the system crashes in {mm_pin+328} within 1-2 minutes with only dom0 running under a pure CPU load of ~5 (5 processes running a tight loop).
SUSE 10.0 + YOU XEN for x86_64 suffers exactly the same instability :-(( that's the only reason I'm wrestling with this right now: I really hoped that the 10.1 XEN, which is "real" Xen-3.0, would have this fixed.
10.0 with the xen-kernel from xensource.com is stable under that CPU load in dom0!
(In reply to comment #3)
> the bad news: the system crashes in {mm_pin+328} within 1-2 minutes with only
> dom0 running under a pure CPU load of ~5 (5 processes running a tight loop)
>
> SUSE 10.0 + YOU XEN for x86_64 suffers exactly the same instability :-((
> that's the only reason I'm wrestling with this right now: I really hoped that the 10.1
> XEN, which is "real" Xen-3.0, would have this fixed.
>
> 10.0 with the xen-kernel from xensource.com is stable under that CPU load in dom0!
for the SUSE 10.0 + YOU xen kernel I have a full serial console log of that mm_pin crash, if this is of any help to you.
I still have the 10.0+YOU system available, so testing new xen kernels for 10.0 would be no problem... setting up the serial console is quite some work though, so I'd be happy if you could track/fix this with only the 10.0 log (and maybe some screen shots ;-)
BTW: this stability problem with CPU load in dom0 on 10.0 only shows up when running xen in x86_64 mode (64 bit hypervisor), while it looks rock stable with the 10.0+YOU 32 bit hypervisor & xen-kernel...
I have a quite large disk ;) Reassigning to Clyde.
If any 32-bit processes are involved in the scenario, then the problem is known (bug 147503) and fixed. Please confirm.
(In reply to comment #6)
> If any 32-bit processes are involved in the scenario, then the problem is known
> (bug 147503) and fixed. Please confirm.
CONFIRM! my small test programs for the test CPU load are ELF-32!
is it possible to get a source patch (or fixed kernel) for testing?
tomorrow I'll try that CPU load test again with ELF-64 test tools....
thanks for the good news -- hopefully this will make XEN work for the 10.0 setup too?!?
You can get the kernel of the day, which should have the fix. For 10.0 I'm not sure; if the problem exists there too, then this patch would need backporting for that kernel.
*** This bug has been marked as a duplicate of 147503 ***
(In reply to comment #8)
> You can get the kernel of the day, which should have the fix.
thanks! I'll test and report tomorrow...
> For 10.0 I'm not
> sure; if the problem exists there too, then this patch would need backporting
> for that kernel.
oh yes, 10.0 suffers the same symptom -- it crashes at the same mm_pin() within minutes with my small 32bit idle loop test program...
getting a backported fix for 10.0 would be *much* appreciated (and immediately tested and reported back ;-)
>
> *** This bug has been marked as a duplicate of 147503 ***
>
(In reply to comment #9)
>
> getting a backported fix for 10.0 would be *much* appreciated (and immediately
> tested and reported back ;-)
more than 6 months after that bug report I had to install a new SUSE 10.0 on a XEN server, and now I have to realize that this bug still hasn't been fixed in the latest 10.0 YOU kernel: kernel-xen-2.6.13-15.11.x86_64.rpm
that's really, really a pity -- that's not the SUSE which I knew years ago :-(((
This has been addressed in SUSE Linux 10.1 and SLES 10. The fix will not be backported to SL 10.0.