Bug 148719 - xen 64bit dom0 crashes at kernel startup
Summary: xen 64bit dom0 crashes at kernel startup
Status: RESOLVED WONTFIX
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Xen
Version: Final
Hardware: Other Other
Priority: P1 - Urgent, Severity: Normal
Target Milestone: ---
Assignee: Jan Beulich
QA Contact: E-mail List
Reported: 2006-02-07 15:09 UTC by Harald Koenig
Modified: 2006-09-07 20:42 UTC

Found By: Other


Attachments
that's the screen shot of the crash... (1.92 MB, image/jpeg)
2006-02-07 17:19 UTC, Harald Koenig

Description Harald Koenig 2006-02-07 15:09:38 UTC
On my 2*2-Opteron system, dom0 crashes immediately at startup. The kernel trace says:

        <NMI> profile_task_exit+21
        do_exit+32
        ...
        do_IRQ+64
        default_idle+0
        ret_from_intr+0
        <EOI>
        thread_return+0
        default_idle+43
        cpu_idle+151
        start_kernel+460
        _sinittext+664

A screenshot is available (4 Mpixel); I may be able to get a serial console log
this evening, if necessary.

FYI: the Xen kernel from beta-2 at least booted, but it broke eth0 (MAC 00:0:00)
Comment 1 Michael Gross 2006-02-07 15:40:16 UTC
Please add all information you can offer.
Comment 2 Harald Koenig 2006-02-07 17:19:18 UTC
Created attachment 66834
that's the screen shot of the crash...

*all* information won't fit on your disk ;-)

Which other information do you need?
Hardware setup?
What else?

Since it crashes at kernel startup, my chances of providing much more information are somewhat limited, sorry.
Comment 3 Harald Koenig 2006-02-07 19:22:38 UTC
(In reply to comment #2)

> since it crashes at kernel startup, my chances of providing much more information
> are somewhat limited, sorry.

Update: I installed the KOTD (kernel of the day), kernel-xen-2.6.16_rc2_git2-20060206180157.x86_64.rpm.

The good news: the system now boots into dom0 again!

The bad news: the system crashes in {mm_pin+328} within 1-2 minutes with only dom0 running under a pure CPU load of ~5 (5 processes, each running a tight loop).

SUSE 10.0 + YOU Xen for x86_64 suffers from exactly the same instability :-((
That's the only reason I'm wrestling with this right now: I really hoped that the 10.1 Xen, which is "real" Xen 3.0, would have this fixed.

10.0 with the Xen kernel from xensource.com is stable under that CPU load in dom0!
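
The test programs themselves are not attached to this bug. As a rough sketch only, a load of the kind described above (several processes, each spinning in a tight loop) could be generated like this. The names `spin` and `run_load` are hypothetical, the sketch assumes a Unix-like host where the `fork` start method is available, and it only reproduces the load pattern, not the original binaries.

```python
import multiprocessing as mp
import time


def spin(seconds, results):
    """Busy-loop for `seconds`, then report how many iterations ran."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        count += 1  # pure CPU work, no sleeping or I/O
    results.put(count)


def run_load(workers=5, seconds=60.0):
    """Start `workers` busy-looping processes and wait for all of them.

    Returns the iteration count reported by each worker.
    """
    ctx = mp.get_context("fork")  # assumption: Unix-like host
    results = ctx.Queue()
    procs = [
        ctx.Process(target=spin, args=(seconds, results))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    # Collect one result per worker before joining, so no worker
    # blocks while flushing its queue entry.
    counts = [results.get() for _ in procs]
    for p in procs:
        p.join()
    return counts
```

For example, `run_load(workers=5, seconds=120)` keeps five processes busy for two minutes, which roughly corresponds to the load of ~5 mentioned above.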
Comment 4 Harald Koenig 2006-02-07 19:31:11 UTC
(In reply to comment #3)

> the bad news: the system crashes in {mm_pin+328} within 1-2 minutes with only
> dom0 running with a pure CPU load of ~5 (5 processes running with tight loop)
> 
> SUSE 10.0 + YOU XEN for x86_64 suffers exactly the same instability :-((
> that's the only reason right now I'm wrestling:  I really hoped that the 10.1
> XEN which is "real" Xen-3.0 will have this fixed. 
> 
> 10.0 with xen-kernel from xensource.com is stable under that cpu load in dom0!


For the SUSE 10.0 + YOU Xen kernel I have a full serial console log of that mm_pin crash, if this is of any help to you. I still have the 10.0+YOU system available, so testing new Xen kernels for 10.0 would be no problem...

Setting up the serial console is quite some work though, so I'd be happy if you
could track down and fix this with only the 10.0 log (and maybe some screenshots ;-)


BTW: this stability problem with CPU load in dom0 in 10.0 only shows up when running Xen in x86_64 mode (64-bit hypervisor), while it looks rock stable with the 10.0+YOU 32-bit hypervisor and Xen kernel...
Comment 5 Michael Gross 2006-02-08 12:46:12 UTC
I have a quite large disk ;)
Reassigning to Clyde.
Comment 6 Jan Beulich 2006-02-09 10:44:12 UTC
If any 32-bit processes are involved in the scenario, then the problem is known (bug 147503) and fixed. Please confirm.
Comment 7 Harald Koenig 2006-02-09 18:25:32 UTC
(In reply to comment #6)
> If any 32-bit processes are involved in the scenario, then the problem is known
> (bug 147503) and fixed. Please confirm.

CONFIRMED! My small test programs for generating the CPU load are ELF-32!

Is it possible to get a source patch (or fixed kernel) for testing?
Tomorrow I'll try that CPU load test again with ELF-64 test tools...


Thanks for the good news -- hopefully this will make Xen work for my 10.0 setup too?!
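
Whether a given test binary is 32-bit or 64-bit can be read from the EI_CLASS byte of its ELF header (byte 4 of `e_ident`: 1 for ELFCLASS32, 2 for ELFCLASS64); `file(1)` reports the same information. A minimal sketch, in which the helper names `elf_class` and `elf_class_of_file` are hypothetical:

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file
EI_CLASS = 4            # index of the class byte within e_ident
CLASS_NAMES = {1: "ELF-32", 2: "ELF-64"}  # ELFCLASS32 / ELFCLASS64


def elf_class(header: bytes) -> str:
    """Classify the leading bytes of a file as ELF-32, ELF-64, or neither."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        return "not ELF"
    return CLASS_NAMES.get(header[EI_CLASS], "unknown class")


def elf_class_of_file(path: str) -> str:
    """Read just enough of `path` to classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(EI_CLASS + 1))
```

Running `elf_class_of_file` on each load-generating binary shows whether the scenario involves any 32-bit processes, which is the condition asked about in comment #6.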
Comment 8 Jan Beulich 2006-02-10 17:47:08 UTC
You can get the kernel of the day, which should have the fix. For 10.0 I'm not sure; if the problem exists there too, then this patch would need backporting to that kernel.

*** This bug has been marked as a duplicate of 147503 ***
Comment 9 Harald Koenig 2006-02-13 19:09:50 UTC
(In reply to comment #8)
> You can get the kernel of the day, which should have the fix. 

Thanks! I'll test and report tomorrow...

> For 10.0 I'm not
> sure, if the problem exists there too, then this patch would need backporting
> for that kernel.

Oh yes, 10.0 suffers from the same symptom: it crashes at the same mm_pin() within minutes with my small 32-bit idle-loop test program...

Getting a backported fix for 10.0 would be *much* appreciated (and immediately tested and reported back ;-)

> *** This bug has been marked as a duplicate of 147503 ***

Comment 10 Harald Koenig 2006-09-04 13:56:17 UTC
(In reply to comment #9)
>
> getting a backported fix for 10.0 would be *much* appreciated (and immediately
> tested and reported back;-)

More than 6 months after that bug report I had to install a new SUSE 10.0 on a Xen server, and now I have to realize that this bug still hasn't been fixed in the latest 10.0 YOU kernel:

      kernel-xen-2.6.13-15.11.x86_64.rpm

That's a real pity -- that's not the SUSE I knew years ago :-(((
Comment 12 Jason Douglas 2006-09-07 20:42:08 UTC
This has been addressed in SUSE Linux 10.1 and SLES 10.  The fix will not be backported to SL 10.0.