Bug 158229

Summary: CPU0 FATAL PAGE FAULT
Product: [openSUSE] SUSE Linux 10.1
Reporter: Stephan Rickauer <sles>
Component: Xen
Assignee: Charles Arnold <carnold>
Status: VERIFIED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Blocker
Priority: P5 - None
CC: sles
Version: Beta 8
Target Milestone: ---
Hardware: x86-64
OS: Other
Whiteboard:
Found By: Beta-Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Stephan Rickauer 2006-03-15 13:37:38 UTC
On two of my running 10.1b6 Xen dom0s (both are identical), I updated to kernel 2.6.16-rc6-git1-2-xen using the corresponding factory rpm.

One machine is fine; the other now stops while booting xen-3.0.gz with the error "CPU0 FATAL PAGE FAULT, rebooting in 5 seconds". Because of the automatic reboot I can't tell you which page it refers to. If you really need that info, I'll take my digicam downstairs and take a picture of it ;)

lvs02:~ # uname -r
2.6.16-rc6-git1-2-xen

lvs02:~ # cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 37
model name      : AMD Opteron(tm) Processor 254
stepping        : 1
cpu MHz         : 2806.440
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm
bogomips        : 7017.62
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

lvs02:~ # free
             total       used       free     shared    buffers     cached
Mem:       1884420     472528    1411892          0        272     372568
-/+ buffers/cache:      99688    1784732
Swap:      1052248          0    1052248
lvs02:~ #
Comment 1 Stephan Rickauer 2006-03-15 13:40:34 UTC
Forgot to mention: the plain kernel (the non-Xen version) works fine.
Comment 2 Lynn Bendixsen 2006-03-15 20:26:50 UTC
Not sure if we need the info yet, but have you tried putting noreboot on the boot command line? That should stop it from rebooting and give you time to read the error message (or perhaps the machine will boot this time, as we have seen several times in my test lab :)
Comment 3 Stephan Rickauer 2006-03-16 09:37:56 UTC
I didn't know about the 'noreboot' option, though I think it would be more consistent (one expects a kernel panic _not_ to reboot) to have a 'reboot' option and make 'noreboot' the default. Anyway, here's the output:

---snip---
XEN call trace:

[<ffff83000011e22f>] ioapic_guest_read+0x1f/0xb0
[<ffff8300001284e0>] do_physdev_op+0x130/0x2a8
[<ffff830000130cfc>] syscall_enter+0x5c/0x61

Pagetable walk from ffff82fffeda9700:
  L4 = 000000000tc35063
   L3 = 0000000000000000

Panic on CPU0
CPU0 FATAL PAGE FAULT
[error_code=0000]
Faulty linear address: ffff82fffeda9700
---snip---

And no, it does not boot this time ;)
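
For reference, the faulting linear address in the trace above can be split into its page-table indices, and the error code into its low flag bits. This is only a minimal sketch, assuming standard x86-64 4-level paging with 4 KiB pages and 48-bit virtual addresses (as reported by /proc/cpuinfo); the function names are made up for illustration and are not part of Xen.

```python
# Sketch only: assumes x86-64 4-level paging, 4 KiB pages, 48-bit virtual addresses.
# split_vaddr and decode_pf_error are hypothetical helper names, not Xen functions.

def split_vaddr(vaddr):
    """Return (l4, l3, l2, l1, offset) table indices for a virtual address."""
    offset = vaddr & 0xFFF          # bits 11..0: byte offset within the page
    l1 = (vaddr >> 12) & 0x1FF      # bits 20..12: L1 (page table) index
    l2 = (vaddr >> 21) & 0x1FF      # bits 29..21: L2 (page directory) index
    l3 = (vaddr >> 30) & 0x1FF      # bits 38..30: L3 index
    l4 = (vaddr >> 39) & 0x1FF      # bits 47..39: L4 index
    return l4, l3, l2, l1, offset


def decode_pf_error(code):
    """Decode the three low bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x1),  # 0 = fault caused by a non-present page
        "write": bool(code & 0x2),    # 0 = fault on a read access
        "user": bool(code & 0x4),     # 0 = fault in supervisor mode
    }


if __name__ == "__main__":
    print(split_vaddr(0xFFFF82FFFEDA9700))
    print(decode_pf_error(0x0000))
```

Under these assumptions, error_code=0000 reads as a supervisor-mode read of a non-present page, which matches the walk stopping at an L3 entry of 0.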
Comment 4 Stephan Rickauer 2006-03-16 09:39:06 UTC
L4 should be '0000000001c35063'
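
As a rough illustration of how the corrected L4 value reads, the sketch below decodes a few flag bits and the next-level table address from a page-table entry. It assumes the standard x86-64 entry layout (bit 0 present, bit 1 writable, bit 2 user, bit 5 accessed, bits 51..12 physical address); decode_entry is a hypothetical helper name, not a Xen function.

```python
# Sketch only: assumes the standard x86-64 page-table entry layout.
# decode_entry is a hypothetical helper name, not taken from Xen.

ADDR_MASK = 0x000FFFFFFFFFF000  # bits 51..12: physical address of the next table


def decode_entry(entry):
    """Decode selected fields of an x86-64 page-table entry."""
    return {
        "present": bool(entry & (1 << 0)),
        "writable": bool(entry & (1 << 1)),
        "user": bool(entry & (1 << 2)),
        "accessed": bool(entry & (1 << 5)),
        "next_table_phys": entry & ADDR_MASK,
    }


if __name__ == "__main__":
    for name, value in (("L4", 0x0000000001C35063), ("L3", 0x0)):
        fields = decode_entry(value)
        print(name, fields["present"], hex(fields["next_table_phys"]))
```

With these assumptions the L4 entry is present and points at a table at physical address 0x1c35000, while the L3 entry of 0 is not present, so the walk ends there.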
Comment 5 Charles Arnold 2006-03-28 00:54:32 UTC
Fixed in SLES 10 Beta 9 and the corresponding SL 10.1 beta.