Bug 148022

Summary: Melody-Fugue: Failed to install SUSE 10 64bits when "ACPI Enable" on
Product: [openSUSE] SUSE LINUX 10.0
Reporter: Dave Keck <david.keck>
Component: Kernel
Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: aosthof, david.keck
Version: unspecified
Target Milestone: ---
Hardware: x86-64
OS: Other
Whiteboard:
Found By: Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Dave Keck 2006-02-03 16:00:39 UTC
2/2/2006 1:44:28 PM - Sang-Min Oh:
SuSE Linux 10 64-bit panics in the kernel when the ACPI SRAT table is enabled and node memory is not populated sequentially starting from node 0. If node 0 has no memory and node 1 has memory, the kernel panics. If the ACPI SRAT table is disabled, Linux boots successfully.

I checked the memory configuration and the SRAT table, and both are correct. Windows boots without problems in the same configuration.

I attached the kernel message dump file (SUSE10.txt) from the following configuration.

PFGD00-9 BIOS
Melody 10 without Fugue board
Node 0 2.0GHz JH E6 - no memory
Node 1 2.0GHz JH E6 - 256MB memory

12/13/2005 11:15:51 AM - Hanh Nguyen:
The memory configuration seems to be what prevents SUSE 10 64-bit from coming up.

I have a Fugue and Melody system. If I have a 256MB module connected to H0_DIMM 0 and a 256MB module connected to H2_DDR_DIMM0, the system can't boot SUSE 10 64-bit.

12/13/2005 9:30:26 AM - Hanh Nguyen:
I have to turn off ACPI when I have the Fugue board connected to the Melody system. I don't have any problem installing SUSE 10 64-bit on the Melody system without the Fugue board (ACPI on). I tried with Cg 2200 MHz and E6 1600 MHz processors. PFGD00-9 BIOS.


11/21/2005 10:41:41 AM - Hanh Nguyen:
enable_timer_pin_1 didn't fix the problem.

Reason for Rejection/Clarification:
12/7/2005 5:54:05 PM - Sang-Min Oh:
I installed SuSE 10 64-bit on my Fugue board successfully. enable_timer_pin_1 was not used for the installation, and all 16 cores were detected properly. The SuSE Linux version is 10.0 gold.

11/18/2005 5:36:39 PM - Charles White:
See OBS28918 (linked below). Try enable_timer_pin_1 at the boot loader screen as a workaround. 
==Kernel message Dump =================================================
Bootdata ok (command line is root=/dev/hda3 vga=0x314 selinux=0    resume=/dev/hda2  splash=silent earlyprintk=serial,9600)
Linux version 2.6.13-15-smp (geeko@buildhost) (gcc version 4.0.2 20050901 (prerelease) (SUSE Linux)) #1 SMP Tue Sep 13 14:56:15 UTC 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009b400 (usable)
 BIOS-e820: 000000000009b400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000c8000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001fef0000 (usable)
 BIOS-e820: 000000001fef0000 - 000000001feff000 (ACPI data)
 BIOS-e820: 000000001feff000 - 000000001ff00000 (ACPI NVS)
 BIOS-e820: 000000001ff00000 - 0000000020000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec00400 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
kernel direct mapping tables upto ffff810100000000 @ 8000-b000
SRAT: PXM 0 -> APIC 0 -> CPU 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> CPU 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> CPU 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> CPU 3 -> Node 1
SRAT: Node 1 PXM 1 0-9ffff
SRAT: Node 1 PXM 1 0-1fffffff
Bootmem setup node 1 0000000000000000-000000001feeffff
PANIC: early exception rip ffffffff8053a2b8 error 0 cr2 0
PANIC: early exception rip ffffffff8011ba8a error 0 cr2 ffffffffff5fd023
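
The SRAT lines in the dump above can be cross-checked mechanically during triage. The following is a minimal Python sketch and not part of the original report: the function names `summarize_srat` and `memoryless_nodes` are hypothetical, and the regular expressions assume the exact line format printed by this 2.6.13 kernel. It lists nodes that have CPUs assigned but no SRAT memory range.

```python
import re
from collections import defaultdict

# Excerpt copied verbatim from the kernel message dump above.
LOG = """\
SRAT: PXM 0 -> APIC 0 -> CPU 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> CPU 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> CPU 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> CPU 3 -> Node 1
SRAT: Node 1 PXM 1 0-9ffff
SRAT: Node 1 PXM 1 0-1fffffff
"""

# Assumed line formats, taken from the excerpt only.
CPU_RE = re.compile(r"SRAT: PXM (\d+) -> APIC (\d+) -> CPU (\d+) -> Node (\d+)")
MEM_RE = re.compile(r"SRAT: Node (\d+) PXM (\d+) ([0-9a-f]+)-([0-9a-f]+)")


def summarize_srat(log):
    """Return (cpus_by_node, memory_ranges_by_node) parsed from SRAT log lines."""
    cpus = defaultdict(list)
    mem = defaultdict(list)
    for line in log.splitlines():
        line = line.strip()
        m = CPU_RE.match(line)
        if m:
            cpus[int(m.group(4))].append(int(m.group(3)))
            continue
        m = MEM_RE.match(line)
        if m:
            start, end = int(m.group(3), 16), int(m.group(4), 16)
            mem[int(m.group(1))].append((start, end))
    return dict(cpus), dict(mem)


def memoryless_nodes(log):
    """Nodes that own at least one CPU but have no SRAT memory range."""
    cpus, mem = summarize_srat(log)
    return sorted(node for node in cpus if node not in mem)


if __name__ == "__main__":
    print(memoryless_nodes(LOG))  # prints [0]
```

For this dump the sketch reports node 0, which matches the "Node 0 ... no memory" configuration listed in the description.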
Comment 1 Stefan Hundhammer 2006-02-03 16:09:07 UTC
Please use component "kernel" for kernel bugs.
Comment 2 Chris L Mason 2006-02-04 01:28:53 UTC
Please try the KOTD (kernel of the day); I would expect the newer 10.1 kernels to work here.

Comment 3 Bodo Bauer 2006-02-06 10:13:23 UTC
As a general note to AMD: please don't use SL kernels for testing; use SLES instead. SuSE Linux is not a maintained product the way SLES is.

And as a question: what is a 'Fugue board', and how can a Melody have 16 cores? AFAIK, 8 cores is the maximum for a 4-node Melody...

Comment 4 Dave Keck 2006-02-07 18:03:52 UTC
Here is a reply from Martin Oh:
The Fugue board is a vertical 6P CPU board that can be plugged into the Melody board in place of the Harmony 2P board, so a Melody/Fugue system can have up to 8P/16 cores.

This problem with SuSE Linux 10 can be reproduced without the Fugue board. A Melody with 2 dual-core processors and 256MB of memory on node 1 causes the same kernel panic.

Thanks,
Martin 
Comment 5 Dave Keck 2006-02-07 18:12:22 UTC
Comments from Martin about SL vs. SLES:

Dave, 

Their statement on SLES vs. SL is confusing to me. SL is a SuSE product, and SRD QA/validation uses SL for internal system validation.

This problem occurs on a 2P system, and SL can be used on a 2P workstation system. Can you ask SuSE whether AMD should use only SLES for validation?

Thanks,
Martin 
Comment 6 Bodo Bauer 2006-02-08 10:02:28 UTC
If at all possible, you should use SLES. I realize that this may not always be an option, as SL is faster to adapt to new hardware that's not yet enabled in SLES.

SLES, however, is a maintained product, while SL only gets security fixes. We take bugs in SLES kernels much more seriously than we do for SUSE Linux, and we release bug fixes and feature updates on a regular basis only for SLES, not for SL.

(And thanks for the Fugue info)
Comment 7 Dave Keck 2006-02-14 20:11:51 UTC
They tried SUSE 10.1 Beta 32-bit, and this solved their problem.

Tester was Hanh Nguyen.


Dave Keck