Bugzilla – Bug 154291
PCI: Multiple Domains Are Not Supported ( kernel warning/error )
Last modified: 2008-06-25 09:53:44 UTC
Description: Devices on any PCI domain beyond PCI0 cannot be accessed.
Steps to replicate: Install SL10.0 on a mainboard with more than one PCI domain.
Actual Results: Unable to access PCI-X slots/devices, etc.
Expected Results: Normal use of all devices on the mainboard.
Explanation: Installation fails to use all PCI* on systems that have multiple PCI domains on the mainboard. In my case PCI1 and PCI2 are not available. That means I have no access to devices in those domains, not the least of which are the PCI-X slots. The reason is that, until recently, the 'stock' Linux kernel had no support for multiple domains on the x86 arch. In that respect, one might consider whether this is a bug or a feature. Since this is a basic system problem and does not involve adding some special device, I can only suggest it is a bug. It is a blocker for anyone who wants to use Novell/SuSE Linux with newer x86 mainboards that have PCI, PCIe & PCI-X slots and devices. Support for multiple PCI domains on the x86 architecture is needed for compatibility with current and future computer mainboards.
Background, available fixes, etc.: Multiple PCI domains are also known as segments in ACPI. Newer mainboards use them to support the multiple types of PCI bus that now exist on the x86 arch, including PCI-X as well as the (non-bus) PCIe. Support for the x86 arch was only recently patched into the kernel (Garzik patch in December 2005) and the patch was updated even more recently. The patches are in A. Morton's -mm tree.
Info is here: http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc4/2.6.16-rc4-mm2/announce.txt
Patches are available from http://kernel.org/pub/linux/kernel/people/akpm/
The breakouts (if you want to look at the patches) are here: http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc4/2.6.16-rc4-mm2/broken-out/
Patches included are:
gregkh-pci-pci-fix-the-x86-pci-domain-support-fix.patch
gregkh-pci-x86-pci-domain-support-a-humble-fix.patch
gregkh-pci-x86-pci-domain-support-struct-pci_sysdata.patch
gregkh-pci-x86-pci-domain-support-the-meat.patch
revert-gregkh-pci-x86-pci-domain-support-the-meat.patch
They are listed using the prefix 'gregkh-pci-x86-' -> signed off by Greg K.H. Of course that info changes quickly; there may be new updates already. From what I can determine, they are now fixed(?) in 2.6.16-rc4-mm2.bz2 as struct-pci_sysdata patches to add 'CONFIG_PCI_DOMAINS' to the kernel config, among other things.
It is not just a 64-bit OS problem, since the 32-bit versions also fail to install with the same error message, or install with reduced capacity. Support for multiple PCI domains for the x86 arch is needed as soon as possible to ensure compatibility for installation and use of this Linux OS. (Otherwise I have to go back to Redmond...)
In my case, this is a Gigabyte GA-2CEWH-RH Dual Opteron server/workstation mainboard with Phoenix BIOS. Here is some brief, specific info from the boot log:
...
<6>ACPI: PCI Root Bridge [PCI1] (0001:80)
<4>PCI: Multiple domains not supported
<4> ACPI-0279: *** Warning: Bus 0001:80 not present in PCI namespace
<4> ACPI-0167: *** Warning: Invalid ACPI-PCI context for parent device PCI1
<4> ACPI-0167: *** Warning: Invalid ACPI-PCI context for parent device PCI1
...
<6>ACPI: PCI Root Bridge [PCI2] (0002:40)
<4>PCI: Multiple domains not supported
<4> ACPI-0279: *** Warning: Bus 0002:40 not present in PCI namespace
<4> ACPI-0167: *** Warning: Invalid ACPI-PCI context for parent device PCI2
<4> ACPI-0167: *** Warning: Invalid ACPI-PCI context for parent device PCI2
...
That's the gist of it. The longer output shows that (ACPI) PCI Interrupt Links are disabled. I can supply the hardware configuration from YaST2 if/when needed.
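For anyone comparing boot logs from similar boards, the pattern in the excerpt above can be picked out mechanically. The following is only a minimal sketch: the function name `find_rejected_bridges` and the sample text are illustrative assumptions, and the regular expression is based solely on the two message formats quoted in this report, so other kernel versions may word them differently.

```python
import re

# Matches lines such as "<6>ACPI: PCI Root Bridge [PCI1] (0001:80)".
# The format is taken from the boot log quoted in this report only.
ROOT_BRIDGE_RE = re.compile(
    r"ACPI: PCI Root Bridge \[(?P<name>\w+)\] "
    r"\((?P<domain>[0-9a-fA-F]{4}):(?P<bus>[0-9a-fA-F]{2})\)"
)
UNSUPPORTED_MSG = "PCI: Multiple domains not supported"


def find_rejected_bridges(log_text):
    """Return (name, domain, bus) for every root bridge that is followed
    by the "Multiple domains not supported" message before the next
    root bridge line appears."""
    rejected = []
    current = None
    for line in log_text.splitlines():
        match = ROOT_BRIDGE_RE.search(line)
        if match:
            # Remember the most recent root bridge; domain and bus are hex.
            current = (
                match.group("name"),
                int(match.group("domain"), 16),
                int(match.group("bus"), 16),
            )
        elif UNSUPPORTED_MSG in line and current is not None:
            rejected.append(current)
            current = None
    return rejected


# Sample input: the relevant lines copied from the excerpt in this report.
SAMPLE_LOG = """\
<6>ACPI: PCI Root Bridge [PCI1] (0001:80)
<4>PCI: Multiple domains not supported
<6>ACPI: PCI Root Bridge [PCI2] (0002:40)
<4>PCI: Multiple domains not supported
"""

if __name__ == "__main__":
    for name, domain, bus in find_rejected_bridges(SAMPLE_LOG):
        print(f"{name}: domain {domain:04x}, bus {bus:02x} rejected")
```

On a real system the text would come from the saved boot log rather than from `SAMPLE_LOG`; a root bridge in domain 0000 that initialises normally would not be reported.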
Hi Ric, this should be supported in the upcoming SL10.1/SLES10 kernel. We do not backport hardware support to older Suse Linux kernels though. As such I need to close this as WONTFIX. Please do give the SL10.1 beta a try.
Olaf, sorry, but no, this is not supported in SL10.1 or SLES10, so trying out a new kernel will not help at all. The problem is that we have some initial attempts at doing this, but when running them, they break other types of systems (see lkml for the bug reports), so the majority of this patch is backed out in the -mm series because of that. It will take some access to one of these machines (local access is preferred), and some time to get it all working properly. And that will have to happen after SLES10 is released. So sorry, Linux currently does not support this kind of hardware very well yet.
First, thanks. You guys are great. :) Did you know that I initially wanted SuSE as my Linux of choice 6 years ago but did not choose it, primarily because the license model was too confusing for a newbie? I am sure you did not, but I mention it because I have some years of Linux experience (primarily RPM distros), have successfully built RPMs & kernels in the past, and am available to work on this very significant problem. I don't claim to be a kernel expert and my "SuSE" experience is extremely low, but I think I still have enough enthusiasm to make up the difference. I am rather upset with Gigabyte because they advertise this mobo as compatible with both 64-bit and 32-bit flavors of Linux, in particular SuSE (as well as RH). I mention that because in email conversations last week they said they would work on it. They should provide some assistance if/when needed.
Olaf Kirch: Please do not close as won't fix. This is a current as well as future problem for Novell & SuSE, as well as for the Linux operating system on the x86 arch in general. I am willing to post/transfer this as a 10.1 bug if that is what is needed (but I am not running the beta at this time). Closing as 'wont fix' has ramifications that are not good.
Greg Kroah-Hartman: I had hoped that Gigabyte would contact you, Greg, as well as Garzik and Morton, about getting the support into the kernel. I guess not yet, huh? I recognize that part of the problem is that these are new, cutting-edge mainboards and few in number in the Linux world. This one is available, but it is the only 64-bit system here. There are other 32-bit x86 (P, PII & PIII) systems that could be used for testing... I read about AM's problem on the older system in his mailing list post and that it might possibly have buggy ACPI(?). I have read everything I could find about this issue, which, frankly, is not much, so I have been hesitant to just jump into the water, so to speak. I still do not really know what is under the surface.
I am not a developer (although I do C) and certainly not a kernel hacker. The point here, really, is that I am asking: what is the best way to proceed to get this done? (Please email me, if needed.)
Ok, I'll take this one
Yes, it would be best if Gigabyte were to contact me (my email address is very easy to find, as I am the kernel PCI maintainer). I would be more than willing to work together with them to get Linux working properly on their machines. It is odd that they are advertising that it all works; I'd be interested in finding out what they are doing to get that to work for these devices. And no, the problem isn't buggy ACPI; it is the other platforms that have multiple PCI domains (NUMA boxes from IBM) and currently work just fine with Linux. We can't break them with this patch, as you can understand. I'll mark this LATER, to remind me if someone contacts me about this in the future.
marking LATER to remind me in the future.
Thanks, :) . Gigabyte(GBT) advertises this mobo on their front page, http://us.giga-byte.com/ and just ended a February promotion. It is, presumably, their flagship AMD Dual Opteron(TM) graphics workstation/server product. It supports NUMA too. There is a PDF doc of "supported" OS via http://us.giga-byte.com/Server/Support/OSSupport/OSSupport_ServerBoard_GA-2CEWH.htm which I am going to upload here( for reference ). Did not see the IBM problem(I'm not receiving lkml) & agree, of course, that a patch should not, generally, bust other things. GBT support has been responsive to other issues. I'll ping GBT ...
Created attachment 70864: List of supported OS for GA-2CEWH-RH mainboard
OS Compatibility for GA-2CEWH (Updated PDF version)
I did ping GBT and they have responded that this info is being forwarded to "HQ" for the BIOS team to use. I hope they have made contact. As for me, I have the recent kernels(2.6.15.5 & 2.6.16-rc5) as well as the -mm patchset(2.6.16-rc5-mm2) to attempt build and use for testing. It has been delayed 'cause of a little setup problem with SuSE due to lack of clear and valuable info on the *proper* setup for building in the SuSE environment(i.e., I have newbie-itis, :) ) ... a couple more days, probably.
I am speaking from
#> uname -r
2.6.16-rc6-mm1-smp
and it has made NO difference. (I did not have any better results with previous kernels ... this is the freshest.) Something is a bit haywire because CONFIG_PCI_DOMAINS was NOT available during the config. There were multiple configs for multiple attempts, thanks (mostly) to a missing object file that was not really missing, because it was not supposed to be in the make in the first place, and because I kept trying to find the sucker. It is in the patch. Why is it not in the config? I'll forgo printing out the confirmation data that it does not work (PCI: Multiple domains not supported), i.e. that it still does not provide access to the PCI-X slots or the rest of my mainboard, since it is probably a (kernel) config oversight. What could it be? I have saved logs if needed... kernel as well as PCI debug is in too.
I took a (closer) look at what we built. The CONFIG_PCI_DOMAINS=y seems to be available for just about every arch on the planet Earth EXCEPT i386 and x86_64 (amd64). Why and how do I fix that so I can test it on this Dual Opteron workstation?
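A quick way to confirm what a given build actually ended up with is to look the option up in the generated .config. The sketch below is only an illustration under stated assumptions: the helper name `config_option_state` and the sample fragment are made up for this example, and it relies only on the two standard .config line forms, `CONFIG_FOO=value` and `# CONFIG_FOO is not set`. An option that is absent altogether is consistent with the architecture's Kconfig not offering the symbol (or with unmet dependencies), which matches what is described above for i386 and x86_64.

```python
import re


def config_option_state(config_text, option):
    """Look up one option in the text of a kernel .config file.

    Returns the configured value as a string (for example "y" or "m"),
    "n" if the option is explicitly marked "is not set", or None if the
    option does not appear in the file at all.
    """
    set_re = re.compile(r"^" + re.escape(option) + r"=(.*)$")
    unset_line = "# " + option + " is not set"
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == unset_line:
            return "n"
        match = set_re.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical fragment for illustration; not taken from the reporter's build.
SAMPLE_CONFIG = """\
CONFIG_PCI=y
# CONFIG_PCI_MSI is not set
CONFIG_ACPI=y
"""

if __name__ == "__main__":
    for name in ("CONFIG_PCI", "CONFIG_PCI_MSI", "CONFIG_PCI_DOMAINS"):
        print(name, "->", config_option_state(SAMPLE_CONFIG, name))
```

In practice `config_text` would be read from the `.config` in the build tree; a result of None for CONFIG_PCI_DOMAINS suggests the symbol was never offered during configuration, rather than offered and declined.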
Andrew has dropped the patches from his tree (well, he has a revert for them) as they caused too many problems. I'm about to drop them too. If you want, I can email them to you offline, or point you to them on the web.
Thanks for answering. :) Any others besides these?
gregkh-pci-pci-fix-the-x86-pci-domain-support-fix.patch
gregkh-pci-x86-pci-domain-support-a-humble-fix.patch
gregkh-pci-x86-pci-domain-support-struct-pci_sysdata.patch
gregkh-pci-x86-pci-domain-support-the-meat.patch
I have those from the 'Broken-out' directory at Morton's site. You said there are "too many problems"? Is that on the lkml? Please do point me to any public discussion; I have to make some kind of informed decision. I am a little confused as to why a patch, the only patch I could find that ever existed for x86 Multiple PCI Domains (MPD), is just going to be dropped rather than fixed. It seems rather important, since even Win2k supports MPD. I suppose that also means that GBT never contacted you. Yes/no? And thanks for the clarification. I was a bit bugged since I thought I had done all that work incorrectly. You may certainly email me if you prefer.
There are some odd things with the sysdata & meat patches comments. They seem to be missing the closing "*/" in some places so I guess I'll need the real thing to make sure there is not a C&P error somehow or something else. (I could not find them at the ' .../people/jgarzik/patches ' ...) I am a simple, by-the-book, sometimes-if/when-necessary programmer so I do not know what/why the sysdata & meat patches do that.
I got it. (I have not been doing enough patches to be familiar with format). If you will verify to me that the complete patchset is as I listed I will try to do something here... ( this GBT mainboard is _worthless_ to me w/o linux at full tilt. )
No, no one has contacted me. And the patches are being dropped, as no one is working on them to fix the remaining issues. They can all be found at: http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/bad/pci-domain/ Look at the file "series" in that directory for the order in which the patches should be applied. Good luck.
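Reading the application order out of a "series" file can be sketched as follows. This assumes the common quilt-style layout (one patch file name per line, `#` starting a comment, optional patch arguments after the name); whether the file in that directory follows it exactly has not been checked here. The helper name `patches_in_order`, the placeholder patch names, and the `-p1` strip level in the printed commands are all assumptions for illustration.

```python
def patches_in_order(series_text):
    """Return patch file names in the order listed in a quilt-style
    series file, skipping blank lines and '#' comments."""
    patches = []
    for raw in series_text.splitlines():
        # Drop anything after a '#' and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # The first field is the patch name; any further fields
        # (such as a strip level) are ignored in this sketch.
        patches.append(line.split()[0])
    return patches


# Placeholder content; the real file names and order must be taken
# from the actual "series" file, not from this example.
SAMPLE_SERIES = """\
# hypothetical series file
first.patch
second.patch -p1

third.patch  # trailing comment
"""

if __name__ == "__main__":
    for name in patches_in_order(SAMPLE_SERIES):
        # -p1 is an assumed strip level for patches made against a kernel tree.
        print("patch -p1 < " + name)
```

Printing the commands instead of running them leaves room to try each patch with `patch --dry-run` first, which is prudent for a patch set known to be incomplete.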
Thanks, :). Are the "remaining issues" collated anywhere? I cannot predict the future, but MS Win2k (& later) supports multiple PCI domains, so I am relatively sure that this will be an ongoing issue, at least with advanced mobos using multiple types of PCI. And thanks for the luck, too; I will need it, as it seems that GBT is being more difficult than necessary. Vendors ... :rollseyes:
No, the issues aren't collected anywhere, except that some boxes that work fine today are still crashing with these patches, and we can't allow that. And sure, other operating systems probably support this just fine, I know, but unless someone does the work here, it's not going to happen for Linux. That is just how this project works :)
mass reopening all SuSE Linux bugs that are set to REMIND+LATER to change the resolution to WONTFIX (adapting to new policy)
Closing old LATER+REMIND bugs as WONTFIX - if you still plan to work on it, feel free to reopen and set to ASSIGNED. In case the report saw repeated reopen comments, it's due to bugzilla timing out on the huge request ;(