Bug 142280

Summary: Problem with manual binding driver shpchp
Product: [openSUSE] SUSE Linux 10.1 Reporter: Christian Zoz <zoz>
Component: Kernel Assignee: Greg Kroah-Hartman <gregkh>
Status: RESOLVED WONTFIX QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: bjacke, coomac, forgotten_85NSFCoNoT, lmuelle, suse-beta, whiplash
Version: Beta 3   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: Development Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Christian Zoz 2006-01-10 11:56:12 UTC
echo -n "pci-id" > /sys/bus/pci/drivers/shpchp on a machine with kernel 2.6.15-rc6-git2-2-smp never returns. echo loops, writing again and again.
You can try it on gray.suse.de.
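
When reproducing this, it may help to bound the write so that the calling shell cannot hang. The following is only a minimal sketch: `guarded_write` is a hypothetical helper name, the PCI address in the usage comment is a placeholder, and the `bind` file inside the driver directory is assumed to be the intended target (the report does not spell out the full path). It also assumes coreutils `timeout` is installed.

```shell
#!/bin/sh
# Hypothetical helper: write VALUE to FILE, but give up after SECS seconds.
# The exit status is that of timeout(1); 124 means the write was killed.
guarded_write() {
    value=$1
    file=$2
    secs=${3:-5}
    timeout "$secs" sh -c 'printf %s "$1" > "$2"' sh "$value" "$file"
}

# Usage (placeholder address; the path is an assumption, not from the report):
# guarded_write "0000:00:1f.0" /sys/bus/pci/drivers/shpchp/bind 5 \
#     || echo "bind did not complete (status $?)"
```

Running the write in a child `sh` means a looping write is confined to a process that `timeout` can terminate, rather than the interactive shell.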
Comment 1 Lars Müller 2006-02-02 12:10:41 UTC
This is still the case with 2.6.16-rc1-git3-7-smp for Beta 3.
Comment 2 Christian Zoz 2006-02-02 12:29:13 UTC
Will this be fixed for 10.1?
Comment 3 Forgotten User 85NSFCoNoT 2006-02-02 22:48:42 UTC
Investigating.
Comment 4 Forgotten User 85NSFCoNoT 2006-02-04 00:57:38 UTC
I was able to duplicate this bug.  The problem is that the shpchp driver is not able to reserve the MMIO region that it needs, so the controller initialization fails and shpc_probe returns -ENODEV.  However, driver_attach() seems to keep calling the shpc_probe routine over and over again even though the driver consistently returns -ENODEV.  There are likely two problems: 1) why MMIO cannot be reserved, and 2) why driver_attach() is called in an endless loop.  I will attempt to address 1) next week.
Comment 5 Greg Kroah-Hartman 2006-02-06 21:53:55 UTC
1) mmio will not be assigned as it is for an invalid device, right?

and for 2), that's just wrong :)
Comment 6 Forgotten User 85NSFCoNoT 2006-02-06 22:22:53 UTC
1) shpchp doesn't support .remove - so when the user writes the device
number to unbind, nothing in the driver actually frees any resources,
although it appears in sysfs as if that operation was successful.

2) when the user attempts to rebind, the driver will not be able to
reserve MMIO because it never freed it, and is in fact still using it,
since it never really unbound.  This causes the driver to return -ENODEV
from the bind operation.  driver_bind then goes into an endless
loop, calling shpchp's probe routine over and over again.

This problem should happen on a number of the pci hotplug controller
drivers, since not that many of them actually support .remove - so it is
a larger problem than just shpchp.  I am working on a patch to add .remove
support to shpchp; however, this will only address one of the several
problems listed here.
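
The unbind-then-rebind sequence described in this comment could be exercised roughly as follows. This is a sketch under stated assumptions, not a tested reproducer: `rebind_device` is a hypothetical helper, the driver directory is assumed to expose `unbind` and `bind` files, and each write is bounded with coreutils `timeout` so that a looping bind cannot hang the caller.

```shell
#!/bin/sh
# Hypothetical helper: unbind DEV from the driver at DRVDIR, then rebind it.
# Stops at the first write that fails or exceeds SECS seconds.
rebind_device() {
    drvdir=$1
    dev=$2
    secs=${3:-5}
    for op in unbind bind; do
        timeout "$secs" sh -c 'printf %s "$1" > "$2"' sh "$dev" "$drvdir/$op" || {
            echo "$op of $dev failed or timed out" >&2
            return 1
        }
    done
}

# Usage (placeholder address, assumed path):
# rebind_device /sys/bus/pci/drivers/shpchp "0000:00:1f.0" 5
```

On an affected kernel one would expect the `bind` step to be the one that times out, since per the analysis above the unbind appears to succeed without freeing anything.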
Comment 7 Christian Zoz 2006-02-15 14:38:47 UTC
*** Bug 150395 has been marked as a duplicate of this bug. ***
Comment 8 Forgotten User 85NSFCoNoT 2006-04-20 17:42:00 UTC
bug is fixed and included upstream in 2.6.17-rc1
Comment 9 Ralf Flaxa 2006-04-26 13:54:59 UTC
Greg, will we get this fix also for CODE10?
Comment 10 Greg Kroah-Hartman 2006-04-26 18:26:44 UTC
Don't take Kristen off the bug...
Comment 11 Greg Kroah-Hartman 2006-04-26 18:27:42 UTC
No, this will not go into CODE10.  It is much too late, and the fix is much too big.

It _might_ go into SP1, but that depends on if the proper FATE entry is
approved or not.
Comment 12 Gordon Schumacher 2006-05-16 22:18:49 UTC
*** Bug 175572 has been marked as a duplicate of this bug. ***
Comment 13 Gordon Schumacher 2006-05-17 15:55:42 UTC
***  Bug 175572 has been marked as a duplicate of this bug. ***

(Something's screwed up with your Bugzilla install - I marked another bug as a duplicate of this one nearly seventeen hours ago, and it's still telling me I've got a mid-air collision when I submit this.  So I'm adding it manually with this comment.)

So... for inquiring minds that want to know, is there a way around this bug?  I've got a Dell PowerEdge that displays it, and therefore 10.1 is completely unusable to me for the time being.  I'm planning on dropping back to 10.0 for now, since that's been working beautifully for me on all the machines I've run it on.

I'm loath to do it myself, not being on the SuSE team, but please, someone bump this higher than a P5?!?  As it stands, I'm having to tell people to leave this release alone for now...
Comment 14 Greg Kroah-Hartman 2006-05-17 16:01:39 UTC
Why are you trying to manually bind the shpchp driver to anything?  What
is the problem that you are trying to solve?
Comment 15 Gordon Schumacher 2006-05-17 22:47:58 UTC
I'm not trying to do anything special; I got here from bug 150395, which was marked as a duplicate.  I installed SuSE 10.1 on my server and got a pair of hwup scripts sucking down all the CPU time they could get (the system currently has two network interfaces, the e1000 on the mainboard and a plug-in e100).
Comment 16 Greg Kroah-Hartman 2006-05-17 22:55:35 UTC
Someone resolved the bug incorrectly, this one has nothing to do with
your original one, sorry.
Comment 17 Christian Zoz 2006-05-18 17:52:53 UTC
I resolved it, but not incorrectly. hwup loops because of this problem. 'echo <anything> > /sys/<anything>' must not loop forever.

Gordon, see bug 150395 initial comment. You may create a hwcfg-file to work around this problem. So you can use 10.1. ;)
Comment 18 Gordon Schumacher 2006-05-18 19:15:50 UTC
Ah, I'd missed that detail about the hotplug controller when I was looking before - I'd been fighting with the system on various fronts for a couple of days by that point and was pretty fried.  Thanks.
Comment 19 Nkoli Ukpabi 2006-05-19 18:02:53 UTC
Gordon, I have the same setup and the same problem on my laptop, i.e. two cards and two hwup scripts running at startup... my bug reporting on bug 175572 wasn't too great. Anyway, I changed the startmode to manual in /etc/sysconfig/hardware/hwcfg-bus-usb-2-2:1.0 to get around it.
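
For anyone wanting to script that workaround, a minimal sketch follows. It assumes the hwcfg file contains a shell-style `STARTMODE='...'` assignment on its own line (the comment above does not show the file contents), that GNU `sed` with `-i` is available, and `set_startmode_manual` is a hypothetical helper name. Back up the file first.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an existing STARTMODE= line in CFG to 'manual'.
# Other lines are left untouched; if no STARTMODE= line exists, nothing changes.
set_startmode_manual() {
    cfg=$1
    sed -i "s/^STARTMODE=.*/STARTMODE='manual'/" "$cfg"
}

# Usage (file name taken from the comment above):
# set_startmode_manual /etc/sysconfig/hardware/hwcfg-bus-usb-2-2:1.0
```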
Comment 20 Stephan Kulow 2008-06-25 09:34:32 UTC
mass reopening all SuSE Linux bugs that are set to REMIND+LATER to change the resolution to WONTFIX (adapting to new policy)
Comment 21 Stephan Kulow 2008-06-25 09:36:42 UTC
mass reopening all SuSE Linux bugs that are set to REMIND+LATER to change the resolution to WONTFIX (adapting to new policy)
Comment 22 Stephan Kulow 2008-06-25 09:41:47 UTC
mass reopening all SuSE Linux bugs that are set to REMIND+LATER to change the resolution to WONTFIX (adapting to new policy)
Comment 23 Stephan Kulow 2008-06-25 09:53:13 UTC
Closing old LATER+REMIND bugs as WONTFIX - if you still plan to work on it, feel free to reopen and set to ASSIGNED.

In case the report saw repeated reopen comments, it's due to bugzilla timing out on the huge request ;(