|
Bugzilla – Full Text Bug Listing |
| Summary: | e100 crashes on resume from suspend-to-RAM | ||
|---|---|---|---|
| Product: | [openSUSE] SUSE Linux 10.1 | Reporter: | Joachim Gleissner <joachim.gleissner> |
| Component: | Kernel | Assignee: | Olaf Kirch <okir> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | dmueller |
| Version: | Beta 1 | ||
| Target Milestone: | --- | ||
| Hardware: | i386 | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | Other | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
Picture of the Oops
Proposed patch |
||
|
Description
Joachim Gleissner
2006-01-24 15:03:51 UTC
Created attachment 64761 [details]
Picture of the Oops
Exactly the same oops happens after suspend to disk, or after:

echo -n 2 > $SYSFS_DEVICE_PATH/power/state
echo -n 0 > $SYSFS_DEVICE_PATH/power/state # oopsen here

I reproduced this on a second e100 machine (Compaq Armada E500), the same oops.

Did still work with 2.6.15-git12-6 (beta1 kernel).

Stefan, can you try to generate the diff between last working and see if you can spot something interesting? Otherwise just add prints into e100_hw_init to see where it dereferences the NULL, and try to fix it. I'm afraid I do not have the right hardware... (feel free to reassign to any kernel hacker that has e100... Karsten had some collection on network cards?)

Karsten, anything you can do to help here?

I also notified lkml and netdev lists about this one.

There was an e100 update in 2.6.16-rc1-git3, which seems to introduce this problem.
Apparently it dies in e100_exec_cb_wait
    if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode)))
        DPRINTK(PROBE, ERR, "ucode cmd failed with error %d\n", err);
    /* we see this message in the oops; it returns ENOMEM because
     * nic->cbs_avail == 0 */
    /*...*/
    while (!(cb->status & cpu_to_le16(cb_complete))) {
        msleep(10);
        if (!--counter) break;
    }
I think it dies while dereferencing cb->status, because cb is NULL. That's because
the cbs aren't allocated until later.
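
To make the failure mode above concrete, here is a minimal user-space sketch of the guard that the snippet is missing. It is not the driver code and not the attached patch: the struct layouts, the CB_COMPLETE value, and the function names exec_cb and exec_cb_wait are all hypothetical stand-ins. The only behaviour taken from the report is that queueing fails with ENOMEM when no command block is available, and that polling cb->status afterwards dereferences NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define CB_COMPLETE 0x8000u /* hypothetical completion bit, for illustration only */

/* Hypothetical, heavily reduced stand-ins for the driver structures. */
struct cb {
    volatile uint16_t status;
};

struct nic {
    struct cb *cbs;         /* NULL until the command blocks are allocated */
    unsigned int cbs_avail; /* number of free command blocks */
};

/* Queue a command; fails with -ENOMEM when no command block is free. */
static int exec_cb(struct nic *nic, struct cb **out)
{
    if (nic->cbs == NULL || nic->cbs_avail == 0) {
        *out = NULL;
        return -ENOMEM;
    }
    nic->cbs_avail--;
    *out = &nic->cbs[0];
    return 0;
}

/* Queue a command and poll for completion.
 * Returns early if queueing failed, so cb is never dereferenced while NULL. */
static int exec_cb_wait(struct nic *nic, unsigned int counter)
{
    struct cb *cb;
    int err = exec_cb(nic, &cb);

    if (err)
        return err; /* an unguarded version would fall through to the poll loop */

    while (!(cb->status & CB_COMPLETE)) {
        if (!--counter)
            return -ETIMEDOUT; /* assumption: report the timeout instead of a silent break */
        /* the real driver sleeps between polls (msleep(10)) */
    }
    return 0;
}
```

Calling exec_cb_wait on a struct nic whose cbs pointer is still NULL returns -ENOMEM instead of crashing, which corresponds to the state the device is in at resume time according to the analysis above.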
Created attachment 64889 [details]
Proposed patch
kalman-okir-587 kernel-default: IN PROGRESS - i386: not started yet

please test

[mbuild kalman-okir-587] kernel-default on i386: succeeded

Yes, sir. I can boogie.

Works for me on Armada E500 and suspend to RAM.

Thanks for confirming. Fix is in CVS tree.

Just for the record, it also works on the Sony Vaio now. Thanks!

*** Bug 145507 has been marked as a duplicate of this bug. ***

Following a comment from Jesse Brandenburg on netdev, I have adapted the patch to simply not call hw_init inside the resume() function. I'm currently building a new kernel with this patch, and I would like people to test this:
- suspend to RAM and resume
- suspend to disk and resume
- ifconfig eth0 down; suspend/resume; ifconfig up

mbuild job is
- queued kernel-default for dist i386
Your jobid is 'kalman-okir-589'. Reports will be sent to okir@suse.de.

The mbuild job is kalman-okir-593
kernel-default: IN PROGRESS - i386: building (on bach-1, ETA at 14:08)

seems to work here, but now produces this:

Jan 26 15:44:32 schleppi klogd: Debug: sleeping function called from invalid context at mm/slab.c:2515
Jan 26 15:44:32 schleppi klogd: in_atomic():0, irqs_disabled():1
Jan 26 15:44:32 schleppi klogd: [<c014cc6a>] kmem_cache_alloc+0x1b/0x79
Jan 26 15:44:32 schleppi klogd: [<c01ce167>] acpi_os_acquire_object+0xb/0x36
Jan 26 15:44:32 schleppi klogd: [<c01e4f30>] acpi_ut_allocate_object_desc_dbg+0x13/0x49
Jan 26 15:44:32 schleppi klogd: [<c01e4f7b>] acpi_ut_create_internal_object_dbg+0x15/0x68
Jan 26 15:44:32 schleppi klogd: [<c01e1169>] acpi_rs_set_srs_method_data+0x3d/0xb7
Jan 26 15:44:32 schleppi klogd: [<c014bcde>] cache_alloc_debugcheck_after+0xb8/0xea
Jan 26 15:44:32 schleppi klogd: [<c01e870b>] acpi_pci_link_set+0x40/0x1c0
Jan 26 15:44:32 schleppi klogd: [<c01e87d1>] acpi_pci_link_set+0x106/0x1c0
Jan 26 15:44:32 schleppi klogd: [<c01e88e0>] irqrouter_resume+0x55/0x73
Jan 26 15:44:32 schleppi klogd: [<c020af67>] __sysdev_resume+0x11/0x53
Jan 26 15:44:32 schleppi klogd: [<c020b0a7>] sysdev_resume+0x16/0x47
Jan 26 15:44:32 schleppi klogd: [<c020efd2>] device_power_up+0x5/0xa
Jan 26 15:44:32 schleppi klogd: [<c012ef8c>] swsusp_suspend+0x6b/0x85
Jan 26 15:44:32 schleppi klogd: [<c012fdf6>] pm_suspend_disk+0x44/0xd1
Jan 26 15:44:32 schleppi klogd: [<c012e4cc>] enter_state+0x50/0x160
Jan 26 15:44:32 schleppi klogd: [<c012e664>] state_store+0x88/0x95
Jan 26 15:44:32 schleppi klogd: [<c012e5dc>] state_store+0x0/0x95
Jan 26 15:44:32 schleppi klogd: [<c0182666>] subsys_attr_store+0x1e/0x22
Jan 26 15:44:32 schleppi klogd: [<c0182927>] sysfs_write_file+0x9b/0xc1
Jan 26 15:44:32 schleppi klogd: [<c018288c>] sysfs_write_file+0x0/0xc1
Jan 26 15:44:32 schleppi klogd: [<c014f816>] vfs_write+0xa1/0x146
Jan 26 15:44:32 schleppi klogd: [<c014fd2c>] sys_write+0x3c/0x63
Jan 26 15:44:32 schleppi klogd: [<c0102a3b>] sysenter_past_esp+0x54/0x79

That seems to be a different problem in the generic swsusp code. Please open a new bug report for this.

Anyone else able to confirm that this patch fixes the problem as well?

works fine. The sleeping in atomic is known and harmless. It goes away as soon as we disable the debugging again :-)

the sleeping in atomic is actually in generic acpi code, it happens on suspend to ram and disk. And the ACPI guys know about it.

Works for me, too.

Thanks, updated patch is in CVS. |
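
For readers who want to see why dropping the hw_init call from resume() covers all three requested test cases, the following is a rough user-space model. It is an assumption-laden sketch, not the patch in CVS: every name (model_nic, model_up, model_resume_old, model_resume_new, and so on) is hypothetical, and the model only encodes what this report states, namely that hardware init needs the command blocks and that those are allocated only when the interface is brought up.

```c
#include <stdbool.h>

/* Hypothetical model of the driver state relevant to suspend/resume. */
struct model_nic {
    bool running;        /* interface is administratively up */
    bool cbs_allocated;  /* command blocks exist */
    bool hw_initialized; /* hardware init has completed */
    int oops;            /* counts hardware init attempts without command blocks */
};

/* Hardware init needs command blocks; without them the model records an "oops". */
static void model_hw_init(struct model_nic *nic)
{
    if (!nic->cbs_allocated)
        nic->oops++;
    else
        nic->hw_initialized = true;
}

/* Bringing the interface up allocates the command blocks first, then inits. */
static void model_up(struct model_nic *nic)
{
    nic->cbs_allocated = true;
    model_hw_init(nic);
    nic->running = true;
}

/* Suspend releases resources but remembers whether the interface was up. */
static void model_suspend(struct model_nic *nic)
{
    nic->cbs_allocated = false;
    nic->hw_initialized = false;
}

/* Behaviour described as broken: init is attempted before anything is allocated. */
static void model_resume_old(struct model_nic *nic)
{
    model_hw_init(nic);
    if (nic->running)
        model_up(nic);
}

/* Adapted behaviour: resume does no hardware init of its own. */
static void model_resume_new(struct model_nic *nic)
{
    if (nic->running)
        model_up(nic);
}
```

In this model, an interface that was down across suspend/resume is simply left alone by model_resume_new and is initialized by the later model_up, which is the "ifconfig eth0 down; suspend/resume; ifconfig up" case from the test list.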