Bugzilla – Bug 145197
e100 crashes on resume from suspend-to-RAM
Last modified: 2006-01-26 16:04:08 UTC
Machine is a Sony Vaio VGN-FS115B. I'll attach a screen shot of the oops.
Created attachment 64761 [details] Picture of the Oops
Exactly the same oops happens after suspend to disk, or after:

    echo -n 2 > $SYSFS_DEVICE_PATH/power/state
    echo -n 0 > $SYSFS_DEVICE_PATH/power/state   # oopsen here

I reproduced this on a second e100 machine (Compaq Armada E500): the same oops. It still worked with 2.6.15-git12-6 (beta1 kernel).
Stefan, can you try to generate the diff between the last working kernel and the current one, and see if you can spot something interesting? Otherwise just add prints into e100_hw_init to see where it dereferences the NULL, and try to fix it. I'm afraid I do not have the right hardware. (Feel free to reassign to any kernel hacker who has an e100. Karsten had some collection of network cards?)
Karsten, anything you can do to help here? I also notified lkml and netdev lists about this one.
There was an e100 update in 2.6.16-rc1-git3, which seems to introduce this problem. Apparently it dies in e100_exec_cb_wait:

    if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode)))
            DPRINTK(PROBE,ERR, "ucode cmd failed with error %d\n", err);
    /* we see this message in the oops; it returns ENOMEM because
     * nic->cbs_avail == 0 */
    /*...*/
    while (!(cb->status & cpu_to_le16(cb_complete))) {
            msleep(10);
            if (!--counter) break;
    }

I think it dies while referencing cb->status, which is NULL. That's because the cb's aren't allocated until later.
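The failure mode described in this comment can be modelled in user space. The following is only a minimal sketch and not the driver code: the types `model_nic` and `model_cb` and the functions `model_exec_cb` and `model_exec_cb_wait` are hypothetical stand-ins, and the model assumes that the command block pointer is NULL and the free count is 0 until the blocks are allocated. It shows one possible guard (returning early when the exec step fails) and is not necessarily what the attached patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; these are not the e100 driver's types. */
enum { MODEL_CB_COMPLETE = 0x8000 };

struct model_cb {
    unsigned short status;
};

struct model_nic {
    struct model_cb *cb_to_clean; /* assumed NULL until blocks are allocated */
    int cbs_avail;                /* assumed 0 until blocks are allocated */
};

/* Models the exec step: fails with -ENOMEM when no command block is free. */
static int model_exec_cb(struct model_nic *nic)
{
    if (nic->cbs_avail == 0)
        return -ENOMEM;
    assert(nic->cb_to_clean != NULL);
    nic->cbs_avail--;
    /* Pretend the hardware completed the command immediately. */
    nic->cb_to_clean->status |= MODEL_CB_COMPLETE;
    return 0;
}

/* Guarded wait: return the error instead of polling through a NULL cb. */
static int model_exec_cb_wait(struct model_nic *nic)
{
    struct model_cb *cb = nic->cb_to_clean;
    int counter = 50;
    int err = model_exec_cb(nic);

    if (err)
        return err; /* without this, the loop below would read cb->status */

    while (!(cb->status & MODEL_CB_COMPLETE)) {
        if (!--counter)
            return -ETIMEDOUT;
    }
    return 0;
}
```

With an unallocated pool, `model_exec_cb_wait` returns `-ENOMEM` rather than touching `cb`; once a block is available, it completes normally.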
Created attachment 64889 [details] Proposed patch
kalman-okir-587 kernel-default: IN PROGRESS - i386: not started yet please test
[mbuild kalman-okir-587] kernel-default on i386: succeeded
Yes, sir. I can boogie. Works for me on Armada E500 and suspend to RAM.
Thanks for confirming. Fix is in CVS tree
Just for the record, it also works on the Sony Vaio now. Thanks!
*** Bug 145507 has been marked as a duplicate of this bug. ***
Following a comment from Jesse Brandenburg on netdev, I have adapted the patch to simply not call hw_init inside the resume() function. I'm currently building a new kernel with this patch, and I would like people to test this:
 - suspend to RAM and resume
 - suspend to disk and resume
 - ifconfig eth0 down; suspend/resume; ifconfig up

mbuild job is
 - queued kernel-default for dist i386
Your jobid is 'kalman-okir-589'. Reports will be sent to okir@suse.de.
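The idea behind the adapted patch, and the reason for the three test cases, can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the patch itself: `model_dev` and the `model_*` functions are hypothetical names, and the model assumes that hardware init is only valid once the command blocks exist and that the "up" path allocates them first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names come from the e100 driver. */
struct model_dev {
    bool running;       /* interface was up before suspend */
    bool cbs_allocated; /* command blocks exist */
    bool hw_ready;      /* hardware has been initialised */
};

/* Hardware init is assumed to need the command blocks. */
static int model_hw_init(struct model_dev *dev)
{
    if (!dev->cbs_allocated)
        return -1;
    dev->hw_ready = true;
    return 0;
}

/* The "up" path allocates command blocks first, then initialises hardware. */
static int model_up(struct model_dev *dev)
{
    dev->cbs_allocated = true;
    dev->running = true;
    return model_hw_init(dev);
}

/* Suspend is assumed to release the command blocks. */
static void model_suspend(struct model_dev *dev)
{
    dev->cbs_allocated = false;
    dev->hw_ready = false;
}

/* Resume never calls model_hw_init directly; it defers to the "up" path. */
static int model_resume(struct model_dev *dev)
{
    assert(dev != NULL);
    if (dev->running)
        return model_up(dev);
    return 0; /* interface was down: nothing to initialise yet */
}
```

In this model, a device that was up before suspend is fully initialised again after resume, while a device that was down stays untouched until it is brought up later, which corresponds to the third test case in the list.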
The mbuild job is kalman-okir-593 kernel-default: IN PROGRESS - i386: building (on bach-1, ETA at 14:08)
seems to work here, but now produces this:

Jan 26 15:44:32 schleppi klogd: Debug: sleeping function called from invalid context at mm/slab.c:2515
Jan 26 15:44:32 schleppi klogd: in_atomic():0, irqs_disabled():1
Jan 26 15:44:32 schleppi klogd: [<c014cc6a>] kmem_cache_alloc+0x1b/0x79
Jan 26 15:44:32 schleppi klogd: [<c01ce167>] acpi_os_acquire_object+0xb/0x36
Jan 26 15:44:32 schleppi klogd: [<c01e4f30>] acpi_ut_allocate_object_desc_dbg+0x13/0x49
Jan 26 15:44:32 schleppi klogd: [<c01e4f7b>] acpi_ut_create_internal_object_dbg+0x15/0x68
Jan 26 15:44:32 schleppi klogd: [<c01e1169>] acpi_rs_set_srs_method_data+0x3d/0xb7
Jan 26 15:44:32 schleppi klogd: [<c014bcde>] cache_alloc_debugcheck_after+0xb8/0xea
Jan 26 15:44:32 schleppi klogd: [<c01e870b>] acpi_pci_link_set+0x40/0x1c0
Jan 26 15:44:32 schleppi klogd: [<c01e87d1>] acpi_pci_link_set+0x106/0x1c0
Jan 26 15:44:32 schleppi klogd: [<c01e88e0>] irqrouter_resume+0x55/0x73
Jan 26 15:44:32 schleppi klogd: [<c020af67>] __sysdev_resume+0x11/0x53
Jan 26 15:44:32 schleppi klogd: [<c020b0a7>] sysdev_resume+0x16/0x47
Jan 26 15:44:32 schleppi klogd: [<c020efd2>] device_power_up+0x5/0xa
Jan 26 15:44:32 schleppi klogd: [<c012ef8c>] swsusp_suspend+0x6b/0x85
Jan 26 15:44:32 schleppi klogd: [<c012fdf6>] pm_suspend_disk+0x44/0xd1
Jan 26 15:44:32 schleppi klogd: [<c012e4cc>] enter_state+0x50/0x160
Jan 26 15:44:32 schleppi klogd: [<c012e664>] state_store+0x88/0x95
Jan 26 15:44:32 schleppi klogd: [<c012e5dc>] state_store+0x0/0x95
Jan 26 15:44:32 schleppi klogd: [<c0182666>] subsys_attr_store+0x1e/0x22
Jan 26 15:44:32 schleppi klogd: [<c0182927>] sysfs_write_file+0x9b/0xc1
Jan 26 15:44:32 schleppi klogd: [<c018288c>] sysfs_write_file+0x0/0xc1
Jan 26 15:44:32 schleppi klogd: [<c014f816>] vfs_write+0xa1/0x146
Jan 26 15:44:32 schleppi klogd: [<c014fd2c>] sys_write+0x3c/0x63
Jan 26 15:44:32 schleppi klogd: [<c0102a3b>] sysenter_past_esp+0x54/0x79
That seems to be a different problem in the generic swsusp code. Please open a new bug report for this.
Anyone else able to confirm that this patch fixes the problem as well?
works fine. The sleeping in atomic is known and harmless. It goes away as soon as we disable the debugging again :-)
The sleeping in atomic is actually in generic ACPI code; it happens on suspend to RAM and to disk. And the ACPI guys know about it.
Works for me, too.
Thanks, updated patch is in CVS.