Bugzilla – Bug 134548
x86_64/mm/init.c:146 bad pte ..., while suspending to disk APIC enabled
Last modified: 2006-01-26 10:49:12 UTC
Hi, my notebook (nx6125) freezes during suspend to disk with the following kernel screen: http://homepage.hispeed.ch/hb9xcg/img_0105.jpg Additional info about the suspend procedure: http://homepage.hispeed.ch/hb9xcg/suspend2disk.log If you need more info about the system, I'll be happy to send it. Best regards, Markus Walser
Pavel, I'm assigning this to you
Is it reproducible? It looks like a duplicate of bug #119833 to me.
I tried three times to suspend and hit this bug every time. Note that this bug happened during suspend, while bug #119833 happened during resume. Shall I try to apply the patch mentioned in bug #119833, or do you have a more recent one from Andi to test?
Stefan, have you seen something similar? Can you try it with minimum drivers? init=/bin/bash.
I haven't seen this (but I don't have many x86_64 machines), and it looks pretty different from bug #119833 (to me :-). Maybe the "bad pte" is not really the fatal error, but the driver for device 0000:00:13.0 is hanging on resume-during-suspend?
Hi, I just tried to suspend with init=/bin/bash and "echo 4 > /proc/acpi/sleep". (The only thing I did after booting was a "swapon /dev/hda2" and remounting /proc.) It ended with almost the same result: http://homepage.hispeed.ch/hb9xcg/suspend_with_init_bash.jpg Can you give me advice on how to find out what's behind 0000:00:13.0? "lspci | grep 13.0" reports: 00:13.0 USB Controller: ATI Technologies Inc IXP SB400 USB Host Controller
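Besides lspci, sysfs can show which driver (if any) is currently bound to a PCI device. The following is only a minimal sketch, assuming sysfs is mounted at /sys; the function name pci_driver and its optional second argument (an alternative sysfs root, handy for trying it out) are made up for illustration.

```shell
#!/bin/sh
# pci_driver ADDRESS [SYSFS_ROOT]
# Print the name of the driver bound to a PCI device,
# or "none" if no driver is bound (or the device does not exist).
# SYSFS_ROOT is a hypothetical override; it defaults to /sys.
pci_driver() {
    addr=$1
    root=${2:-/sys}
    link="$root/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink into bus/pci/drivers/<name>.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Usage would be "pci_driver 0000:00:13.0". With init=/bin/bash and no modules loaded, it would be expected to print "none" unless the host controller driver is built into the kernel.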
BTW, trying to suspend to RAM with "echo 3 > /proc/acpi/sleep" switches off the notebook. Not that I could resume, but at least it suspends to RAM. Could it be a problem that I have 1.5GB RAM and only 1GB swap?
USB Host would be ehci_hcd, uhci_hcd or ohci_hcd, but since those are not loaded with init=/bin/bash, they are probably not the troublemakers. Also, the swap size does not really matter here. Sorry, I have no further ideas.
What are your current parameters on the kernel command line? Can you try adding noapic?
cmdline is:
root=/dev/hda3 vga=0x342 selinux=0 resume=/dev/hda2 splash=0 init=/bin/bash
With an additional parameter such as noapic, pci=noacpi or acpi=off, the machine doesn't boot and prints no messages at all. The only ACPI option I found that boots is acpi=oldboot, but the result is about the same after "echo shutdown >/sys/power/disk;echo disk>/sys/power/state": http://homepage.hispeed.ch/hb9xcg/suspend_with_acpi_oldboot.jpg
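For reference, the two writes to /sys/power quoted above can be wrapped so that suspend is only attempted when the kernel lists "disk" as a supported state. This is a rough sketch, not the reporter's actual procedure; the function name suspend_to_disk and its optional directory argument are assumptions made for illustration.

```shell
#!/bin/sh
# suspend_to_disk [POWER_DIR]
# POWER_DIR is a hypothetical override; it defaults to /sys/power.
suspend_to_disk() {
    dir=${1:-/sys/power}
    # Reading "state" lists the supported sleep states, e.g. "standby mem disk".
    if ! grep -qw disk "$dir/state"; then
        echo "suspend to disk is not offered by this kernel" >&2
        return 1
    fi
    # Power off after the image is written, then start the suspend.
    echo shutdown > "$dir/disk" && echo disk > "$dir/state"
}
```

Calling "suspend_to_disk" as root without arguments would start a real suspend, so it should only be run when that is intended.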
Do you use an SMP kernel by chance?
I can't verify it at the moment because the notebook is at home. But according to the suspend2disk.log, it's SuSE's default kernel /boot/2.6.13-15-default, which doesn't have SMP support, I suppose.
It's definitely no SMP:
turion:~ # cat /proc/config.gz | gunzip | grep SMP
CONFIG_BROKEN_ON_SMP=y
# CONFIG_SMP is not set
turion:~ # uname -a
Linux turion 2.6.13-15-default #1 Tue Sep 13 14:56:15 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
Complete config is here: http://homepage.hispeed.ch/hb9xcg/config
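A plain "grep SMP" also matches unrelated options such as CONFIG_BROKEN_ON_SMP, so an anchored match gives a less ambiguous answer. A small sketch, assuming the kernel exposes its configuration as /proc/config.gz; the function name kernel_has_option and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# kernel_has_option OPTION [CONFIG_GZ]
# Report whether OPTION is built in (=y) or modular (=m).
# CONFIG_GZ is a hypothetical override; it defaults to /proc/config.gz.
kernel_has_option() {
    opt=$1
    cfg=${2:-/proc/config.gz}
    # Anchor the pattern so CONFIG_SMP does not match CONFIG_BROKEN_ON_SMP.
    if gunzip -c "$cfg" | grep -q "^${opt}=[ym]\$"; then
        echo "$opt is enabled"
    else
        echo "$opt is not set"
    fi
}
```

For example, "kernel_has_option CONFIG_SMP" on the kernel above should report that the option is not set.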
Okay, I guess we are seeing the same problem as in 113886 -- APIC troubles. I guess testing a 32-bit kernel would be hard?
Also try the latest vanilla kernel... Hopefully it is a duplicate of 113886. *** This bug has been marked as a duplicate of 113886 ***
I think it is more likely a duplicate of this bug: http://bugzilla.kernel.org/show_bug.cgi?id=5534
Hi, I tried suspend to disk again with the 2.6.16-rc1-mm3 kernel and got interesting results. Basically, resuming from disk worked the first time on this nx6125, but with an ugly soft lockup on CPU0 during suspend to disk: http://homepage.hispeed.ch/hb9xcg/img_0410.jpg The resume that followed went well. Config was: http://homepage.hispeed.ch/hb9xcg/config-2.6.16-rc1-mm3 And System.map was: http://homepage.hispeed.ch/System.map-2.6.16-rc1-mm3-mw And the dmesg after resume: http://homepage.hispeed.ch/resume-2.6.16-rc1-mm3.log Maybe there's a connection between this lockup and messages like: " osl-0822 [77] os_wait_semaphore : Failed to acquire semaphore[ffff81005733ad40|1|0], AE_TIME" which I see very often in the log, e.g. when accessing the proc filesystem: "cat /proc/acpi/thermal_zone/TZ1/temperature" Any suggestions?
These seem to be different problems, please log them into bugzilla.kernel.org.
Two links in Comment #17 are wrong. They should be: http://homepage.hispeed.ch/hb9xcg/System.map-2.6.16-rc1-mm3-mw http://homepage.hispeed.ch/hb9xcg/resume-2.6.16-rc1-mm3.log
Done: http://bugzilla.kernel.org/show_bug.cgi?id=5962