Bug 134548

Summary: x86_64/mm/init.c:146 bad pte ..., while suspending to disk APIC enabled
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Markus Walser <markus.walser>
Component: Kernel    Assignee: Pavel Machek <pavel>
Status: RESOLVED DUPLICATE QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None    
Version: unspecified   
Target Milestone: ---   
Hardware: x86-64   
OS: SuSE Linux 10.0   
Whiteboard:
Found By: Customer Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Markus Walser 2005-11-19 16:12:12 UTC
Hi,
My notebook (nx6125) freezes during suspend to disk with the following kernel screen:
http://homepage.hispeed.ch/hb9xcg/img_0105.jpg

Additional info about the suspend procedure:
http://homepage.hispeed.ch/hb9xcg/suspend2disk.log

If you need more information about the system, I'll be happy to send it.

Best regards, Markus Walser
Comment 1 Olaf Kirch 2005-11-21 09:14:34 UTC
Pavel, I'm assigning this to you.
Comment 2 Pavel Machek 2005-11-21 12:35:05 UTC
Is it reproducible? It looks like duplicate of bug #119833 to me.
Comment 3 Markus Walser 2005-11-21 15:34:58 UTC
I tried to suspend three times and hit this bug every time. Note that this bug happens during suspend, while bug #119833 happened during resume.
Shall I try to apply the patch mentioned in bug #119833, or do you have a more recent one from Andi to test?
Comment 4 Pavel Machek 2005-11-21 19:49:18 UTC
Stefan, have you seen something similar?

Can you try it with a minimal set of drivers (boot with init=/bin/bash)?
Comment 5 Forgotten User ZhJd0F0L3x 2005-11-21 20:12:30 UTC
I haven't seen this (but I don't have many x86_64 machines), and it looks pretty different from bug #119833 (to me :-).
Maybe the "bad pte" is not really the fatal error, and the driver for device
0000:00:13.0 is hanging on resume-during-suspend?
Comment 6 Markus Walser 2005-11-21 21:18:24 UTC
Hi,
I just tried to suspend with init=/bin/bash and "echo 4 > /proc/acpi/sleep" (the only things I did after booting were "swapon /dev/hda2" and remounting /proc). It ended with almost the same result:
http://homepage.hispeed.ch/hb9xcg/suspend_with_init_bash.jpg

Can you give me some advice on how to find out what's behind 0000:00:13.0?
Running "lspci | grep 13.0" reports:
00:13.0 USB Controller: ATI Technologies Inc IXP SB400 USB Host Controller

Comment 7 Markus Walser 2005-11-21 21:44:41 UTC
BTW, trying to suspend to RAM with "echo 3 > /proc/acpi/sleep" switches off the notebook. I can't resume from it, but at least it suspends to RAM.

Could it be a problem that I have 1.5GB RAM and only 1GB swap?
Comment 8 Forgotten User ZhJd0F0L3x 2005-11-21 22:17:15 UTC
The USB host controller would be driven by ehci_hcd, uhci_hcd or ohci_hcd, but since those are not loaded with init=/bin/bash, they are probably not the troublemakers.
Also, the swap size does not really matter here. Sorry, I have no further ideas.
Comment 9 Pavel Machek 2005-11-21 23:27:56 UTC
What are your current parameters on the kernel command line? Can you try adding noapic?
Comment 10 Markus Walser 2005-11-22 23:07:17 UTC
cmdline is:
root=/dev/hda3 vga=0x342 selinux=0 resume=/dev/hda2 splash=0 init=/bin/bash

With an additional parameter such as noapic, pci=noacpi or acpi=off, the machine doesn't boot and prints no messages at all.

The only ACPI option I found that still boots is acpi=oldboot, but the result after "echo shutdown >/sys/power/disk;echo disk>/sys/power/state" is about the same:
http://homepage.hispeed.ch/hb9xcg/suspend_with_acpi_oldboot.jpg
Comment 11 Pavel Machek 2005-11-22 23:38:06 UTC
Do you use SMP kernel by chance?
Comment 12 Markus Walser 2005-11-23 07:25:06 UTC
I can't verify it at the moment because the notebook is at home. But according to suspend2disk.log, it's SuSE's default kernel /boot/2.6.13-15-default, which, I suppose, doesn't have SMP support.
Comment 13 Markus Walser 2005-11-23 20:48:58 UTC
It's definitely no SMP:
turion:~ # cat /proc/config.gz | gunzip | grep SMP
CONFIG_BROKEN_ON_SMP=y
# CONFIG_SMP is not set

turion:~ # uname -a
Linux turion 2.6.13-15-default #1 Tue Sep 13 14:56:15 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux


Complete config is here:
http://homepage.hispeed.ch/hb9xcg/config
Comment 14 Pavel Machek 2005-11-25 19:33:12 UTC
Okay, I guess we are seeing the same problem as bug #113886: APIC trouble. I guess testing a 32-bit kernel would be hard?
Comment 15 Pavel Machek 2006-01-11 10:23:00 UTC
Also, try the latest vanilla kernel. Hopefully this is a duplicate of bug #113886.

*** This bug has been marked as a duplicate of 113886 ***
Comment 16 Markus Walser 2006-01-11 11:07:24 UTC
I think it is more likely a duplicate of this bug:

http://bugzilla.kernel.org/show_bug.cgi?id=5534
Comment 17 Markus Walser 2006-01-26 09:49:38 UTC
Hi,
I tried suspend to disk again with the 2.6.16-rc1-mm3 kernel and got interesting results. Resuming from disk basically worked for the first time on this nx6125, but there was an ugly soft lockup on CPU0 during suspend to disk:

http://homepage.hispeed.ch/hb9xcg/img_0410.jpg

The resume that followed went well.

Config was:
http://homepage.hispeed.ch/hb9xcg/config-2.6.16-rc1-mm3

And System.map was:
http://homepage.hispeed.ch/System.map-2.6.16-rc1-mm3-mw

And the dmesg output after resume:
http://homepage.hispeed.ch/resume-2.6.16-rc1-mm3.log

Maybe there's a connection between this lockup and messages like:

"     osl-0822 [77] os_wait_semaphore     : Failed to acquire semaphore[ffff81005733ad40|1|0], AE_TIME"

which I see very often in the log, e.g. when accessing the proc filesystem:
"cat /proc/acpi/thermal_zone/TZ1/temperature"

Any suggestions?

Comment 18 Pavel Machek 2006-01-26 10:12:24 UTC
These seem to be different problems, please log them into bugzilla.kernel.org.
Comment 20 Markus Walser 2006-01-26 10:49:12 UTC
Done: http://bugzilla.kernel.org/show_bug.cgi?id=5962