Bug 117248 - shutdown starts after resume from ram
Summary: shutdown starts after resume from ram
Status: VERIFIED FIXED
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Mobile Devices (show other bugs)
Version: RC 1
Hardware: i686 All
: P5 - None : Normal
Target Milestone: ---
Assignee: Forgotten User ZhJd0F0L3x
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-15 15:35 UTC by Harald Koenig
Modified: 2007-06-05 11:20 UTC (History)
2 users (show)

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
syslogd and acpid log files (13.59 KB, application/x-gtar)
2005-09-15 16:29 UTC, Harald Koenig
Details
handle acpid events before dbus events (1.91 KB, patch)
2005-09-15 18:18 UTC, Forgotten User ZhJd0F0L3x
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Harald Koenig 2005-09-15 15:35:48 UTC
with both RC1 and beta-4 I have a strange new behaviour with suspend to ram with
beta-3 it seemed worked fine):

now after the 2nd resume seems to be finished and display is ok, suddenly a
shutdown gets started. why ??

in /var/log/messages I always (it's reproducable) some messages of "gconfd".
is gconfd switching to runlevel 0 ?

suspend-to-disk does not trigger that shutdown and I don't see any gconfd messages.
after 4 times "powersave -U" with no problem/shutdown the 2nd "powersave -u"
again triggered a shutdown.


this is _very_ annoying_ because it kills all my open apps.
right now I can't reliably suspend/resume to ram with SUSE 10.0 :-(

which other information do you need ?


Sep 14 21:25:42 harald kernel: ACPI: PCI interrupt for device 0000:00:07.2 disabled
Sep 15 10:14:50 harald kernel: Stopping tasks:
===================================================================|
Sep 15 10:14:50 harald kernel: ACPI: PCI interrupt for device 0000:00:0b.1 disabled
Sep 15 10:14:50 harald kernel: ACPI: PCI interrupt for device 0000:00:0b.0 disabled
Sep 15 10:14:50 harald kernel: ACPI: PCI interrupt for device 0000:00:00.1 disabled
Sep 15 10:14:50 harald kernel: Back to C!
Sep 15 10:14:50 harald kernel: ACPI: PCI Interrupt 0000:00:00.1[B] -> Link
[LNKB] -> GSI 11 (level, low) -> IRQ 11
Sep 15 10:14:50 harald kernel: PCI: Setting latency timer of device 0000:00:00.1
to 64
Sep 15 10:14:50 harald kernel: ACPI: PCI Interrupt 0000:00:0b.0[A] -> Link
[LNKA] -> GSI 11 (level, low) -> IRQ 11
Sep 15 10:14:50 harald kernel: ACPI: PCI Interrupt 0000:00:0b.1[B] -> Link
[LNKB] -> GSI 11 (level, low) -> IRQ 11
Sep 15 10:14:50 harald kernel: eth0: link down
Sep 15 10:14:50 harald kernel: Restarting tasks... done
Sep 15 10:14:50 harald kernel: USB Universal Host Controller Interface driver v2.3
Sep 15 10:14:50 harald kernel: ACPI: PCI Interrupt 0000:00:07.2[D] -> Link
[LNKD] -> GSI 11 (level, low) -> IRQ 11
Sep 15 10:14:50 harald kernel: uhci_hcd 0000:00:07.2: UHCI Host Controller
Sep 15 10:14:50 harald kernel: uhci_hcd 0000:00:07.2: new USB bus registered,
assigned bus number 1
Sep 15 10:14:50 harald kernel: uhci_hcd 0000:00:07.2: irq 11, io base 0x0000f840
Sep 15 10:14:50 harald kernel: hub 1-0:1.0: USB hub found
Sep 15 10:14:50 harald kernel: hub 1-0:1.0: 2 ports detected
Sep 15 10:14:50 harald kernel: Initializing USB Mass Storage driver...
Sep 15 10:14:50 harald kernel: usbcore: registered new driver usb-storage
Sep 15 10:14:50 harald kernel: USB Mass Storage support registered.
Sep 15 10:14:51 harald kernel: eth0: link up, 100Mbps, full-duplex, lpa 0x41E1
Sep 15 10:14:51 harald su: (to root) root on none
Sep 15 10:15:04 harald su: (to root) root on none
Sep 15 10:15:07 harald gconfd (root-31459): starting (version 2.11.92), pid
31459 user 'root'
Sep 15 10:15:07 harald gconfd (root-31459): Resolved address
"xml:readonly:/etc/opt/gnome/gconf/gconf.xml.mandatory" to a read-only
configuration source at po\
sition 0
Sep 15 10:15:07 harald gconfd (root-31459): Resolved address
"xml:readwrite:/root/.gconf" to a writable configuration source at position 1
Sep 15 10:15:07 harald gconfd (root-31459): Resolved address
"xml:readonly:/etc/opt/gnome/gconf/gconf.xml.defaults" to a read-only
configuration source at pos\
ition 2
Sep 15 10:15:10 harald init: Switching to runlevel: 0
Sep 15 10:15:13 harald hidd[4705]: Exit
Sep 15 10:15:13 harald sdpd[4658]: terminating...
Sep 15 10:15:13 harald hcid[4653]: Exit.
Sep 15 10:15:14 harald smartd[4941]: smartd received signal 15: Terminated
Sep 15 10:15:14 harald smartd[4941]: smartd is exiting (exit status 0)
Sep 15 10:15:15 harald smpppd[5054]: terminating on signal 15
Sep 15 10:15:15 harald sshd[5042]: Received signal 15; terminating.
Sep 15 10:15:19 harald kernel: Kernel logging (proc) stopped.
Sep 15 10:15:19 harald kernel: Kernel log daemon terminating.
Sep 15 10:15:20 harald exiting on signal 15
Sep 15 14:06:15 harald syslogd 1.4.1: restart.
Comment 1 Forgotten User ZhJd0F0L3x 2005-09-15 15:52:30 UTC
ACPI.

Please set the DEBUG level to 31 in /etc/sysconfig/powersave/common, restart
powersaved (maybe rotate /var/log/messages first), try it again. If it happens
again, "grep powersave /var/log/messages > /tmp/messages" and attach
/tmp/messages  and /var/log/acpid.log
Comment 2 Forgotten User ZhJd0F0L3x 2005-09-15 15:55:18 UTC
this may be a fallout of the "button events are reported late" bug. Timo,
Thomas, do you have a pointer to that? Maybe the button event is reported some
25 seconds late, at which time powersaved of course no longer ignores it (we
ignore button events directly after resume until resume cleanup is finished).
Comment 3 Harald Koenig 2005-09-15 16:29:33 UTC
Created attachment 50047 [details]
syslogd and acpid log files

these are complete log files for the following session:

boot system
powerdown -u
close LCD 
open LCD (this issues resume)
powerdown -u
close LCD 
open LCD (this issues resume)

after X11 is up again, the unwanted halt takes place.


before (not included in this log, I did a clean reboot after that first test)
I've tried to resume with power-button 2-3 times (LCD always open) but this
time it did not trigger the shutdown:-(

at least in beta-4 I'm sure I had this shutdowns too with only "powerdown -u"
and resume with power-button (not touching the LCD close/open switch).

does this help ?  any more details needed ?


thanks!
Comment 4 Harald Koenig 2005-09-15 16:32:06 UTC
(In reply to comment #2)
> this may be a fallout of the "button events are reported late" bug. Timo,
> Thomas, do you have a pointer to that? Maybe the button event is reported some
> 25 seconds late, at which time powersaved of course no longer ignores it (we
> ignore button events directly after resume until resume cleanup is finished).

if this is a timing issue, the following information might be important:

I'm using a Toshiba Portege 3480 notebook, which uses a P-III 600 MHz with only
192 MB ram, so that's not a fast machine on modern scales.

BUT the suspend/resume from/to ram takes much less than those 25 seconds
mentioned above...
Comment 5 Forgotten User ZhJd0F0L3x 2005-09-15 16:53:37 UTC
(In reply to comment #4)

> if this is a timing issue, the following information might be important:
> 
> I'm using a Toshiba Portege 3480 notebook, which uses a P-III 600 MHz with only
> 192 MB ram, so that's not a fast machine on modern scales.

this probably does not matter in this case, see below
 
> BUT the suspend/resume from/to ram takes much less than those 25 seconds
> mentioned above...

the problem is: Many machine generate a "button/power" event on resume from
suspend to RAM, regardless of them being resumed by lid open or powerbutton or a
simple keypress. Because of this, we ignore button/power events directly after
suspend, as long as the restore stuff / module reloading etc is not finished yet.

There is another bug which i referred to, that causes acpi buttons to be
reported late (always or sometimes randomly). If your button event from suspend
now happens to be delayed for ~1 minute (example), then the resume will be
finished. I timed the 25 seconds from your first log, from the time the usb
modules were loaded to the time the system went "init 0".

we could simply ignore the first button event after resume, no matter how long
it takes, but this would break those machines that do not generate such a button
event on resume and thus would also be wrong.

A work around for now might be to unload the button module before suspend and
reload after resume (although the delayed event may still be delivered, i am not
sure) or to simply tell powersaved to ignore the power button, unfortunately you
cannot use it to shutdown the machine then :-(

I will look into the verbose log (don't forget to reset the debug level to
something <=15 or it will spam your logs very badly :-) but i think that my
guess of the reason is pretty accurate.
Comment 6 Forgotten User ZhJd0F0L3x 2005-09-15 17:22:42 UTC
ok.
Holger, we have the following:
from acpid:
[Thu Sep 15 18:14:52 2005] received event "button/power PWRF 00000080 00000002"
[Thu Sep 15 18:14:52 2005] notifying client 5172[0:0]
[Thu Sep 15 18:14:52 2005] notifying client 5211[0:0]
[Thu Sep 15 18:14:52 2005] completed event "button/power PWRF 00000080 00000002"

from powersaved (excerpt):
Sep 15 18:14:14 compaq [powersave]: DIAG (ignore_buttons:1538) ignore_buttons is
set to: 1
Sep 15 18:14:14 compaq [powersave]: Info (sleep_wrapper:1420) Set machine into
suspend2disk/ suspend2ram mode 
Sep 15 18:14:52 compaq [powersave]: Info (sleep_wrapper:1436) Back from
suspend2ram inside sleep_wrapper 
Sep 15 18:14:53 compaq [powersave]: Debug (start:390) Select waiting: 36 secs
and 52 millisecs 
Sep 15 18:14:53 compaq [powersave]: Info (start:440) Handling dbus request on fd: 7
Sep 15 18:14:53 compaq [powersave]: Info (checkScriptReturn:1334) SCRIPT
returned:, Name: restore_after_suspend_to_ram, Return value: 0, Comment:
restore_after_suspend2ram finished 
Sep 15 18:14:53 compaq [powersave]: DIAG (ignore_buttons:1538) ignore_buttons is
set to: 0
Sep 15 18:14:53 compaq [powersave]: Debug (start:390) Select waiting: 36 secs
and 52 millisecs 
Sep 15 18:14:53 compaq [powersave]: Info (start:440) Handling dbus request on fd: 7
Sep 15 18:14:53 compaq [powersave]: Debug (start:390) Select waiting: 36 secs
and 52 millisecs 
Sep 15 18:14:53 compaq [powersave]: Debug (start:492) hw FD 6 is set, handling
it ... 
Sep 15 18:14:53 compaq [powersave]: DIAG (handleHWEventRequest:129) ACPI Event:
'button/power PWRF 00000080 00000002' ignore buttons: no
Sep 15 18:14:53 compaq [powersave]: Debug (handleHWEventRequest:142) type:
button/power, dev_name: PWRF, port: 00000080, count: 00000002 
Sep 15 18:14:53 compaq [powersave]: Info (handleHWEventRequest:194) button power
event occured 

I think this leads to a logical problem and a race condition, i will check the
code and then come back with a conclusion.

The ~20 seconds i saw are simply due to first trying to log out
KDE/GNOME/whatever and after that just plain calling shutdown
Comment 7 Harald Koenig 2005-09-15 17:28:28 UTC
(In reply to comment #5)

> the problem is: Many machine generate a "button/power" event on resume from
> suspend to RAM, regardless of them being resumed by lid open or powerbutton or a
> simple keypress. Because of this, we ignore button/power events directly after
> suspend, as long as the restore stuff / module reloading etc is not finished yet.
> 
> There is another bug which i referred to, that causes acpi buttons to be
> reported late (always or sometimes randomly). If your button event from suspend
> now happens to be delayed for ~1 minute (example), then the resume will be
> finished. I timed the 25 seconds from your first log, from the time the usb
> modules were loaded to the time the system went "init 0".
> 
> we could simply ignore the first button event after resume, no matter how long
> it takes, but this would break those machines that do not generate such a button
> event on resume and thus would also be wrong.
> 
> A work around for now might be to unload the button module before suspend and
> reload after resume (although the delayed event may still be delivered, i am not
> sure) or to simply tell powersaved to ignore the power button, unfortunately you
> cannot use it to shutdown the machine then :-(

yep, obviously not the right way to go.  thanks for the details!

a quick help for me might be to set 
    
    EVENT_BUTTON_POWER="suspend_to_ram"

this is what I'd like as default anyway, but didn't look for it until now.  with 
"suspend_to_ram" unstead of "wm_shutdown" I have to resume twice as worst case,
but I won't loose my whole session any more...


one more questions about powersave/events:

where can I find information which process/script/config files controls what
happens for those events, e.g. what gets started for e.g. "wm_shutdown" in the
first case, where can I change that action ?

I'm asking because right now my X11 display flashes once when I close/open the
LCD.  in the evens file I now found EVENT_BUTTON_LID_CLOSED="screen_saver" but
right now there seems to be no screen saver running ?!  I'd like to trigger a
"xlock -mode blank" for EVENT_BUTTON_LID_CLOSED (or better for the action
"screen_saver")

before SUSE 10 I only could use APM on this old Toshiba, and there I used
APMD_LOCK_MODE="blank" in powermanagement -- that's what I'd like to have again
with powersaved and ACPI.

btw, I'm using xfce2 (or before fvwm2), KDE is not an option, esp. not with
192MB ram (but neither with 1GB ;-)


> I will look into the verbose log (don't forget to reset the debug level to
> something <=15 or it will spam your logs very badly :-) but i think that my
> guess of the reason is pretty accurate.

thanks!
Comment 8 Harald Koenig 2005-09-15 17:32:09 UTC
(In reply to comment #6

> I think this leads to a logical problem and a race condition, i will check the
> code and then come back with a conclusion.
> 
> The ~20 seconds i saw are simply due to first trying to log out
> KDE/GNOME/whatever and after that just plain calling shutdown

one more timing detail:  if this unwanted shutdown gets issued, then it's
_always_ started almost immedieately (only 1, maybe 2 seconds) after the resume
is complete and X11 looks ok to me.  at least there is no "random timeout" after
which one more button event gets delivered...
Comment 9 Forgotten User ZhJd0F0L3x 2005-09-15 18:17:46 UTC
it is a race condition: we have a select waiting on 2 filedescriptors:
- acpid
- dbus

if select falls through, it checks first for dbus, then for acpid.
If enough dbus messages are in the queue, then the "button ignore" flag is reset
before the acpid event arrives. I'm building a test package for you to test
right now, it should appear shortly on
ftp.suse.com:/pub/people/seife/powersave/SUSE-10.0-test/10.0-i386/powersave-0.10.15-5.i586.rpm
(there is more than one version in this directory, make sure you use the latest
:-). Please reassign "wm_shutdown" or "notify" or something like that to the
power button, so that you actually see if it is still happening (it should not
;-) ).
Thanks for reporting and helping with debugging this
Comment 10 Forgotten User ZhJd0F0L3x 2005-09-15 18:18:53 UTC
Created attachment 50062 [details]
handle acpid events before dbus events

Holger, just for reference the patch i put into the package. If we get a
success report, i'll commit this to 10.0 branch and HEAD.
Comment 11 Holger Macht 2005-09-16 06:19:09 UTC
Ok, looks good to me. For futher versions, this problem might disappear anyway
because of glib integration. Thank you both for debugging.
Comment 12 Harald Koenig 2005-09-16 09:08:34 UTC
(In reply to comment #10)
> Created an attachment (id=50062) [edit]
> handle acpid events before dbus events
> 
> Holger, just for reference the patch i put into the package. If we get a
> success report, i'll commit this to 10.0 branch and HEAD.

SUCCESS!!  at least for the first 10+ resumes with both power-switch and
LCD close/open I never got a shutdown anymore (I changed EVENT_BUTTON_POWER
and restarted powersaved before;)


and to my surprise, now the xlock gets activated too for the first time!
great, almost... ;-)


starting xlock without KDE takes a long time, see my new report & fix #117434


thanks!
Comment 13 Forgotten User ZhJd0F0L3x 2005-09-16 09:17:23 UTC
ok, i'll check this fix into CVS. We will wait for other pending fixes and will
probably issue an online update once GM is out.