Bug 439461

Summary: kernel 2.6.25.18-0.2 doesn't resume from StR
Product: [openSUSE] openSUSE 11.0
Reporter: Christian Deckelmann <christian.deckelmann>
Component: Kernel
Assignee: Frank Seidel <fseidel>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Critical
Priority: P5 - None
CC: curdyben, kollix, meik.piepmeyer, meissner, mfrueh, r2s2, tschmidt
Version: Final
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard: maint:released:11.0:21569
Found By: IS&T
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Mainline kernel compilation & bisection instructions

Description Christian Deckelmann 2008-10-28 11:33:03 UTC
After installing the latest kernel update (2.6.25.18-0.2), the system no longer resumes from StR (suspend to RAM). It suspends, but on resume it just stops with a blinking cursor.

Hardware is Lenovo X60.

2.6.25.16-0.1 works.
Comment 1 Marcus Meissner 2008-10-28 11:35:04 UTC
Seife or Trenn most likely guilty ;)
Comment 3 Thomas Renninger 2008-10-28 12:35:07 UTC
Can you please retry in a few hours with a new kernel of the day? It includes this changelog entry:
Tue Oct 28 13:33:17 CET 2008 - trenn@suse.de

- patches.fixes/acpi_cpufreq_ppc_fix_typo.patch: Fix signed

Please reopen if it still does not work.
Comment 4 Christian Deckelmann 2008-10-29 19:13:20 UTC
The new kernel doesn't work either. Well, in fact it works once, but on the second resume I seem to have to press keys for it to make progress. Each time I press a key it moves forward, then stops again until I press another key. It looks as if these key presses generate an interrupt or something like that. On the third resume even that doesn't help.
Comment 5 Thomas Renninger 2008-10-29 20:26:17 UTC
Then it's something else...

Wait... Matze also sees this, but only together with his docking station.
Could it be that 2.6.25.16 also had the bug, but you did not notice it because you didn't use the docking station?

Comment 6 Roland Schulz 2008-10-30 16:14:55 UTC
I also have the problem that s2ram works great with 2.6.25.16-0.1 but crashes after the update to 2.6.25.18. I went back to the .16 kernel and it works again. A BIOS update didn't make suspend work with .18. s2ram -n output with .18:
Machine matched entry 387:
    sys_vendor   = 'LENOVO'
    sys_product  = '8897*'
    sys_version  = ''
    bios_version = ''
Fixes: 0x3  S3_BIOS S3_MODE
This machine can be identified by:
    sys_vendor   = "LENOVO"
    sys_product  = "8897CTO"
    sys_version  = "ThinkPad T61"
    bios_version = "7LETC4WW (2.24 )"

It seems that bug 439353 is a duplicate of this bug, not of the other bug, which has been closed.

This is without docking station.
Comment 7 Thomas Renninger 2008-10-30 17:24:18 UTC
Pavel, Rafael, do you have an idea?
I went through kernel-source.changes, but only looked at the patch titles. There are not that many, and I could not find anything related.

I can debug that (bisect our CVS) together with deckel, but both of us have a week of holiday next week (I have 2 1/2 -> re-assigning).
Comment 8 Rafael Wysocki 2008-10-30 19:47:45 UTC
Well, I would check the upstream -stable branch on which kernel-pae-2.6.25.18-0.2 is based (BTW, is "pae" reserved for the 32-bit kernel? I hope it is).
Comment 9 Roland Schulz 2008-10-30 19:55:13 UTC
In my case I use the default kernel, not the pae one. I added my report to this bug because I assumed the problem does not depend on pae. My system is 64-bit.
Comment 10 Pavel Machek 2008-10-30 20:19:19 UTC
If pressing keys makes it continue, that sounds like a nohz problem. Maybe "nohz=off highres=off" helps?
Comment 11 Roland Schulz 2008-10-30 20:27:03 UTC
No, pressing keys doesn't help.
Comment 12 Christian Deckelmann 2008-10-30 22:33:51 UTC
Pressing keys helped in my tests only once. On the next resume from StR it didn't help anymore.
In my case it is a PAE kernel because YaST decided to install that kernel flavour, as my system has a CPU that supports the NX functionality (or the equivalent on the Intel architecture). There are bugs against YaST where this behaviour is documented.
Comment 13 Alexander Orlovskyy 2008-10-31 09:31:30 UTC
*** Bug 440392 has been marked as a duplicate of this bug. ***
Comment 14 Christian Deckelmann 2008-11-02 11:42:05 UTC
*** Bug 440856 has been marked as a duplicate of this bug. ***
Comment 15 Marcus Meissner 2008-11-02 12:49:28 UTC
We also upgraded from 2.6.25.16 to 2.6.25.18; this might have caused this
regression.

There is one additional ACPI event fix
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.25.y.git;a=commit;h=dc317ed0f9cb83f616b95ae6abdba44832c60f39
in there,
and also various clock/timer related fixes.
Comment 16 Rafael Wysocki 2008-11-02 17:21:22 UTC
It certainly would be helpful if someone having this problem could test the vanilla 2.6.25.18 kernel and see if it's reproducible with that one.
Comment 17 Meik Piepmeyer 2008-11-02 20:34:39 UTC
Vanilla 2.6.25.18 does _not_ work, so it should be reproducible. I made two attempts, one waking up by key press and one by closing and opening the lid. The kernel is a 32-bit PAE one, the machine a Lenovo T61.

But I think that many vanilla kernels have this problem, as I tested 2.6.26.0 some time ago with the same results. 2.6.27.4 works like a charm; I don't know if this information helps anybody, but I'm adding it anyway.

Any other kernel to try?
Comment 18 Marcus Meissner 2008-11-02 21:43:48 UTC
Meik: can you please test vanilla 2.6.25.16 too?
Comment 19 Meik Piepmeyer 2008-11-02 23:24:39 UTC
Marcus: vanilla 2.6.25.16 works for me. Four suspends, four wake ups.
Comment 20 Pavel Machek 2008-11-03 09:00:16 UTC
Ok, so the problem was introduced in vanilla kernels between 2.6.25.16 and 2.6.25.18. Meik: Can you test 2.6.25.17 by chance?

Adding greg, perhaps he has an idea what went wrong.
Comment 21 Pavel Machek 2008-11-03 09:10:58 UTC
I went through 2.6.25.16->18 changelogs, and:

This one could be it, but then it would probably not be widespread:

commit 8e023f85b670c9f6008df675b6b213025b3387b3
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date:   Fri Aug 22 17:40:05 2008 +0000

    x86: work around MTRR mask setting
    
    commit 38cc1c3df77c1bb739a4766788eb9fa49f16ffdf upstream
    
    Joshua Hoblitt reported that only 3 GB of his 16 GB of RAM is
    usable. Booting with mtrr_show showed us the BIOS-initialized
    MTRR settings - which are all wrong.
    
Timer/clockevents issues could also be it, and I believe I actually debugged something like that... but not in -stable:

commit 22e4330618d27748cc69b62d3c96223bcefe6c6c
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sat Sep 6 03:06:08 2008 +0200

    x86: HPET: read back compare register before reading counter
    
    commit 72d43d9bc9210d24d09202eaf219eac09e17b339 upstream
    
    After fixing the u32 thinko I still had occasional hiccups on ATI chipsets
    with small deltas. There seems to be a delay between writing the compare
    register and the transfer to the internal register which triggers the
    interrupt. Reading back the value makes sure that it hit the internal
    match register before we compare against the counter value.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

commit 59ff733c6b6ef547bb09a9902020750dfbb2200f
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sat Sep 6 03:03:32 2008 +0200

    x86: HPET fix moronic 32/64bit thinko
    
    commit f7676254f179eac6b5244a80195ec8ae0e9d4606 upstream
    
    We use the HPET only in 32bit mode because:
    1) some HPETs are 32bit only
    2) on i386 there is no way to read/write the HPET atomic 64bit wide
    
    The HPET code unification done by the "moron of the year" did
    not take into account that unsigned long is different on 32 and
    64 bit.
    
    This thinko results in a possible endless loop in the clockevents
    code, when the return comparison fails due to the 64bit/32bit
    unawareness.
    
    unsigned long cnt = (u32) hpet_read() + delta can wrap over 32bit.
    but the final compare will fail and return -ETIME causing endless
    loops.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
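
The commit message above only says that mixing a 64-bit unsigned long with a 32-bit HPET makes the final comparison fail. As a rough illustration, here is a minimal user-space model in C, not the kernel's HPET code: the function names, the MODEL_ETIME stand-in and the sample counter values are hypothetical. It shows that once counter + delta crosses 2^32, a target kept in 64 bits no longer agrees with what a 32-bit comparator holds, whereas a signed 32-bit difference stays wrap-safe.

```c
#include <stdint.h>

/* Hypothetical stand-in for -ETIME; the real errno value does not matter here. */
#define MODEL_ETIME (-1)

/* 64-bit-unaware check: the target lives in a 64-bit variable, so
 * counter + delta does not wrap at 2^32 like the 32-bit comparator does. */
int hpet_check_64(uint32_t counter_at_program, uint32_t delta,
                  uint32_t counter_at_check)
{
    uint64_t cnt = (uint64_t)counter_at_program + delta;

    return ((int64_t)((uint64_t)counter_at_check - cnt) > 0) ? MODEL_ETIME : 0;
}

/* 32-bit-aware check: the target wraps exactly like the hardware, and the
 * signed 32-bit difference is wrap-safe while both values are less than
 * 2^31 ticks apart. */
int hpet_check_32(uint32_t counter_at_program, uint32_t delta,
                  uint32_t counter_at_check)
{
    uint32_t cnt = counter_at_program + delta;

    return ((int32_t)(counter_at_check - cnt) > 0) ? MODEL_ETIME : 0;
}
```

For example, with a counter of 0xFFFFFFF0, a delta of 0x20 and a counter that has since wrapped to 0x30, hpet_check_32() reports that the target (0x10) has already passed, while hpet_check_64() still reports success. The model does not try to reproduce the exact endless-loop path described in the commit.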

commit 9d29a18def727d9e0d5c656cfc86a278988b7926
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sat Sep 6 03:01:45 2008 +0200

    clockevents: broadcast fixup possible waiters
    
    commit 7300711e8c6824fcfbd42a126980ff50439d8dd0 upstream
    
    Until the C1E patches arrived there were no users of periodic broadcast
    before switching to oneshot mode. Now we need to trigger a possible
    waiter for a periodic broadcast when switching to oneshot mode.
    Otherwise we can starve them for ever.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

commit 4f8e2bf785bd7e5ba9b93a7d5ad5c18ba409a199
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Sep 3 21:37:24 2008 +0000

    HPET: make minimum reprogramming delta useful
    
    commit 7cfb0435330364f90f274a26ecdc5f47f738498c upstream
    
    The minimum reprogramming delta was hardcoded in HPET ticks,
    which is stupid as it does not work with faster running HPETs.
    The C1E idle patches made this prominent on AMD/RS690 chipsets,
    where the HPET runs with 25MHz. Set it to 5us which seems to be
    a reasonable value and fixes the problems on the bug reporters
    machines. We have a further sanity check now in the clock events,
    which increases the delta when it is not sufficient.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
    Tested-by: Dmitry Nezhevenko <dion@inhex.net>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
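
The point about ticks versus time is simple arithmetic: a delta fixed in HPET ticks lasts a different amount of time on each HPET frequency. The sketch below converts between nanoseconds and ticks; the 25 MHz figure comes from the commit, 14.31818 MHz is assumed as a typical HPET rate for comparison, and the 50-tick value in the usage note is hypothetical, since the commit does not give the old hardcoded value.

```c
#include <stdint.h>

/* Nanoseconds -> HPET ticks (rounded down) for a counter running at freq_hz. */
uint64_t ns_to_hpet_ticks(uint64_t ns, uint64_t freq_hz)
{
    return ns * freq_hz / 1000000000ULL;
}

/* HPET ticks -> nanoseconds (rounded down) for a counter running at freq_hz. */
uint64_t hpet_ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    return ticks * 1000000000ULL / freq_hz;
}
```

ns_to_hpet_ticks(5000, 25000000) gives 125 ticks for the new 5 us minimum, and ns_to_hpet_ticks(5000, 14318180) gives 71. Conversely, a hypothetical fixed minimum of 50 ticks would last 3492 ns at 14.31818 MHz but only 2000 ns at 25 MHz.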

commit 19ab6cbbf02a7d4ca81ef44cc856ce11870e202b
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Sep 3 21:37:14 2008 +0000

    clockevents: prevent endless loop lockup
    
    commit 1fb9b7d29d8e85ba3196eaa7ab871bf76fc98d36 upstream
    
    The C1E/HPET bug reports on AMDX2/RS690 systems were tracked down to a
    too small value of the HPET minimum delta for programming an event.
    
    The clockevents code needs to enforce an interrupt event on the clock event
    device in some cases. The enforcement code was stupid and naive, as it just
    added the minimum delta to the current time and tried to reprogram the device.
    When the minimum delta is too small, then this loops forever.
    
    Add a sanity check. Allow reprogramming to fail 3 times, then print a warning
    and double the minimum delta value to make sure, that this does not happen again.
    Use the same function for both tick-oneshot and tick-broadcast code.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
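
The retry policy described above (allow three failures, then warn and double the minimum delta) can be modelled in a few lines of C. This is a simplified sketch under stated assumptions, not the kernel's tick_program_event() code: the struct, the simulated hardware latency and the 5 us fallback for a zero minimum are illustrative choices.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of a clock event device (not struct clock_event_device). */
struct model_ce_dev {
    uint64_t min_delta_ns; /* delta used for forced reprogramming */
    uint64_t latency_ns;   /* simulated time the hardware needs to latch an event */
    int doublings;         /* how often min_delta_ns was increased */
};

/* Simulated programming: fails (like -ETIME) when the delta is shorter than
 * the hardware latency, i.e. the event would already be in the past. */
static int model_program(const struct model_ce_dev *dev, uint64_t delta_ns)
{
    return delta_ns < dev->latency_ns ? -1 : 0;
}

/* Forced reprogramming with a sanity check: after 3 failed attempts, warn
 * and double min_delta_ns, then keep trying. Returns the delta that worked. */
uint64_t model_force_program(struct model_ce_dev *dev)
{
    int failures = 0;

    while (model_program(dev, dev->min_delta_ns) != 0) {
        if (++failures < 3)
            continue;
        /* A zero minimum never grows by doubling; start from 5 us instead
         * (an assumption borrowed from the HPET commit above). */
        uint64_t next = dev->min_delta_ns ? dev->min_delta_ns << 1 : 5000;

        fprintf(stderr, "model: increasing min_delta_ns %llu -> %llu\n",
                (unsigned long long)dev->min_delta_ns,
                (unsigned long long)next);
        dev->min_delta_ns = next;
        dev->doublings++;
        failures = 0;
    }
    return dev->min_delta_ns;
}
```

With min_delta_ns = 1000 and a simulated latency of 5000 ns, the loop doubles the minimum three times and settles on 8000 ns instead of spinning forever, which is the failure mode the commit addresses.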

commit ffa4da2a25bb4ac08f710ac99827baf48a8f8d57
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Sep 3 21:37:08 2008 +0000

    clockevents: prevent multiple init/shutdown
    
    commit 9c17bcda991000351cb2373f78be7e4b1c44caa3 upstream
    
    While chasing the C1E/HPET bugreports I went through the clock events
    code inch by inch and found that the broadcast device can be initialized
    and shutdown multiple times. Multiple shutdowns are not critical, but
    useless waste of time. Multiple initializations are simply broken. Another
    CPU might have the device in use already after the first initialization and
    the second init could just render it unusable again.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
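
The guard this commit describes boils down to acting only on state transitions: initialize the shared broadcast device when the first CPU starts using it, and shut it down when the last one stops. Below is a simplified model of that idea in C; the bit-mask bookkeeping, struct and function names are hypothetical, and this is not the actual patch. Note that this is the commit the bisection later in this report points at; the model says nothing about why it broke resume.

```c
#include <stdint.h>

/* Hypothetical model: one broadcast device shared by up to 32 CPUs. */
struct model_broadcast {
    uint32_t users;     /* bit n set = CPU n relies on the broadcast device */
    int init_calls;     /* how often the device was initialized */
    int shutdown_calls; /* how often it was shut down */
};

/* Initialize only on the "no users -> first user" transition, so a second
 * CPU cannot re-initialize a device another CPU is already using. */
void model_broadcast_on(struct model_broadcast *bc, unsigned cpu)
{
    int was_stopped = (bc->users == 0);

    bc->users |= UINT32_C(1) << cpu;
    if (was_stopped)
        bc->init_calls++;
}

/* Shut down only when the last user leaves; repeated calls are no-ops. */
void model_broadcast_off(struct model_broadcast *bc, unsigned cpu)
{
    uint32_t bit = UINT32_C(1) << cpu;

    if (!(bc->users & bit))
        return; /* CPU was not a user: nothing changes */
    bc->users &= ~bit;
    if (bc->users == 0)
        bc->shutdown_calls++;
}
```

Calling model_broadcast_on() for CPUs 0, 1 and 1 again yields a single initialization, and the matching off calls (including a repeated one) yield a single shutdown.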

commit e73068458bf253c2e738cd55080c3a54c61037ef
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Sep 3 21:37:03 2008 +0000

    clockevents: enforce reprogram in oneshot setup
    
    commit 7205656ab48da29a95d7f55e43a81db755d3cb3a upstream
    
    In tick_oneshot_setup we program the device to the given next_event,
    but we do not check the return value. We need to make sure that the
    device is programmed enforced so the interrupt handler engine starts
    working. Split out the reprogramming function from tick_program_event()
    and call it with the device, which was handed in to tick_setup_oneshot().
    Set the force argument, so the device is firing an interrupt.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

commit fbbece349081a689d5687d9ebc769a847fdf423a
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Sep 3 21:36:57 2008 +0000

    clockevents: prevent endless loop in periodic broadcast handler
    
    commit d4496b39559c6d43f83e4c08b899984f8b8089b5 upstream
    
    The reprogramming of the periodic broadcast handler was broken,
    when the first programming returned -ETIME. The clockevents code
    stores the new expiry value in the clock events device next_event field
    only when the programming time has not been elapsed yet. The loop in
    question calculates the new expiry value from the next_event value
    and therefore never increases.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
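
The endless loop described above comes from deriving each new expiry from a field that is only updated on success. A minimal C model, assuming a device whose next_event is written only when programming succeeds (names and numbers are hypothetical), contrasts that pattern with one that advances a local expiry value:

```c
#include <stdint.h>

/* Hypothetical broadcast device: next_event changes only on success. */
struct model_bc_dev {
    int64_t next_event;
};

/* Fails (like -ETIME) if the expiry is not in the future, leaving
 * next_event untouched. */
static int model_program_event(struct model_bc_dev *dev, int64_t expires,
                               int64_t now)
{
    if (expires <= now)
        return -1;
    dev->next_event = expires;
    return 0;
}

/* Broken pattern: recomputes the expiry from dev->next_event, which never
 * advances after a failure. Bounded by max_iter so the demo terminates.
 * Returns the number of attempts, or -1 if it gave up. */
int model_reprogram_broken(struct model_bc_dev *dev, int64_t period,
                           int64_t now, int max_iter)
{
    for (int i = 1; i <= max_iter; i++) {
        int64_t next = dev->next_event + period;

        if (model_program_event(dev, next, now) == 0)
            return i;
    }
    return -1;
}

/* Fixed pattern: advances a local expiry by one period per attempt. */
int model_reprogram_fixed(struct model_bc_dev *dev, int64_t period,
                          int64_t now, int max_iter)
{
    int64_t next = dev->next_event;

    for (int i = 1; i <= max_iter; i++) {
        next += period;
        if (model_program_event(dev, next, now) == 0)
            return i;
    }
    return -1;
}
```

Starting from next_event = 0 with a period of 10 and the current time at 35, the broken loop keeps retrying expiry 10 and only stops because of the iteration bound, while the fixed loop succeeds on the fourth attempt with expiry 40. The model leaves out everything else the real handler does.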

commit 6141266c43db890ada7df589358b8553de2e6322
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Wed Sep 3 21:36:50 2008 +0000

    clockevents: prevent clockevent event_handler ending up handler_noop
    
    commit 7c1e76897492d92b6a1c2d6892494d39ded9680c upstream
    
    There is an ordering related problem with clockevents code, due to which
    clockevents_register_device() called after tickless/highres switch
    will not work. The new clockevent ends up with clockevents_handle_noop as
    event handler, resulting in no timer activity.
    
    The problematic path seems to be
    
    * old device already has hrtimer_interrupt as the event_handler
    * new clockevent device registers with a higher rating
    * tick_check_new_device() is called
      * clockevents_exchange_device() gets called
        * old->event_handler is set to clockevents_handle_noop
      * tick_setup_device() is called for the new device
        * which sets new->event_handler using the old->event_handler which is noop.
    
    Change the ordering so that new device inherits the proper handler.
    
    This does not have any issue in normal case as most likely all the clockevent
    devices are setup before the highres switch. But, can potentially be affecting
    some corner case where HPET force detect happens after the highres switch.
    This was a problem with HPET in MSI mode code that we have been experimenting
    with.
    
    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Shaohua Li <shaohua.li@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
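
The ordering problem listed above can be reduced to two assignments through function pointers. In this simplified C model the device struct and the stand-in handlers are hypothetical: model_handle_noop plays the role of clockevents_handle_noop and model_hrtimer_interrupt that of hrtimer_interrupt. Taking the handler from the old device before neutralizing it is one way to express the reordering the commit describes, not necessarily how the patch does it.

```c
#include <stddef.h>

typedef void (*model_handler_fn)(void);

/* Stand-ins for clockevents_handle_noop() and hrtimer_interrupt(). */
void model_handle_noop(void) { }
void model_hrtimer_interrupt(void) { }

struct model_dev {
    model_handler_fn event_handler;
};

/* Problematic order: the old device is switched to the noop handler first,
 * then the new device copies the (now noop) handler from it. */
void model_exchange_buggy(struct model_dev *olddev, struct model_dev *newdev)
{
    olddev->event_handler = model_handle_noop;     /* exchange step */
    newdev->event_handler = olddev->event_handler; /* setup copies noop */
}

/* Reordered: remember the old handler before neutralizing the old device. */
void model_exchange_fixed(struct model_dev *olddev, struct model_dev *newdev)
{
    model_handler_fn handler = olddev->event_handler;

    olddev->event_handler = model_handle_noop;
    newdev->event_handler = handler;
}
```

After model_exchange_buggy() the new device ends up with the noop handler, i.e. no timer activity; after model_exchange_fixed() it inherits the hrtimer handler.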

...and of course, EC is always suspect:
commit dc317ed0f9cb83f616b95ae6abdba44832c60f39
Author: Zhao Yakui <yakui.zhao@intel.com>
Date:   Tue Sep 23 13:38:13 2008 +0800

    ACPI: Avoid bogus EC timeout when EC is in Polling mode
    
    commit 9d699ed92a459cb408e2577e8bbeabc8ec3989e1 upstream
    
    When EC is in Polling mode, OS will check the EC status continually by using
    the following source code:
           clear_bit(EC_FLAGS_WAIT_GPE, &ec->flags);
           while (time_before(jiffies, delay)) {
                   if (acpi_ec_check_status(ec, event))
                           return 0;
                   msleep(1);
           }
    But msleep is realized by the function of schedule_timeout. At the same time
    although one process is already waken up by some events, it won't be scheduled
    immediately. So maybe there exists the following phenomena:
         a. The current jiffies is already after the predefined jiffies.
            But before timeout happens, OS has no chance to check the EC
            status again.
         b. If preemptible schedule is enabled, maybe preempt schedule will happen
            before checking loop. When the process is resumed again, maybe
            timeout already happens, which means that OS has no chance to check
            the EC status.
    
    In such case maybe EC status is already what OS expects when timeout happens.
    But OS has no chance to check the EC status and regards it as AE_TIME.
    
    So it will be more appropriate that OS will try to check the EC status again
    when timeout happens. If the EC status is what we expect, it won't be regarded
    as timeout. Only when the EC status is not what we expect, it will be regarded
    as timeout, which means that EC controller can't give a response in time.
    
    http://bugzilla.kernel.org/show_bug.cgi?id=9823
    http://bugzilla.kernel.org/show_bug.cgi?id=11141
    
    Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
    Signed-off-by: Zhang Rui  <rui.zhang@intel.com>
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
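
The fix boils down to checking the EC status one more time after the deadline has passed. The following C model is a sketch under stated assumptions: time is an abstract tick counter rather than jiffies (plain < replaces the wrap-safe time_before()), and a late wake-up is modelled by advancing the clock several ticks per msleep(1). All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical EC whose status becomes what the OS expects at tick ready_at. */
struct model_ec {
    unsigned long ready_at;
};

static bool model_ec_status_ok(const struct model_ec *ec, unsigned long now)
{
    return now >= ec->ready_at;
}

/* Polling wait. 'step' is how many ticks actually pass per msleep(1) when
 * the process is woken or scheduled late. With recheck, the status is
 * checked once more after the deadline, as the commit proposes.
 * Returns 0 on success, -1 for a timeout (AE_TIME in the real code). */
int model_ec_poll(const struct model_ec *ec, unsigned long start,
                  unsigned long timeout, unsigned long step, bool recheck)
{
    unsigned long now = start;
    unsigned long deadline = start + timeout;

    while (now < deadline) {
        if (model_ec_status_ok(ec, now))
            return 0;
        now += step;
    }
    if (recheck && model_ec_status_ok(ec, now))
        return 0;
    return -1;
}
```

With the EC ready at tick 9, a 10-tick timeout and a process that only gets to run every 6 ticks, the loop checks at ticks 0 and 6 and then exits at 12. Without the extra check this is reported as a timeout; with it, the already-correct status is seen and the wait succeeds.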


...actually, can you try the 2.6.27 kernel and/or the openSUSE 11.1 beta?
Comment 22 Meik Piepmeyer 2008-11-03 10:09:04 UTC
I will test 2.6.25.17. I am currently running 2.6.27.4 and suspend works.
Comment 23 Meik Piepmeyer 2008-11-03 11:01:51 UTC
vanilla 2.6.25.17 works, too.
Comment 24 Rafael Wysocki 2008-11-03 12:20:24 UTC
So, clearly, there is a regression between 2.6.25.17 and 2.6.25.18 that makes suspend fail on your machine.

Unfortunately, as you can see from Pavel's comment, there are quite a few patches that may have introduced this regression, and we can't really tell which one is the source of the problem without testing.  For this reason, would it be possible to carry out a binary search through the commits made between 2.6.25.17 and 2.6.25.18?
Comment 25 Meik Piepmeyer 2008-11-03 12:34:10 UTC
Should be possible, but I am not familiar with patching the kernel, reverting commits etc. If you give me some advice, I'll give it a try.

By the way, wouldn't it be sensible to try the commits listed by Pavel first?
Comment 26 Meik Piepmeyer 2008-11-05 10:45:53 UTC
I'm wondering if I was misunderstood.

I am willing to do the search, and I know how a binary search works. I just do not know how to patch the kernel for each iteration of the search. If someone could give me advice on how to deal with the commits, it would be really helpful.

I don't know if the common workflow uses local patches, a Git tree or something else.
Comment 27 Rafael Wysocki 2008-11-05 12:49:25 UTC
Sorry for the delay, I got distracted by some other urgencies.

I'd like you to binary search the patches applied to the -stable kernel after 2.6.25.17 and before 2.6.25.18.  I'll attach the instructions to the next comment.
Comment 28 Rafael Wysocki 2008-11-05 13:03:06 UTC
Created attachment 249988
Mainline kernel compilation & bisection instructions

The instructions are attached, but they refer to the current mainline kernel.  To do the same for the -stable kernel 2.6.25.y you need to clone the -stable repository instead of Linus' tree:

$ git clone \
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6-stable.git

(this should create a directory called linux-2.6-stable containing the repository)
and start the bisection by:

$ git bisect start
$ git bisect bad v2.6.25.18
$ git bisect good v2.6.25.17
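
On a linear history such as the 2.6.25.y -stable branch, git bisect is essentially a binary search for the first bad commit between the good and the bad tag, and each test is a kernel build plus a suspend/resume cycle. The C sketch below models only the search itself, to show why the number of test kernels grows logarithmically; the predicate and all names are hypothetical, and git's real implementation also handles merges and skipped commits.

```c
#include <stddef.h>

/* Returns nonzero if the kernel built from commit idx shows the bug. In a
 * real bisection this is "build, boot, suspend, resume", followed by
 * `git bisect good` or `git bisect bad`. */
typedef int (*is_bad_fn)(size_t idx, void *ctx);

/* Binary search for the first bad commit. Indices 0..n-1 are the commits
 * after the known-good tag, oldest first; index n-1 is the known-bad tag
 * (n >= 1). Invariant: commit hi is bad, every commit before lo is good. */
size_t model_first_bad(size_t n, is_bad_fn is_bad, void *ctx, int *tests_run)
{
    size_t lo = 0, hi = n - 1;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        (*tests_run)++;
        if (is_bad(mid, ctx))
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

/* Example predicate: every commit from index *(size_t *)ctx onward is bad. */
int model_bad_from(size_t idx, void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```

For 20 commits with the first bad one at index 7, model_first_bad() needs 4 tests; in general at most ceil(log2(n)) tests are needed.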
Comment 29 Meik Piepmeyer 2008-11-05 19:34:09 UTC
Here is the output:

ffa4da2a25bb4ac08f710ac99827baf48a8f8d57 is first bad commit
commit ffa4da2a25bb4ac08f710ac99827baf48a8f8d57
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Sep 3 21:37:08 2008 +0000

    clockevents: prevent multiple init/shutdown

    commit 9c17bcda991000351cb2373f78be7e4b1c44caa3 upstream

    While chasing the C1E/HPET bugreports I went through the clock events
    code inch by inch and found that the broadcast device can be initialized
    and shutdown multiple times. Multiple shutdowns are not critical, but
    useless waste of time. Multiple initializations are simply broken. Another
    CPU might have the device in use already after the first initialization and
    the second init could just render it unusable again.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

:040000 040000 6bb2c37ac2a45ce8f97a37094949575a63750e5c eda80c4051ed1fb945a997732253385a7c9982bf M      kernel
Comment 30 Petr Baudis 2008-11-07 22:33:29 UTC
(FWIW, happens here with X41 as well, exactly the same symptoms.)
Comment 31 Meik Piepmeyer 2008-11-12 01:53:34 UTC
Any progress on this?
Comment 32 Kurt Bennater 2008-11-15 13:18:12 UTC
Hey, guys, Meik put a lot of effort into diagnosing what the error is. And now you are just ignoring this bug? These regressions are really a pity.
Comment 33 Rafael Wysocki 2008-11-15 13:27:59 UTC
We're not ignoring it, but we haven't put any new comments here just yet.
Comment 34 Rafael Wysocki 2008-11-16 12:13:14 UTC
*** Bug 445453 has been marked as a duplicate of this bug. ***
Comment 35 Christian Deckelmann 2008-11-16 15:27:38 UTC
JFYI:
The workaround from Tejun works for me.
Comment 36 Tejun Heo 2008-11-16 15:34:05 UTC
Dropping -pae- from the subject as it also happens on the default kernel.  Also, the bisection result doesn't seem to agree with my hacky workaround of unloading ehci_hcd before suspend.  cc'ing Oliver here too.  Oliver, any chance this is caused by something in usb proper?
Comment 37 Rafael Wysocki 2008-11-16 16:25:14 UTC
(In reply to comment #36 from Tejun Heo)
> Dropping -pae- from the subject as it also happens on the default kernel. 
> Also, the bisection result doesn't seem to agree with my hacky workaround of
> unloading ehci_hcd before suspend.

Hm, why exactly do you think it doesn't agree?  Perhaps ehci_hcd is only necessary to trigger the issue.  Have you tried with the bisected commit reverted?
Comment 38 Tejun Heo 2008-11-16 17:06:27 UTC
Maybe; it just didn't look very likely. Wouldn't a timer issue appear earlier than the ehci resume? The reason I tried unloading modules was that libata seemed to resume and numlock was responsive, which indicates that resume was stuck while walking through the driver resume methods. Has anyone confirmed the bisection result?
Comment 39 Rafael Wysocki 2008-11-16 18:25:40 UTC
Well, no.
Comment 40 Tejun Heo 2008-11-17 03:20:34 UTC
Meh... forget about my workaround, it stopped working today for some reason and I can't resume at all.  :-(
Comment 41 Rafael Wysocki 2008-11-17 22:51:24 UTC
Does reverting the bisected patch help?
Comment 42 Meik Piepmeyer 2008-11-18 15:45:52 UTC
Yes, vanilla 2.6.25.18 works with the commit ffa4da2a25bb4ac08f710ac99827baf48a8f8d57 reverted.
Comment 43 Tejun Heo 2008-11-19 06:24:34 UTC
Confirmed with SUSE kernel with the commit reverted too.

 http://htj.dyndns.org/export/testing/sl110-x86_64-bug439461_dbg0/kernel-default-2.6.25.20-bug439461_dbg0.x86_64.rpm

 http://htj.dyndns.org/export/testing/sl110-x86_64-bug439461_dbg0/0001-revert-clockevents-prevent-multiple-init-shutdown

The commit couldn't be reverted with "patch -R -p1", so I did some manual editing. I'm not entirely sure it's correct, though. Anyway, with the above patch applied, resume works fine.
Comment 44 Kurt Bennater 2008-11-23 11:37:34 UTC
So, what is the standard procedure here? Are you going to release a patch for 2.6.25.18 or take care of it in an update to 2.6.25.20?
Comment 45 Rafael Wysocki 2008-11-23 12:36:56 UTC
Well, this is a regression in -stable and -stable regressions are (fortunately) rare, so it really is exceptional.
Comment 46 Kurt Bennater 2008-11-23 12:59:32 UTC
Fortunately, yes. But it seems that at least all ThinkPad users are affected (I know of perhaps 10 people with the very same problem), and I would imagine that there are quite a few of them. You don't want to suggest that all of them should compile their own kernels, do you?
I am particularly interested in this because downgrading back to 2.6.25.16 broke a few things that used to work before (great!), and I can't keep reinstalling a production system every time a badly tested kernel update breaks the installation.
Comment 47 Petr Baudis 2008-11-23 13:12:50 UTC
Do you have some timeline of the update release, so that users can decide whether to wait further or roll their own solution? This bug really is super-annoying. :-(
Comment 48 Marcus Meissner 2008-11-23 13:20:43 UTC
we will be updating to the current 2.6.25.x stable release and include this patch.

just not sure when, but I hope in December. The 11.0 KOTD currently has a KABI change, which might cause more trouble than this issue. ;)
Comment 51 Rafael Wysocki 2008-11-25 15:38:19 UTC
*** Bug 443279 has been marked as a duplicate of this bug. ***
Comment 52 Frank Seidel 2008-11-26 17:17:45 UTC
Just for the record: this also hits me on my Samsung Q45 (and all my ThinkPads).
Comment 58 Kurt Bennater 2009-01-07 08:44:44 UTC
(In reply to comment #48 from Marcus Meissner)
> we will be updating to the current 2.6.25.x stable release and include this
> patch.
> 
> just not sure when, but I hope in December. The 11.0 KOTD currently has a KABI
> change, which might cause more trouble than this issue. ;)

I know that not all hopes are fulfilled, but could you please reconsider fixing this super-annoying bug? Thanks.

Comment 59 Greg Kroah-Hartman 2009-01-09 00:01:08 UTC
Frank was going to check this in...
Comment 62 Kurt Bennater 2009-01-14 15:26:09 UTC
I hate having to play the bad guy here. Somehow you keep forgetting this critical bug. Can you explain to me why it takes more than two and a half months to release a kernel update when you know which patch you have to omit? (And even that was found out by somebody else, not by the Novell people!) I think this behavior is going to frustrate even long-term SuSE users.
Comment 63 Greg Kroah-Hartman 2009-01-14 16:07:21 UTC
The patch has been available in our kernel-of-the-day package, right?

And also, an 11.0 kernel update is already in the works, to be pushed out to all users soon, which will have this fix in it.

Sorry, we just forgot to close this bug out.
Comment 64 Meik Piepmeyer 2009-01-14 22:04:03 UTC
Well, this bug should not be closed until the kernel is available, because a system with the default kernel of 11.0 will have this issue until then.
Comment 65 Greg Kroah-Hartman 2009-01-14 22:22:16 UTC
Sorry, but no, that's not the way we use bugzilla.

Otherwise we would have to go back and do major amounts of "close this bug" type work when we do releases.  We close bugs when the fix is checked into our kernel tree, and it is publicly available, which this fix is, in the kernel-of-the-day package.
Comment 66 Marcus Meissner 2009-01-14 22:48:01 UTC
A testkernel is now in QA and also available for public testing:

http://download.opensuse.org/update/11.0-test/

(this contains all test updates, you might just want to update your kernel rpms.)
Comment 67 Swamp Workflow Management 2009-01-20 11:58:13 UTC
Update released for: kernel-debug, kernel-default, kernel-docs, kernel-kdump, kernel-pae, kernel-ppc64, kernel-ps3, kernel-rt, kernel-rt_debug, kernel-source, kernel-syms, kernel-vanilla, kernel-xen
Products:
openSUSE 11.0 (debug, i386, ppc, x86_64)