Bug 115459 - Mouse/Keyboard hang/pause in Beta4 on Sun W2100 dual opteron boxes
Summary: Mouse/Keyboard hang/pause in Beta4 on Sun W2100 dual opteron boxes
Status: RESOLVED FIXED
Alias: None
Product: SUSE LINUX 10.0
Classification: openSUSE
Component: Kernel
Version: Beta 4
Hardware: x86-64 All
Importance: P5 - None : Critical
Target Milestone: ---
Assignee: Andreas Kleen
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-06 17:59 UTC by Eric Whiting
Modified: 2005-10-11 20:39 UTC
CC List: 2 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
hwinfo (273.89 KB, text/plain)
2005-09-08 13:41 UTC, Eric Whiting
hwinfo -- mistake -- this attachment is for a different bug. (581 bytes, text/plain)
2005-09-12 13:41 UTC, Eric Whiting
Don't disable interrupts while polling thermal (5.64 KB, patch)
2005-09-13 11:23 UTC, Andreas Kleen

Description Eric Whiting 2005-09-06 17:59:01 UTC
This keyboard/mouse hang/pause has been discussed before, on several occasions
on [suse-amd64] and in email with Andi (7/11/2005).

With SUSE 10B4 it even seems to be worse. After B4 install and booting the
default kernel it was very hard to type. It was losing keystrokes about every
5-10s. 

B4 failsafe boot worked fine. 

The following cmdline does 'fix' the hang/pause, but this is a fairly serious
problem that needs a closer look and a proper fix.

 kernel /boot/vmlinuz root=/dev/sda1 selinux=0 vga=normal acpi=off noresume edd=off
Comment 1 Vojtech Pavlik 2005-09-07 18:23:06 UTC
Does

    rmmod thermal

help? Also removing some other ACPI modules which use SMBus might help.
Comment 2 Eric Whiting 2005-09-07 19:23:56 UTC
rmmod thermal worked -- lots of hangs with thermal loaded. Hangs went away after
rmmod thermal. 

Current cmdline:
cat /proc/cmdline
root=/dev/sda1 selinux=0  vga=normal
Comment 3 Vojtech Pavlik 2005-09-07 19:55:23 UTC
Pavel, can you suggest what to do here? Thermal seems to be spending too much
time reading data, and not just in this case. I believe some
fixes for that appeared on l-k ...
Comment 4 Pavel Machek 2005-09-07 20:38:47 UTC
There is an ec_burst patch (or whatever that option is called) that should help;
unfortunately it does something weird on other systems, so it cannot be enabled
by default.
Comment 5 Thomas Renninger 2005-09-08 09:06:12 UTC
ec_burst is enabled by default.
Does it help to increase the thermal polling interval?
echo 10 >/proc/acpi/thermal_zone/*/polling_frequency
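Note on the command above: in bash, a redirection target that expands to more than one file is rejected as an ambiguous redirect, so the one-liner only works when the glob matches a single thermal zone. A minimal sketch that writes each zone separately, assuming the /proc/acpi/thermal_zone layout quoted above (set_polling and its optional base-directory argument are hypothetical names, used only for illustration):

```sh
# Sketch only: write the polling interval (in seconds) to every thermal zone.
# set_polling is a hypothetical helper; the optional second argument lets the
# loop be tried against a scratch directory instead of /proc.
set_polling() {
    seconds=$1
    base=${2:-/proc/acpi/thermal_zone}
    for f in "$base"/*/polling_frequency; do
        [ -e "$f" ] || continue          # glob matched nothing
        printf '%s\n' "$seconds" > "$f"  # one redirect per zone file
    done
}
```

Run as root, `set_polling 10` would be the per-zone equivalent of the echo above.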
Comment 6 Thomas Renninger 2005-09-08 11:28:47 UTC
With SUSE 10B4 it even seems to be worse. After B4 install and booting the
default kernel it was very hard to type. It was losing keystrokes about every
5-10s

-> This is probably because we increased the thermal polling rate, from every 5
seconds to every 2 seconds. Please tell me a value that solves your issue.
Adding it is trivial, but we are running out of time...
This is now also available as a sysconfig variable:
/etc/sysconfig/powersave/thermal
THERMAL_POLLING_FREQUENCY=""
Add your values (in seconds) there and restart the powersave daemon.

If I understand this correctly, you will always lose keystrokes on a thermal read?
Then maybe there is no workaround for you other than setting the value very high
or simply not loading the module?
Comment 7 Eric Whiting 2005-09-08 13:34:40 UTC
I set the interval to 10s and restarted  powersave. Then the mouse/keyboard hung
every 10s. (I did a 'watch date' in a window and then moved the window around on
the screen continuously -- sure enough it froze/hung every 10s.)

I don't think there is any value that will work... It is hard to type even with
it set at 10s. (like right now I've been having to backspace and fix things
several times just typing this)
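A rough way to log such stalls instead of watching the screen, in the spirit of the 'watch date' test: sample the clock in a sleeping loop and report unusually long gaps between samples. This is only a sketch. It assumes the hang also delays timer wakeups, which this report does not confirm, and report_gaps is a hypothetical helper name.

```sh
# Sketch only: read one timestamp (epoch seconds) per line on stdin and
# print every gap between consecutive samples that exceeds the limit.
# report_gaps is a hypothetical helper, not part of any package.
report_gaps() {
    awk -v limit="$1" '
        NR > 1 && $1 - prev > limit {
            printf "gap of %.2fs before sample %d\n", $1 - prev, NR
        }
        { prev = $1 }
    '
}
```

With GNU date, something like `while sleep 0.1; do date +%s.%N; done | report_gaps 0.5` would print a line whenever two samples are more than half a second apart.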


Comment 8 Andreas Kleen 2005-09-08 13:37:37 UTC
Perhaps we just need to add a black list (PCI or DMI) and disable
thermal polling on this machine.

Can you attach hwinfo output?
Comment 9 Eric Whiting 2005-09-08 13:41:09 UTC
Created attachment 49213
hwinfo
Comment 10 Thomas Renninger 2005-09-08 14:38:37 UTC
Does ec_burst=0 help?
As far as I know it should make things worse, but you never know; it is worth a try...
Comment 11 Eric Whiting 2005-09-08 17:30:35 UTC
I just updated the BIOS on this hardware to version 2.1.

No change -- same problem. 
Comment 12 Eric Whiting 2005-09-08 17:41:49 UTC
ec_burst=0 does not help -- didn't seem to make it much worse either.

Box is still not usable until I do a rmmod thermal. 

Comment 13 Eric Whiting 2005-09-12 13:41:24 UTC
Created attachment 49613
hwinfo -- mistake -- this attachment is for a different bug.
Comment 14 Andreas Kleen 2005-09-13 11:23:25 UTC
Created attachment 49756
Don't disable interrupts while polling thermal

A DMI blacklist is difficult because x86-64 doesn't have DMI infrastructure right
now (and it's too late in the release to add it), and a PCI blacklist is not feasible.

But this patch should fix it. It simply doesn't disable interrupts while
reading thermal.

I think the reason it used an irqsave spinlock here was that it used to be
called from the timer interrupt. But these days ACPI pushes it to a thread, so
sleeping is OK. To be sure, I check for in_interrupt and error out if it happens.

I did a similar patch some time ago for the battery reading in ec.c; this just
extends it.

Eric, I will build you a test kernel with this.
Comment 15 Andreas Kleen 2005-09-13 12:03:51 UTC
Can you please test ftp://nozzle.suse.de/pub/people/ak/test2/kernel-smp* 
and report if it works now?

Comment 16 Eric Whiting 2005-09-13 13:36:43 UTC
Good news. Patched kernel seems to work fine.  In text console when I hold a key
down there is no longer the stop/start pause/hang as the letters appear on the
screen. Key-repeat works smoothly. 

rpm -Uhv failed to set up the nvidia module (I had not downloaded the source), so
I changed xorg.conf to use nv instead of nvidia and tested under X a little.
Mouse movement and keyboard behavior under X seemed OK. (I'm downloading the
source now so I can run with the nvidia module loaded.)

Comment 17 Andreas Kleen 2005-09-13 13:40:34 UTC
OK, thanks. Increasing severity to bring it onto the radar.
Comment 18 Andreas Kleen 2005-09-14 07:32:33 UTC
Fixed now for RC3
Comment 19 Eric Whiting 2005-10-11 20:39:52 UTC
Andi -- I just got my CD/DVD and installed 10.0. It appears that the fix did
not make it into the GM release. I had to 'rmmod thermal' to get the box
working properly. I'll leave it up to you to reopen this if you think it should be...