Bug 119207

Summary: Kernel crash when pcmcia is stopped/ejected
Product: [openSUSE] SUSE LINUX 10.0 Reporter: Jason Kasper <vR>
Component: KernelAssignee: Pavel Machek <pavel>
Status: RESOLVED INVALID QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None    
Version: Preview 1   
Target Milestone: ---   
Hardware: Other   
OS: All   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: /var/log/messages

Description Jason Kasper 2005-09-28 12:22:34 UTC
This is coming out of another bug, bug #116991.

When powersave goes through its suspend process, it ejects PCMCIA cards.  I have 
captured a /var/log/messages that contains a kernel crash in the pcmcia/yenta 
area that is triggered by this PCMCIA eject.  I will attach the /var/log/
messages to this next.

Please let me know what else I can provide for help in getting this fixed....
Comment 1 Jason Kasper 2005-09-28 12:25:48 UTC
Created attachment 51049 [details]
/var/log/messages

/var/log/messages that contains the kernel crash that is triggered by the
PCMCIA eject.  The crash starts at Sep 27 17:36:41.

My hardware is:

IBM Thinkpad A31
Symbol V2.90-57 PCMCIA wireless network card (Symbol Wireless 4111 Spectrum
card)

Please let me know what else to provide.

Thanks!
Comment 2 Hubert Mantel 2005-09-28 12:37:29 UTC
Pavel, I'm not sure if you're the right victim for this bug; but I don't know
who deals with PCMCIA issues nowadays...
Comment 3 Pavel Machek 2005-09-29 11:34:51 UTC
Well, unfortunately noone cares about PCMCIA these days, and big problem that it
is *noone*, not *noone inside suse* :-(.

OTOH I do not see any oops. I see a mishandled interrupt with a backtrace.

Can you reproduce it manually, without suspending? Does it also happen with
vanilla kernel?
Comment 4 Jason Kasper 2005-10-04 11:17:52 UTC
Hi Pavel,

Yes, I can reproduce this by simply running "pccardctl eject" while dhcpcd is 
running for the wireless card.  Backtrace shown below:

Oct  4 07:09:03 gandalf sudo:   gideon : TTY=pts/5 ; PWD=/home/gideon ; 
USER=root ; COMMAND=/sbin/pccardctl eject
Oct  4 07:09:06 gandalf kernel: irq 4: nobody cared (try booting with the 
"irqpoll" option)
Oct  4 07:09:06 gandalf kernel:  [<c013c17c>] __report_bad_irq+0x1c/0x70
Oct  4 07:09:06 gandalf kernel:  [<c013c26b>] note_interrupt+0x6b/0xd0
Oct  4 07:09:06 gandalf kernel:  [<c013bd0c>] __do_IRQ+0xbc/0xd0
Oct  4 07:09:06 gandalf kernel:  [<c01055e8>] do_IRQ+0x38/0x60
Oct  4 07:09:06 gandalf kernel:  [<c0103dea>] common_interrupt+0x1a/0x20
Oct  4 07:09:06 gandalf kernel:  [<c013bc05>] handle_IRQ_event+0x15/0x60
Oct  4 07:09:06 gandalf kernel:  [<c013bcd5>] __do_IRQ+0x85/0xd0
Oct  4 07:09:06 gandalf kernel:  [<c01055e8>] do_IRQ+0x38/0x60
Oct  4 07:09:06 gandalf kernel:  [<c0103dea>] common_interrupt+0x1a/0x20
Oct  4 07:09:06 gandalf kernel:  [<c013bc05>] handle_IRQ_event+0x15/0x60
Oct  4 07:09:06 gandalf kernel:  [<c013bcd5>] __do_IRQ+0x85/0xd0
Oct  4 07:09:06 gandalf kernel:  [<c01055e8>] do_IRQ+0x38/0x60
Oct  4 07:09:06 gandalf kernel:  [<c0103dea>] common_interrupt+0x1a/0x20
Oct  4 07:09:06 gandalf kernel:  [<c013007b>] posix_cpu_timer_schedule+0xcb/
0x480
Oct  4 07:09:06 gandalf kernel:  [<c0120b91>] __do_softirq+0x31/0xa0
Oct  4 07:09:06 gandalf kernel:  [<c0120c26>] do_softirq+0x26/0x30
Oct  4 07:09:06 gandalf kernel:  [<c01055ed>] do_IRQ+0x3d/0x60
Oct  4 07:09:06 gandalf [powersave]: Info (increaseSpeed:120) current speed: 
1200000KHz     new speed: 1900000KHz
Oct  4 07:09:06 gandalf kernel:  [<c0103dea>] common_interrupt+0x1a/0x20
Oct  4 07:09:06 gandalf [powersave]: Info (setSpeed:90) Speed set to: 1900000KHz 
 in file /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
Oct  4 07:09:06 gandalf kernel:  [<f9334331>] yenta_set_power+0x31/0x60 
[yenta_socket]
Oct  4 07:09:06 gandalf [powersave]: Info (adjustSpeed:202) Speed increased to 
(1900000) MHz
Oct  4 07:09:06 gandalf dhcpcd[9517]: terminating on signal 15
Oct  4 07:09:06 gandalf kernel:  [<f9334377>] yenta_set_socket+0x17/0x1e0 
[yenta_socket]
Oct  4 07:09:06 gandalf dhcpcd[9517]: dhcpStop: ioctl SIOCSIFADDR: No such 
device
Oct  4 07:09:06 gandalf kernel:  [<f9334981>] yenta_clear_maps+0x61/0xa0 
[yenta_socket]
Oct  4 07:09:06 gandalf dhcpcd[9517]: dhcpStop: ioctl SIOCSIFFLAGS: No such 
device
Oct  4 07:09:06 gandalf kernel:  [<f9334a5f>] yenta_sock_init+0x6f/0xa0 
[yenta_socket]
Oct  4 07:09:06 gandalf kernel:  [<f9317586>] shutdown_socket+0x26/0xa0 
[pcmcia_core]
Oct  4 07:09:06 gandalf kernel:  [<f9317b68>] socket_remove+0x8/0x40 
[pcmcia_core]
Oct  4 07:09:06 gandalf kernel:  [<f9317fbc>] pcmcia_eject_card+0x4c/0x50 
[pcmcia_core]
Oct  4 07:09:06 gandalf kernel:  [<f931a9cb>] pccard_store_eject+0x1b/0x30 
[pcmcia_core]
Oct  4 07:09:06 gandalf kernel:  [<f931a9b0>] pccard_store_eject+0x0/0x30 
[pcmcia_core]
Oct  4 07:09:06 gandalf kernel:  [<c025630b>] class_device_attr_store+0x1b/0x20
Oct  4 07:09:06 gandalf kernel:  [<c01914ed>] flush_write_buffer+0x1d/0x30
Oct  4 07:09:06 gandalf kernel:  [<c0191530>] sysfs_write_file+0x30/0x50
Oct  4 07:09:06 gandalf kernel:  [<c0191500>] sysfs_write_file+0x0/0x50
Oct  4 07:09:06 gandalf kernel:  [<c015935d>] vfs_write+0x8d/0x170
Oct  4 07:09:06 gandalf modify_resolvconf: restored /etc/resolv.conf.saved.by.
dhcpcd.eth1 to /etc/resolv.conf
Oct  4 07:09:06 gandalf kernel:  [<c01594ec>] sys_write+0x3c/0x70
Oct  4 07:09:06 gandalf kernel:  [<c0102d1b>] sysenter_past_esp+0x54/0x79
Oct  4 07:09:06 gandalf kernel: handlers:
Oct  4 07:09:06 gandalf kernel: [<f92ac0d0>] (usb_hcd_irq+0x0/0x60 [usbcore])
Oct  4 07:09:06 gandalf kernel: [<f9334830>] (yenta_interrupt+0x0/0xc0 
[yenta_socket])
Oct  4 07:09:06 gandalf kernel: [<f8ee7410>] (radeon_driver_irq_handler+0x0/0xa0 
[radeon])
Oct  4 07:09:06 gandalf kernel: Disabling IRQ #4

As far as from a vanilla kernel....  I've not compiled a vanilla kernel in quite 
a while.  =:)  And I don't have room on my laptop to build it right now, 
unfortunately.  Do you have a vanilla kernel RPM that I can install?
Comment 5 Pavel Machek 2005-10-10 12:51:29 UTC
Not me, but I'm sure it is somewhere inside suse.
Comment 6 Jason Kasper 2005-10-10 16:32:53 UTC
Well, um.  I don't think there's any info that I need to provide, so I'm taking 
the "ASSIGNED"/NEEDINFO off of me, hopefully.
Comment 7 Pavel Machek 2005-10-10 19:09:12 UTC
I still would like you to test with vanilla kernel. Compiling it yourself does
not take *that* much space...
Comment 8 Jason Kasper 2005-10-13 20:52:21 UTC
Hmmmmm.  I've compiled a vanilla linux-2.6.13.4 kernel and could not reproduce 
it.

Then I tried with the default linux-2.6.13-15 kernel from 10.0-final and 
couldn't reproduce it either.  If you'd like to close this, that's fine with me 
and I'll re-open it if I see it again.

Again, wishing there was a WTF resolution.  =:/
Comment 9 Forgotten User ZhJd0F0L3x 2005-10-14 09:53:56 UTC
ok. Let's assume it was a passing cosmic particle that crashed your machine ;-)
Comment 10 Jason Kasper 2005-10-14 12:43:51 UTC
Gah.  I spoke too soon.  I just saw that this happened again this morning.  This 
time, it was again as I was suspending my laptop in the morning to go to work.  
This is with the default 10.0-final kernel.  An excerpt from /var/log/messages 
again:

Oct 14 07:25:59 linux kernel: irq 4: nobody cared (try booting with the 
"irqpoll" option)
Oct 14 07:25:59 linux kernel:  [<c013c18c>] __report_bad_irq+0x1c/0x70
Oct 14 07:25:59 linux kernel:  [<c013c27b>] note_interrupt+0x6b/0xd0
Oct 14 07:25:59 linux kernel:  [<c013bd1c>] __do_IRQ+0xbc/0xd0
Oct 14 07:25:59 linux kernel:  [<c01055e8>] do_IRQ+0x38/0x60
Oct 14 07:25:59 linux kernel:  [<c0103dea>] common_interrupt+0x1a/0x20
Oct 14 07:25:59 linux kernel:  [<c013007b>] posix_cpu_timer_schedule+0xbb/0x480
Oct 14 07:25:59 linux kernel:  [<c0120ba1>] __do_softirq+0x31/0xa0
Oct 14 07:25:59 linux kernel:  [<c0120c36>] do_softirq+0x26/0x30
Oct 14 07:25:59 linux kernel:  [<c01055ed>] do_IRQ+0x3d/0x60
Oct 14 07:25:59 linux kernel:  [<c0103dea>] common_interrupt+0x1a/0x20
Oct 14 07:25:59 linux kernel:  [<f8f1d331>] yenta_set_power+0x31/0x60 
[yenta_socket]
Oct 14 07:25:59 linux kernel:  [<f8f1d377>] yenta_set_socket+0x17/0x1e0 
[yenta_socket]
Oct 14 07:25:59 linux kernel:  [<f8f1d981>] yenta_clear_maps+0x61/0xa0 
[yenta_socket]
Oct 14 07:25:59 linux kernel:  [<f8f1da5f>] yenta_sock_init+0x6f/0xa0 
[yenta_socket]
Oct 14 07:25:59 linux kernel:  [<f9031586>] shutdown_socket+0x26/0xa0 
[pcmcia_core]
Oct 14 07:26:00 linux kernel:  [<f9031b68>] socket_remove+0x8/0x40 [pcmcia_core]
Oct 14 07:26:00 linux kernel:  [<f9031fbc>] pcmcia_eject_card+0x4c/0x50 
[pcmcia_core]
Oct 14 07:26:00 linux kernel:  [<f90349cb>] pccard_store_eject+0x1b/0x30 
[pcmcia_core]
Oct 14 07:26:00 linux kernel:  [<f90349b0>] pccard_store_eject+0x0/0x30 
[pcmcia_core]
Oct 14 07:26:00 linux kernel:  [<c025643b>] class_device_attr_store+0x1b/0x20
Oct 14 07:26:00 linux kernel:  [<c01914fd>] flush_write_buffer+0x1d/0x30
Oct 14 07:26:00 linux kernel:  [<c0191540>] sysfs_write_file+0x30/0x50
Oct 14 07:26:00 linux kernel:  [<c0191510>] sysfs_write_file+0x0/0x50
Oct 14 07:26:00 linux kernel:  [<c015936d>] vfs_write+0x8d/0x170
Oct 14 07:26:00 linux kernel:  [<c01594fc>] sys_write+0x3c/0x70
Oct 14 07:26:00 linux kernel:  [<c0102d1b>] sysenter_past_esp+0x54/0x79
Oct 14 07:26:01 linux kernel: handlers:
Oct 14 07:26:02 linux kernel: [<f8f1d830>] (yenta_interrupt+0x0/0xc0 
[yenta_socket])
Oct 14 07:26:02 linux kernel: [<f8c35410>] (radeon_driver_irq_handler+0x0/0xa0 
[radeon])
Oct 14 07:26:02 linux kernel: [<f8f000d0>] (usb_hcd_irq+0x0/0x60 [usbcore])
Oct 14 07:26:02 linux kernel: Disabling IRQ #4


Hm.  I just noticed that immediately after my laptop resumed, my usb mouse went 
completely wonky.  Seeing the "Disabling IRQ #4" now makes perfect sense, since 
this is what /proc/interrupts looks like:

           CPU0
  0:     173346          XT-PIC  timer
  1:       2115          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  4:      53977          XT-PIC  uhci_hcd:usb1, yenta, radeon@pci:0000:01:00.0
  5:      18451          XT-PIC  Intel 82801CA-ICH3, yenta
  7:          0          XT-PIC  uhci_hcd:usb3
  8:       8555          XT-PIC  rtc
  9:          0          XT-PIC  uhci_hcd:usb2
 11:      12616          XT-PIC  0.0
 12:        110          XT-PIC  i8042
 14:      37246          XT-PIC  ide0
 15:       3682          XT-PIC  ide1
NMI:          0
LOC:          0
ERR:          0
MIS:          0

Anyway, I'm reopening this since it did happen again, and I didn't see any 
cosmic particles this time.  =;)  What's the next step?  Do I need to run the 
vanilla kernel I built and see if it happens again?  If so, are there any 
problems with running a generic kernel with Suse 10.0-final?  Will everything 
work?  (hal, etc.?)
Comment 11 Pavel Machek 2005-10-14 20:32:22 UTC
You should find a reproducible way to reproduce this one...

yenta_set_power triggers an interrupt noone knows how to handle. That triggers disabling of irq4. 
Comment 12 Pavel Machek 2005-10-22 18:30:12 UTC
Is there reliable way to reproduce it?
Comment 13 Jason Kasper 2005-10-24 12:18:32 UTC
grep -E "Disabling|yenta_set_power" /var/log/messages

Oct 14 07:25:59 linux kernel:  [<f8f1d331>] yenta_set_power+0x31/0x60 [yenta_socket]
Oct 14 07:26:02 linux kernel: Disabling IRQ #4
Oct 19 13:45:02 linux kernel:  [<f8f2c331>] yenta_set_power+0x31/0x60 [yenta_socket]
Oct 19 13:45:03 linux kernel: Disabling IRQ #4
Oct 20 17:45:24 linux kernel:  [<f8fed331>] yenta_set_power+0x31/0x60 [yenta_socket]
Oct 20 17:45:26 linux kernel: Disabling IRQ #4

Well, yes there must be, since it's happened 3 times in the last ten days on my laptop.  I've not had time to try to track it down yet.  Also, I have had this in /etc/pcmcia/config.opts for a long time now...

exclude irq 4

... and apparently it is being ignored, since irq 4 is being used for PCMCIA?

  4:   20329964          XT-PIC  yenta, radeon@pci:0000:01:00.0, uhci_hcd:usb1
Comment 14 Pavel Machek 2005-11-22 23:43:42 UTC
In needinfo for way too long...