Bug 144623 - Kerneloopses with 2.6.15-git12-6-smp
Summary: Kerneloopses with 2.6.15-git12-6-smp
Status: RESOLVED FIXED
Duplicates: 150251
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Kernel
Version: Beta 1
Hardware: 32bit Other
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: Gerd Hoffmann
QA Contact: E-mail List
 
Reported: 2006-01-21 17:45 UTC by Markus Koßmann
Modified: 2006-02-28 17:11 UTC

Found By: Other
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Full boot.msg with crash (28.06 KB, text/plain)
2006-01-21 20:06 UTC, Markus Koßmann
hwinfo (284.62 KB, text/plain)
2006-01-24 23:21 UTC, Markus Koßmann
boot.msg of kotd 2.6.16-rc1-git3-20060120194150-smp (X86_64) on SUSE 10.0 (52.12 KB, text/plain)
2006-01-24 23:51 UTC, Markus Koßmann
Boot.msg 10.1b1 with kotd 2.6.16-rc1-git3-20060124182340-smp (i586) (37.34 KB, text/plain)
2006-01-25 10:11 UTC, Markus Koßmann
Screenshoot of oops during normal boot (1.95 MB, image/jpeg)
2006-02-03 14:43 UTC, Markus Koßmann
Boot.msg 10.1b3, booting in safe mode (29.40 KB, text/plain)
2006-02-03 14:46 UTC, Markus Koßmann
boot.msg with kotd kernel-debug-2.6.16_rc2_git8-20060210184420.i586 (33.10 KB, text/plain)
2006-02-11 14:22 UTC, michel munnix
10.0 - hwinfo (194.44 KB, text/plain)
2006-02-23 15:28 UTC, Terje J. Hanssen
10.1b3 - hwinfo (131.54 KB, application/octet-stream)
2006-02-23 15:29 UTC, Terje J. Hanssen
10.1b3 - /var/log/boot.msg (29.70 KB, application/octet-stream)
2006-02-23 15:34 UTC, Terje J. Hanssen
10.1b3 - /var/log/boot.omsg (31.72 KB, application/octet-stream)
2006-02-23 15:35 UTC, Terje J. Hanssen

Description Markus Koßmann 2006-01-21 17:45:16 UTC
After finishing the 10.1 beta1 32-bit installation and rebooting the system, I got the following oops when X11 was starting. The system is an ASUS A8N32 with an Athlon64, 2 GB RAM, GeForce 7800GT graphics, and WinTV 350PVR and Technotrend C2300 DVB-C cards:
 
Jan 21 18:07:57 emil3 klogd: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Jan 21 18:07:57 emil3 klogd:  printing eip:
Jan 21 18:07:57 emil3 klogd: c108cfe1
Jan 21 18:07:57 emil3 klogd: *pde = 00000000
Jan 21 18:07:57 emil3 klogd: Oops: 0000 [#2]
Jan 21 18:07:57 emil3 klogd: SMP
Jan 21 18:07:57 emil3 klogd: Modules linked in: button battery ac snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device ns558 gameport sky2 ohci1394 ieee1394 stradis compat_ioctl32 videodev snd_intel8x0 ehci_hcd snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd ohci_hcd i2c_nforce2 soundcore snd_page_alloc i2c_core usbcore forcedeth generic shpchp pci_hotplug dm_mod parport_pc lp parport reiserfs fan thermal processor sata_sil24 sg sata_nv libata amd74xx sd_mod scsi_mod ide_disk ide_core
Jan 21 18:07:57 emil3 klogd: CPU:    0
Jan 21 18:07:57 emil3 klogd: EIP:    0060:[<c108cfe1>]    Tainted: G     U VLI
Jan 21 18:07:57 emil3 klogd: EFLAGS: 00010246   (2.6.15-git12-6-smp)
Jan 21 18:07:57 emil3 klogd: EIP is at sysfs_lookup+0x30/0x1a3
Jan 21 18:07:57 emil3 klogd: eax: 00000000   ebx: f7314198   ecx: c2563f68   edx: 00000004
Jan 21 18:07:57 emil3 klogd: esi: 00000000   edi: f73141f8   ebp: f71e6204   esp: c2563e54
Jan 21 18:07:57 emil3 klogd: ds: 007b   es: 007b   ss: 0068
Jan 21 18:07:57 emil3 klogd: Process hald (pid: 2949, threadinfo=c2562000 task=dfe92a90)
Jan 21 18:07:57 emil3 klogd: Stack: <0>f71e622c 00000001 c11fc140 f7314198 f71ce608 f71ce67c c1064cde c2563ec4
Jan 21 18:07:57 emil3 klogd:        c2563eb8 c2563f68 dfd72f40 baf3f719 f71ce608 c265101e c2563f68 c1066832
Jan 21 18:07:57 emil3 klogd:        c2651024 00000000 00000000 c11f88a8 000280d2 c11f88a8 00000000 c2289280
Jan 21 18:07:57 emil3 klogd: Call Trace:
Jan 21 18:07:57 emil3 klogd:  [<c1064cde>] do_lookup+0xa3/0x135
Jan 21 18:07:57 emil3 klogd:  [<c1066832>] __link_path_walk+0x7fd/0xc41
Jan 21 18:07:57 emil3 klogd:  [<c1066cbf>] link_path_walk+0x49/0xbd
Jan 21 18:07:57 emil3 klogd:  [<c1066fba>] path_lookup+0x145/0x17a
Jan 21 18:07:57 emil3 klogd:  [<c106760c>] __user_walk+0x21/0x31
Jan 21 18:07:57 emil3 klogd:  [<c10618a0>] sys_readlink+0x20/0x8c
Jan 21 18:07:57 emil3 klogd:  [<c1003c2b>] sysenter_past_esp+0x54/0x79
Jan 21 18:07:57 emil3 klogd: Code: d3 83 ec 08 8b 42 18 8b 40 54 89 04 24 8b 68 0c e9 63 01 00 00 f6 45 18 2c 0f 84 56 01 00 00 89 e8 e8 cc ec ff ff 8b 7b 24 89 c6 <ac> ae 75 08 84 c0 75 f8 31 c0 eb 04 19 c0 0c 01 85 c0 0f 85 32
Jan 21 18:07:57 emil3 klogd:  <6>BIOS EDD facility v0.16 2004-Jun-25, 3 devices found
Jan 21 18:07:57 emil3 klogd: Unable to handle kernel NULL pointer dereference at virtual address 00000030
Jan 21 18:07:57 emil3 klogd:  printing eip:
Jan 21 18:07:57 emil3 klogd: f93e7488
Jan 21 18:07:57 emil3 klogd: *pde = 7f4b3067
Jan 21 18:07:57 emil3 klogd: Oops: 0000 [#3]
Jan 21 18:07:57 emil3 klogd: SMP
Jan 21 18:07:57 emil3 klogd: Modules linked in: edd button battery ac snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device ns558 gameport sky2 ohci1394 ieee1394 stradis compat_ioctl32 videodev snd_intel8x0 ehci_hcd snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd ohci_hcd i2c_nforce2 soundcore snd_page_alloc i2c_core usbcore forcedeth generic shpchp pci_hotplug dm_mod parport_pc lp parport reiserfs fan thermal processor sata_sil24 sg sata_nv libata amd74xx sd_mod scsi_mod ide_disk ide_core
Jan 21 18:07:57 emil3 klogd: CPU:    0
Jan 21 18:07:57 emil3 klogd: EIP:    0060:[<f93e7488>]    Tainted: G     U VLI
Jan 21 18:07:57 emil3 klogd: EFLAGS: 00013246   (2.6.15-git12-6-smp)
Jan 21 18:07:57 emil3 klogd: EIP is at video_open+0xb4/0x16a [videodev]
Jan 21 18:07:57 emil3 klogd: eax: 00000000   ebx: f93e8c20   ecx: f93fd520   edx: f6cbe000
Jan 21 18:07:57 emil3 klogd: esi: c1251e40   edi: f777242c   ebp: 00000000   esp: f6cbfef8
Jan 21 18:07:57 emil3 klogd: ds: 007b   es: 007b   ss: 0068
Jan 21 18:07:57 emil3 klogd: Process X (pid: 3536, threadinfo=f6cbe000 task=dfee2560)
Jan 21 18:07:57 emil3 klogd: Stack: <0>00000000 f786e8c0 00000000 f777242c c1061169 c1251e40 00000000 c1251e40
Jan 21 18:07:57 emil3 klogd:        f777242c 00000000 c1061043 c1058594 dfd72140 f70e042c c1251e40 f6cbff54
Jan 21 18:07:57 emil3 ifstatus:     eth0      device: nVidia Corporation CK804 Ethernet Controller (rev a3)
Jan 21 18:07:57 emil3 klogd:        b7ea3ff4 00000008 c10586dc c1251e40 00000000 00008002 c1058712 f70e042c
Jan 21 18:07:57 emil3 ifstatus:     eth0      configuration: eth-bus-pci-0000:00:13.0
Jan 21 18:07:58 emil3 klogd: Call Trace:
Jan 21 18:07:58 emil3 klogd:  [<c1061169>] chrdev_open+0x126/0x163
Jan 21 18:07:58 emil3 klogd:  [<c1251e40>] trap_init+0x135/0x1c8
Jan 21 18:07:58 emil3 klogd:  [<c1251e40>] trap_init+0x135/0x1c8
Jan 21 18:07:58 emil3 klogd:  [<c1061043>] chrdev_open+0x0/0x163
Jan 21 18:07:58 emil3 klogd:  [<c1058594>] __dentry_open+0xc7/0x1ab
Jan 21 18:07:58 emil3 klogd:  [<c1251e40>] trap_init+0x135/0x1c8
Jan 21 18:07:58 emil3 klogd:  [<c10586dc>] nameidata_to_filp+0x19/0x28
Jan 21 18:07:58 emil3 klogd:  [<c1251e40>] trap_init+0x135/0x1c8
Jan 21 18:07:58 emil3 klogd:  [<c1058712>] filp_open+0x27/0x2d
Jan 21 18:07:58 emil3 klogd:  [<c1251e40>] trap_init+0x135/0x1c8
Jan 21 18:07:58 emil3 klogd:  [<c1059499>] do_sys_open+0x33/0xa3
Jan 21 18:07:58 emil3 klogd:  [<c1003c2b>] sysenter_past_esp+0x54/0x79
Jan 21 18:07:58 emil3 klogd: Code: 85 d2 74 1b b8 00 e0 ff ff 21 e0 83 3a 02 8b 40 10 74 11 c1 e0 07 8d 84 10 00 01 00 00 ff 00 8b 41 34 eb 02 31 c0 89 46 10 31 ed <8b> 48 30 85 c9 74 6b 89 f2 89 f8 ff d1 85 c0 89 c5 74 5f 8b 46
Comment 1 Markus Koßmann 2006-01-21 20:06:59 UTC
Created attachment 64402 [details]
Full boot.msg with crash 

After another reboot the system locked hard even before X11 was about to start.  This is the saved boot.msg
Comment 3 Greg Kroah-Hartman 2006-01-24 17:05:00 UTC
Can you attach the output of 'hwinfo'?

It looks like you have some bad driver issues :(

Also, any chance you can test a kernel-of-the-day on this box?
Comment 4 Markus Koßmann 2006-01-24 23:21:06 UTC
Created attachment 64866 [details]
hwinfo 

This is the hwinfo output, run from a SUSE 10.0 x86_64 installation with kernel 2.6.15-20060109195850-smp on that system.
Comment 5 Markus Koßmann 2006-01-24 23:51:28 UTC
Created attachment 64867 [details]
boot.msg of kotd 2.6.16-rc1-git3-20060120194150-smp (X86_64) on SUSE 10.0

For test purposes I have already compiled and run 2.6.16-rc1-git3-20060120194150-smp on my SUSE 10.0 installation on that system. I installed the kernel-source.rpm and configured it using arch/x86_64/defconfig.smp.
That kernel is not exactly comparable with the problematic kernel from 10.1 beta:
It's x86_64 arch, not i586.
It was compiled with gcc-4.0.2 (SuSE 10.0), not with gcc-4.1.
SuSE 10.0 is started, not 10.1.
And the stradis driver, which seems to be problematic ("EIP is at stradis_probe+0x532/0x97f [stradis]" in the full boot.log oops), is not loaded.

Tomorrow^wToday I will try to find time to install the current KOTD on the currently broken 10.1 installation.
Comment 6 Markus Koßmann 2006-01-25 10:11:06 UTC
Created attachment 64884 [details]
Boot.msg 10.1b1 with kotd 2.6.16-rc1-git3-20060124182340-smp (i586)

Seems that the kotd didn't fix the problem yet. It still oopses in stradis_probe.
Comment 7 Markus Koßmann 2006-01-25 18:20:38 UTC
I think I now have an idea of what is happening:
The stradis driver was recently changed to use the 2.6 PCI API (see http://lkml.org/lkml/2005/12/31/144).
Both the Stradis MPEG output card and my Octal/Technotrend DVB-C card make use of a SAA7146 device. Now the stradis driver sees the SAA7146 on the Technotrend card and crashes when probing it.
Comment 8 Markus Koßmann 2006-01-26 10:19:44 UTC
It's definitely the stradis module that causes the problem. I've blacklisted it in /etc/modprobe.conf.local and now the oops doesn't show up any more.
So the question now is why and where this module is configured to load on 10.1b1 without stradis hardware.
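
The workaround above can be checked with a short script. This is a minimal sketch, assuming the usual `blacklist <module>` directive syntax of modprobe configuration files; the function name `is_blacklisted` and the sample text are made up for illustration, and on a real system the text would be read from /etc/modprobe.conf.local.

```python
def is_blacklisted(module: str, config_text: str) -> bool:
    """Return True if config_text contains a 'blacklist <module>' directive."""
    # modprobe treats '-' and '_' in module names as equivalent.
    wanted = module.replace("-", "_")
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if len(fields) == 2 and fields[0] == "blacklist":
            if fields[1].replace("-", "_") == wanted:
                return True
    return False


# Hypothetical file contents, standing in for /etc/modprobe.conf.local.
sample = """\
# local additions
blacklist stradis
"""

print(is_blacklisted("stradis", sample))
```

Note that a blacklist entry only stops automatic loading through the module's aliases; a manual "modprobe stradis" still loads the driver.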

 
Comment 9 Markus Koßmann 2006-01-27 14:26:45 UTC
This problem also bugged me when installing beta2. It caused an oops when restarting the system after installation of CD1. Workaround: blacklisting stradis.ko.
Comment 10 Greg Kroah-Hartman 2006-01-27 22:01:31 UTC
It's getting loaded because you have the hardware for that driver in 
your system.

As for why it is crashing, I do not know.  For beta3 there should
be some more debugging information in the crash to help us track
this down.  Can you reopen this bug then with the new oops message?
Comment 11 Markus Koßmann 2006-01-28 06:15:01 UTC
No, I have no stradis hardware, I have that Technotrend DVB-C device. 

Sure, the Technotrend device also makes use of the SAA7146, but note there are two variants of the 7146 driver: saa7146.ko and saa7146_vv.ko. The dvb_ttpci driver for the Technotrend device makes use of the saa7146_vv.ko module, while the stradis driver seems to use saa7146.ko. The differences between these drivers may be the reason for the crash.

I think the stradis driver lacks proper hardware recognition that can tell the stradis hardware apart from the Technotrend device. But I must admit that such a routine is a special case, which is normally not needed for PCI devices with their unique IDs.
Comment 12 Markus Koßmann 2006-02-03 14:39:28 UTC
As expected, there was no change with beta3.
Unfortunately the first reboot during installation caused two kernel oopses with a hard lockup. No logfiles were written, so I can only give you a picture of the screen with the second oops (no scrollback possible). Then I rebooted into safe mode. This time a log was written.
Comment 13 Markus Koßmann 2006-02-03 14:43:20 UTC
Created attachment 66360 [details]
Screenshoot of oops during normal boot
Comment 14 Markus Koßmann 2006-02-03 14:46:39 UTC
Created attachment 66361 [details]
Boot.msg 10.1b3, booting in safe mode
Comment 15 michel munnix 2006-02-07 20:17:24 UTC
Same problem here : oops with beta3
kernel vmlinuz-2.6.16-rc1-git3-7-default
....
EIP is at saa7146_irq+0x1f/0x515[stradis]
....
and no possibility to do a page up, no logs.
I could write down some lines if that is of interest.
I have no stradis, I have a Hauppauge wintv dvb-s card
01:09.0 Multimedia controller: Philips Semiconductors SAA7146 (rev 01)
Comment 16 Markus Koßmann 2006-02-08 08:08:05 UTC
AFAIK the Hauppauge Nexus-S is identical to the Technotrend DVB-S card, like my Technotrend DVB-C is the same as the  Hauppauge Nexus-CA. Both are "full featured" DVB cards with hardware mpeg2 decoder. 
Comment 17 michel munnix 2006-02-11 14:22:27 UTC
Created attachment 67766 [details]
boot.msg with kotd kernel-debug-2.6.16_rc2_git8-20060210184420.i586

with the latest kernel, the stradis module seems no longer to interfere with my dvb-ttpci card, no kernel crash.
Comment 18 Markus Koßmann 2006-02-11 18:43:08 UTC
Well, I'm not sure that the latest kotd solves the problem. I just tested it: I removed the blacklist entry for stradis in /etc/modprobe.conf.local in my RC3 installation with the original kernel and rebooted two times. No stradis module was loaded. I also run the kotds on a SuSE 10.0 installation, and the problems with the stradis driver never showed up there. So only the conditions in the early installation phase of 10.1 seem to trigger the problem.
Comment 19 michel munnix 2006-02-12 11:25:26 UTC
I modprobed the stradis driver on my 10.0 installation with the online-updated kernel, then rmmod'ed it. The system became inoperative, not immediately but about 30 seconds later, after I had switched to graphical mode and moved the mouse.
To test your objection, I made a fresh install of 10.1 Beta3. On the reboot after CD1 I started my 10.0 installation and installed the kotd in that 10.1 partition, then rebooted and continued the installation with CDs 2-5: no problem, and there "are" traces in boot.log that the stradis module is probed at boot time. vdr seems to work as expected.
I think the problem is at least improving.
Comment 20 Chris L Mason 2006-02-16 16:17:10 UTC
Gerd, can you take a quick look at this?  
Comment 21 Gerd Hoffmann 2006-02-17 09:43:50 UTC
The device probing it does looks a bit scary: it grabs every saa7146 device instead of checking PCI subsystem IDs. That can't work and should be fixed. But I have no idea what the PCI subsystem IDs are. Google doesn't find me anything, and /usr/share/pci.ids doesn't have it either :-/
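
For anyone who wants to report the subsystem IDs of their own SAA7146 card, the following is a rough sketch that reads them from sysfs. It assumes the standard per-device files `vendor`, `device`, `subsystem_vendor` and `subsystem_device` under /sys/bus/pci/devices, and the usual IDs 1131:7146 for the Philips SAA7146; the function names are made up for illustration.

```python
from pathlib import Path

SAA7146_VENDOR = 0x1131  # Philips Semiconductors
SAA7146_DEVICE = 0x7146  # SAA7146


def read_hex(path: Path) -> int:
    """Parse a sysfs ID file such as '0x1131\\n' into an integer."""
    return int(path.read_text().strip(), 16)


def find_saa7146(sysfs_root="/sys/bus/pci/devices"):
    """Return (slot, subsystem_vendor, subsystem_device) for each SAA7146."""
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            ids = (read_hex(dev / "vendor"), read_hex(dev / "device"))
            if ids != (SAA7146_VENDOR, SAA7146_DEVICE):
                continue
            found.append((dev.name,
                          read_hex(dev / "subsystem_vendor"),
                          read_hex(dev / "subsystem_device")))
        except (OSError, ValueError):
            continue  # skip entries that are unreadable or malformed
    return found


if __name__ == "__main__":
    for slot, sub_vendor, sub_device in find_saa7146():
        print(f"{slot}: subsystem {sub_vendor:04x}:{sub_device:04x}")
```

Comparing the output from a real Stradis card with the output from the DVB cards mentioned in this bug would show which subsystem values the driver should be restricted to.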

As far as I know that piece of hardware is quite old, was expensive and probably is very rare. So simply disabling the driver in the kernel config, or maybe better only blacklisting it by default, is the easiest way to deal with it.

Blacklisting modules is done by module-init-tools these days, right?
Marian, it's stradis.ko
Comment 22 Gerd Hoffmann 2006-02-21 12:49:21 UTC
*** Bug 150251 has been marked as a duplicate of this bug. ***
Comment 23 Christian Zoz 2006-02-23 14:57:14 UTC
Added stradis to blacklist.
Comment 24 Terje J. Hanssen 2006-02-23 15:18:37 UTC
As "my" Bug 150251, "Kernel panic - not syncing: Fatal exception in interupt" on 10.1 beta2/beta3 has been marked as a duplicate of this bug, I'll continue here.
To repeat, on my same hardware and disk I've installed and run SuSE Linux 9.0-9.3 Professional and jds2/SLES8 the last years. Currently I'm running SuSE 10.0 and jds3/SLES9 (beta) beside Win2k, in a multiboot configuration without any hardware or driver problems. 
This kernel panic problem started recently when I tried to install 10.1 beta2 and continued with beta3 installed on the same hardware. 

As the bug looks to be related to the mentioned "Stradis" driver in 10.1 beta, I searched the web (Stradis home page http://www.stradis.com/ ), which told me that Stradis has long been a standard-definition MPEG and MPEG-2 video decoder of choice for applications requiring true broadcast quality.

This made me think that the Linux Stradis driver problem in my case may be related to my Pinnacle DV500 DVD video capture and editing card (for Windows). Even though this 32-bit busmastering PCI card (plus a breakout box and IEEE 1394 DV cable) is now 5 years old, it was and still is quite capable of capturing both analog S-Video and DV, besides its MPEG-2 import. Some more description of the DV500 DVD card and software product is available at e.g. these links:
http://www.videoguys.com/dv500DVD.html
http://www.tomshardware.com/2001/08/01/building_a_digital_video_capture_system_/page13.html
http://www.pinnaclesys.com/WebVideo/dv500dvd/English(US)/doc/DV500_Datasheet.pdf

In case it can be of some help for debugging this Stradis driver problem by comparing hwinfo on 10.0 (that works ok) and 10.1b3 (once I was able to do a tty login in failsafe mode I think, else I only get a black login screen), I'll attach both here. 
Beside I also attach 10.1b3 /var/log/boot.msg and /var/log/boot.omsg

Terje J. Hanssen
Comment 25 Terje J. Hanssen 2006-02-23 15:26:42 UTC
I would like my comment above and my attachments below to be evaluated, as I think it looks too easy to just blacklist stradis, which has worked for all previous SuSE distros before 10.1?

Terje J. Hanssen
Comment 26 Terje J. Hanssen 2006-02-23 15:28:44 UTC
Created attachment 69987 [details]
10.0 - hwinfo
Comment 27 Terje J. Hanssen 2006-02-23 15:29:22 UTC
Created attachment 69988 [details]
10.1b3 - hwinfo
Comment 28 Terje J. Hanssen 2006-02-23 15:34:48 UTC
Created attachment 69990 [details]
10.1b3 - /var/log/boot.msg
Comment 29 Terje J. Hanssen 2006-02-23 15:35:32 UTC
Created attachment 69991 [details]
10.1b3 - /var/log/boot.omsg
Comment 30 Gerd Hoffmann 2006-02-24 10:20:23 UTC
The stradis driver just didn't get autoloaded via hotplug on older distro versions (none of the v4l drivers were), that's why the bug didn't trigger. If you manually load the driver using "modprobe stradis" on the 10.0 installation it likely blows up too.

The problem is that the stradis driver actually tries to handle every card with an saa7146 chip on it, no matter whether it actually is a stradis card or something else using the same chip (like your pinnacle card). And because of that behavior we simply can't autoload it, thus the blacklist entry.
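
The effect of matching on the chip alone can be modelled in a few lines. This is a simplified sketch of PCI ID table matching, not the kernel's code: `ANY` stands in for the kernel's PCI_ANY_ID wildcard, class matching is left out, and the subsystem values 0xaaaa/0xbbbb are hypothetical placeholders rather than the IDs of any real card.

```python
from typing import NamedTuple, Optional

ANY = None  # stands in for the kernel's PCI_ANY_ID wildcard


class PciId(NamedTuple):
    vendor: Optional[int]
    device: Optional[int]
    subvendor: Optional[int] = ANY
    subdevice: Optional[int] = ANY


def matches(entry: PciId, card: PciId) -> bool:
    """True if every non-wildcard field of the table entry equals the card's."""
    return all(want is ANY or want == have for want, have in zip(entry, card))


# Table entry that only names the chip: claims every SAA7146 board.
greedy = PciId(0x1131, 0x7146)
# Table entry that also pins the subsystem IDs (hypothetical values).
specific = PciId(0x1131, 0x7146, 0xAAAA, 0x0001)

# A different board built around the same chip (hypothetical values).
other_card = PciId(0x1131, 0x7146, 0xBBBB, 0x0002)

print(matches(greedy, other_card))
print(matches(specific, other_card))
```

With the wildcard entry the driver is offered the foreign card and probes it; with subsystem IDs filled in, the same card would be left to its own driver.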

Owners of a stradis card still can load it manually and it probably even works. We can't verify that due to lack of hardware though.
Comment 31 michel munnix 2006-02-28 17:11:29 UTC
Although this doesn't pose any problem in my installation, I'd like to point out that stradis is not blacklisted as of beta5. I am not reopening the bug for that.
Still getting:
<4>videodev: "SAA7146A" has no release callback. Please fix your driver for proper sysfs support, see http://lwn.net/Articles/36850/
<4>stradis0: config = 00 03 13 c2 26 0f ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
in /var/log/boot.msg