Bug 145847 - Kernel oops on x86_64
Summary: Kernel oops on x86_64
Status: RESOLVED INVALID
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Kernel
Version: Beta 2
Hardware: 64bit Other
Importance: P5 - None, Critical
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-26 14:44 UTC by Klaus Singvogel
Modified: 2006-01-30 14:15 UTC
CC List: 0 users

See Also:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments

Description Klaus Singvogel 2006-01-26 14:44:21 UTC
We can reproduce the oops, but the kernel message isn't always dumped; sometimes the machine resets or freezes instead.

Host: blackbird.suse.de
Architecture: x86_64

While testing my cups-drivers, I get a kernel oops (NMI watchdog). The running process is gs (Ghostscript).

---------------------------- kernel message ----------------------------------
NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in: autofs4 cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table nfsd exportfs lockd nfs_acl sunrpc ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet radeon drm edd button battery ac subdomain sdmatch_pcre loop usbhid shpchp snd_intel8x0 snd_ac97_codec pci_hotplug snd_ac97_bus snd_pcm snd_timer snd floppy soundcore ehci_hcd e100 i2c_ali1563 i2c_ali1535 snd_page_alloc ohci_hcd i2c_ali15x3 generic dm_mod usbcore i2c_core ide_cd cdrom mii parport_pc lp parport ext3 jbd sg fan thermal processor sata_uli libata alim15x3 sd_mod scsi_mod ide_disk ide_core
Pid: 4855, comm: gs Not tainted 2.6.16-rc1-git3-4-default #1
RIP: 0010:[<ffffffff80148ae1>] <ffffffff80148ae1>{watchdog+100}
RSP: 0000:ffffffff8038aec0  EFLAGS: 00000046
RAX: 0000000000000000 RBX: ffff8100067a1f58 RCX: 0000000000000017
RDX: 0000000000000000 RSI: 0000000000000017 RDI: 00000000000000fb
RBP: 000000000014577a R08: 0000000000000011 R09: ffffffff802bdd16
R10: 000000000014577a R11: 0000000000001d10 R12: 0000000000000000
R13: ffff8100067a1f58 R14: ffff8100067a1f58 R15: 000000008647dff8
FS:  00002b7aae7fab20(0000) GS:ffffffff8040e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b7aac6b3780 CR3: 00000000067af000 CR4: 00000000000006e0
Process gs (pid: 4855, threadinfo ffff8100067a0000, task ffff81000d212040)
Stack: ffffffff00000063 0000000000000000 000000000014577a 0000000000000000
       ffffffff8010e2bf ffffffff803de110 000000000000000a 0000000000000246
       ffffffff803240e0 0000000000000000
Call Trace: <IRQ> <ffffffff8010e2bf>{timer_interrupt+487}
       <ffffffff80148b50>{handle_IRQ_event+41} <ffffffff80148c0b>{__do_IRQ+138}
       <ffffffff8010ca73>{do_IRQ+59} <ffffffff8010ac10>{ret_from_intr+0} <EOI>
 
Code: 74 df 65 48 8b 04 25 00 00 00 00 48 c7 00 00 00 00 00 31 c0
console shuts up ...
 <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():1
 
Call Trace: <NMI> <ffffffff80128488>{profile_task_exit+21}
       <ffffffff8012a073>{do_exit+32} <ffffffff8010bc34>{__die+0}
       <ffffffff8011547d>{nmi_watchdog_tick+161} <ffffffff8010c73e>{default_do_nmi+115}
       <ffffffff80115853>{do_nmi+61} <ffffffff8010b2d3>{nmi+127}
       <ffffffff802bdd16>{_spin_unlock_irq+9} <ffffffff80148ae1>{watchdog+100} <EOE> <IRQ>
       <ffffffff8010e2bf>{timer_interrupt+487} <ffffffff80148b50>{handle_IRQ_event+41}
       <ffffffff80148c0b>{__do_IRQ+138} <ffffffff8010ca73>{do_IRQ+59}
       <ffffffff8010ac10>{ret_from_intr+0} <EOI>
Kernel panic - not syncing: Aiee, killing interrupt handler!
Comment 1 Klaus Singvogel 2006-01-26 14:50:41 UTC
gs: Corrupted page table at address 2add986fd000
PGD 6b1f067 PUD 6b20067 PMD 571b067 PTE 800000ffff019067
Bad pagetable: 000f [1]
CPU 0
Modules linked in: autofs4 cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table nfsd exportfs lockd nfs_acl sunrpc ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet radeon drm edd button battery ac subdomain sdmatch_pcre loop usbhid snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer generic floppy snd soundcore i2c_ali1535 e100 snd_page_alloc ide_cd shpchp pci_hotplug mii cdrom i2c_ali15x3 i2c_ali1563 ohci_hcd i2c_core ehci_hcd usbcore dm_mod parport_pc lp parport ext3 jbd sg fan thermal processor sata_uli libata alim15x3 sd_mod scsi_mod ide_disk ide_core
Pid: 4726, comm: gs Not tainted 2.6.16-rc1-git3-4-default #1
RIP: 0033:[<00002add95d62fec>] [<00002add95d62fec>]
RSP: 002b:00007fff15161c88  EFLAGS: 00010216
RAX: 00002add986fba30 RBX: 00000000000001c7 RCX: 00002add986fd736
RDX: 00002add95da3118 RSI: 0000000000000000 RDI: 00002add986fcffc
RBP: 000000000079e960 R08: 00002add986fba30 R09: 00000000ffffffff
R10: 00000000000000b3 R11: 0000000000001d10 R12: 0000000000000003
R13: 000000000000ffff R14: 00000000000000ff R15: 00000000000000ff
FS:  00002add97d45b20(0000) GS:ffffffff8040e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002add986fd000 CR3: 0000000006b65000 CR4: 00000000000006e0
Process gs (pid: 4726, threadinfo ffff810006b6e000, task ffff81001f22c040)
 
RIP [<00002add95d62fec>] RSP <00007fff15161c88>
 <3>Slab corruption: start=ffff8100172f53b8, len=2048
Redzone: 0x1a4bce/0x1a5bde.
Last user: [<00000000001a5bee>](0x1a5bee)
000: de 4b 1a 00 00 00 00 00 ee 4b 1a 00 00 00 00 00
010: fe 4b 1a 00 00 00 00 00 0e 4c 1a 00 00 00 00 00
020: 1e 4c 1a 00 00 00 00 00 2e 4c 1a 00 00 00 00 00
030: 3e 4c 1a 00 00 00 00 00 4e 4c 1a 00 00 00 00 00
040: 5e 4c 1a 00 00 00 00 00 6e 4c 1a 00 00 00 00 00
050: 7e 4c 1a 00 00 00 00 00 8e 4c 1a 00 00 00 00 00
Prev obj: start=ffffffff172f5248, len=2048
Unable to handle kernel paging request at ffffffff172f5a48 RIP:
<ffffffff801644c8>{print_objinfo+31}
PGD 103027 PUD 0
Oops: 0000 [2]
CPU 0
Modules linked in: autofs4 cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table nfsd exportfs lockd nfs_acl sunrpc ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet radeon drm edd button battery ac subdomain sdmatch_pcre loop usbhid snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer generic floppy snd soundcore i2c_ali1535 e100 snd_page_alloc ide_cd shpchp pci_hotplug mii cdrom i2c_ali15x3 i2c_ali1563 ohci_hcd i2c_core ehci_hcd usbcore dm_mod parport_pc lp parport ext3 jbd sg fan thermal processor sata_uli libata alim15x3 sd_mod scsi_mod ide_disk ide_core
Pid: 4247, comm: nscd Not tainted 2.6.16-rc1-git3-4-default #1
RIP: 0010:[<ffffffff801644c8>] <ffffffff801644c8>{print_objinfo+31}
RSP: 0018:ffff810011129a88  EFLAGS: 00010206
RAX: ffffffff172f5a48 RBX: 000000004f354619 RCX: 0000000000000001
RDX: 0000000000000002 RSI: ffffffff172f5240 RDI: ffff81001ffafc80
RBP: ffff81001ffafc80 R08: 0000000000000005 R09: ffff810011129818
R10: 0000000000000004 R11: 0000000000000000 R12: ffffffff172f5240
R13: 0000000000000002 R14: ffff8100015125f8 R15: 0000000000000006
FS:  0000000040200960(0063) GS:ffffffff8040e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff172f5a48 CR3: 000000000d514000 CR4: 00000000000006e0
Process nscd (pid: 4247, threadinfo ffff810011128000, task ffff8100080f8100)
Stack: 000000004f354619 ffffffff172f5240 ffff81001ffafc80 0000000000000800
       ffff8100015125f8 ffffffff801646b9 0000000000000080 ffffffff80263856
       ffff81001ffafc80 ffff8100172f53b0
Call Trace: <ffffffff801646b9>{check_poison_obj+318}
       <ffffffff80263856>{__scm_send+195} <ffffffff80164742>{cache_alloc_debugcheck_after+45}
       <ffffffff801648c1>{kmem_cache_alloc+129} <ffffffff80263856>{__scm_send+195}
       <ffffffff802ba21f>{unix_stream_sendmsg+154} <ffffffff80163e46>{poison_obj+38}
       <ffffffff8025bbfe>{sock_sendmsg+240} <ffffffff802b9e1a>{unix_stream_recvmsg+1081}
       <ffffffff80174755>{do_lookup+99} <ffffffff8013932d>{autoremove_wake_function+0}
       <ffffffff8025c5fb>{sock_aio_read+79} <ffffffff80157f85>{find_extend_vma+22}
       <ffffffff8025c124>{sys_sendmsg+527} <ffffffff80168052>{do_sync_read+199}
       <ffffffff8013932d>{autoremove_wake_function+0} <ffffffff8012434d>{default_wake_function+0}
       <ffffffff80147d30>{audit_syscall_entry+301} <ffffffff8010d6e4>{syscall_trace_enter+190}
       <ffffffff8010a7dc>{tracesys+209}
 
Code: 48 8b 18 48 89 ef e8 58 fd ff ff 48 8b 30 48 c7 c7 88 34 2e
RIP <ffffffff801644c8>{print_objinfo+31} RSP <ffff810011129a88>
CR2: ffffffff172f5a48
 <3>Slab corruption: start=ffff810013f1c188, len=2048
Redzone: 0x0/0x64.
Last user: [<0000000000000001>](0x1)
000: 83 00 00 00 00 00 00 00 20 a4 fd ff 00 00 00 00
010: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00
030: 00 00 00 00 ff 7f 00 00 01 00 00 00 00 00 00 00
040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
050: 83 00 00 00 00 00 00 00 59 a7 fd ff 00 00 00 00
Prev obj: start=ffffffff13f1be61, len=2048
Unable to handle kernel paging request at ffffffff13f1c661 RIP:
<ffffffff801644c8>{print_objinfo+31}
PGD 103027 PUD 0
Oops: 0000 [3]
CPU 0
Modules linked in: autofs4 cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table nfsd exportfs lockd nfs_acl sunrpc ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet radeon drm edd button battery ac subdomain sdmatch_pcre loop usbhid snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer generic floppy snd soundcore i2c_ali1535 e100 snd_page_alloc ide_cd shpchp pci_hotplug mii cdrom i2c_ali15x3 i2c_ali1563 ohci_hcd i2c_core ehci_hcd usbcore dm_mod parport_pc lp parport ext3 jbd sg fan thermal processor sata_uli libata alim15x3 sd_mod scsi_mod ide_disk ide_core
Pid: 3312, comm: kdm_greet Not tainted 2.6.16-rc1-git3-4-default #1
RIP: 0010:[<ffffffff801644c8>] <ffffffff801644c8>{print_objinfo+31}
RSP: 0018:ffff810017009b18  EFLAGS: 00010206
RAX: ffffffff13f1c661 RBX: 000000004f34df9a RCX: 0000000000000001
RDX: 0000000000000002 RSI: ffffffff13f1be59 RDI: ffff81001ffafc80
RBP: ffff81001ffafc80 R08: 0000000000000005 R09: ffff8100170098a8
R10: 0000000000000004 R11: 0000000000000000 R12: ffffffff13f1be59
R13: 0000000000000002 R14: ffff81000143c298 R15: 0000000000000006
FS:  00002ba4ccedf5f0(0000) GS:ffffffff8040e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff13f1c661 CR3: 0000000017bb8000 CR4: 00000000000006e0
Process kdm_greet (pid: 3312, threadinfo ffff810017008000, task ffff81001eaa40c0)
Stack: 000000004f34df9a ffffffff13f1be59 ffff81001ffafc80 0000000000000800
       ffff81000143c298 ffffffff801646b9 ffff81001a132410 ffffffff80261563
       ffff81001ffafc80 ffff810013f1c180
Call Trace: <ffffffff801646b9>{check_poison_obj+318}
       <ffffffff80261563>{__alloc_skb+89} <ffffffff80164742>{cache_alloc_debugcheck_after+45}
       <ffffffff80164f42>{__kmalloc+187} <ffffffff80261563>{__alloc_skb+89}
       <ffffffff8025df64>{sock_alloc_send_skb+99} <ffffffff80123713>{__wake_up+56}
       <ffffffff802ba2dd>{unix_stream_sendmsg+344} <ffffffff801090c3>{__switch_to+488}
       <ffffffff8025b727>{do_sock_write+193} <ffffffff8025c59d>{sock_aio_write+79}
       <ffffffff80179c10>{do_select+1025} <ffffffff80167f4e>{do_sync_write+199}
       <ffffffff8017f670>{file_update_time+48} <ffffffff8013932d>{autoremove_wake_function+0}
       <ffffffff88203a49>{:subdomain:subdomain_file_permission+324}
       <ffffffff80168857>{vfs_write+225} <ffffffff80168d75>{sys_write+69}
       <ffffffff8010a66a>{system_call+126}

Code: 48 8b 18 48 89 ef e8 58 fd ff ff 48 8b 30 48 c7 c7 88 34 2e
RIP <ffffffff801644c8>{print_objinfo+31} RSP <ffff810017009b18>
CR2: ffffffff13f1c661
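Comment 1 opens with a "Bad pagetable" report whose error code is `000f`. On x86, the low bits of the page-fault error code record whether the page was present (bit 0), whether the access was a write (bit 1), whether it came from user mode (bit 2) and whether a reserved bit was set in a paging-structure entry (bit 3); in the x86_64 fault handler of this era, the reserved-bit case is what gets reported as "Corrupted page table" instead of an ordinary fault. The small decoder below is a sketch with hypothetical function names; it also splits the reported PTE `800000ffff019067` into its flag bits and page-frame address, assuming the standard x86_64 4 KiB PTE layout.

```python
from typing import List, Tuple

# Page-fault error code bits, from bit 0 upward.
# "present" = 1 means the page was present (protection or reserved-bit fault).
PF_ERROR_BITS = ["present", "write", "user", "rsvd", "instr"]

# Selected flag bits of an x86_64 4 KiB page-table entry.
PTE_FLAG_BITS = {0: "present", 1: "rw", 2: "user", 3: "pwt", 4: "pcd",
                 5: "accessed", 6: "dirty", 7: "pat", 8: "global", 63: "nx"}
PTE_FRAME_MASK = 0x000F_FFFF_FFFF_F000  # bits 12..51: physical page frame


def decode_fault(error_code: int) -> List[str]:
    """Return the names of the error-code bits that are set."""
    return [name for bit, name in enumerate(PF_ERROR_BITS)
            if (error_code >> bit) & 1]


def decode_pte(pte: int) -> Tuple[int, List[str]]:
    """Split a PTE into its physical frame address and set flag names."""
    flags = [name for bit, name in sorted(PTE_FLAG_BITS.items())
             if (pte >> bit) & 1]
    return pte & PTE_FRAME_MASK, flags
```

Applied to this report, `decode_fault(0x000F)` gives `["present", "write", "user", "rsvd"]`, and `decode_pte(0x800000FFFF019067)` yields the frame `0xffff019000` (just under 1 TiB) with the present, rw, user, accessed, dirty and nx flags set.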
Comment 2 Olaf Kirch 2006-01-30 12:27:09 UTC
This sounds horrible.

But these seem to be two different issues. The NMI watchdog one sounds
like something forgot to re-enable IRQs.
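A simplified model of that reasoning, assuming the x86_64 NMI watchdog logic of this era: on every NMI tick the handler compares the CPU's count of timer interrupts with the value it saw on the previous tick, and if the count has not moved for roughly five seconds' worth of ticks it reports a lockup. If some code path disables IRQs and never re-enables them, timer interrupts stop being counted while NMIs keep arriving, which is exactly the condition that trips the check. The names `NmiWatchdog`, `simulate`, `nmi_hz` and `timeout_s` are hypothetical; this is a sketch, not the kernel's `nmi_watchdog_tick()`.

```python
from typing import List, Optional


class NmiWatchdog:
    """Toy per-CPU lockup detector driven by periodic NMIs (hypothetical names)."""

    def __init__(self, nmi_hz: int = 1, timeout_s: int = 5) -> None:
        self.limit = nmi_hz * timeout_s           # NMI ticks allowed without progress
        self.last_irq_sum: Optional[int] = None   # timer-IRQ count at the previous NMI
        self.alert_counter = 0                    # consecutive NMIs with no progress

    def nmi_tick(self, timer_irqs: int) -> bool:
        """Called on every NMI; returns True once a lockup is declared."""
        if timer_irqs != self.last_irq_sum:
            # Timer interrupts are still being serviced, so the CPU is alive.
            self.last_irq_sum = timer_irqs
            self.alert_counter = 0
            return False
        # No timer interrupt since the last NMI: IRQs may be stuck disabled.
        self.alert_counter += 1
        return self.alert_counter >= self.limit


def simulate(irqs_enabled: List[bool], nmi_hz: int = 1) -> Optional[int]:
    """Return the NMI tick at which a lockup is reported, or None."""
    wd = NmiWatchdog(nmi_hz=nmi_hz)
    timer_irqs = 0
    for tick, irqs_on in enumerate(irqs_enabled):
        if irqs_on:
            timer_irqs += 1            # the timer interrupt is delivered and counted
        if wd.nmi_tick(timer_irqs):    # the NMI arrives even while IRQs are disabled
            return tick
    return None
```

With IRQs left disabled after tick 2, `simulate([True] * 3 + [False] * 10)` reports the lockup at tick 7, five NMI ticks after progress stopped; a brief IRQs-off window that ends before the limit is never reported.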

The other oopses point to pretty bad memory corruption.
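The "Slab corruption" reports in Comment 1 come from slab debugging: with poisoning enabled, a freed object is filled with a known byte pattern, and when the object is handed out again the allocator checks that the pattern is still intact. Any byte that changed while the object sat unused means something wrote to memory it did not own, whether a kernel bug or, as it turned out here, faulty RAM. The sketch below models that check, assuming the poison values `0x6b` for free bytes and `0xa5` for the final byte used by Linux slab debugging; the function names are hypothetical, and the hex dump only mimics the layout of the `000: ...` lines in the log.

```python
from typing import List

POISON_FREE = 0x6B   # pattern written into every byte of a freed object
POISON_END = 0xA5    # marks the last byte of a poisoned object


def expected_pattern(size: int) -> bytes:
    """Poison pattern for a non-empty object of the given size."""
    return bytes([POISON_FREE]) * (size - 1) + bytes([POISON_END])


def poison(obj: bytearray) -> None:
    """Fill a freed (non-empty) object with the poison pattern in place."""
    obj[:] = expected_pattern(len(obj))


def check_poison(obj: bytearray) -> List[int]:
    """Return offsets whose bytes no longer match the poison pattern."""
    want = expected_pattern(len(obj))
    return [i for i, (got, exp) in enumerate(zip(obj, want)) if got != exp]


def dump_lines(obj: bytearray, bad: List[int], width: int = 16) -> List[str]:
    """Hex-dump every row of `width` bytes that contains a corrupted byte."""
    rows = sorted({off // width for off in bad})
    return [f"{row * width:03x}: "
            + " ".join(f"{b:02x}" for b in obj[row * width:(row + 1) * width])
            for row in rows]
```

For example, poisoning a 64-byte object and then simulating a stray write with `obj[8:11] = b"\xde\x4b\x1a"` makes `check_poison(obj)` return `[8, 9, 10]`, and `dump_lines` prints the single row `000: 6b 6b 6b 6b 6b 6b 6b 6b de 4b 1a 6b 6b 6b 6b 6b`.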

Did you run a memtest on this machine recently?

Comment 3 Klaus Singvogel 2006-01-30 14:05:38 UTC
Wow!!! I'm really impressed by your knowledge/experience.
I ran memtest and got more than 1200 failures in test #3. I've replaced the RAM now.
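memtest finds faults like these by writing known patterns to RAM and reading them back. The sketch below is not memtest's actual code and makes no claim about what its test #3 does: it simulates a small memory array with one data bit stuck at 0 and runs a simplified moving-inversions pass (fill with a pattern; walk upward verifying the pattern and writing its complement; walk downward verifying the complement and restoring the pattern), counting every mismatch as a failure. `FaultyMemory`, `stuck_addr` and `stuck_bit` are hypothetical.

```python
class FaultyMemory:
    """Word-addressed RAM model with one bit stuck at 0 (hypothetical fault)."""

    def __init__(self, size: int, stuck_addr: int, stuck_bit: int) -> None:
        self.size = size
        self.cells = [0] * size
        self.stuck_addr = stuck_addr
        self.stuck_mask = ~(1 << stuck_bit)

    def write(self, addr: int, value: int) -> None:
        if addr == self.stuck_addr:
            value &= self.stuck_mask   # the defective bit can never hold a 1
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        return self.cells[addr]


def moving_inversions(mem: FaultyMemory, pattern: int, width: int = 32) -> int:
    """Run one simplified moving-inversions pass; return the failure count."""
    inverse = ~pattern & ((1 << width) - 1)
    failures = 0
    for addr in range(mem.size):             # 1. fill with the pattern
        mem.write(addr, pattern)
    for addr in range(mem.size):             # 2. ascending: verify, then invert
        if mem.read(addr) != pattern:
            failures += 1
        mem.write(addr, inverse)
    for addr in reversed(range(mem.size)):   # 3. descending: verify, then restore
        if mem.read(addr) != inverse:
            failures += 1
        mem.write(addr, pattern)
    return failures
```

Because the pass checks both a pattern and its complement, a stuck bit is caught whichever value it is stuck at; running many patterns over all of RAM is how a single bad module can accumulate hundreds of failures.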
Comment 4 Olaf Kirch 2006-01-30 14:15:25 UTC
Pointing at the usual suspects is easy :-)

I'm closing this as invalid; please reopen if this nmi_watchdog oops
happens again. Thanks!