Bugzilla – Bug 145847
Kernel oops on x86_64
Last modified: 2006-01-30 14:15:25 UTC
We can reproduce the oops. But kernel message isn't always dumped, machine resets/freezes instead.

Host: blackbird.suse.de
Architecture: x86_64

When testing my cups-drivers, I'm getting a kernel oops (watchdog). The running process is: gs (ghostscript).

---------------------------- kernel message ----------------------------------
NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in: autofs4 cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table nfsd exportfs lockd nfs_acl sunrpc ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet radeon drm edd button battery ac subdomain sdmatch_pcre loop usbhid shpchp snd_intel8x0 snd_ac97_codec pci_hotplug snd_ac97_bus snd_pcm snd_timer snd floppy soundcore ehci_hcd e100 i2c_ali1563 i2c_ali1535 snd_page_alloc ohci_hcd i2c_ali15x3 generic dm_mod usbcore i2c_core ide_cd cdrom mii parport_pc lp parport ext3 jbd sg fan thermal processor sata_uli libata alim15x3 sd_mod scsi_mod ide_disk ide_core
Pid: 4855, comm: gs Not tainted 2.6.16-rc1-git3-4-default #1
RIP: 0010:[<ffffffff80148ae1>] <ffffffff80148ae1>{watchdog+100}
RSP: 0000:ffffffff8038aec0 EFLAGS: 00000046
RAX: 0000000000000000 RBX: ffff8100067a1f58 RCX: 0000000000000017
RDX: 0000000000000000 RSI: 0000000000000017 RDI: 00000000000000fb
RBP: 000000000014577a R08: 0000000000000011 R09: ffffffff802bdd16
R10: 000000000014577a R11: 0000000000001d10 R12: 0000000000000000
R13: ffff8100067a1f58 R14: ffff8100067a1f58 R15: 000000008647dff8
FS: 00002b7aae7fab20(0000) GS:ffffffff8040e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b7aac6b3780 CR3: 00000000067af000 CR4: 00000000000006e0
Process gs (pid: 4855, threadinfo ffff8100067a0000, task ffff81000d212040)
Stack: ffffffff00000063 0000000000000000 000000000014577a 0000000000000000
       ffffffff8010e2bf ffffffff803de110 000000000000000a 0000000000000246
       ffffffff803240e0 0000000000000000
Call Trace:
  <IRQ> <ffffffff8010e2bf>{timer_interrupt+487}
  <ffffffff80148b50>{handle_IRQ_event+41}
  <ffffffff80148c0b>{__do_IRQ+138}
  <ffffffff8010ca73>{do_IRQ+59}
  <ffffffff8010ac10>{ret_from_intr+0} <EOI>
Code: 74 df 65 48 8b 04 25 00 00 00 00 48 c7 00 00 00 00 00 31 c0
console shuts up ...
<3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():1
Call Trace:
  <NMI> <ffffffff80128488>{profile_task_exit+21}
  <ffffffff8012a073>{do_exit+32}
  <ffffffff8010bc34>{__die+0}
  <ffffffff8011547d>{nmi_watchdog_tick+161}
  <ffffffff8010c73e>{default_do_nmi+115}
  <ffffffff80115853>{do_nmi+61}
  <ffffffff8010b2d3>{nmi+127}
  <ffffffff802bdd16>{_spin_unlock_irq+9}
  <ffffffff80148ae1>{watchdog+100} <EOE>
  <IRQ> <ffffffff8010e2bf>{timer_interrupt+487}
  <ffffffff80148b50>{handle_IRQ_event+41}
  <ffffffff80148c0b>{__do_IRQ+138}
  <ffffffff8010ca73>{do_IRQ+59}
  <ffffffff8010ac10>{ret_from_intr+0} <EOI>
Kernel panic - not syncing: Aiee, killing interrupt handler!

gs: Corrupted page table at address 2add986fd000
PGD 6b1f067 PUD 6b20067 PMD 571b067 PTE 800000ffff019067
Bad pagetable: 000f [1]
CPU 0
Modules linked in: autofs4 cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table nfsd exportfs lockd nfs_acl sunrpc ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet radeon drm edd button battery ac subdomain sdmatch_pcre loop usbhid snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer generic floppy snd soundcore i2c_ali1535 e100 snd_page_alloc ide_cd shpchp pci_hotplug mii cdrom i2c_ali15x3 i2c_ali1563 ohci_hcd i2c_core ehci_hcd usbcore dm_mod parport_pc lp parport ext3 jbd sg fan thermal processor sata_uli libata alim15x3 sd_mod scsi_mod ide_disk ide_core
Pid: 4726, comm: gs Not tainted 2.6.16-rc1-git3-4-default #1
RIP: 0033:[<00002add95d62fec>] [<00002add95d62fec>]
RSP: 002b:00007fff15161c88 EFLAGS: 00010216
RAX: 00002add986fba30 RBX: 00000000000001c7 RCX: 00002add986fd736
RDX: 00002add95da3118 RSI: 0000000000000000 RDI: 00002add986fcffc
RBP: 000000000079e960 R08: 00002add986fba30 R09: 00000000ffffffff
R10: 00000000000000b3 R11: 0000000000001d10 R12: 0000000000000003
R13: 000000000000ffff R14: 00000000000000ff R15: 00000000000000ff
FS: 00002add97d45b20(0000) GS:ffffffff8040e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002add986fd000 CR3: 0000000006b65000 CR4: 00000000000006e0
Process gs (pid: 4726, threadinfo ffff810006b6e000, task ffff81001f22c040)
RIP [<00002add95d62fec>] RSP <00007fff15161c88>

<3>Slab corruption: start=ffff8100172f53b8, len=2048
Redzone: 0x1a4bce/0x1a5bde.
Last user: [<00000000001a5bee>](0x1a5bee)
000: de 4b 1a 00 00 00 00 00 ee 4b 1a 00 00 00 00 00
010: fe 4b 1a 00 00 00 00 00 0e 4c 1a 00 00 00 00 00
020: 1e 4c 1a 00 00 00 00 00 2e 4c 1a 00 00 00 00 00
030: 3e 4c 1a 00 00 00 00 00 4e 4c 1a 00 00 00 00 00
040: 5e 4c 1a 00 00 00 00 00 6e 4c 1a 00 00 00 00 00
050: 7e 4c 1a 00 00 00 00 00 8e 4c 1a 00 00 00 00 00
Prev obj: start=ffffffff172f5248, len=2048
Unable to handle kernel paging request at ffffffff172f5a48 RIP:
<ffffffff801644c8>{print_objinfo+31}
PGD 103027 PUD 0
Oops: 0000 [2]
CPU 0
Modules linked in: autofs4 cpufreq_ondemand cpufreq_userspace cpufreq_powersave
Modules linked in: autofs4 cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table nfsd exportfs lockd nfs_acl sunrpc ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet radeon drm edd button battery ac subdomain sdmatch_pcre loop usbhid snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer generic floppy snd soundcore i2c_ali1535 e100 snd_page_alloc ide_cd shpchp pci_hotplug mii cdrom i2c_ali15x3 i2c_ali1563 ohci_hcd i2c_core ehci_hcd usbcore dm_mod parport_pc lp parport ext3 jbd sg fan thermal processor sata_uli libata alim15x3 sd_mod scsi_mod ide_disk ide_core
Pid: 4247, comm: nscd Not tainted 2.6.16-rc1-git3-4-default #1
RIP: 0010:[<ffffffff801644c8>] <ffffffff801644c8>{print_objinfo+31}
RSP: 0018:ffff810011129a88 EFLAGS: 00010206
RAX: ffffffff172f5a48 RBX: 000000004f354619 RCX: 0000000000000001
RDX: 0000000000000002 RSI: ffffffff172f5240 RDI: ffff81001ffafc80
RBP: ffff81001ffafc80 R08: 0000000000000005 R09: ffff810011129818
R10: 0000000000000004 R11: 0000000000000000 R12: ffffffff172f5240
R13: 0000000000000002 R14: ffff8100015125f8 R15: 0000000000000006
FS: 0000000040200960(0063) GS:ffffffff8040e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff172f5a48 CR3: 000000000d514000 CR4: 00000000000006e0
Process nscd (pid: 4247, threadinfo ffff810011128000, task ffff8100080f8100)
Stack: 000000004f354619 ffffffff172f5240 ffff81001ffafc80 0000000000000800
       ffff8100015125f8 ffffffff801646b9 0000000000000080 ffffffff80263856
       ffff81001ffafc80 ffff8100172f53b0
Call Trace:
  <ffffffff801646b9>{check_poison_obj+318}
  <ffffffff80263856>{__scm_send+195}
  <ffffffff80164742>{cache_alloc_debugcheck_after+45}
  <ffffffff801648c1>{kmem_cache_alloc+129}
  <ffffffff80263856>{__scm_send+195}
  <ffffffff802ba21f>{unix_stream_sendmsg+154}
  <ffffffff80163e46>{poison_obj+38}
  <ffffffff8025bbfe>{sock_sendmsg+240}
  <ffffffff802b9e1a>{unix_stream_recvmsg+1081}
  <ffffffff80174755>{do_lookup+99}
  <ffffffff8013932d>{autoremove_wake_function+0}
  <ffffffff8025c5fb>{sock_aio_read+79}
  <ffffffff80157f85>{find_extend_vma+22}
  <ffffffff8025c124>{sys_sendmsg+527}
  <ffffffff80168052>{do_sync_read+199}
  <ffffffff8013932d>{autoremove_wake_function+0}
  <ffffffff8012434d>{default_wake_function+0}
  <ffffffff80147d30>{audit_syscall_entry+301}
  <ffffffff8010d6e4>{syscall_trace_enter+190}
  <ffffffff8010a7dc>{tracesys+209}
Code: 48 8b 18 48 89 ef e8 58 fd ff ff 48 8b 30 48 c7 c7 88 34 2e
RIP <ffffffff801644c8>{print_objinfo+31} RSP <ffff810011129a88>
CR2: ffffffff172f5a48

<3>Slab corruption: start=ffff810013f1c188, len=2048
Redzone: 0x0/0x64.
Last user: [<0000000000000001>](0x1)
000: 83 00 00 00 00 00 00 00 20 a4 fd ff 00 00 00 00
010: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
020: 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00
030: 00 00 00 00 ff 7f 00 00 01 00 00 00 00 00 00 00
040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
050: 83 00 00 00 00 00 00 00 59 a7 fd ff 00 00 00 00
Prev obj: start=ffffffff13f1be61, len=2048
Unable to handle kernel paging request at ffffffff13f1c661 RIP:
<ffffffff801644c8>{print_objinfo+31}
PGD 103027 PUD 0
Oops: 0000 [3]
CPU 0
Modules linked in: autofs4 cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table nfsd exportfs lockd nfs_acl sunrpc ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet radeon drm edd button battery ac subdomain sdmatch_pcre loop usbhid snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer generic floppy snd soundcore i2c_ali1535 e100 snd_page_alloc ide_cd shpchp pci_hotplug mii cdrom i2c_ali15x3 i2c_ali1563 ohci_hcd i2c_core ehci_hcd usbcore dm_mod parport_pc lp parport ext3 jbd sg fan thermal processor sata_uli libata alim15x3 sd_mod scsi_mod ide_disk ide_core
Pid: 3312, comm: kdm_greet Not tainted 2.6.16-rc1-git3-4-default #1
RIP: 0010:[<ffffffff801644c8>] <ffffffff801644c8>{print_objinfo+31}
RSP: 0018:ffff810017009b18 EFLAGS: 00010206
RAX: ffffffff13f1c661 RBX: 000000004f34df9a RCX: 0000000000000001
RDX: 0000000000000002 RSI: ffffffff13f1be59 RDI: ffff81001ffafc80
RBP: ffff81001ffafc80 R08: 0000000000000005 R09: ffff8100170098a8
R10: 0000000000000004 R11: 0000000000000000 R12: ffffffff13f1be59
R13: 0000000000000002 R14: ffff81000143c298 R15: 0000000000000006
FS: 00002ba4ccedf5f0(0000) GS:ffffffff8040e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff13f1c661 CR3: 0000000017bb8000 CR4: 00000000000006e0
Process kdm_greet (pid: 3312, threadinfo ffff810017008000, task ffff81001eaa40c0)
Stack: 000000004f34df9a ffffffff13f1be59 ffff81001ffafc80 0000000000000800
       ffff81000143c298 ffffffff801646b9 ffff81001a132410 ffffffff80261563
       ffff81001ffafc80 ffff810013f1c180
Call Trace:
  <ffffffff801646b9>{check_poison_obj+318}
  <ffffffff80261563>{__alloc_skb+89}
  <ffffffff80164742>{cache_alloc_debugcheck_after+45}
  <ffffffff80164f42>{__kmalloc+187}
  <ffffffff80261563>{__alloc_skb+89}
  <ffffffff8025df64>{sock_alloc_send_skb+99}
  <ffffffff80123713>{__wake_up+56}
  <ffffffff802ba2dd>{unix_stream_sendmsg+344}
  <ffffffff801090c3>{__switch_to+488}
  <ffffffff8025b727>{do_sock_write+193}
  <ffffffff8025c59d>{sock_aio_write+79}
  <ffffffff80179c10>{do_select+1025}
  <ffffffff80167f4e>{do_sync_write+199}
  <ffffffff8017f670>{file_update_time+48}
  <ffffffff8013932d>{autoremove_wake_function+0}
  <ffffffff88203a49>{:subdomain:subdomain_file_permission+324}
  <ffffffff80168857>{vfs_write+225}
  <ffffffff80168d75>{sys_write+69}
  <ffffffff8010a66a>{system_call+126}
Code: 48 8b 18 48 89 ef e8 58 fd ff ff 48 8b 30 48 c7 c7 88 34 2e
RIP <ffffffff801644c8>{print_objinfo+31} RSP <ffff810017009b18>
CR2: ffffffff13f1c661
This sounds horrible. But these seem to be two different issues. The nmi watchdog thing sounds like something forgot to re-enable IRQs. The other oopses are pretty bad memory corruption. Did you run a memtest on this machine recently?
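
For anyone triaging a similar report, here is a rough sketch of the reasoning above: when one log shows several unrelated kinds of memory corruption in different processes, bad RAM is worth ruling out before hunting for a kernel bug. The message patterns are copied from the log in this report; the category names, the functions triage and looks_like_bad_ram, and the "two kinds, two processes" threshold are assumptions made up for illustration, not an established rule.

```python
import re
from collections import Counter

# Patterns come from the messages in this report; the labels are hypothetical.
SIGNATURES = {
    "nmi_lockup": re.compile(r"NMI Watchdog detected LOCKUP"),
    "bad_pagetable": re.compile(r"Corrupted page table"),
    "slab_corruption": re.compile(r"Slab corruption"),
    "paging_request": re.compile(r"Unable to handle kernel paging request"),
}
# Process name as printed on the "Pid: ..., comm: NAME ..." line of an oops.
COMM = re.compile(r"comm: (\S+)")

# Categories that point at damaged memory rather than a lockup.
CORRUPTION_KINDS = ("bad_pagetable", "slab_corruption", "paging_request")


def triage(log_text):
    """Count each signature and collect the process names seen in the log."""
    counts = Counter()
    for name, pattern in SIGNATURES.items():
        counts[name] = len(pattern.findall(log_text))
    procs = set(COMM.findall(log_text))
    return counts, procs


def looks_like_bad_ram(log_text):
    """Heuristic only: at least two corruption kinds across two processes."""
    counts, procs = triage(log_text)
    kinds_seen = sum(1 for kind in CORRUPTION_KINDS if counts[kind] > 0)
    return kinds_seen >= 2 and len(procs) >= 2


if __name__ == "__main__":
    import sys
    text = sys.stdin.read()
    counts, procs = triage(text)
    print(dict(counts), sorted(procs))
    print("run a memory test first" if looks_like_bad_ram(text) else "no RAM hint")
```

Saved under any file name and fed a captured console log on stdin, it prints the counts and a hint. A positive hint only says that a memory test is cheap compared to debugging; it does not prove the hardware is at fault.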
Wow!!! I'm really impressed by your knowledge/experience. I started memtest and had >1200 failures in test#3. Replaced RAM now.
Pointing at the usual suspects is easy :-) I'm closing this as invalid; please reopen if this nmi_watchdog oops happens again. Thanks!