Bugzilla – Bug 148176
Intermittent BUG: soft lockup detected on CPU#
Last modified: 2007-06-05 10:34:11 UTC
Detail: The problem occurs during hotplug cpu testing when bringing a previously offlined CPU online. ========================================================================= Log output: Feb 3 13:16:06 linux klogd: Booting processor 3/4 APIC 0x7 Feb 3 13:16:06 linux klogd: Initializing CPU#3 Feb 3 13:16:06 linux klogd: Calibrating delay using timer specific routine.. 5985.52 BogoMIPS (lpj=11971045) Feb 3 13:16:06 linux klogd: CPU: Trace cache: 12K uops, L1 D cache: 16K Feb 3 13:16:06 linux klogd: CPU: L2 cache: 2048K Feb 3 13:16:06 linux klogd: CPU 3 -> Node 0 Feb 3 13:16:06 linux klogd: CPU: Physical Processor ID: 3 Feb 3 13:16:06 linux klogd: Intel(R) Xeon(TM) CPU 3.00GHz stepping 03 Feb 3 13:16:06 linux klogd: APIC error on CPU3: 00(40) Feb 3 13:16:06 linux klogd: migration_cost=8,1792 Feb 3 13:16:06 linux klogd: BUG: soft lockup detected on CPU#3! Feb 3 13:16:06 linux klogd: CPU 3: Feb 3 13:16:06 linux klogd: Modules linked in: joydev xt_pkttype ipt_LOG xt_limit freq_table radeon drm edd button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat ip_nat iptable_filter ip6table_mangle ip_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 apparmor aamatch_pcre loop dm_mod e1000 generic ehci_hcd uhci_hcd e752x_edac edac_mc ide_cd cdrom i8xx_tco shpchp pci_hotplug usbcore hw_random floppy reiserfs fan thermal processor aic7xxx scsi_transport_spi sg megaraid_mbox megaraid_mm piix sd_mod scsi_mod ide_disk ide_core Feb 3 13:16:06 linux klogd: Pid: 0, comm: swapper Tainted: G U 2.6.16-rc1-git3-7-smp #1 Feb 3 13:16:06 linux klogd: RIP: 0010:[<ffffffff80109aff>] <ffffffff80109aff>{mwait_idle+54} Feb 3 13:16:06 linux klogd: RSP: 0018:ffff810037f1fef8 EFLAGS: 00000246 Feb 3 13:16:06 linux klogd: RAX: 0000000000000000 RBX: 00000000000424d4 RCX: 0000000000000000 Feb 3 13:16:06 linux klogd: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff803bd518 Feb 3 13:16:06 linux klogd: RBP: 000011d30dde85cd R08: ffff810037f1fea0 R09: 0000000000000000 Feb 3 13:16:06 linux klogd: R10: 0000000000000296 R11: ffff810037f1fef0 R12: ffff810037fb60c0 Feb 3 13:16:06 linux klogd: R13: ffff810037fb60c0 R14: ffff810037fb62e8 R15: 0000000000000001 Feb 3 13:16:06 linux klogd: FS: 0000000000000000(0000) GS:ffff810037fc1c80(0000) knlGS:0000000000000000 Feb 3 13:16:06 linux klogd: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Feb 3 13:16:06 linux klogd: CR2: 00002b73ec458d70 CR3: 00000000334ec000 CR4: 00000000000006a0 Feb 3 13:16:06 linux klogd: Feb 3 13:16:06 linux klogd: Call Trace: <ffffffff80109aa6>{cpu_idle+151} <ffffffff8011875a>{start_secondary+1144} ========================================================================= To Reproduce: Run the following script on a 4 processor system and check the log periodically for 'BUG'. #!/bin/sh #Test offline/online LIMIT=1000 COUNT=0 while [ $COUNT -lt $LIMIT ] do echo LOOP $COUNT date #offline processors for i in 3 2 1 do echo 0 > /sys/devices/system/cpu/cpu${i}/online date cat /proc/cpuinfo | grep processor sleep 15 done #online processors for i in 1 2 3 do echo 1 > /sys/devices/system/cpu/cpu${i}/online date cat /proc/cpuinfo | grep processor sleep 15 done COUNT=`expr $COUNT + 1` done date echo $0 test complete ========================================================================= Proposed Resolution: Modify watchdog() function in softlockup.c to execute the touch_softlockup_watchdog() function prior to calling msleep_interruptible(). This will update the timestamp before the timer interrupt can call softlockup_tick() which detects a false soft lockup. --- softlockup.c.orig 2006-02-03 13:16:52.000000000 -0700 +++ softlockup.c 2006-02-03 13:17:12.000000000 -0700 @@ -85,8 +85,8 @@ * in softlockup_tick(): */ while (!kthread_should_stop()) { - msleep_interruptible(1000); touch_softlockup_watchdog(); + msleep_interruptible(1000); } __set_current_state(TASK_RUNNING); =========================================================================
Well, if it helps ... as the softlockup detection will be switched off for GA this is hardly a crucial fix. Nevertheless, if it's valid we should include it. I'll ask for a review.
Change looks ok to me.
I enabled the patch now. There's also another softlockup change I wanted to do - disable it during bootup. Will add that to the patch too.
Closed.