Bug 1173819 - 'BUG: workqueue lockup' on ThunderX2 machines with kernel 4.12.14 in OBS
Status: RESOLVED WONTFIX
Classification: openSUSE
Product: openSUSE Distribution
Component: Kernel
Version: Leap 15.1
Hardware: aarch64 Other
Priority: P5 - None
Severity: Normal
Assigned To: openSUSE Kernel Bugs E-mail List
Depends on:
Blocks:
 
Reported: 2020-07-07 12:39 UTC by Guillaume GARDET
Modified: 2021-10-22 08:48 UTC
CC List: 8 users

Attachments:
- Log from yesterday (11.38 KB, text/plain), 2020-07-07 12:58 UTC, Ismail Dönmez
- dmesg of obs-arm-8 (1.54 MB, text/plain), 2020-07-16 07:37 UTC, Adrian Schröter
- dmesg of obs-arm-9 (1.92 MB, text/plain), 2020-07-16 07:37 UTC, Adrian Schröter

Description Guillaume GARDET 2020-07-07 12:39:41 UTC
obs-arm-7, -8 and -9 are often down in OBS these days due to kernel issues.
One of these issues is a workqueue lockup:

BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=0 stuck for 92265s!
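
This message comes from the kernel's workqueue watchdog (CONFIG_WQ_WATCHDOG): a periodic timer walks all worker pools, and if a pool has work queued but its progress timestamp has not advanced within the watchdog threshold, the pool is reported as stuck. A loose sketch of that detection logic, modeled on kernel/workqueue.c (names are approximations, not the exact 4.12 source):

static void wq_watchdog_timer_fn(unsigned long data)
{
        unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
        struct worker_pool *pool;
        int pi;

        for_each_pool(pool, pi) {
                /* watchdog_ts is touched whenever a worker in the pool
                 * makes progress */
                unsigned long ts = READ_ONCE(pool->watchdog_ts);

                if (list_empty(&pool->worklist))
                        continue;  /* nothing queued, pool cannot be stuck */

                if (time_after(jiffies, ts + thresh))
                        /* produces the "BUG: workqueue lockup" line above */
                        pr_emerg("BUG: workqueue lockup - pool cpus=%*pbl stuck for %lus!\n",
                                 cpumask_pr_args(pool->attrs->cpumask),
                                 (jiffies - ts) / HZ);
        }

        mod_timer(&wq_watchdog_timer, jiffies + thresh);
}

The threshold defaults to 30 seconds and can be tuned with the workqueue.watchdog_thresh boot parameter, so "stuck for 92265s" means the pool made no forward progress for more than a day.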
Comment 1 Ismail Dönmez 2020-07-07 12:58:07 UTC
Log from yesterday:

2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.526080] INFO: rcu_sched self-detected stall on CPU
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.531221] #011205-...: (5976 ticks this GP) idle=902/140000000000001/0 softirq=33742/33742 fqs=2851
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.540252] #011 (t=6001 jiffies g=28000 c=27999 q=176626)
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.545486] Task dump for CPU 205:
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.546079] INFO: rcu_sched detected stalls on CPUs/tasks:
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.548877] qemu-system-aar R
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.554351]   running task        0 19576   6280 0x00000006
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.562872] Call trace:
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.565315]  dump_backtrace+0x0/0x188
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.568968]  show_stack+0x24/0x30
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.572277]  sched_show_task+0xec/0x138
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.576105]  dump_cpu_task+0x48/0x58
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.579680]  rcu_dump_cpu_stacks+0xa0/0xe8
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.583770]  rcu_check_callbacks+0x6e4/0x938
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.588037]  update_process_times+0x34/0x60
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.592216]  tick_sched_handle.isra.6+0x38/0x70
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.596735]  tick_sched_timer+0x4c/0x98
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.600561]  __hrtimer_run_queues+0xc4/0x278
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.604819]  hrtimer_interrupt+0xa8/0x228
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.608828]  arch_timer_handler_phys+0x38/0x58
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.613269]  handle_percpu_devid_irq+0x90/0x248
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.617789]  generic_handle_irq+0x34/0x50
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.621786]  __handle_domain_irq+0x68/0xc0
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.625873]  gic_handle_irq+0x80/0x18c
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.629612]  el1_irq+0xb0/0x140
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.632744]  osq_lock+0x108/0x1b8
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.636048]  rwsem_optimistic_spin+0x70/0x130
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.640398]  rwsem_down_write_failed+0x48/0x200
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.644915]  down_write+0x58/0x70
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.648228]  ext4_file_write_iter+0x74/0x388
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.652492]  __vfs_write+0xd0/0x148
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.655970]  vfs_write+0xac/0x1b8
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.659274]  SyS_pwrite64+0x8c/0xa8
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.662752]  el0_svc_naked+0x44/0x48
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.666328] #011205-...: (5976 ticks this GP) idle=902/140000000000001/0 softirq=33742/33742 fqs=2852
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.675368] #011(detected by 17, t=6014 jiffies, g=28000, c=27999, q=177177)
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.675390] Task dump for CPU 205:
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.685554] qemu-system-aar R  running task        0 19576   6280 0x00000006
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.692597] Call trace:
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.692608]  __switch_to+0xe4/0x150
2020-07-06T10:48:15+00:00 obs-arm-9 kernel: [ 1983.692616]  0xffff89bcbd10
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.166037] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=0 stuck for 47s!
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.166090] BUG: workqueue lockup - pool cpus=205 node=1 flags=0x0 nice=0 stuck for 59s!
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.182238] Showing busy workqueues and worker pools:
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.182249] workqueue events: flags=0x0
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.191156]   pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.191167]     pending: cache_reap
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.201884]   pwq 172: cpus=86 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.208960]     in-flight: 1375:wait_rcu_exp_gp
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.213560]   pwq 118: cpus=59 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.220609]     pending: cache_reap
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.224370] workqueue mm_percpu_wq: flags=0x8
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.228730]   pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.235865]     pending: vmstat_update
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.250910] workqueue kblockd: flags=0x18
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.254959]   pwq 119: cpus=59 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.262179]     pending: blk_mq_run_work_fn
2020-07-06T10:48:16+00:00 obs-arm-9 kernel: [ 1984.267886] pool 172: cpus=86 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 22586 528
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.645557] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 205-... } 6205 jiffies s: 6281 root: 0x1000/.
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.656101] blocking rcu_node structures: l=1:192-207:0x2000/.
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.661946] Task dump for CPU 205:
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.665342] qemu-system-aar R  running task        0 19576   6280 0x00000006
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.672427] Call trace:
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.674880]  __switch_to+0xe4/0x150
2020-07-06T10:48:20+00:00 obs-arm-9 kernel: [ 1988.674884]  0xffff89bcbd10
2020-07-06T10:48:26+00:00 obs-arm-9 systemd-udevd[22482]: seq 12812 '/devices/virtual/block/loop2' is taking a long time
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.882768] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=0 stuck for 78s!
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.890815] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=-20 stuck for 59s!
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.899089] BUG: workqueue lockup - pool cpus=205 node=1 flags=0x0 nice=0 stuck for 90s!
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.907227] Showing busy workqueues and worker pools:
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.907238] workqueue events: flags=0x0
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.916130]   pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.916143]     pending: cache_reap
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.916185]   pwq 172: cpus=86 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.926769]     in-flight: 1375:wait_rcu_exp_gp
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.938369]   pwq 118: cpus=59 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.945423]     pending: cache_reap
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.945624] workqueue events_freezable_power_: flags=0x84
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.954332]   pwq 356: cpus=178 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.961469]     in-flight: 1612:disk_events_workfn
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.966298]   pwq 322: cpus=161 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.973443]     in-flight: 1595:disk_events_workfn
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.978308] workqueue mm_percpu_wq: flags=0x8
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.982663]   pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.989798]     pending: vmstat_update
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2014.993654]   pwq 118: cpus=59 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.000700]     pending: vmstat_update
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.004854] workqueue kblockd: flags=0x18
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.008903]   pwq 119: cpus=59 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.016124]     pending: blk_mq_run_work_fn
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.021841] pool 172: cpus=86 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 22586 528
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.029928] pool 322: cpus=161 node=1 flags=0x0 nice=0 hung=0s workers=3 idle: 199260 123733
2020-07-06T10:48:47+00:00 obs-arm-9 kernel: [ 2015.038410] pool 356: cpus=178 node=1 flags=0x0 nice=0 hung=0s workers=3 idle: 199074 122038
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.609482] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=0 stuck for 108s!
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.617578] BUG: workqueue lockup - pool cpus=59 node=0 flags=0x0 nice=-20 stuck for 90s!
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.625795] BUG: workqueue lockup - pool cpus=205 node=1 flags=0x0 nice=0 stuck for 121s!
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.633988] Showing busy workqueues and worker pools:
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.633994] workqueue events: flags=0x0
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.634001]   pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.634006]     pending: cache_reap
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.634048]   pwq 172: cpus=86 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.653482]     in-flight: 1375:wait_rcu_exp_gp
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.653503]   pwq 118: cpus=59 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.665051]     pending: cache_reap
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.665276] workqueue mm_percpu_wq: flags=0x8
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.679941]   pwq 410: cpus=205 node=1 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.687070]     pending: vmstat_update
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.687133]   pwq 118: cpus=59 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.697891]     pending: vmstat_update
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.698298] workqueue kblockd: flags=0x18
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.705690]   pwq 119: cpus=59 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.705695]     pending: blk_mq_run_work_fn
2020-07-06T10:49:17+00:00 obs-arm-9 kernel: [ 2045.707145] pool 172: cpus=86 node=0 flags=0x0 nice=0 hung=1s workers=3 idle: 22586 528
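Reading the trace bottom-up: the qemu-system-aar(ch64) task on CPU 205 entered pwrite64 -> ext4_file_write_iter -> down_write and then sat in rwsem_optimistic_spin()/osq_lock(), busy-waiting for the inode's write lock. Optimistic rwsem spinning runs with preemption disabled, so while it lasts nothing else gets scheduled on that CPU; the per-CPU work items queued there (cache_reap, vmstat_update) cannot run, which is exactly what the workqueue-lockup and RCU-stall reports then complain about. The bottom of the trace corresponds to this pattern in the ext4 write path (a simplified sketch of fs/ext4/file.c; details differ between kernel versions):

static ssize_t ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
        struct inode *inode = file_inode(iocb->ki_filp);
        ssize_t ret;

        /* inode_lock() is down_write(&inode->i_rwsem); when contended,
         * this is where the rwsem_down_write_failed() ->
         * rwsem_optimistic_spin() -> osq_lock() chain in the trace
         * comes from */
        inode_lock(inode);
        ret = __generic_file_write_iter(iocb, from);
        inode_unlock(inode);

        if (ret > 0)
                ret = generic_write_sync(iocb, ret);
        return ret;
}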
Comment 2 Ismail Dönmez 2020-07-07 12:58:47 UTC
Created attachment 839442 [details]
Log from yesterday
Comment 3 Adrian Schröter 2020-07-16 07:36:30 UTC
Both systems hung again in a similar way. The kernel reports internal errors fairly soon while handling ext4 jobs. I will attach the full dmesg files.
Comment 4 Adrian Schröter 2020-07-16 07:37:09 UTC
Created attachment 839757 [details]
dmesg of obs-arm-8
Comment 5 Adrian Schröter 2020-07-16 07:37:46 UTC
Created attachment 839758 [details]
dmesg of obs-arm-9
Comment 6 Miroslav Beneš 2020-11-13 15:03:58 UTC
Forgotten...

There are earlier bugs in both the obs-arm-8 and -9 logs, somewhere in the ext4 code; the workqueue lockups could be just a consequence, as Adrian pointed out.

Anyway, this was reported against 15.1. Is it still happening on 15.2?

CCing Jack so that he is aware (it may have been reported against SLES as well), but I don't think this is worth pursuing.
Comment 7 Jan Kara 2020-11-18 10:11:27 UTC
Yeah, the lockups are likely secondary. It is unclear why we are crashing at the given addresses; at first glance they appear valid. Also, the inode we are trying to lock is safely pinned at this point by the open file... It may be some ARM-specific issue. Anyway, if it still happens, we'd need to take a closer look.
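
The "safely pinned" part refers to the VFS reference chain: the open file taken by the pwrite64 syscall holds a reference on its dentry via f_path, and the dentry holds the inode, so the i_rwsem being contended in the trace cannot go away mid-write. Roughly (simplified from include/linux/fs.h, a sketch rather than the exact definitions):

struct file {
        struct path     f_path;         /* holds a reference on the dentry */
        struct inode    *f_inode;       /* cached f_path.dentry->d_inode */
        /* ... */
};

static inline struct inode *file_inode(const struct file *f)
{
        return f->f_inode;      /* valid for as long as the file stays open */
}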
Comment 8 Guillaume GARDET 2021-10-22 08:48:59 UTC
Leap 15.1 is EOL, and I think we have not encountered this problem for a while now (likely because the hosts have been upgraded).