Bug 1151395

Summary: Processes stuck on NVMe "Internal error: : 96000210", in nvme_timeout+0x44/0x360
Product: [openSUSE] openSUSE Distribution
Reporter: Oliver Kurz <okurz>
Component: Kernel
Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED DUPLICATE
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: hare, mbenes, okurz
Version: Leap 15.1
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Oliver Kurz 2019-09-19 20:07:06 UTC
## Observation

Observed on the machine openqaworker-arm-1.suse.de, openSUSE Leap 15.1, aarch64:

```
[ 1940.508202] Synchronous External Abort: synchronous external abort (0x96000210) at 0xffff0000187da01c
[ 1940.520263] Internal error: : 96000210 [#2] SMP
[ 1940.527514] Modules linked in: nf_log_ipv6 ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment xt_TCPMSS nf_log_ipv4 nf_log_common xt_LOG xt_limit iptable_nat nfsv3 nfs_acl nfnetlink_cthelper nfnetlink rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache af_packet tun openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_physdev br_netfilter bridge stp llc xt_pkttype xt_tcpudp iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack libcrc32c ip6table_filter ip6_tables x_tables nls_iso8859_1 nls_cp437 vfat fat nicvf cavium_ptp nicpf thunder_bgx cavium_rng_vf thunder_xcv joydev mdio_thunder mdio_cavium thunderx_edac cavium_rng aes_ce_blk uio_pdrv_genirq
[ 1940.619902]  uio crypto_simd cryptd ipmi_ssif ipmi_devintf ipmi_msghandler aes_ce_cipher crc32_ce crct10dif_ce ghash_ce aes_arm64 sha2_ce sha256_arm64 sha1_ce btrfs xor zstd_decompress zstd_compress xxhash zlib_deflate raid6_pq hid_generic usbhid ast i2c_algo_bit xhci_pci drm_kms_helper syscopyarea sysfillrect xhci_hcd sysimgblt fb_sys_fops ttm drm nvme nvme_core drm_panel_orientation_quirks gpio_keys usbcore i2c_thunderx i2c_smbus thunderx_mmc mmc_core dm_mirror dm_region_hash dm_log sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
[ 1940.685890] CPU: 21 PID: 7693 Comm: kworker/21:2H Tainted: G      D          4.12.14-lp151.27-default #1 openSUSE Leap 15.1
[ 1940.700651] Hardware name: GIGABYTE R120-T32/MT30-GS1, BIOS T19 09/29/2016
[ 1940.711184] Workqueue: kblockd blk_mq_timeout_work
[ 1940.719632] task: ffff803e1b10a000 task.stack: ffff803e770fc000
[ 1940.729237] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 1940.737746] pc : nvme_timeout+0x44/0x360 [nvme]
[ 1940.746003] lr : blk_mq_check_expired+0x140/0x178
[ 1940.754435] sp : ffff803e770ffc20
[ 1940.761471] x29: ffff803e770ffc20 x28: ffff803ee6f96c00 
[ 1940.770521] x27: 0000000000000000 x26: 0000000000000000 
[ 1940.779576] x25: ffff803ee6e9a910 x24: ffff803de8528000 
[ 1940.788643] x23: ffff803de8528138 x22: ffff803de4e78000 
[ 1940.797666] x21: ffff000008f59710 x20: ffff803ee6da8000 
[ 1940.806650] x19: ffff803de8528000 x18: 0000aaaaedcddda0 
[ 1940.815634] x17: 0000ffffabcf4830 x16: ffff000008163338 
[ 1940.824622] x15: 000081f290000000 x14: 0039387000000000 
[ 1940.833612] x13: 00000003e8000000 x12: 0000000000000018 
[ 1940.842583] x11: 00000000000ec744 x10: 0000000000001950 
[ 1940.851543] x9 : ffff803e770ffd80 x8 : ffff803e1b10b9b0 
[ 1940.860508] x7 : 000000000005d1d0 x6 : ffff803ee7204080 
[ 1940.869470] x5 : 0000000000000000 x4 : ffff000008f57000 
[ 1940.878429] x3 : 0000000000000001 x2 : ffff0000187da01c 
[ 1940.887380] x1 : 0000000000000000 x0 : 0000000000000000 
[ 1940.896305] Process kworker/21:2H (pid: 7693, stack limit = 0xffff803e770fc000)
[ 1940.907256] Call trace:
[ 1940.913258]  nvme_timeout+0x44/0x360 [nvme]
[ 1940.920927]  blk_mq_check_expired+0x140/0x178
[ 1940.928697]  bt_for_each+0x118/0x140
[ 1940.935595]  blk_mq_queue_tag_busy_iter+0xa8/0x140
[ 1940.943643]  blk_mq_timeout_work+0x58/0x118
[ 1940.951003]  process_one_work+0x1e4/0x430
[ 1940.958104]  worker_thread+0x50/0x478
[ 1940.964774]  kthread+0x134/0x138
[ 1940.970918]  ret_from_fork+0x10/0x20
[ 1940.977346] Code: d2800000 f94006d4 f9409282 91007042 (b9400053) 
[ 1940.986318] ---[ end trace b68294eca1d85d4b ]---
```
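
For readers unfamiliar with the error code, the value 0x96000210 printed in the first two log lines is the AArch64 exception syndrome (ESR). The following is a minimal sketch of how it can be split into its top-level fields, assuming the standard Arm field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0; for data aborts, DFSC in ISS bits 5:0, WnR in bit 6, EA in bit 9). The helper name `decode_esr` and the lookup tables are illustrative and deliberately incomplete.

```python
# Sketch only: decode the top-level fields of an AArch64 ESR_ELx value.
# The lookup tables cover just the codes relevant to this report.

EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

DFSC_NAMES = {
    0x10: "Synchronous External abort, not on translation table walk",
}


def decode_esr(esr):
    """Return a dict with the EC, IL and ISS fields of an ESR value."""
    ec = (esr >> 26) & 0x3F        # exception class, bits 31:26
    il = (esr >> 25) & 0x1         # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF          # instruction specific syndrome, bits 24:0
    fields = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "unknown"),
        "il": il,
        "iss": iss,
    }
    if ec in (0x24, 0x25):         # data abort: decode the ISS further
        dfsc = iss & 0x3F
        fields["dfsc"] = dfsc
        fields["dfsc_name"] = DFSC_NAMES.get(dfsc, "unknown")
        fields["wnr"] = (iss >> 6) & 0x1   # 0 = read, 1 = write
        fields["ea"] = (iss >> 9) & 0x1    # external abort type bit
    return fields


if __name__ == "__main__":
    for key, value in decode_esr(0x96000210).items():
        print(key, hex(value) if isinstance(value, int) else value)
```

For 0x96000210 this yields EC 0x25 and DFSC 0x10 with WnR = 0, that is, a synchronous external abort on a read, which agrees with the kernel's own message and with the fault address 0xffff0000187da01c being the value held in x2.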

This seems to leave some processes stuck, apparently lspci invocations started by salt-minion:

```
root      9223  0.1  0.0 996644 61488 ?        Dl   19:23   0:04 /usr/bin/python3 /usr/bin/salt-minion
root      9294  0.0  0.0   3476   912 ?        R    19:23   0:00 /sbin/lspci -vmm
root     10024  0.0  0.0  39592 28272 ?        Ss   19:37   0:00 /usr/bin/python3 /usr/bin/salt-minion
root     10027  1.0  0.0 905292 57388 ?        Sl   19:37   0:18  \_ /usr/bin/python3 /usr/bin/salt-minion
root     10030  0.0  0.0 123196 28440 ?        S    19:37   0:00      \_ /usr/bin/python3 /usr/bin/salt-minion
root     10064  0.0  0.0   3476   932 ?        R    19:37   0:00      \_ /sbin/lspci -vmm
```
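
To make the process states in such a listing easier to check, here is a small sketch that filters `ps aux`-style lines by their STAT column. It assumes the column order USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND, as in the listing above; the function names and the `SAMPLE` excerpt are illustrative only.

```python
# Sketch only: select processes by state from `ps aux`-style output.
# SAMPLE is a three-line excerpt copied from the listing in this report.

SAMPLE = r"""
root      9223  0.1  0.0 996644 61488 ?        Dl   19:23   0:04 /usr/bin/python3 /usr/bin/salt-minion
root      9294  0.0  0.0   3476   912 ?        R    19:23   0:00 /sbin/lspci -vmm
root     10064  0.0  0.0   3476   932 ?        R    19:37   0:00      \_ /sbin/lspci -vmm
"""


def parse_ps_line(line):
    """Parse one line into pid, stat and command; return None if it does not fit."""
    fields = line.split(None, 10)  # the 11th field is the full command line
    if len(fields) < 11 or not fields[1].isdigit():
        return None
    command = fields[10].strip()
    if command.startswith("\\_ "):  # drop the forest-view tree marker
        command = command[3:]
    return {"pid": int(fields[1]), "stat": fields[7], "command": command}


def processes_in_state(text, state):
    """Return parsed rows whose STAT column starts with the given state letter."""
    rows = (parse_ps_line(line) for line in text.splitlines())
    return [row for row in rows if row and row["stat"].startswith(state)]


if __name__ == "__main__":
    for row in processes_in_state(SAMPLE, "D"):
        print(row["pid"], row["stat"], row["command"])
```

On the excerpt, the only process in uninterruptible sleep (`D`) is salt-minion with PID 9223, while both lspci processes are reported as `R`.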
Comment 1 Hannes Reinecke 2019-11-25 14:07:13 UTC
Kernel version?
Comment 2 Oliver Kurz 2019-11-25 17:17:58 UTC
The kernel version is visible in the log output in the description: "4.12.14-lp151.27-default".
Comment 3 Miroslav Beneš 2020-04-07 12:29:48 UTC
Looks like a duplicate to me (well, this one is older, but Daniel is already working on the other one).

*** This bug has been marked as a duplicate of bug 1156813 ***