Bug 1215921

Summary: NVIDIA - Grace: Backport iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_range
Product: [openSUSE] PUBLIC SUSE Linux Enterprise Server 15 SP5 Reporter: Matt Ochs <mochs>
Component: Kernel Assignee: Ivan Ivanov <ivan.ivanov>
Status: VERIFIED FIXED QA Contact:
Severity: Normal    
Priority: P5 - None CC: ddavis, ivan.ivanov, stanimir.varbanov
Version: unspecified   
Target Milestone: ---   
Hardware: aarch64   
OS: SLES 15   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Matt Ochs 2023-10-04 02:17:35 UTC
When running an SVA case, the following soft lockup is triggered:
--------------------------------------------------------------------
watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
pstate: 83400009 (Nzcv daif +PAN UAO +TCO +DIT -SSBS BTYPE=-)
pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
sp : ffff8000d83ef290
x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000
x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000
x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0
x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0
x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a
x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001
Call trace:
 arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
 __arm_smmu_tlb_inv_range+0x118/0x254
 arm_smmu_tlb_inv_range_asid+0x6c/0x130
 arm_smmu_mm_invalidate_range+0xa0/0xa4
 __mmu_notifier_invalidate_range_end+0x88/0x120
 unmap_vmas+0x194/0x1e0
 unmap_region+0xb4/0x144
 do_mas_align_munmap+0x290/0x490
 do_mas_munmap+0xbc/0x124
 __vm_munmap+0xa8/0x19c
 __arm64_sys_munmap+0x28/0x50
 invoke_syscall+0x78/0x11c
 el0_svc_common.constprop.0+0x58/0x1c0
 do_el0_svc+0x34/0x60
 el0_svc+0x2c/0xd4
 el0t_64_sync_handler+0x114/0x140
 el0t_64_sync+0x1a4/0x1a8


This is resolved by the following v6.6-rc5 commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d5afb4b47e13161b3f33904d45110f9e6463bad6


Please backport to SLES 15 SP5.
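
For readers unfamiliar with the fix, here is a rough Python model of the idea as I understand the upstream commit. Once an invalidation request is large enough, the driver stops queuing one TLBI command per page and issues a single per-ASID invalidation instead. This is not the kernel code. The function names and the WHOLE_ADDRESS_SPACE sentinel are hypothetical, and the threshold formula (1 << (PAGE_SHIFT - 3), mirroring the CPU-side MAX_TLBI_OPS) is my reading of the commit and should be checked against it.

```python
# Illustrative model only; NOT the arm-smmu-v3 driver code.
# All function names here are hypothetical. The threshold formula is an
# assumption based on my reading of the upstream commit.

# Hypothetical sentinel meaning "invalidate the whole address space".
WHOLE_ADDRESS_SPACE = 2**64 - 1


def cmdq_max_tlbi_ops(page_shift: int) -> int:
    """Assumed per-request cap on per-page TLBI commands."""
    return 1 << (page_shift - 3)


def per_page_commands(size: int, page_shift: int) -> int:
    """Number of per-page TLBI commands needed to cover `size` bytes."""
    page_size = 1 << page_shift
    return (size + page_size - 1) // page_size


def use_full_asid_invalidation(size: int, page_shift: int,
                               has_range_inv: bool) -> bool:
    """Decide whether to replace per-page TLBIs with one per-ASID TLBI."""
    page_size = 1 << page_shift
    if not has_range_inv:
        # Without range invalidation, every page costs one command, so a
        # huge munmap() can keep the CPU busy in the command queue.
        return size >= cmdq_max_tlbi_ops(page_shift) * page_size
    # With range invalidation, only the "everything" request falls back.
    return size == WHOLE_ADDRESS_SPACE


if __name__ == "__main__":
    for shift in (12, 16):  # 4 KiB and 64 KiB pages
        limit = cmdq_max_tlbi_ops(shift) * (1 << shift)
        print(f"PAGE_SHIFT={shift}: threshold = {limit // (1 << 20)} MiB")
```

Under these assumptions the threshold is 2 MiB with 4 KiB pages and 512 MiB with 64 KiB pages. Above it, the cost of the invalidation no longer grows with the size of the unmapped region.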
Comment 1 Ivan Ivanov 2023-10-17 07:26:15 UTC
Merged in SP5 with commit 37a5554f2b6bf6c7bb8b9d80b1cde10028f6ac36
Comment 8 Matt Ochs 2023-11-01 16:21:48 UTC
Will this patch also be present in SLES 15 SP6? Currently it looks to be absent from that branch.
Comment 9 Ivan Ivanov 2023-11-02 08:12:44 UTC
Sure, I will back port it to SP6 too. Thank you for noticing!
Comment 10 Maintenance Automation 2023-11-02 16:30:15 UTC
SUSE-SU-2023:4343-1: An update that solves nine vulnerabilities and has five security fixes can now be installed.

Category: security (important)
Bug References: 1211162, 1211307, 1213772, 1214754, 1214874, 1215545, 1215921, 1215955, 1216062, 1216202, 1216322, 1216324, 1216333, 1216512
CVE References: CVE-2023-2163, CVE-2023-2860, CVE-2023-31085, CVE-2023-34324, CVE-2023-39189, CVE-2023-39191, CVE-2023-39193, CVE-2023-45862, CVE-2023-5178
Sources used:
SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5-RT_Update_7-1-150500.11.3.1
SUSE Real Time Module 15-SP5 (src): kernel-source-rt-5.14.21-150500.13.24.1, kernel-syms-rt-5.14.21-150500.13.24.1
openSUSE Leap 15.5 (src): kernel-source-rt-5.14.21-150500.13.24.1, kernel-syms-rt-5.14.21-150500.13.24.1, kernel-livepatch-SLE15-SP5-RT_Update_7-1-150500.11.3.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 11 Maintenance Automation 2023-11-06 16:30:27 UTC
SUSE-SU-2023:4375-1: An update that solves nine vulnerabilities and has 17 security fixes can now be installed.

Category: security (important)
Bug References: 1208788, 1211162, 1211307, 1212423, 1212649, 1213705, 1213772, 1214754, 1214874, 1215095, 1215104, 1215523, 1215545, 1215921, 1215955, 1215986, 1216062, 1216202, 1216322, 1216323, 1216324, 1216333, 1216345, 1216512, 1216621, 802154
CVE References: CVE-2023-2163, CVE-2023-31085, CVE-2023-34324, CVE-2023-3777, CVE-2023-39189, CVE-2023-39191, CVE-2023-39193, CVE-2023-46813, CVE-2023-5178
Sources used:
SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5_Update_7-1-150500.11.5.1
openSUSE Leap 15.5 (src): kernel-livepatch-SLE15-SP5_Update_7-1-150500.11.5.1, kernel-source-5.14.21-150500.55.36.1, kernel-obs-qa-5.14.21-150500.55.36.1, kernel-syms-5.14.21-150500.55.36.1, kernel-obs-build-5.14.21-150500.55.36.1, kernel-default-base-5.14.21-150500.55.36.1.150500.6.15.3
SUSE Linux Enterprise Micro 5.5 (src): kernel-default-base-5.14.21-150500.55.36.1.150500.6.15.3
Basesystem Module 15-SP5 (src): kernel-source-5.14.21-150500.55.36.1, kernel-default-base-5.14.21-150500.55.36.1.150500.6.15.3
Development Tools Module 15-SP5 (src): kernel-source-5.14.21-150500.55.36.1, kernel-obs-build-5.14.21-150500.55.36.1, kernel-syms-5.14.21-150500.55.36.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 12 Matt Ochs 2023-11-07 17:00:53 UTC
(In reply to Ivan Ivanov from comment #9)
> Sure, I will back port it to SP6 too. Thank you for noticing!

Ack, thanks!

7166c4815c5 iommu/arm-smmu-v3: Fix soft lockup triggered by (bsc#1215921)
Comment 13 Matt Ochs 2023-11-07 17:02:08 UTC
Verified the soft lockup is resolved with this backport using 5.14.21-150500.55.36-64kb.
Comment 16 Maintenance Automation 2023-11-10 20:30:10 UTC
SUSE-SU-2023:4414-1: An update that solves 11 vulnerabilities and has 11 security fixes can now be installed.

Category: security (important)
Bug References: 1208788, 1211162, 1211307, 1212423, 1213705, 1213772, 1214754, 1214874, 1215104, 1215523, 1215545, 1215921, 1215955, 1215986, 1216062, 1216202, 1216322, 1216323, 1216324, 1216333, 1216345, 1216512
CVE References: CVE-2023-2163, CVE-2023-2860, CVE-2023-31085, CVE-2023-34324, CVE-2023-3777, CVE-2023-39189, CVE-2023-39191, CVE-2023-39193, CVE-2023-45862, CVE-2023-46813, CVE-2023-5178
Sources used:
openSUSE Leap 15.5 (src): kernel-source-azure-5.14.21-150500.33.23.1, kernel-syms-azure-5.14.21-150500.33.23.1
Public Cloud Module 15-SP5 (src): kernel-source-azure-5.14.21-150500.33.23.1, kernel-syms-azure-5.14.21-150500.33.23.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.