Bugzilla – Bug 1213533
Backport irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4
Last modified: 2024-06-25 17:50:37 UTC
This upstream patch provides a workaround for a hardware erratum and is required to support compute on 3- and 4-node NVIDIA Grace systems.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=35727af2b15d98a2dd2811d631d3a3886111312e
This fix needs to be backported to SLES 15 SP5.
Patches merged in SLE15-SP5 kernel sources. Thank you!
SUSE-SU-2023:3172-1: An update that solves seven vulnerabilities, contains two features and has 25 fixes can now be installed.
Category: security (important)
Bug References: 1150305, 1193629, 1194869, 1207894, 1208788, 1211243, 1211867, 1212256, 1212301, 1212525, 1212846, 1212905, 1213059, 1213061, 1213205, 1213206, 1213226, 1213233, 1213245, 1213247, 1213252, 1213258, 1213259, 1213263, 1213264, 1213286, 1213493, 1213523, 1213524, 1213533, 1213543, 1213705
CVE References: CVE-2023-20593, CVE-2023-2985, CVE-2023-3117, CVE-2023-31248, CVE-2023-3390, CVE-2023-35001, CVE-2023-3812
Jira References: PED-4718, PED-4758
Sources used:
openSUSE Leap 15.5 (src): kernel-obs-qa-5.14.21-150500.55.12.1, kernel-source-5.14.21-150500.55.12.1, kernel-obs-build-5.14.21-150500.55.12.1, kernel-livepatch-SLE15-SP5_Update_2-1-150500.11.3.2, kernel-default-base-5.14.21-150500.55.12.1.150500.6.4.2, kernel-syms-5.14.21-150500.55.12.1
Basesystem Module 15-SP5 (src): kernel-source-5.14.21-150500.55.12.1, kernel-default-base-5.14.21-150500.55.12.1.150500.6.4.2
Development Tools Module 15-SP5 (src): kernel-obs-build-5.14.21-150500.55.12.1, kernel-source-5.14.21-150500.55.12.1, kernel-syms-5.14.21-150500.55.12.1
SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5_Update_2-1-150500.11.3.2
NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
SUSE-SU-2023:3180-1: An update that solves seven vulnerabilities, contains two features and has 26 fixes can now be installed.
Category: security (important)
Bug References: 1150305, 1193629, 1194869, 1207894, 1208788, 1211243, 1211867, 1212256, 1212301, 1212525, 1212846, 1212905, 1213059, 1213061, 1213205, 1213206, 1213226, 1213233, 1213245, 1213247, 1213252, 1213258, 1213259, 1213263, 1213264, 1213286, 1213311, 1213493, 1213523, 1213524, 1213533, 1213543, 1213705
CVE References: CVE-2023-20593, CVE-2023-2985, CVE-2023-3117, CVE-2023-31248, CVE-2023-3390, CVE-2023-35001, CVE-2023-3812
Jira References: PED-4718, PED-4758
Sources used:
openSUSE Leap 15.5 (src): kernel-source-azure-5.14.21-150500.33.11.1, kernel-syms-azure-5.14.21-150500.33.11.1
Public Cloud Module 15-SP5 (src): kernel-source-azure-5.14.21-150500.33.11.1, kernel-syms-azure-5.14.21-150500.33.11.1
NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
SUSE-SU-2023:3302-1: An update that solves 28 vulnerabilities, contains two features and has 115 fixes can now be installed.
Category: security (important)
Bug References: 1150305, 1187829, 1193629, 1194869, 1206418, 1207129, 1207894, 1207948, 1208788, 1210335, 1210565, 1210584, 1210627, 1210780, 1210825, 1210853, 1211014, 1211131, 1211243, 1211738, 1211811, 1211867, 1212051, 1212256, 1212265, 1212301, 1212445, 1212456, 1212502, 1212525, 1212603, 1212604, 1212685, 1212766, 1212835, 1212838, 1212842, 1212846, 1212848, 1212861, 1212869, 1212892, 1212901, 1212905, 1212961, 1213010, 1213011, 1213012, 1213013, 1213014, 1213015, 1213016, 1213017, 1213018, 1213019, 1213020, 1213021, 1213024, 1213025, 1213032, 1213034, 1213035, 1213036, 1213037, 1213038, 1213039, 1213040, 1213041, 1213059, 1213061, 1213087, 1213088, 1213089, 1213090, 1213092, 1213093, 1213094, 1213095, 1213096, 1213098, 1213099, 1213100, 1213102, 1213103, 1213104, 1213105, 1213106, 1213107, 1213108, 1213109, 1213110, 1213111, 1213112, 1213113, 1213114, 1213116, 1213134, 1213167, 1213205, 1213206, 1213226, 1213233, 1213245, 1213247, 1213252, 1213258, 1213259, 1213263, 1213264, 1213272, 1213286, 1213287, 1213304, 1213417, 1213493, 1213523, 1213524, 1213533, 1213543, 1213578, 1213585, 1213586, 1213588, 1213601, 1213620, 1213632, 1213653, 1213705, 1213713, 1213715, 1213747, 1213756, 1213759, 1213777, 1213810, 1213812, 1213856, 1213857, 1213863, 1213867, 1213870, 1213871, 1213872
CVE References: CVE-2022-40982, CVE-2023-0459, CVE-2023-1829, CVE-2023-20569, CVE-2023-20593, CVE-2023-21400, CVE-2023-2156, CVE-2023-2166, CVE-2023-2430, CVE-2023-2985, CVE-2023-3090, CVE-2023-31083, CVE-2023-3111, CVE-2023-3117, CVE-2023-31248, CVE-2023-3212, CVE-2023-3268, CVE-2023-3389, CVE-2023-3390, CVE-2023-35001, CVE-2023-3567, CVE-2023-3609, CVE-2023-3611, CVE-2023-3776, CVE-2023-3812, CVE-2023-38409, CVE-2023-3863, CVE-2023-4004
Jira References: PED-4718, PED-4758
Sources used:
openSUSE Leap 15.5 (src): kernel-livepatch-SLE15-SP5-RT_Update_3-1-150500.11.5.1, kernel-syms-rt-5.14.21-150500.13.11.1, kernel-source-rt-5.14.21-150500.13.11.1
SUSE Linux Enterprise Live Patching 15-SP5 (src): kernel-livepatch-SLE15-SP5-RT_Update_3-1-150500.11.5.1
SUSE Real Time Module 15-SP5 (src): kernel-syms-rt-5.14.21-150500.13.11.1, kernel-source-rt-5.14.21-150500.13.11.1
NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Verified:
host-10-176-223-220:~ # dmesg | grep -i gicv3
[ 0.000000] GICv3: GIC: Using split EOI/Deactivate mode
[ 0.000000] GIC: enabling workaround for GICv3: NVIDIA erratum T241-FABRIC-4
Without the fix, a stress workload (memtester + stress-ng + iozone + irqbalance) encountered lockups within 15 minutes of starting the tests. With the fix applied, the same workload has been running for 6+ hours without issue.
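For anyone repeating the dmesg check from the verification comment on several hosts, it could be wrapped in a small helper along these lines. This is only a sketch: the function name has_t241_workaround is hypothetical, and the match string is copied from the dmesg output quoted in this bug, so it assumes the kernel message wording is unchanged on the kernel being tested.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch or of any SUSE tooling).
# Reads kernel log text on stdin and succeeds (exit status 0) only if the
# T241-FABRIC-4 workaround message quoted in this bug report is present.
has_t241_workaround() {
    # -q: no output, status only; -F: match the message as a fixed string
    grep -qF 'enabling workaround for GICv3: NVIDIA erratum T241-FABRIC-4'
}
```

On a live system the helper would be fed from the kernel log, for example: dmesg | has_t241_workaround && echo "workaround active". A missing message only shows that the log line was not found; it does not by itself prove which kernel build is running.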