Bug 1083836

Summary: Combination of 4.4.116+ kernel with older KMP makes user space crash
Product: [openSUSE] openSUSE Distribution
Reporter: Takashi Iwai <tiwai>
Component: Kernel
Assignee: Miroslav Beneš <mbenes>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: bpetkov, jkosina, jslaby, mbenes, meissner, sndirsch
Version: Leap 42.3
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: strace of X with drm-kmp installed
             strace of X with drm-kmp NOT installed

Description Takashi Iwai 2018-03-03 14:08:34 UTC
Using an old KMP together with a recent Leap 42.3 kernel (4.4.116 and later) leads to unexpected user-space crashes.

This is typically seen with drm-kmp (built against the 42.3 GM base) and a newer kernel (4.4.116-1.g1719d92 and later). The machine appears to crash: X stops working and the user loses control (the keyboard can no longer switch back to a VT).

The bisection pointed to the patches:
  patches.suse/x86-entry-64-separate-cpu_current_top_of_stack-from-tss-sp0.patch
  patches.suse/x86-entry-64-use-a-per-cpu-trampoline-stack.patch

Reverting these two patches from the latest 42.3 KOTD fixes the problem.

In fact, reverting only the latter patch
  patches.suse/x86-entry-64-use-a-per-cpu-trampoline-stack.patch
seems to be enough.

One more thing to note: if the same KMP is rebuilt against the latest kernel and used with that kernel, the bug doesn't appear. The bug manifests only with the combination of an old KMP and a new kernel.

We'll need to rebuild KMPs anyway to address the Spectre v2 issues, so in practice the problem can be avoided by updating the kernel and the KMPs at the same time. Users may still hit the issue with 3rd-party KMPs that aren't rebuilt in sync with the kernel update.
Comment 1 Takashi Iwai 2018-03-03 14:10:10 UTC
The issue should affect SLE12-SP3 as well, but that hasn't been verified, so I filed this entry against Leap 42.3.
Comment 2 Jiri Slaby 2018-03-04 14:18:00 UTC
Created attachment 762604
strace of X with drm-kmp installed

OK, I do not understand the issue at all (yet). I see no way a rebuild can help when the patch is applied (not even after looking at the sync_core changes jikos mentioned).

I can (perhaps) reproduce it in qemu with 3D enabled (gl=on); starting X and glxgears is enough. With 4.4.119 and drm-kmp installed, the gears do not spin at all, and X crashes after glxgears is killed (the screen's root window is NULL). An strace of X is attached: read() from /dev/dri/card0 repeatedly returns -EFAULT.

When I uninstall drm-kmp-default and reboot, the gears spin and the reads return 32 bytes or so. I will attach an strace of this next.
Comment 3 Jiri Slaby 2018-03-04 14:21:19 UTC
Created attachment 762605
strace of X with drm-kmp NOT installed

And given that your lightdm crashes due to strlen(NULL) (it would be interesting to see the call trace), is this some kind of memory corruption?

The stack trace of the NULL root-window dereference looks like this (with drm-kmp installed):
#8  <signal handler called>
#9  _fbGetWindowPixmap (pWindow=0x0) at fbscreen.c:84
#10 0x000000000050f267 in present_restore_screen_pixmap (screen=0x12eaad0) at present.c:442
#11 0x000000000050f335 in present_set_abort_flip (screen=screen@entry=0x12eaad0) at present.c:458
#12 0x000000000050fac4 in present_flip_destroy (screen=screen@entry=0x12eaad0) at present.c:1021
#13 0x000000000050e6e3 in present_close_screen (screen=0x12eaad0) at present_screen.c:61
#14 0x00000000004c7278 in CursorCloseScreen (pScreen=0x12eaad0) at cursor.c:187
#15 0x000000000050cea4 in AnimCurCloseScreen (pScreen=<optimized out>) at animcur.c:106
#16 0x000000000043d2e7 in dix_main (argc=2, argv=0x7fff01d0a498, envp=<optimized out>) at main.c:354
#17 0x00007f72e0c84725 in __libc_start_main () from /lib64/libc.so.6
#18 0x0000000000428719 in _start () at ../sysdeps/x86_64/start.S:118
Comment 4 Jiri Slaby 2018-03-04 14:59:15 UTC
Well, maybe I see it now.

Since
 patches.suse/x86-entry-64-separate-cpu_current_top_of_stack-from-tss-sp0.patch
the stack top used to locate thread_info is kept in cpu_tss.x86_tss.sp1, not in cpu_tss.x86_tss.sp0.

But all previously built KMPs use cpu_tss.x86_tss.sp0 for access_ok (and many other inlines, macros and friends) via current_thread_info(), which calls current_top_of_stack().

This is no problem as sp0 and sp1 are identical (mostly, I guess) until:
 patches.suse/x86-entry-64-use-a-per-cpu-trampoline-stack.patch
which starts using sp0 for per_cpu stack. It obviously breaks current_thread_info() inlined in KMPs.
Comment 7 Takashi Iwai 2018-03-07 16:04:05 UTC
(In reply to Jiri Slaby from comment #4)
> This is no problem as sp0 and sp1 are identical (mostly, I guess) until:
>  patches.suse/x86-entry-64-use-a-per-cpu-trampoline-stack.patch
> which starts using sp0 for per_cpu stack. It obviously breaks
> current_thread_info() inlined in KMPs.

So, can we fix this somehow?  It shouldn't be impossible to detect the old KMP code path and switch the mechanics, but I'm not sure whether it's practically feasible at all...

Alternatively, we could refuse to run code from an old KMP outright, rather than only warning about the Spectre v2 vulnerability. We'll end up upgrading all our own KMPs together with the kernel update anyway.
But it will still be a problem if the system cannot do without a 3rd-party KMP (e.g. for storage); such a system would become unbootable.

Any ideas?
Comment 8 Miroslav Beneš 2018-03-07 16:17:31 UTC
I'm working on a kABI-friendly version of the patch...
Comment 9 Miroslav Beneš 2018-03-21 12:11:00 UTC
Now pushed to the SLE12-SP2 for-next branch. Commits:

9b4f0b2024bb ("x86/kaiser: Duplicate cpu_tss for an entry trampoline usage (bsc#1077560 bsc#1083836).")
1c9c53d78f7d ("x86/kaiser: Use a per-CPU trampoline stack for kernel entry (bsc#1077560).")
3e459797ac5e ("x86/kaiser: Remove a user mapping of cpu_tss structure (bsc#1077560 bsc#1083836).")

Takashi reported a successful test with the old drm-kmp.

SLE12-SP3 coming soon.
Comment 10 Miroslav Beneš 2018-03-26 11:20:10 UTC
SLE12-SP3 pushed as well. Let me close this...
Comment 11 Swamp Workflow Management 2018-04-17 16:11:21 UTC
openSUSE-SU-2018:0972-1: An update that solves three vulnerabilities and has 52 fixes is now available.

Category: security (important)
Bug References: 1012382,1019695,1019699,1022604,1031717,1046610,1060799,1064206,1068032,1073059,1073069,1075428,1076033,1077560,1081358,1083574,1083745,1083836,1084223,1084310,1084328,1084353,1084452,1084610,1084829,1084889,1084898,1084914,1084918,1084967,1085042,1085058,1085224,1085383,1085402,1085404,1085487,1085507,1085981,1086015,1086194,1086357,1086499,1086518,1086607,1087088,1087211,1087231,1087260,1087659,1087845,1087906,1087999,1088087,1088324
CVE References: CVE-2018-1091,CVE-2018-7740,CVE-2018-8043
Sources used:
openSUSE Leap 42.3 (src):    kernel-debug-4.4.126-48.2, kernel-default-4.4.126-48.2, kernel-docs-4.4.126-48.1, kernel-obs-build-4.4.126-48.2, kernel-obs-qa-4.4.126-48.1, kernel-source-4.4.126-48.1, kernel-syms-4.4.126-48.1, kernel-vanilla-4.4.126-48.2
Comment 12 Swamp Workflow Management 2018-04-23 19:11:21 UTC
SUSE-SU-2018:1048-1: An update that solves 5 vulnerabilities and has 62 fixes is now available.

Category: security (important)
Bug References: 1012382,1019695,1019699,1022604,1031717,1046610,1060799,1064206,1068032,1073059,1073069,1075428,1076033,1077560,1083574,1083745,1083836,1084223,1084310,1084328,1084353,1084452,1084610,1084699,1084829,1084889,1084898,1084914,1084918,1084967,1085042,1085058,1085224,1085383,1085402,1085404,1085487,1085507,1085511,1085679,1085981,1086015,1086162,1086194,1086357,1086499,1086518,1086607,1087088,1087211,1087231,1087260,1087274,1087659,1087845,1087906,1087999,1088050,1088087,1088241,1088267,1088313,1088324,1088600,1088684,1088871,802154
CVE References: CVE-2017-18257,CVE-2018-1091,CVE-2018-7740,CVE-2018-8043,CVE-2018-8822
Sources used:
SUSE Linux Enterprise Workstation Extension 12-SP3 (src):    kernel-default-4.4.126-94.22.1
SUSE Linux Enterprise Software Development Kit 12-SP3 (src):    kernel-docs-4.4.126-94.22.1, kernel-obs-build-4.4.126-94.22.1
SUSE Linux Enterprise Server 12-SP3 (src):    kernel-default-4.4.126-94.22.1, kernel-source-4.4.126-94.22.2, kernel-syms-4.4.126-94.22.1
SUSE Linux Enterprise Live Patching 12-SP3 (src):    kgraft-patch-SLE12-SP3_Update_11-1-4.5.1
SUSE Linux Enterprise High Availability 12-SP3 (src):    kernel-default-4.4.126-94.22.1
SUSE Linux Enterprise Desktop 12-SP3 (src):    kernel-default-4.4.126-94.22.1, kernel-source-4.4.126-94.22.2, kernel-syms-4.4.126-94.22.1
SUSE CaaS Platform ALL (src):    kernel-default-4.4.126-94.22.1
Comment 13 Swamp Workflow Management 2018-05-08 22:17:05 UTC
SUSE-SU-2018:1173-1: An update that solves 9 vulnerabilities and has 27 fixes is now available.

Category: security (important)
Bug References: 1012382,1031717,1046610,1057734,1070536,1075428,1076847,1077560,1082153,1082299,1083125,1083745,1083836,1084353,1084610,1084721,1084829,1085042,1085185,1085224,1085402,1085404,1086162,1086194,1087088,1087260,1087845,1088241,1088242,1088600,1088684,1089198,1089608,1089644,1089752,1090643
CVE References: CVE-2017-18257,CVE-2018-10087,CVE-2018-10124,CVE-2018-1087,CVE-2018-7740,CVE-2018-8043,CVE-2018-8781,CVE-2018-8822,CVE-2018-8897
Sources used:
SUSE OpenStack Cloud 7 (src):    kernel-default-4.4.121-92.73.1, kernel-source-4.4.121-92.73.1, kernel-syms-4.4.121-92.73.1, kgraft-patch-SLE12-SP2_Update_21-1-3.3.1
SUSE Linux Enterprise Server for SAP 12-SP2 (src):    kernel-default-4.4.121-92.73.1, kernel-source-4.4.121-92.73.1, kernel-syms-4.4.121-92.73.1, kgraft-patch-SLE12-SP2_Update_21-1-3.3.1
SUSE Linux Enterprise Server 12-SP2-LTSS (src):    kernel-default-4.4.121-92.73.1, kernel-source-4.4.121-92.73.1, kernel-syms-4.4.121-92.73.1, kgraft-patch-SLE12-SP2_Update_21-1-3.3.1
SUSE Enterprise Storage 4 (src):    kernel-default-4.4.121-92.73.1, kernel-source-4.4.121-92.73.1, kernel-syms-4.4.121-92.73.1, kgraft-patch-SLE12-SP2_Update_21-1-3.3.1
OpenStack Cloud Magnum Orchestration 7 (src):    kernel-default-4.4.121-92.73.1
Comment 14 Swamp Workflow Management 2018-05-11 16:13:22 UTC
SUSE-SU-2018:1217-1: An update that solves 7 vulnerabilities and has 93 fixes is now available.

Category: security (important)
Bug References: 1005778,1005780,1005781,1012382,1015336,1015337,1015340,1015342,1015343,1019695,1019699,1022604,1022743,1024296,1031717,1046610,1060799,1064206,1068032,1073059,1073069,1075091,1075428,1075994,1076033,1077560,1083125,1083574,1083745,1083836,1084223,1084310,1084328,1084353,1084452,1084610,1084699,1084721,1084829,1084889,1084898,1084914,1084918,1084967,1085042,1085058,1085185,1085224,1085383,1085402,1085404,1085487,1085507,1085511,1085679,1085958,1085981,1086015,1086162,1086194,1086357,1086499,1086518,1086607,1087088,1087211,1087231,1087260,1087274,1087659,1087845,1087906,1087999,1088050,1088087,1088242,1088267,1088313,1088324,1088600,1088684,1088865,1088871,1089198,1089608,1089644,1089752,1089925,802154,810912,812592,813453,880131,966170,966172,966186,966191,969476,969477,981348
CVE References: CVE-2017-18257,CVE-2018-10087,CVE-2018-10124,CVE-2018-1091,CVE-2018-7740,CVE-2018-8043,CVE-2018-8822
Sources used:
SUSE Linux Enterprise Real Time Extension 12-SP3 (src):    kernel-rt-4.4.128-3.11.1, kernel-rt_debug-4.4.128-3.11.1, kernel-source-rt-4.4.128-3.11.1, kernel-syms-rt-4.4.128-3.11.1
Comment 19 Swamp Workflow Management 2018-10-18 17:39:52 UTC
SUSE-SU-2018:1173-2: An update that solves 9 vulnerabilities and has 27 fixes is now available.

Category: security (important)
Bug References: 1012382,1031717,1046610,1057734,1070536,1075428,1076847,1077560,1082153,1082299,1083125,1083745,1083836,1084353,1084610,1084721,1084829,1085042,1085185,1085224,1085402,1085404,1086162,1086194,1087088,1087260,1087845,1088241,1088242,1088600,1088684,1089198,1089608,1089644,1089752,1090643
CVE References: CVE-2017-18257,CVE-2018-10087,CVE-2018-10124,CVE-2018-1087,CVE-2018-7740,CVE-2018-8043,CVE-2018-8781,CVE-2018-8822,CVE-2018-8897
Sources used:
SUSE Linux Enterprise Server 12-SP2-BCL (src):    kernel-default-4.4.121-92.73.1, kernel-source-4.4.121-92.73.1, kernel-syms-4.4.121-92.73.1, kgraft-patch-SLE12-SP2_Update_21-1-3.3.1