Bugzilla – Bug 1117833
Wayland: brief unresponsiveness and repeated keys
Last modified: 2020-06-04 13:05:20 UTC
The problem I'm observing is well described at https://forum.antergos.com/topic/9533/key-repeats-and-an-unresponsive-wayland . On a fresh TW installation with Wayland, the desktop briefly freezes during typing and when it comes back the last character is repeated many times. This is not isolated to a single application. I've seen this behavior with VirtualBox, Chromium, GNOME Terminal and GVim.

TTTTTTTTThe system log is filling with messages like this:

Nov 29 17:16:33 fmn org.gnome.Shell.desktop[1914]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
Nov 29 17:16:33 fmn org.gnome.Shell.desktop[1914]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
Nov 29 17:16:33 fmn org.gnome.Shell.desktop[1914]: libinput error: client bug: timer event1 keyboard: offset negative (-372ms)
Nov 29 17:16:33 fmn org.gnome.Shell.desktop[1914]: Window manager warning: last_user_time (251082637) is greater than comparison timestamp (251082073). This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW. Trying to work around...
Nov 29 17:16:33 fmn org.gnome.Shell.desktop[1914]: Window manager warning: 0x2000001 (Enter Bug:) appears to be one of the offending windows with a timestamp of 251082637. Working around...

Hardware is Lenovo X260. More details can be provided upon request.
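For anyone who wants to compare sessions by numbers rather than by feel, the two message types quoted in this report can be counted with a small script. This is only a sketch: the function name `summarize` and the regular expression are assumptions written against the log lines quoted in this thread, and other libinput or GNOME Shell versions may word the messages differently.

```python
import re
import sys
from collections import Counter

# Message fragments copied from the journal lines quoted in this report.
KEY_REPEAT = "Key repeat discarded"
NEG_OFFSET = re.compile(
    r"libinput error: client bug: timer (\S+) ([\w ]+): offset negative \((-\d+)ms\)"
)


def summarize(lines):
    """Count discarded key repeats and collect negative libinput timer offsets (ms)."""
    counts = Counter()
    offsets = []
    for line in lines:
        if KEY_REPEAT in line:
            counts["key_repeat_discarded"] += 1
        match = NEG_OFFSET.search(line)
        if match:
            counts["negative_offset"] += 1
            offsets.append(int(match.group(3)))
    return counts, offsets


# Optional command-line use: pass a file holding saved journal output.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as handle:
        counts, offsets = summarize(handle)
    print(dict(counts))
    if offsets:
        print("largest negative offset:", min(offsets), "ms")
```

One possible way to use it is to save the journal of the current boot to a file (for example with `journalctl -b > journal.txt`) and pass that file name as the only argument.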
Hmm. Is this also reproducible on an Xsession?
(In reply to Stefan Dirsch from comment #1)
> Hmm. Is this also reproducible on an Xsession?

Any recipe to start a Wayland-based Xsession, Stefan? As far as I know, Xwayland needs to be started as the emulation layer, but I don't know how to combine it with "startx".

Anyway, I gave Weston a shot, and it works completely smoothly. I suspect there is something in GNOME or Xwayland causing the glitches.

I also found a hint in the Arch Wiki (https://wiki.archlinux.org/index.php/wayland#Slow_motion,_graphical_glitches,_and_crashes) but I guess it's unrelated. Mentioning it anyway in case my guess is wrong.
In gdm you can select GNOME sessions running on Xorg or just a window manager that does not support Wayland (like icewm, xfce).
I am trying to reproduce it on my laptop with Tumbleweed - so far it happened to me only once. It was in gnome-shell (Wayland) and it happened inside xterm, so an X application tunneling through Xwayland. The applications you named - VirtualBox, Chromium, GNOME Terminal and GVim - I think they all use Xwayland except for GNOME Terminal, which should use Wayland directly.

I have also found this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1242210 It starts with the warning messages about timestamps and eventually also talks about repeated keys in Wayland. No solution though.
(In reply to Stefan Dirsch from comment #3)
> In gdm you can select GNOME sessions running on Xorg or just a window
> manager that does not support Wayland (like icewm, xfce).

It took some time to find the switch, but I succeeded in starting an Xorg-based session. I've been using it for the whole day today without a single repetition. There are also no libinput error messages in the system log. If you have more debugging suggestions, I'll be happy to perform the data collection.

(In reply to Michal Srb from comment #4)
> I am trying to reproduce it on my laptop with Tumbleweed - so far it
> happened to me only once. It was in gnome-shell (Wayland) and it happened
> inside xterm, so an X application tunneling through Xwayland.

Thanks, Michal, for looking into it. On my side the issue appears in waves. Mostly there are no problems, and only occasionally the system becomes sluggish and the key repeat trouble triggers. It looks like something is running in the background, but I haven't investigated yet.

> The applications you named - VirtualBox, Chromium, GNOME Terminal and GVim -
> I think they all use Xwayland except for GNOME Terminal, which should use
> Wayland directly.

It is possible that VirtualBox is also using Wayland directly, as it is Qt-based and has the plugin built in.
I can confirm this problem. I have been noticing it for some time, though I never thought of reporting it as a bug. I mainly use KDE, and I don't think I have ever seen it happen there even with Plasma Wayland, though I suppose I should recheck. When I use GNOME, I always see this happen. I have only seen it once per session, at perhaps 2-5 minutes into the session. When using GNOME, I normally use Wayland. I don't recall whether I have seen it with Xorg. When I have seen the problem, it has always been in an "xterm", which is my preferred terminal emulator. I do have an "xterm" starting as an automatic startup program when I log in.
*** Bug 1119495 has been marked as a duplicate of this bug. ***
It seems the issue is somehow hardware related. I have two laptops with the same Tumbleweed installed. The ThinkPad T460 has this issue, while the T470p does not.

T460: Intel i5-6300U and Intel HD Graphics 520
T470p: Intel i5-7440HQ and Intel HD Graphics 630

The behavior under Wayland and Xorg is quite different. The Wayland session duplicates keypresses, while Xorg just has micro freezes and characters appear with a delay, but always only one. Other input devices also show such freezes: mouse and touchpad stop moving for a second. The CPU utilization on both laptops is fine and the issue does not seem to depend on the CPU load. Any ideas how to find the root cause of the issue?
The issue probably appeared with the gnome-shell update from 3.28 to 3.30. But it may be related to a different package that changed about 2 months ago.
Can confirm, same behaviour on a Lenovo T480 on gnome-shell snapshot 20181212-0.
X-Session: Freezes but no key repeat
Wayland: Freezes with key repeat.
I found that both my laptops are affected by the issue. The only difference is that it's not so easy to notice on the T470p because of its performance. The CPU is faster on the T470p than on the T460. And there are fewer such messages in the log on the T470p than on the T460:

Dec 14 11:36:05 MYHOST org.gnome.Shell.desktop[2777]: libinput error: client bug: timer event21 debounce: offset negative (-13ms)
Dec 14 11:36:05 MYHOST org.gnome.Shell.desktop[2777]: libinput error: client bug: timer event21 debounce short: offset negative (-26ms)
Dec 14 11:36:24 MYHOST org.gnome.Shell.desktop[2777]: libinput error: client bug: timer event21 debounce: offset negative (-26ms)
I found an old copy of the Tumbleweed repo, 20181015, the newest one with GNOME 3.28 that I could find. And it behaves differently. There are messages in the journal like this:

Dec 26 16:08:53 MYHOST org.gnome.Shell.desktop[4090]: libinput error: client bug: timer event21 debounce: offset negative (-7ms)
Dec 26 16:10:47 MYHOST org.gnome.Shell.desktop[4090]: libinput error: client bug: timer event21 debounce: offset negative (-6ms)
Dec 26 16:10:47 MYHOST org.gnome.Shell.desktop[4090]: libinput error: client bug: timer event21 debounce short: offset negative (-19ms)
Dec 26 16:11:47 MYHOST org.gnome.Shell.desktop[4090]: libinput error: client bug: timer event21 debounce: offset negative (-7ms)
Dec 26 16:11:47 MYHOST org.gnome.Shell.desktop[4090]: libinput error: client bug: timer event21 debounce short: offset negative (-20ms)

But there are no repeating keys in the interface and it seems more responsive. I didn't find a copy of the repo close to the time when the issue probably appeared. That is probably 20181022 or a couple of days prior. I suppose it's related to gnome-shell or mutter.
Can someone try booting the system with the kernel parameter "spectre_v2=off" and see if that helps at all? It helps mitigate this same problem on my system (as does disabling hyperthreading entirely from BIOS settings, if that is an option).
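When comparing results between reboots it is easy to lose track of which options the running kernel was actually started with. The sketch below reads them back from /proc/cmdline. The function name `mitigation_params` is hypothetical, and the option names it looks for (spectre_v2, spectre_v2_user, nosmt) are simply the ones discussed in this bug, not a complete list of mitigation switches.

```python
import os

# Boot options mentioned in this bug report; not an exhaustive list.
WATCHED = ("spectre_v2", "spectre_v2_user", "nosmt")


def mitigation_params(cmdline):
    """Return the watched options found in a kernel command line string.

    Options given without a value (plain "nosmt") map to None.
    """
    found = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in WATCHED:
            found[key] = value or None
    return found


# /proc/cmdline holds the command line of the running kernel on Linux.
if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as handle:
        print(mitigation_params(handle.read()))
```

An empty result means none of these options were passed, so the kernel is using its defaults.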
Updating summary so that it shows up in search
Thanks for the proposal, Atri. I suppose it could help, but in the general case it can't be used as a solution. I'm currently trying the solution I found: https://gitlab.gnome.org/GNOME/mutter/merge_requests/168

I built mutter packages with OBS. The mutter in this repo is just a copy of the original one from Tumbleweed; the only difference is the applied patch. https://download.opensuse.org/repositories/home:/vzhestkov:/mutter/openSUSE_Factory/

It's hard to say yet if it works, as I just installed it on my system. But if somebody else can try it and give feedback, it will be easier to tell whether it really works as a solution.

Atri, I'll try to boot with this parameter later to see if there is any significant difference in behavior. But to my mind the most important thing in this case is the difference in behavior between Xorg and Wayland. The Wayland issue seems more annnnnnnoyiiiiiiing, while Xorg just has some lags with no key repeating.
The patch seems to change the issue only slightly. It still exists, but the lag is probably shorter than without the patch. Under Xorg it's almost imperceptible. Now I'm trying kernel parameters that turn the vulnerability mitigations off, as Atri proposed. Not sure yet, I need some more time for testing. But the messages about negative offsets are still in the journal.
Atri, it seems the system becomes slightly more responsive with the vulnerability mitigations turned off, but the lags still persist, less often and maybe slightly shorter. I suppose all input devices are affected; mouse movement also freezes sometimes for no apparent reason. There is no notable CPU utilization at those times. Sometimes the gnome-shell process seems to use CPU for no reason.
(In reply to Atri Bhattacharya from comment #13)
> Can someone try booting the system with the kernel parameter
> "spectre_v2=off" and see if that helps at all?

I booted 4.20.0-1-default with "spectre_v2=off" and there are no glitches. I'll reboot with the mitigation enabled to verify effects of the parameter. Thanks for the hint!
This might be related to bug#1112824
nosmt=force keeps popping up in reports as a possible solution so this could be the STIBP mitigation. Libor, can you boot with "spectre_v2_user=off" only and see if that fixes it too? Please upload full dmesg and

$ grep . /sys/devices/system/cpu/vulnerabilities/*

Thx.
Hi. It seems nosmt=force and disabling HT in BIOS behave differently. Not sure, but the system looks more responsive with nosmt=force. And spectre_v2_user=off really makes sense.

/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT disabled
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl and seccomp
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Indirect Branch Restricted Speculation, IBPB: disabled, IBRS_FW, STIBP: disabled, RSB filling

I'll also attach my dmesg output. Now I'm using the 4.20.3-2.g4b478de-default kernel from https://download.opensuse.org/repositories/Kernel:/stable/standard/ and with no boot options it seems to work slightly better than 4.20.0 from TW 20190115.

I've tried the other Linux distribs and it seems only openSUSE is affected while using GNOME, I don't see the issue with other WMs. Wayland behavior is different from Xorg. bug#1112824 really seems to be the same issue.
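The spectre_v2 line is the one that matters most in this discussion, and it packs several settings into one comma-separated string. Below is a sketch that splits such a line into its parts so that two machines or two boots can be compared field by field. The function name `parse_status` and the returned dictionary layout are assumptions for illustration; the kernel does not promise a stable format for these files, so treat the parsing as best effort.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def parse_status(text):
    """Split a status string like 'Mitigation: X, IBPB: disabled, RSB filling'.

    Returns the overall state, the first listed method, 'name: value' pairs
    as flags, and any remaining bare entries.
    """
    state, _, detail = text.strip().partition(": ")
    if not detail:  # e.g. "Not affected"
        return {"state": state, "method": None, "flags": {}, "other": []}
    parts = [part.strip() for part in detail.split(",")]
    flags, other = {}, []
    for part in parts[1:]:
        name, sep, value = part.partition(": ")
        if sep:
            flags[name] = value
        else:
            other.append(part)
    return {"state": state, "method": parts[0], "flags": flags, "other": other}


# Print every vulnerability file, roughly what the grep in this thread shows.
if __name__ == "__main__" and VULN_DIR.is_dir():
    for entry in sorted(VULN_DIR.iterdir()):
        print(entry.name, parse_status(entry.read_text()))
```

For the spectre_v2 line quoted above this yields the method "Indirect Branch Restricted Speculation" with IBPB and STIBP both reported as disabled.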
Created attachment 795143 [details] dmesg output
(In reply to Victor Zhestkov from comment #21)
> /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Indirect
> Branch Restricted Speculation, IBPB: disabled, IBRS_FW, STIBP: disabled, RSB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Well, you haz the skylake on which we force IBRS and that thing is not upstream.

Which would mean that if you boot with

spectre_v2=retpoline

you should not be seeing those hiccups either.

IMHO of course.

> I've tried the other Linux distribs and it seems only openSUSE is affected
> while using GNOME, I don't see the issue with other WMs.

That could be explained by the IBRS thing above.

HTH.
(In reply to Borislav Petkov from comment #23)
> (In reply to Victor Zhestkov from comment #21)
> > /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Indirect
> > Branch Restricted Speculation, IBPB: disabled, IBRS_FW, STIBP: disabled, RSB
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Well, you haz the skylake on which we force IBRS and that thing is not
> upstream.
>
> Which would mean that if you boot with
>
> spectre_v2=retpoline
>
> you should not be seeing those hiccups either.
>
> IMHO of course.

Can confirm this on my Core i5-7200U machine. Performance is very much improved when using spectre_v2=retpoline, i.e. no hiccups on GNOME Wayland.

Thanks for the tip.
(In reply to Atri Bhattacharya from comment #24)
> Can confirm this on my Core i5-7200U machine. Performance is very much
> improved when using spectre_v2=retpoline, i.e. no hiccups on GNOME Wayland.
>
> Thanks for the tip.

Right, so IBRS we enable by default on SKL because retpolines don't protect 100% on that microarchitecture. Here's some blurb on the background: http://lkml.iu.edu/hypermail/linux/kernel/1801.2/05282.html

So if people use the "=retpoline" setting on SKL, there would still be the theoretical possibility of running a spectre v2 exploit. BUT(!), the general consensus is that running such an exploit is very very hard on a retpolines kernel.

I'd say the decision whether one wants more security or decent performance is left to the end user who knows her/his setup best.

For the future, we will most likely drop this IBRS hack in later kernels in favor of the much leaner EIBRS which Intel have implemented in some microcode: http://git.kernel.org/linus/706d51681d636

HTH.
Some more data from me. Here is the output about vulnerabilities from sysfs:

grep . /sys/devices/system/cpu/vulnerabilities/*
/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT disabled
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl and seccomp
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Indirect Branch Restricted Speculation, IBPB: conditional, IBRS_FW, STIBP: disabled, RSB filling

The kernel is 4.20.0-1-default, with TW 20190115 installed. But this laptop is the T470p with the i5-7440HQ CPU; the system is much more responsive and does not look affected by this issue. I've never changed kernel boot options on this laptop. dmesg output is also attached.
Bugzilla is not allowing me to attach the file. It reports that no file was specified. Here is the link to Google Drive: https://drive.google.com/file/d/13W1UUdSl3eelLgnQ30XH0cB-bcCh4dD9/view?usp=sharing
There is a significant difference between my T460 and T470p: the T470p has no HT at all, just 4 cores, while the T460 has two physical cores with HT.
Yes, i5-7440HQ is Kaby Lake and apparently doesn't have SMT: https://ark.intel.com/products/97459/Intel-Core-i5-7440HQ-Processor-6M-Cache-up-to-3_80-GHz It has 4 actual cores which are not hyperthreaded (or HT is disabled). So no prediction sharing between HT threads.

VS i5-6300U which has SMT: 2 cores with 2 threads in each core: https://ark.intel.com/products/88190/Intel-Core-i5-6300U-Processor-3M-Cache-up-to-3-00-GHz-
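Whether a given machine currently runs with SMT can also be read from the CPU topology files in sysfs instead of looking up the model on ARK. The following is a sketch under assumptions: the helper names `parse_cpu_list` and `has_smt` are made up for illustration, and the result reflects only the CPUs that are online, so a system booted with nosmt would be expected to report no SMT even if the hardware supports it.

```python
import glob


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0,2' or '0-3' into a list of integers."""
    cpus = []
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        low, sep, high = chunk.partition("-")
        cpus.extend(range(int(low), int(high if sep else low) + 1))
    return cpus


def has_smt(sibling_lists):
    """Return True if any CPU reports more than one hardware thread on its core."""
    return any(len(parse_cpu_list(item)) > 1 for item in sibling_lists)


# Each online CPU lists the hardware threads sharing its core in this file.
if __name__ == "__main__":
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            lists.append(handle.read())
    if lists:
        print("SMT active:", has_smt(lists))
```

On a 2-core, 4-thread part each file would be expected to list two CPUs, while on a 4-core part without hyperthreading each file lists only its own CPU.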
Boris, thanks. I see. But if I run Xorg and a WM other than GNOME, there is no such issue. I suppose the vulnerability mitigation is not the only issue here, and GNOME also has some issue or issues. Not sure, but it could be gnome-shell or gjs or both. Sometimes there is rather high CPU utilization by gnome-shell and some messages related to gjs. Probably the issue in GNOME becomes more noticeable with the vulnerability mitigation issue in the kernel.

And what is the current recommendation for this issue? Avoid using HT and set spectre_v2=retpoline, right? I already tried turning off all mitigations before, but didn't turn off HT at that time, and as I remember the issue was still noticeable in that attempt.

And another question about the vulnerability mitigations: what is the best benchmark to check the difference in performance with different kernel options? I've seen some benchmarks, but I'm not sure it is a good idea to just measure the time of context switches, as one of the tests does.
(In reply to Libor Pechacek from comment #18)
> I booted 4.20.0-1-default with "spectre_v2=off" and there are no glitches.
> I'll reboot with the mitigation enabled to verify effects of the parameter.

This is now confirmed. Booting kernel with "spectre_v2=off" eliminates the key repetitions on my system.
(In reply to Libor Pechacek from comment #32)
> This is now confirmed. Booting kernel with "spectre_v2=off" eliminates the
> key repetitions on my system.

Try spectre_v2=retpoline if you have a skylake.
I've got an i7-6600U (SKL) CPU and spectre_v2=retpoline seems to help. No disturbing key repetitions or desktop freezes so far. The system journal still lists occasional complaints about negative time offsets from libinput, though.
(In reply to Jiri Kosina from comment #19)
> This might be related to bug#1112824

Having read the mentioned bug, I confirm that bug 1112824, comment 103 (last paragraph) and bug 1112824, comment 115 do describe the behavior I'm seeing.