Bug 1117833 - Wayland: brief unresponsiveness and repeated keys
Status: CONFIRMED
Duplicates: 1119495
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: GNOME
Version: Current
Hardware: x86-64 Linux
Priority: P5 - None; Severity: Major (7 votes)
Assigned To: E-mail List
Reported: 2018-11-29 16:20 UTC by Libor Pechacek
Modified: 2020-06-04 13:05 UTC (History)

Attachments
dmesg output (68.67 KB, text/plain)
2019-01-23 10:55 UTC, Victor Zhestkov

Description Libor Pechacek 2018-11-29 16:20:35 UTC
The problem I'm observing is well described at https://forum.antergos.com/topic/9533/key-repeats-and-an-unresponsive-wayland .

On a fresh TW installation with Wayland, the desktop briefly freezes during typing and when it comes back the last character is repeated many times. This is not isolated to a single application. I've seen this behavior with VirtualBox, Chromium, GNOME Terminal and GVim. TTTTTTTTThe system log is filling with messages like this:
Nov 29 17:16:33 fmn org.gnome.Shell.desktop[1914]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
Nov 29 17:16:33 fmn org.gnome.Shell.desktop[1914]: Key repeat discarded, Wayland compositor doesn't seem to be processing events fast enough!
Nov 29 17:16:33 fmn org.gnome.Shell.desktop[1914]: libinput error: client bug: timer event1 keyboard: offset negative (-372ms)
Nov 29 17:16:33 fmn org.gnome.Shell.desktop[1914]: Window manager warning: last_user_time (251082637) is greater than comparison timestamp (251082073).  This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW.  Trying to work around...
Nov 29 17:16:33 fmn org.gnome.Shell.desktop[1914]: Window manager warning: 0x2000001 (Enter Bug:) appears to be one of the offending windows with a timestamp of 251082637.  Working around...

Hardware is Lenovo X260. More details can be provided upon request.
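
To quantify how often the messages quoted above appear, a small filter over the journal text can help. This is only a sketch: the function name count_symptoms is made up for illustration, and the patterns are taken from the log lines in this report, so other gnome-shell or libinput versions may word them differently.

```shell
#!/bin/sh
# Count the three symptom messages quoted in this report.
# Reads log text on stdin; count_symptoms is an illustrative name only.
count_symptoms() {
    awk '
        /Key repeat discarded/                                   { repeat++ }
        /libinput error: client bug: timer/                      { timer++ }
        /last_user_time .* is greater than comparison timestamp/ { stamp++ }
        END {
            printf "key_repeat_discarded=%d\n", repeat
            printf "libinput_timer_errors=%d\n", timer
            printf "timestamp_warnings=%d\n", stamp
        }
    '
}
```

It could be fed with the current boot's journal, for example `journalctl -b | count_symptoms`.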
Comment 1 Stefan Dirsch 2018-11-30 14:03:45 UTC
Hmm. Is this also reproducible on an Xsession?
Comment 2 Libor Pechacek 2018-12-05 09:07:25 UTC
(In reply to Stefan Dirsch from comment #1)
> Hmm. Is this also reproducible on an Xsession?

Any recipe to start a Wayland-based Xsession, Stefan? So far I know Xwayland needs to be started as the emulation layer, but I don't know how to combine it with "startx".

Anyway, I gave Weston a shot, and it works completely smoothly. I suspect there is something in GNOME or Xwayland causing the glitches.

I also found a hint in Arch Wiki (https://wiki.archlinux.org/index.php/wayland#Slow_motion,_graphical_glitches,_and_crashes) but I guess it's unrelated. Mentioning it anyway in case my guess is wrong.
Comment 3 Stefan Dirsch 2018-12-05 11:04:47 UTC
In gdm you can select a GNOME session running on Xorg, or just a window manager that does not support Wayland (like icewm, xfce).
Comment 4 Michal Srb 2018-12-06 16:21:39 UTC
I am trying to reproduce it on my laptop with Tumbleweed - so far it happened to me only once. It was in gnome-shell (wayland) and it happened inside xterm, so X application tunneling thru Xwayland.

The applications you named - VirtualBox, Chromium, GNOME Terminal and GVim - I think they all use Xwayland except for GNOME Terminal, which should use wayland directly.

I have also found this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1242210

It starts with the warning messages about timestamps and eventually talks also about repeated keys in Wayland. No solution though.
Comment 5 Libor Pechacek 2018-12-07 15:31:41 UTC
(In reply to Stefan Dirsch from comment #3)
> In gdm you can select a GNOME session running on Xorg, or just a
> window manager that does not support Wayland (like icewm, xfce).

It took some time to find the switch, but I succeeded in starting an Xorg-based session. I've been using it for the whole day today without a single repetition. There are also no libinput error messages in the system log.

If you have more debugging suggestions, I'll be happy to perform the data collection.

(In reply to Michal Srb from comment #4)
> I am trying to reproduce it on my laptop with Tumbleweed - so far it
> happened to me only once. It was in gnome-shell (wayland) and it happened
> inside xterm, so X application tunneling thru Xwayland.

Thanks, Michal, for looking into it. On my side the issue appears in waves. Mostly there are no problems, and only occasionally the system becomes sluggish and the key repeat trouble triggers. It looks like something is running in the background, but I haven't investigated yet.

> The applications you named - VirtualBox, Chromium, GNOME Terminal and GVim -
> I think they all use Xwayland except for GNOME Terminal, which should use
> wayland directly.

It is possible that VirtualBox is also using Wayland directly, as it is Qt based and has the plugin built in.
Comment 6 Neil Rickert 2018-12-13 00:26:57 UTC
I can confirm this problem.  I have been noticing it for some time, though I never thought of reporting it as a bug.

I mainly use KDE, and I don't think I have ever seen it happen there even with Plasma Wayland, though I suppose I should recheck.

When I use Gnome, I always see this happen.  I have only seen it once per session, at perhaps 2-5 minutes into the session.  When using Gnome, I normally use Wayland.  I don't recall whether I have seen it with Xorg.

When I have seen the problem, it has always been in an "xterm" which is my preferred terminal emulator.  I do have an "xterm" starting as an automatic startup program when I login.
Comment 7 Victor Zhestkov 2018-12-15 21:07:35 UTC
*** Bug 1119495 has been marked as a duplicate of this bug. ***
Comment 8 Victor Zhestkov 2018-12-16 10:52:21 UTC
The issue seems to be hardware related somehow. I have two laptops with the same Tumbleweed installed: the ThinkPad T460 has this issue, while the T470p does not.
T460 with CPU: Intel i5-6300U and Intel HD Graphics 520
T470p: Intel i5-7440HQ and Intel HD Graphics 630

The behavior under Wayland and Xorg is quite different.
A Wayland session duplicates keypresses, while Xorg just has micro freezes and characters appear with a delay, but always only once.
Other input devices also freeze like this: the mouse and touchpad stop moving for a second.

CPU utilization on both laptops is fine, and the issue does not seem to depend on CPU load.

Any ideas how to find the root cause of the issue?
Comment 9 Victor Zhestkov 2018-12-16 11:01:58 UTC
The issue probably appeared with the gnome-shell update from 3.28 to 3.30.
But it may be related to a different package that changed about 2 months ago.
Comment 10 Dennis Irrgang 2018-12-17 08:46:23 UTC
Can confirm, same behaviour on a Lenovo T480 on gnome-shell snapshot 20181212-0.

X-Session: Freezes but no key repeat
Wayland: Freezes with key repeat.
Comment 11 Victor Zhestkov 2018-12-20 09:14:16 UTC
I found that both of my laptops are affected by the issue.
The only difference is that it's not so easy to notice on the T470p because of its performance. The CPU is faster on the T470p than on the T460, and there are fewer messages like these in the log on the T470p than on the T460:
Dec 14 11:36:05 MYHOST org.gnome.Shell.desktop[2777]: libinput error: client bug: timer event21 debounce: offset negative (-13ms)
Dec 14 11:36:05 MYHOST org.gnome.Shell.desktop[2777]: libinput error: client bug: timer event21 debounce short: offset negative (-26ms)
Dec 14 11:36:24 MYHOST org.gnome.Shell.desktop[2777]: libinput error: client bug: timer event21 debounce: offset negative (-26ms)
Comment 12 Victor Zhestkov 2018-12-26 13:18:14 UTC
I found an old copy of the Tumbleweed repo, 20181015, the newest one with GNOME 3.28 that I could find. It behaves differently. There are messages in the journal like this:
Dec 26 16:08:53 MYHOST org.gnome.Shell.desktop[4090]: libinput error: client bug: timer event21 debounce: offset negative (-7ms)
Dec 26 16:10:47 MYHOST org.gnome.Shell.desktop[4090]: libinput error: client bug: timer event21 debounce: offset negative (-6ms)
Dec 26 16:10:47 MYHOST org.gnome.Shell.desktop[4090]: libinput error: client bug: timer event21 debounce short: offset negative (-19ms)
Dec 26 16:11:47 MYHOST org.gnome.Shell.desktop[4090]: libinput error: client bug: timer event21 debounce: offset negative (-7ms)
Dec 26 16:11:47 MYHOST org.gnome.Shell.desktop[4090]: libinput error: client bug: timer event21 debounce short: offset negative (-20ms)
But there are no repeated keys in the interface and it seems more responsive.
I didn't find a copy of the repo from closer to the time the issue probably appeared. That is probably 20181022 or a couple of days earlier.
I suppose it's related to gnome-shell or mutter.
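
To compare snapshots by more than eye, the "offset negative" values in the libinput lines above can be pulled out and summarized. The following is a rough sketch under the assumption that the message format matches the lines quoted in this bug; worst_offset is a hypothetical helper name.

```shell
#!/bin/sh
# Summarize libinput "offset negative (-Nms)" messages read from stdin:
# prints how many were seen and the largest lag in milliseconds.
# worst_offset is an illustrative name, not part of any tool.
worst_offset() {
    sed -n 's/.*offset negative (-\([0-9][0-9]*\)ms).*/\1/p' |
    awk 'BEGIN { max = 0 }
         { n++; if ($1 + 0 > max) max = $1 + 0 }
         END { printf "count=%d max_lag_ms=%d\n", n, max }'
}
```

Running `journalctl -b | worst_offset` on each installation would give one line per system that is easy to compare.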
Comment 13 Atri Bhattacharya 2019-01-15 15:44:17 UTC
Can someone try booting the system with the kernel parameter "spectre_v2=off" and see if that helps at all? It helps mitigate this same problem on my system (as does disabling hyperthreading entirely from BIOS settings, if that is an option).
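
When testing a parameter like this, it helps to verify after the reboot that the kernel really received it. A minimal sketch, assuming the running kernel's command line is readable at /proc/cmdline; has_boot_param is a made-up helper name.

```shell
#!/bin/sh
# Succeed if the given parameter appears as a whole word on the kernel
# command line. The second argument (default /proc/cmdline) lets the
# check run against a saved copy. has_boot_param is an illustrative name.
has_boot_param() {
    param=$1
    file=${2:-/proc/cmdline}
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}
```

For example, `has_boot_param spectre_v2=off && echo "mitigation switched off for this boot"`.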
Comment 14 Atri Bhattacharya 2019-01-15 19:32:57 UTC
Updating summary so that it shows up in search
Comment 15 Victor Zhestkov 2019-01-16 07:20:42 UTC
Thanks for the proposal, Atri.
I suppose it could help, but in the general case it can't be used as a solution.
I'm currently trying a possible fix I found: https://gitlab.gnome.org/GNOME/mutter/merge_requests/168
I built mutter packages with OBS. The mutter in this repo is just a copy of the original one from Tumbleweed; the only difference is the applied patch.
https://download.opensuse.org/repositories/home:/vzhestkov:/mutter/openSUSE_Factory/

It's hard to say yet whether it works, as I just installed it on my system.
But if somebody else can try it and give feedback, it will be easier to tell whether it really works as a solution.

Atri, I'll try booting with this parameter later to see whether there is any significant difference in behavior.

But to my mind the most important thing in this case is the difference in behavior between Xorg and Wayland. The Wayland issue seems more annnnnnnoyiiiiiiing, while Xorg just has some lags with no key repeating.
Comment 16 Victor Zhestkov 2019-01-16 11:40:48 UTC
The patch seems to change the issue only slightly. It still exists, but the lag is probably shorter than without the patch. Under Xorg it's almost imperceptible.
Now I'm trying kernel parameters that turn the vulnerability mitigations off, as Atri proposed. I'm not sure yet; I need some more time for testing.
But the messages about negative offsets are still in the journal.
Comment 17 Victor Zhestkov 2019-01-17 13:11:05 UTC
Atri, it seems the system becomes slightly more responsive with the vulnerability mitigations turned off, but the lags still persist, less often and maybe slightly shorter. I suppose all input devices are affected; mouse movement also freezes sometimes for no known reason. There is no CPU utilization at those times. Sometimes the gnome-shell process seems to use CPU for no reason.
Comment 18 Libor Pechacek 2019-01-22 18:37:26 UTC
(In reply to Atri Bhattacharya from comment #13)
> Can someone try booting the system with the kernel parameter
> "spectre_v2=off" and see if that helps at all?

I booted 4.20.0-1-default with "spectre_v2=off" and there are no glitches. I'll reboot with the mitigation enabled to verify effects of the parameter. Thanks for the hint!
Comment 19 Jiri Kosina 2019-01-22 18:48:22 UTC
This might be related to bug#1112824
Comment 20 Borislav Petkov 2019-01-22 22:43:22 UTC
nosmt=force keeps popping up in reports as a possible solution so this could be the STIBP mitigation.

Libor, can you boot with

"spectre_v2_user=off"

only and see if that fixes it too.

Please upload full dmesg and

$ grep . /sys/devices/system/cpu/vulnerabilities/*

Thx.
Comment 21 Victor Zhestkov 2019-01-23 10:54:43 UTC
Hi.
It seems nosmt=force and disabling HT in the BIOS behave differently. I'm not sure, but the system looks more responsive with nosmt=force.
And spectre_v2_user=off really makes sense.
/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT disabled
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl and seccomp
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Indirect Branch Restricted Speculation, IBPB: disabled, IBRS_FW, STIBP: disabled, RSB filling

I'll also attach my dmesg output. I'm now using the 4.20.3-2.g4b478de-default kernel from https://download.opensuse.org/repositories/Kernel:/stable/standard/ and with no boot options it seems to work slightly better than 4.20.0 from TW 20190115.

I've tried the other Linux distribs and it seems only openSUSE is affected while using GNOME, I don't see the issue with other WMs.
Wayland behavior is different from Xorg.
bug#1112824 really does seem to be the same issue.
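
The spectre_v2 line quoted above packs several settings (the main mitigation, IBPB, STIBP and so on) into one comma-separated string, which makes two machines hard to compare at a glance. As a sketch only, and assuming the sysfs file keeps the "Mitigation: a, b, c" layout shown in this comment, it can be split into one field per line; spectre_v2_fields is a hypothetical name.

```shell
#!/bin/sh
# Print each component of the spectre_v2 status on its own line.
# The optional argument lets the function read a saved copy instead of
# the live sysfs file. spectre_v2_fields is an illustrative name.
spectre_v2_fields() {
    file=${1:-/sys/devices/system/cpu/vulnerabilities/spectre_v2}
    sed 's/^Mitigation: //' "$file" |
    awk -F', ' '{ for (i = 1; i <= NF; i++) print $i }'
}
```

Something like `spectre_v2_fields | grep STIBP` then shows just the setting under discussion.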
Comment 22 Victor Zhestkov 2019-01-23 10:55:19 UTC
Created attachment 795143 [details]
dmesg output
Comment 23 Borislav Petkov 2019-01-23 12:16:22 UTC
(In reply to Victor Zhestkov from comment #21)
> /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Indirect
> Branch Restricted Speculation, IBPB: disabled, IBRS_FW, STIBP: disabled, RSB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Well, you haz the skylake on which we force IBRS and that thing is not
upstream.

Which would mean that if you boot with

spectre_v2=retpoline

you should not be seeing those hiccups either.

IMHO of course.

> I've tried the other Linux distribs and it seems only openSUSE is affected
> while using GNOME, I don't see the issue with other WMs.

That could be explained by the IBRS thing above.

HTH.
Comment 24 Atri Bhattacharya 2019-01-23 12:55:51 UTC
(In reply to Borislav Petkov from comment #23)
> (In reply to Victor Zhestkov from comment #21)
> > /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Indirect
> > Branch Restricted Speculation, IBPB: disabled, IBRS_FW, STIBP: disabled, RSB
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Well, you haz the skylake on which we force IBRS and that thing is not
> upstream.
> 
> Which would mean that if you boot with
> 
> spectre_v2=retpoline
> 
> you should not be seeing those hiccups either.
> 
> IMHO of course.

Can confirm this on my Core i5-7200U machine. Performance is very much improved when using spectre_v2=retpoline, i.e. no hiccups on GNOME Wayland.

Thanks for the tip.
Comment 25 Borislav Petkov 2019-01-23 14:35:09 UTC
(In reply to Atri Bhattacharya from comment #24)
> Can confirm this on my Core i5-7200U machine. Performance is very much
> improved when using spectre_v2=retpoline, i.e. no hiccups on GNOME Wayland.
> 
> Thanks for the tip.

Right, so IBRS we enable by default on SKL because retpolines don't protect 100% on that microarchitecture. Here's some blurb on the background:

http://lkml.iu.edu/hypermail/linux/kernel/1801.2/05282.html

So if people use the "=retpoline" setting on SKL, there would still be the theoretical possibility of running a spectre v2 exploit.

BUT(!), the general consensus is that running such an exploit is very very hard on a retpolines kernel. I'd say the decision whether one wants more security or decent performance is left to the end user who knows her/his setup best.

For the future, we will most likely drop this IBRS hack in later kernels in favor of the much leaner EIBRS which Intel have implemented in some microcode:

http://git.kernel.org/linus/706d51681d636

HTH.
Comment 26 Victor Zhestkov 2019-01-23 17:36:25 UTC
Some more data from me:
Here is the vulnerabilities output from sysfs:
grep . /sys/devices/system/cpu/vulnerabilities/*
/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT disabled
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl and seccomp
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Indirect Branch Restricted Speculation, IBPB: conditional, IBRS_FW, STIBP: disabled, RSB filling
The kernel is 4.20.0-1-default, with TW 20190115 installed.
But on the T470p laptop with the i5-7440HQ CPU, the system is much more responsive and does not look affected by this issue.
I've never changed the kernel boot options on this laptop.
The dmesg output is also attached.
Comment 27 Victor Zhestkov 2019-01-23 17:42:32 UTC
Bugzilla is not allowing me to attach the file.
It reports that no file was specified.
Here is the link to Google Drive: https://drive.google.com/file/d/13W1UUdSl3eelLgnQ30XH0cB-bcCh4dD9/view?usp=sharing
Comment 28 Victor Zhestkov 2019-01-23 17:50:05 UTC
There is a significant difference between my T460 and T470p: the T470p has no HT at all, just 4 cores, while the T460 has two physical cores with HT.
Comment 29 Borislav Petkov 2019-01-23 18:24:38 UTC
Yes, i5-7440HQ is Kabylake and apparently doesn't have SMT:

https://ark.intel.com/products/97459/Intel-Core-i5-7440HQ-Processor-6M-Cache-up-to-3_80-GHz

but actual 4 cores which are not hyperthreaded (or HT is disabled).

So no prediction sharing between HT threads.

VS i5-6300U which is SMT: 2 cores with 2 threads in each core:

https://ark.intel.com/products/88190/Intel-Core-i5-6300U-Processor-3M-Cache-up-to-3-00-GHz-
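
Whether SMT is actually in use on a given machine can also be checked from the running system rather than from the CPU data sheet. A small sketch, assuming a kernel that exposes /sys/devices/system/cpu/smt/active (I believe this arrived around 4.19, but treat that as an assumption); smt_status is a made-up name.

```shell
#!/bin/sh
# Report whether SMT (hyperthreading) is currently active.
# Prints "unknown" when the sysfs file is missing or unreadable.
# The optional argument allows testing against a copy of the file.
# smt_status is an illustrative name.
smt_status() {
    file=${1:-/sys/devices/system/cpu/smt/active}
    if [ ! -r "$file" ]; then
        echo "unknown"
    elif [ "$(cat "$file")" = "1" ]; then
        echo "smt-active"
    else
        echo "smt-inactive"
    fi
}
```

On the machines discussed here one would expect "smt-active" on the i5-6300U and "smt-inactive" on the i5-7440HQ, though that is an expectation from the data sheets above, not something verified in this bug.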
Comment 30 Victor Zhestkov 2019-01-23 19:25:12 UTC
Boris, thanks. I see.
But if I run Xorg and a WM other than GNOME, there is no such issue.
I suppose the vulnerability mitigation is not the only issue, and GNOME also has some issue or issues of its own. I'm not sure, but it could be gnome-shell or gjs or both. Sometimes there is rather high CPU utilization by gnome-shell and some messages related to gjs. Probably the issue in GNOME becomes more noticeable with the vulnerability mitigation issue in the kernel.

And what is the current recommendation for this issue? Avoid using HT and set spectre_v2=retpoline, right?
I've already tried turning off all mitigations before, but I didn't turn off HT at that time, and as I remember the issue was still noticeable on that attempt.

And another question about vulnerability mitigations: what is the best benchmark to check the difference in performance with different kernel options? I've seen some benchmarks, but I'm not sure it is a good idea to just measure the time of context switches, as one of the tests does.
Comment 32 Libor Pechacek 2019-01-24 14:56:57 UTC
(In reply to Libor Pechacek from comment #18)
> I booted 4.20.0-1-default with "spectre_v2=off" and there are no glitches.
> I'll reboot with the mitigation enabled to verify effects of the parameter.

This is now confirmed. Booting kernel with "spectre_v2=off" eliminates the key repetitions on my system.
Comment 33 Borislav Petkov 2019-01-24 15:27:57 UTC
(In reply to Libor Pechacek from comment #32)
> This is now confirmed. Booting kernel with "spectre_v2=off" eliminates the
> key repetitions on my system.

Try spectre_v2=retpoline if you have a skylake.
Comment 34 Libor Pechacek 2019-01-31 09:22:41 UTC
I've got i7-6600U (SKL) CPU and spectre_v2=retpoline seems to help. No disturbing key repetitions or desktop freezes so far. System journal lists occasional complaints about negative time offsets from libinput though.
Comment 35 Libor Pechacek 2019-02-05 14:26:49 UTC
(In reply to Jiri Kosina from comment #19)
> This might be related to bug#1112824

Having read the mentioned bug, I confirm that bug 1112824, comment 103 (last paragraph) and bug 1112824, comment 115 do describe behavior I'm seeing.