Bug 1062904

Summary: Latest Tumbleweed update adversely impacting qemu-system-x86_64 -display sdl usage
Product: [openSUSE] openSUSE Tumbleweed Reporter: Bruce Rogers <brogers>
Component: GNOME Assignee: E-mail List <gnome-bugs>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P3 - Medium CC: denis.kondratenko, msrb, sreeves
Version: Current   
Target Milestone: ---   
Hardware: x86-64   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: Simplified reproducer

Description Bruce Rogers 2017-10-11 22:27:41 UTC
I've noticed a recent issue with running QEMU with the SDL display interface. The issue seems to have begun with the 3.26.x update to GNOME. I am not able to reproduce it when Wayland is used instead of X. I also can't reproduce it with other window managers, such as icewm or plasma5.

QEMU is able to do direct display of guest screens using SDL 1.2, SDL 2.0, GTK 2.0, and GTK 3.0, along with indirect protocols such as VNC and SPICE. I've built QEMU for each of these, and the only problem appears to be when using SDL 2.0.

The reproducer is on Tumbleweed, with latest updates, as follows:

tux: qemu-system-x86_64 -display sdl

Normally you would see a guest screen with BIOS-related boot activity, but in the failing instance, the desktop windows "flash" and no guest screen appears. I've instrumented SDL and found the code stuck in SDL's src/video/x11/SDL_x11window.c:X11_ShowWindow(), where it calls XIfEvent(). The callback passed to that function is called repeatedly with the event "GenericEvent" at this point, and the XIfEvent() call never returns.
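
For readers unfamiliar with XIfEvent(): it blocks until the supplied predicate accepts an event, and events the predicate rejects do not end the wait. The following is only a toy Python model of that behaviour, not Xlib or SDL code. The names if_event and is_map_notify_for, the dictionary event layout, and the max_polls cut-off are all made up for illustration, and it assumes the predicate is waiting for a MapNotify on the new window. The real call has no cut-off, which is why it hangs when the awaited event never arrives.

```python
import itertools


def if_event(incoming, predicate, max_polls):
    """Return (event, skipped): the first event the predicate accepts, plus
    the events it rejected first. Gives up with (None, skipped) after
    max_polls checks; the real XIfEvent() has no such limit and keeps waiting.
    """
    skipped = []
    for polls, event in enumerate(incoming):
        if polls >= max_polls:
            break
        if predicate(event):
            return event, skipped
        skipped.append(event)
    return None, skipped


def is_map_notify_for(window):
    """Build a predicate that only accepts a MapNotify for one window id."""
    def predicate(event):
        return event["type"] == "MapNotify" and event["window"] == window
    return predicate


# Healthy case: one unrelated event, then the awaited MapNotify.
healthy = iter([
    {"type": "GenericEvent", "window": None},
    {"type": "MapNotify", "window": 42},
])
event, skipped = if_event(healthy, is_map_notify_for(42), max_polls=1000)
print(event["type"], len(skipped))  # MapNotify 1

# Failing case: only GenericEvent ever arrives, so the predicate never matches.
endless = itertools.repeat({"type": "GenericEvent", "window": None})
event, skipped = if_event(endless, is_map_notify_for(42), max_polls=1000)
print(event, len(skipped))  # None 1000
```

The second call mirrors what the instrumentation shows: the predicate keeps being invoked with GenericEvent and never sees the event it is waiting for.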

Here is some additional log info which occurs at this time as well:

dmesg -T contains this tidbit:
[Wed Oct 11 16:07:32 2017] traps: gnome-shell[21688] trap int3 ip:7f8aaaa399c1 sp:7ffefacb66e0 error:0

and journalctl output has the following:
Oct 11 16:07:33 brogers1.provo.novell.com gnome-shell[21688]: The program 'gnome-shell' received an X Window System error.
                                                              This probably reflects a bug in the program.
                                                              The error was 'BadWindow (invalid Window parameter)'.
                                                                (Details: serial 9274 error_code 3 request_code 18 (core protocol) minor_code 0)
                                                                (Note to programmers: normally, X errors are reported asynchronously;
                                                                 that is, you will receive the error a while after causing it.
                                                                 To debug your program, run it with the GDK_SYNCHRONIZE environment
                                                                 variable to change this behavior. You can then get a meaningful
                                                                 backtrace from your debugger if you break on the gdk_x_error() function.)
Oct 11 16:07:34 brogers1.provo.novell.com /usr/lib/gdm/gdm-x-session[21615]: gnome-session-binary[21624]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 5
Oct 11 16:07:34 brogers1.provo.novell.com gnome-session-binary[21624]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 5
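
The numeric fields in the log above can be decoded by hand; the sketch below just automates that lookup. It is a minimal Python helper written for this report, not part of any X tooling: decode_details and the three tables are hypothetical names, and the tables hold only a small subset of the X11 core protocol codes (and the Linux signal number) needed here.

```python
import re

# Small subsets of the X11 core protocol tables, enough for this log line.
X_ERROR_CODES = {1: "BadRequest", 2: "BadValue", 3: "BadWindow"}
X_REQUEST_CODES = {4: "DestroyWindow", 8: "MapWindow", 18: "ChangeProperty"}
LINUX_SIGNALS = {5: "SIGTRAP"}

DETAILS_RE = re.compile(
    r"serial (\d+) error_code (\d+) request_code (\d+) \(([^)]*)\) minor_code (\d+)"
)


def decode_details(line):
    """Turn the '(Details: ...)' line of a GDK X error report into names."""
    match = DETAILS_RE.search(line)
    if match is None:
        return None
    serial, error_code, request_code, origin, minor_code = match.groups()
    return {
        "serial": int(serial),
        "error": X_ERROR_CODES.get(int(error_code), "unknown"),
        "request": X_REQUEST_CODES.get(int(request_code), "unknown"),
        "origin": origin,
        "minor_code": int(minor_code),
    }


line = "(Details: serial 9274 error_code 3 request_code 18 (core protocol) minor_code 0)"
decoded = decode_details(line)
print(decoded["error"], decoded["request"])  # BadWindow ChangeProperty
print(LINUX_SIGNALS[5])  # SIGTRAP
```

So the report is a BadWindow error raised for a core ChangeProperty request, and signal 5 is SIGTRAP, which fits the "trap int3" line in dmesg.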

I'll start with our X expert to see what he thinks of this from the X perspective.
Comment 1 Stefan Dirsch 2017-10-12 03:33:50 UTC
Could this be related to the security fixes added recently to the Xserver?

- CVE-2017-10971: Fix endianess handling of GenericEvent to prevent a
     stack overflow by clients. (bnc#1035283)
   - Make sure the type of all events to be sent by ProcXSendExtensionEvent
     are in the allowed range.
   - CVE-2017-10972: Initialize the xEvent eventT with zeros to avoid
     information leakage.

Adding Michal Srb.

It looks like a GNOME bug.
Comment 2 Stefan Dirsch 2017-10-12 03:43:08 UTC
> It looks like a GNOME bug.
Or a bug in qemu's SDL display video output, or in the SDL library itself, that is now triggered by the security update?

Michal? Could this be?
Comment 3 Michal Srb 2017-10-12 07:58:06 UTC
I was able to reproduce it in my virtual machine. So far I don't think it is related to the security fixes. It seems to me like a bug in gnome-shell.

Gnome-shell terminates after receiving BadWindow error for a ChangeProperty request.

This is what I observed happening:
1) qemu creates a window (one of several) and sets it up
2) gnome-shell notices the window, queries and sets various properties on it
3) qemu destroys that window
4) gnome-shell attempts to change another property on that window
5) gnome-shell receives BadWindow error and terminates
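
The five steps above can be replayed in a toy model. This is only a Python sketch of the ordering problem, not X11 or gnome-shell code: ToyServer, BadWindow (as a Python exception), replay and EXAMPLE_PROPERTY are illustrative names, and the toy reports the error synchronously, whereas a real X server reports it asynchronously.

```python
class BadWindow(Exception):
    """Raised by the toy server when a request names a window that is gone."""


class ToyServer:
    """Minimal stand-in for an X server: window ids mapped to properties."""

    def __init__(self):
        self.windows = {}
        self._next_id = 1

    def create_window(self):
        wid = self._next_id
        self._next_id += 1
        self.windows[wid] = {}
        return wid

    def destroy_window(self, wid):
        del self.windows[wid]

    def change_property(self, wid, name, value):
        if wid not in self.windows:
            raise BadWindow(wid)
        self.windows[wid][name] = value


def replay():
    """Replay the observed order of requests and record what happens."""
    server = ToyServer()
    trace = []
    wid = server.create_window()
    trace.append("1) client creates window")
    server.change_property(wid, "EXAMPLE_PROPERTY", 1)
    trace.append("2) wm sets a property: ok")
    server.destroy_window(wid)
    trace.append("3) client destroys window")
    try:
        # The window manager still holds the old id and uses it again.
        server.change_property(wid, "_GTK_EDGE_CONSTRAINTS", 0xAA)
        trace.append("4) wm sets another property: ok")
    except BadWindow:
        trace.append("4) wm sets another property: BadWindow")
    return trace


for step in replay():
    print(step)
```

Nothing in the model lets the window manager learn about step 3 before it issues step 4, which is the essence of the race.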

I'll debug deeper.
Comment 4 Michal Srb 2017-10-12 08:00:13 UTC
One more note: It does NOT happen when gnome-shell is started with GDK_SYNCHRONIZE=1. Probably because of different timing.
Comment 5 Michal Srb 2017-10-12 11:10:56 UTC
Created attachment 744068 [details]
Simplified reproducer

SDL creates the main window, then finds out that it should have created it differently, destroys it and creates another one. That is a bit strange, but not illegal.

The attached program does the same thing: it creates a window and destroys it after a short delay. It is able to crash gnome-shell the same way.

In my tests, gnome-shell always attempts to set _NET_WM_STATE=_NET_WM_STATE_FOCUSED and _GTK_EDGE_CONSTRAINTS=0xaa properties after the window was destroyed. I don't know why it did not happen in older versions of gnome-shell. It is possible that there was some unrelated change that affected the timing.

Setting properties of windows belonging to another application is always racy; it cannot be avoided. Gnome-shell should be able to handle such errors instead of aborting. If you run the same test under kwin, you can see errors reported in ~/.xsession-errors-:0, but kwin never aborts.
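
One common way to tolerate this kind of error is an "error trap" around requests that touch foreign windows. The following is a rough Python analogy of that idea, not the actual gnome-shell or mutter fix: XError, error_trap and change_property are hypothetical names, and the toy raises errors synchronously, while with a real X server the errors arrive asynchronously, so a real trap also has to sync with the server before it is released.

```python
import contextlib


class XError(Exception):
    """Toy X protocol error carrying an error name and a window id."""

    def __init__(self, name, window):
        super().__init__(f"{name} for window {window:#x}")
        self.name = name
        self.window = window


@contextlib.contextmanager
def error_trap(ignored=("BadWindow",)):
    """Collect the listed errors instead of letting them end the program."""
    trapped = []
    try:
        yield trapped
    except XError as error:
        if error.name not in ignored:
            raise  # anything unexpected still propagates
        trapped.append(error)


def change_property(live_windows, window, name, value):
    """Toy request: fails with BadWindow if the window is already gone."""
    if window not in live_windows:
        raise XError("BadWindow", window)
    live_windows[window][name] = value


live_windows = {}  # the client has already destroyed its window
with error_trap() as trapped:
    change_property(live_windows, 0x1234, "_NET_WM_STATE", "_NET_WM_STATE_FOCUSED")

print(len(trapped), trapped[0].name)  # 1 BadWindow
```

The trap only swallows the error names it was told to expect, so genuine bugs that surface as other errors are not hidden.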
Comment 6 Michal Srb 2017-10-12 11:12:09 UTC
Reassigning to Gnome. It may be possible to bisect the problem. It may be worth reporting upstream.
Comment 7 Denis Kondratenko 2017-10-20 11:34:19 UTC
I have the same issue; it is probably also the same as bug #1062269.
Comment 8 Michal Srb 2017-10-20 11:57:18 UTC
I have reported the bug upstream:
https://bugzilla.gnome.org/show_bug.cgi?id=789242
Comment 9 Denis Kondratenko 2017-10-20 12:49:45 UTC
https://bugzilla.gnome.org/show_bug.cgi?id=788666
It looks like there is a fix for it there.
Comment 10 Denis Kondratenko 2017-10-23 12:31:42 UTC
It looks like bug #1062269 was fixed, but with the new gnome-shell and mutter packages from GNOME:Factory I still see the issue with qemu.
Comment 11 Bruce Rogers 2017-10-24 15:14:24 UTC
Both my initial reproducer and the attached simplified reproducer seem to work now with the 20171022 snapshot release of Tumbleweed.

Thanks to all who helped!