Bug 1219970

Summary: [gnome-shell] Segfault when using libgda on Wayland and X11
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Pavin Joseph <me>
Component: GNOME
Assignee: E-mail List <gnome-bugs>
Status: NEW ---
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P2 - High
CC: ailin.nemui, alynx.zhou, vliaskovitis, yfjiang
Version: Current
Target Milestone: ---
Hardware: x86-64
OS: openSUSE Tumbleweed
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: coredump_gnome-shell
             coredump from gnome-shell-45.3-2.2.x86_64

Description Pavin Joseph 2024-02-15 13:37:04 UTC
Overview:
gnome-shell is segfaulting and crashing when using libgda in an extension (simple emoji extension using SQLite to retrieve data).
Issue happens with both Wayland and X11 sessions.
Issue reproduced on a freshly installed, fully updated instance of Tumbleweed 20240213.
Issue does not happen when tested against the following distributions: Fedora 39 Silverblue, Arch Linux

Steps to Reproduce:
1. zypper in libgda-6_0-6_0_0 libgda-6_0-tools typelib-1_0-Gda-6_0
2. git clone https://github.com/pavinjosdev/Emoji-Copy.git
3. cd Emoji-Copy/
4. python3 build/parser.py
5. ./install.sh
6. Log out of the Wayland session and log back in
7. Open "Extensions" GUI app and enable Emoji-Copy

Actual Results:
GNOME Wayland session crashes and logs me out. On the subsequent login all extensions are disabled.
GNOME X11 session crashes and shows an error message that is itself unresponsive. Cannot log in with an X11 session until extensions are disabled from a Wayland session.

Expected Results:
GNOME on Wayland and X11 should not crash when enabling an extension that uses libgda.

Additional information:
1. systemd-journal errors:
Feb 15 17:49:19 localhost.localdomain systemd-coredump[2742]: [🡕] Process 1279 (gnome-shell) of user 1000 dumped core.
Stack trace of thread 2740:
#0  0x0000000000000000 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64

2. Coredump info:
           PID: 1279 (gnome-shell)
           UID: 1000 (pavin)
           GID: 1000 (pavin)
        Signal: 11 (SEGV)
     Timestamp: Thu 2024-02-15 17:49:17 IST (4min 58s ago)
  Command Line: /usr/bin/gnome-shell
    Executable: /usr/bin/gnome-shell
 Control Group: /user.slice/user-1000.slice/user@1000.service/session.slice/org.gnome.Shell@wayland.service
          Unit: user@1000.service
     User Unit: org.gnome.Shell@wayland.service
         Slice: user-1000.slice
     Owner UID: 1000 (pavin)
       Boot ID: 769889a87873473e892d3642aebca31f
    Machine ID: e792f12cd59b4a85a8e048fcdd9e49f8
      Hostname: localhost.localdomain
       Storage: /var/lib/systemd/coredump/core.gnome-shell.1000.769889a87873473e892d3642aebca31f.1279.1707999557000000.zst (present)
  Size on Disk: 21.5M
       Message: Process 1279 (gnome-shell) of user 1000 dumped core.
                
                Stack trace of thread 2740:
                #0  0x0000000000000000 n/a (n/a + 0x0)
                ELF object binary architecture: AMD x86-64

3. Full coredump attached.
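
The core file name in the "Storage" line carries the same facts as the table above (process name, UID, boot ID, PID, crash time). Below is a minimal Python sketch for decoding it. It assumes the layout core.<comm>.<uid>.<boot-id>.<pid>.<timestamp in microseconds> with an optional compression suffix, as inferred from the names in this report. The function name parse_core_name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Layout assumed from the file names in this report:
# core.<comm>.<uid>.<boot-id>.<pid>.<timestamp in microseconds>[.<suffix>]
CORE_NAME = re.compile(
    r"^core\.(?P<comm>.+)\.(?P<uid>\d+)\.(?P<boot_id>[0-9a-f]{32})"
    r"\.(?P<pid>\d+)\.(?P<usec>\d+)(?:\.\w+)?$"
)


def parse_core_name(name: str) -> dict:
    """Split a systemd-coredump style file name into its fields."""
    match = CORE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised core file name: {name!r}")
    fields = match.groupdict()
    usec = int(fields.pop("usec"))
    fields["uid"] = int(fields["uid"])
    fields["pid"] = int(fields["pid"])
    # The trailing number is treated as microseconds since the Unix epoch.
    fields["time_utc"] = datetime.fromtimestamp(usec // 1_000_000, tz=timezone.utc)
    return fields


info = parse_core_name(
    "core.gnome-shell.1000.769889a87873473e892d3642aebca31f"
    ".1279.1707999557000000.zst"
)
print(info["comm"], info["pid"], info["time_utc"].isoformat())
```

For the file above the decoded time is 2024-02-15 12:19:17 UTC, which is 17:49:17 IST and agrees with the "Timestamp" line.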
Comment 1 Pavin Joseph 2024-02-15 13:39:25 UTC
Created attachment 872762 [details]
coredump_gnome-shell
Comment 2 Pavin Joseph 2024-02-24 13:06:00 UTC
Sorry, the steps to reproduce the issue don't work anymore, as the extension code was updated to use the sql.js library instead of relying on libgda.

Using libgda and sqlite to make a query is enough to trigger the bug and segfault.
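
As a side check while narrowing this down, it may help to confirm that the database file opens with SQLite outside of libgda and gnome-shell. The sketch below uses Python's standard sqlite3 module and does not exercise libgda at all, so a clean result only rules out a damaged database file. The helper name count_schema_objects is hypothetical, and whether Python's sqlite3 module links against the same libsqlite3 as libgda on a given system is an assumption.

```python
import sqlite3
from pathlib import Path


def count_schema_objects(db_path: str) -> int:
    """Open db_path read-only and count the entries in sqlite_master."""
    # as_uri() percent-encodes the path; mode=ro keeps the file untouched.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        (count,) = connection.execute(
            "SELECT count(*) FROM sqlite_master"
        ).fetchone()
        return count
    finally:
        connection.close()


# Version of the SQLite library that this Python build is using.
print("SQLite library version:", sqlite3.sqlite_version)
```

Calling count_schema_objects() with the path to the extension's emojis.db should return the number of tables and indexes if the file is readable.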
Comment 3 Ailin Nemui 2024-02-27 09:00:55 UTC
Can you load the coredump into gdb and request a backtrace (bt full)?
Comment 4 Pavin Joseph 2024-02-27 10:13:26 UTC
Sorry, I'm not familiar with using gdb. Getting this output:

[Thread debugging using libthread_db enabled]                                                                                                     
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/gnome-shell'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (Thread 0x7fd3be6006c0 (LWP 3798))]
(gdb) bt full
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#1  0x00007fd3ec035674 in ?? () from /usr/lib64/libsqlite3.so.0
No symbol table info available.
#2  0x00007fd3ec0e934b in ?? () from /usr/lib64/libsqlite3.so.0
No symbol table info available.
#3  0x00007fd3ec26fb7d in ?? () from /lib64/libgda-6.0.so.6.0.0
No symbol table info available.
#4  0x00007fd3ec23a975 in ?? () from /lib64/libgda-6.0.so.6.0.0
No symbol table info available.
#5  0x00007fd3ec298b85 in ?? () from /lib64/libgda-6.0.so.6.0.0
No symbol table info available.
#6  0x00007fd40cd5349e in ?? () from /lib64/libglib-2.0.so.0
No symbol table info available.
#7  0x00007fd40c292bb2 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#8  0x00007fd40c31400c in clone3 () from /lib64/libc.so.6
No symbol table info available.
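
The "??" frames above mean gdb had no debug symbols for those libraries. A small Python sketch for listing which shared objects need debuginfo follows. It assumes the frame layout shown in this backtrace ("#N  0xADDR in FUNC () from LIB"), and the helper name unresolved_libraries is hypothetical.

```python
import re

# Matches frames like:
#   #1  0x00007fd3ec035674 in ?? () from /usr/lib64/libsqlite3.so.0
FRAME = re.compile(
    r"^#(?P<index>\d+)\s+0x[0-9a-f]+ in (?P<func>\S+) \(\)"
    r"(?: from (?P<lib>\S+))?"
)


def unresolved_libraries(backtrace: str) -> list[str]:
    """Return libraries that own at least one unresolved ('??') frame."""
    libraries: list[str] = []
    for line in backtrace.splitlines():
        match = FRAME.match(line.strip())
        if match is None:
            continue  # e.g. "No symbol table info available."
        lib = match["lib"]
        if match["func"] == "??" and lib and lib not in libraries:
            libraries.append(lib)
    return libraries


sample = """\
#0  0x0000000000000000 in ?? ()
#1  0x00007fd3ec035674 in ?? () from /usr/lib64/libsqlite3.so.0
#2  0x00007fd3ec0e934b in ?? () from /usr/lib64/libsqlite3.so.0
#3  0x00007fd3ec26fb7d in ?? () from /lib64/libgda-6.0.so.6.0.0
#6  0x00007fd40cd5349e in ?? () from /lib64/libglib-2.0.so.0
#7  0x00007fd40c292bb2 in start_thread () from /lib64/libc.so.6
"""
for lib in unresolved_libraries(sample):
    print(lib)
```

For this backtrace the list is libsqlite3, libgda and libglib. Installing the matching debuginfo packages and re-running "bt full" should turn the "??" entries into function names and source lines.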
Comment 5 Vasilis Liaskovitis 2024-02-27 13:29:07 UTC
Created attachment 873039 [details]
coredump from gnome-shell-45.3-2.2.x86_64

I am attaching a compressed gnome-shell coredump from my (out-of-date) Tumbleweed installation with gnome-shell-45.3-2.2.

The coredump was produced a few weeks ago with the original steps to reproduce in comment#0. According to comment#2 those steps are no longer valid. I have not yet reproduced the crash another way.

From the backtrace it looks like an issue with sqlite3 memory allocation. 

gdb /usr/bin/gnome-shell core.gnome-shell.1000.12b034247d384e09b0b3637d9224839e.21523.1708076809000000

(gdb) bt
#0  0x0000000000000000 in  ()
#1  0x00007f8aa802f674 in mallocWithAlarm (pp=<synthetic pointer>, n=824) at /usr/src/debug/sqlite-src-3440200/sqlite3.c:30055
#2  sqlite3Malloc (n=824) at /usr/src/debug/sqlite-src-3440200/sqlite3.c:30120
#3  sqlite3Malloc (n=824) at /usr/src/debug/sqlite-src-3440200/sqlite3.c:30114
#4  0x00007f8aa80e334b in sqlite3MallocZero (n=824) at /usr/src/debug/sqlite-src-3440200/sqlite3.c:30399
#5  openDatabase (zFilename=0x7f8ae8001d70 "/home/vliaskovitis/.local/share/gnome-shell/extensions/emoji-copy@felipeftn/data/emojis.db", ppDb=0x7f8ae8001030, flags=<optimized out>, zVfs=0x0) at /usr/src/debug/sqlite-src-3440200/sqlite3.c:49832
#6  0x00007f8a52a41b7d in gda_sqlite_provider_open_connection (provider=0x5604a75609d0 [GdaSqliteProvider], cnc=0x5604a71f6f20 [GdaConnection], params=<optimized out>, auth=<optimized out>) at ../libgda/sqlite/gda-sqlite-provider.c:1216
#7  0x00007f8a52a0c975 in worker_open_connection (data=0x5604a73ac2d0, error=0x5604a70a6ca8) at ../libgda/gda-server-provider.c:2097
#8  0x00007f8a52a6ab85 in worker_thread_main (worker=0x5604aa579050) at ../libgda/thread-wrapper/gda-worker.c:213
#9  0x00007f8b092f148e in g_thread_proxy (data=0x5604a8165680) at ../glib/gthread.c:831
#10 0x00007f8b0888ff44 in start_thread () at /lib64/libc.so.6
#11 0x00007f8b089184cc in clone3 () at /lib64/libc.so.6

In any case libgda should be able to handle the error, without crashing the whole desktop.

I think this is likely the same bug as GNOME/libgda upstream issue:

https://gitlab.gnome.org/GNOME/libgda/-/issues/267

Discussion and core analysis should probably be continued there, as it looks like this is an unsolved upstream issue, not only affecting downstream.
Comment 6 Pavin Joseph 2024-03-12 10:04:39 UTC
Related openSUSE bug (sorry, I missed it before creating this report):
https://bugzilla.opensuse.org/show_bug.cgi?id=1213339

Not sure why the problem affects only openSUSE. I see Fedora is using the same version of libgda but is not affected. Arch is still using the stable 5.x series and does not have the problem either.
Perhaps the Fedora team has some patches that prevent this bug?