Bug 1214043

Summary: Slow startup in Firefox on OpenSUSE, 3 Lock Files Timeout?
Product: [openSUSE] openSUSE Tumbleweed Reporter: Chris Holland <zrenfire>
Component: FirefoxAssignee: Factory Mozilla <factory-mozilla>
Status: NEW --- QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: martin.sirringhaus, pujos.michael, zrenfire
Version: CurrentFlags: martin.sirringhaus: needinfo? (zrenfire)
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: Firefox profiler of 3 proflie lock attempts

Description Chris Holland 2023-08-07 18:14:11 UTC
Created attachment 868663 [details]
Firefox profiler of 3 proflie lock attempts

Is it just me that's experiencing slow Firefox startups in OpenSUSE? I tied Ubuntu and it had fairly quick startups.

* https://old.reddit.com/r/openSUSE/comments/15jygg0/slow_firefox_startup/
* https://bugzilla.mozilla.org/show_bug.cgi?id=1847463

My old tumbleweed PC had a shit GPU setup, so I didn't mind the slow startups. But I now have a Ryzen 5 5600G desktop and I'm still getting slow startups.

I copied the old ~/.mozilla/ folder to my new PC, and it's been slow to startup as well.

I tried using the https://profiler.firefox.com/ and running with:

    MOZ_PROFILER_STARTUP=1 MOZ_PROFILER_STARTUP_ENTRIES=20000000 firefox

and noticed it's spending 10+10+5 seconds stuck polling the lock files.

https://imgur.com/a/0ubcvo8

The lock file appears to be a symbolic link pointing to `localhost:+9999` which I didn't even know links could do.

* OpenSUSE Tumbleweed
* KDE Plasma Wayland
* btrfs

I tried:

* `firefox --safe-mode`
* Deleting/moving `~/.mozilla` + `~/.cache/mozilla`
* New User
* Plasma XOrg
* Kubuntu 23.04 (startup was only 2 seconds) ext4
* Ubuntu 22.04 (startup was only 2 seconds) ext4

-----

As you can see in the screenshots of the profiler (https://i.imgur.com/d5IUXj4.png), the network requests don't even start until 25s in (the network thread doesn't even start). It gets stuck in the locking of the profile.

* `XREMain::XRE_mainStartup` calls
    * `nsRemoteService::LockStartup`, `nsProfileLock::LockWithSymlink`, `nsProfileLock::Lock(...)`, `PR_GetHostByName`, `__new_gethostbyname_r`
        * `__nscd_gethostbyname_r`, `nscd_gethst_r`, `__nscd_open_socket`, `wait_on_socket`
            * `poll` (Spends 5000ms = 5s)
        * `_nss_dns_gethostbyname_r`, `__GI__nss_dns_gethostbyname_r`, `gethostbyname3_context`, `__GI___res_context_search`, `__res_context_querydomain`, `__GI___res_context_query`, `__res_context_send`, `send_dg`
            * `poll` (Spends 10s)
    * `LockProfile(nslNativeAppSupport, nslFile, nslFile, nslToolkitProfile, nslProfileLock)`
        * `nsToolkitProfile::Lock`, `nsToolkitProfile::Lock`, `nsToolkitProfileLock::Init`, `nsToolkitProfileLock::Init`, `nsProfileLock::Lock`, `nsProfileLock::LockWithSymlink`, `PR_GetHostByName`, `__new_gethostbyname_r`
            * `_nss_dns_gethostbyname_r`, `__GI__nss_dns_gethostbyname_r`, `gethostbyname3_context`, `__GI___res_context_search`, `__res_context_querydomain`, `__GI___res_context_query`, `__res_context_send`, `send_dg`
                * `poll` (Spends 10s)



-----

The only things I can think of to be OpenSUSE exclusive is it's Firewall or BTRFS.
Comment 1 Michael Pujos 2023-08-25 09:07:11 UTC
I do have this issue since I upgraded to snapshot 20230823.

It happens with Firefox, Thunderbird and deadbeef. using strace on either of these executables always shows a poll() system call waiting and timeouting after 25 seconds, early on program startup:

read(12, "\1\0\0\0\0\0\0\0", 8)         = 8
read(12, 0x7ffd67536c60, 8)             = -1 EAGAIN (Resource temporarily unavailable)
write(12, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x55b090cfc940, FUTEX_WAKE_PRIVATE, 2147483647) = 0
close(12)                               = 0
write(8, "\1\0\0\0\0\0\0\0", 8)         = 8
write(8, "\1\0\0\0\0\0\0\0", 8)         = 8
write(8, "\1\0\0\0\0\0\0\0", 8)         = 8
futex(0x55b090bf4da0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55b090bedda8, FUTEX_WAKE_PRIVATE, 1) = 1
eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK)   = 12
futex(0x55b090bedda8, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=12, events=POLLIN}], 1, 25000) = 0 (Timeout)
Comment 2 Michael Pujos 2023-08-25 09:08:23 UTC
This is observed with 6.4.11 kernel but reverting to 6.4.9 does not fix it.
Comment 3 Michael Pujos 2023-08-25 09:14:15 UTC
It seems to affect any program that uses GTK: gedit, meld, ...
Comment 4 Martin Sirringhaus 2023-08-25 09:28:13 UTC
Without having looked deeply into it: Maybe related to https://bugzilla.suse.com/show_bug.cgi?id=1210065?
Are you all on non-GNOME desktops, and do you have xdg-desktop-portal-gnome installed?
Comment 5 Michael Pujos 2023-08-25 09:34:59 UTC
Yes non GNOME setup (I'm using i3) but xdg-desktop-portal-gnome is installed.
I'm investigating the bug report you linked to.
What's curious is that this issue happened out of the blue.
Comment 6 Michael Pujos 2023-08-25 10:40:28 UTC
Mystery solved.

The problem was caused by the xdg-desktop-portal-gnome package being *installed*. Uninstalling it fixed the issue.

This Arch forum thread revealed that fix:
https://bbs.archlinux.org/viewtopic.php?id=285590

Longer story:

before stumbling on the post above, I noticed that xdg-desktop-portal-gnome.service did not start:


Aug 25 12:02:39 p72 systemd[6676]: Dependency failed for Portal service (GNOME implementation).
Aug 25 12:02:39 p72 systemd[6676]: xdg-desktop-portal-gnome.service: Job xdg-desktop-portal-gnome.service/start failed with result 'dependency'.

The cause is that it depends on graphical-session.target

[Unit]
Description=Portal service (GNOME implementation)
After=graphical-session.target
Requisite=graphical-session.target
PartOf=graphical-session.target

And I start my system with startx (launched automatically after login on vconsole, so without a Desktop Manager), thus it is in multi-user.target and xdg-desktop-portal-gnome.service cannot start.

The xdg-desktop-portal-gnome package was installed automatically on my system during one the latest snapshot a few days ago, probably as an optional dependency for something else.

So the permanent fix on my system (and other weird people not using a Desktop Manager) is to lock package xdg-desktop-portal-gnome so it is never installed.
That also explains why very few user had this issue.
Comment 7 Martin Sirringhaus 2023-08-25 10:48:09 UTC
Ah, sorry for wording my comment ambiguously. I should have mentioned that you need to uninstall it.

Chris, if uninstalling xdg-desktop-portal-gnome works for you as well, we can close this as a duplicate.
Comment 8 Chris Holland 2023-08-25 16:25:52 UTC
ATM Firefox is starting up in 2 sec for me.

I've been waiting to see if I can reproduce my Firefox **Profile** lock slow startup for a couple weeks now.

I've no idea what changed, but Firefox start's up within 2sec in OpenSUSE now... And I've no clue why. I had been dealing with this issue for weeks.
The only thing I did was update Kubuntu... And they only share the /boot/efi partition.
I did do a full update in OpenSUSE before testing too, and it seems there was a kernel update... But I restarted before testing!

-----

I'm using a 1Tb M.2 Drive with multiple partitions for OpenSUSE KDE, Kubuntu, Ubuntu.

OpenSUSE Yast Partitioner: https://i.imgur.com/KQtyE6d.png
Kubuntu Partitioner: https://i.imgur.com/bw3Uusz.png

-----

Here's a (filtered) log of journalctl, as well as profile.json + Screenshot timestamps from multiple boots:
https://gist.github.com/Zren/f8e33ccdb4b9fffd2894ff9ae70270a7


The weird thing is that I can read some `journalctl` logs with the kubuntu user... in OpenSUSE.

At some point or another, I think I'll do a clean reformat and reinstall of OpenSUSE without any other partitions.
Comment 9 Chris Holland 2023-09-18 05:35:00 UTC
After an update a week or two ago (early September) this bug came back. I had played around with the firewall too before restarting so I wasn't sure if it was the update or editing the firewall (toggling samba+samba-client on then off).

However, I noticed that in the journalctl logs, the "hostname" switches from "localhost" to "localhost.localdomain" when Firefox loaded in 2 seconds. However when it's taking 30 seconds to load, the hostname is "Client-57".

❯ sudo journalctl --since=2023-09-01 -g "purge"
-- Boot XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX --
Sep 04 21:42:21 localhost systemd[1]: Starting Purge old kernels...
Sep 04 21:42:21 localhost systemd[1]: Started Purge old kernels.
Sep 04 21:42:21 localhost zypper[934]: Preparing to purge obsolete kernels...
Sep 04 21:42:32 localhost.localdomain systemd[1]: purge-kernels.service: Deactivated successfully.
Sep 04 21:42:32 localhost.localdomain systemd[1]: purge-kernels.service: Consumed 4.612s CPU time.
-- Boot XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX --
Sep 10 19:50:47 localhost systemd[1]: Purge old kernels was skipped because of an unmet condition check (ConditionPathExists=/boot/do_purge_kernels).
-- Boot XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX --
Sep 11 17:21:22 localhost systemd[1]: Purge old kernels was skipped because of an unmet condition check (ConditionPathExists=/boot/do_purge_kernels).
-- Boot XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX --
Sep 14 12:29:38 localhost systemd[1]: Starting Purge old kernels...
Sep 14 12:29:38 localhost zypper[1016]: Preparing to purge obsolete kernels...
Sep 14 12:29:38 localhost systemd[1]: Started Purge old kernels.
Sep 14 12:29:48 Client-57 systemd[1]: purge-kernels.service: Deactivated successfully.
Sep 14 12:29:48 Client-57 systemd[1]: purge-kernels.service: Consumed 4.651s CPU time.


❯ hostnamectl
   Static hostname: (unset)                            
Transient hostname: Client-57
         Icon name: computer-desktop
           Chassis: desktop 🖥
        Machine ID: [snip]
           Boot ID: [snip]
  Operating System: openSUSE Tumbleweed                
       CPE OS Name: cpe:/o:opensuse:tumbleweed:20230915
            Kernel: Linux 6.5.3-1-default
      Architecture: x86-64
   Hardware Vendor: Micro-Star International Co., Ltd.
    Hardware Model: MS-7C95
  Firmware Version: H.80
     Firmware Date: Tue 2023-07-04
      Firmware Age: 2month 2w 1d



Since "ping localhost.localdomain" is fast, but "ping Client-57" times out, I think I've found the root cause.

I looked at /etc/hosts and saw:

127.0.0.1	localhost
# fallback hostname used by NetworkManager
127.0.0.1	localhost.localdomain

So I edited /etc/hosts and added the following, Firefox was fast to startup in 2sec.

127.0.0.1	Client-57

Any ideas what's causing the hostname to be "Client-57"?