Bug 1212345 - Sway fails to start after Mesa update to 23.1.x
Summary: Sway fails to start after Mesa update to 23.1.x
Status: RESOLVED FIXED
Duplicates: 1212324 1212433 1212478 1212481
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: X.Org
Version: Current
Hardware: Other Other
Importance: P3 - Medium : Normal with 5 votes
Target Milestone: ---
Assignee: Gfx Bugs
QA Contact: Gfx Bugs
Reported: 2023-06-14 07:54 UTC by Filippo Bonazzi
Modified: 2024-01-05 18:00 UTC



Attachments
Package changes on my machine between snapshots 20230610 to 20230612 (6.87 KB, text/plain)
2023-06-14 08:54 UTC, Filippo Bonazzi

Description Filippo Bonazzi 2023-06-14 07:54:03 UTC
After yesterday's updates to Tumbleweed snapshot 20230612, Sway now fails to start with the following error:

```
[wlr] [EGL] command: eglCreateContext, error: EGL_BAD_ALLOC (0x3003), message: "dri2_create_context"
[wlr] [render/egl.c:409] Failed to create EGL context
[wlr] [render/egl.c:554] Failed to initialize EGL context
[wlr] [render/gles2/renderer.c:679] Could not initialize EGL
[wlr] [render/wlr_renderer.c:333] Could not initialize renderer
[sway/server.c:79] Failed to create renderer
```

One possible culprit from the updated packages is `Mesa-dri`.
Comment 1 Filippo Bonazzi 2023-06-14 08:10:39 UTC
It looks like it might have been caused by the Mesa update to 23.1.x (https://build.opensuse.org/request/show/1092024)
Comment 2 Stefan Dirsch 2023-06-14 08:48:25 UTC
Adding Joan. He may know more about wlroots...
Comment 3 Stefan Dirsch 2023-06-14 08:49:04 UTC
So this is on an Intel GPU.
Comment 4 Filippo Bonazzi 2023-06-14 08:52:49 UTC
Thanks, let me restate here what I wrote on Slack.

This problem was introduced in snapshot 20230612 and did not exist in 20230610.
I am going to attach the list of packages that was updated on my machine to narrow it down.

I first observed the issue on my dev laptop (Intel + nouveau NVIDIA), but I am able to reproduce it in a VM with only the Intel card passed through for OpenGL.
Comment 5 Filippo Bonazzi 2023-06-14 08:54:43 UTC
Created attachment 867562 [details]
Package changes on my machine between snapshots 20230610 to 20230612
Comment 6 Simon Lees 2023-06-14 09:41:06 UTC
Someone on IRC has mentioned that this has also been seen on an AMD Radeon RX 5700.
Comment 7 Thomas Zimmermann 2023-06-14 10:43:15 UTC
Today, I ran sway on an up-to-date TW with software rendering. That worked.

You may want to boot the kernel with the nomodeset parameter. That will disable any hardware graphics rendering.
Comment 8 Stefan Dirsch 2023-06-14 10:46:58 UTC
Good idea, Thomas!
Comment 9 Filippo Bonazzi 2023-06-14 10:49:24 UTC
I can confirm that adding `nomodeset` to the kernel command line allows sway to start. I tried this on the VM with Intel mentioned above.

A side effect is that the output display is now detected as 'Unknown-1' with the only supported resolution being 640x480.
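
To double-check that this workaround is actually active on a running system, the kernel command line can be inspected. A minimal sketch; the function name `has_nomodeset` is hypothetical and not part of any tool:

```shell
# Return success if the given kernel command line contains "nomodeset".
# has_nomodeset is a hypothetical helper name used only for this sketch.
has_nomodeset() {
    # Pad with spaces so that only a whole word matches.
    case " $1 " in
        *" nomodeset "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the parameters the running kernel was booted with.
if has_nomodeset "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "nomodeset is set: expect software rendering only"
else
    echo "nomodeset is not set"
fi
```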
Comment 10 Joan Torres 2023-06-14 11:05:31 UTC
(In reply to Thomas Zimmermann from comment #7)
> Today, I ran sway on an up-to-date TW with software rendering. That worked.
> 
> You may want to boot the kernel with the nomodeset parameter. That will
> disable any hardware graphics rendering.

That makes sway use the pixman renderer.

The question remains why it fails when using the EGL renderer.

The failure happens here: 
https://gitlab.freedesktop.org/mesa/mesa/-/blob/23.1/src/egl/drivers/dri2/egl_dri2.c#L1404

Might be a problem with glibc, the C compiler or a wrong use of sizeof?
Comment 11 Ed Jackson 2023-06-14 15:42:26 UTC
Can confirm I'm also experiencing this issue, except I'm using Hyprland and an AMD Radeon 680m iGPU on a Ryzen 7 Pro 6850U.

I fixed it by rolling back to a snapshot of 20230610. I'm using the Packman versions of Mesa, but I experience the same issues when using the openSUSE versions.
Comment 12 Soc Virnyl Estela 2023-06-15 02:14:41 UTC
I can confirm this as well. I rolled back to snapshot "20230610". I use Hyprland, and when I ran a distribution upgrade today, I noticed that Mesa had been updated to 23.1. Not sure if it's Mesa *entirely* but I experimented a bit.

The best option for now is to roll back, because I experience graphical issues when locking the Mesa package to a version before 23.1.

As for my setup, I am using a laptop with an NVIDIA 3060 Mobile GPU with Intel i5-10300H. Mesa is from openSUSE, not from Packman.
Comment 13 llyyr 2023-06-15 04:29:52 UTC
This is an issue with OpenSUSE Mesa, wlroots-based renders start properly when Mesa is built locally and not installed from Factory.

You can try by building Mesa locally, then running `meson devenv` and running sway from that env.

I don't know what's wrong with our build but building Mesa-dri and Mesa together resolves the issue.

I had similar issues a few months back when trying out a Mesa repo on OBS that built from the master branch, and reported it*, but didn't realize it would be an issue with how openSUSE builds Mesa.

* https://gitlab.freedesktop.org/mesa/mesa/-/issues/8394
Comment 14 Soc Virnyl Estela 2023-06-15 12:53:32 UTC
I believe we should change the title of this bug report then since we confirmed it's Mesa?
Comment 15 Richard Palethorpe 2023-06-15 15:28:16 UTC
> That makes sway use the pixman renderer.

Another workaround is to set the env var WLR_RENDERER=pixman before starting sway.
Comment 16 llyyr 2023-06-15 15:30:56 UTC
You should not use the pixman renderer, that's software rendering. The vulkan backend should still work just fine, just set WLR_BACKEND=vulkan for now
Comment 17 llyyr 2023-06-15 15:32:09 UTC
(In reply to llyyr from comment #16)
> You should not use the pixman renderer, that's software rendering. The
> vulkan backend should still work just fine, just set WLR_BACKEND=vulkan for
> now

sorry, WLR_RENDERER=vulkan...
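
For anyone trying these workarounds, here is a minimal sketch of selecting the renderer before starting sway. `WLR_RENDERER` and the values `pixman` and `vulkan` come from the comments above, and `gles2` is the wlroots name for the EGL/GLES2 renderer. The helper name `choose_renderer` and the dry-run `echo` are illustrative assumptions:

```shell
# choose_renderer is a hypothetical helper for this sketch.
# It echoes its argument if it names a known wlroots renderer and
# falls back to pixman (software rendering) otherwise.
choose_renderer() {
    case "$1" in
        gles2|vulkan|pixman) echo "$1" ;;
        *) echo "pixman" ;;
    esac
}

# Prefer Vulkan; pass a different name to choose_renderer to try another one.
WLR_RENDERER=$(choose_renderer vulkan)
export WLR_RENDERER

# Dry run: print the command instead of starting the compositor.
echo "WLR_RENDERER=$WLR_RENDERER sway"
```

The Vulkan renderer may not work on every setup, in which case pixman remains available as a slower fallback.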
Comment 18 Richard Palethorpe 2023-06-15 15:47:02 UTC
(In reply to llyyr from comment #17)
> (In reply to llyyr from comment #16)
> > You should not use the pixman renderer, that's software rendering. The
> > vulkan backend should still work just fine, just set WLR_BACKEND=vulkan for
> > now
> 
> sorry, WLR_RENDERER=vulkan...

That doesn't work for me, I get the same error as listed here:
https://github.com/NixOS/nixpkgs/issues/229108

I'm also using amdgpu.
Comment 19 Ed Jackson 2023-06-15 23:08:29 UTC
For now, the solution is just to add package locks to all Mesa-related driver packages and continue updating your system as normal otherwise. On my AMD system this was `zypper addlock libvulkan_radeon-32bit Mesa Mesa-32bit Mesa-dri Mesa-dri-32bit Mesa-gallium Mesa-gallium-32bit Mesa-KHR-devel Mesa-libEGL1 Mesa-libEGL-devel Mesa-libGL1 Mesa-libGL1-32bit Mesa-libGL-devel Mesa-vulkan-device-select-32bit`; on an NVIDIA or Intel system the package list will obviously differ. Hopefully this gets fixed soon.
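
A minimal sketch of the full lock cycle, assuming the package list above (AMD system) and that the installed packages are still the working pre-23.1 ones, for example after a rollback. The commands are only printed here because they need root; the variable name `MESA_PKGS` is illustrative:

```shell
# Package list copied from the comment above (AMD system); adjust for other GPUs.
MESA_PKGS="libvulkan_radeon-32bit Mesa Mesa-32bit Mesa-dri Mesa-dri-32bit \
Mesa-gallium Mesa-gallium-32bit Mesa-KHR-devel Mesa-libEGL1 Mesa-libEGL-devel \
Mesa-libGL1 Mesa-libGL1-32bit Mesa-libGL-devel Mesa-vulkan-device-select-32bit"

# Dry run: print the commands instead of executing them.
echo "zypper addlock $MESA_PKGS"      # hold the packages at their current version
echo "zypper locks"                   # list the active locks
echo "zypper removelock $MESA_PKGS"   # release the locks once a fixed Mesa is available
```

Remember to remove the locks afterwards, otherwise the locked packages will stay at the old version on later updates.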
Comment 20 Richard Palethorpe 2023-06-16 08:16:00 UTC
(In reply to Joan Torres from comment #10)
> The question remains why it fails when using the EGL renderer.
> 
> The failure happens here: 
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/23.1/src/egl/drivers/dri2/egl_dri2.c#L1404
> 
> Might be a problem with glibc, the C compiler or a wrong use of sizeof?

This is really odd, using ltrace it seems malloc is called with no argument?

...
libEGL.so.1->malloc() = <void>
libEGL_mesa.so.0->malloc() = <void>
libgallium_dri.so->malloc() = <void>
libgallium_dri.so->malloc() = <void>
...
libgallium_dri.so->malloc() = <void>
libgallium_dri.so->malloc() = <void>
libgallium_dri.so->malloc() = <void>
00:00:00.854 [ERROR] [wlr] [EGL] command: eglCreateContext, error: EGL_BAD_ALLOC (0x3003), message: "dri2_create_context"
00:00:00.854 [ERROR] [wlr] [render/egl.c:409] Failed to create EGL context
00:00:00.854 [ERROR] [wlr] [render/egl.c:554] Failed to initialize EGL context
libEGL.so.1->malloc() = <void>
libgallium_dri.so->malloc() = <void>
libgallium_dri.so->malloc() = <void>
00:00:00.858 [ERROR] [wlr] [render/gles2/renderer.c:679] Could not initialize EGL
00:00:00.858 [DEBUG] [wlr] [render/wlr_renderer.c:271] Failed to create a GLES2 renderer. Skipping!
00:00:00.858 [ERROR] [wlr] [render/wlr_renderer.c:333] Could not initialize renderer
00:00:00.858 [ERROR] [sway/server.c:79] Failed to create renderer
Comment 21 Stefan Dirsch 2023-06-16 08:41:15 UTC
(In reply to llyyr from comment #13)
> This is an issue with OpenSUSE Mesa, wlroots-based renders start properly
> when Mesa is built locally and not installed from Factory.

You mean when you're running

osc build openSUSE_Tumbleweed  x86_64
osc build -M drivers openSUSE_Tumbleweed  x86_64

on your test machine and install the generated packages it just works for you? Weird ...


> You can try by building Mesa locally, then running `meson devenv` and
> running sway from that env.

Not sure what 'meson devenv' does ..,

 
> I don't know what's wrong with our build but building Mesa-dri and Mesa
> together resolves the issue.

See above.
Comment 22 Fabian Vogt 2023-06-16 09:42:10 UTC
*** Bug 1212433 has been marked as a duplicate of this bug. ***
Comment 23 Joan Torres 2023-06-16 09:52:59 UTC
The problem is with this new change:
https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/mesa/main/context.c#L1006

Packages from Mesa-dri are built with:

 -Dgles1=disabled -Dgles2=disabled

I'm already changing the build args to fix it.
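
To illustrate the kind of change meant here, a minimal sketch that rewrites the flags quoted above. It assumes the fix amounts to switching both options to `enabled`; the actual spec file change is not shown in this comment, and the `build` directory name is illustrative:

```shell
# Flags quoted above: the configuration used for the Mesa-dri packages.
old_args="-Dgles1=disabled -Dgles2=disabled"

# Assumption: the fix switches both GLES options to "enabled".
new_args=$(printf '%s\n' "$old_args" | sed 's/=disabled/=enabled/g')

# Dry run: print the meson invocation instead of configuring a Mesa tree.
echo "meson setup build $new_args"
```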
Comment 24 Thomas Zimmermann 2023-06-16 10:48:20 UTC
Just a driver-by comment:

Starting weston currently aborts with the error that the EGL_ANDROID_native_fence_sync extension is missing. For now, I assume that it is caused by the same problem.
Comment 25 Thomas Zimmermann 2023-06-16 10:48:49 UTC
s/driver-by/drive-by/
Comment 26 Stefan Dirsch 2023-06-16 10:59:16 UTC
(In reply to Joan Torres from comment #23)
> The problem is with this new change:
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/mesa/main/context.c#L1006
> 
> Packages from Mesa-dri are built with:
> 
>  -Dgles1=disabled -Dgles2=disabled
> 
> I'm already changing the build args to fix it.

Thanks, Joan. With this re-enabled, I guess we need to remove some libs after the build of -drivers in the specfile.
Comment 27 Joan Torres 2023-06-16 11:26:56 UTC
Please, can someone test the fix?


zypper addrepo https://download.opensuse.org/repositories/home:jtorres:branches:X11:XOrg/openSUSE_Tumbleweed/home:jtorres:branches:X11:XOrg.repo
zypper refresh
zypper install -f -r home_jtorres_branches_X11_XOrg Mesa-dri
Comment 28 Filippo Bonazzi 2023-06-16 11:32:56 UTC
Thanks Joan, this works for me on my VM. I don't see anything dodgy in journalctl or dmesg either.
Comment 29 Thomas Zimmermann 2023-06-16 11:33:41 UTC
Hi Joan

(In reply to Joan Torres from comment #27)
> Please, can someone test the fix?
> 
> 
> zypper addrepo https://download.opensuse.org/repositories/home:jtorres:branches:X11:XOrg/openSUSE_Tumbleweed/home:jtorres:branches:X11:XOrg.repo
> zypper refresh
> zypper install -f -r home_jtorres_branches_X11_XOrg Mesa-dri

I can confirm that this resolves the problem with weston. Thanks a lot!
Comment 30 Joan Torres 2023-06-16 11:38:18 UTC
Thank you.
Sent a SR: https://build.opensuse.org/request/show/1093479.
Closing this as FIXED.
Comment 31 Guillaume GARDET 2023-06-16 11:49:55 UTC
(In reply to Joan Torres from comment #27)
> Please, can someone test the fix?
> 
> 
> zypper addrepo https://download.opensuse.org/repositories/home:jtorres:branches:X11:XOrg/openSUSE_Tumbleweed/home:jtorres:branches:X11:XOrg.repo
> zypper refresh
> zypper install -f -r home_jtorres_branches_X11_XOrg Mesa-dri

I confirm this fixes the problem seen on aarch64 - Originally reported as bug#1212433
Comment 32 OBSbugzilla Bot 2023-06-16 13:35:03 UTC
This is an autogenerated message for OBS integration:
This bug (1212345) was mentioned in
https://build.opensuse.org/request/show/1093496 Factory / Mesa
Comment 33 Ed Jackson 2023-06-16 14:54:39 UTC
Will this fix automatically propagate to the Packman set of Mesa drivers? If not, how can we go about getting it fixed in Packman?
Comment 34 Stefan Dirsch 2023-06-16 15:48:00 UTC
(In reply to Ed Jackson from comment #33)
> Will this fix automatically propagate to the Packman set of Mesa drivers? If
> not, how can we go about getting it fixed in Packman?

Honestly I don't know anything about the build of the Mesa Packman package ...
Comment 35 Guillaume GARDET 2023-06-16 16:20:21 UTC
(In reply to Stefan Dirsch from comment #34)
> (In reply to Ed Jackson from comment #33)
> > Will this fix automatically propagate to the Packman set of Mesa drivers? If
> > not, how can we go about getting it fixed in Packman?
> 
> Honestly I don't know anything about the build of the Mesa Packman package
> ...

The packman Mesa package is here https://pmbs.links2linux.org/package/show/Essentials/A_tw-Mesa

And it seems there is no linkdiff compared to Factory Mesa, so it should propagate just fine.
Comment 36 Fabian Vogt 2023-06-17 22:24:55 UTC
*** Bug 1212478 has been marked as a duplicate of this bug. ***
Comment 37 Stefan Dirsch 2023-06-18 07:13:17 UTC
*** Bug 1212481 has been marked as a duplicate of this bug. ***
Comment 38 Mykola Krachkovsky 2023-06-18 07:38:16 UTC
*** Bug 1212324 has been marked as a duplicate of this bug. ***
Comment 39 Richard Palethorpe 2023-06-19 10:32:48 UTC
I started an openQA test which should catch similar issues:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/17285