Bugzilla – Bug 1063139
/usr/bin/X coredump in RRSetChanged()
Last modified: 2019-03-27 17:10:14 UTC
#8  <signal handler called>
#9  RRSetChanged (pScreen=0x5560e172ad90) at randr.c:558
#10 0x00005560e0e9082f in RRScreenSetSizeRange (pScreen=pScreen@entry=0x5560e172ad90, minWidth=<optimized out>, minHeight=<optimized out>, maxWidth=<optimized out>, maxHeight=<optimized out>) at rrinfo.c:228
#11 0x00005560e0e4e693 in xf86RandR12CreateScreenResources12 (pScreen=0x5560e172ad90) at xf86RandR12.c:1795
#12 xf86RandR12CreateScreenResources (pScreen=pScreen@entry=0x5560e172ad90) at xf86RandR12.c:844
#13 0x00005560e0e413f0 in xf86CrtcCreateScreenResources (screen=<optimized out>) at xf86Crtc.c:719
#14 0x00005560e0dd2031 in dix_main (argc=12, argv=0x7ffdbd25b448, envp=<optimized out>) at main.c:208
#15 0x00007fb0014ecf4a in __libc_start_main () from /lib64/libc.so.6
#16 0x00005560e0dbbeba in _start () at ../sysdeps/x86_64/start.S:120
Created attachment 744232 [details]
startx log
Tumbleweed 20171009

> rpm -qf /usr/bin/X
xorg-x11-server-1.19.3-5.1.x86_64

> rpm -q xf86-video-intel
xf86-video-intel-2.99.917+git781.c8990575-1.1.x86_64

> lspci
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device 21da
	Flags: bus master, fast devsel, latency 0, IRQ 30
	Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at 5000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915
Scratch the lspci info from the last comment, this one's better:

> lspci
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
	Subsystem: Dell Device 05a4
	Flags: bus master, fast devsel, latency 0, IRQ 33
	Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at f000 [size=64]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915
Created attachment 744235 [details]
Xorg.0.log
From the stacktrace it looks like it could be some problem with PRIME (multi-GPU support). From the X log it looks like you also have some AMD card in your computer:

PCI: (0:0:2:0) 8086:0412:1028:05a4 rev 6, Mem @ 0xf7800000/4194304, 0xd0000000/268435456, I/O @ 0x0000f000/64
PCI:*(0:1:0:0) 1002:6611:1028:210b rev 0, Mem @ 0xe0000000/268435456, 0xf7c00000/262144, I/O @ 0x0000e000/256, BIOS @ 0x????????/131072

Is there also an AMD GPU in the computer?

(In reply to Klaus Kämpf from comment #2)
> > rpm -qf /usr/bin/X
> xorg-x11-server-1.19.3-5.1.x86_64

The newest version in Tumbleweed is 1.19.4. I don't see any related fixes in 1.19.4, but it is worth trying.

> > rpm -q xf86-video-intel
> xf86-video-intel-2.99.917+git781.c8990575-1.1.x86_64

Strange that it is installed, but the X server is failing to find it. There seem to be some other discrepancies between versions:

[ 68920.106] (II) Module modesetting: vendor="X.Org Foundation"
[ 68920.106]    compiled for 1.19.4, module version = 1.19.4
                             ^^^^^^

Please make sure you have an up-to-date X server and all drivers. If the crash still happens, please attach the coredump as well.
Michal, thanks for looking into this!

(In reply to Michal Srb from comment #5)
> Is there also an AMD GPU in the computer?

Yes:

> lspci
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] (prog-if 00 [VGA controller])
	Subsystem: Dell Radeon R5 240 OEM
	Flags: fast devsel, IRQ 16
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f7c00000 (64-bit, non-prefetchable) [size=256K]
	I/O ports at e000 [size=256]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel modules: radeon, amdgpu

> There seem to be some other discrepancies between versions:
> [ 68920.106] (II) Module modesetting: vendor="X.Org Foundation"
> [ 68920.106]    compiled for 1.19.4, module version = 1.19.4
>                              ^^^^^^

Huh? Shouldn't package dependencies prevent this?
(In reply to Klaus Kämpf from comment #6)
> > There seem to be some other discrepancies between versions:
> > [ 68920.106] (II) Module modesetting: vendor="X.Org Foundation"
> > [ 68920.106]    compiled for 1.19.4, module version = 1.19.4
> >                              ^^^^^^
>
> Huh ? Shouldn't package dependencies prevent this ?

Argh, my bad. Again, I confused remote with local. The remote system is now up to date with

> rpm -q xorg-x11-server
xorg-x11-server-1.19.4-1.1.x86_64

And the crash is with the AMD GPU:

> rpm -q xf86-video-amdgpu
xf86-video-amdgpu-1.4.0-1.1.x86_64

Background: I used to run with multiple graphics cards and multiple monitors. This setup was running fine with Tumbleweed until the Jun/Aug timeframe. Since then, only one card was functioning, but not both together. Since last week (Tumbleweed 20171004) X completely refused to start. Just yesterday I detected the core dumps :-/
Created attachment 744258 [details]
Xorg.0.log from X.Org X Server 1.19.4
(In reply to Klaus Kämpf from comment #6)
> Huh ? Shouldn't package dependencies prevent this ?

Well, in theory different versions should work together as long as they have the same ABI_VIDEODRV_VERSION (they do in this case). Package dependencies only make sure that the ABI_VIDEODRV_VERSION of the drivers matches the one of the X server. But something is broken, so it is worth checking every option; maybe upstream made some changes and forgot to bump ABI_VIDEODRV_VERSION.
Created attachment 744259 [details]
bzip2'ed core

It still crashes at

#9  RRSetChanged (pScreen=0x556fecb88d00) at randr.c:558
#10 0x0000556feb06382f in RRScreenSetSizeRange (pScreen=pScreen@entry=0x556fecb88d00, minWidth=<optimized out>, minHeight=<optimized out>, maxWidth=<optimized out>, maxHeight=<optimized out>) at rrinfo.c:228
#11 0x0000556feb021693 in xf86RandR12CreateScreenResources12 (pScreen=0x556fecb88d00) at xf86RandR12.c:1795
#12 xf86RandR12CreateScreenResources (pScreen=pScreen@entry=0x556fecb88d00) at xf86RandR12.c:844
#13 0x0000556feb0143f0 in xf86CrtcCreateScreenResources (screen=<optimized out>) at xf86Crtc.c:719
Thank you for the coredump. It seems that fbdev is used for the AMD card and modesetting for the Intel card. When setting up the Intel card, it crashes because the Intel card does not have a master set. Afaik it should not even try to set it up that way. This is a bug and I will look into it, but it should not be using those two drivers in the first place.

The X server is trying to load the proper drivers - intel and ati - but fails to find their files. The drivers come from the xf86-video-intel and xf86-video-ati packages. You said that you have xf86-video-intel installed; do you have xf86-video-ati as well? Can you please verify that their files are OK? (rpm -qV xf86-video-intel xf86-video-ati)

If the files are there, could you record an strace of the X server so we know why it is failing to find their files?
Also, we had a regression in 1.19.4, which has been addressed in 1.19.5. Klaus has installed the -amdgpu driver for unknown reasons. It appears to be the wrong driver; it should be -ati instead. OTOH we want to switch to modesetting for everything supporting KMS, right? Is this not supported by PRIME?
(In reply to Stefan Dirsch from comment #12)
> Also, we had a regression in 1.19.4, which has been addressed in 1.19.5.
> Klaus has installed the -amdgpu driver for unknown reasons. It appears to be
> the wrong driver; it should be -ati instead.

I had no idea which one to install. I did some experiments in the past installing only one of -amdgpu, -ati, or -intel. To no avail. Installing xf86-video-ati (alongside xf86-video-intel) also led to the same crash.

To be continued next week ... thanks a lot so far!
Running under gdb reveals:

Thread 1 "X" received signal SIGSEGV, Segmentation fault.
RRSetChanged (pScreen=0x555555a96fd0) at randr.c:558
558             mastersp->changed = TRUE;
(gdb) print mastersp
$1 = (rrScrPrivPtr) 0x0
Looking further:

(gdb) print pScreen->isGPU
$2 = 1

=> rrGetScrPriv() returns NULL
I've seen this in the coredump; I know where it is crashing. It is trying to set up the intel GPU (which is using the modesetting driver) to be a slave to the radeon GPU (which is using the fbdev driver). This is of course broken; fbdev cannot be a DRM PRIME master - it is not a DRM driver... This should be fixed, but it is not a priority at the moment.

The more important question is why your system is using the modesetting and fbdev drivers when xf86-video-intel and now also xf86-video-ati are installed. The X server logs show that it could not find their files, which sounds like broken packages/filesystem. That's why I asked you to:

(In reply to Michal Srb from comment #11)
> Can you please verify that their files are OK?
> (rpm -qV xf86-video-intel xf86-video-ati)
>
> If the files are there, could you record an strace of the X server so we
> know why it is failing to find their files?
(In reply to Michal Srb from comment #16)
> That's why I asked you to:
>
> (In reply to Michal Srb from comment #11)
> > Can you please verify that their files are OK?
> > (rpm -qV xf86-video-intel xf86-video-ati)
> >
> > If the files are there, could you record an strace of the X server so we
> > know why it is failing to find their files?

Sorry, I missed this comment.

(rpm -qV xf86-video-intel xf86-video-ati) does not return any output. Both packages were re-installed by me after the last TW update.
Created attachment 744814 [details]
bzip of strace output

The following command was used to generate the strace output:

strace -o x.out /usr/bin/X vt7 -displayfd 3 -auth /run/user/417/gdm/Xauthority -background none -noreset -keeptty -verbose 3
The strace output is a mess and hardly readable. Can you please also provide a regular Xorg.0.log with the xf86-video-intel and xf86-video-ati packages installed?
Created attachment 744837 [details]
Requested Xorg.0.log

Can I do anything to improve the strace output?
OMG. Now vesa and intel X drivers are used. :-(
Created attachment 744911 [details]
Xorg.0.log after removal of xf86-video-vesa

Hmm, "no screens found" if I remove xf86-video-vesa.
Alright, now the situation looks a bit different. Both the strace and the logs in comment 20 and comment 22 show that xf86-video-ati is found and loaded. However, it fails to initialize; in the case of comment 20 it falls back to VESA, and in the case of comment 22 it has no options left and terminates.

The driver prints this in the log:

[KMS] drm report modesetting isn't supported.

That is because drmCheckModesettingSupported returned 0. The strace shows that it tried to find a /dev/dri/card* file belonging to the card, but the only one present is card0, which belongs to the intel. This sounds like the kernel driver is not loaded. Can you please provide the output of lsmod and the kernel log?
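(For illustration, a diagnostic sketch -- not from the report -- that walks /sys/class/drm and prints which PCI device and kernel driver sit behind each card node. On the affected machine only card0, bound to i915, would show up, confirming that the radeon KMS driver never created a node for the AMD card:)

```shell
#!/bin/sh
# List each DRM card node with the PCI device and kernel driver behind it.
found=0
for card in /sys/class/drm/card[0-9]; do
    [ -e "$card" ] || continue
    found=1
    dev=$(readlink -f "$card/device")
    if [ -e "$dev/driver" ]; then
        drv=$(basename "$(readlink -f "$dev/driver")")
    else
        drv="(no driver bound)"
    fi
    echo "${card##*/}: pci=$(basename "$dev") driver=$drv"
done
# Fall back to an explanatory line on machines without any KMS device.
[ "$found" -eq 1 ] || echo "no /sys/class/drm/card* nodes found"
```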
(In reply to Michal Srb from comment #23)
> Alright, now the situation looks a bit different. Both the strace and the
> logs in comment 20 and comment 22 show that xf86-video-ati is found and
> loaded. However, it fails to initialize; in the case of comment 20 it falls
> back to VESA, and in the case of comment 22 it has no options left and
> terminates.

Why doesn't it fall back to the intel card?
Created attachment 744927 [details]
lsmod
Created attachment 744928 [details]
bzip2'ed output of "journalctl -t kernel"
(In reply to Klaus Kämpf from comment #24)
> (In reply to Michal Srb from comment #23)
> > Alright, now the situation looks a bit different. Both the strace and the
> > logs in comment 20 and comment 22 show that xf86-video-ati is found and
> > loaded. However, it fails to initialize; in the case of comment 20 it
> > falls back to VESA, and in the case of comment 22 it has no options left
> > and terminates.
>
> Why doesn't it fall back to the intel card?

Good question. It's yet another bug, I would say. But please let's concentrate now on why radeon KMS support isn't working.
For some reason amdgpu instead of the radeon KMS driver is loaded and apparently does not work, not even with the generic modesetting X driver it seems. Please try these kernel options:

radeon.si_support=1 amdgpu.si_support=0
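(To make such parameters persist across reboots, a sketch assuming openSUSE's GRUB2 setup; the value shown is an example -- append the parameters to your existing GRUB_CMDLINE_LINUX_DEFAULT rather than replacing it:)

```shell
# /etc/default/grub -- example; keep your other options in the quotes
GRUB_CMDLINE_LINUX_DEFAULT="quiet radeon.si_support=1 amdgpu.si_support=0"

# then regenerate the bootloader config:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```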
Our kernel config for openSUSE:Factory:

[...]
CONFIG_DRM_AMDGPU_SI=y
[...]

This explains a lot! I want this change reverted!
-------------------------------------------------------------------
Mon Oct 17 19:37:51 CEST 2016 - jeffm@suse.com

- Update to 4.9-rc1.
  [...]
  - DRM:
    - DRM_AMDGPU_SI=y
  [...]

So this has been broken for a year in TW ...
Klaus, can you confirm that this kernel option is also set on your system?

zcat /proc/config.gz | grep DRM_AMDGPU_SI
(In reply to Stefan Dirsch from comment #31)
> Klaus, can you confirm that this kernel option is also set on your system?
>
> zcat /proc/config.gz | grep DRM_AMDGPU_SI

Confirmed :-(
(In reply to Stefan Dirsch from comment #28)
> For some reason amdgpu instead of the radeon KMS driver is loaded and
> apparently does not work, not even with the generic modesetting X driver it
> seems. Please try these kernel options:
>
> radeon.si_support=1 amdgpu.si_support=0

No, this does not improve the situation.
Created attachment 744943 [details]
Xorg.0.log with radeon.si_support=1 amdgpu.si_support=0
When I re-install xf86-video-vesa, I'm back at the segfault in RRSetChanged()
I think the "radeon.si_support=1 amdgpu.si_support=0" options are the default. These lines are in the kernel log:

amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
amdgpu 0000:01:00.0: SI support provided by radeon.
amdgpu 0000:01:00.0: Use radeon.si_support=0 amdgpu.si_support=1 to override.

They come from the amdgpu_driver_load_kms function. They get printed when amdgpu detects that the card is too old and should be handled by the radeon driver. It then returns -ENODEV and the amdgpu driver is not used. It suggests the opposite parameters that would disable radeon and enable amdgpu for this card, but that is experimental and we don't want that. I think that disabling CONFIG_DRM_AMDGPU_SI will just remove the option to opt in to this experimental support.

It seems to me that amdgpu keeps its hands off the card correctly. What I don't understand is why the radeon driver doesn't get used for it after that.
(In reply to Michal Srb from comment #36)
> It seems to me that amdgpu keeps its hands off the card correctly.

I agree.

> What I don't understand is why the radeon driver doesn't get used for it
> after that.

Exactly. We need to know what happens if you run modprobe radeon manually, and the dmesg output afterwards. Even better, the dmesg output *before* and after running:

dmesg -c > /dev/null
modprobe radeon
Created attachment 744955 [details]
dmesg "before" (no additional kernel parameters)
Created attachment 744956 [details]
dmesg "after" (no additional kernel parameters)

With the 'radeon' module loaded, X starts again. But it doesn't use both cards.
Created attachment 744959 [details]
Xorg.0.log

With the radeon module loaded, startx succeeds (amd card only). Xorg.0.log attached.
(In reply to Klaus Kämpf from comment #40)
> With the radeon module loaded, startx succeeds (amd card only). Xorg.0.log
> attached.

The log shows that both (user-space) radeon and intel drivers are used. Maybe PRIME failed to associate the cards as output sink/source automatically. Let's add it to the list of issues to solve later.

So we found out that the manually loaded (kernel) radeon driver works. The question remains why it is not loaded automatically.
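(As a stopgap until the autoloading bug is understood, the module could be force-loaded at boot. This is a hypothetical workaround, not something proposed in this report; systemd reads one module name per line from modules-load.d:)

```shell
# /etc/modules-load.d/radeon.conf -- systemd-modules-load loads every
# module listed here at boot, sidestepping the broken autoload
radeon
```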
Well, X starts, but it's unusable (horrible flickering).
(In reply to Klaus Kämpf from comment #39)
> Created attachment 744956 [details]
> dmesg "after" (no additional kernel parameters)

[  105.932494] radeon 0000:01:00.0: Invalid PCI ROM data signature: expecting 0x52494350, got 0xe808aa55
[...]
[  106.123771] radeon 0000:01:00.0: failed VCE resume (-110).

Hmm. Something sounds wrong here. Maybe kernel-firmware is not installed, or outdated, or whatever. Or even the hardware is broken. Could you verify that the graphics card's fan is still rotating? ;-)
Honestly, if it's possible to disable the Intel card in the Firmware/BIOS, I would like to know whether the radeon card works during boot.
(In reply to Stefan Dirsch from comment #44)
> Honestly, if it's possible to disable the Intel card in the Firmware/BIOS, I
> would like to know whether the radeon card works during boot.

Yes, actually both cards work during boot and show boot log messages (first the ati card, then kernel logging surprisingly switches to the intel card).
BTW, I'm using the same graphics card (I guess you're using the same Dell desktop), but with the Intel onboard disabled. And indeed mine broke, with the fan no longer rotating! I replaced it with the same graphics card model. It works fine with KOTD installed (4.14.0-rc4-1.g879f297-default) and kernel-firmware-20171009-181.1 (radeon X driver).
(In reply to Stefan Dirsch from comment #44)
> Honestly, if it's possible to disable the Intel card in the Firmware/BIOS, I
> would like to know whether the radeon card works during boot.

I can't disable a card, but I can configure the primary display in the BIOS.

If I choose the ATI card, kernel messages appear on the ATI displays (2 monitors are attached to the ATI card) during boot, up to a specific point where the kernel switches message output to the Intel display. The GDM login appears on the ATI displays. But it's unusable due to flickering.

If I choose the Intel card, all kernel messages appear on the Intel card. The ATI displays stay completely dark. Running "startx" provides a GNOME desktop on the Intel display. But it's unusable due to flickering.

==> Both cards work separately. I can choose the card via BIOS setting. The actual desktop is broken for both cards (excessive flickering) though.
Latest Tumbleweed (20171102) still does not auto-load the radeon kernel module.
Also, X/GDM stays unusable due to excessive flickering.
Could this be causing host gx151 coredumps in TW20180202? Dell Motherboard/CPU has i945G, but VGA on Radeon is in PCIe slot:

# uname -a
Linux gx151 4.14.15-2-pae #1 SMP PREEMPT Mon Jan 29 08:15:43 UTC 2018 (9a6fca5) i686 i686 i386 GNU/Linux
# cat /proc/cmdline
root=LABEL=osTWst80 ipv6.disable=1 net.ifnames=0 noresume vga=791 video=1024x768@60 video=1440x900@60 3
# inxi -c0 -G
Graphics:  Card: Advanced Micro Devices [AMD/ATI] RV370 [Radeon X600/X600 SE]
           Display Server: X.org 1.19.6 drivers: ati,radeon (unloaded: modesetting,fbdev,vesa)
           tty size: 180x56 Advanced Data: N/A for root out of X
# dmesg | tail
[   65.729996] Key type id_legacy registered
[  657.370713] systemd-coredump[1431]: Not enough arguments passed by the kernel (0, expected 6).
[  659.208028] systemd-coredump[1434]: Not enough arguments passed by the kernel (0, expected 6).
# journalctl -b | grep edump
Feb 04 06:23:18 gx151 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Feb 04 06:23:19 gx151 systemd-coredump[1204]: Process 1200 (X) of user 0 dumped core.
Feb 04 06:28:05 gx151 systemd-coredump[1415]: Process 1411 (X) of user 0 dumped core.
Feb 04 06:28:59 gx151 systemd-coredump[1431]: Not enough arguments passed by the kernel (0, expected 6).
Feb 04 06:29:01 gx151 systemd-coredump[1434]: Not enough arguments passed by the kernel (0, expected 6).
Feb 04 06:36:56 gx151 systemd-coredump[1612]: Process 1608 (X) of user 0 dumped core.
Feb 04 06:37:22 gx151 systemd-coredump[1691]: Process 1687 (X) of user 0 dumped core.
Feb 04 06:43:47 gx151 systemd-coredump[1914]: Process 1910 (X) of user 0 dumped core.
# rpm -qa | grep 11-server
xorg-x11-server-1.19.6-2.1.i586
xorg-x11-server-extra-1.19.6-2.1.i586

Removing the rv370 and using i945G doesn't stop the coredumps:

# tail of Xorg.0.log running on i945G
[   154.969] (EE)
[   154.969] (EE) Backtrace:
[   154.970] (EE) 0: X (xorg_backtrace+0x50) [0x5f6230]
[   154.970] (EE) 1: X (0x44e000+0x1ac1b2) [0x5fa1b2]
[   154.970] (EE) 2: linux-gate.so.1 (__kernel_rt_sigreturn+0x0) [0xb7f62d10]
[   154.970] (EE) 3: linux-gate.so.1 (__kernel_vsyscall+0x9) [0xb7f62cf9]
[   154.970] (EE) 4: /lib/libc.so.6 (gsignal+0xc2) [0xb77b88e2]
[   154.970] (EE) 5: /lib/libc.so.6 (abort+0x121) [0xb77b9fd1]
[   154.970] (EE) 6: /lib/libc.so.6 (0xb778a000+0x266cb) [0xb77b06cb]
[   154.970] (EE) 7: /lib/libc.so.6 (0xb778a000+0x2672b) [0xb77b072b]
[   154.970] (EE) 8: X (0x44e000+0x280da) [0x4760da]
[   154.970] (EE) 9: X (0x44e000+0xcf300) [0x51d300]
[   154.970] (EE) 10: /usr/lib/xorg/modules/extensions/libglx.so (0xb7384000+0x26d8f) [0xb73aad8f]
[   154.970] (EE) 11: /usr/lib/xorg/modules/extensions/libglx.so (0xb7384000+0x2e2fa) [0xb73b22fa]
[   154.970] (EE) 12: /usr/lib/xorg/modules/extensions/libglx.so (0xb7384000+0x25729) [0xb73a9729]
[   154.970] (EE) 13: X (InitExtensions+0x4b) [0x500dcb]
[   154.971] (EE) 14: X (0x44e000+0x40ca2) [0x48eca2]
[   154.971] (EE) 15: X (0x44e000+0x2a2ca) [0x4782ca]
[   154.971] (EE) 16: /lib/libc.so.6 (__libc_start_main+0xf3) [0xb77a27b3]
[   154.971] (EE) 17: X (0x44e000+0x2a308) [0x478308]

Fatal server error:
[   154.971] (EE) Caught signal 6 (Aborted). Server aborting
[   155.044] (EE) Server terminated with error (1). Closing log file.

No help using the prior 4.13.11 kernel either.
Klaus, is this still an issue with current TW?
(In reply to Stefan Dirsch from comment #53)
> Klaus, is this still an issue with current TW?

No, this seems fixed. Thanks!