Bug 1212696 - Q965 with modesetting DIX GPU HANG
Summary: Q965 with modesetting DIX GPU HANG
Status: NEW
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Kernel
Version: Leap 15.5
Hardware: Other Other
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: openSUSE Kernel Bugs
QA Contact: E-mail List
URL: https://gitlab.freedesktop.org/xorg/x...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-26 04:52 UTC by Felix Miata
Modified: 2024-05-18 02:50 UTC
CC List: 2 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
mrmazda: needinfo?


Attachments
dmesg from 15.5 using modesetting DIX (485.33 KB, text/plain)
2023-06-26 04:52 UTC, Felix Miata
journalctl -b from using modesetting DIX in 15.5 (588.25 KB, text/plain)
2023-06-26 04:53 UTC, Felix Miata
/var/log/zypp/history block after which problem reproduced (38.97 KB, text/plain)
2023-07-13 22:23 UTC, Felix Miata

Description Felix Miata 2023-06-26 04:52:26 UTC
Created attachment 867811
dmesg from 15.5 using modesetting DIX

Original Summary:
Q965 with modesetting DIX GPU HANG

To reproduce: Try to do anything much in an X session using modesetting DIX on Intel Q965, e.g. in Konsole, run systemd-analyze critical-chain or blame, do something with xrandr, or run glmark2. Before long the DE becomes unresponsive, usually first blanking both displays one or more times, and eventually drops back to the login screen after going black.

First observed in 15.4 with Leap kernel .24.60.
Previous Leap kernel working OK was .24.49.
Subsequently tested: reproducible with .24.63.
Subsequently tested: not reproduced with .24.55.
Not reproducible in Debian 12 with 6.1.20 or 6.1.27 kernel.
Not reproducible in TW with 6.2.12 or 6.2.9 kernel.
Not reproducible using intel DDX display driver instead of modesetting DIX display driver.
Reproduced in both KDE3 and IceWM sessions.

From 15.4, only experienced 5 times to date:
# journalctl | grep -A4 HANG
Apr 28 18:55:40 gx745 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:87edfafe, in X [7201]
Apr 28 18:55:40 gx745 systemd-udevd[1764]: Network interface NamePolicy= disabled by default.
Apr 28 18:55:40 gx745 kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Apr 28 18:55:40 gx745 kernel: i915 0000:00:02.0: [drm] X[7201] context reset due to GPU hang
Apr 28 18:55:40 gx745 kernel: i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
--
Apr 28 18:55:50 gx745 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:9fe7fbfd, in X [7201]
Apr 28 18:55:50 gx745 kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Apr 28 18:55:50 gx745 kernel: i915 0000:00:02.0: [drm] X[7201] context reset due to GPU hang
Apr 28 18:55:50 gx745 kernel: i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
Apr 28 18:56:15 gx745 kernel: i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
--
Jun 25 14:44:42 gx745 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:9fe7fbfd, in X [6894]
Jun 25 14:44:42 gx745 kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Jun 25 14:44:42 gx745 kernel: i915 0000:00:02.0: [drm] X[6894] context reset due to GPU hang
Jun 25 14:44:42 gx745 kernel: i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
Jun 25 14:44:42 gx745 kernel: i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
Jun 25 14:45:44 gx745 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:9fe7fbfd, in X [6894]
Jun 25 14:45:44 gx745 kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Jun 25 14:45:44 gx745 kernel: i915 0000:00:02.0: [drm] X[6894] context reset due to GPU hang
Jun 25 14:45:44 gx745 kernel: i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
Jun 25 14:45:56 gx745 smartd[567]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 193 to 150
--
Jun 25 14:46:28 gx745 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:87edfaf6, in X [6894]
Jun 25 14:46:28 gx745 kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Jun 25 14:46:28 gx745 kernel: i915 0000:00:02.0: [drm] X[6894] context reset due to GPU hang
Jun 25 14:46:28 gx745 kernel: i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
Jun 25 14:46:28 gx745 kernel: i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
# journalctl | grep 'Linux version' | egrep 'Apr 28|Jun 25'
Apr 28 14:15:56 gx745 kernel: Linux version 5.14.21-150400.24.49-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.39.0.20220810-150100.7.40) #1 SMP PREEMPT_DYNAMIC Tue Mar 7 08:07:05 UTC 2023 (bad820e)
Apr 28 14:56:56 gx745 kernel: Linux version 5.14.21-150400.24.60-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.39.0.20220810-150100.7.40) #1 SMP PREEMPT_DYNAMIC Wed Apr 12 12:13:32 UTC 2023 (93dbe2e)
Apr 28 15:01:54 gx745 kernel: Linux version 5.14.21-150400.24.60-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.39.0.20220810-150100.7.40) #1 SMP PREEMPT_DYNAMIC Wed Apr 12 12:13:32 UTC 2023 (93dbe2e)
Jun 25 10:15:10 gx745 kernel: Linux version 5.14.21-150400.24.60-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.39.0.20220810-150100.7.40) #1 SMP PREEMPT_DYNAMIC Wed Apr 12 12:13:32 UTC 2023 (93dbe2e)
Jun 25 10:48:55 gx745 kernel: Linux version 5.14.21-150400.24.63-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.39.0.20220810-150100.7.40) #1 SMP PREEMPT_DYNAMIC Tue May 2 15:49:04 UTC 2023 (fd0cc4f)
Jun 25 13:20:43 gx745 kernel: Linux version 5.14.21-150400.24.63-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.39.0.20220810-150100.7.40) #1 SMP PREEMPT_DYNAMIC Tue May 2 15:49:04 UTC 2023 (fd0cc4f)
Jun 25 17:00:14 gx745 kernel: Linux version 5.14.21-150400.24.41-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.39.0.20220810-150100.7.40) #1 SMP PREEMPT_DYNAMIC Fri Jan 13 08:55:22 UTC 2023 (1d4442d)
Jun 25 17:37:18 gx745 kernel: Linux version 5.14.21-150400.24.55-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.39.0.20220810-150100.7.40) #1 SMP PREEMPT_DYNAMIC Mon Mar 27 15:25:48 UTC 2023 (cc75cf8)
Jun 25 18:40:33 gx745 kernel: Linux version 5.14.21-150400.24.60-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.39.0.20220810-150100.7.40) #1 SMP PREEMPT_DYNAMIC Wed Apr 12 12:13:32 UTC 2023 (93dbe2e)
Jun 25 18:59:34 gx745 kernel: Linux version 5.14.21-150400.24.63-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.39.0.20220810-150100.7.40) #1 SMP PREEMPT_DYNAMIC Tue May 2 15:49:04 UTC 2023 (fd0cc4f)
Jun 25 19:21:55 gx745 kernel: Linux version 5.14.21-150400.24.63-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.39.0.20220810-150100.7.40) #1 SMP PREEMPT_DYNAMIC Tue May 2 15:49:04 UTC 2023 (fd0cc4f)
#
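
To see at a glance which booted kernel each hang belongs to, the two grep passes above can be combined into one. This is a minimal sketch, not something that was run for this report: hangs_per_kernel is a hypothetical helper name, and it assumes the journal text arrives in chronological order, so that every "GPU HANG" line belongs to the most recent preceding "Linux version" line.

```shell
# Count "GPU HANG" lines per booted kernel in journal text read from stdin.
# Hypothetical helper; assumes chronological input.
hangs_per_kernel() {
    awk '
        /kernel: Linux version / {
            # The kernel release is the word right after "version".
            for (i = 1; i <= NF; i++)
                if ($i == "version") { current = $(i + 1); break }
            # Remember first-seen order so the output follows the boot order.
            if (!(current in count)) { count[current] = 0; order[++n] = current }
        }
        /GPU HANG/ && current != "" { count[current]++ }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    '
}
```

Possible usage: journalctl --no-pager | hangs_per_kernel. Kernels that booted without a hang are listed with a count of 0, which makes the working and failing versions easy to compare.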

After the 4th or 5th occurrence in 15.4, I upgraded a clone of this 15.4 to 15.5. In 15.5, this is so readily reproducible that 15.5 is unusable except by using Intel DDX.

# pinxi -GSaz --vs --zl --hostname
pinxi 3.3.27-25 (2023-06-22)
System:
  Host: gx745 Kernel: 5.14.21-150400.24.63-default arch: x86_64 bits: 64
    compiler: gcc v: 7.5.0 parameters: root=LABEL=<filter> ipv6.disable=1
    net.ifnames=0 noresume consoleblank=0 preempt=full mitigations=off
  Desktop: KDE v: 3.5.10 tk: Qt v: 3.3.8c info: kicker wm: kwin vt: 7 dm:
    1: KDM 2: XDM Distro: openSUSE Leap 15.4
Graphics:
  Device-1: Intel 82Q963/Q965 Integrated Graphics vendor: Dell driver: i915
    v: kernel arch: Gen-4 process: Intel 65n built: 2006-07 ports:
    active: DVI-D-1,VGA-1 empty: none bus-ID: 00:02.0 chip-ID: 8086:2992
    class-ID: 0300
  Display: x11 server: X.Org v: 1.20.3 driver: X: loaded: modesetting
    unloaded: fbdev,vesa alternate: intel dri: crocus gpu: i915 display-ID: :0
    screens: 1
  Screen-1: 0 s-res: 3600x1200 s-dpi: 120 s-size: 762x254mm (30.00x10.00")
    s-diag: 803mm (31.62")
  Monitor-1: DVI-D-1 pos: primary,left model: NEC EA243WM serial: <filter>
    built: 2011 res: 1920x1200 hz: 60 dpi: 94 gamma: 1.2
    size: 519x324mm (20.43x12.76") diag: 612mm (24.1") ratio: 16:10 modes:
    max: 1920x1200 min: 640x480
  Monitor-2: VGA-1 pos: right model: Dell P2213 serial: <filter> built: 2012
    res: 1680x1050 hz: 60 dpi: 90 gamma: 1.2 size: 473x296mm (18.62x11.65")
    diag: 558mm (22") ratio: 16:10 modes: max: 1680x1050 min: 720x400
  API: OpenGL v: 2.1 Mesa 21.2.4 renderer: Mesa Intel 965Q (BW)
    direct-render: Yes
Comment 1 Felix Miata 2023-06-26 04:53:10 UTC
Created attachment 867812
journalctl -b from using modesetting DIX in 15.5
Comment 2 Felix Miata 2023-06-26 04:55:09 UTC
> Not reproducible in TW with 6.2.12 or 6.2.9 kernel.

Was supposed to be 6.3.9, not 6.2.9. :(
Comment 3 Stefan Dirsch 2023-06-27 12:38:02 UTC
Looks like a kernel regression. But keep in mind this is still Broadwater, released in 2006. Not sure whether anyone will still look into this ...
Comment 4 Takashi Iwai 2023-07-10 16:07:11 UTC
Please retest with the kernel in OBS Kernel:SLE15-SP5 repo.
Comment 5 Felix Miata 2023-07-10 16:49:24 UTC
No apparent improvement on 15.5 host gx745 using 5.14.21-150500.146.gf998de5-default:
[   38.350359] ext3 filesystem being mounted at /home supports timestamps until 2038 (0x7fffffff)
[   39.007320] EXT4-fs (sda20): mounting ext3 file system using the ext4 subsystem
[   39.042967] EXT4-fs (sda3): mounting ext2 file system using the ext4 subsystem
[   39.172347] EXT4-fs (sda3): mounted filesystem without journal. Opts: (null). Quota mode: none.
[   39.172361] ext2 filesystem being mounted at /disks/boot supports timestamps until 2038 (0x7fffffff)
[   42.361449] EXT4-fs (sda19): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[   42.361467] ext3 filesystem being mounted at /pub supports timestamps until 2038 (0x7fffffff)
[   43.965681] EXT4-fs (sda20): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[   43.965698] ext3 filesystem being mounted at /usr/local supports timestamps until 2038 (0x7fffffff)
[   46.021963] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   47.022024] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   47.025080] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   48.024751] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[  117.807116] i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:9fe7fbfd, in Xorg [697]
[  117.855059] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  117.957901] i915 0000:00:02.0: [drm] Xorg[697] context reset due to GPU hang
[  117.983475] i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
[  124.714669] i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:87edfafe, in Xorg [697]
[  124.739815] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  124.842578] i915 0000:00:02.0: [drm] Xorg[697] context reset due to GPU hang
[  124.868417] i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
Comment 6 Takashi Iwai 2023-07-11 06:12:36 UTC
OK, then install the latest Leap 15.4 kernel on top of Leap 15.5 system and verify that the problem persists.  Also, install the 6.3.9 kernel from OBS home:tiwai:kernel:6.3 "backport" repo, and verify the problem doesn't exist there.  We want to make sure that it's a pure kernel problem.

Then, you can install kernel-default-5.14.21-150400.24.49.  Verify that this works.  If this is OK, you can try to copy i915.ko from this kernel to the latest Leap 15.4 kernel (5.14.21-150400.XXX) directory.  e.g.

  % mkdir -p /lib/modules/5.14.21-150400.XXX/updates
  % cp /lib/modules/5.14.21-150400.24.49-default/kernel/drivers/gpu/drm/i915/i915.ko.* /lib/modules/5.14.21-150400.XXX/updates/

Then depmod, rebuild initrd and retest. If this works, it means that the problem is in i915 driver locally.
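
The procedure above can be spelled out as a sketch. The leading % in the example commands is a shell prompt, not part of the command. print_i915_swap_steps is a hypothetical helper that only prints the commands for review rather than running them. It assumes both arguments are full kernel release strings as reported by uname -r, and that the initrd is rebuilt with dracut, which this comment does not name.

```shell
# Print (do not run) the commands for testing a broken kernel with the
# i915 module taken from a working kernel.
# $1 = working kernel release, $2 = broken kernel release (as in `uname -r`).
# Assumption: dracut is the initrd tool; the comment only says "rebuild initrd".
print_i915_swap_steps() {
    good=$1
    bad=$2
    cat <<EOF
mkdir -p /lib/modules/$bad/updates
cp /lib/modules/$good/kernel/drivers/gpu/drm/i915/i915.ko.* /lib/modules/$bad/updates/
depmod -a $bad
dracut --force --kver $bad
EOF
}
```

Possible usage: print_i915_swap_steps 5.14.21-150400.24.49-default 5.14.21-150400.24.63-default, then run the printed lines as root once they look right. Removing the copied file from the updates directory and repeating the last two commands should undo the change.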
Comment 7 Felix Miata 2023-07-11 21:40:16 UTC
(In reply to Takashi Iwai from comment #6)
> OK, then install the latest Leap 15.4 kernel on top of Leap 15.5 system and
> verify that the problem persists. 

You lost me already with "on top of". The PC has 15.4, 15.5 and TW installed. Which 15.4 kernel, 24.66? Install it on 15.5? I've been trying to wait for the bug 1212957 fix in suse-module-tools before doing any more initrd-related package updates and Leap kernel additions or removals.

> Also, install the 6.3.9 kernel from OBS
> home:tiwai:kernel:6.3 "backport" repo, and verify the problem doesn't exist
> there.  We want to make sure that it's a pure kernel problem.

https://download.opensuse.org/repositories/home:/tiwai:/kernel:/6.3/backport/x86_64/ ? 15.4? 15.5? Both?

> Then, you can install kernel-default-5.14.21-150400.24.49.  Verify that this
> works.  If this is OK, you can try to copy i915.ko from this kernel to the
> latest Leap 15.4 kernel (5.14.21-150400.XXX) directory.  e.g.

24.49 is still installed on 15.4:

# systemctl status purge-kernels
○ purge-kernels.service
     Loaded: masked (Reason: Unit purge-kernels.service is masked.)
     Active: inactive (dead)
# alias | grep rpmqa
alias rpmqa='rpm -qa | sort | grep $*'
# grep RETT /etc/os-release
PRETTY_NAME="openSUSE Leap 15.4"
# rpmqa nel-def
kernel-default-5.14.21-150400.10.2.x86_64
kernel-default-5.14.21-150400.19.1.x86_64
kernel-default-5.14.21-150400.22.1.x86_64
kernel-default-5.14.21-150400.24.21.2.x86_64
kernel-default-5.14.21-150400.24.33.2.x86_64
kernel-default-5.14.21-150400.24.41.1.x86_64
kernel-default-5.14.21-150400.24.49.3.x86_64
kernel-default-5.14.21-150400.24.55.3.x86_64
kernel-default-5.14.21-150400.24.60.1.x86_64
kernel-default-5.14.21-150400.24.63.1.x86_64
kernel-default-5.3.18-150300.59.49.1.x86_64
On 15.5:
# rpmqa nel-def
kernel-default-5.14.21-150400.24.63.1.x86_64
kernel-default-5.14.21-150500.146.1.gf998de5.x86_64
kernel-default-5.14.21-150500.53.2.x86_64
kernel-default-5.3.18-150300.59.49.1.x86_64

>   % mkdir -p /lib/modules/5.14.21-150400.XXX/updates
>   % cp /lib/modules/5.14.21-150400.24.49-default/kernel/drivers/gpu/drm/i915/i915.ko.* /lib/modules/5.14.21-150400.XXX/updates/

When you put % at the start of a command line you lose me, and so do the first ten hits for 'linux shell % command' on Google.

> Then depmod, rebuild initrd and retest. If this works, it means that the
> problem is in i915 driver locally.

Are you looking for all this to be done on 15.5, or 15.4, or both?

Does "in the i915 driver locally" mean *SUSE rather than upstream?
Comment 8 Felix Miata 2023-07-11 22:05:24 UTC
[   44.339325] ext3 filesystem being mounted at /usr/local supports timestamps until 2038 (0x7fffffff)
[   48.733610] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   49.732984] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   49.736022] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   50.735721] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[  122.082823] i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:9fe7fbfd, in Xorg [705]
[  122.097413] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  122.200018] i915 0000:00:02.0: [drm] Xorg[705] context reset due to GPU hang
[  122.225653] i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
^C
# grep RETT /etc/os-release
PRETTY_NAME="openSUSE Leap 15.5"
# uname -r
6.3.9-lp154.1.g0df701d-default
Comment 9 Takashi Iwai 2023-07-12 06:20:08 UTC
It wasn't clear whether the problem could be seen on Leap 15.4 system with the latest Leap 15.4 kernel.  If yes, we can stick with Leap 15.4 at first to identify the regression range.

In that case, the same procedure applies:
- install the working kernel (kernel-default-5.14.21-150400.24.49?), and confirm that it's still working

- install the updated Leap 15.4 kernel that breaks, confirm it's broken

- Copy i915.ko.* from the working one to the broken one (in updates directory), depmod, rebuild initrd and retest. It should show the breakage.
Comment 10 Felix Miata 2023-07-12 07:47:16 UTC
I have a third Leap on gx745, a clone of 15.4 a month ago just zypper upgraded to 15.5, for switching to 15.6 when available (in 3 days), without having removed .24.49 or .24.41, and without having .24.66 yet installed. The hang does reproduce with .24.49, and .24.41.

So I booted to 15.4 and .24.33, and it didn't reproduce simply by opening a KDE3 session, as "15.6" with .24.49 and .24.41 do, or by then opening SeaMonkey too.

Next I booted 15.4 with .24.41, and again neither Konsole nor SeaMonkey reproduced it. Next boot, .24.49: not reproduced. Next, .24.55: again not reproduced. Next, .24.60: again not. Then .24.63: again not.

I don't have .24.66 on 15.4 on gx745 yet. I don't know what happened to cause me to conclude and report the problem exists in 15.4, but it's definitely happening in 15.5, and seems to be tied to something in 15.5 that's different from 15.4 that is not related or not directly related to kernel.

Time for bed now....What to try next?
Comment 11 Takashi Iwai 2023-07-12 08:08:50 UTC
Then check the Leap 15.4 kernel (that should work) on top of Leap 15.5 system.
It's to confirm that it's really a kernel problem, not tightly coupled with something else.

If the Leap 15.4 kernel on top of the Leap 15.5 system works, it's a backport issue.

Then the next step would be to test 6.2.x, 6.1.x, 6.0.x, and so on from my OBS home:tiwai:kernel:6.2, home:tiwai:kernel:6.1, etc.  The "backport" repo contains the kernel for Leap.  If you see breakage in those and it works in some later version, the fix likely landed between those versions.
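
Once several versions have been tested, the window in which a fix may have landed can be read off mechanically. The following is a minimal sketch: find_fix_window is a hypothetical helper, the "VERSION RESULT" input format is an assumption made for this example, and it relies on the -V (version sort) option of GNU sort.

```shell
# Read "VERSION RESULT" lines from stdin (RESULT is "bad" or "good") and
# print each adjacent pair where a bad version is followed by a good one.
# Hypothetical helper; input format is an assumption for illustration.
find_fix_window() {
    sort -V | awk '
        prev_result == "bad" && $2 == "good" { print prev_version " -> " $1 }
        { prev_version = $1; prev_result = $2 }
    '
}
```

For example, with made-up results that are not taken from this report, printf '%s\n' '6.1.12 bad' '6.2.12 bad' '6.3.9 good' | find_fix_window prints 6.2.12 -> 6.3.9. No output means no bad-to-good transition was found among the tested versions.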
Comment 12 Felix Miata 2023-07-12 10:43:42 UTC
(In reply to Takashi Iwai from comment #11)
> Then check the Leap 15.4 kernel (that should work) on top of Leap 15.5
> system.

I did that already (and it doesn't; #3 Leap LABEL 16s156 has no 15.5 or 15.6 kernels installed yet):

(In reply to Felix Miata from comment #10)
> I have a third Leap on gx745, a clone of 15.4 a month ago just zypper
> upgraded to 15.5, for switching to 15.6 when available (in 3 days), without
> having removed .24.49 or .24.41, and without having .24.66 yet installed.
> The hang does reproduce with .24.49, and .24.41.

# grep RETT /etc/os-release
PRETTY_NAME="openSUSE Leap 15.5"
# lsblk -f | grep s15
├─sda10 ext4   1.0   10s155      82643cb0-a621-43f6-a9e9-82e50e839260    1.8G    65% /
├─sda11 ext4   1.0   11s154      8d48a7af-a1a3-45cd-9bf4-eca1c03029c3    1.1G    76% /disks/s154
├─sda16 ext4   1.0   16s156      822316e9-d5cd-4536-aa56-41cd845c7127      2G    62% /disks/s156
# uname -r
5.14.21-150500.146.gf998de5-default
# dmesg (tail)
...
[   44.565368] ext3 filesystem being mounted at /pub supports timestamps until 2038 (0x7fffffff)
[   45.965191] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   46.964773] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   46.974434] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   47.974491] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[  125.172857] i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:9fe7fbfd, in Xorg [702]
[  125.196797] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  125.300439] i915 0000:00:02.0: [drm] Xorg[702] context reset due to GPU hang
[  125.326149] i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
[  131.316798] i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:87edfaf6, in Xorg [702]
[  131.348393] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  131.450031] i915 0000:00:02.0: [drm] Xorg[702] context reset due to GPU hang
[  131.475708] i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
# grep RETT /etc/os-release
PRETTY_NAME="openSUSE Leap 15.5"
# lsblk -f | grep s15
├─sda10 ext4   1.0   10s155      82643cb0-a621-43f6-a9e9-82e50e839260    1.8G    65% /disks/s155
├─sda11 ext4   1.0   11s154      8d48a7af-a1a3-45cd-9bf4-eca1c03029c3    1.1G    76% /disks/s154
├─sda16 ext4   1.0   16s156      822316e9-d5cd-4536-aa56-41cd845c7127      2G    62% /
# uname -r
5.14.21-150400.24.63-default
# dmesg | tail
[   42.429636] EXT4-fs (sda19): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[   42.429656] ext3 filesystem being mounted at /pub supports timestamps until 2038 (0x7fffffff)
[   43.702214] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   44.702321] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   44.705394] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[   45.705336] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
[  106.637718] i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:9fe7fbfd, in Xorg [691]
[  106.669617] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  106.772761] i915 0000:00:02.0: [drm] Xorg[691] context reset due to GPU hang
[  106.798475] i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
Comment 13 Takashi Iwai 2023-07-12 10:54:25 UTC
(In reply to Felix Miata from comment #12)
> (In reply to Takashi Iwai from comment #11)
> > Then check the Leap 15.4 kernel (that should work) on top of Leap 15.5
> > system.
> 
> I did that already (and it doesn't; #3 Leap LABEL 16s156 has no 15.5 or 15.6
> kernels installed yet):

So the problem happens with Leap 15.4 kernel when it's running on Leap 15.5 system, although the very same kernel works fine on Leap 15.4 system?

If yes, it implies that this is no kernel "regression", per se.

(Also, you don't need to set NEEDINFO if you have no concrete question to a specific developer.)
Comment 14 Felix Miata 2023-07-12 11:02:15 UTC
(In reply to Takashi Iwai from comment #13)
> So the problem happens with Leap 15.4 kernel when it's running on Leap 15.5
> system, although the very same kernel works fine on Leap 15.4 system?

Apparently yes, on two different 15.5s that were originally the same 15.4, but each separately upgraded to 15.5 after being cloned from 15.4.
 
> If yes, it implies that this is no kernel "regression", per se.

So do what next?
Comment 15 Takashi Iwai 2023-07-12 11:08:33 UTC
Basically, two things:
- Identify what broke on the user-space side; this is more about X or others
- Fix the GPU hang if possible

The former would be up to you; you can try to downgrade things piece by piece and check whether it works or not.

I guess the latter is hard, as it's a very old chip and Intel would have little interest. But you can try to check whether it's really a kernel regression from some old version, for example. If it's a regression, there is a slightly better chance of getting it fixed.
For that, you can try my old kernel packages from OBS home:tiwai:kernel:X.Y repos.
Comment 16 Felix Miata 2023-07-12 16:57:13 UTC
Does not repro on GM965 (laptop), with 15.5 kernel:
# inxi -GMS
System:
  Host: ts205 Kernel: 5.14.21-150500.53-default arch: x86_64 bits: 64
    Desktop: KDE Plasma v: 5.27.4 Distro: openSUSE Leap 15.5
Machine:
  Type: Laptop System: TOSHIBA product: Satellite A205 v: PSAE3U-06Y023
    serial: 38339383K
  Mobo: TOSHIBA model: ISKAA v: 1.00 serial: 0123456789AB BIOS: TOSHIBA
    v: 2.20 date: 03/10/2008
Graphics:
  Device-1: Intel Mobile GM965/GL960 Integrated Graphics driver: i915
    v: kernel
  Display: x11 server: X.Org v: 1.21.1.4 with: Xwayland v: 22.1.5 driver: X:
    loaded: modesetting dri: crocus gpu: i915 resolution: 1: 1280x800~60Hz
    2: 1680x1050~60Hz
  API: OpenGL v: 2.1 Mesa 22.3.5 renderer: Mesa Intel 965GM (CL)

or with 15.4 kernel:
# inxi -GMS
System:
  Host: ts205 Kernel: 5.14.21-150400.24.63-default arch: x86_64 bits: 64
    Desktop: KDE Plasma v: 5.27.4 Distro: openSUSE Leap 15.5
Machine:
  Type: Laptop System: TOSHIBA product: Satellite A205 v: PSAE3U-06Y023
    serial: 38339383K
  Mobo: TOSHIBA model: ISKAA v: 1.00 serial: 0123456789AB BIOS: TOSHIBA
    v: 2.20 date: 03/10/2008
Graphics:
  Device-1: Intel Mobile GM965/GL960 Integrated Graphics driver: i915
    v: kernel
  Display: x11 server: X.Org v: 1.21.1.4 with: Xwayland v: 22.1.5 driver: X:
    loaded: modesetting dri: crocus gpu: i915 resolution: 1: 1280x800~60Hz
    2: 1680x1050~60Hz
  API: OpenGL v: 2.1 Mesa 22.3.5 renderer: Mesa Intel 965GM (CL)
# dmesg | tail -2
[   21.352631] acpi device:08: registered as cooling_device2
[   22.371785] i915 0000:00:02.0: [drm] not enough stolen space for compressed buffer (need 9830400 more bytes), disabling. Hint: you may be able to increase stolen memory size in the BIOS to avoid this.
Comment 17 Felix Miata 2023-07-13 22:23:40 UTC
Created attachment 868195
/var/log/zypp/history block after which problem reproduced

I recloned 15.4 to my "s156" partition to upgrade to 15.5 in sections, starting with most likely suspects. Eventually I ran out of suspects likely in my eyes, so after renaming /var/log/zypp/history, I did:

# time zypper -v up -d
...
225 packages to upgrade, 11 new, 2 to remove.
Overall download size: 143.8 MiB. Already cached: 0 B. Download only.
Continue? [y/n/v/...? shows all options] (y): y
...
CommitResult  (total 236, done 0, error 0, skipped 236, updateMessages 0)

After rebooting, the problem was back. The history file attached here contains only those transactions. I did since try backleveling irqbalance, but it didn't help. I'd appreciate if anyone could look at the attachment so as to recommend any packages that could conceivably be suspect that I should try reverting.
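
One way to shorten the list of suspects is to filter the history file down to packages from the graphics stack. The following is a minimal sketch: history_suspects is a hypothetical helper, it assumes the pipe-separated zypp history layout in which the second field is the action and the third and fourth are the package name and version, and the default name pattern is only a guess at which packages matter here.

```shell
# List packages installed or upgraded in a zypp history log (read from
# stdin) whose names match a pattern. Hypothetical helper.
# $1 = optional awk regular expression for package names; the default is
#      a guess at the graphics stack, not a verified list.
history_suspects() {
    pattern=${1:-'^(Mesa|libdrm|libgbm|xorg-x11-server|xf86-video|libglvnd)'}
    awk -F'|' -v pat="$pattern" '
        /^#/ { next }                                # skip comment lines
        $2 == "install" && $3 ~ pat { print $3, $4 } # name and version
    '
}
```

Possible usage: history_suspects < /var/log/zypp/history, or with a different pattern such as history_suspects '^kernel-firmware'. Each printed package is a candidate for reverting one at a time.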

I did afterward run zypper dup to complete the upgrade from 15.4 to 15.5:
...
1 package to upgrade, 30 to downgrade, 1 to reinstall, 20  to change vendor.
...
The downgrades were mostly replacing packman AV with OEM. The others:
  kbd                 2.4.0-150400.5.6.1 -> 2.4.0-150400.5.3.1
  kbd-legacy          2.4.0-150400.5.6.1 -> 2.4.0-150400.5.3.1
  libatkmm-1_6-1      2.28.3-150400.4.6.1 -> 2.28.3-150400.4.3.1
  libheif1            1.12.0-150400.3.11.1 -> 1.12.0-150400.3.8.1
  libraw20            0.20.2-150400.3.6.1 -> 0.20.2-150400.3.3.1
  libsigc-2_0-0       2.10.7-150400.3.3.1 -> 2.10.7-150400.1.7
  libvmaf1            2.3.1-bp154.5.1 -> 2.2.0-150400.1.8
  python3-packaging   21.3-150200.3.3.1 -> 20.3-1.9
  systemd-rpm-macros  13-150000.7.33.1 -> 12-150000.7.30.1
  ucode-intel         20230512-150200.24.1 -> 20230214-150200.21.1

Completing the dup didn't change the behavior.

(In reply to Takashi Iwai from comment #15)
> For that, you can try my old kernel packages from OBS home:tiwai:kernel:X.Y
> repos.

# lsattr /boot/initrd-5*
----i---------e------- /boot/initrd-5.14.21-150400.10-default
----i---------e------- /boot/initrd-5.14.21-150400.19-default
----i---------e------- /boot/initrd-5.14.21-150400.22-default
----i---------e------- /boot/initrd-5.14.21-150400.24.21-default
----i---------e------- /boot/initrd-5.14.21-150400.24.33-default
----i---------e------- /boot/initrd-5.14.21-150400.24.41-default
----i---------e------- /boot/initrd-5.14.21-150400.24.49-default
----i---------e------- /boot/initrd-5.14.21-150400.24.55-default
----i---------e------- /boot/initrd-5.14.21-150400.24.60-default
----i---------e------- /boot/initrd-5.14.21-150400.24.63-default
#

As best I remember, .22, .24.33, .24.41, .24.49, .24.55 & .24.63 have all been tried and reproduced on either the original "s156" or the newer one, and/or on the current s155, which includes 5.14.21-150500.53 and 5.14.21-150500.146.gf998de5.
Comment 18 Felix Miata 2023-07-18 06:57:26 UTC
More data suggesting not a kernel issue - every kernel attributable to the following reproduced the problem on 15.5:
# ls -Gg .ini*
-rw------- 1 15999240 Jun 25 14:41 .initrd-5.14.21-150400.24.63-default1
-rw------- 1 17065880 Jul 10 12:32 .initrd-5.14.21-150500.146.gf998de5-default1
-rw------- 1 17071840 Jun 25 16:41 .initrd-5.14.21-150500.53-default1
-rw------- 1 16814400 Jul 18 01:45 .initrd-5.17.9-lp153.1.gc1eda89-default1
-rw------- 1 16668936 Jul 18 01:56 .initrd-5.18.15-lp153.1.g0b7935a-default1
-rw------- 1 16814856 Jul 18 02:04 .initrd-5.19.12-lp153.1.g95fa5b8-default1
-rw------- 1 11688844 Mar  5  2022 .initrd-5.3.18-150300.59.49-default1
-rw------- 1 16864780 Jul 18 02:12 .initrd-6.0.12-lp154.1.ga6c4f4e-default1
-rw------- 1 17322832 Jul 18 02:24 .initrd-6.1.12-lp154.1.g373f017-default1
-rw------- 1 17954568 Jul 18 02:35 .initrd-6.2.12-lp154.1.geb3255d-default1
-rw------- 1 18078916 Jul 11 17:57 .initrd-6.3.9-lp154.1.g0df701d-default1
Comment 19 Felix Miata 2023-09-17 05:04:13 UTC
TW and Debian remain good.
5.14.21-150400.24.81 on 15.4 failed to reproduce.
5.14.21-150500.55.19 on 15.5 reproduced.
5.14.21-150400.24.63 on 15.6 zypper dup'd from 15.4 reproduced.
5.14.21-150500.55.19 on 15.6 reproduced.
6.5.3-lp154.4.1.gba6631b on 15.6 reproduced:
[  103.623513] i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:9fe7fbfd, in Xorg [705]
[  103.640959] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  103.744278] i915 0000:00:02.0: [drm] Xorg[705] context reset due to GPU hang
[  103.769708] i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed
Comment 20 Felix Miata 2023-10-23 01:22:06 UTC
5.14.21-150500.55.28-default in 15.5 reproduces:
[  107.560703] i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:87edfafe, in Xorg [689]
[  107.589473] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[  107.692121] i915 0000:00:02.0: [drm] Xorg[689] context reset due to GPU hang
[  107.717902] i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed

5.14.21-150500.55.31-default in 15.6 reproduces

6.1.0-13-amd64 in Debian Bookworm/12 also reproduces using modesetting DIX, not intel DDX. Apparently I forgot to confirm which display driver was used last time. This time with modesetting DIX:
[ 1058.278236] i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:9fe7fbfd, in Xorg [1495]
[ 1058.301979] i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
[ 1058.405529] i915 0000:00:02.0: [drm] Xorg[1495] context reset due to GPU hang
[ 1058.431491] i915 0000:00:02.0: [drm] Setting output timings on SDVOB failed

6.5.6-1-default on Tumbleweed 20231020 does not reproduce
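
Because each result depends on which display driver was active, it may help to record the driver alongside every test. The following is a minimal sketch: xorg_driver_in_use is a hypothetical helper, and it assumes the usual Xorg log convention in which the modesetting driver tags its messages "modeset(N)" and the intel DDX tags them "intel(N)".

```shell
# Report which X display driver an Xorg log (read from stdin) shows in use.
# Hypothetical helper; relies on the per-screen message tags in the log.
xorg_driver_in_use() {
    awk '
        /\) modeset\([0-9]+\):/ { found["modesetting"] = 1 }
        /\) intel\([0-9]+\):/   { found["intel"] = 1 }
        END { for (d in found) print d }
    ' | sort
}
```

Possible usage: xorg_driver_in_use < /var/log/Xorg.0.log. The log path is an assumption; it differs when X runs without root privileges.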
Comment 21 Felix Miata 2023-10-23 21:25:29 UTC
Given reproduction in Debian 12 and with 6.5.8-lp155.3.g51baea8 on 15.5, I reported upstream:
https://gitlab.freedesktop.org/xorg/xserver/-/issues/1593
Q965 ID 8086:2992 with modesetting DIX GPU HANG
Comment 22 Felix Miata 2024-02-09 21:11:09 UTC
Hanging continues with 6.4.0-150600.4.16 kernel-default.
Comment 23 Felix Miata 2024-03-29 05:34:39 UTC
Comment added upstream (latest 15.5, 15.6 & Debian kernels continue to reproduce the problem; TW 6.6.22-longterm does not):
https://gitlab.freedesktop.org/xorg/xserver/-/issues/1593
Comment 24 Felix Miata 2024-05-15 00:12:38 UTC
New comment upstream. Still bad with 6.7.12-amd64 on Trixie, 15.5's 5.14.21-150500.55.59-default, or 15.6's 6.4.0-150600.17-default.