Bug 1219522 - Kernel panic with 6.7.x version
Summary: Kernel panic with 6.7.x version
Status: NEW
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Priority: P5 - None, Severity: Major
Target Milestone: ---
Assignee: openSUSE Kernel Bugs
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-03 17:36 UTC by Edwin KM
Modified: 2024-04-06 12:03 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
dmesg 6.6.11-1 kernel (81.27 KB, text/plain)
2024-02-03 17:36 UTC, Edwin KM
stops at this point (606.07 KB, image/jpeg)
2024-02-05 14:35 UTC, Edwin KM
failed boot (14.75 MB, video/vnd.avi)
2024-02-07 19:56 UTC, Edwin KM
kernel panic with "working" kernel (464.85 KB, image/jpeg)
2024-02-07 19:59 UTC, Edwin KM
crash 1 (9.20 MB, image/jpeg)
2024-03-31 12:27 UTC, Edwin KM
crash 2 (8.70 MB, image/jpeg)
2024-03-31 12:27 UTC, Edwin KM
crash 3 (8.69 MB, image/jpeg)
2024-03-31 12:28 UTC, Edwin KM
Test fix patch (591 bytes, patch)
2024-04-01 12:50 UTC, Takashi Iwai
dmesg of the 6.8.2-2.1.gb88b81e kernel (no panics seen yet) (95.15 KB, text/plain)
2024-04-03 15:43 UTC, Edwin KM
fail01_6.8.2-2.gb88b81e-default (100.38 KB, text/plain)
2024-04-04 19:51 UTC, Edwin KM
fail02_6.8.2-2.gb88b81e-default (98.45 KB, text/plain)
2024-04-04 20:01 UTC, Edwin KM
ok_6.8.2-2.gb88b81e-default (93.83 KB, text/plain)
2024-04-06 11:18 UTC, Edwin KM

Description Edwin KM 2024-02-03 17:36:04 UTC
Created attachment 872422 [details]
dmesg 6.6.11-1 kernel

I have been getting a kernel panic since the 6.7.x kernels.
The Caps Lock LED blinks, and if I use "recovery mode" I usually see:
  [T8] psmouse serio1: synaptics: Touchpad model: 1, fw: 8.16, id: 0x1e2b1, caps: 0xf01fa3/0x940300/0x12e800/0x400000, board id: 3276, fw id: 2700068
  [T8] psmouse serio1: synaptics: serio: Synaptics pass-through port at isa0060/serio1/input0
  [T8] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input2
  
I have no idea how to debug this. There is no error message. In the past I did a kernel bisect (for Debian, I think), but I could not find a guide for Tumbleweed.

Hardware: Lenovo T580

Grub menu contains:
* 6.7.2-1
* 6.7.1-2
* 6.6.11-1

Included a "dmesg" output of a normal boot with the "6.6.11-1" kernel.
Comment 1 Takashi Iwai 2024-02-04 09:20:55 UTC
First, check with the 6.7.3 kernel in the OBS Kernel:stable repo.
  http://download.opensuse.org/repositories/Kernel:/stable/standard/

If the problem persists, try removing the "verbose" and "splash=...." boot options.  This might give you a bit more insight.

If the problem still isn't visible, try booting with the "nomodeset" option instead.  If this works, the problem lies in the graphics driver.

In addition, you can try the 6.8-rc kernel in the OBS Kernel:HEAD repo, too
  http://download.opensuse.org/repositories/Kernel:/HEAD/standard/
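
To make the option changes suggested above concrete, here is a minimal sketch that previews a kernel command line with "verbose", "quiet" and any "splash=..." word dropped and "nomodeset" appended. The helper name adjust_cmdline is hypothetical and exists only for illustration. On a real system the edit is made by pressing "e" on the entry in the GRUB menu and changing the line that starts with "linux" (or "linuxefi").

```shell
#!/bin/sh
# Hypothetical helper (not part of any openSUSE tool): print the given
# kernel command line without "quiet", "verbose" and "splash=...", and
# with "nomodeset" appended exactly once.
adjust_cmdline() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        grep -v -x -e 'quiet' -e 'verbose' -e 'nomodeset' \
                   -e 'splash=.*' -e '^$' |
        tr '\n' ' '
    printf 'nomodeset\n'
}

# Preview the change against the command line of the running kernel:
# adjust_cmdline "$(cat /proc/cmdline)"
```

The function only prints the adjusted line. It does not change the boot loader configuration, so it is safe to run on a working kernel.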
Comment 2 Edwin KM 2024-02-04 16:57:58 UTC
I am using "recovery mode". I do not see a "splash" option there, nor a "verbose" option.
I tried "${extra_cmdline} nomodeset" and it does not change anything.


I am hesitant to install kernels. My GRUB list contains 3 entries, 2 of them broken. If installing removes my working kernel, I cannot boot anymore.

It is also unclear which rpm I should install.
Comment 3 Takashi Iwai 2024-02-04 17:06:22 UTC
(In reply to Edwin KM from comment #2)
> I am hesitant to install kernels. My grub list contains 3 items. 2 broken.
> If it will remove my working kernel i can not boot anymore.

Increase the number of installable kernels by editing /etc/zypp/zypp.conf before installing more test kernels.  Add entries to the multiversion.kernels line, e.g.
  multiversion.kernels = latest,latest-1,latest-2,latest-3,running
so that the system can keep more kernel packages.

> Also unclear which rpm i should install.

Just kernel-default.rpm should suffice.
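
The zypp.conf change described above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the function name set_multiversion_kernels is made up for illustration, and the file is assumed to already contain an uncommented multiversion.kernels line. Trying it on a copy of the file first seems prudent.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the "multiversion.kernels" line of a
# zypp.conf-style file so that more kernel packages are kept installed.
# $1 = path to the config file, $2 = new comma-separated value
set_multiversion_kernels() {
    conf=$1
    value=$2
    tmp=$(mktemp) || return 1
    # Replace only the existing uncommented line; leave everything else alone.
    if sed "s|^[[:space:]]*multiversion\.kernels[[:space:]]*=.*|multiversion.kernels = ${value}|" \
            "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"   # overwrite in place, keeping file permissions
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Example (as root, after backing up the file):
# set_multiversion_kernels /etc/zypp/zypp.conf latest,latest-1,latest-2,latest-3,running
```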
Comment 4 Edwin KM 2024-02-05 08:32:37 UTC
Not a success. Hope it does not break my system.

sudo rpm -i kernel-default-6.8~rc3-1.1.gae4495f.x86_64.rpm
warning: kernel-default-6.8~rc3-1.1.gae4495f.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 03579c1d: NOKEY
dracut[I]: Executing: /usr/bin/dracut -f /boot/initrd-6.8.0-rc3-1.gae4495f-default 6.8.0-rc3-1.gae4495f-default
dracut[I]: Module 'systemd-pcrphase' will not be installed, because command '/usr/lib/systemd/systemd-pcrphase' could not be found!
dracut[I]: Module 'systemd-portabled' will not be installed, because command 'portablectl' could not be found!
dracut[I]: Module 'systemd-portabled' will not be installed, because command '/usr/lib/systemd/systemd-portabled' could not be found!
dracut[I]: Module 'systemd-repart' will not be installed, because command 'systemd-repart' could not be found!
dracut[I]: Module 'dbus-broker' will not be installed, because command 'dbus-broker' could not be found!
dracut[I]: Module 'rngd' will not be installed, because command 'rngd' could not be found!
dracut[I]: Module 'connman' will not be installed, because command 'connmand' could not be found!
dracut[I]: Module 'connman' will not be installed, because command 'connmanctl' could not be found!
dracut[I]: Module 'connman' will not be installed, because command 'connmand-wait-online' could not be found!
dracut[I]: Module 'tpm2-tss' will not be installed, because command 'tpm2' could not be found!
dracut[I]: Module 'nvmf' will not be installed, because command 'nvme' could not be found!
dracut[I]: Module 'biosdevname' will not be installed, because command 'biosdevname' could not be found!
dracut[I]: Module 'memstrack' will not be installed, because command 'memstrack' could not be found!
dracut[I]: memstrack is not available
dracut[I]: If you need to use rd.memdebug>=4, please install memstrack and procps-ng
dracut[I]: Module 'squash' will not be installed, because command 'mksquashfs' could not be found!
dracut[I]: Module 'squash' will not be installed, because command 'unsquashfs' could not be found!
dracut[I]: Module 'systemd-pcrphase' will not be installed, because command '/usr/lib/systemd/systemd-pcrphase' could not be found!
dracut[I]: Module 'systemd-portabled' will not be installed, because command 'portablectl' could not be found!
dracut[I]: Module 'systemd-portabled' will not be installed, because command '/usr/lib/systemd/systemd-portabled' could not be found!
dracut[I]: Module 'systemd-repart' will not be installed, because command 'systemd-repart' could not be found!
dracut[I]: Module 'dbus-broker' will not be installed, because command 'dbus-broker' could not be found!
dracut[I]: Module 'rngd' will not be installed, because command 'rngd' could not be found!
dracut[I]: Module 'connman' will not be installed, because command 'connmand' could not be found!
dracut[I]: Module 'connman' will not be installed, because command 'connmanctl' could not be found!
dracut[I]: Module 'connman' will not be installed, because command 'connmand-wait-online' could not be found!
dracut[I]: Module 'tpm2-tss' will not be installed, because command 'tpm2' could not be found!
dracut[I]: Module 'nvmf' will not be installed, because command 'nvme' could not be found!
dracut[I]: Module 'memstrack' will not be installed, because command 'memstrack' could not be found!
dracut[I]: memstrack is not available
dracut[I]: If you need to use rd.memdebug>=4, please install memstrack and procps-ng
dracut[I]: Module 'squash' will not be installed, because command 'mksquashfs' could not be found!
dracut[I]: Module 'squash' will not be installed, because command 'unsquashfs' could not be found!
dracut[I]: *** Including module: systemd ***
dracut[I]: *** Including module: systemd-initrd ***
dracut[I]: *** Including module: i18n ***
dracut[I]: *** Including module: drm ***
dracut[I]: *** Including module: plymouth ***
dracut[I]: *** Including module: btrfs ***
dracut[I]: *** Including module: kernel-modules ***
dracut[I]: *** Including module: kernel-modules-extra ***
dracut[I]: *** Including module: zfs ***
dracut-install: Failed to find module 'zfs'
dracut[E]: FAILED:  /usr/lib/dracut/dracut-install -D /var/tmp/dracut.uflnAv/initramfs -H -N ^i2o_scsi$ --kerneldir /lib/modules/6.8.0-rc3-1.gae4495f-default/ -m zfs
dracut[F]: installkernel failed in module zfs
warning: %post(kernel-default-6.8~rc3-1.1.gae4495f.x86_64) scriptlet failed, exit status 1
Comment 5 Takashi Iwai 2024-02-05 08:37:15 UTC
(In reply to Edwin KM from comment #4)
> dracut[I]: *** Including module: zfs ***
> dracut-install: Failed to find module 'zfs'
> dracut[E]: FAILED:  /usr/lib/dracut/dracut-install -D
> /var/tmp/dracut.uflnAv/initramfs -H -N ^i2o_scsi$ --kerneldir
> /lib/modules/6.8.0-rc3-1.gae4495f-default/ -m zfs
> dracut[F]: installkernel failed in module zfs
> warning: %post(kernel-default-6.8~rc3-1.1.gae4495f.x86_64) scriptlet failed,
> exit status 1

Do you use zfs, i.e. an out-of-tree module?  There is no guarantee that the kernel would work with such a module, of course.
Comment 6 Edwin KM 2024-02-05 08:55:19 UTC
No custom stuff (afaik).
The error seems normal?
https://forums.opensuse.org/t/upgrade-to-kernel-default-6-5-4-1-1-fails-on-zfs-module-install/169706/5

But grub does not contain another kernel to select.
Comment 7 Takashi Iwai 2024-02-05 09:10:08 UTC
OK, relieved :)

And why are you booting in recovery mode?  Didn't the normal boot work in the past?  Recovery mode isn't meant for daily use.
Comment 8 Edwin KM 2024-02-05 09:20:33 UTC
(In reply to Takashi Iwai from comment #7)
> OK, relieved :)
> 
> And why are you booting with recovery mode?  Didn't the normal boot work in
> the past?  The recovery mode isn't meant for the daily use.

For normal day-to-day usage I use the normal GRUB entry. (I still need to figure out how to make it stick to the old kernel, because I have to wait 2 seconds after each click in the GRUB menu. That still seems to be an issue years later.)

I only use recovery mode to see the log lines. I hoped it would show something useful for debugging such an issue.

I am not sure what the plan is here. I can update to a new kernel and maybe it is already fixed. In that case, can I also just wait some months and then update openSUSE?
Comment 9 Takashi Iwai 2024-02-05 09:41:28 UTC
Hm, and if you boot the kernel normally but with the nomodeset option and with verbose & splash=* removed, you still don't see any messages at the crash?
If so, at which point does it crash?  Is it at a very early stage?

It's difficult to judge without knowing what's going on.

And you can try the 6.8-rc kernel as mentioned.  If it works, there is a good chance that 6.7.x will pick up the fix later.  OTOH, if it doesn't work with 6.8-rc, it's something to be addressed upstream.
Comment 10 Edwin KM 2024-02-05 14:34:32 UTC
I have included a photo. After this point the Caps Lock LED is blinking.

* "verbose" is not in the GRUB menu entry, so there is nothing to remove
* removing "splash=*" does not help (is this not equal to "recovery mode"? That entry does not contain it either)
* "nomodeset" does not help in either mode

Is there no verbose mode in Linux which just tells us which driver is actually being loaded?

Upgrading the kernel seems to fail due to that "zfs" bug.

I appreciate the help. I also understand that people can create bugs. But the interface/feedback of the boot process is horrible.

Not sure if it is related, but the only special thing about my laptop is that the Thunderbolt 3 port is "broken". This was a firmware flashing bug from Intel/Lenovo, and flashing fails. Since this issue, Windows fails to display output using the Thunderbolt 3 port (but the USB-C port works just fine). Linux did work fine using the Thunderbolt 3 port (but I assume it falls back to DP alt mode).
This has not worked for at least a year.
Comment 11 Edwin KM 2024-02-05 14:35:35 UTC
Created attachment 872462 [details]
stops at this point
Comment 12 Takashi Iwai 2024-02-05 16:56:43 UTC
Actually, the option to remove is "quiet", not "verbose".

And was the photo taken in a boot session with "nomodeset"?  This option disables the native graphics driver, so if the native graphics driver is at fault, the problem is avoided: the kernel keeps using the EFI framebuffer, which is more robust.

Also, is the LED flashing at this moment?  If so, it's weird; a flashing LED usually indicates a kernel panic, and a kernel panic should print as much as possible to the screen.
Comment 13 Edwin KM 2024-02-05 18:27:47 UTC
(In reply to Takashi Iwai from comment #12)
> Actually it should be "quiet" to be removed instead "verbose". 

The recovery mode does not contain this line. If you want i can use the "normal boot" without this item.


> And, the photo snapshot was taken with "nomodeset" boot session?  This
> option disables the native graphics, hence if it's the case of native
> graphics, the kernel continues to use EFI frame buffer, which is more robust.

yes

 
> Also, at this moment, is LED flushing?  If so, it's weird; the LED flush
> indicates usually a kernel panic, and a kernel panic should print something
> to the screen as much as possible.

Output stops and the capslock is blinking.
Comment 14 Edwin KM 2024-02-07 16:14:54 UTC
Is there something I can do to find the culprit?

Note: a good kernel will sometimes also fail to boot. A retry will usually fix this. I also have some freezes (but not that often). Usually the sound replays the last second at that point.

So, until now, I blamed this on bad kernels, but I am not sure. This kernel panic is consistent, however.
Comment 15 Takashi Iwai 2024-02-07 16:25:58 UTC
Oh, that's an important point: boots sometimes failed in the past, too.

First off: don't use recovery mode.  It brings nothing but confusion.  It's meant for specific purposes, e.g. when the installation failed, not for debugging or recovering in a case like this.  It leads to other problems.  So, keep away from it.

So, test only with the normal boot, but with extra options added or options removed.  Do I understand correctly that the very same symptom appears on the normal boot with the nomodeset option and with the "quiet" and "splash=*" options removed?  That is, even though the Caps Lock LED is flashing, you see no kernel messages and the screen is frozen?  Did you wait long enough (e.g. for a minute) after that?

The second point to check is why the dracut invocation for the 6.8-rc kernel fails.
Does it fail in that way only with 6.8-rc kernels?  Or did you see similar failures with current or older versions?
Comment 16 Edwin KM 2024-02-07 19:56:59 UTC
Created attachment 872554 [details]
failed boot
Comment 17 Edwin KM 2024-02-07 19:58:49 UTC
(In reply to Takashi Iwai from comment #15)


> So, test only with the normal boot, but with extra options or removal of
> options.  Do I understand correctly that the very same symptom appears with
> nomodeset option and the removal of "silence" and "splash=*" options on the
> normal boot?  That is, even though the caps lock flushing, you see no kernel
> messages but the screen got frozen?  Did you wait long enough (e.g. for a
> minute) after that?

See the video.
The boot after this test, with the "working" kernel, threw some output on the screen. I will upload it after this post.
Comment 18 Edwin KM 2024-02-07 19:59:17 UTC
Created attachment 872556 [details]
kernel panic with "working" kernel
Comment 19 Takashi Iwai 2024-02-08 10:56:08 UTC
Thanks.  It indicates that some interrupt-related bug was triggered and it likely remains after the (warm-) reboot with 6.6.x.

For now, we can track two things:
- Set up kdump and try to catch the crash on 6.7.x kernel
- Test 6.8-rc kernel

The latter was attempted in comment 4, and it showed an error of zfs.
But this looks really strange.  The default dracut package has no zfs module.  You must have installed the zfs stuff in addition.

Try to check the contents in /usr/lib/dracut/modules.d.  There should be a directory with '*zfs' (e.g. "90zfs").  If there is, figure out which package it belongs to:
  rpm -qf /usr/lib/dracut/modules.d/90zfs
If you don't use zfs on your system, there is really no need for that package, and better to get rid of it.
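
The check described above can be sketched as a small script. The helper name list_zfs_dracut_modules is hypothetical; the default directory is the one named in this comment, and the rpm call in the usage line is the same "rpm -qf" query shown above.

```shell
#!/bin/sh
# Hypothetical helper: list dracut module directories whose name ends in
# "zfs" (for example "90zfs") below the given modules.d directory.
list_zfs_dracut_modules() {
    dir=${1:-/usr/lib/dracut/modules.d}
    for d in "$dir"/*zfs; do
        # If nothing matches, the pattern stays literal and is skipped here.
        [ -d "$d" ] && printf '%s\n' "$d"
    done
    return 0
}

# For each hit, ask rpm which package owns the directory:
# list_zfs_dracut_modules | while read -r d; do rpm -qf "$d"; done
```

If the function prints nothing, no zfs dracut module is installed in that directory.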
Comment 20 Edwin KM 2024-02-11 09:49:29 UTC
Just an update.

I removed the "zfs" package and I can now install the kernel. I created another bug report for that: https://bugzilla.opensuse.org/show_bug.cgi?id=1219799

I installed kernel-debug-6.8~rc3-3.1.g7450939.x86_64.rpm. Now I am back to a "random" chance of the system booting.

I booted into Windows (which seems quite stable but is not used often) to update to the latest BIOS and Intel ME. Both seemed to be CVE-only updates and indeed did not change anything.

Memtest86 passed two runs.

Can I go back to a really old kernel? Like a year old?
Comment 21 Takashi Iwai 2024-02-11 09:54:05 UTC
(In reply to Edwin KM from comment #20)
> Installed kernel kernel-debug-6.8~rc3-3.1.g7450939.x86_64.rpm. Now i am back
> to the "random" chance of a booting system.

What do you mean exactly?

If you get a kernel panic even with this kernel, you'll need to report it upstream.  Either way, it's better to set up kdump and try to catch the crash log first.
Comment 22 Takashi Iwai 2024-02-11 10:08:49 UTC
(In reply to Edwin KM from comment #20)
> Can i go back to a really old kernel? Like a year old?

There are unofficial kernel builds for old versions found in my OBS repos, e.g. home:tiwai:kernel:6.6, home:tiwai:kernel:6.5, etc.

But 6.6.x kernel still works even after rebuilding initrd without zfs, right?  Then the regression is clearly after 6.6.x.
Comment 23 Edwin KM 2024-03-31 12:25:44 UTC
The kdump tool does not seem to kick in at boot time.

I configured kdump using YaST:
  zypper install yast2-kdump
  yast2 kdump
  
It seems to work if I run the "s", "u", "c" pattern: I see the text and a folder in /var/crash.
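
For readers unfamiliar with it, the "s", "u", "c" pattern refers to the magic SysRq keys: sync, remount read-only, and a deliberate crash. A minimal sketch of triggering it from a shell follows. The function name and the optional target argument are illustrative only; the default target is the real /proc/sysrq-trigger, which requires root.

```shell
#!/bin/sh
# Sketch of the SysRq "s", "u", "c" sequence: sync filesystems, remount
# them read-only, then crash the kernel on purpose so kdump can capture
# a dump. $1 = file to write the keys to (default: /proc/sysrq-trigger).
sysrq_crash_test() {
    target=${1:-/proc/sysrq-trigger}
    for key in s u c; do
        printf '%s\n' "$key" > "$target" || return 1
        sleep 1   # give sync / remount a moment before the next key
    done
}

# WARNING: this crashes the machine immediately. Only run it to test kdump:
# sysrq_crash_test
```

Because the last key panics the kernel, unsaved data in any open application is lost, so this belongs in a test session only.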

I restarted a couple of times to trigger a kernel panic. In such a case it just hangs for a minute or so and reboots afterwards.

I also changed the default boot line in the YaST boot loader settings to hide the boot logo (and actually show the text):
  "splash=verbose resume=/dev/disk/by-id/nvme-WDC_PC_SN720_SDAQNTW-512G-1001_184523424461-part6 crashkernel=353M,high crashkernel=72M,low"

  
These tests actually corrupted a text file with notes on my desktop. It was not even open during the "suc" test.
I took photos of the panics. I am not sure there is any pattern (but the output is garbled anyway).
I am thinking about a reinstall or throwing this thing in the bin.
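
One way to confirm that the crashkernel= options from the boot line quoted above actually reached the running kernel is to look for them in /proc/cmdline. The sketch below assumes a hypothetical helper name, crashkernel_opts; the /sys/kernel/kexec_crash_loaded file mentioned in the usage lines reports whether a crash kernel is currently loaded.

```shell
#!/bin/sh
# Hypothetical helper: print every crashkernel= option found in a kernel
# command line, one per line.
crashkernel_opts() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep '^crashkernel='
}

# On the running system:
# crashkernel_opts "$(cat /proc/cmdline)"
# cat /sys/kernel/kexec_crash_loaded   # 1 means a crash kernel is loaded
```

If the first command prints nothing, the memory reservation was not passed at boot and kdump has nowhere to load its capture kernel.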
Comment 24 Edwin KM 2024-03-31 12:27:21 UTC
Created attachment 873948 [details]
crash 1
Comment 25 Edwin KM 2024-03-31 12:27:51 UTC
Created attachment 873949 [details]
crash 2
Comment 26 Edwin KM 2024-03-31 12:28:09 UTC
Created attachment 873950 [details]
crash 3
Comment 27 Takashi Iwai 2024-03-31 15:08:43 UTC
Thanks, now it's more interesting.  The crash logs consistently show a NULL dereference in the synaptics code.

I'm building a test kernel with an additional NULL check in the OBS home:tiwai:bsc1219522 repo.  Once the build finishes (it takes an hour or so), it'll appear at
   http://download.opensuse.org/repositories/home:/tiwai:/bsc1219522/standard/

Please give it a try.  It'll still show a kernel warning with the stack trace (intentionally), but it shouldn't really crash, if my guess is correct.
Comment 28 Edwin KM 2024-04-01 07:08:42 UTC
I get a 404 for that link.

FWIW: if I remember correctly, I disabled the touchpad years ago in the BIOS.
Comment 29 Takashi Iwai 2024-04-01 08:09:18 UTC
(In reply to Edwin KM from comment #28)
> I get a 404 for that link.

It appears to be a problem of OBS web UI.  You can get the binaries via osc directly, instead.
  osc getbinaries home:tiwai:bsc1219522/kernel-source:kernel-default/standard/x86_64

> fwiw: If i remember correctly i disabled the touchpad that years ago in the
> bios.

Obviously the synaptics device is still detected and enabled.  That might be the reason for the breakage, though: some inconsistent configuration that confused the kernel driver.
Comment 30 Edwin KM 2024-04-01 10:54:19 UTC
I installed kernel-default-6.8.2-2.1.gb88b81e.x86_64.rpm and restarted about 20 times. I cannot trigger the kernel panic.

Is it possible that the issue was fixed recently? I assume we should look at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/input/mouse/synaptics.c. 

This commit looks suspicious because it seems to address a timing issue:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/input/mouse/synaptics.c?id=5030b2fe6aab37fe42d14f31842ea38be7c55c57


I am not sure whether that would even result in a kernel panic.
Comment 31 Takashi Iwai 2024-04-01 12:49:57 UTC
Could you check whether you still get a kernel warning with a stack trace from the patched test kernel?

And, since you disabled the touchpad in the BIOS, does the touchpad itself not work after boot?

FWIW, the test patch is attached below.
Comment 32 Takashi Iwai 2024-04-01 12:50:20 UTC
Created attachment 873973 [details]
Test fix patch
Comment 33 Edwin KM 2024-04-02 20:24:13 UTC
Can you build an older kernel with the patch applied? Something like 6.8.1-1 (or older).
Comment 34 Takashi Iwai 2024-04-03 05:43:56 UTC
(In reply to Edwin KM from comment #33)
> can you create a older kernel with the patch applied? Something like 6.8.1-1
> (or older).

Why?
Comment 35 Takashi Iwai 2024-04-03 13:58:30 UTC
I'm asking because my test patch is merely a workaround, meant for spotting the cause.  The bug has to be reported to the upstream devs and addressed more properly.  That'll be the final fix.

If the bug is really about the NULL dereference there, the kernel warning should appear, and that has to be verified.  Then we need to understand why this NULL dereference happens in the first place.  My test kernel is provided for checking that.

So, please upload the dmesg output from the test kernel.  Then please confirm whether the touchpad is actually still enabled or not.  If the touchpad is dead, it might be a half-baked probe of the touchpad that caused the problem.
Comment 36 Edwin KM 2024-04-03 15:42:38 UTC
I understand. But I cannot reproduce the crash with your latest test kernel. So, I think there are multiple scenarios:
* It was fixed very recently (> 6.8.1-1). That seems unlikely.
* It is really timing related. This 6.8.1-1 kernel is maybe faster/slower and the crash does not happen.

So, I hope that an older kernel with your patch will crash again and generate more information.


About the touchpad: sorry, I remembered incorrectly. The pad, the mouse buttons and the red mouse-tracker are indeed disabled.
But not in the BIOS; it seems I disabled them in XFCE.
Comment 37 Edwin KM 2024-04-03 15:43:34 UTC
Created attachment 874028 [details]
dmesg of the 6.8.2-2.1.gb88b81e kernel (no panics seen yet)
Comment 38 Takashi Iwai 2024-04-03 16:50:58 UTC
Indeed, no kernel WARNING appears in your log, so it's likely something else that made it work.

The OBS Kernel:stable repo also contains a 6.8.2 kernel.  Could you check with that kernel instead?  It should work like mine.

Or it's really a timing issue, and in that case it'd be tough to hunt down properly.
Comment 39 Edwin KM 2024-04-03 19:02:47 UTC
I am not sure how to modify the osc command to retrieve the file :(
The following command downloads newer files:
osc getbinaries Kernel:stable/kernel-source:kernel-default/standard/x86_64

I assume it matches the site: https://build.opensuse.org/package/show/Kernel:stable/kernel-source

But this is also unknown to me.
Comment 40 Takashi Iwai 2024-04-04 05:51:07 UTC
You don't have to play with osc usually.  I suggested osc because the publishing on OBS was broken at that time.

Just grab kernel-default.rpm from the URL listed in comment 1, and install it via zypper install.  It was upgraded to 6.8.3 meanwhile.
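
As a rough illustration of how the repository URLs in this report relate to OBS project names, the sketch below turns a project name such as Kernel:stable into the matching download URL. The helper name obs_repo_url and the repository alias in the usage lines are assumptions made for this example; the URL pattern itself is taken from the links quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper: build the download.opensuse.org URL for an OBS
# project by turning each ":" in the project name into ":/".
# $1 = OBS project (e.g. Kernel:stable), $2 = repository (default: standard)
obs_repo_url() {
    project_path=$(printf '%s' "$1" | sed 's|:|:/|g')
    printf 'http://download.opensuse.org/repositories/%s/%s/\n' \
        "$project_path" "${2:-standard}"
}

# Possible use as root (the alias "kernel-stable" is an arbitrary choice):
# zypper addrepo -f "$(obs_repo_url Kernel:stable)" kernel-stable
# zypper install --from kernel-stable kernel-default
```

Adding the repository this way lets zypper resolve the current kernel-default package, which avoids hunting for an exact version number that may already have been superseded.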
Comment 41 Edwin KM 2024-04-04 19:51:10 UTC
I think I misunderstood. I expected 6.8.2-2.gb88b81e-default to still panic. I failed to find 6.8.3, so I installed kernel-default-6.8.4-1.1.g1089550.x86_64.rpm. That one also panics sometimes.

I will attach the dmesg output of 6.8.2-2.gb88b81e-default.
Comment 42 Edwin KM 2024-04-04 19:51:43 UTC
Created attachment 874078 [details]
fail01_6.8.2-2.gb88b81e-default
Comment 43 Edwin KM 2024-04-04 20:01:46 UTC
Created attachment 874079 [details]
fail02_6.8.2-2.gb88b81e-default
Comment 44 Takashi Iwai 2024-04-05 08:30:49 UTC
OK, thanks.  A kernel WARNING with a stack trace like the following is not a real crash; it intentionally shows the stack trace for debugging:
[    8.036410] ------------[ cut here ]------------
[    8.037093] WARNING: CPU: 2 PID: 662 at drivers/input/mouse/psmouse-base.c:123 psmouse_from_serio+0x1e/0x30

This appears in both logs.  So far, so good.
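
Checking a saved dmesg file for this intentional warning can be sketched as follows. The helper name count_psmouse_warnings and the file name in the usage line are made up for illustration; the pattern matches the WARNING line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count the intentional psmouse_from_serio warnings
# in a saved dmesg file. $1 = path to the log file
count_psmouse_warnings() {
    grep -c 'WARNING: .*psmouse_from_serio' "$1"
}

# Example with a log saved via "dmesg > fail01.txt":
# count_psmouse_warnings fail01.txt
```

A count of 0 means the NULL check in the test kernel was never hit during that boot.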

Meanwhile, in the first log this was followed by another Oops message:

[    8.094105] RIP: 0010:__mem_cgroup_charge+0xb/0xb0
[    8.095183] Code: 81 58 01 00 00 c3 cc cc cc cc 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 41 54 <41> 89 d4 55 48 89 fd 48 89 f7 53 e8 35 89 ff ff ba 01 00 00 00 48
[    8.096284] RSP: 0000:ffffb0f2407bbd88 EFLAGS: 00000246
[    8.097449] RAX: 0017ffffc0000000 RBX: ffffb0f2407bbe08 RCX: 0000000000000000
[    8.098584] RDX: 0000000000000cc0 RSI: ffff90d080071600 RDI: ffffe0e484c8b8c0
[    8.099787] RBP: ffffb0f2407bbe08 R08: ffffe0e484c8b8c0 R09: ffff90d3ef3404f0
[    8.100868] R10: 0000000000000000 R11: 0000000000000001 R12: ffff90d082919840
[    8.102187] R13: fffffffffffff000 R14: 0000000000000001 R15: 00007ff171d89000
[    8.103263]  do_anonymous_page+0x23e/0x6e0
[    8.104764]  ? pmdp_invalidate+0x130/0x130
[    8.105930]  __handle_mm_fault+0xb4d/0xe60
[    8.107361]  handle_mm_fault+0x17f/0x360
[    8.108505]  do_user_addr_fault+0x15b/0x670
[    8.109694]  exc_page_fault+0x71/0x160
[    8.110883]  asm_exc_page_fault+0x26/0x30
[    8.112031] RIP: 0033:0x7ff17376c9e4
[    8.113173] Code: 3a e0 c5 f8 77 c3 c5 fe 6f 4e 20 f7 c1 00 0e 00 00 75 65 49 89 c9 48 8d 4c 16 ff 48 83 ce 3f 4a 8d 7c 0e 01 48 29 f1 48 ff c6 <f3> a4 c4 c1 7e 7f 00 c4 c1 7e 7f 48 20 c5 f8 77 c3 66 66 2e 0f 1f
[    8.114373] RSP: 002b:00007ffc29992758 EFLAGS: 00010212
[    8.115565] RAX: 00007ff171d7d010 RBX: 0000555933e399e0 RCX: 0000000000014010
[    8.116710] RDX: 0000000000020000 RSI: 00007ff171e4c000 RDI: 00007ff171d89000
[    8.117976] RBP: 00007ffc29992830 R08: 00007ff171d7d010 R09: fffffffffff3d000
[    8.119134] R10: 186afaaa2a71579a R11: d9670ae1eee0759f R12: 0000555933e57bda
[    8.120302] R13: 00007ff1735f2be0 R14: 0000000000020000 R15: 00007ff171d7d010
[    8.121613]  </TASK>
[    8.122608] ---[ end trace 0000000000000000 ]---

This is unexpected, and it may be a real problem.  But as it's not visible in the second log, it might be intermittent.

In any case, the above logs at least indicate that my guess was correct: it was the NULL dereference in the synaptics driver.

I'm going to submit the fix patch; it might not be the best fix, but it's obviously better than a crash.
Comment 45 Takashi Iwai 2024-04-05 08:45:58 UTC
The upstream submission
  https://lore.kernel.org/r/20240405084448.15754-1-tiwai@suse.de
Comment 46 Takashi Iwai 2024-04-05 08:50:54 UTC
... and I updated the OBS home:tiwai:bsc1219522 repo with 6.8.4 kernel now.
Comment 47 Edwin KM 2024-04-06 11:18:29 UTC
Thanks for all the help!

Can this bug also explain the random freezes in desktop mode, or is this code called only once (at boot)?

How do we continue? The current change at least prevents a panic, but I assume the driver still contains a bug/race condition.
Can I see in which (SUSE) mainline kernel this workaround is upstreamed? The latest commit seems to be from a while ago:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/drivers/input/mouse/psmouse-base.c?h=next-20240405
Should i file a bug somewhere?

I will attach "ok_6.8.2-2.gb88b81e-default.txt": a log from the same kernel version, but from a startup that did not trigger the error.
Comment 48 Edwin KM 2024-04-06 11:18:58 UTC
Created attachment 874102 [details]
ok_6.8.2-2.gb88b81e-default
Comment 49 Takashi Iwai 2024-04-06 12:03:35 UTC
My fix patch is included in the OBS Kernel:stable branch, so a later TW kernel will include it.  You can also use the kernel from the OBS Kernel:stable repo instead.

Let's keep testing with the kernel that includes my fix for a while, and see whether a crash still happens later.  My wild guess is that the issue happens only in the early boot stage.  If a crash happens later, it's likely something else.

About acceptance upstream: we just need to wait.  Nowadays responses are a bit slow in the input driver subsystem.