Bug 1212740 - sometime inoperable after start run
Summary: sometime inoperable after start run
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Kernel
Version: Leap 15.5
Hardware: Other Other
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: openSUSE Kernel Bugs
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-26 15:58 UTC by Peter Thoms
Modified: 2023-07-25 12:04 UTC
CC List: 2 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Journal of operable start run (235.51 KB, text/plain)
2023-06-26 15:58 UTC, Peter Thoms
Journal sequence of inoperable start run (241.60 KB, text/plain)
2023-06-26 16:00 UTC, Peter Thoms
hwinfo, start trial successful (760.58 KB, text/plain)
2023-07-17 15:58 UTC, Peter Thoms
kdump README.txt.gz (170 bytes, application/gzip)
2023-07-18 19:47 UTC, Peter Thoms
kdump System.map-5.14.21-150500.53-default.gz (996.41 KB, application/gzip)
2023-07-18 19:48 UTC, Peter Thoms
kdump vmlinux-5.14.21-150500.53-default.gz (16.72 MB, application/gzip)
2023-07-18 19:49 UTC, Peter Thoms
kdump dmesg.txt (64.56 KB, text/plain)
2023-07-18 20:52 UTC, Peter Thoms
pci info (4.07 KB, text/plain)
2023-07-19 14:50 UTC, Peter Thoms

Description Peter Thoms 2023-06-26 15:58:51 UTC
Created attachment 867822 [details]
Journal of operable start run

Sometimes after startup the laptop is inoperable: it seems to be running without some drivers and behaves very slowly and sluggishly.
Comment 1 Peter Thoms 2023-06-26 16:00:48 UTC
Created attachment 867823 [details]
Journal sequence of inoperable start run
Comment 2 Peter Thoms 2023-06-26 22:12:32 UTC
I just found a difference in the journal.
Please filter for "wlan".
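
One way to make such a comparison concrete is to pull the WLAN-related lines out of both attached journals and diff them. The following is only a sketch: wlan_lines and wlan_diff are hypothetical helper names, the file names are placeholders, and it assumes both journals were saved as plain text in the default short format ("Mon DD HH:MM:SS host process[pid]: message").

```shell
#!/bin/sh
# Sketch: compare the WLAN-related lines of two saved journal files.
# wlan_lines and wlan_diff are hypothetical helper names.

# Print only the lines mentioning "wlan" (case-insensitive). The leading
# timestamp, the host name and any "[pid]" are stripped, because they
# differ between boots and would drown the real differences.
wlan_lines() {
    grep -i 'wlan' "$1" |
        sed -E -e 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]+ [^ ]+ //' \
               -e 's/\[[0-9]+\]//'
}

# Show the messages that occur in only one of the two journals.
# "<" marks messages only in the first file, ">" only in the second.
# Returns the exit status of diff (0 = identical, 1 = differences).
wlan_diff() {
    first_tmp=$(mktemp) || return 1
    second_tmp=$(mktemp) || return 1
    wlan_lines "$1" | sort -u > "$first_tmp"
    wlan_lines "$2" | sort -u > "$second_tmp"
    diff "$first_tmp" "$second_tmp"
    status=$?
    rm -f "$first_tmp" "$second_tmp"
    return "$status"
}
```

A call such as "wlan_diff journal-good.txt journal-bad.txt" (placeholder names) would then list the messages that appear in only one of the two boots. Sorting loses the chronological order, which is a deliberate trade-off to keep the diff small.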
Comment 3 Takashi Iwai 2023-07-10 15:56:50 UTC
The description is too ambiguous and it's difficult to judge what is wrong.
How often is "sometimes"?  Can you reproduce it reliably?
And, is it a regression from the earlier releases?

(In reply to Peter Thoms from comment #2)
> I just found a difference in the journal.
> Please filter for "wlan".

What does it mean exactly?
Comment 4 Peter Thoms 2023-07-10 16:50:33 UTC
Hello, Thank you very much for working on the bug!

Yes, it is a regression from 15.4.
I estimate about one failure in five startups.
Comment 5 Takashi Iwai 2023-07-11 06:20:54 UTC
The symptom is still not clear.  Does it happen occasionally after a few reboots, even if you don't touch the system (no update, etc.)?

For verifying whether it's a kernel problem, you can try to install the old Leap 15.4 kernel on the Leap 15.5 system.  Grab kernel-default.rpm, kernel-default-extra.rpm and kernel-default-optional.rpm from OBS Kernel:SLE15-SP4 repo, and install them with "zypper install" with --oldpackage option. Then boot with this kernel, and verify that the problem doesn't happen with it.
Comment 6 Peter Thoms 2023-07-11 13:16:14 UTC
> The symptom is still not clear.  Does it happen occasionally after a few
> reboots, even if you don't touch the system (no update, etc.)?

yes, exactly

> For verifying whether it's a kernel problem, you can try to install the old
> Leap 15.4 kernel on the Leap 15.5 system.  Grab kernel-default.rpm,
> kernel-default-extra.rpm and kernel-default-optional.rpm from OBS
> Kernel:SLE15-SP4 repo, and install them with "zypper install" with
> --oldpackage option. Then boot with this kernel, and verify that the problem
> doesn't happen with it.

Oh... ok, could you please give more detailed commands for the repos and packages?
Comment 7 Takashi Iwai 2023-07-11 13:51:04 UTC
You can get the SLE15-SP4 kernel from the latest git branch in OBS Kernel:SLE15-SP4 repo:
  http://download.opensuse.org/repositories/Kernel:/SLE15-SP4/pool/

Download kernel-default.rpm, kernel-default-extra.rpm and kernel-default-optional.rpm from x86_64 directory (the actual file names are with version numbers). Then install those files via
  % zypper in --oldpackage kernel-default*.rpm

But, maybe it's better to increase the number of installable kernels before the installation. Edit /etc/zypp/zypp.conf, and add more entries to the line defining "multiversion.kernels", e.g.
  multiversion.kernels = latest,latest-1,latest-2,latest-3,running

Then reboot and retest with that kernel.
Once you are done testing, you can remove the kernel simply with "zypper rm kernel-default-xxx", specifying the kernel package you'd like to remove.
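
The steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: set_multiversion_kernels is a hypothetical helper name, and it assumes the setting appears at most once as an uncommented "multiversion.kernels = ..." line in the file.

```shell
#!/bin/sh
# Sketch: allow more kernels to stay installed side by side before
# installing the downloaded SLE15-SP4 kernel packages.
# set_multiversion_kernels is a hypothetical helper name.

# Rewrite the "multiversion.kernels" line in a zypp.conf-style file.
# $1: path to the config file (on a real system /etc/zypp/zypp.conf)
# $2: new comma-separated value
# The previous content is kept in "$1.bak".
set_multiversion_kernels() {
    conf=$1
    value=$2
    cp "$conf" "$conf.bak" || return 1
    if grep -q '^[[:space:]]*multiversion\.kernels[[:space:]]*=' "$conf.bak"; then
        # Replace the existing active setting, reading from the backup.
        sed "s|^[[:space:]]*multiversion\.kernels[[:space:]]*=.*|multiversion.kernels = $value|" \
            "$conf.bak" > "$conf"
    else
        # No active setting found: append one at the end of the file.
        printf 'multiversion.kernels = %s\n' "$value" >> "$conf"
    fi
}

# Typical use as root (not executed here):
#   set_multiversion_kernels /etc/zypp/zypp.conf latest,latest-1,latest-2,latest-3,running
#   zypper in --oldpackage kernel-default*.rpm
#   reboot
```

Keeping "running" in the list matters here, since it stops the kernel that is currently booted from being purged while the older one is under test.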
Comment 8 Peter Thoms 2023-07-11 16:32:34 UTC
Ok, the new config is active.
After 20 trials with the 15.4 kernel I will start up with 15.5 again and report back.
Comment 9 Peter Thoms 2023-07-13 12:45:27 UTC
Ok, today:
After 10 start ups the last one failed
Comment 10 Peter Thoms 2023-07-16 04:28:38 UTC
1 fail of 17 trials
Comment 11 Peter Thoms 2023-07-16 07:31:13 UTC
1/3, 1 fail out of 3 trials
Comment 12 Takashi Iwai 2023-07-17 13:23:11 UTC
Unfortunately it's not helpful if you just write "it doesn't work"...

Basically we need:
- A clearer description of the symptom;
  does it happen only with the graphics UI?  e.g. can you remote login and test whether it equally slows down?

- Whether any relevant errors are seen;
  kernel log, journal, or other logs.  Do you see any relevant errors?

- What *exactly* did you test?  How?
  Please describe, so that other people can reproduce the very same problem on their machines.
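
The kind of log material asked for in the list above could be gathered along these lines. This is only a sketch: collect_boot_logs and the output file names are made up for illustration, and it assumes the journal is kept across reboots (a persistent journal), so that a failed boot can still be read after a forced power-off. It should be run as root, or as a user allowed to read the system journal.

```shell
#!/bin/sh
# Sketch: save the logs of one boot into a directory, ready to be
# attached to a bug report.
# collect_boot_logs is a hypothetical helper name.

# $1: output directory (created if missing)
# $2: boot offset as understood by "journalctl --boot"
#     (0 = current boot, -1 = the boot before it); defaults to -1
collect_boot_logs() {
    outdir=$1
    boot=${2:--1}
    mkdir -p "$outdir" || return 1
    # Full journal of that boot.
    journalctl --no-pager --boot="$boot" > "$outdir/journal.txt"
    # Kernel messages only (the dmesg of that boot).
    journalctl --no-pager -k --boot="$boot" > "$outdir/kernel.txt"
    # Only messages of priority "err" and worse.
    journalctl --no-pager -p err --boot="$boot" > "$outdir/errors.txt"
}
```

After a failed start and a reboot into a working session, "collect_boot_logs ./failed-boot" would capture the failed boot, and "collect_boot_logs ./good-boot 0" the current one for comparison.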

And about the test with SLE15-SP4 kernel:

(In reply to Peter Thoms from comment #9)
> Ok, today:
> After 10 start ups the last one failed
(In reply to Peter Thoms from comment #10)
> 1 fail of 17 trials
(In reply to Peter Thoms from comment #11)
> 1/3, 1 fail out of 3 trials

So all those are the results with SLE15-SP4 kernel on top of Leap 15.5 system, right?  Or what were those?

If SP4 kernel also fails, you can try to install older SLE15-SP4 kernel instead -- the version you had and worked in the past.  It must be found in the standard distribution repo in OBS.
Comment 13 Peter Thoms 2023-07-17 15:33:48 UTC
Ok,
I am always using the Power Button to start up to KDE.

Sorry, remote login is not possible; the network connection via WLAN is down.
By the way, even the internal speaker seems to be off, indicated by a small icon.

Is there a command I could run to collect what you need and to capture the fault that corresponds to the uploaded journal?
Comment 14 Takashi Iwai 2023-07-17 15:41:30 UTC
As a start, please give hwinfo output from the good running system (with both Leap 15.4 and Leap 15.5 kernels).  Then give the dmesg outputs from both kernels.
Comment 15 Peter Thoms 2023-07-17 15:58:50 UTC
Created attachment 868253 [details]
hwinfo, start trial successful

uname -r: 5.14.21-150500.53-default
Comment 16 Peter Thoms 2023-07-18 12:59:34 UTC
Summary of a failed start:

A terminal can be started
Entering a new line is possible
sudo hangs on the new line, no action
The same happens with
atop
sudo hwinfo > peter (only generates an empty file)

kinfocenter starts
Neither kill nor term works for sudo, atop etc.

kinfocenter/network hangs, quitting is not possible

Shutdown ends in a black screen with the cursor blinking at position 1
Powered down with the power button

I have some screenshots.
Please ask if they might be helpful.
Comment 17 Takashi Iwai 2023-07-18 13:39:39 UTC
Hm, this doesn't sound like a usual graphics problem, then.

The best thing we can do for now would be to set up kdump, and trigger manually via alt-sysrq-c, and get the crash dump.
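
Before relying on alt-sysrq-c, it may help to check that the preconditions for a crash dump are met. The following is a sketch with hypothetical helper names (has_crashkernel, sysrq_crash_allowed); the bit value used for the SysRq check reflects my understanding of the kernel's SysRq mask and should be treated as an assumption.

```shell
#!/bin/sh
# Sketch: check two preconditions for taking a manual crash dump.
# has_crashkernel and sysrq_crash_allowed are hypothetical helper names.

# Return 0 if the kernel command line reserves memory for a crash kernel.
# $1: file holding the kernel command line (default /proc/cmdline)
has_crashkernel() {
    grep -Eq '(^|[[:space:]])crashkernel=' "${1:-/proc/cmdline}"
}

# Return 0 if the SysRq setting permits alt-sysrq-c from the keyboard.
# $1: file holding the setting (default /proc/sys/kernel/sysrq)
# Assumption: 1 enables all SysRq functions; otherwise the crash key is
# covered by bit 0x8 ("debugging dumps").
sysrq_crash_allowed() {
    value=$(cat "${1:-/proc/sys/kernel/sysrq}") || return 1
    [ "$value" -eq 1 ] || [ $((value & 8)) -ne 0 ]
}

# Typical use as root (not executed here):
#   has_crashkernel && sysrq_crash_allowed && systemctl is-active kdump
#   echo c > /proc/sysrq-trigger    # crashes the machine on purpose!
```

Writing "c" to /proc/sysrq-trigger as root is an alternative when the key combination is hard to reach on a laptop keyboard. Either way the machine crashes deliberately, so unsaved work is lost.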
Comment 18 Peter Thoms 2023-07-18 19:30:59 UTC
Ok, I got a capture.
How do I transfer the directory?
Comment 19 Peter Thoms 2023-07-18 19:47:42 UTC
Created attachment 868296 [details]
kdump README.txt.gz
Comment 20 Peter Thoms 2023-07-18 19:48:33 UTC
Created attachment 868297 [details]
kdump System.map-5.14.21-150500.53-default.gz
Comment 21 Peter Thoms 2023-07-18 19:49:53 UTC
Created attachment 868298 [details]
kdump vmlinux-5.14.21-150500.53-default.gz
Comment 22 Peter Thoms 2023-07-18 20:52:00 UTC
Created attachment 868299 [details]
kdump dmesg.txt
Comment 23 Peter Thoms 2023-07-18 20:54:08 UTC
vmcore is not transferable
Do you need vmcore? Then I will try to put it in the cloud
Comment 24 Takashi Iwai 2023-07-19 06:36:04 UTC
Thanks.  You can't upload the full crash contents as they are too big, and README, System.map and vmlinux are superfluous (they are the same as those found in the kernel package).  The dmesg is helpful, and at least it doesn't show anything suspicious.

That is, we see no sign of a kernel crash or such at this moment.  You can check it with the "crash" program (included in the crash package) for further investigation on your side, too.  There one can see which processes are running, and examine the stack traces.

OTOH, I wonder whether this is rather related to desktop instead.  When the problem happens, can you try to switch to VT1 via ctrl-alt-F1, and see whether things work?  Also, can you try to kill the desktop forcibly?

But, above all, please retest with the freshly released Leap 15.5 update kernel (5.14.21-150500.55.7).
Comment 25 Peter Thoms 2023-07-19 14:06:14 UTC
Many thanks!

Could it be that this laptop is just missing a real handshake at the right place?

Maybe some of my hardware is simply faulty.
When it happens, both the WLAN and the audio output (speaker) stop working.
Can you see what the two have in common?

Please give me the commands for switching over to the newest kernel.
Comment 26 Peter Thoms 2023-07-19 14:50:31 UTC
Created attachment 868326 [details]
pci info

Could it be helpful to focus on PCI?
Comment 27 Peter Thoms 2023-07-19 14:52:07 UTC
I have now updated Linux with a simple update command.
Comment 28 Peter Thoms 2023-07-19 18:29:43 UTC
(In reply to Takashi Iwai from comment #24)
...
> OTOH, I wonder whether this is rather related to desktop instead.  When the
> problem happens, can you try to switch to VT1 via ctrl-alt-F1, and see
> whether things work?  Also, can you try to kill the desktop forcibly?

Ok,
to start VT1 I have to use Ctrl-Fn-Alt-F1; now I'm able to start it.

As for killing the desktop: what is the name of the process to kill from VT1?
Comment 29 Takashi Iwai 2023-07-20 06:09:06 UTC
At first, check whether you can work normally on VT1.  For example, try the commands that stalled before (sudo, top, whatever).  Possibly you will face the same problem on VT1, too.

If everything seems working on VT1, then simply kill the whole desktop session.
If you're running X, killing X.org process will do it.  (Or pressing ctrl-alt-backspace *quickly* twice on the desktop screen might work, too -- it's self-killing X server.)
Comment 30 Peter Thoms 2023-07-22 10:06:57 UTC
Hello, here is some news, quite short:
After starting VT1 I saw only the cursor blinking at position 1, no commands possible

So I switched back to the GUI, started a terminal and got a prompt there
Comment 31 Peter Thoms 2023-07-23 05:43:29 UTC
Today a strange situation:
new:
bluetooth mouse inactive all the time, only mousepad usable
But the bluetooth setup is working with my fujitsu-pc

Fail start: no VT1 possible, no crashdump possible
Comment 32 Peter Thoms 2023-07-23 08:49:26 UTC
I think all my external USB ports have died
Comment 33 Peter Thoms 2023-07-23 11:18:18 UTC
ok,
the USB hardware on the motherboard has died.
Windows 7 still runs well the whole time, but now without USB.

If you are still interested in the actual Linux behavior, I will keep on debugging.

This laptop will probably be sold for spare parts (with a defective motherboard: USB inactive).
Comment 34 Peter Thoms 2023-07-25 11:57:09 UTC
The hardware is faulty.
Thank you for your patience!
Comment 35 Takashi Iwai 2023-07-25 12:04:37 UTC
OK, thank you for the information update.