Bug 1225968 - Since kernel 6.9, applications running on only 1 core
Summary: Since kernel 6.9, applications running on only 1 core
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: x86-64 Linux
Importance: P5 - None : Major
Target Milestone: ---
Assignee: openSUSE Kernel Bugs
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-06-04 23:05 UTC by Ivan Topolsky
Modified: 2024-06-20 11:38 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
tiwai: needinfo?


Attachments
excerpt of the journal on kernel 6.9.1 (7.68 KB, text/plain)
2024-06-04 23:05 UTC, Ivan Topolsky

Description Ivan Topolsky 2024-06-04 23:05:48 UTC
Created attachment 875310
excerpt of the journal on kernel 6.9.1

(based on forum thread: https://forums.opensuse.org/t/since-kernel-6-9-running-on-only-1-core/175460)

Since updating my openSUSE Tumbleweed install from kernel 6.8.9 to kernels 6.9.x, all my applications (such as Firefox) run mostly on only 1 core of the CPU.

CPU is AMD FX 8350 8 core (as 4 modules of 2 half-cores which share their FPU)
Motherboard is GA-990FXA-UD7 (rev 1.x) (firmware is BIOS with some EFI compatibility layer. Not UEFI like rev 3.x)

On kernels 6.9.1 and 6.9.3, while smpboot is initialising the other cores, I get a bunch of errors, which disappear if I boot into version 6.8.9 instead (see linked thread / attachment).

This is then followed by:
> boot.6.9.txt:jun 03 10:02:33 saturn kernel: mtrr: your CPUs had inconsistent variable MTRR settings
> boot.6.9.txt:jun 03 10:02:33 saturn kernel: mtrr: probably your BIOS does not setup all CPUs.
> boot.6.9.txt:jun 03 10:02:33 saturn kernel: mtrr: corrected configuration.

What would be the next best step to gather information useful for understanding this bug?
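
One piece of information that is cheap to gather is which CPUs the kernel considers present and which it actually brought online. The following is a minimal sketch, assuming the standard sysfs files under /sys/devices/system/cpu are available; the helper names parse_cpu_list and read_cpu_list are made up for illustration.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,5' into [0, 1, 2, 3, 5]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            # An empty file (e.g. no offline CPUs) yields an empty list.
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_cpu_list(name, base="/sys/devices/system/cpu"):
    """Read one of the sysfs CPU lists; return None if it cannot be read."""
    try:
        return parse_cpu_list((Path(base) / name).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    for name in ("present", "online", "offline"):
        print(name, read_cpu_list(name))
```

If "online" lists fewer CPUs than "present" on 6.9.x but not on 6.8.9, that would be consistent with the secondary cores failing to come up during boot.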
Comment 1 Ivan Topolsky 2024-06-05 08:51:47 UTC
On EndeavourOS, gmhh has a similar outcome (running on 1 core only) with a very similar setup.

https://forum.endeavouros.com/t/linux-6-9-seems-stuck-on-one-core/55979/14

Machine:
Type: Desktop Mobo: Gigabyte model: GA-970A-UD3 serial:
BIOS: Award v: F8f date: 12/16/2013
CPU:
Info: 6-core model: AMD FX-6300 bits: 64 type: MT MCP cache: L2: 6 MiB
Speed (MHz): avg: 1400 min/max: 1400/3500 cores: 1: 1400 2: 1400 3: 1400
4: 1400 5: 1400 6: 1400

(CPU is same Piledriver family, motherboard is same era and could potentially be another "Hybrid EFI" firmware).

This suggests that the problem is specific to the kernel, not the distro,
and that it happens when initialising the additional cores of Bulldozer/Piledriver-generation AMD CPUs.
(Potentially on BIOS-based motherboard?)
Comment 2 Smith 2024-06-06 08:05:56 UTC
This was reported to the LKML from the Arch side of things. It seems there were patches sent there.
https://lore.kernel.org/all/7skhx6mwe4hxiul64v6azhlxnokheorksqsdbp7qw6g2jduf6c@7b5pvomauugk/
Comment 3 Ivan Topolsky 2024-06-06 09:39:54 UTC
Thanks for pointing that out.

Hopefully the patch will eventually make it upstream and then into an openSUSE RPM update.
Comment 4 Michiel Janssens 2024-06-06 17:10:49 UTC
Upstream patches seem to be on their way.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a693b9c95abd4947c2d06e05733de5d470ab6586

Interesting tag: 'x86-urgent-2024-06-02'
So hopefully they will be released in the v6.10 kernel.
Comment 5 Smith 2024-06-12 15:40:12 UTC
Fix queued up and will certainly show up in a 6.9 release. Won't be too long a wait now.
https://lore.kernel.org/stable/2024061206-avatar-company-2f39@gregkh/
Comment 6 Ivan Topolsky 2024-06-12 15:41:23 UTC
good news:

Christian Heusel, on the Arch issue (https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/56#note_191049):

> Fix is now queued for 6.9.5: https://lore.kernel.org/all/2024061206-avatar-company-2f39@gregkh/

So we could get this fix soon.

Yay!
Comment 7 Ivan Topolsky 2024-06-12 15:42:25 UTC
(In reply to Smith from comment #5)
> Fix queued up and will certainly show up in a 6.9 release. Won't be too long
> a wait now.
> https://lore.kernel.org/stable/2024061206-avatar-company-2f39@gregkh/

You're faster on the keyboard than me! :-D
Comment 8 Takashi Iwai 2024-06-17 08:50:40 UTC
6.9.5 is found in OBS Kernel:stable repo.
  http://download.opensuse.org/repositories/Kernel:/stable/standard/

Can anyone confirm that it works?
Comment 9 Ivan Topolsky 2024-06-17 09:14:09 UTC
(In reply to Takashi Iwai from comment #8)
> 6.9.5 is found in OBS Kernel:stable repo.
>   http://download.opensuse.org/repositories/Kernel:/stable/standard/
> 
> Can anyone confirm that it works?

Okay, I'll install the RPM and report once rebooted.

And, drumroll...
Comment 10 Ivan Topolsky 2024-06-17 09:48:52 UTC
(In reply to Takashi Iwai from comment #8)
> 6.9.5 is found in OBS Kernel:stable repo.
>   http://download.opensuse.org/repositories/Kernel:/stable/standard/
> 
> Can anyone confirm that it works?

...and one reboot later:

Success: with kernel 6.9.5 all cores are correctly initialized, and processes are correctly spread between cores.

Success!
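
For anyone who wants to check the spread of processes themselves, here is a rough sketch. It assumes the layout documented in proc(5), where field 39 of /proc/[pid]/stat is the CPU the task last ran on; the function names are illustrative only.

```python
from collections import Counter
from pathlib import Path


def last_cpu_from_stat(stat_line):
    """Return the CPU a task last ran on (field 39 of /proc/[pid]/stat)."""
    # The command name (field 2) may contain spaces or ')', so split
    # after the last ')' before counting fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 is rest[36].
    return int(rest[36])


def cpu_histogram(proc="/proc"):
    """Count processes per CPU they last ran on."""
    counts = Counter()
    for stat in Path(proc).glob("[0-9]*/stat"):
        try:
            counts[last_cpu_from_stat(stat.read_text())] += 1
        except (OSError, IndexError, ValueError):
            # The process exited or the line was not in the expected format.
            continue
    return counts


if __name__ == "__main__":
    for cpu, count in sorted(cpu_histogram().items()):
        print(f"cpu{cpu}: {count} processes")
```

On an affected kernel nearly all processes would be expected to report the same CPU; on a working one the counts should be distributed over all cores.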


In the logs:
- the crashes (in "sched_cpu_starting" and "build_sched_domains") that 6.9.1 introduced are gone.
- the errors "__common_interrupt: 1.55 No irq handler for vector" are still present but don't seem to affect core initialisation.

(Aside from the always-present broken SATA and "ACPI Error: AE_NOT_FOUND" messages, which are completely unrelated to this issue.)
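
To compare boots, the messages mentioned in this report can be counted in a saved kernel log. This is only a sketch: the marker strings are the ones quoted in this bug, the file is assumed to be plain text saved from something like "journalctl -k -b", and count_markers is a made-up helper name.

```python
import sys
from collections import Counter

# Substrings taken from the messages quoted in this bug report.
MARKERS = (
    "sched_cpu_starting",
    "build_sched_domains",
    "No irq handler for vector",
    "mtrr:",
    "ACPI Error: AE_NOT_FOUND",
)


def count_markers(lines, markers=MARKERS):
    """Count how many log lines contain each marker substring."""
    counts = Counter({marker: 0 for marker in markers})
    for line in lines:
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py boot.txt
    with open(sys.argv[1], errors="replace") as log:
        for marker, count in count_markers(log).items():
            print(f"{count:5d}  {marker}")
```

Running it on a log from 6.9.1 and one from 6.9.5 should show the two scheduler markers dropping to zero while the "No irq handler" count stays non-zero, matching the observations above.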

Thanks to everybody involved.
Looking forward to this package arriving in the main channel.
Comment 11 Ivan Topolsky 2024-06-20 11:38:21 UTC
6.9.5 has now made it to the main channel.

Tested it: cores are now correctly initialised on the default Tumbleweed kernel.

Thanks!