Bug 1211568

Summary: general protection fault with kernel 6.3.1-2 and 6.3.2-1 (nouveau)
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Philippe Condé <conde.philippe>
Component: Kernel
Assignee: openSUSE Kernel Bugs <kernel-bugs>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P5 - None
CC: conde.philippe, patrik.jakobsson, tiwai, tzimmermann
Version: Current
Target Milestone: ---
Hardware: x86-64
OS: openSUSE Tumbleweed
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: error in journalctl
Error in journalctl 22:05
Erro journalctl 22:21

Description Philippe Condé 2023-05-20 09:27:22 UTC
Created attachment 867135 [details]
error in journalctl

I use KDE as my desktop environment and boot with Xen (4.17.1_02).
I get some lock-ups: my session freezes and the keyboard stops working. I can still move the mouse, but it has no effect. I cannot switch to another terminal with Ctrl-Alt-F1, and I need to do a hard reset.

This occurs mainly when I try to do a "save as" operation in vim or gwenview.
In journalctl I find errors related to the nouveau driver (see attachment).
Here is my hardware info:
hpprol2:~ # inxi -GaC
CPU:
  Info: model: Intel Xeon E5-2620 0 socket: LGA2011 (Proc 2) note: check
    bits: 64 type: MT SMP arch: Sandy Bridge level: v2 built: 2010-12
    process: Intel 32nm family: 6 model-id: 0x2D (45) stepping: 7
    microcode: 0x710
  Topology: cpus: 1x cores: 1 tpc: 12 threads: 12 smt: enabled cache:
    L1: 2x 64 KiB (128 KiB) desc: d-1x32 KiB; i-1x32 KiB
    L2: 2x 256 KiB (512 KiB) desc: 1x256 KiB L3: 2x 15 MiB (30 MiB)
    desc: 1x15 MiB
  Speed (MHz): avg: 1995 min/max: N/A base/boost: 2000/4800 volts: 1.4 V
    ext-clock: 200 MHz cores: 1: 1995 2: 1995 3: 1995 4: 1995 5: 1995 6: 1995
    7: 1995 8: 1995 9: 1995 10: 1995 11: 1995 12: 1995 bogomips: 47879
  Flags: avx ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3
  Vulnerabilities:
  Type: itlb_multihit status: KVM: VMX unsupported
  Type: l1tf mitigation: PTE Inversion
  Type: mds status: Vulnerable: Clear CPU buffers attempted, no microcode;
    SMT Host state unknown
  Type: meltdown status: Unknown (XEN PV detected, hypervisor mitigation
    required)
  Type: mmio_stale_data status: Unknown: No mitigations
  Type: retbleed status: Not affected
  Type: spec_store_bypass status: Vulnerable
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: Retpolines, STIBP: disabled, RSB filling,
    PBRSB-eIBRS: Not affected
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: NVIDIA GK104GL [Quadro K4200] driver: nouveau v: kernel non-free:
    series: 470.xx+ status: legacy-active (EOL~2023/24) arch: Kepler code: GKxxx
    process: TSMC 28nm built: 2012-18 pcie: gen: 2 speed: 5 GT/s lanes: 16
    ports: active: DP-1,DVI-I-1 empty: DP-2 bus-ID: 0a:00.0 chip-ID: 10de:11b4
    class-ID: 0300 temp: 54.0 C
  Display: x11 server: X.Org v: 21.1.8 with: Xwayland v: 23.1.1
    compositor: kwin_x11 driver: X: loaded: modesetting unloaded: fbdev,vesa
    alternate: nouveau,nv,nvidia dri: nouveau gpu: nouveau display-ID: :0
    screens: 1
  Screen-1: 0 s-res: 3840x1080 s-dpi: 96 s-size: 1016x285mm (40.00x11.22")
    s-diag: 1055mm (41.54")
  Monitor-1: DP-1 pos: right model: Asus VG248 serial: F2LMQS017700
    built: 2015 res: 1920x1080 hz: 60 dpi: 92 gamma: 1.2
    size: 531x299mm (20.91x11.77") diag: 609mm (24") ratio: 16:9 modes:
    max: 1920x1080 min: 720x400
  Monitor-2: DVI-I-1 pos: primary,left model: Asus VG248
    serial: F5LMQS077420 built: 2015 res: 1920x1080 hz: 60 dpi: 92 gamma: 1.2
    size: 531x299mm (20.91x11.77") diag: 609mm (24") ratio: 16:9 modes:
    max: 1920x1080 min: 720x400
  API: OpenGL v: 4.3 Mesa 23.0.3 renderer: NVE4 direct-render: Yes

This problem is not present with kernel 6.2.10-1
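
A minimal sketch of how the nouveau-related journal errors mentioned above could be collected after a forced reboot. It assumes a persistent journal, so that the previous boot (`-b -1`) is still available; the helper name `filter_gpu_errors` and the output file name are hypothetical, and the search pattern is only a guess based on this bug's summary.

```shell
# Keep only log lines that mention nouveau or a general protection fault.
# The pattern is an assumption taken from the bug summary, not an exhaustive list.
filter_gpu_errors() {
    grep -iE 'nouveau|general protection fault'
}

# Typical use after the hard reset (kernel messages of the previous boot):
#   journalctl -k -b -1 --no-pager | filter_gpu_errors > nouveau-errors.txt
```

The resulting file can then be attached to the bug report instead of the full journal.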
Comment 1 Takashi Iwai 2023-05-22 15:40:06 UTC
Looks like a problem of nouveau.  There was another report (bug 1211217) showing a similar problem but in a different code path.  Both look like some memory corruption, though.

It was reported upstream at
  https://gitlab.freedesktop.org/nouveau/mesa/-/issues/70
and it'd be good if you can join there and keep tracking.
Comment 2 Takashi Iwai 2023-05-23 09:51:40 UTC
Just to be sure: could you check whether you have kernel-firmware-nvidia package installed?
If not, please try to install it, rebuild initrd and reboot/retest.
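
The check requested above could be sketched as follows. This assumes an RPM-based openSUSE system with zypper and dracut; the helper name `has_nvidia_firmware` is hypothetical, and the commands that change the system are shown only as comments because they need root.

```shell
# Succeeds when the package list on stdin contains kernel-firmware-nvidia.
has_nvidia_firmware() {
    grep -q '^kernel-firmware-nvidia-[0-9]'
}

# On the affected machine (shown as comments: they need root and modify the system):
#   rpm -qa | has_nvidia_firmware || sudo zypper install kernel-firmware-nvidia
#   sudo dracut --force     # rebuild the initrd for the running kernel
#   sudo reboot
```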
Comment 3 Philippe Condé 2023-05-23 11:32:49 UTC
Hello,

Yes, this firmware is installed:
hpprol2:~ # rpm -qa | grep -i kernel
kernel-firmware-qlogic-20230427-1.1.noarch
kernel-install-tools-0.3.0-2.2.x86_64
kernel-firmware-iwlwifi-20230427-1.1.noarch
kernel-firmware-mediatek-20230427-1.1.noarch
purge-kernels-service-0-9.4.noarch
kernel-firmware-ueagle-20230427-1.1.noarch
kernel-firmware-media-20230427-1.1.noarch
nfs-kernel-server-2.6.3-39.1.x86_64
kernel-firmware-mellanox-20230427-1.1.noarch
kernel-firmware-platform-20230427-1.1.noarch
kernel-default-devel-6.3.1-1.1.x86_64
kernel-firmware-nvidia-20230427-1.1.noarch
.....
eg
Comment 4 Takashi Iwai 2023-05-23 11:44:04 UTC
OK, then please check the kernel in the OBS home:tiwai:bsc1211217 repo.  It has a backport of the patch suggested in the upstream bug tracker.
Comment 5 Philippe Condé 2023-05-23 14:19:25 UTC
Hello,

The link to
https://download.opensuse.org/repositories/home:/tiwai:/bsc1211217/standard/x86_64/kernel-default-6.3.3-1.1.gee20d1b.x86_64.rpm gives an error: 404 Not Found.

Many thanks in advance
Philippe
Comment 6 Takashi Iwai 2023-05-23 15:04:33 UTC
Currently download.o.o seems to have a problem.
You can download directly via the osc CLI instead, though:
  % osc getbinaries home:tiwai:bsc1211217/kernel-default/standard/x86_64
Comment 7 Philippe Condé 2023-05-23 20:42:02 UTC
Created attachment 867174 [details]
Error in journalctl  22:05

Hello,

I installed kernel-6.3.3-1.gee20d1b-default.
I tested it and had a lock-up today at 22:05 while playing the game PySolFC.
I rebooted and saved the error found in journalctl.
I had a second lock-up at 22:21 while I was using vim to look at the error.
I'll also attach the second error.

Regards
Philippe
Comment 8 Philippe Condé 2023-05-23 20:43:10 UTC
Created attachment 867175 [details]
Erro journalctl 22:21

Second lock today using vim
Comment 9 Takashi Iwai 2023-05-24 07:27:29 UTC
Thanks, that was more or less expected, and I see you have commented on the upstream bug tracker.  Let's continue there.
  https://gitlab.freedesktop.org/drm/nouveau/-/issues/213
Comment 10 Philippe Condé 2023-06-27 06:35:51 UTC
Solved in kernel 6.3.9. The fix is in nouveau.
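
A minimal sketch for checking whether a machine already runs a kernel at or above 6.3.9, the version reported as fixed above. It assumes GNU `sort` with `-V` (version sort) and a `uname -r` string of the form `6.3.9-1-default`; the helper names `version_ge` and `kernel_has_fix` are hypothetical.

```shell
# True when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# True when the running kernel is at least 6.3.9.
kernel_has_fix() {
    running=$(uname -r)
    # Strip the package release and flavor, e.g. 6.3.9-1-default -> 6.3.9
    version_ge "${running%%-*}" "6.3.9"
}

if kernel_has_fix; then
    echo "running kernel is 6.3.9 or newer"
else
    echo "running kernel is older than 6.3.9"
fi
```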