Bug 1212841

Summary: Lots of objtool warnings when building the nvidia driver
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Christophe Marin <christophe>
Component: X11 3rd Party Driver
Assignee: Stefan Dirsch <sndirsch>
Status: CONFIRMED ---
QA Contact: Stefan Dirsch <sndirsch>
Severity: Normal
Priority: P3 - Medium
CC: ddadap, jslaby, mbenes, vliaskovitis
Version: Current
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: zypper log, build.gz

Description Christophe Marin 2023-06-29 08:11:08 UTC
# rpm -qv nvidia-driver-G06-kmp-default
nvidia-driver-G06-kmp-default-535.54.03_k6.3.7_1-10.1.x86_64

# zgrep -c objtool zypper.log-20230627.xz
213230

extracts:

[rpm>   LD [M]  /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia-modeset.o
[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia-modeset.o: warning: objtool: _nv000675kms+0x46: 'naked' return found in RETHUNK build
[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia-modeset.o: warning: objtool: _nv000676kms+0x46: 'naked' return found in RETHUNK build

[cut]

[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia.o: warning: objtool: _nv040720rm+0xb9: return with modified stack frame
[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia.o: warning: objtool: _nv013581rm+0xd9: stack state mismatch: reg1[5]=-1+0 reg2[5]=-2-64
[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia.o: warning: objtool: _nv034058rm+0x83: stack state mismatch: reg1[5]=-1+0 reg2[5]=-2-56
[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia.o: warning: objtool: _nv024689rm+0xb: missing int3 after ret
[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia.o: warning: objtool: _nv043791rm+0x4: missing int3 after ret

[cut]

then it finishes building and prints a couple last warnings such as:
[rpm> Skipping BTF generation for /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia-peermem.ko due to unavailability of vmlinux
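
With more than 200,000 matching lines, it can help to group the objtool warnings by kind before reading them. The following is a minimal sketch, not part of the original report: it assumes every warning follows the "warning: objtool: <symbol>+0x<offset>: <message>" shape seen in the extracts above, and the function name summarize_objtool_warnings is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the extracts in this report:
#   <path>: warning: objtool: <symbol>+0x<offset>: <message>
OBJTOOL_RE = re.compile(
    r"warning: objtool: (?P<sym>\S+)\+0x[0-9a-fA-F]+: (?P<msg>.+)$"
)


def summarize_objtool_warnings(lines):
    """Count objtool warnings by kind (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = OBJTOOL_RE.search(line.rstrip())
        if match is None:
            continue
        # Drop per-site details after the first colon, e.g. the register
        # state that follows "stack state mismatch:".
        kind = match.group("msg").split(":", 1)[0].strip()
        counts[kind] += 1
    return counts


# Sample lines copied from the extracts above.
SAMPLE = [
    "[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia-modeset.o: warning: objtool: _nv000675kms+0x46: 'naked' return found in RETHUNK build",
    "[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia-modeset.o: warning: objtool: _nv000676kms+0x46: 'naked' return found in RETHUNK build",
    "[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia.o: warning: objtool: _nv040720rm+0xb9: return with modified stack frame",
    "[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia.o: warning: objtool: _nv013581rm+0xd9: stack state mismatch: reg1[5]=-1+0 reg2[5]=-2-64",
    "[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia.o: warning: objtool: _nv034058rm+0x83: stack state mismatch: reg1[5]=-1+0 reg2[5]=-2-56",
    "[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia.o: warning: objtool: _nv024689rm+0xb: missing int3 after ret",
    "[rpm> /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia.o: warning: objtool: _nv043791rm+0x4: missing int3 after ret",
    "[rpm> Skipping BTF generation for /usr/src/kernel-modules/nvidia-535.54.03-default/nvidia-peermem.ko due to unavailability of vmlinux",
]

summary = summarize_objtool_warnings(SAMPLE)
for kind, count in summary.most_common():
    print(count, kind)
```

For the real log, the decompressed lines would be fed in instead of SAMPLE.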
Comment 1 Miroslav Beneš 2023-06-29 08:30:04 UTC
Christophe, could you attach the full log, please?
Comment 2 Christophe Marin 2023-06-29 08:35:24 UTC
Created attachment 867880 [details]
zypper log

(In reply to Miroslav Beneš from comment #1)
> Christophe, could you attach the full log, please?

done
Comment 3 Stefan Dirsch 2023-06-29 08:41:06 UTC
I already discussed this with matz and Richard (because I mixed up objtool with objdump). Here are the findings so far.

- With an older nvidia driver source we see the same warnings, so it is not related to changes in the driver code.
- The source code might be compiled with the wrong compiler options (not compatible with them?); when adding "V=1" to the make command,
  I found that "-mfunction-return=thunk-extern" and "-mindirect-branch=thunk-extern" are being used (coming from our
  kernel sources somehow; is this new maybe?).
- The warnings might be related to changes in objtool or how we run it now.
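
To check which compile commands in a "V=1" build log carry these two options, something like the following sketch could be used. The helper name thunk_flags_in and the sample command line are hypothetical; only the two flag strings come from the build output mentioned above.

```python
import shlex

# The two options observed in the V=1 build output.
THUNK_FLAGS = (
    "-mfunction-return=thunk-extern",
    "-mindirect-branch=thunk-extern",
)


def thunk_flags_in(command):
    """Return the thunk-related flags present in one compiler command line."""
    args = set(shlex.split(command))
    return [flag for flag in THUNK_FLAGS if flag in args]


# Hypothetical command line, shortened for illustration; a real one from
# the build log is much longer.
example = (
    "gcc -O2 -mindirect-branch=thunk-extern "
    "-mfunction-return=thunk-extern -c -o nv.o nv.c"
)
found = thunk_flags_in(example)
print(found)
```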
Comment 5 Stefan Dirsch 2023-06-29 08:46:40 UTC
Created attachment 867881 [details]
build.gz

Build log with "make ... V=1" so you also see the compiler options for building.
Comment 7 Stefan Dirsch 2023-06-29 09:02:56 UTC
@Christophe Just curious: are you actually using the driver RPMs (any issues with that?) or did you just accidentally stumble across these build warnings? I'm wondering whether we've actively broken something or are now just seeing more warnings ...
Comment 8 Stefan Dirsch 2023-06-29 09:04:06 UTC
Adding our NVIDIA contact ...
@Daniel In case this rings a bell for you, please let us know ...
Comment 9 Christophe Marin 2023-06-29 09:36:57 UTC
(In reply to Stefan Dirsch from comment #7)
> @Christophe Just being curious. Are you actually using the driver RPMs (any
> issues with that") or were you just accidently stumbling across these build
> warnings? I'm wondering, whether we 've actively broken sth or now are just
> seing more warnings ...

Yes, I use the G06 driver with an NVIDIA RTX 3060. It works flawlessly despite the warnings.
Comment 10 Stefan Dirsch 2023-08-02 09:04:02 UTC
(In reply to Christophe Marin from comment #9)
> (In reply to Stefan Dirsch from comment #7)
> > @Christophe Just being curious. Are you actually using the driver RPMs (any
> > issues with that") or were you just accidently stumbling across these build
> > warnings? I'm wondering, whether we 've actively broken sth or now are just
> > seing more warnings ...
> 
> Yes, I use the g06 driver with a NVidia RTX3060. Works flawlessly despite
> the warnings.

I can confirm that on a Turing card (T400 4GB) I see no driver issues either, despite the warnings.
Comment 11 Christophe Marin 2023-11-22 17:47:35 UTC
(In reply to Stefan Dirsch from comment #8)
> Adding our NVIDIA contact ...
> @Daniel In case this rings a bell for you, please let us know ...

ping?
Comment 12 Miroslav Beneš 2023-11-22 18:18:26 UTC
I realized that I failed to add a comment here. Sorry about that.

Both warnings are about assembly code that is incorrect with respect to retpolines. You very likely use plain "ret" instructions in your code while the whole module is built with retpolines/rethunks in mind, and objtool warns about that. There is no harm in terms of functionality, but there are security implications. The RET macro should be used instead. There may be more problems too.
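
As a rough cross-check of the point above, plain returns can be located in a disassembly listing. The sketch below is only an illustration: it assumes the tab-separated "address: bytes instruction" layout that "objdump -d" prints, treats "ret" and "retq" as plain returns, and uses a made-up helper name (find_bare_returns) and made-up sample symbols.

```python
import re

# Symbol header lines in the listing, e.g. "0000000000000000 <name>:".
SYMBOL_RE = re.compile(r"^([0-9a-f]+) <(.+)>:$")
# Mnemonics treated as plain ("naked") returns in this sketch.
BARE_RETURNS = {"ret", "retq"}


def find_bare_returns(disassembly):
    """Return (symbol, offset) pairs for plain return instructions."""
    hits = []
    symbol, base = None, 0
    for line in disassembly.splitlines():
        header = SYMBOL_RE.match(line.strip())
        if header:
            base, symbol = int(header.group(1), 16), header.group(2)
            continue
        # Instruction lines are assumed to be "addr:<TAB>bytes<TAB>insn".
        parts = line.split("\t")
        if len(parts) < 3 or not parts[2].strip():
            continue
        mnemonic = parts[2].split()[0]
        if mnemonic in BARE_RETURNS:
            address = int(parts[0].strip().rstrip(":"), 16)
            hits.append((symbol, address - base))
    return hits


# Hypothetical, simplified listing; the symbol names are invented.
SAMPLE_DISASSEMBLY = "\n".join([
    "0000000000000000 <example_fn>:",
    "   0:\t55                   \tpush   %rbp",
    "   1:\tc3                   \tret",
    "0000000000000010 <example_thunked>:",
    "  10:\te9 00 00 00 00       \tjmp    15 <example_thunked+0x5>",
])

bare = find_bare_returns(SAMPLE_DISASSEMBLY)
print(bare)
```

Each hit corresponds to a site that would need the RET macro (or an equivalent return thunk) in the original assembly source.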
Comment 13 Stefan Dirsch 2023-11-26 15:15:14 UTC
I'm afraid this needs to be addressed by NVIDIA themselves. Daniel, could you have a look? Thanks!

NVIDIA-Linux-x86_64-545.29.06/kernel> grep -r -i retpo .
./Makefile:  SPECTRE_V2_RETPOLINE ?= 0
./Makefile:  KBUILD_PARAMS += NV_SPECTRE_V2=$(SPECTRE_V2_RETPOLINE)
./common/inc/nv-retpoline.h:#ifndef _NV_RETPOLINE_H_
./common/inc/nv-retpoline.h:#define _NV_RETPOLINE_H_
./common/inc/nv-retpoline.h:#define NV_RETPOLINE_THUNK NV_SPEC_THUNK
./common/inc/nv-retpoline.h:#define NV_RETPOLINE_THUNK NV_NOSPEC_THUNK
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(rax);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(rbx);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(rcx);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(rdx);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(rsi);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(rdi);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(rbp);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(r8);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(r9);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(r10);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(r11);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(r12);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(r13);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(r14);
./common/inc/nv-retpoline.h:    NV_RETPOLINE_THUNK(r15);
./common/inc/nv-retpoline.h:#endif /* _NV_RETPOLINE_H_ */
./nvidia-modeset/nvidia-modeset-linux.c:#if !defined(CONFIG_RETPOLINE)
./nvidia-modeset/nvidia-modeset-linux.c:#include "nv-retpoline.h"
./nvidia/nv.c:#if !defined(CONFIG_RETPOLINE)
./nvidia/nv.c:#include "nv-retpoline.h"
Comment 14 Stefan Dirsch 2024-06-15 08:46:14 UTC
Note to self: although this was reported for 535.54.03 and we saw this change in a later version,

--- NVIDIA-Linux-x86_64-550.54.14/kernel/nvidia/nv.c    2024-02-22 03:22:32.000000000 +0100
+++ NVIDIA-Linux-x86_64-550.67.00/kernel/nvidia/nv.c    2024-03-13 01:28:21.000000000 +0100
@@ -57,7 +57,11 @@
 #include "nv-dmabuf.h"
 #include "nv-caps-imex.h"
 
-#if !defined(CONFIG_RETPOLINE)
+/*
+ * Commit aefb2f2e619b ("x86/bugs: Rename CONFIG_RETPOLINE =>
+ * CONFIG_MITIGATION_RETPOLINE) in v6.8 renamed CONFIG_RETPOLINE.
+ */
+#if !defined(CONFIG_RETPOLINE) && !defined(CONFIG_MITIGATION_RETPOLINE)
 #include "nv-retpoline.h"
 #endif

the issue still exists in the latest version, 550.90.07.
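
The diff above accepts either symbol name because v6.8 renamed CONFIG_RETPOLINE to CONFIG_MITIGATION_RETPOLINE. The same either-name check against a kernel .config could be sketched as follows; the helper name retpoline_enabled and the sample config snippets are hypothetical.

```python
# Old and new names of the same kernel option (renamed in v6.8).
RETPOLINE_SYMBOLS = ("CONFIG_RETPOLINE", "CONFIG_MITIGATION_RETPOLINE")


def retpoline_enabled(config_text):
    """Return True if either retpoline symbol is set to y in a .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value == "y":
            enabled.add(name)
    return any(symbol in enabled for symbol in RETPOLINE_SYMBOLS)


# Hypothetical .config fragments for illustration.
old_style = "CONFIG_RETPOLINE=y\nCONFIG_SMP=y\n"
new_style = "CONFIG_MITIGATION_RETPOLINE=y\n"
disabled = "# CONFIG_MITIGATION_RETPOLINE is not set\n"

print(retpoline_enabled(old_style), retpoline_enabled(new_style),
      retpoline_enabled(disabled))
```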