Bug 1224459 - WARNING at apply_retpolines when using NVIDIA proprietary driver - 470 series
Status: RESOLVED NORESPONSE
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: X11 3rd Party Driver
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Importance: P3 - Medium : Normal
Target Milestone: ---
Assignee: Stefan Dirsch
QA Contact: Stefan Dirsch
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-05-19 12:41 UTC by Ionut Nechita
Modified: 2024-07-18 12:25 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
sndirsch: needinfo? (ionut_n2001)


Description Ionut Nechita 2024-05-19 12:41:45 UTC
Upstream NVIDIA Bug Ticket:

https://forums.developer.nvidia.com/t/warning-at-apply-retpolines-when-using-nvidia-proprietary-driver-470-series/292638


Downstream NVIDIA Bug Ticket:

[    5.884307] ------------[ cut here ]------------
[    5.884310] WARNING: CPU: 13 PID: 714 at arch/x86/kernel/alternative.c:654 apply_retpolines+0x350/0x400
[    5.884316] Modules linked in: xfs(+) mac80211 ac97_bus snd_pcm_dmaengine snd_hda_codec snd_rpl_pci_acp6x snd_acp_pci snd_hda_core snd_acp_legacy_common kvm snd_pci_acp6x snd_hwdep snd_pci_acp5x snd_pcm asus_nb_wmi cfg80211 snd_rn_pci_acp3x snd_timer rapl wmi_bmof snd_acp_config pcspkr libarc4 snd snd_soc_acpi k10temp soundcore snd_pci_acp3x i2c_piix4 asus_wireless amd_pmc joydev mac_hid nvme_fabrics efi_pstore nfnetlink dmi_sysfs ip_tables x_tables hid_asus asus_wmi sparse_keymap platform_profile usbkbd usbhid amdgpu crct10dif_pclmul crc32_pclmul polyval_clmulni amdxcp i2c_algo_bit polyval_generic drm_ttm_helper ghash_clmulni_intel nvme drm_exec sha256_ssse3 sha1_ssse3 gpu_sched hid_multitouch aesni_intel xhci_pci nvme_core drm_suballoc_helper hid_generic xhci_pci_renesas ucsi_acpi crypto_simd drm_buddy cryptd xhci_hcd typec_ucsi drm_display_helper ccp nvme_auth sp5100_tco typec i2c_hid_acpi video i2c_hid wmi hid btrfs blake2b_generic libcrc32c xor raid6_pq msr autofs4
[    5.884393] CPU: 13 PID: 714 Comm: modprobe Tainted: P           O       6.9.0-rc7-x64v1+ #3 openSUNLIGHT Rolling
[    5.884396] Hardware name: ASUSTeK COMPUTER INC. ROG Zephyrus G14 GA401QM_GA401QM/GA401QM, BIOS GA401QM.415 08/11/2023
[    5.884397] RIP: 0010:apply_retpolines+0x350/0x400
[    5.884400] Code: 40 80 fe e9 0f 85 96 00 00 00 39 ca 0f 8e 40 ff ff ff 8d 71 01 48 63 c9 c6 44 0d c0 cc e9 51 ff ff ff 41 b8 e0 ff ff ff eb bb <0f> 0b e9 94 fd ff ff 0f 0b 0f b6 8d 69 ff ff ff 89 ce 83 e6 f0 40
[    5.884402] RSP: 0018:ffffa644c0af7950 EFLAGS: 00010212
[    5.884404] RAX: 0000000001867640 RBX: ffffffffc2160654 RCX: 0000000000000005
[    5.884406] RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000000
[    5.884407] RBP: ffffa644c0af7a20 R08: 0000000000000000 R09: 0000000000000000
[    5.884408] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    5.884409] R13: ffffa644c0af7970 R14: ffffffffc434cdb4 R15: ffffffffc434cd60
[    5.884411] FS:  00007cb6ad272740(0000) GS:ffff8c404e880000(0000) knlGS:0000000000000000
[    5.884413] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.884414] CR2: 0000557df1daa410 CR3: 00000001104c6000 CR4: 0000000000f50ef0
[    5.884416] PKRU: 55555554
[    5.884417] Call Trace:
[    5.884418]  <TASK>
[    5.884421]  ? show_regs.cold+0x19/0x20
[    5.884426]  ? apply_retpolines+0x350/0x400
[    5.884428]  ? __warn.cold+0xc3/0x11d
[    5.884431]  ? apply_retpolines+0x350/0x400
[    5.884433]  ? report_bug+0xed/0x160
[    5.884438]  ? handle_bug+0x51/0xa0
[    5.884441]  ? exc_invalid_op+0x18/0x80
[    5.884444]  ? asm_exc_invalid_op+0x1b/0x20
[    5.884449]  ? apply_retpolines+0x350/0x400
[    5.884452]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884457]  module_finalize+0x1b9/0x330
[    5.884461]  ? add_kallsyms+0x2bb/0x350
[    5.884466]  load_module+0x1734/0x1dd0
[    5.884469]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884474]  ? vfree.part.0+0xf0/0x280
[    5.884477]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884481]  ? kfree+0x2a2/0x2f0
[    5.884488]  init_module_from_file+0x96/0x100
[    5.884491]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884494]  ? init_module_from_file+0x96/0x100
[    5.884500]  idempotent_init_module+0x11c/0x2b0
[    5.884504]  __x64_sys_finit_module+0x64/0xd0
[    5.884507]  x64_sys_call+0x1d6e/0x25c0
[    5.884510]  do_syscall_64+0x7e/0x180
[    5.884513]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884516]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884518]  ? ksys_lseek+0x80/0xd0
[    5.884524]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884527]  ? syscall_exit_to_user_mode+0x81/0x270
[    5.884531]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884534]  ? do_syscall_64+0x8b/0x180
[    5.884537]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884539]  ? do_user_addr_fault+0x339/0x660
[    5.884541]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884543]  ? irqentry_exit_to_user_mode+0x76/0x270
[    5.884546]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884548]  ? irqentry_exit+0x43/0x50
[    5.884550]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.884552]  ? exc_page_fault+0x96/0x1c0
[    5.884555]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    5.884557] RIP: 0033:0x7cb6ac911bcd
[    5.884559] Code: 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b d2 0d 00 f7 d8 64 89 01 48
[    5.884561] RSP: 002b:00007ffe483a5418 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    5.884563] RAX: ffffffffffffffda RBX: 00005e3822c24170 RCX: 00007cb6ac911bcd
[    5.884564] RDX: 0000000000000004 RSI: 00005e3822c1e110 RDI: 0000000000000003
[    5.884566] RBP: 00005e3822c1e110 R08: 00007cb6ac9efb20 R09: 00005e3822c241f0
[    5.884567] R10: 0000000000000050 R11: 0000000000000246 R12: 0000000000040004
[    5.884568] R13: 0000000000000000 R14: 00005e3822c1e170 R15: 00005e3822c1ddd0
[    5.884572]  </TASK>
[    5.884573] ---[ end trace 0000000000000000 ]---

Driver: nvidia-glG05-470.239.06-60.1 - 470 Series

OS: TW 20240510

Kernel: 6.9.x stable and mainline (6.9.0+)
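For triage, the key facts sit in the single WARNING line of the trace: the CPU, the PID, the source location, and the symbol with its offset. A minimal C sketch of pulling those fields out of such a line follows. The struct and the function name `parse_kernel_warning` are hypothetical helpers made up for this illustration, and the format string only assumes the line layout seen in the log above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of a "WARNING: CPU: ... PID: ... at file:line symbol+off/size" line. */
struct kernel_warning {
    int cpu;
    int pid;
    char file[256];
    int line;
    char symbol[128];
    unsigned long offset;   /* offset of the warning inside the symbol */
    unsigned long size;     /* total size of the symbol */
};

/*
 * Hypothetical helper: parse one dmesg line into *out.
 * Returns 0 on success, -1 if the line is not a WARNING line in this layout.
 */
int parse_kernel_warning(const char *log_line, struct kernel_warning *out)
{
    assert(log_line != NULL && out != NULL);

    /* Skip the "[    5.884310] " timestamp prefix, if any. */
    const char *start = strstr(log_line, "WARNING:");
    if (start == NULL)
        return -1;

    int matched = sscanf(start,
                         "WARNING: CPU: %d PID: %d at %255[^:]:%d %127[^+]+%lx/%lx",
                         &out->cpu, &out->pid, out->file, &out->line,
                         out->symbol, &out->offset, &out->size);
    return matched == 7 ? 0 : -1;
}
```

Fed the WARNING line from the trace above, this should yield `arch/x86/kernel/alternative.c`, line 654, and `apply_retpolines` at offset 0x350 of 0x400.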
Comment 1 Stefan Dirsch 2024-05-19 15:41:47 UTC
Hmm. I don't know anything about that. Is this related to boo#1212841 maybe?
Comment 2 Ionut Nechita 2024-05-25 12:07:47 UTC
Hi Stefan,

It is not related to boo#1212841; it appears to be a problem with the proprietary driver.
Once the new NVIDIA video driver is integrated, this ticket should be used as a reference for the internal openQA tests.
Comment 3 Stefan Dirsch 2024-05-26 17:38:46 UTC
Ok. So this is a regression caused by updating to kernel 6.9.0+. Maybe our kernel guys have a clue about this.

(In reply to Ionut Nechita from comment #2)
> It is not related to boo#1212841; it appears to be a problem with the
> proprietary driver.
> Once the new NVIDIA video driver is integrated, this ticket should be
> used as a reference for the internal openQA tests.

I find this comment confusing and not helpful at all.
Comment 4 Stefan Dirsch 2024-06-12 12:30:48 UTC
Reassigning to the kernel guys in the hope of receiving some feedback. ;-)
Comment 5 Takashi Iwai 2024-06-15 07:55:37 UTC
At a quick glance, I see no big code change in the relevant part that triggers the WARNING.

Might it be because of the CONFIG_* renames, which confuse the NVIDIA driver builds? E.g. CONFIG_RETPOLINE was renamed to CONFIG_MITIGATION_RETPOLINE.
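One way to confirm which of the two names a given kernel uses is to look at its configuration text (typically something like /boot/config-<version>, though the location is an assumption about the system at hand). A minimal C sketch of that check follows. The names `config_has`, `retpoline_option`, and the enum are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum retpoline_name {
    RETPOLINE_NONE,      /* neither option is set to y */
    RETPOLINE_OLD_NAME,  /* CONFIG_RETPOLINE=y (before the v6.8 rename) */
    RETPOLINE_NEW_NAME   /* CONFIG_MITIGATION_RETPOLINE=y (v6.8 and later) */
};

/* True if some line of cfg is exactly equal to key (e.g. "CONFIG_FOO=y"). */
bool config_has(const char *cfg, const char *key)
{
    assert(cfg != NULL && key != NULL);
    size_t klen = strlen(key);

    for (const char *p = cfg; *p != '\0'; ) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        if (len == klen && strncmp(p, key, klen) == 0)
            return true;

        p += len;          /* move to the end of this line */
        if (*p == '\n')
            p++;           /* and step over the newline */
    }
    return false;
}

/* Report under which name, if any, the retpoline option is enabled. */
enum retpoline_name retpoline_option(const char *cfg)
{
    if (config_has(cfg, "CONFIG_MITIGATION_RETPOLINE=y"))
        return RETPOLINE_NEW_NAME;
    if (config_has(cfg, "CONFIG_RETPOLINE=y"))
        return RETPOLINE_OLD_NAME;
    return RETPOLINE_NONE;
}
```

Whole-line matching is deliberate here, so that a commented-out entry such as `# CONFIG_RETPOLINE is not set` is not mistaken for an enabled option.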
Comment 6 Stefan Dirsch 2024-06-15 08:34:25 UTC
Wow! Thanks for this hint. When I diff the previous driver against the current one I see:

--- NVIDIA-Linux-x86_64-470.239.06/kernel/nvidia/nv.c   2024-02-03 07:26:18.000000000 +0100
+++ NVIDIA-Linux-x86_64-470.256.02/kernel/nvidia/nv.c   2024-05-02 17:16:35.000000000 +0200
[...]
-#if !defined(CONFIG_RETPOLINE)
+/*
+ * Commit aefb2f2e619b ("x86/bugs: Rename CONFIG_RETPOLINE =>
+ * CONFIG_MITIGATION_RETPOLINE) in v6.8 renamed CONFIG_RETPOLINE.
+ */
+#if !defined(CONFIG_RETPOLINE) && !defined(CONFIG_MITIGATION_RETPOLINE)
 #include "nv-retpoline.h"
 #endif

So I guess this issue has been fixed with current G05 driver.
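To make the effect of the diff above concrete, here is a small standalone C sketch that simulates a v6.8+ configuration, where only the new symbol is defined, and evaluates both guards. The two function names are hypothetical, and the `#define` stands in for what the kernel's generated configuration header would normally provide. What nv-retpoline.h does once included is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumption for this sketch: a v6.8+ kernel built with the retpoline
 * mitigation, so only the renamed symbol exists.
 */
#define CONFIG_MITIGATION_RETPOLINE 1

/* Guard as in 470.239.06: only knows the old option name. */
bool old_guard_includes_nv_retpoline(void)
{
#if !defined(CONFIG_RETPOLINE)
    return true;    /* nv-retpoline.h would be included */
#else
    return false;
#endif
}

/* Guard as in 470.256.02: checks the old and the new option name. */
bool new_guard_includes_nv_retpoline(void)
{
#if !defined(CONFIG_RETPOLINE) && !defined(CONFIG_MITIGATION_RETPOLINE)
    return true;    /* nv-retpoline.h would be included */
#else
    return false;   /* kernel has retpoline enabled, header is skipped */
#endif
}

/* Self-check for the simulated configuration above. */
void check_guards(void)
{
    assert(old_guard_includes_nv_retpoline());
    assert(!new_guard_includes_nv_retpoline());
}
```

Under this simulated configuration the old guard still pulls in nv-retpoline.h as if the kernel had no retpoline support, while the new guard skips it.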
Comment 7 Stefan Dirsch 2024-06-15 08:39:29 UTC
@Ionut Please test again with the 470.256.02 driver (current G05 package) and let me know whether this fixes the issue. Thanks!
Comment 8 Stefan Dirsch 2024-07-08 13:57:12 UTC
@Ionut Any news on that one?
Comment 9 Jiri Slaby 2024-07-09 08:19:47 UTC
This is an NVIDIA issue, not a kernel one. And hopefully fixed...
Comment 10 Stefan Dirsch 2024-07-18 12:25:15 UTC
Still waiting for a response for more than a month now. Please reopen once you can provide the requested feedback. Thanks.