Bugzilla – Bug 1167245
[Build 20200318] kernel segfault - modprobe mac80211_hwsim
Last modified: 2023-04-26 13:50:25 UTC
## Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-extra_tests_on_gnome@64bit fails in [hwsim_wpa2_enterprise_setup](https://openqa.opensuse.org/tests/1209293/modules/hwsim_wpa2_enterprise_setup/steps/9)

## Test suite description

Maintainer: asmorodskyi, okurz. Extra tests which were designed to run on gnome

## Reproducible

Fails since (at least) Build [20200309](https://openqa.opensuse.org/tests/1199785)

## Expected result

Last good: [20200307](https://openqa.opensuse.org/tests/1198559) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=extra_tests_on_gnome&version=Tumbleweed)
Not everything starting with k is KDE, reassigning.
Hrm, this is weird. I see the only difference between the last good one (20200307) and the first bad one (20200309) releases wrt kernel is kernel-default-devel-5.5.7-1.1.x86_64.rpm vs kernel-default-devel-5.5.7-1.2.x86_64.rpm. Both kernels are built from the very same git commit ID; only the toolchain & co might be different. Where can I get the serial logs for those two tests?
*** Bug 1166979 has been marked as a duplicate of this bug. ***
(In reply to Takashi Iwai from comment #2)
> Hrm, this is weird. I see the only difference between the last good one
> (20200307) and the first bad one (20200309) releases wrt kernel is

The last good pointer is actually wrongly identified by openQA. In fact, the last good build was 20200314 and the first failure was 20200316.

20200316 contained a kernel upgrade from 5.5.7 to 5.5.9.
OK, thanks, that makes sense. At a quick glance, there is no obvious patch that may trigger the bug, so we might need bisection...
Some facts I found:
- The issue can be reproduced easily on a local VM
- 5.5.11 KOTD still shows the same Oops
- 5.6-rc7 works fine

So it looks like an issue specific to 5.5.y stable.

And the bisection pointed to the patch
  patches.kernel.org/5.5.9-066-driver-core-Call-sync_state-even-if-supplier-ha.patch

I'm building a test kernel with the revert in OBS home:tiwai:bsc1167245.
The crash is in device_links_flush_sync_list:

    if (dev->bus->sync_state)
        dev->bus->sync_state(dev);

dev->bus is NULL, sync_state is at offset 0x48, hence the crash:

> BUG: kernel NULL pointer dereference, address: 0000000000000048
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> CPU: 0 PID: 2433 Comm: modprobe Not tainted 5.5.9-1-default #1 openSUSE Tumbleweed (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.opensuse.org 04/01/2014
> RIP: 0010:device_links_flush_sync_list+0xa7/0xe0
> Code: 48 89 4a 08 48 89 11 48 89 85 d0 00 00 00 48 89 85 d8 00 00 00 49 39 ec 74 0c 48 8d bd 80 00 00 00 e8 ad 5a 2c 00 48 8b 45 60 <48> 8b 40 48 48 85 c0 75 80 48 8b 45 68 48 85 c0 0f 84 7b ff ff ff
> RSP: 0018:ffffa55dc2803b40 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffa55dc2803a98 RCX: ffffa55dc2803b68
> RDX: ffffa55dc2803b68 RSI: ffff90831c64e800 RDI: ffffa55dc2803b68
> RBP: ffff90831c64e800 R08: 0000000000000000 R09: 0000000000000228
> R10: 0000000000000dc0 R11: 0000000001320122 R12: ffff90831c64e800
> R13: ffffa55dc2803b68 R14: ffffffffa6f20080 R15: 0000000000000000
> FS: 00007f252bf63740(0000) GS:ffff90831e400000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000048 CR3: 00000000123ec000 CR4: 00000000000006f0
> Call Trace:
> device_links_driver_bound+0x194/0x220
> driver_bound+0x4c/0xe0
> device_bind_driver+0x4d/0x60
> mac80211_hwsim_new_radio+0x14a/0xdc0 [mac80211_hwsim]
> ? __class_register+0x10c/0x170
> ? 0xffffffffc092c000
> init_mac80211_hwsim+0x26f/0x1000 [mac80211_hwsim]
> ? 0xffffffffc092c000
> do_one_initcall+0x46/0x200
> ? _cond_resched+0x15/0x30
> ? kmem_cache_alloc_trace+0x189/0x280
> ? do_init_module+0x23/0x230
> do_init_module+0x5c/0x230
> load_module+0x14b2/0x1650
> ? __do_sys_init_module+0x16e/0x1a0
> __do_sys_init_module+0x16e/0x1a0
> do_syscall_64+0x64/0x240
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7f252c08ed9a
> Code: 48 8b 0d f9 f0 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c6 f0 0b 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffd0bc12378 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> RAX: ffffffffffffffda RBX: 0000559358949ee0 RCX: 00007f252c08ed9a
> RDX: 000055935894a750 RSI: 000000000002180b RDI: 00007f2527cc8010
> RBP: 00007f2527cc8010 R08: 0000000000000000 R09: 00007f252c4559e0
> R10: 0000000000000001 R11: 0000000000000246 R12: 000055935894a750
> R13: 0000000000000000 R14: 0000559358949f80 R15: 0000559358949ee0
> Modules linked in: mac80211_hwsim(+) mac80211 cfg80211 libarc4 nls_utf8 isofs fuse af_packet rfkill xt_tcpudp ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables scsi_transport_iscsi ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter ppdev snd_hda_codec_generic ledtrig_audio bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper snd_hda_intel snd_intel_dspcfg snd_hda_codec drm snd_hda_core joydev snd_hwdep pcspkr snd_pcm parport_pc snd_timer snd parport fb_sys_fops syscopyarea sysfillrect soundcore sysimgblt i2c_piix4 button hid_generic usbhid btrfs blake2b_generic libcrc32c xor ehci_pci ata_generic raid6_pq ehci_hcd sr_mod cdrom usbcore ata_piix virtio_net virtio_blk serio_raw floppy virtio_scsi net_failover failover qemu_fw_cfg sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
> CR2: 0000000000000048
> ---[ end trace 085626c17d7c1908 ]---
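To illustrate why a NULL dev->bus shows up as a fault at address 0x48 rather than at 0, here is a minimal userspace sketch. The `mock_*` types and `sync_state_load_address()` are hypothetical stand-ins, not the kernel's definitions; the nine 8-byte members in front of `sync_state` are an assumption chosen only so that the member lands at the 0x48 offset reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock types for illustration only; NOT the kernel's struct definitions. */
struct mock_device;

struct mock_bus_type {
	uint64_t earlier_members[9];                 /* 9 * 8 bytes = 0x48 bytes */
	void (*sync_state)(struct mock_device *dev); /* therefore at offset 0x48 */
};

struct mock_device {
	struct mock_bus_type *bus;                   /* NULL for a bus-less device */
};

/* Compile-time check that the mock layout matches the offset in the Oops. */
static_assert(offsetof(struct mock_bus_type, sync_state) == 0x48,
	      "mock sync_state member must sit at offset 0x48");

/*
 * Address that a read of bus->sync_state touches, given the numeric value
 * of the bus pointer. For a NULL bus (0) this is just the member offset.
 */
uintptr_t sync_state_load_address(uintptr_t bus_address)
{
	return bus_address + offsetof(struct mock_bus_type, sync_state);
}
```

With a bus pointer of 0, `sync_state_load_address(0)` yields 0x48, which matches CR2 in the trace. The faulting bytes `<48> 8b 40 48` in the Code line appear to decode as a load from 0x48(%rax) with RAX being 0, which is consistent with this reading.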
(In reply to Takashi Iwai from comment #6)
> And, the bisection pointed the patch
>
> patches.kernel.org/5.5.9-066-driver-core-Call-sync_state-even-if-supplier-ha.patch

On one hand, it makes sense as it touches exactly the code. On the other hand, I don't see why... (yet)
Ah, maybe we are missing:

commit 77036165d8bcf7c7b2a2df28a601ec2c52bb172d
Author: Saravana Kannan <saravanak@google.com>
Date: Fri Feb 21 00:05:10 2020 -0800

    driver core: Skip unnecessary work when device doesn't have sync_state()

?
(In reply to Jiri Slaby from comment #9)
> Ah, maybe we miss:
> commit 77036165d8bcf7c7b2a2df28a601ec2c52bb172d
> Author: Saravana Kannan <saravanak@google.com>
> Date: Fri Feb 21 00:05:10 2020 -0800
>
> driver core: Skip unnecessary work when device doesn't have sync_state()
>
> ?

As you already communicated with Greg, backporting this and the commit it depends on (ac338acf514e7b578fa9e3742ec2c292323b4c1a) fixes the problem. I built a test kernel in the OBS home:tiwai:bsc1167245-2 repo and confirmed that it works.

Shall I push the fixes to the stable tree, or have you already worked on it?
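Going only by the commit title ("Skip unnecessary work when device doesn't have sync_state()"), the general idea of such a guard can be sketched in userspace as below. This is not the upstream patch: all `mock_*` names are hypothetical, and the real change may be structured differently. The sketch only shows how checking for a usable callback first avoids dereferencing a NULL bus pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock types for illustration only; NOT the kernel's struct definitions. */
struct mock_device;

struct mock_driver   { void (*sync_state)(struct mock_device *dev); };
struct mock_bus_type { void (*sync_state)(struct mock_device *dev); };

struct mock_device {
	struct mock_driver   *driver;   /* may be NULL */
	struct mock_bus_type *bus;      /* may be NULL, as in the crashing case */
	int sync_calls;                 /* counts callback invocations */
};

/* True only if some sync_state callback can be reached without a NULL deref. */
bool mock_has_sync_state(const struct mock_device *dev)
{
	if (!dev)
		return false;
	if (dev->driver && dev->driver->sync_state)
		return true;
	if (dev->bus && dev->bus->sync_state)
		return true;
	return false;
}

/* Guarded version of the "call sync_state" step: skip devices without one. */
void mock_flush_one(struct mock_device *dev)
{
	if (!mock_has_sync_state(dev))
		return;                         /* nothing to do, nothing dereferenced */

	if (dev->driver && dev->driver->sync_state) {
		dev->driver->sync_state(dev);
	} else {
		assert(dev->bus != NULL);       /* guaranteed by the check above */
		dev->bus->sync_state(dev);
	}
}

/* Example callback that just records that it ran. */
void mock_count_sync(struct mock_device *dev)
{
	dev->sync_calls++;
}
```

A device with neither a driver nor a bus passes through `mock_flush_one()` untouched, while a device whose bus provides `mock_count_sync` gets exactly one callback invocation.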
I pushed my for-next branch. But I noticed that Greg also published 5.5.13 with exactly those two fixes, so maybe better to upgrade to 5.5.13.
Yeah, I had the patches in my queue and now updated to .13. Will submit it soon.