Bug 1167245 - [Build 20200318] kernel segfault - modprobe mac80211_hwsim
Status: RESOLVED FIXED
Duplicates: 1166979
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: Other Other
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL: https://openqa.opensuse.org/tests/120...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-03-20 11:57 UTC by Dominique Leuenberger
Modified: 2023-04-26 13:50 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
tiwai: needinfo? (jslaby)


Description Dominique Leuenberger 2020-03-20 11:57:50 UTC
## Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-extra_tests_on_gnome@64bit fails in
[hwsim_wpa2_enterprise_setup](https://openqa.opensuse.org/tests/1209293/modules/hwsim_wpa2_enterprise_setup/steps/9)

## Test suite description
Maintainer: asmorodskyi, okurz. Extra tests designed to run on GNOME.


## Reproducible

Fails since (at least) Build [20200309](https://openqa.opensuse.org/tests/1199785)


## Expected result

Last good: [20200307](https://openqa.opensuse.org/tests/1198559) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=extra_tests_on_gnome&version=Tumbleweed)
Comment 1 Fabian Vogt 2020-03-20 12:23:39 UTC
Not everything starting with k is KDE, reassigning.
Comment 2 Takashi Iwai 2020-03-24 16:45:59 UTC
Hrm, this is weird.  I see the only difference between the last good one (20200307) and the first bad one (20200309) releases wrt kernel is kernel-default-devel-5.5.7-1.1.x86_64.rpm vs kernel-default-devel-5.5.7-1.2.x86_64.rpm.  And both kernels are built from the very same git commit ID; only the toolchain and other build dependencies might differ.

Where can I get the serial logs for those two tests?
Comment 3 Takashi Iwai 2020-03-24 17:04:10 UTC
*** Bug 1166979 has been marked as a duplicate of this bug. ***
Comment 4 Dominique Leuenberger 2020-03-25 08:34:59 UTC
(In reply to Takashi Iwai from comment #2)
> Hrm, this is weird.  I see the only difference between the last good one
> (20200307) and the first bad one (20200309) releases wrt kernel is

The "last good" pointer is wrongly identified by openQA. In fact, the last good build was 20200314 and the first failure was 20200316.

Build 20200316 contained a kernel upgrade from 5.5.7 to 5.5.9.
Comment 5 Takashi Iwai 2020-03-25 08:37:48 UTC
OK, thanks, that makes sense.

At a quick glance, there is no obvious patch that may trigger the bug, so we might need bisection...
Comment 6 Takashi Iwai 2020-03-25 12:55:43 UTC
Some facts I found:
- The issue can be reproduced easily on a local VM
- 5.5.11 KOTD still shows the same Oops
- 5.6-rc7 works fine

So it looks like an issue specific to 5.5.y stable.

And, the bisection pointed to the patch
  patches.kernel.org/5.5.9-066-driver-core-Call-sync_state-even-if-supplier-ha.patch

I'm building a test kernel with the revert in OBS home:tiwai:bsc1167245.
Comment 7 Jiri Slaby 2020-03-25 13:10:13 UTC
The crash is in device_links_flush_sync_list:
                if (dev->bus->sync_state)
                        dev->bus->sync_state(dev);

dev->bus is NULL and sync_state is at offset 0x48 in struct bus_type, so the read faults at address 0x48:

> BUG: kernel NULL pointer dereference, address: 0000000000000048
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> CPU: 0 PID: 2433 Comm: modprobe Not tainted 5.5.9-1-default #1 openSUSE Tumbleweed (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-rebuilt.opensuse.org 04/01/2014
> RIP: 0010:device_links_flush_sync_list+0xa7/0xe0
> Code: 48 89 4a 08 48 89 11 48 89 85 d0 00 00 00 48 89 85 d8 00 00 00 49 39 ec 74 0c 48 8d bd 80 00 00 00 e8 ad 5a 2c 00 48 8b 45 60 <48> 8b 40 48 48 85 c0 75 80 48 8b 45 68 48 85 c0 0f 84 7b ff ff ff
> RSP: 0018:ffffa55dc2803b40 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffa55dc2803a98 RCX: ffffa55dc2803b68
> RDX: ffffa55dc2803b68 RSI: ffff90831c64e800 RDI: ffffa55dc2803b68
> RBP: ffff90831c64e800 R08: 0000000000000000 R09: 0000000000000228
> R10: 0000000000000dc0 R11: 0000000001320122 R12: ffff90831c64e800
> R13: ffffa55dc2803b68 R14: ffffffffa6f20080 R15: 0000000000000000
> FS:  00007f252bf63740(0000) GS:ffff90831e400000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000048 CR3: 00000000123ec000 CR4: 00000000000006f0
> Call Trace:
>  device_links_driver_bound+0x194/0x220
>  driver_bound+0x4c/0xe0
>  device_bind_driver+0x4d/0x60
>  mac80211_hwsim_new_radio+0x14a/0xdc0 [mac80211_hwsim]
>  ? __class_register+0x10c/0x170
>  ? 0xffffffffc092c000
>  init_mac80211_hwsim+0x26f/0x1000 [mac80211_hwsim]
>  ? 0xffffffffc092c000
>  do_one_initcall+0x46/0x200
>  ? _cond_resched+0x15/0x30
>  ? kmem_cache_alloc_trace+0x189/0x280
>  ? do_init_module+0x23/0x230
>  do_init_module+0x5c/0x230
>  load_module+0x14b2/0x1650
>  ? __do_sys_init_module+0x16e/0x1a0
>  __do_sys_init_module+0x16e/0x1a0
>  do_syscall_64+0x64/0x240
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7f252c08ed9a
> Code: 48 8b 0d f9 f0 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c6 f0 0b 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffd0bc12378 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> RAX: ffffffffffffffda RBX: 0000559358949ee0 RCX: 00007f252c08ed9a
> RDX: 000055935894a750 RSI: 000000000002180b RDI: 00007f2527cc8010
> RBP: 00007f2527cc8010 R08: 0000000000000000 R09: 00007f252c4559e0
> R10: 0000000000000001 R11: 0000000000000246 R12: 000055935894a750
> R13: 0000000000000000 R14: 0000559358949f80 R15: 0000559358949ee0
> Modules linked in: mac80211_hwsim(+) mac80211 cfg80211 libarc4 nls_utf8 isofs fuse af_packet rfkill xt_tcpudp ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables scsi_transport_iscsi ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter ppdev snd_hda_codec_generic ledtrig_audio bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper snd_hda_intel snd_intel_dspcfg snd_hda_codec drm snd_hda_core joydev snd_hwdep pcspkr snd_pcm parport_pc snd_timer snd parport fb_sys_fops syscopyarea sysfillrect soundcore sysimgblt i2c_piix4 button hid_generic usbhid btrfs blake2b_generic libcrc32c xor ehci_pci ata_generic raid6_pq ehci_hcd sr_mod cdrom usbcore ata_piix virtio_net virtio_blk serio_raw floppy virtio_scsi
>  net_failover failover qemu_fw_cfg sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
> CR2: 0000000000000048
> ---[ end trace 085626c17d7c1908 ]---
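
To illustrate the arithmetic behind the fault address: reading a member through a NULL struct pointer touches address 0 plus the member's offset. In the Code: line above, `48 8b 45 60` loads the pointer at 0x60(%rbp) into RAX, and the faulting `<48> 8b 40 48` then reads 0x48(%rax) with RAX = 0, which matches CR2. The sketch below is a userspace illustration only; `mock_bus` and `sync_state_read_address` are hypothetical names, and the nine placeholder members are an assumption chosen so that `sync_state` lands at 0x48 on a 64-bit build, not the real `struct bus_type` field list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bus_type (NOT the real layout):
 * nine pointer-sized placeholders precede sync_state, so on a build
 * with 8-byte pointers the member sits at offset 9 * 8 = 0x48. */
struct mock_bus {
	const void *placeholder[9];
	void (*sync_state)(void *dev);
};

/* Compile-time check that no padding shifts the member. */
static_assert(offsetof(struct mock_bus, sync_state) == 9 * sizeof(void *),
	      "sync_state must directly follow the placeholders");

/* Address the CPU reads when evaluating bus->sync_state,
 * given the numeric value of the bus pointer. */
static uintptr_t sync_state_read_address(uintptr_t bus_ptr)
{
	return bus_ptr + offsetof(struct mock_bus, sync_state);
}
```

Calling `sync_state_read_address(0)` models the NULL `dev->bus` case: the result is simply the member offset, which is why the Oops reports a small non-zero address rather than 0.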
Comment 8 Jiri Slaby 2020-03-25 13:12:50 UTC
(In reply to Takashi Iwai from comment #6)
> And, the bisection pointed to the patch
>
> patches.kernel.org/5.5.9-066-driver-core-Call-sync_state-even-if-supplier-ha.patch

On one hand, it makes sense as it touches exactly this code. On the other hand, I don't see why... (yet)
Comment 9 Jiri Slaby 2020-03-25 13:14:40 UTC
Ah, maybe we are missing:
commit 77036165d8bcf7c7b2a2df28a601ec2c52bb172d
Author: Saravana Kannan <saravanak@google.com>
Date:   Fri Feb 21 00:05:10 2020 -0800

    driver core: Skip unnecessary work when device doesn't have sync_state()

?
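
As a rough userspace sketch of the kind of guard that commit title suggests (not the actual kernel code): a device with no bus and no driver callback is skipped before any `sync_state` pointer is read. All `mock_*` names and the `sync_calls` counter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_device;

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct mock_bus    { void (*sync_state)(struct mock_device *dev); };
struct mock_driver { void (*sync_state)(struct mock_device *dev); };

struct mock_device {
	const struct mock_bus *bus;       /* NULL when the device sits on no bus */
	const struct mock_driver *driver; /* NULL when no driver is bound */
	int sync_calls;                   /* illustration only: counts callbacks */
};

/* True only if some sync_state callback is reachable without
 * dereferencing a NULL bus or driver pointer. */
static bool mock_dev_has_sync_state(const struct mock_device *dev)
{
	if (!dev)
		return false;
	if (dev->bus && dev->bus->sync_state)
		return true;
	if (dev->driver && dev->driver->sync_state)
		return true;
	return false;
}

/* Guarded variant of the two-line snippet quoted in comment 7. */
static void mock_flush_one(struct mock_device *dev)
{
	if (!mock_dev_has_sync_state(dev))
		return; /* nothing to call, so nothing to dereference */

	if (dev->bus && dev->bus->sync_state) {
		dev->bus->sync_state(dev);
	} else {
		/* The helper guarantees the driver callback exists here. */
		assert(dev->driver && dev->driver->sync_state);
		dev->driver->sync_state(dev);
	}
}

/* Example callback that just records that it ran. */
static void mock_count_sync(struct mock_device *dev)
{
	dev->sync_calls++;
}
```

With a zero-initialised `struct mock_device` (the bus-less case that mac80211_hwsim hits), `mock_flush_one` returns without touching any callback; with `bus` pointing at a `mock_bus` whose `sync_state` is `mock_count_sync`, the callback runs once.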
Comment 10 Takashi Iwai 2020-03-25 16:29:11 UTC
(In reply to Jiri Slaby from comment #9)
> Ah, maybe we are missing:
> commit 77036165d8bcf7c7b2a2df28a601ec2c52bb172d
> Author: Saravana Kannan <saravanak@google.com>
> Date:   Fri Feb 21 00:05:10 2020 -0800
> 
>     driver core: Skip unnecessary work when device doesn't have sync_state()
> 
> ?

As you already discussed with Greg, backporting this and the commit it depends on (ac338acf514e7b578fa9e3742ec2c292323b4c1a) fixes the problem.

I built a test kernel in OBS home:tiwai:bsc1167245-2 repo, and confirmed that it's working.

Shall I push the fixes to the stable tree, or are you already working on it?
Comment 11 Takashi Iwai 2020-03-25 20:39:37 UTC
I pushed my for-next branch.  But I noticed that Greg also published 5.5.13 with exactly those two fixes, so it may be better to upgrade to 5.5.13.
Comment 12 Jiri Slaby 2020-03-26 06:20:03 UTC
Yeah, I had the patches in my queue and have now updated to .13. Will submit it soon.