Bug 1068793

Summary: Tumbleweed on Dell laptop hangs on reboot or switch to runlevel 3
Product: [openSUSE] openSUSE Tumbleweed Reporter: Vadim Krevs <vkrevs>
Component: Kernel Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED UPSTREAM QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: jslaby, tiwai, vkrevs
Version: Current   
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE Factory   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: /var/log/messages
dmesg output
lspci -vk output
lsmod output
/var/log/messages for first boot with 4.14 kernel

Description Vadim Krevs 2017-11-18 11:53:16 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Build Identifier: 

I've recently purchased a Dell Inspiron 5770 with an AMD Radeon 530 and installed Tumbleweed/KDE on it. Everything is fine except that whenever I try to reboot (via either Leave->Reboot or the reboot command) or switch to the console (via "init 3" or "Ctrl-Alt-F1"), the system hangs. When this happens, /var/log/messages (attached) typically contains a kernel bug trace referencing smu7_populate_single_firmware_entry:


2017-11-18T10:03:25.977899+00:00 thor kernel: [   19.724142] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
2017-11-18T10:03:25.977910+00:00 thor kernel: [   19.724151] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 7C32 (len 272, WS 0, PS 4) @ 0x7C7B
2017-11-18T10:03:25.977911+00:00 thor kernel: [   19.724160] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 6440 (len 70, WS 0, PS 8) @ 0x6466
2017-11-18T10:03:25.977912+00:00 thor kernel: [   19.724168] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu asic init failed
2017-11-18T10:03:26.673887+00:00 thor kernel: [   20.421297] amdgpu 0000:01:00.0: Wait for MC idle timedout !
2017-11-18T10:03:27.021931+00:00 thor kernel: [   20.769717] amdgpu 0000:01:00.0: Wait for MC idle timedout !
2017-11-18T10:03:27.177884+00:00 thor kernel: [   20.925661] [drm] PCIE GART of 4096M enabled (table at 0x0000000000040000).
2017-11-18T10:03:27.186773+00:00 thor kernel: [   20.928692] amdgpu: [powerplay] smu not running, upload firmware again 
2017-11-18T10:03:27.186777+00:00 thor kernel: [   20.931704] BUG: unable to handle kernel paging request at ffffc37c421f8fec
2017-11-18T10:03:27.186777+00:00 thor kernel: [   20.931731] IP: smu7_populate_single_firmware_entry.isra.4+0x41/0xa0 [amdgpu]
2017-11-18T10:03:27.186778+00:00 thor kernel: [   20.931732] PGD 18fd1e067 
2017-11-18T10:03:27.186778+00:00 thor kernel: [   20.931733] P4D 18fd1e067 
2017-11-18T10:03:27.186779+00:00 thor kernel: [   20.931734] PUD 0 
2017-11-18T10:03:27.186779+00:00 thor kernel: [   20.931735] 
2017-11-18T10:03:27.186780+00:00 thor kernel: [   20.931736] Oops: 0002 [#1] PREEMPT SMP
2017-11-18T10:03:27.186780+00:00 thor kernel: [   20.931738] Modules linked in: ses enclosure scsi_transport_sas af_packet snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device uas usb_storage cdc_ether usbnet hid_logitech_hidpp bnep rtsx_usb_sdmmc rtsx_usb_ms mmc_core memstick uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core rtsx_usb r8152 hid_logitech_dj videodev btusb btrtl usbhid snd_hda_codec_hdmi arc4 vboxpci(O) vboxnetadp(O) vboxnetflt(O) snd_hda_codec_realtek snd_hda_codec_generic snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_soc_core snd_compress snd_pcm_dmaengine hid_multitouch raw idma64 iTCO_wdt iTCO_vendor_support vboxdrv(O) msr nls_iso8859_1 nls_cp437 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp ath10k_pci wmi_bmof kvm_intel snd_hda_intel
2017-11-18T10:03:27.186782+00:00 thor kernel: [   20.931767]  dell_wmi snd_hda_codec kvm ath10k_core irqbypass snd_hda_core hci_uart ath snd_hwdep crct10dif_pclmul crc32_pclmul btbcm crc32c_intel ghash_clmulni_intel snd_pcm mac80211 pcbc serdev dell_laptop btqca dell_smbios snd_timer r8169 btintel dcdbas cfg80211 mii snd aesni_intel bluetooth dell_smm_hwmon joydev aes_x86_64 ecdh_generic processor_thermal_device crypto_simd pcspkr mei_me soundcore i2c_i801 int3403_thermal shpchp mei intel_lpss_pci intel_pch_thermal glue_helper intel_soc_dts_iosf pinctrl_sunrisepoint rfkill cryptd int3402_thermal wmi pinctrl_intel int340x_thermal_zone ucsi_acpi typec_ucsi typec int3400_thermal acpi_thermal_rel battery tpm_crb acpi_pad tpm_tis tpm_tis_core intel_lpss_acpi intel_lpss tpm intel_hid acpi_als thermal kfifo_buf sparse_keymap industrialio ac amdkfd amd_iommu_v2
2017-11-18T10:03:27.186782+00:00 thor kernel: [   20.931797]  i915 amdgpu serio_raw i2c_algo_bit drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt ttm sr_mod fb_sys_fops xhci_hcd cdrom drm usbcore video i2c_hid button sg efivarfs
2017-11-18T10:03:27.186783+00:00 thor kernel: [   20.931807] CPU: 0 PID: 1193 Comm: X Tainted: G           O    4.13.12-1-default #1
2017-11-18T10:03:27.186784+00:00 thor kernel: [   20.931808] Hardware name: Dell Inc. Inspiron 5770/0XH3XD, BIOS 1.0.5 10/06/2017
2017-11-18T10:03:27.186784+00:00 thor kernel: [   20.931810] task: ffff9fef6af9a0c0 task.stack: ffffc36042d74000
2017-11-18T10:03:27.186784+00:00 thor kernel: [   20.931828] RIP: 0010:smu7_populate_single_firmware_entry.isra.4+0x41/0xa0 [amdgpu]
2017-11-18T10:03:27.186785+00:00 thor kernel: [   20.931830] RSP: 0018:ffffc36042d778c8 EFLAGS: 00010246
2017-11-18T10:03:27.186785+00:00 thor kernel: [   20.931831] RAX: 000000000000007e RBX: ffffc37c421f8fec RCX: 000000010003d000
2017-11-18T10:03:27.186786+00:00 thor kernel: [   20.931832] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff9fef67024860
2017-11-18T10:03:27.186786+00:00 thor kernel: [   20.931834] RBP: 0000000000000003 R08: 0000000000033930 R09: 0000000000000452
2017-11-18T10:03:27.186786+00:00 thor kernel: [   20.931835] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9fef6a633688
2017-11-18T10:03:27.186787+00:00 thor kernel: [   20.931836] R13: ffff9fef6a6b4000 R14: 00000000000005fe R15: 0000000000000000
2017-11-18T10:03:27.186787+00:00 thor kernel: [   20.931838] FS:  00007f49260fda40(0000) GS:ffff9fef7f400000(0000) knlGS:0000000000000000
2017-11-18T10:03:27.186788+00:00 thor kernel: [   20.931845] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2017-11-18T10:03:27.186788+00:00 thor kernel: [   20.931852] CR2: ffffc37c421f8fec CR3: 000000046de3f000 CR4: 00000000003406f0
2017-11-18T10:03:27.186789+00:00 thor kernel: [   20.931858] Call Trace:
2017-11-18T10:03:27.186789+00:00 thor kernel: [   20.931882]  smu7_request_smu_load_fw+0xa3/0x310 [amdgpu]
2017-11-18T10:03:27.186790+00:00 thor kernel: [   20.931903]  pp_resume+0x8a/0xe0 [amdgpu]
2017-11-18T10:03:27.186790+00:00 thor kernel: [   20.931921]  amdgpu_resume_phase2+0x3f/0xb0 [amdgpu]
2017-11-18T10:03:27.186790+00:00 thor kernel: [   20.931936]  amdgpu_device_resume+0x130/0x400 [amdgpu]
2017-11-18T10:03:27.186791+00:00 thor kernel: [   20.931945]  ? pci_read_config_word.part.7+0x38/0x50
2017-11-18T10:03:27.186791+00:00 thor kernel: [   20.931952]  ? __pci_set_master+0x21/0xb0
2017-11-18T10:03:27.186792+00:00 thor kernel: [   20.931960]  ? vga_switcheroo_set_dynamic_switch+0x80/0x80
2017-11-18T10:03:27.186792+00:00 thor kernel: [   20.931974]  amdgpu_pmops_runtime_resume+0x6b/0xb0 [amdgpu]
2017-11-18T10:03:27.186792+00:00 thor kernel: [   20.931982]  pci_pm_runtime_resume+0x77/0xa0
2017-11-18T10:03:27.186793+00:00 thor kernel: [   20.931990]  __rpm_callback+0xb6/0x1e0
2017-11-18T10:03:27.186793+00:00 thor kernel: [   20.931997]  rpm_callback+0x1f/0x70
2017-11-18T10:03:27.186793+00:00 thor kernel: [   20.932005]  ? vga_switcheroo_set_dynamic_switch+0x80/0x80
2017-11-18T10:03:27.186794+00:00 thor kernel: [   20.932012]  rpm_resume+0x4bb/0x7c0
2017-11-18T10:03:27.186794+00:00 thor kernel: [   20.932020]  ? finish_wait+0x80/0x80
2017-11-18T10:03:27.186794+00:00 thor kernel: [   20.932027]  __pm_runtime_resume+0x3a/0x50
2017-11-18T10:03:27.186795+00:00 thor kernel: [   20.932041]  amdgpu_drm_ioctl+0x33/0x80 [amdgpu]
2017-11-18T10:03:27.186795+00:00 thor kernel: [   20.932049]  do_vfs_ioctl+0x8d/0x5d0
2017-11-18T10:03:27.186795+00:00 thor kernel: [   20.932057]  ? memzero_explicit+0xa/0x10
2017-11-18T10:03:27.186796+00:00 thor kernel: [   20.932065]  ? urandom_read+0xfe/0x250
2017-11-18T10:03:27.186796+00:00 thor kernel: [   20.932073]  ? __fget+0x67/0xb0
2017-11-18T10:03:27.186796+00:00 thor kernel: [   20.932080]  SyS_ioctl+0x74/0x80
2017-11-18T10:03:27.186797+00:00 thor kernel: [   20.932087]  entry_SYSCALL_64_fastpath+0x1e/0xa9
2017-11-18T10:03:27.186797+00:00 thor kernel: [   20.932094] RIP: 0033:0x7f4923a5d2f7
2017-11-18T10:03:27.186798+00:00 thor kernel: [   20.932100] RSP: 002b:00007ffd39e56d28 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
2017-11-18T10:03:27.186798+00:00 thor kernel: [   20.932108] RAX: ffffffffffffffda RBX: 00007f49208b72b0 RCX: 00007f4923a5d2f7
2017-11-18T10:03:27.186798+00:00 thor kernel: [   20.932114] RDX: 00007ffd39e56d70 RSI: 00000000c04064a0 RDI: 0000000000000017
2017-11-18T10:03:27.186799+00:00 thor kernel: [   20.932121] RBP: 0000000000000001 R08: 00007ffd39e56e90 R09: 0000000000000000
2017-11-18T10:03:27.186799+00:00 thor kernel: [   20.932127] R10: 0000000000000000 R11: 0000000000003246 R12: 00007ffd39e56ea0
2017-11-18T10:03:27.186799+00:00 thor kernel: [   20.932133] R13: 000000000000001f R14: 000055e95028b620 R15: 000055e94f9b3120
2017-11-18T10:03:27.186800+00:00 thor kernel: [   20.932140] Code: 89 d3 48 83 ec 30 48 89 e7 48 89 e2 f3 48 ab 49 8b 3c 24 89 f0 0f b6 b0 00 30 63 c0 48 8b 07 ff 50 70 85 c0 75 44 0f b7 44 24 02 <66> 89 2b 48 c7 43 0c 00 00 00 00 66 89 43 02 48 8b 44 24 10 48 
2017-11-18T10:03:27.186800+00:00 thor kernel: [   20.932175] RIP: smu7_populate_single_firmware_entry.isra.4+0x41/0xa0 [amdgpu] RSP: ffffc36042d778c8
2017-11-18T10:03:27.186800+00:00 thor kernel: [   20.932182] CR2: ffffc37c421f8fec
2017-11-18T10:03:27.186801+00:00 thor kernel: [   20.936985] ---[ end trace ba2e81088989610f ]---




Reproducible: Always

Steps to Reproduce:
1. Boot into Tumbleweed
2. Attempt to reboot
3. Or attempt to switch to runlevel 3
Actual Results:  
System hangs

Expected Results:  
System does not hang, reboot or switch to runlevel 3 is successful.
Comment 1 Vadim Krevs 2017-11-18 11:54:31 UTC
Created attachment 749257 [details]
/var/log/messages
Comment 2 Vadim Krevs 2017-11-18 11:58:15 UTC
Created attachment 749258 [details]
dmesg output
Comment 3 Vadim Krevs 2017-11-18 11:58:40 UTC
Created attachment 749259 [details]
lspci -vk output
Comment 4 Vadim Krevs 2017-11-18 12:00:12 UTC
thor:/tmp # rpm -qa  | grep kernel | sort
kernel-default-4.13.11-1.2.x86_64
kernel-default-4.13.12-1.1.x86_64
kernel-default-devel-4.13.11-1.2.x86_64
kernel-default-devel-4.13.12-1.1.x86_64
kernel-devel-4.13.11-1.2.noarch
kernel-devel-4.13.12-1.1.noarch
kernel-docs-4.13.12-1.1.noarch
kernel-firmware-20171009-1.1.noarch
kernel-macros-4.13.12-1.1.noarch
kernel-syms-4.13.11-1.2.x86_64
kernel-syms-4.13.12-1.1.x86_64
Comment 5 Vadim Krevs 2017-11-18 12:00:32 UTC
Created attachment 749260 [details]
lsmod output
Comment 6 Vadim Krevs 2017-11-18 12:04:21 UTC
cat /etc/os-release 
NAME="openSUSE Tumbleweed"
# VERSION="20171117"
ID=opensuse
ID_LIKE="suse"
VERSION_ID="20171117"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:tumbleweed:20171117"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
Comment 7 Vadim Krevs 2017-11-25 09:13:03 UTC
Updated to latest TW snapshot 20171123 with 4.14.0 kernel. 

First boot with the new kernel - GUI fails to start due to the same kernel bug in amdgpu. Contents of /var/log/messages attached. 

Second boot - works.
Comment 8 Vadim Krevs 2017-11-25 09:13:45 UTC
Created attachment 750079 [details]
/var/log/messages for first boot with 4.14 kernel
Comment 9 Vadim Krevs 2017-11-27 20:06:36 UTC
The same issue was reported upstream at https://bugs.freedesktop.org/show_bug.cgi?id=103783.

There is apparently a "workaround" - uninstall TLP.
Comment 10 Takashi Iwai 2017-12-05 13:16:01 UTC
Does blacklisting pcieport in tlp as suggested in the upstream bugzilla work for you, too?  I see lots of pcieport-related errors in your log.
Comment 11 Vadim Krevs 2017-12-05 18:40:49 UTC
Yes, I can confirm that blacklisting pcieport in /etc/default/tlp results in functional runlevel switching.
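
For reference, a minimal sketch of what this change might look like and how to check it. It assumes the installed TLP version reads the RUNTIME_PM_DRIVER_BLACKLIST setting from /etc/default/tlp. The driver list shown in the comment is only an illustration, and the helper name tlp_blacklists_driver is hypothetical, not part of TLP.

```shell
# Assumed line in /etc/default/tlp (driver list is illustrative):
#   RUNTIME_PM_DRIVER_BLACKLIST="radeon nouveau pcieport"

# Hypothetical helper: succeeds if DRIVER appears in the
# RUNTIME_PM_DRIVER_BLACKLIST setting of the given config file.
tlp_blacklists_driver() {
    conf=$1
    driver=$2
    # Take the last uncommented assignment and strip the key and quotes.
    list=$(sed -n 's/^RUNTIME_PM_DRIVER_BLACKLIST=//p' "$conf" | tail -n 1 | tr -d '"')
    for d in $list; do
        [ "$d" = "$driver" ] && return 0
    done
    return 1
}
```

Usage: `tlp_blacklists_driver /etc/default/tlp pcieport && echo "pcieport excluded from runtime PM"`. TLP typically needs to be restarted (or the machine rebooted) before a changed setting takes effect.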
Comment 12 Jiri Slaby 2018-06-16 11:38:17 UTC
This is an upstream bug. It has to be fixed upstream first before we can do anything about it.