Bug 1220914 - LTP thp01 test: failed with WARNING: CPU: 1 PID: 25292 at ../mm/mremap.c:257 move_page_tables.part.46+0x8bc/0x8e8
Status: RESOLVED FIXED
Alias: None
Product: PUBLIC SUSE Linux Enterprise Server 15 SP6
Classification: openSUSE
Component: Kernel
Version: unspecified
Hardware: aarch64 Other
Priority: P5 - None
Severity: Normal
Target Milestone: ---
Assignee: Michal Hocko
QA Contact:
URL: https://openqa.suse.de/tests/13364318...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-03-05 08:08 UTC by WEI GAO
Modified: 2024-06-25 18:15 UTC

See Also:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---


Attachments
thp01 cpu warning msg full log (294.53 KB, text/plain)
2024-03-05 08:09 UTC, WEI GAO

Description WEI GAO 2024-03-05 08:08:27 UTC
## Observation

openQA test in scenario sle-15-SP6-Online-aarch64-ltp_mm@aarch64-virtio fails in
[thp01](https://openqa.suse.de/tests/13364318/modules/thp01/steps/13)

## Test suite description
Test case:
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/thp/thp01.c

The following message is printed after running the thp01 test case (full log in the attachment):

[  596.807292][T25292] ------------[ cut here ]------------
[  596.807776][T25292] WARNING: CPU: 1 PID: 25292 at ../mm/mremap.c:257 move_page_tables.part.46+0x8bc/0x8e8
[  596.808562][T25292] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs rfkill nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer virtio_net net_failover snd failover soundcore button joydev nfsd auth_rpcgss nfs_acl lockd grace sunrpc fuse dmi_sysfs ip_tables x_tables hid_generic usbhid sr_mod cdrom virtio_scsi sd_mod scsi_dh_emc scsi_dh_rdac crct10dif_ce scsi_dh_alua ghash_ce t10_pi gf128mul sm4 xhci_pci crc64_rocksoft_generic xhci_pci_renesas sha2_ce xhci_hcd crc64_rocksoft sg crc64 sha256_arm64 usbcore virtio_blk sha1_ce scsi_mod usb_common virtio_gpu virtio_mmio virtio_dma_buf btrfs blake2b_generic libcrc32c xor xor_neon zlib_deflate raid6_pq efivarfs qemu_fw_cfg virtio_rng aes_ce_blk aes_ce_cipher
[  596.814490][T25292] Supported: Yes
[  596.814785][T25292] CPU: 1 PID: 25292 Comm: true Not tainted 6.4.0-150600.9-default #1 SLE15-SP6 3c8e733979fdbcdb4a6e86e1f33921ba7364d90b
[  596.815856][T25292] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[  596.816523][T25292] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  596.817172][T25292] pc : move_page_tables.part.46+0x8bc/0x8e8
[  596.818082][T25292] lr : move_page_tables.part.46+0x7cc/0x8e8
[  596.818667][T25292] sp : ffff800083f63790
[  596.819016][T25292] x29: ffff800083f63820 x28: ffffdf3675010000 x27: ffff00004dcdb000
[  596.819698][T25292] x26: ffff00004dcdbfe8 x25: 00000000000001ff x24: ffff000046446000
[  596.820396][T25292] x23: 0001000000000000 x22: 0000ffffffa00000 x21: ffff00004dcdbff8
[  596.821074][T25292] x20: 0000ffffffe00000 x19: 0000000000200000 x18: 0000000000000001
[  596.821764][T25292] x17: 0000000000000000 x16: ffffdf3673d3f3b0 x15: 0000ffffffa03000
[  596.822432][T25292] x14: ffff800083f63898 x13: ffffdf3674b8d098 x12: ffff0000455bd980
[  596.823106][T25292] x11: ffffdf3674b8d000 x10: ffff000002e72841 x9 : 00000000000001fd
[  596.823769][T25292] x8 : bc2c000000000000 x7 : ffffdf3676321000 x6 : ffff00004dcdb000
[  596.824436][T25292] x5 : 000000004dcdb000 x4 : 00000000001fffff x3 : ffffdf3675010000
[  596.825107][T25292] x2 : ffff000002e72800 x1 : ffff000046446088 x0 : 08000000abcf4003
[  596.825813][T25292] Call trace:
[  596.826091][T25292]  move_page_tables.part.46+0x8bc/0x8e8
[  596.826557][T25292]  move_page_tables+0x20/0x94
[  596.827122][T25292]  shift_arg_pages+0xd8/0x1c8
[  596.827516][T25292]  setup_arg_pages+0x198/0x348
[  596.827914][T25292]  load_elf_binary+0x404/0x1478
[  596.828322][T25292]  bprm_execve+0x2cc/0x648
[  596.828691][T25292]  do_execveat_common.isra.51+0x1fc/0x268
[  596.829169][T25292]  __arm64_sys_execve+0x48/0x5c
[  596.829577][T25292]  invoke_syscall+0x74/0xf0
[  596.829957][T25292]  el0_svc_common.constprop.1+0x84/0x19c
[  596.830429][T25292]  do_el0_svc+0x40/0xa0
[  596.830790][T25292]  el0_svc+0x3c/0x168
[  596.831139][T25292]  el0t_64_sync_handler+0x9c/0xc0
[  596.831567][T25292]  el0t_64_sync+0x1a4/0x1a8
[  596.831953][T25292] ---[ end trace 0000000000000000 ]---
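
When comparing this trace against other occurrences (for example the full log in the attachment), it can help to extract the warning location mechanically. The following is a minimal sketch, not part of the original report: the regular expression and the helper name `parse_warning` are assumptions made for illustration, written against the header format of the WARNING line shown above.

```python
import re

# Hypothetical pattern for a kernel WARNING header line such as:
# "WARNING: CPU: 1 PID: 25292 at ../mm/mremap.c:257 move_page_tables.part.46+0x8bc/0x8e8"
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_warning(log_line):
    """Return a dict describing the WARNING location, or None if the line does not match."""
    match = WARN_RE.search(log_line)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the purely numeric fields; offsets stay as hex strings.
    for key in ("cpu", "pid", "line"):
        info[key] = int(info[key])
    return info


# Sample line copied from the trace in this report.
sample = (
    "[  596.807776][T25292] WARNING: CPU: 1 PID: 25292 at "
    "../mm/mremap.c:257 move_page_tables.part.46+0x8bc/0x8e8"
)
print(parse_warning(sample))
```

Grouping the results by `file`, `line` and `symbol` across several logs is one way to check whether later failures hit the same warning site.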


## Reproducible
Happened 3 times in 3 months in the 15-SP6 openQA test job group.
Comment 1 WEI GAO 2024-03-05 08:09:48 UTC
Created attachment 873212
thp01 cpu warning msg full log
Comment 2 Petr Cervinka 2024-03-05 09:23:17 UTC
It looks like it happens only when the openQA worker is overloaded. The historical failure rate is really low.
Comment 3 Michal Hocko 2024-03-05 10:25:03 UTC
This seems to be a duplicate of bug 1177305 and bug 1208967.

There are upstream fixes for this; see bug 1208967 comment 10. We have decided to not fix that on older kernels but 15sp6 seems like a proper target for the fix.
Comment 4 Michal Hocko 2024-03-05 10:35:54 UTC
PR for 15sp6 sent. If this starts showing up in 15sp5, let me know and I will consider adding it there as well.
Comment 5 Petr Cervinka 2024-03-05 11:30:22 UTC
(In reply to Michal Hocko from comment #3)
> We have
> decided to not fix that on older kernels but 15sp6 seems like a proper
> target for the fix.


That makes sense, thanks.