Bugzilla – Bug 1214416
xfs_repair core dumped with invalid opcode....
Last modified: 2023-11-22 14:51:21 UTC
Created attachment 868893 [details]
Full dmesg output attached.

xfs_repair core dumped when used to check/repair an XFS filesystem.

xfs_repair /dev/mapper/datavg-films
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
Illegal instruction (core dumped)

dmesg after executing the xfs_repair command:
[ 172.099272] traps: xfs_repair[1499] trap invalid opcode ip:54a440 sp:bf9b9c60 error:0 in xfs_repair[4ca000+84000]
[ 1377.239880] traps: xfs_repair[1807] trap invalid opcode ip:529440 sp:bf996e30 error:0 in xfs_repair[4a9000+84000]

Kernel version:
uname -r
6.4.9-1-default

OS release:
NAME="openSUSE Tumbleweed"
# VERSION="20230812"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20230812"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:tumbleweed:20230812"
BUG_REPORT_URL="https://bugzilla.opensuse.org"
SUPPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Tumbleweed"
LOGO="distributor-logo-Tumbleweed"

rpm -qi xfsprogs
Name        : xfsprogs
Version     : 6.4.0
Release     : 1.1
Architecture: i586
Install Date: Tue Aug 8 17:41:14 2023
Group       : System/Filesystems
Size        : 3817210
License     : GPL-2.0-or-later
Signature   : RSA/SHA512, Tue Jul 25 15:26:23 2023, Key ID 35a2f86e29b700a4
Source RPM  : xfsprogs-6.4.0-1.1.src.rpm
Build Date  : Tue Jul 25 15:24:39 2023
Build Host  : goat47
Packager    : http://bugs.opensuse.org
Vendor      : openSUSE
cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping        : 4
microcode       : 0x17
cpu MHz         : 3198.629
cache size      : 1024 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pebs bts cpuid pni dtes64 monitor ds_cpl cid xtpr
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips        : 6399.81
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 32 bits virtual
power management:
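The two xfs_repair trap lines in the dmesg output above show different instruction pointers, but each line also reports where the binary was mapped, as name[base+size]. Subtracting the base from the ip gives the offset of the faulting instruction inside the binary, which makes it possible to check whether both crashes hit the same instruction. The following is a minimal sketch of that arithmetic; the function name trap_offset is hypothetical and the parsing assumes the line layout shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse a kernel "trap invalid opcode" line and compute the offset of the
 * faulting instruction inside the mapped binary (ip minus mapping base).
 * Returns 0 on success, -1 if the line does not match the expected layout.
 * trap_offset is a hypothetical helper written for this illustration.
 */
int trap_offset(const char *line, unsigned long *offset)
{
    unsigned long ip, base, size;
    const char *p, *q;

    assert(line != NULL && offset != NULL);

    /* The instruction pointer is printed as "ip:<hex>". */
    p = strstr(line, " ip:");
    if (p == NULL || sscanf(p, " ip:%lx", &ip) != 1)
        return -1;

    /* The mapping is printed as name[base+size] at the end of the line. */
    q = strrchr(line, '[');
    if (q == NULL || sscanf(q, "[%lx+%lx]", &base, &size) != 2)
        return -1;

    /* The ip must fall inside the reported mapping. */
    if (ip < base || ip - base >= size)
        return -1;

    *offset = ip - base;
    return 0;
}
```

For the two lines above, 0x54a440 - 0x4ca000 and 0x529440 - 0x4a9000 both come out as 0x80440, so both runs appear to have stopped at the same instruction, just loaded at a different base address.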
Could you please provide a metadump of the filesystem, so that I can reproduce this locally? e.g.:

> xfs_metadump -g /dev/mapper/datavg-films /tmp/datavg-films.md && zstd /tmp/datavg-films.md

(You can attach the resulting file in Bugzilla; it shouldn't be too large.)
Thank you for the response. Here is the result. I thought something was wrong with the xfsprogs package.

cyclop:~ # xfs_metadump -g /dev/mapper/datavg-films /tmp/datavg-films.md && zstd /tmp/datavg-films.md
/sbin/xfs_metadump: line 34: 1567 Illegal instruction (core dumped) xfs_db$DBOPTS -i -p xfs_metadump -c "metadump$OPTS $2" $1

dmesg:
[ 663.555478] traps: xfs_db[1567] trap invalid opcode ip:514ed9 sp:bf9c4c70 error:0 in xfs_db[4a4000+75000]
It seems that your CPU is raising an invalid opcode exception on some instructions. However, I'm not able to reproduce this on a similar machine (with identical CPU flags, although emulated). Could you please attach the actual coredump files so that I can have a better look?
Created attachment 868957 [details]
xfs_repair dump

The coredump is attached (xfs_repair).
(In reply to Ilmir Mulyukov from comment #4)
> Created attachment 868957 [details]
> xfs_repair dump
>
> the coredump is attached. (xfs_repair)

Thanks. The issue is that xfsprogs uses liburcu for atomic ops, and liburcu does not currently support atomic operations on 64-bit variables on 32-bit x86 architectures. liburcu signals this by executing an illegal instruction that aborts execution:

Program terminated with signal SIGILL, Illegal instruction.
#0 __uatomic_add (len=8, val=4, addr=0xbfe221dc) at /usr/include/urcu/uatomic/x86.h:416
416 __asm__ __volatile__("ud2");

This is why you see some xfsprogs utils crashing.

Now, during compilation xfsprogs attempts to detect whether liburcu can support atomic64 ops on the platform it is being compiled on, and if not, it falls back to using pthread mutex locks. The detection logic for that fallback was added in commit 7448af588a2e ("libxfs: fix atomic64_t poorly for 32-bit architectures"). It relies on _uatomic_link_error(), a link-time trick used by liburcu, but that only works for the generic liburcu code and not for the x86-specific code.

I have a tentative fix for xfsprogs to enable proper detection and fallback. I'll try to prepare a temporary xfsprogs rpm so that you can test it (or I can provide the patch directly, if you prefer to compile xfsprogs yourself).
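To illustrate the kind of fallback described above, here is a minimal sketch of a 64-bit counter protected by a pthread mutex, for platforms where a native 64-bit atomic add is not available. This is not the xfsprogs or liburcu code: the type and function names (locked_atomic64_t, locked_atomic64_add_return, locked_atomic64_read) are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical mutex-protected 64-bit counter. Every access takes the lock,
 * so the 64-bit update is never observed half-written, even on a CPU that
 * cannot update 64 bits in a single atomic instruction.
 */
typedef struct {
    pthread_mutex_t lock;
    int64_t value;
} locked_atomic64_t;

#define LOCKED_ATOMIC64_INIT(v) { PTHREAD_MUTEX_INITIALIZER, (v) }

/* Add delta to the counter and return the new value. */
int64_t locked_atomic64_add_return(locked_atomic64_t *a, int64_t delta)
{
    int64_t result;

    assert(a != NULL);
    pthread_mutex_lock(&a->lock);
    a->value += delta;
    result = a->value;
    pthread_mutex_unlock(&a->lock);
    return result;
}

/* Read the current value under the lock. */
int64_t locked_atomic64_read(locked_atomic64_t *a)
{
    int64_t result;

    assert(a != NULL);
    pthread_mutex_lock(&a->lock);
    result = a->value;
    pthread_mutex_unlock(&a->lock);
    return result;
}
```

A lock-based counter like this is slower than a native atomic instruction, but it behaves the same on every architecture, which is the trade-off a fallback of this kind accepts.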
Thank you, Anthony! Can you, please, prepare the temp rpm? I will test your fix on P4 and give you feedback.
(In reply to Ilmir Mulyukov from comment #6)
> Thank you, Anthony!
> Can you, please, prepare the temp rpm?
> I will test your fix on P4 and give you feedback.

Sure, here it is:
https://download.opensuse.org/repositories/home:/ailiopoulos:/branches:/bsc1214416/openSUSE_Tumbleweed/

As a first test, please try to obtain a metadump, and then run xfs_repair -n. Let me know how it goes.
Everything is working! Hooray!

cyclop:/tmp # xfs_metadump -g /dev/mapper/datavg-films /tmp/datavg-films.md && zstd /tmp/datavg-films.md
Zeroing clean log
/tmp/datavg-films.md : 0.48% ( 525 MiB => 2.52 MiB, /tmp/datavg-films.md.zst)

cyclop:/tmp # xfs_repair /dev/mapper/datavg-films
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Thank you, Anthony!
Created attachment 869023 [details] xfs_metadump
Thank you for confirming! I will leave the ticket open until I push the fix upstream and to the openSUSE repos in the coming days.
The fix has landed upstream as commit 73ae943b19e6 ("libxfs: fix atomic64_t detection on x86 32-bit architectures"), and is part of the xfsprogs v6.5.0 release (its release to Tumbleweed is pending due to a GRUB issue, but it will be available soon). I am resolving this bug as fixed.