Bug 1214416 - xfs_repair core dumped with invalid opcode....
Summary: xfs_repair core dumped with invalid opcode....
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Other
Version: Current
Hardware: i586 Other
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: Anthony Iliopoulos
QA Contact: E-mail List
Reported: 2023-08-20 07:16 UTC by Ilmir Mulyukov
Modified: 2023-11-22 14:51 UTC


Attachments
Full dmesg output attached. (51.30 KB, text/plain)
2023-08-20 07:16 UTC, Ilmir Mulyukov
xfs_repair dump (702.64 KB, application/zstd)
2023-08-23 05:54 UTC, Ilmir Mulyukov
xfs_metadump (2.52 MB, application/zstd)
2023-08-26 15:18 UTC, Ilmir Mulyukov

Description Ilmir Mulyukov 2023-08-20 07:16:28 UTC
Created attachment 868893
Full dmesg output attached.

xfs_repair dumps core when used to check/repair an XFS filesystem.

xfs_repair /dev/mapper/datavg-films 
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
Illegal instruction (core dumped)

dmesg after executing xfs_repair command:
[  172.099272] traps: xfs_repair[1499] trap invalid opcode ip:54a440 sp:bf9b9c60 error:0 in xfs_repair[4ca000+84000]
[ 1377.239880] traps: xfs_repair[1807] trap invalid opcode ip:529440 sp:bf996e30 error:0 in xfs_repair[4a9000+84000]
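The two trap lines above can be compared by subtracting the mapping base from the faulting instruction pointer. The following is a minimal sketch of that arithmetic; the helper name `fault_offset` is hypothetical and is not part of any tool mentioned in this report.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of a faulting instruction pointer inside its mapping.
 * For "ip:54a440 ... in xfs_repair[4ca000+84000]" the values are
 * ip = 0x54a440, base = 0x4ca000 and size = 0x84000.
 * (Hypothetical helper, for illustration only.)
 */
static inline uint32_t fault_offset(uint32_t ip, uint32_t base, uint32_t size)
{
    /* The instruction pointer must lie inside the reported mapping. */
    assert(ip >= base && ip - base < size);
    return ip - base;
}
```

Calling `fault_offset(0x54a440, 0x4ca000, 0x84000)` and `fault_offset(0x529440, 0x4a9000, 0x84000)` both yield 0x80440, which suggests that both runs faulted at the same instruction and that only the load address differed between them.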


Kernel version:
uname -r 
6.4.9-1-default

OS release:
NAME="openSUSE Tumbleweed"
# VERSION="20230812"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20230812"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:tumbleweed:20230812"
BUG_REPORT_URL="https://bugzilla.opensuse.org"
SUPPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Tumbleweed"
LOGO="distributor-logo-Tumbleweed"

rpm -qi xfsprogs 
Name        : xfsprogs
Version     : 6.4.0
Release     : 1.1
Architecture: i586
Install Date: Tue Aug  8 17:41:14 2023
Group       : System/Filesystems
Size        : 3817210
License     : GPL-2.0-or-later
Signature   : RSA/SHA512, Tue Jul 25 15:26:23 2023, Key ID 35a2f86e29b700a4
Source RPM  : xfsprogs-6.4.0-1.1.src.rpm
Build Date  : Tue Jul 25 15:24:39 2023
Build Host  : goat47
Packager    : http://bugs.opensuse.org
Vendor      : openSUSE

cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 3
model name	: Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping	: 4
microcode	: 0x17
cpu MHz		: 3198.629
cache size	: 1024 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pebs bts cpuid pni dtes64 monitor ds_cpl cid xtpr
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips	: 6399.81
clflush size	: 64
cache_alignment	: 128
address sizes	: 36 bits physical, 32 bits virtual
power management:
Comment 1 Anthony Iliopoulos 2023-08-22 11:41:57 UTC
Could you please provide a metadump of the filesystem, so that I can reproduce this locally? e.g.:

> xfs_metadump -g /dev/mapper/datavg-films /tmp/datavg-films.md && zstd /tmp/datavg-films.md

(you can attach the resulting file in bugzilla, it shouldn't be too large).
Comment 2 Ilmir Mulyukov 2023-08-22 14:50:10 UTC
Thank you for the response.
Here is the output. I think something is wrong with the xfsprogs package.

cyclop:~ #  xfs_metadump -g /dev/mapper/datavg-films /tmp/datavg-films.md && zstd /tmp/datavg-films.md
/sbin/xfs_metadump: line 34:  1567 Illegal instruction     (core dumped) xfs_db$DBOPTS -i -p xfs_metadump -c "metadump$OPTS $2" $1

dmesg:
[  663.555478] traps: xfs_db[1567] trap invalid opcode ip:514ed9 sp:bf9c4c70 error:0 in xfs_db[4a4000+75000]
Comment 3 Anthony Iliopoulos 2023-08-22 17:45:48 UTC
It seems like your CPU is raising an invalid opcode exception on some instructions. However, I'm not able to reproduce this on a similar machine (with identical CPU flags, although emulated).

Could you please attach the actual coredump files so that I can have a better look?
Comment 4 Ilmir Mulyukov 2023-08-23 05:54:56 UTC
Created attachment 868957
xfs_repair dump

the coredump is attached. (xfs_repair)
Comment 5 Anthony Iliopoulos 2023-08-23 23:33:01 UTC
(In reply to Ilmir Mulyukov from comment #4)
> Created attachment 868957
> xfs_repair dump
> 
> the coredump is attached. (xfs_repair)

Thanks. The issue is that xfsprogs uses liburcu for atomic ops, and liburcu currently does not support atomic operations on 64-bit variables on 32-bit x86 architectures. liburcu signals this by emitting an illegal instruction (ud2) that aborts execution:

Program terminated with signal SIGILL, Illegal instruction.
#0  __uatomic_add (len=8, val=4, addr=0xbfe221dc) at /usr/include/urcu/uatomic/x86.h:416
416             __asm__ __volatile__("ud2");

This is why you see some xfsprogs utils crashing.
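The behaviour seen in the backtrace can be illustrated with a simplified sketch of a size-dispatched atomic add. This is not liburcu's code: the `sketch_` names are hypothetical, and the sketch assumes the GCC/Clang `__atomic_fetch_add` and `__builtin_trap` builtins (on x86 these compilers emit `ud2` for `__builtin_trap`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Nonzero if this sketch can update an operand of `len` bytes.
 * 8-byte operands are accepted only when `unsigned long` is 8 bytes
 * wide, which is not the case on 32-bit x86.
 */
static inline int sketch_len_supported(size_t len)
{
    return len == 1 || len == 2 || len == 4 ||
           (len == 8 && sizeof(unsigned long) == 8);
}

/* Atomically add `val` to the `len`-byte integer at `addr`. */
static inline void sketch_uatomic_add(void *addr, unsigned long val, size_t len)
{
    assert(addr != NULL);
    switch (len) {
    case 1:
        __atomic_fetch_add((uint8_t *)addr, (uint8_t)val, __ATOMIC_SEQ_CST);
        return;
    case 2:
        __atomic_fetch_add((uint16_t *)addr, (uint16_t)val, __ATOMIC_SEQ_CST);
        return;
    case 4:
        __atomic_fetch_add((uint32_t *)addr, (uint32_t)val, __ATOMIC_SEQ_CST);
        return;
    case 8:
        if (sizeof(unsigned long) == 8) {
            __atomic_fetch_add((uint64_t *)addr, (uint64_t)val, __ATOMIC_SEQ_CST);
            return;
        }
        break;
    }
    /* Unsupported operand size: abort with an illegal instruction. */
    __builtin_trap();
}
```

With this shape, a call such as `sketch_uatomic_add(&counter, 4, sizeof(counter))` on a 64-bit `counter` falls through to the trap on a 32-bit build, which mirrors the `len=8` frame in the backtrace above.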

At build time, xfsprogs tries to detect whether liburcu supports atomic64 ops on the platform it is being compiled for, and if not, it falls back to pthread mutex locks.

The detection logic for that fallback was added in commit 7448af588a2e ("libxfs: fix atomic64_t poorly for 32-bit architectures"). It relies on _uatomic_link_error(), a link-time trick used by liburcu, but that only works for the generic liburcu code and not for the x86-specific code.
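The pthread mutex fallback mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the xfsprogs implementation: the `sketch_atomic64_*` names and the single global lock are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical 64-bit counter type protected by one global mutex. */
typedef int64_t sketch_atomic64_t;

static pthread_mutex_t sketch_atomic64_lock = PTHREAD_MUTEX_INITIALIZER;

static inline void sketch_lock(void)
{
    int rc = pthread_mutex_lock(&sketch_atomic64_lock);
    assert(rc == 0);
    (void)rc;
}

static inline void sketch_unlock(void)
{
    int rc = pthread_mutex_unlock(&sketch_atomic64_lock);
    assert(rc == 0);
    (void)rc;
}

/* Store a 64-bit value while holding the lock. */
static inline void sketch_atomic64_set(sketch_atomic64_t *a, int64_t v)
{
    sketch_lock();
    *a = v;
    sketch_unlock();
}

/* Read a 64-bit value while holding the lock. */
static inline int64_t sketch_atomic64_read(sketch_atomic64_t *a)
{
    int64_t v;

    sketch_lock();
    v = *a;
    sketch_unlock();
    return v;
}

/* Add to a 64-bit value while holding the lock. */
static inline void sketch_atomic64_add(int64_t v, sketch_atomic64_t *a)
{
    sketch_lock();
    *a += v;
    sketch_unlock();
}
```

Because every access takes the same lock, a 64-bit update is never observed half-written, even on a CPU where the library offers no 64-bit atomic instruction path. The cost is that all counters contend on a single mutex, which is a reasonable trade-off only for a fallback path.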

I have a tentative fix for xfsprogs that enables proper detection and fallback. I'll try to prepare a temporary xfsprogs rpm so that you can test it (or I can provide the patch directly, if you prefer to compile xfsprogs yourself).
Comment 6 Ilmir Mulyukov 2023-08-24 06:39:15 UTC
Thank you, Anthony!
Can you, please, prepare the temp rpm? 
I will test your fix on P4 and give you feedback.
Comment 7 Anthony Iliopoulos 2023-08-24 09:44:48 UTC
(In reply to Ilmir Mulyukov from comment #6)
> Thank you, Anthony!
> Can you, please, prepare the temp rpm? 
> I will test your fix on P4 and give you feedback.

Sure, here it is:

https://download.opensuse.org/repositories/home:/ailiopoulos:/branches:/bsc1214416/openSUSE_Tumbleweed/

As a first test, please try to obtain a metadump, and then run xfs_repair -n.

Let me know how it goes.
Comment 8 Ilmir Mulyukov 2023-08-26 15:15:57 UTC
Everything is working! Hooray!

cyclop:/tmp # xfs_metadump -g /dev/mapper/datavg-films /tmp/datavg-films.md && zstd /tmp/datavg-films.md
Zeroing clean log                                          
/tmp/datavg-films.md :  0.48%   (   525 MiB =>   2.52 MiB, /tmp/datavg-films.md.zst) 

cyclop:/tmp # xfs_repair /dev/mapper/datavg-films 
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Thank you, Anthony!
Comment 9 Ilmir Mulyukov 2023-08-26 15:18:53 UTC
Created attachment 869023
xfs_metadump
Comment 10 Anthony Iliopoulos 2023-08-26 16:57:58 UTC
Thank you for confirming! I will leave the ticket open until I push the fix upstream and to the openSUSE repos in the coming days.
Comment 11 Anthony Iliopoulos 2023-11-22 14:51:21 UTC
The fix has landed upstream as commit 73ae943b19e6 ("libxfs: fix atomic64_t detection on x86 32-bit architectures"), and is part of the xfsprogs v6.5.0 release (which is pending for Tumbleweed due to a GRUB issue, but will be released soon).

I am resolving this bug as fixed.