Bugzilla – Full Text Bug Listing
| Summary: | xfs_repair core dumped with invalid opcode.... | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Ilmir Mulyukov <ilmir.mulyukov> |
| Component: | Other | Assignee: | Anthony Iliopoulos <ailiopoulos> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | | |
| Priority: | P5 - None | CC: | ilmir.mulyukov |
| Version: | Current | | |
| Target Milestone: | --- | | |
| Hardware: | i586 | | |
| OS: | Other | | |
| Whiteboard: | | | |
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | Full dmesg output attached; xfs_repair dump; xfs_metadump | | |
Description
Ilmir Mulyukov
2023-08-20 07:16:28 UTC
Could you please provide a metadump of the filesystem so that I can reproduce this locally? For example:
> xfs_metadump -g /dev/mapper/datavg-films /tmp/datavg-films.md && zstd /tmp/datavg-films.md
(You can attach the resulting file in Bugzilla; it shouldn't be too large.)
Thank you for the response. Here is the output. I thought something was wrong with the xfsprogs package.
cyclop:~ # xfs_metadump -g /dev/mapper/datavg-films /tmp/datavg-films.md && zstd /tmp/datavg-films.md
/sbin/xfs_metadump: line 34: 1567 Illegal instruction (core dumped) xfs_db$DBOPTS -i -p xfs_metadump -c "metadump$OPTS $2" $1
dmesg:
[ 663.555478] traps: xfs_db[1567] trap invalid opcode ip:514ed9 sp:bf9c4c70 error:0 in xfs_db[4a4000+75000]

It seems like your CPU is generating an invalid opcode exception for some instructions. I'm not able to reproduce this, though, on a similar machine (with identical CPU flags, albeit emulated). Could you please attach the actual core dump files so that I can take a closer look?

Created attachment 868957
xfs_repair dump
The core dump is attached (xfs_repair).
(In reply to Ilmir Mulyukov from comment #4)
> Created attachment 868957
> xfs_repair dump
>
> the coredump is attached. (xfs_repair)

Thanks. The issue is that xfsprogs uses liburcu for atomic operations, and liburcu currently does not support atomic operations on 64-bit variables on 32-bit x86. liburcu signals this by executing an illegal instruction, which aborts execution:
Program terminated with signal SIGILL, Illegal instruction.
#0 __uatomic_add (len=8, val=4, addr=0xbfe221dc) at /usr/include/urcu/uatomic/x86.h:416
416 __asm__ __volatile__("ud2");
This is why some xfsprogs utilities crash. During compilation, xfsprogs tries to detect whether liburcu supports atomic64 operations on the platform it is being built for, and if not, it falls back to pthread mutex locks. The detection logic for that fallback was added in commit 7448af588a2e ("libxfs: fix atomic64_t poorly for 32-bit architectures"). It relies on _uatomic_link_error(), a link-time trick used by liburcu, but that only works for liburcu's generic code, not for the x86-specific implementation.
I have a tentative fix for xfsprogs that enables proper detection and fallback. I'll try to prepare a temporary xfsprogs RPM so that you can test it (or I can provide the patch directly, if you prefer to compile xfsprogs yourself).

Thank you, Anthony! Could you please prepare the temporary RPM? I will test your fix on the P4 and give you feedback.

(In reply to Ilmir Mulyukov from comment #6)
> Thank you, Anthony!
> Can you, please, prepare the temp rpm?
> I will test your fix on P4 and give you feedback.

Sure, here it is:
https://download.opensuse.org/repositories/home:/ailiopoulos:/branches:/bsc1214416/openSUSE_Tumbleweed/
Please first try to obtain a metadump as an initial test, and then run xfs_repair -n. Let me know how it goes.

Everything is working! Hooray!
cyclop:/tmp # xfs_metadump -g /dev/mapper/datavg-films /tmp/datavg-films.md && zstd /tmp/datavg-films.md
Zeroing clean log
/tmp/datavg-films.md : 0.48% ( 525 MiB => 2.52 MiB, /tmp/datavg-films.md.zst)
cyclop:/tmp # xfs_repair /dev/mapper/datavg-films
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
clearing needsrepair flag and regenerating metadata
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
Thank you, Anthony!
Created attachment 869023
xfs_metadump
Thank you for confirming! I will leave the ticket open until I push the fix upstream and into the openSUSE repos in the coming days.

The fix has landed upstream as commit 73ae943b19e6 ("libxfs: fix atomic64_t detection on x86 32-bit architectures") and is part of the xfsprogs v6.5.0 release (which is pending for Tumbleweed due to a GRUB issue, but will be released soon).
I am resolving this bug as fixed.