Bugzilla – Full Text Bug Listing |
Summary: | systemtap fails to build simple example from docs | ||
---|---|---|---|
Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | James Fehlig <jfehlig> |
Component: | Kernel | Assignee: | openSUSE Kernel Bugs <kernel-bugs> |
Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
Severity: | Normal | ||
Priority: | P2 - High | CC: | jfehlig, martin.liska, mliska, rguenther, tiwai, tonyj |
Version: | Current | ||
Target Milestone: | --- | ||
Hardware: | Other | ||
OS: | Other | ||
Whiteboard: | |||
Found By: | --- | Services Priority: | |
Business Priority: | Blocker: | --- | |
Marketing QA Status: | --- | IT Deployment: | --- |
Description
James Fehlig
2022-02-16 22:43:09 UTC
Please try the version in devel:tools. It is in staging right now for Tumbleweed. (In reply to Tony Jones from comment #1) > Please try the version in devel:tools. It is in staging right now for > Tumbleweed. It is now in openSUSE:Factory. systemtap version 4.6 Also please always report what version of systemtap you are using stap -V or rpm -q systemtap. Version 4.6 fixes the issue, thanks! Of course I had grander plans with which I'm still having problems. I'd like to dump the VMSA of a AMD SEV-ES guest with a script along the lines of # cat sev-es-vmsa.stp function dump_vmsa(addr:long) { printf("VMSA\n") for (i = 0; i < 4096 ; i+= 64) { printf("%.64M\n", addr + i); } } probe module("kvm_amd").statement("__sev_launch_update_vmsa@arch/x86/kvm/svm/sev.c:625") { dump_vmsa($svm->vmsa) } But trying to run this results in # stap -v sev-es-vmsa.stp Pass 1: parsed user script and 483 library scripts using 110832virt/101100res/7292shr/93924data kb, in 160usr/20sys/175real ms. semantic error: resolution failed in DWARF builder semantic error: while resolving probe point: identifier 'module' at sev-es-vmsa.stp:7:7 source: probe module("kvm_amd").statement("__sev_launch_update_vmsa@arch/x86/kvm/svm/sev.c:625") { ^ semantic error: no match Pass 2: analyzed script: 0 probes, 0 functions, 0 embeds, 0 globals using 112928virt/104800res/8712shr/96020data kb, in 770usr/40sys/814real ms. Pass 2: analysis failed. [man error::pass2] Not sure why it doesn't like the 'module' identifier. I tried something a bit simpler with the same result # stap -v -e 'probe module("ext4").function("*") { }' Pass 1: parsed user script and 483 library scripts using 110828virt/100900res/7092shr/93920data kb, in 150usr/20sys/173real ms. semantic error: resolution failed in DWARF builder semantic error: while resolving probe point: identifier 'module' at <input>:1:7 source: probe module("ext4").function("*") { } ^ semantic error: no match Pass 2: analyzed script: 0 probes, 0 functions, 0 embeds, 0 globals using 112924virt/104584res/8504shr/96016data kb, in 780usr/30sys/813real ms. Pass 2: analysis failed. [man error::pass2] (In reply to Tony Jones from comment #2) > Also please always report what version of systemtap you are using stap -V or > rpm -q systemtap. Thanks for the tip. FYI, prior to updating to 4.6 # stap -V Systemtap translator/driver (version 4.2/0.186, non-git sources) Copyright (C) 2005-2019 Red Hat, Inc. and others This is free software; see the source for copying conditions. tested kernel versions: 2.6.32 ... 5.4-rc6 enabled features: AVAHI BPF LIBSQLITE3 NLS NSS ok. definitely a regression here vs SP3. I'll look at it. # stap -vv --poison-cache -l 'module("ext4").function("*")' ... derive-probes (location #0): module("ext4").function("*") of keyword at <input>:1:1 semantic error: resolution failed in DWARF builder thrown from: elaborate.cxx:1093 semantic error: resolution failed in DWARF builder thrown from: elaborate.cxx:1093 semantic error: while resolving probe point: identifier 'module' at <input>:1:7 thrown from: elaborate.cxx:1081 source: probe module("ext4").function("*") {} ^ semantic error: no match thrown from: elaborate.cxx:1044 rguenther so looking at elfutils I see: "checking for zstd... no" is this an issue since kernel-default is packaging ko's with .zst suffix # rpm -ql kernel-default | grep ext4 /usr/lib/modules/5.16.10-1-default/kernel/fs/ext4 /usr/lib/modules/5.16.10-1-default/kernel/fs/ext4/ext4.ko.zst This isn't the issue for this bug as systemtap/elfutils should be opening /usr/lib/debug/usr/lib/modules/5.16.10-1-default/kernel/fs/ext4/ext4.ko.debug from kernel-default-debuginfo However 'strace stap --poison-cache -l 'module("ext4").function("write_mmp_block")' shows it's not accessing this debug file at all. build-id is consistent which has been a cause of these issues before # eu-readelf -a /usr/lib/debug/usr/lib/modules/5.16.10-1-default/kernel/fs/ext4/ext4.ko.debug | grep -A 1 GNU_BUILD_ID GNU 20 GNU_BUILD_ID Build ID: 771c20d08943980047dc325b9101eb04164d1956 # eu-readelf -a /lib/modules/5.16.10-1-default/kernel/fs/ext4/ext4.ko.zst | grep -A 1 GNU_BUILD_ID eu-readelf: failed reading '/lib/modules/5.16.10-1-default/kernel/fs/ext4/ext4.ko.zst': not a valid ELF file # eu-readelf -a /lib/modules/5.16.10-1-default/kernel/fs/ext4/ext4.ko | grep -A 1 GNU_BUILD_ID GNU 20 GNU_BUILD_ID Build ID: 771c20d08943980047dc325b9101eb04164d1956 Hmmn. I shall keep investigating. This is an issue on Factory and SP4. # strace -o /tmp/log stap --poison-cache -l 'module("ext4").function("write_mmp_block")' # grep ext4 /tmp/log execve("/bin/stap", ["stap", "--poison-cache", "-l", "module(\"ext4\").function(\"write_m"...], 0x7ffc2a83d928 /* 62 vars */) = 0 newfstatat(AT_FDCWD, "/lib/modules/5.16.10-1-default/kernel/fs/ext4", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0 openat(AT_FDCWD, "/lib/modules/5.16.10-1-default/kernel/fs/ext4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3 newfstatat(AT_FDCWD, "/lib/modules/5.16.10-1-default/kernel/fs/ext4/ext4.ko", {st_mode=S_IFREG|0644, st_size=1972179, ...}, 0) = 0 newfstatat(AT_FDCWD, "/lib/modules/5.16.10-1-default/kernel/fs/ext4/ext4.ko.zst", {st_mode=S_IFREG|0644, st_size=476572, ...}, 0) = 0 [I manually uncompressed ext4.ko.zst -> ext4.ko] (In reply to Tony Jones from comment #5) > ok. definitely a regression here vs SP3. I'll look at it. > > # stap -vv --poison-cache -l 'module("ext4").function("*")' > ... > derive-probes (location #0): module("ext4").function("*") of keyword at > <input>:1:1 > semantic error: resolution failed in DWARF builder > thrown from: elaborate.cxx:1093 > semantic error: resolution failed in DWARF builder > thrown from: elaborate.cxx:1093 > semantic error: while resolving probe point: identifier 'module' at > <input>:1:7 > thrown from: elaborate.cxx:1081 > source: probe module("ext4").function("*") {} > ^ > > semantic error: no match > thrown from: elaborate.cxx:1044 Is this with the same kernel/module .ko or comparing results with different .kos (from SP3 and Tumbleweed)? There might be changes in the debug info that systemtap does not recognize, I'm not sure if the above diagnostics are a sign of that - somebody more familiar with systemtap should see what 'semantic error: resolution failed in DWARF builder' actually means. It could be it just fails to lookup a symbol, but maybe it's more. There's both the compiler and eventually dwz coming into play - to rule out the latter can you try disabling dwz? (I think you can do #!BuildIgnore: dwz when building the kernel and modules) You can also try if it's some new DWARF5 that confuses sytemtap by forcing GCC to use DWARF4 - you somehow need to inject -gdwarf-4 into the kernel/modules build to test that. Eventually it's also elfutils (CCing maintainer) which systemtap seems to use for DWARF parsing (libdw, that is). (In reply to Richard Biener from comment #8) > (In reply to Tony Jones from comment #5) > > ok. definitely a regression here vs SP3. I'll look at it. > > > > # stap -vv --poison-cache -l 'module("ext4").function("*")' > > ... > > derive-probes (location #0): module("ext4").function("*") of keyword at > > <input>:1:1 > > semantic error: resolution failed in DWARF builder > > thrown from: elaborate.cxx:1093 > > semantic error: resolution failed in DWARF builder > > thrown from: elaborate.cxx:1093 > > semantic error: while resolving probe point: identifier 'module' at > > <input>:1:7 > > thrown from: elaborate.cxx:1081 > > source: probe module("ext4").function("*") {} > > ^ > > > > semantic error: no match > > thrown from: elaborate.cxx:1044 > > Is this with the same kernel/module .ko or comparing results with different > .kos (from SP3 and Tumbleweed)? Same OS in all cases. Also SP3 works fine. > Eventually it's also elfutils (CCing maintainer) which systemtap seems to > use for DWARF parsing (libdw, that is). Sorry I thought you were the libdw maintainer. The problem appears to be debugging a kernel module ("module.function") in systemtap syntax when the main .ko for the module has been stripped and the debugging info is in a separate debuginfo. - systemtap is not having an issue debugging the primary kernel (aka "kernel.function works fine"). - systemtap is not having an issue debugging modules for a hand built kernel (here "module.function" works). So I don't think it's a DWARF5 vs DWARF4 issue. I don't think dwz is an issue (though I am curious why there is no support for it in elfutils). As I said in previous comment, the debuginfo version of the ext4.ko is where the debug info resides yet is not being referenced at all (per strace log). I'll look more at systemtap today. The ability to debug what is going on at this level isn't great. I opened https://bugzilla.suse.com/show_bug.cgi?id=1196510 against elfutils to add zst support. That isn't the issue but needs fixing IMO. Still debugging systemtap (setup_dwfl_report_kernel_p appears to not be working correctly but unclear if it's systemtap or elfutils). (In reply to Tony Jones from comment #10) > I opened https://bugzilla.suse.com/show_bug.cgi?id=1196510 against elfutils > to add zst support. > > That isn't the issue but needs fixing IMO. > > Still debugging systemtap (setup_dwfl_report_kernel_p appears to not be > working correctly but unclear if it's systemtap or elfutils). ok, the issue is due to elfutils. systemtap is building offline_search_names based off modules.dep, which contains the path with the zst suffix, but elfutils can't process this when asked. I was manually uncompressing the zst but since stap was still not finding it I assumed there was some other bug but there isn't as stap is basing it's search on what it finds in the modules.dep, aka still trying to process the zst version. If I hand edit the modules.dep removing the zst extension systemtap now works. @mliska, can we please fix the above elfutils bug please, in factory and SP4 # using updated libdw (now in Base:System, hopefully factory soon; also pushed for SP4:GA) stap -l 'module("ext4").function("*")' | head -10 module("ext4").function("ClearPageChecked@../include/linux/page-flags.h:433") module("ext4").function("ClearPageDirty@../include/linux/page-flags.h:423") module("ext4").function("ClearPageError@../include/linux/page-flags.h:419") module("ext4").function("ClearPageUptodate@../include/linux/page-flags.h:667") module("ext4").function("ERR_PTR@../include/linux/err.h:24") module("ext4").function("EXT4_SB@../fs/ext4/ext4.h:1757") module("ext4").function("INIT_LIST_HEAD@../include/linux/list.h:35") module("ext4").function("IS_ERR@../include/linux/err.h:34") module("ext4").function("IS_ERR_OR_NULL@../include/linux/err.h:39") module("ext4").function("PTR_ERR@../include/linux/err.h:29") > @mliska, can we please fix the above elfutils bug please, in factory and SP4
Thanks for the submit request you did and the investigation!
Can we close this as it's fixed in devel project now?
(In reply to Martin Liška from comment #13) > > @mliska, can we please fix the above elfutils bug please, in factory and SP4 > > Thanks for the submit request you did and the investigation! > Can we close this as it's fixed in devel project now? I was hoping to see it accepted in SP4. Although I still have problems with my own script, things seem to be working fine with the updated systemtap and libdw. (In reply to Tony Jones from comment #14) > I was hoping to see it accepted in SP4. Yes, would definitely be nice to have a working systemtap in SP4. I'll bump the priority. fix for elfutils accepted to SP4, closing. |