Bugzilla – Bug 1215856
[Build 20.1] aa-logprof fails after update AppArmor to the latest available
Last modified: 2024-02-07 16:55:43 UTC
Most likely linked with https://jira.suse.com/browse/PED-5600 when we run aa-logprof -d /tmp/apparmor.d we expect that this interactive tool will find something to review regarding nscd like in here: https://openqa.suse.de/tests/11988915#step/aa_logprof/23 but instead it is empty: https://openqa.suse.de/tests/12357846#step/aa_logprof/23 Actions in the previous module are making the test fail (I isoloated the issue to happen only if running these two openQA modules, running only the one failing doesn't fail) Therefore I'm guessing that the entries in /var/log/audit/audit.log might be different or unexpected for the new version of the tool. See https://openqa.suse.de/tests/12306367#step/aa_autodep/34 vs https://openqa.suse.de/tests/11988915#step/aa_autodep/34 So question remains if in the test we should be expecting now this owner change for /etc/nscd.conf ?
Thanks for CC'ing me, but unfortunately I don't have access to openqa.suse.de nur Jira. I assume jsc#PED-5600 means the update to AppArmor 3.1.6? If someone sends me some information (openqa test results, audit.log etc. - or a link to a test failure on openqa.opensuse.org) I might be able to help. Otherwise I'll just lean back and enjoy watching what you do ;-)
Created attachment 869930 [details] different output as result of command
Created attachment 869931 [details] Logs failed scenario
Created attachment 869932 [details] Logs pass scenario
Code executed in order: [1] https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/tests/security/apparmor/aa_autodep.pm [2] https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/tests/security/apparmor/aa_logprof.pm If I exclude steps on [1] I don't get the failure as mentioned above, so something is happening in [1] that affect to [2] where we see the failure or the different behavior, no questions prompt. Let me know if you need more information.
Small issue first: the aa-logprof test wants to delete a rule matching '\/usr.*\/nscd mrix' but the profile has /usr/{bin,sbin}/nscd rmix, Note mrix vs. rmix (the rmix is there since at least 2.11), so this part of the sed command won't match -> please update the regex. (That still leaves /etc/nscd.conf, which matches the sed regex and should get removed.) [Speaking about that - maybe add a "diff" call to the test and verify that two lines got deleted? Or just do the math with "wc -l".] Ok, now to the real issue: (The comment is a bit longer, as I try to understand what's going on. See the lines marked with ** for some info I need.) The obvious difference is the profile name and enforce vs. complain mode: - pass: apparmor="DENIED" operation="open" profile="nscd" name="/etc/nscd.conf" - fail: apparmor="ALLOWED" operation="open" profile="/usr/sbin/nscd" name="/etc/passwd" The upstream profile (which is the expected one in the aa-logprof test, if I understand the test right) is named "nscd". The profile generated during the aa-autodep test is named "/usr/sbin/nscd". So for some reason, the temporary profile generated by aa-autodep is still loaded - and as you can see in the log message, it lacks more rules than planned by the test (because aa-autodep only adds the bare minimum). In theory, the /usr/sbin/nscd profile should be unloaded at the end of the aa-autodep test by aa-disable Unfortunately (but understandably), the tests clear the audit.log, but that makes debugging much harder :-( For example, I can't see which profiles got loaded or unloaded. ** Seeing the audit.log at the end of the aa-autodep test (after the ** aa-disable call) would be helpful. ** I'd also be interested in the output of aa-status at the end of the ** aa-autodep test. ** Also, please check the output of aa-status at the beginning of the ** aa-logprof test (before it does anything) - it should list the "nscd" ** profile, but not the "/usr/sbin/nscd" profile. The boring (but not too helpful) explanation why aa-logprof doesn't ask anything in the fail case: There are only events for the profile /usr/sbin/nscd which does not exist in /tmp/apparmor.d and therefore get ignored. As a sidenote: I guess the reason why the test doesn't also fail in Tumbleweed is that someone added a workaround in 2019 (!): # For tumbleweed, unload /usr/sbin/nscd profile in case [...] if (is_tumbleweed) { script_run "echo '/usr/sbin/nscd {}' | apparmor_parser -R"; } The commit message for that says "Fix aa_logprof fails on non-clean audit.log and non-unload /usr/sbin/nscd profile". I mean, yeah, it "fixes" (with quotes) the problem, but maybe it also hides a bug (maybe in aa-disable). ** Can you please do a test run on Tumbleweed with this workaround removed? (ideally on openqa.opensuse.org - which would have the advantage that I can access all the details) ** Also, can you do a test run on Leap/SLE with the workaround enabled? (I don't recommend to enable or even keep the workaround, but at least it would confirm we are on the right track.) To make things a bit more interesting: I manually tested (on Tumbleweed) the steps from the aa-autodep test, and the profile was unloaded by aa-disable as expected. Another question, even if it's unlikely: ** At the start of the aa-logprof test, is there any file or symlink in ** /tmp/apparmor.d/disable/ ? BTW: I don't remember a bug (or even a change) in aa-disable that could explain why unloading the profile might fail. There were however some changes to the utils code, so at least in theory one of these changes could have a side effect. I have a final question: ** When these failures started, was there also a kernel update involved, or ** only the AppArmor update? (IIRC there were some changes that affect which profile is chosen if there are two profiles that match the path of a binary, but I'd have to ask John for details.)
Thanks for all the detailed suggestions. I updated the code and let me try to point you where it was applied and what happened: - Applied correct regex and now it drops both lines and also do some check for line count. The problem is that this lead us to the same error that we see in SLE now in Tumbleweed: https://openqa.opensuse.org/tests/3650957#step/aa_logprof/31 If we only drop line `nscd.conf r` we still succeed: https://openqa.opensuse.org/tests/3650913#step/aa_logprof/18 - Removing the workaround in Tumbleweed on top of the drop of the two lines give us similar result than just dropping the two lines: https://openqa.opensuse.org/tests/3650957#step/aa_logprof/31 - audit.log after the aa-disable call: https://openqa.opensuse.org/tests/3650957/file/audit_log_after_disable_nscd.log - aa-status at the end of the aa-autodep: https://openqa.opensuse.org/tests/3651487#step/aa_autodep/36 - aa-status at the beginning of the aa-logprof: https://openqa.opensuse.org/tests/3651487#step/aa_logprof/5 - Test run only with the workaround removed in Tumbleweed (btw in comments tab in openQA I added some comment to identify what are the changes for each run) https://openqa.opensuse.org/tests/3651487 - Test run with the workaround enabled in SLE (drop the two lines and apply the workaround): https://openqa.suse.de/tests/12551908#step/aa_logprof/39 It fails on validating `cat $aa_tmp_prof/usr.sbin.nscd"` which iterates over the items on that list to drop. If we try to only drop the second item as the original test was doing (because the first one was not found) and still apply the workaround: https://openqa.suse.de/tests/12551967 So it PASSED :) but the same solution I cannot apply to Tumbleweed...and still seems wrong to not drop the first item (although I don't know the reason why drop those). - Symlink at the start of aa-logprof: https://openqa.opensuse.org/tests/3650957#step/aa_logprof/2 - No Kernel update seen, Linux 5.14.21-150500.55.28 is both run, passing and failing.
Thanks, that (especially running the tests on openqa.opensuse.org so that I can see the step-by-step details) helps a lot. Unfortunately I'm still searching for the root cause of the issue. At least I found and fixed a few minor issues in the tests while doing so ;-) The most interesting part is aa_logprof-audit.log (from Tumbleweed with workarounds removed, https://openqa.opensuse.org/tests/3651487) type=AVC msg=audit(1697535771.318:331): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/nscd" pid=1493 comm="apparmor_parser" type=AVC msg=audit(1697535813.380:421): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="/usr/sbin/nscd" pid=5356 comm="apparmor_parser" type=AVC msg=audit(1697535874.117:481): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/nscd" pid=5461 comm="apparmor_parser" So the /usr/sbin/nscd profile gets loaded (by aa-autodep, expected), unloaded (by aa-disable in the autodep test, expected) - but surprisingly gets loaded again. (Hmm, maybe I even have an explanation - but I'll wait for the test results of my PR to verify that.) After that, there are several denials (in complain mode): type=AVC msg=audit(1697535922.188:503): apparmor="ALLOWED" [...] profile="/usr/sbin/nscd" [...] which indicates that nscd runs under the "/usr/sbin/nscd" profile (probably the one generated with aa-autodep), not with the "nscd" profile as shipped in the apparmor-profiles package. Since that profile doesn't exist on disk anymore (/tmp/apparmor.d/ gets deleted at the end of the autodep tests), aa-logprof will ignore events for it. I opened https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18042 to verify my theory and to find out why the "/usr/sbin/nscd" profile gets loaded again. (In reply to Joaquín Rivera from comment #7) > - Applied correct regex and now it drops both lines and also do some check > for line count. The problem is that this lead us to the same error that we > see in SLE now in Tumbleweed: > https://openqa.opensuse.org/tests/3650957#step/aa_logprof/31 I expected that ;-) [...] > So it PASSED :) but the same solution I cannot apply to Tumbleweed...and > still seems wrong to not drop the first item (although I don't know the > reason why drop those). That's easy - the idea is to make the profile incomplete, so that nscd runs into some denials, fills the audit.log, and aa-logprof then has something to pick up from the log and ask about adding it to the profile.
Some test runs later... As guessed/expected, nscd was running under the (wrong) /usr/sbin/nscd profile during the aa-logprof test. The added debugging (see the early versions of https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18042) then helped to find out what's going on. It turned out that reloading the profiles in lib/apparmortest.pm pre_run_hook() would load the cached /usr/sbin/nscd profile as created by aa-autodep (same cache filename, but different profile name). The now-obvious solution is to delete the cache file that contains the wrong profile name at the end of the aa-autodep test. Oh, and I removed the workaround for Tumbleweed :-) See https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18042 for a more detailed explanation. The only thing I still don't understand is why this didn't break earlier - the cache behaviour of AppArmor 3.0.x and 3.1.x didn't really change...
Great, thanks! you even create the PR, I approved and merged as I was able to verify the fix with your branch in SLE.
This bug (which was actually a bug in the openQA test) was fixed with https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18042 I'm surprised it was still open, and will finally close it.