Bug 1096726 (CVE-2018-15664) - VUL-0: CVE-2018-15664: docker: 'docker cp' is vulnerable to symlink-exchange race attacks
Status: RESOLVED FIXED
Alias: CVE-2018-15664
Product: SUSE Security Incidents
Classification: Novell Products
Component: Incidents
Version: unspecified
Hardware: Other Other
Priority: P3 - Medium, Severity: Normal
Target Milestone: ---
Assignee: Aleksa Sarai
QA Contact: Security Team bot
URL: https://smash.suse.de/issue/207692
Whiteboard: CVSSv3:SUSE:CVE-2018-15664:7.1:(AV:L/...
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-08 12:21 UTC by Marcus Meissner
Modified: 2019-09-01 22:37 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
symlink_race.tar.xz (2.30 KB, application/x-xz)
2018-06-08 12:23 UTC, Marcus Meissner
symlink_race.tar.xz (2.30 KB, application/x-xz)
2018-06-08 13:05 UTC, Aleksa Sarai

Description Marcus Meissner 2018-06-08 12:21:32 UTC
via aleksa

From: Aleksa Sarai <asarai@suse.de>
Subject: 'docker cp' is vulnerable to symlink-exchange race attacks
Date: Fri, 8 Jun 2018 04:04:38 +1000

Hi all,

This is a bit of a continuation of 'docker cp' security bugs that I
helped find and fix more than 4 years ago in 2014[1,2] (ouch, I feel
old). The attack I outline here affects every version of Docker that
has had 'docker cp' (which according to my watch is v0.6.0 from 2013 --
versions before [1,2] were even more trivially exploitable but this
attack would work with them too).

To his credit, Tõnis Tiigi mentioned this concern back when 'docker cp'
was being modified to use ContainerArchivePath[3]. It is also something
I had thought about when I was writing [1,2] but thought that the race
window was too small for an attack to be practical (not to mention that
[1,2] were considered to not be security fixes so I didn't give it much
more thought). But after I saw CVE-2017-1002101, I was reminded of how
difficult it is to sanely scope paths and decided to play around with
some symlink-exchange attacks.

The basic premise of this attack is that FollowSymlinkInScope suffers
from a fairly fundamental time-of-check-to-time-of-use (TOCTTOU if you
love acronyms) attack. If you're not familiar with FollowSymlinkInScope,
its job is to take a path and safely resolve it as though the process
was inside the container. After the full path has been resolved, the
resolved path is passed around a bit and then operated on a bit later
(in the case of 'docker cp' it is opened when creating the archive that
is streamed to the client). As you may notice, if an attacker can add a
symlink component to the path *after* the resolution but *before* it is
operated on, then you could end up resolving the symlink path component
on the host as root. In the case of 'docker cp' this gives you read
*and* write access to any path on the host.
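
To make the check/use gap concrete, here is a minimal Python sketch. It is
not Docker's code: resolve_in_scope is a hypothetical, much-simplified
stand-in for FollowSymlinkInScope (it rejects escapes instead of re-scoping
them), and the "attacker" swaps the directory at a fixed point between the
check and the use rather than racing. A temporary directory plays the role
of both the host and the container rootfs.

```python
import os
import tempfile


def resolve_in_scope(root, unsafe_path):
    """Hypothetical stand-in for the scoped resolver: the *check* step."""
    full = os.path.realpath(os.path.join(root, unsafe_path.lstrip("/")))
    if os.path.commonpath([root, full]) != root:
        raise PermissionError("path escapes the container root")
    return full


def toctou_demo():
    base = os.path.realpath(tempfile.mkdtemp())
    host_dir = os.path.join(base, "host")      # plays the host filesystem
    rootfs = os.path.join(base, "rootfs")      # plays the container rootfs
    victim_dir = os.path.join(rootfs, "dir")
    os.makedirs(host_dir)
    os.makedirs(victim_dir)
    with open(os.path.join(host_dir, "secret"), "w") as f:
        f.write("host data")
    with open(os.path.join(victim_dir, "secret"), "w") as f:
        f.write("container data")

    # Time of check: the path resolves to a location inside the rootfs.
    resolved = resolve_in_scope(rootfs, "/dir/secret")

    # Attacker: replace the directory component with a symlink to the host.
    os.rename(victim_dir, victim_dir + ".orig")
    os.symlink(host_dir, victim_dir)

    # Time of use: the same string now resolves through the new symlink.
    with open(resolved) as f:
        return f.read()


if __name__ == "__main__":
    print(toctou_demo())
```

The returned contents come from the "host" directory even though the
resolved string never changed, which is the whole problem with passing
resolved paths around as strings.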

Attached is a fairly dumb reproducer which basically does a
RENAME_EXCHANGE of a symlink to "/" and an empty directory in a loop,
hoping to hit the race condition. Then our "user" attempts to copy a
file from the path repeatedly. You can call it like this (note that
since this requires exploiting a race condition, only a small
percentage of the attempts succeed -- however if I had made my
reproducer a bit more clever about how quickly it does the
RENAME_EXCHANGE it could be more likely to hit the race).

  % ls
  build  run.sh
  % ./run.sh &>/dev/null & ; sleep 10s ; pkill -9 run.sh
  % chmod 0644 ex*/out # to fix up permissions for grep
  % grep 'SUCCESS' ex*/out | wc -l # managed to get it from the host
  2
  % grep 'FAILED'  ex*/out | wc -l # got the file from the container
  334
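
For readers unfamiliar with RENAME_EXCHANGE, the swap loop can be sketched
roughly as follows. This is an illustration in Python via ctypes, not the
attached reproducer; exchange and swap_loop are hypothetical names. It
assumes Linux and a libc that exports the renameat2 wrapper (glibc 2.28 or
later), and returns False rather than failing when the call is missing or
the filesystem rejects the flag.

```python
import ctypes
import errno
import os

RENAME_EXCHANGE = 1 << 1  # value from <linux/fs.h>
AT_FDCWD = -100           # Linux value: resolve relative to the cwd


def exchange(path_a, path_b):
    """Atomically swap two paths; False if renameat2 is unusable here."""
    libc = ctypes.CDLL(None, use_errno=True)
    renameat2 = getattr(libc, "renameat2", None)
    if renameat2 is None:
        return False
    renameat2.argtypes = [ctypes.c_int, ctypes.c_char_p,
                          ctypes.c_int, ctypes.c_char_p, ctypes.c_uint]
    renameat2.restype = ctypes.c_int
    ret = renameat2(AT_FDCWD, os.fsencode(path_a),
                    AT_FDCWD, os.fsencode(path_b), RENAME_EXCHANGE)
    if ret == 0:
        return True
    err = ctypes.get_errno()
    if err in (errno.ENOSYS, errno.EINVAL):
        return False  # kernel or filesystem lacks RENAME_EXCHANGE
    raise OSError(err, os.strerror(err))


def swap_loop(symlink_path, dir_path, iterations):
    """Flip the two names back and forth; returns the number of swaps."""
    done = 0
    for _ in range(iterations):
        if not exchange(symlink_path, dir_path):
            break
        done += 1
    return done
```

Because the exchange is atomic, the name always refers to either the
directory or the symlink, so the victim never sees a missing path.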

Note that 'run.sh' will ask for your sudo permissions; this is purely to
place a flag at /w00t_w00t_im_a_flag. You could use the same reproducer
to get the host's /etc/shadow, but this way you can just grep for
'SUCCESS'. As I said above, the 0.6% success rate is quite bad, but if
my reproducer had some more finely-tuned timing it might have a much
better success rate (the race window is not that small -- the problem is
that the path is used *multiple* times and each time it's used the
archiving library might decide to not resolve the swapped symlink by
accident).

Now, how do we fix this? Unfortunately there isn't an easy way of fixing
this particular brand of TOCTTOU. The only way of really fixing it (on
Linux -- I have no idea whether you'd need to fix it on Windows) would
be to get an O_PATH file descriptor and operate on it when generating
the archive. The simplest way of doing this would be to run a process
inside a chroot(2) which just does an O_PATH and then sends the file
descriptor back to dockerd using SCM_RIGHTS.

The more complicated way that wouldn't require spawning a new process
(but would result in you returning an error if an attack like this is in
progress, rather than returning a valid result) would involve something
like:

1. Do the current FollowSymlinkInScope logic to get a resolved path, but
   while resolving the path use O_PATH file descriptors that you keep
   around and operate through them.

2. After you've gotten the fully resolved path, open it with O_PATH.
   This is the file descriptor we will return if our validation
   succeeds. We now repeat the following step, using our existing O_PATH
   of the directory containing the fully resolved path as "parent_fd",
   and the fully resolved path's O_PATH as "current_fd".
   "next_path_component" is always the basename of "current_fd".

3. Do fstatat(parent_fd, next_path_component, ...) to see whether the
   next_path_component has the d_ino we expected (we compare this to
   fstat(current_fd, ...)). If they don't match, we error out. Otherwise
   we set parent_fd = openat(parent_fd, ".."). Continue this until
   current_fd is "/".
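
A rough Python sketch of the idea behind these steps follows. It is a
simplified variant, not the exact algorithm above: it walks downward from
the root and keeps one O_PATH descriptor per component instead of walking
upward through "..", and it compares st_dev/st_ino from fstat and fstatat
(the stat-structure counterparts of the d_ino mentioned in the text).
open_beneath is a hypothetical name, and the input is assumed to be an
already-resolved path relative to the container root. Linux only.

```python
import os
import stat


def open_beneath(root, resolved_rel):
    """Return an O_PATH fd for resolved_rel below root, or raise OSError."""
    parent_fd = os.open(root, os.O_PATH | os.O_DIRECTORY)
    try:
        for name in [p for p in resolved_rel.split("/") if p]:
            if name in (".", ".."):
                raise ValueError("expected a fully resolved path")
            # O_NOFOLLOW with O_PATH yields an fd for the symlink itself.
            current_fd = os.open(name, os.O_PATH | os.O_NOFOLLOW,
                                 dir_fd=parent_fd)
            try:
                pinned = os.fstat(current_fd)
                if stat.S_ISLNK(pinned.st_mode):
                    raise OSError("symlink component appeared: %r" % name)
                # fstatat(parent_fd, name): does the name still refer to
                # the object we just pinned?
                seen = os.stat(name, dir_fd=parent_fd, follow_symlinks=False)
                if (seen.st_dev, seen.st_ino) != (pinned.st_dev,
                                                  pinned.st_ino):
                    raise OSError("component was exchanged: %r" % name)
            except BaseException:
                os.close(current_fd)
                raise
            os.close(parent_fd)
            parent_fd = current_fd
        result, parent_fd = parent_fd, -1
        return result
    finally:
        if parent_fd >= 0:
            os.close(parent_fd)
```

The caller would then operate on the returned descriptor (for example
through dir_fd arguments) rather than on the path string, and must close
it when done.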

An important thing to note is that we have to modify our logic (in both
cases) in order to handle non-existent paths. FollowSymlinkInScope
allows you to scope non-existent paths (where the path components that
don't exist are just appended to the end), which requires a bit more
work to make safe. In principle this means that we will need to return
an O_PATH of the closest existing ancestor along with what path
components were left. I'm not sure there's really a safe way of doing
that though (a malicious process could make any of those trailing path
components symlinks, opening this can of worms again).

One other problem with the more complicated way of doing the resolution
is that we might need to be careful about comparing d_ino since
different filesystems could have the same d_ino for completely different
files (and I think FUSE can fake d_ino). So we would have to do an
fstatfs(2) as well and make sure that f_fsid and f_type are the same.
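
As an illustration of that stricter comparison, here is a small Python
sketch. same_object is a hypothetical helper. Python's os module exposes
f_fsid through fstatvfs (Python 3.7 or later) but not f_type, so this
sketch substitutes st_dev alongside f_fsid and is only an approximation of
the fstatfs(2) check described above.

```python
import os


def same_object(fd_a, fd_b):
    """Compare inode number plus filesystem identity for two open fds."""
    st_a, st_b = os.fstat(fd_a), os.fstat(fd_b)
    vfs_a, vfs_b = os.fstatvfs(fd_a), os.fstatvfs(fd_b)
    # Inode numbers are only meaningful within one filesystem, so the
    # device and filesystem id have to match as well.
    return ((st_a.st_dev, st_a.st_ino, vfs_a.f_fsid)
            == (st_b.st_dev, st_b.st_ino, vfs_b.f_fsid))
```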

I'm a little bit sorry for dumping this one on you without a proper fix
in mind, because I helped write the original security fixes and I know
how hard it's going to be to make this safe. I really think there should
be a kernel interface like openat(..., O_NOFOLLOW|O_SERIOUSLY_NOFOLLOW)
which would cause symlinks to not be resolved as path components.
Actually maybe I could rustle up a kernel patch for that. The downside
is that you wouldn't be able to use this on older kernels, so that's a
bit of an issue.

[1]: https://github.com/moby/moby/pull/5720
[2]: https://github.com/moby/moby/pull/6000

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
Comment 11 Aleksa Sarai 2019-05-21 08:29:47 UTC
I just got back some emails from upstream with an agreement on the next steps.

In short, I am about to send a patch which (mostly) fixes the issue and then the issue will become public on their issue tracker. I've been side-tracked for the past few months by trying to get some kernel patches upstream which would allow for a more complete solution to this problem, but this hotfix is probably good enough for most users.
Comment 12 Aleksa Sarai 2019-05-22 13:01:41 UTC
I've submitted the upstream PR[1]. I will update the CVE entry to make it public.

[1]: https://github.com/moby/moby/pull/39252
Comment 13 Marcus Meissner 2019-05-28 06:02:05 UTC
From: Aleksa Sarai <cyphar@cyphar.com>
Subject: [oss-security] CVE-2018-15664: docker (all versions) is vulnerable to a symlink-race attack
Date: Tue, 28 May 2019 14:25:13 +1000

There is no released Docker version with a fix for this issue at the
time of writing. I've submitted a patch upstream[1] which is still
undergoing code review, and after discussion with them they agreed that
public disclosure of the issue was reasonable. Since the SUSE bug report
contains exploit scripts[2], I've attached them here too.

This attack was discovered by myself (Aleksa Sarai), though Tõnis Tiigi
did mention the possibility of an attack like this in the past (at the
time we thought the race window was too small to exploit). In addition,
you could see this exploit as a continuation of some 'docker cp'
security bugs that I helped find and fix more than 4 years ago in
2014[3,4] (these were never assigned CVEs because at the time it was
thought that attacks which used access to docker.sock were not valid
security bugs).

[[ Overview ]]

The basic premise of this attack is that FollowSymlinkInScope suffers
from a fairly fundamental TOCTOU attack. The purpose of
FollowSymlinkInScope is to take a given path and safely resolve it as
though the process was inside the container. After the full path has
been resolved, the resolved path is passed around a bit and then
operated on a bit later (in the case of 'docker cp' it is opened when
creating the archive that is streamed to the client). If an attacker can
add a symlink component to the path *after* the resolution but *before*
it is operated on, then you could end up resolving the symlink path
component on the host as root. In the case of 'docker cp' this gives you
read *and* write access to any path on the host.

As far as I'm aware there are no meaningful protections against this
kind of attack (other than not allowing "docker cp" on running
containers -- but that only helps with this particular attack through
FollowSymlinkInScope). Unless you have restricted the Docker daemon
through AppArmor, then it can affect the host filesystem -- I haven't
verified if the issue is as exploitable under the default SELinux
configuration on Fedora/CentOS/RHEL.

[[ Exploit Scripts ]]

Attached are two reproducers of the issue. They both include a Docker
image which contains a simple binary that does a RENAME_EXCHANGE of a
symlink to "/" and an empty directory in a loop, hoping to hit the race
condition. In both of the scripts, the user is trying to copy a file to
or from a path containing the swapped symlink.

In the case of run_read.sh, I get a <1% chance of hitting the race
condition (my attack script is quite dumb; with better timing you could
probably hit the race window much more effectively). However, <1% still
means it only takes about 10 seconds of trying to get read access to the
host with root permissions.

  % ./run_read.sh &>/dev/null & ; sleep 10s ; pkill -9 run_read.sh
  % chmod 0644 ex*/out # to fix up permissions for grep
  % grep 'SUCCESS' ex*/out | wc -l # managed to get it from the host
  2
  % grep 'FAILED'  ex*/out | wc -l # got the file from the container
  334

The run_write.sh script, by contrast, can overwrite the host filesystem
in very few iterations. This is because Docker internally has a
"chrootarchive" concept where the archive is extracted from within a
chroot. Docker doesn't chroot into the container's "/" (which would make
this exploit ineffective); it chroots into the parent directory of the
archive target, which is attacker-controlled. This makes the attack more
likely to succeed: once the chroot has hit the race, the rest of the
attack is guaranteed to succeed.
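
The "resolved once, then guaranteed" behaviour can be illustrated with a
small Python sketch. It is an analogy, not Docker's chrootarchive code: a
directory file descriptor stands in for chroot(2) (which would need root),
the swap happens at a fixed point instead of racing, and write_after_pin
is a hypothetical name. The point is only that whichever directory the
name refers to at the moment it is resolved receives every later write.

```python
import os
import tempfile


def write_after_pin(swap_before_pin):
    base = os.path.realpath(tempfile.mkdtemp())
    host = os.path.join(base, "host")          # plays the host filesystem
    target_parent = os.path.join(base, "rootfs", "dst")
    os.makedirs(host)
    os.makedirs(target_parent)

    def attacker_swap():
        os.rename(target_parent, target_parent + ".orig")
        os.symlink(host, target_parent)

    if swap_before_pin:
        attacker_swap()
    # Stand-in for chroot(parent of the archive target): the path is
    # resolved exactly once, following any symlink present right now.
    pinned = os.open(target_parent, os.O_RDONLY | os.O_DIRECTORY)
    if not swap_before_pin:
        attacker_swap()  # too late, the directory is already pinned
    try:
        for name in ("one", "two"):  # "extracting" the archive members
            fd = os.open(name, os.O_WRONLY | os.O_CREAT, 0o644,
                         dir_fd=pinned)
            os.close(fd)
    finally:
        os.close(pinned)
    return sorted(os.listdir(host))
```

With swap_before_pin=True every extracted file lands in the "host"
directory; with False none do, no matter what the attacker does afterwards.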

The scripts will ask for sudo permissions, but that is only to be able
to create a "flag file" in /. You could modify the scripts to target
/etc/shadow instead if you like.

[[ Future Work ]]

In an attempt to come up with a better solution for this problem, I've
been working on some Linux kernel patches which add the ability to
safely resolve paths from within a rootfs[5]. But they are still being
reviewed and it will take a while for userspace to be able to take
advantage of the new interfaces. However, I am also working on
redesigning my "secure join" library's API[6] so that we can at least
better detect these attacks on older kernels and take advantage of [5]
in newer kernels.

[1]: https://github.com/docker/docker/pull/39252
[2]: https://bugzilla.suse.com/show_bug.cgi?id=1096726
[3]: https://github.com/docker/docker/pull/5720
[4]: https://github.com/docker/docker/pull/6000
[5]: https://marc.info/?l=linux-fsdevel&m=155835923516235&w=2
[6]: https://github.com/cyphar/filepath-securejoin

Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
Comment 17 Swamp Workflow Management 2019-06-17 13:11:49 UTC
SUSE-SU-2019:1514-1: An update that fixes one vulnerability is now available.

Category: security (moderate)
Bug References: 1096726
CVE References: CVE-2018-15664
Sources used:
SUSE Linux Enterprise Module for Containers 12 (src):    docker-18.09.6_ce-98.40.1
SUSE CaaS Platform 3.0 (src):    docker-kubic-18.09.6_ce-98.40.1
OpenStack Cloud Magnum Orchestration 7 (src):    docker-18.09.6_ce-98.40.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 18 Swamp Workflow Management 2019-06-19 10:25:27 UTC
SUSE-SU-2019:1562-1: An update that fixes one vulnerability is now available.

Category: security (moderate)
Bug References: 1096726
CVE References: CVE-2018-15664
Sources used:
SUSE Linux Enterprise Module for Open Buildservice Development Tools 15-SP1 (src):    docker-18.09.6_ce-6.20.3
SUSE Linux Enterprise Module for Open Buildservice Development Tools 15 (src):    docker-18.09.6_ce-6.20.3
SUSE Linux Enterprise Module for Containers 15-SP1 (src):    docker-18.09.6_ce-6.20.3
SUSE Linux Enterprise Module for Containers 15 (src):    docker-18.09.6_ce-6.20.3

Comment 19 Swamp Workflow Management 2019-06-25 10:11:40 UTC
openSUSE-SU-2019:1621-1: An update that fixes one vulnerability is now available.

Category: security (moderate)
Bug References: 1096726
CVE References: CVE-2018-15664
Sources used:
openSUSE Leap 15.1 (src):    docker-18.09.6_ce-lp151.2.6.1
openSUSE Leap 15.0 (src):    docker-18.09.6_ce-lp150.5.20.1
Comment 20 Aleksa Sarai 2019-07-22 22:22:07 UTC
This was fixed in all released Docker versions a while ago.
Comment 21 Swamp Workflow Management 2019-08-27 19:10:52 UTC
SUSE-SU-2019:2223-1: An update that solves three vulnerabilities and has four fixes is now available.

Category: security (moderate)
Bug References: 1096726,1123156,1123387,1135460,1136974,1137860,1143386
CVE References: CVE-2018-15664,CVE-2019-10152,CVE-2019-6778
Sources used:
SUSE Linux Enterprise Module for Containers 15-SP1 (src):    fuse-overlayfs-0.4.1-3.3.8, fuse3-3.6.1-3.3.8, podman-1.4.4-4.8.1, slirp4netns-0.3.0-3.3.3
SUSE Linux Enterprise Module for Basesystem 15-SP1 (src):    libcontainers-common-20190401-3.3.5

Comment 22 Swamp Workflow Management 2019-09-01 22:11:56 UTC
openSUSE-SU-2019:2044-1: An update that solves three vulnerabilities and has four fixes is now available.

Category: security (moderate)
Bug References: 1096726,1123156,1123387,1135460,1136974,1137860,1143386
CVE References: CVE-2018-15664,CVE-2019-10152,CVE-2019-6778
Sources used:
openSUSE Leap 15.1 (src):    fuse-overlayfs-0.4.1-lp151.2.1, fuse3-3.6.1-lp151.2.1, libcontainers-common-20190401-lp151.2.3.1, podman-1.4.4-lp151.3.3.1, slirp4netns-0.3.0-lp151.2.3.1