Bugzilla – Bug 1096726
VUL-0: CVE-2018-15664: docker: 'docker cp' is vulnerable to symlink-exchange race attacks
Last modified: 2019-09-01 22:37:47 UTC
via aleksa

From: Aleksa Sarai <asarai@suse.de>
Subject: 'docker cp' is vulnerable to symlink-exchange race attacks
Date: Fri, 8 Jun 2018 04:04:38 +1000

Hi all,

This is a bit of a continuation of 'docker cp' security bugs that I helped find and fix more than 4 years ago in 2014[1,2] (ouch, I feel old). The attack I outline here affects every version of Docker that has had 'docker cp' (which according to my watch is v0.6.0 from 2013 -- versions before [1,2] were even more trivially exploitable but this attack would work with them too).

To his credit, Tõnis Tiigi mentioned this concern back when 'docker cp' was being modified to use ContainerArchivePath[3]. It is also something I had thought about when I was writing [1,2] but thought that the race window was too small for an attack to be practical (not to mention that [1,2] were considered to not be security fixes so I didn't give it much more thought). But after I saw CVE-2017-1002101, I was reminded of how difficult it is to sanely scope paths and decided to play around with some symlink-exchange attacks.

The basic premise of this attack is that FollowSymlinkInScope suffers from a fairly fundamental time-of-check-to-time-of-use (TOCTTOU if you love acronyms) attack. If you're not familiar with FollowSymlinkInScope, its job is to take a path and safely resolve it as though the process was inside the container. After the full path has been resolved, the resolved path is passed around a bit and then operated on a bit later (in the case of 'docker cp' it is opened when creating the archive that is streamed to the client). As you may notice, if an attacker can add a symlink component to the path *after* the resolution but *before* it is operated on, then you could end up resolving the symlink path component on the host as root. In the case of 'docker cp' this gives you read *and* write access to any path on the host.
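To make the check/use gap concrete, here is a minimal Python sketch. It is not Docker's code: resolve_in_scope is a hypothetical, much simplified stand-in for FollowSymlinkInScope, the temporary "host" directory stands in for the real host "/", and the swap is performed deliberately between the two steps instead of being raced with RENAME_EXCHANGE.

```python
import os
import tempfile


def resolve_in_scope(root: str, path: str) -> str:
    """Hypothetical stand-in for FollowSymlinkInScope: resolve `path`
    below `root` and reject anything that ends up outside `root`."""
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, path.lstrip("/")))
    if os.path.commonpath([candidate, root_real]) != root_real:
        raise PermissionError("path escapes the container root")
    return candidate


def demo() -> str:
    base = tempfile.mkdtemp()
    host = os.path.join(base, "host")        # stands in for the host "/"
    root = os.path.join(host, "rootfs")      # stands in for the container root
    os.makedirs(os.path.join(root, "data"))
    with open(os.path.join(host, "flag"), "w") as f:
        f.write("host secret")
    with open(os.path.join(root, "data", "flag"), "w") as f:
        f.write("container file")

    # Time of check: the path is resolved and found to be inside the root.
    resolved = resolve_in_scope(root, "/data/flag")

    # Attacker (inside the container): replace the directory with a symlink
    # that points outside the root.
    os.rename(os.path.join(root, "data"), os.path.join(root, "data.bak"))
    os.symlink(host, os.path.join(root, "data"))

    # Time of use: the stale string is re-resolved by the kernel, which
    # follows the new symlink without any scoping.
    with open(resolved) as f:
        return f.read()


if __name__ == "__main__":
    print(demo())
```

The resolved path is only a string, so nothing ties it to the directory that was inspected during the check. That is why the proposed fixes operate on file descriptors rather than on paths.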
Attached is a fairly dumb reproducer which basically does a RENAME_EXCHANGE of a symlink to "/" and an empty directory in a loop, hoping to hit the race condition. Then our "user" attempts to copy a file from the path repeatedly. You can call it like this (note that since this requires exploiting a race condition, only a small percentage of the attempts succeed -- however if I had made my reproducer a bit more clever about how quickly it does the RENAME_EXCHANGE it could be more likely to hit the race).

  % ls
  build run.sh
  % ./run.sh &>/dev/null & ; sleep 10s ; pkill -9 run.sh
  % chmod 0644 ex*/out # to fix up permissions for grep
  % grep 'SUCCESS' ex*/out | wc -l # managed to get it from the host
  2
  % grep 'FAILED' ex*/out | wc -l # got the file from the container
  334

Note that 'run.sh' will ask for your sudo permissions, this is purely to place a flag at /w00t_w00t_im_a_flag. You could use the same reproducer to get the host's /etc/shadow, but this way you can just grep for 'SUCCESS'.

As I said above, the 0.6% success rate is quite bad, but if my reproducer had some more finely-tuned timing it might have a much better success rate (the race window is not that small -- the problem is that the path is used *multiple* times and each time it's used the archiving library might decide to not resolve the swapped symlink by accident).

Now, how do we fix this? Unfortunately there isn't an easy way of fixing this particular brand of TOCTTOU. The only way of really fixing it (on Linux -- I have no idea whether you'd need to fix it on Windows) would be to get an O_PATH file descriptor and operate on it when generating the archive. The simplest way of doing this would be to run a process inside a chroot(2) which just does an O_PATH and then sends the file descriptor back to dockerd using SCM_RIGHTS.
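The chroot-plus-SCM_RIGHTS idea could look roughly like the Python sketch below. The function names (send_path_fd, recv_path_fd, open_in_chroot) are made up for illustration and do not correspond to anything in Docker. open_in_chroot needs CAP_SYS_CHROOT (typically root), and error handling is kept to a minimum.

```python
import array
import os
import socket


def send_path_fd(sock: socket.socket, path: str) -> None:
    """Open `path` with O_PATH and pass the descriptor over a Unix socket."""
    fd = os.open(path, os.O_PATH | os.O_CLOEXEC)
    try:
        ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                      array.array("i", [fd]))]
        sock.sendmsg([b"F"], ancillary)
    finally:
        os.close(fd)  # the in-flight copy stays valid for the receiver


def recv_path_fd(sock: socket.socket) -> int:
    """Receive a single descriptor sent with SCM_RIGHTS."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(
        1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def open_in_chroot(root: str, path: str) -> int:
    """Fork a helper that chroots into `root`, opens `path` there and
    sends the O_PATH descriptor back. Requires CAP_SYS_CHROOT."""
    parent_sock, child_sock = socket.socketpair(
        socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            parent_sock.close()
            os.chroot(root)
            os.chdir("/")
            # Any symlink is now resolved relative to the container root.
            send_path_fd(child_sock, path)
            status = 0
        finally:
            os._exit(status)
    child_sock.close()
    try:
        return recv_path_fd(parent_sock)
    finally:
        parent_sock.close()
        os.waitpid(pid, 0)
```

The descriptor returned to the daemon refers to whatever object the helper reached inside the chroot, so later renames or symlink swaps in the container no longer change what the archive code operates on.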
The more complicated way that wouldn't require spawning a new process (but would result in you returning an error if an attack like this is in progress, rather than returning a valid result) would involve something like:

  1. Do the current FollowSymlinkInScope logic to get a resolved path, but while resolving the path use O_PATH file descriptors that you keep around and operate through them.

  2. After you've gotten the fully resolved path, open it with O_PATH. This is the file descriptor we will return if our validation succeeds.

     We now repeat the following step, using our existing O_PATH of the directory containing the fully resolved path as "parent_fd", and the fully resolved path's O_PATH as "current_fd". "next_path_component" is always the basename of "current_fd".

  3. Do fstatat(parent_fd, next_path_component, ...) to see whether the next_path_component has the d_ino we expected (we compare this to fstat(current_fd, ...)). If they don't match, we error out. Otherwise we set parent_fd = openat(parent_fd, ".."). Continue this until current_fd is "/".

An important thing to note is that we have to modify our logic (in both cases) in order to handle non-existent paths. FollowSymlinkInScope allows you to scope non-existent paths (where the path components that don't exist are just appended to the end), which requires a bit more work to make safe. In principle this means that we will need to return an O_PATH of the closest existing ancestor along with what path components were left. I'm not sure there's really a safe way of doing that though (a malicious process could make any of those trailing path components symlinks, opening this can of worms again).

One other problem with the more complicated way of doing the resolution is that we might need to be careful about comparing d_ino since different filesystems could have the same d_ino for completely different files (and I think FUSE can fake d_ino).
So we would have to do an fstatfs(2) as well and make sure that f_fsid and f_type are the same.

I'm a little bit sorry for dumping this one on you without a proper fix in mind, because I helped write the original security fixes and I know how hard it's going to be to make this safe. I really think there should be a kernel interface like openat(..., O_NOFOLLOW|O_SERIOUSLY_NOFOLLOW) which would cause symlinks to not be resolved as path components. Actually maybe I could rustle up a kernel patch for that. The downside is that you wouldn't be able to use this on older kernels, so that's a bit of an issue.

[1]: https://github.com/moby/moby/pull/5720
[2]: https://github.com/moby/moby/pull/6000

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
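The descriptor-based validation described in the mail above (hold O_PATH descriptors while walking, then re-check each component against its parent) might be sketched in Python as follows. This is a simplification under stated assumptions, not the proposed implementation: open_verified and RaceDetected are hypothetical names, the input is assumed to be an already resolved list of components below root_fd, any symlink met on the way is treated as a sign of tampering, and comparing st_dev next to st_ino is used as a stand-in for the fstatfs(2) f_fsid/f_type comparison.

```python
import os
import stat
from typing import Sequence


class RaceDetected(Exception):
    """The path changed underneath us while it was being validated."""


def _same_file(a: os.stat_result, b: os.stat_result) -> bool:
    # Assumption: st_dev stands in for the f_fsid/f_type check, so equal
    # inode numbers on different filesystems are not mistaken for a match.
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def open_verified(root_fd: int, components: Sequence[str]) -> int:
    """Return an O_PATH descriptor for root_fd/components[0]/.../components[-1],
    or raise RaceDetected. The caller closes the returned descriptor."""
    flags = os.O_PATH | os.O_NOFOLLOW | os.O_CLOEXEC
    fds = [os.dup(root_fd)]
    try:
        # Walk down, keeping one descriptor per component (steps 1 and 2).
        for name in components:
            fd = os.open(name, flags, dir_fd=fds[-1])
            fds.append(fd)
            # O_PATH|O_NOFOLLOW opens the symlink itself, so check for it.
            if stat.S_ISLNK(os.fstat(fd).st_mode):
                raise RaceDetected(f"{name!r} is a symlink")

        # Walk back up: fstatat(parent_fd, name) must still describe the
        # object we hold as current_fd (step 3).
        for i in range(len(components), 0, -1):
            seen = os.stat(components[i - 1], dir_fd=fds[i - 1],
                           follow_symlinks=False)
            if not _same_file(seen, os.fstat(fds[i])):
                raise RaceDetected(f"{components[i - 1]!r} was replaced")

        return fds.pop()  # hand the leaf descriptor to the caller
    finally:
        for fd in fds:
            os.close(fd)
```

As the mail notes, this approach can only detect an attack and fail; it cannot produce a valid result while the swap is in progress, and it does not cover the non-existent trailing components that FollowSymlinkInScope accepts.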
I just got back some emails from upstream with an agreement on the next steps. In short, I am about to send a patch which (mostly) fixes the issue and then the issue will become public on their issue tracker. I've been side-tracked for the past few months by trying to get some kernel patches upstream which would allow for a more complete solution to this problem, but this hotfix is probably good enough for most users.
I've submitted the upstream PR[1]. I will update the CVE entry to make it public. [1]: https://github.com/moby/moby/pull/39252
From: Aleksa Sarai <cyphar@cyphar.com>
Subject: [oss-security] CVE-2018-15664: docker (all versions) is vulnerable to a symlink-race attack
Date: Tue, 28 May 2019 14:25:13 +1000

There is no released Docker version with a fix for this issue at the time of writing. I've submitted a patch upstream[1] which is still undergoing code review, and after discussion with them they agreed that public disclosure of the issue was reasonable. Since the SUSE bug report contains exploit scripts[2], I've attached them here too.

This attack was discovered by myself (Aleksa Sarai), though Tõnis Tiigi did mention the possibility of an attack like this in the past (at the time we thought the race window was too small to exploit). In addition, you could see this exploit as a continuation of some 'docker cp' security bugs that I helped find and fix more than 4 years ago in 2014[3,4] (these were never assigned CVEs because at the time it was thought that attacks which used access to docker.sock were not valid security bugs).

[[ Overview ]]

The basic premise of this attack is that FollowSymlinkInScope suffers from a fairly fundamental TOCTOU attack. The purpose of FollowSymlinkInScope is to take a given path and safely resolve it as though the process was inside the container. After the full path has been resolved, the resolved path is passed around a bit and then operated on a bit later (in the case of 'docker cp' it is opened when creating the archive that is streamed to the client). If an attacker can add a symlink component to the path *after* the resolution but *before* it is operated on, then you could end up resolving the symlink path component on the host as root. In the case of 'docker cp' this gives you read *and* write access to any path on the host.

As far as I'm aware there are no meaningful protections against this kind of attack (other than not allowing "docker cp" on running containers -- but that only helps with this particular attack through FollowSymlinkInScope).
Unless you have restricted the Docker daemon through AppArmor, then it can affect the host filesystem -- I haven't verified if the issue is as exploitable under the default SELinux configuration on Fedora/CentOS/RHEL.

[[ Exploit Scripts ]]

Attached are two reproducers of the issue. They both include a Docker image which contains a simple binary that does a RENAME_EXCHANGE of a symlink to "/" and an empty directory in a loop, hoping to hit the race condition. In both of the scripts, the user is trying to copy a file to or from a path containing the swapped symlink.

In the case of run_read.sh, I get a <1% chance of hitting the race condition (my attack script is quite dumb, it's possible with better timing you'd be able to hit the race window much more effectively). However <1% still means it only takes 10s of trying to get read access to the host with root permissions.

  % ./run_read.sh &>/dev/null & ; sleep 10s ; pkill -9 run.sh
  % chmod 0644 ex*/out # to fix up permissions for grep
  % grep 'SUCCESS' ex*/out | wc -l # managed to get it from the host
  2
  % grep 'FAILED' ex*/out | wc -l # got the file from the container
  334

However, the run_write.sh script can overwrite the host filesystem in very few iterations -- this is because internally Docker has a "chrootarchive" concept where the archive is extracted from within a chroot. However, Docker doesn't chroot into the container's "/" (which would make this exploit ineffective), it chroots into the parent directory of the archive target -- which is attacker controlled. As a result, this actually results in the attack being more likely to succeed (once the chroot has hit the race, the rest of the attack is guaranteed to succeed).

The scripts will ask for sudo permissions, but that is only to be able to create a "flag file" in /. You could modify the scripts to target /etc/shadow instead if you like.
[[ Future Work ]]

In an attempt to come up with a better solution for this problem, I've been working on some Linux kernel patches which add the ability to safely resolve paths from within a rootfs[5]. But they are still being reviewed and it will take a while for userspace to be able to take advantage of the new interfaces. However, I am also working on redesigning my "secure join" library's API[6] so that we can at least better detect these attacks on older kernels and take advantage of [5] in newer kernels.

[1]: https://github.com/docker/docker/pull/39252
[2]: https://bugzilla.suse.com/show_bug.cgi?id=1096726
[3]: https://github.com/docker/docker/pull/5720
[4]: https://github.com/docker/docker/pull/6000
[5]: https://marc.info/?l=linux-fsdevel&m=155835923516235&w=2
[6]: https://github.com/cyphar/filepath-securejoin

Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
SUSE-SU-2019:1514-1: An update that fixes one vulnerability is now available.
Category: security (moderate)
Bug References: 1096726
CVE References: CVE-2018-15664
Sources used:
SUSE Linux Enterprise Module for Containers 12 (src): docker-18.09.6_ce-98.40.1
SUSE CaaS Platform 3.0 (src): docker-kubic-18.09.6_ce-98.40.1
OpenStack Cloud Magnum Orchestration 7 (src): docker-18.09.6_ce-98.40.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
SUSE-SU-2019:1562-1: An update that fixes one vulnerability is now available.
Category: security (moderate)
Bug References: 1096726
CVE References: CVE-2018-15664
Sources used:
SUSE Linux Enterprise Module for Open Buildservice Development Tools 15-SP1 (src): docker-18.09.6_ce-6.20.3
SUSE Linux Enterprise Module for Open Buildservice Development Tools 15 (src): docker-18.09.6_ce-6.20.3
SUSE Linux Enterprise Module for Containers 15-SP1 (src): docker-18.09.6_ce-6.20.3
SUSE Linux Enterprise Module for Containers 15 (src): docker-18.09.6_ce-6.20.3

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
openSUSE-SU-2019:1621-1: An update that fixes one vulnerability is now available.
Category: security (moderate)
Bug References: 1096726
CVE References: CVE-2018-15664
Sources used:
openSUSE Leap 15.1 (src): docker-18.09.6_ce-lp151.2.6.1
openSUSE Leap 15.0 (src): docker-18.09.6_ce-lp150.5.20.1
This was fixed in all released Docker versions a while ago.
SUSE-SU-2019:2223-1: An update that solves three vulnerabilities and has four fixes is now available.
Category: security (moderate)
Bug References: 1096726,1123156,1123387,1135460,1136974,1137860,1143386
CVE References: CVE-2018-15664,CVE-2019-10152,CVE-2019-6778
Sources used:
SUSE Linux Enterprise Module for Containers 15-SP1 (src): fuse-overlayfs-0.4.1-3.3.8, fuse3-3.6.1-3.3.8, podman-1.4.4-4.8.1, slirp4netns-0.3.0-3.3.3
SUSE Linux Enterprise Module for Basesystem 15-SP1 (src): libcontainers-common-20190401-3.3.5

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
openSUSE-SU-2019:2044-1: An update that solves three vulnerabilities and has four fixes is now available.
Category: security (moderate)
Bug References: 1096726,1123156,1123387,1135460,1136974,1137860,1143386
CVE References: CVE-2018-15664,CVE-2019-10152,CVE-2019-6778
Sources used:
openSUSE Leap 15.1 (src): fuse-overlayfs-0.4.1-lp151.2.1, fuse3-3.6.1-lp151.2.1, libcontainers-common-20190401-lp151.2.3.1, podman-1.4.4-lp151.3.3.1, slirp4netns-0.3.0-lp151.2.3.1