Bugzilla – Bug 1012568
VUL-0: CVE-2016-9962: runc: container escape vulnerability
Last modified: 2021-09-01 10:45:20 UTC
From: Aleksa Sarai

runC passes a file descriptor from the host's filesystem to the "runc init" bootstrap process when joining a container. This allows a malicious process inside a container to gain access to the host filesystem with its current privilege set. Due to the race window between join-and-execve being quite small, this bug is quite hard to exploit.

A similar, though mostly unrelated, exploit was discovered in LXC[1].

[1]: http://www.openwall.com/lists/oss-security/2016/11/23/6
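To illustrate why a leaked directory descriptor matters, here is a minimal Python sketch (not runC code). It assumes a Unix-like system where os.open accepts dir_fd and os.listdir accepts a descriptor; the function names and the directory layout built in demo() are hypothetical stand-ins. Path lookups made relative to an open directory descriptor start from wherever that descriptor points, so ".." components can walk upwards from it.

```python
import os
import tempfile


def list_via_dirfd(dir_fd, relative_path):
    """List a directory that is resolved relative to an already-open directory fd."""
    target = os.open(relative_path, os.O_RDONLY | os.O_DIRECTORY, dir_fd=dir_fd)
    try:
        return sorted(os.listdir(target))
    finally:
        os.close(target)


def demo():
    """Build a throwaway tree and walk up from a descriptor three levels deep."""
    with tempfile.TemporaryDirectory() as top:
        # Hypothetical layout: <top>/run/runc/test plays the role of the state dir.
        os.makedirs(os.path.join(top, "run", "runc", "test"))
        os.makedirs(os.path.join(top, "etc"))
        fd = os.open(os.path.join(top, "run", "runc", "test"),
                     os.O_RDONLY | os.O_DIRECTORY)
        try:
            # "../../.." is resolved starting at the descriptor, not at the cwd.
            return list_via_dirfd(fd, "../../..")
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

In the scenario described in this bug, the descriptor refers to a host directory while the process reading it lives in the container, which is what turns this ordinary lookup behaviour into an escape.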
Created attachment 704061 [details] patch that fixes the issue

This is the patch I've sent to the rest of the upstream maintainers and has been LGTM'd. It applies on master, but I'll probably have to rebase it so we can apply it to our runC package.
Created attachment 704067 [details] reproducer patch

In order to test this fix, you'll need to apply the reproducer patch (which just makes the race window much larger so you can test it in a shell -- the linked LXC reproducer could be modified to try to exploit runC without needing to widen the race window).

First, set up a bundle for runC:

  % docker pull alpine
  % docker create --name alpine alpine
  % mkdir rootfs
  % docker export alpine | tar xvfC - rootfs/
  % runc spec

Now you have a config.json and rootfs that you can use with runC. Here's what an unpatched runC looks like (shell1 and shell2 are two different shell sessions in the same directory):

  shell1% runc run ctr
  shell2% runc exec ctr sh
  [ this will block for 500 seconds ]
  shell1[ctr]# ps aux
  PID   USER     TIME   COMMAND
      1 root       0:00 sh
     18 root       0:00 {runc:[2:INIT]} /proc/self/exe init
     24 root       0:00 ps aux
  shell1[ctr]# ls /proc/18/fd -la
  total 0
  dr-x------    2 root     root             0 Nov 28 14:29 .
  dr-xr-xr-x    9 root     root             0 Nov 28 14:29 ..
  lrwx------    1 root     root            64 Nov 28 14:29 0 -> /dev/pts/8
  lrwx------    1 root     root            64 Nov 28 14:29 1 -> /dev/pts/8
  lrwx------    1 root     root            64 Nov 28 14:29 2 -> /dev/pts/8
  lrwx------    1 root     root            64 Nov 28 14:29 3 -> socket:[2113990]
  lr-x------    1 root     root            64 Nov 28 14:29 4 -> /run/runc/test
  lrwx------    1 root     root            64 Nov 28 14:29 5 -> /dev/pts/8
  l-wx------    1 root     root            64 Nov 28 14:29 6 -> /dev/null
  shell1[ctr]# ls -la /proc/18/fd/4/../../..
  total 0
  drwxr-xr-x    1 root     root           166 Oct 16 14:59 .
  drwxr-xr-x    1 root     root           166 Oct 16 14:59 ..
  drwxr-x---    1 root     root            46 Nov 27 10:37 .snapshots
  drwxr-xr-x    1 root     root          1872 Nov 25 09:22 bin
  drwxr-xr-x    1 root     root           552 Nov 25 09:46 boot
  drwxr-xr-x   21 root     root          4240 Nov 27 22:09 dev
  drwxr-xr-x    1 root     root          4958 Nov 28 14:28 etc
  drwxr-xr-x    1 root     root            12 Jun 15 12:20 home
  drwxr-xr-x    1 root     root          1572 Oct 30 12:00 lib
  drwxr-xr-x    1 root     root          4160 Nov 25 09:21 lib64
  drwxr-xr-x    1 root     root            60 Aug  7 04:00 media
  drwxr-xr-x    1 root     root             0 Jun 15 12:20 mnt
  drwxr-xr-x    1 root     root             8 Oct  9 06:31 opt
  dr-xr-xr-x  327 root     root             0 Nov 26 00:25 proc
  drwx------    1 root     root           324 Nov 26 09:52 root
  drwxr-xr-x   34 root     root           900 Nov 28 14:28 run
  drwxr-xr-x    1 root     root          4082 Nov 25 09:24 sbin
  drwxr-xr-x    1 root     root             0 Jun 15 12:20 selinux
  drwxr-xr-x    1 root     root            50 Jul 17 00:57 srv
  dr-xr-xr-x   13 root     root             0 Nov 26 00:25 sys
  drwxrwxrwt    1 root     root         42606 Nov 28 14:29 tmp
  drwxr-xr-x    1 root     root           144 Jun 27 18:18 usr
  drwxr-xr-x    1 root     root           116 Jun 26 07:39 var

Where the final output is my *host's* root filesystem. With a patched runC, that file descriptor isn't open in the "runc exec" process:

  shell1% runc run ctr
  shell2% runc exec ctr ls
  [ this will block for 500 seconds ]
  shell1[ctr]# ps aux
  PID   USER     TIME   COMMAND
      1 root       0:00 sh
      7 root       0:00 {runc:[2:INIT]} /proc/self/exe init
     13 root       0:00 ps aux
  shell1[ctr]# ls -la /proc/7/fd
  total 0
  dr-x------    2 root     root             0 Nov 28 14:29 .
  dr-xr-xr-x    9 root     root             0 Nov 28 14:29 ..
  lrwx------    1 root     root            64 Nov 28 14:29 0 -> /dev/pts/8
  lrwx------    1 root     root            64 Nov 28 14:29 1 -> /dev/pts/8
  lrwx------    1 root     root            64 Nov 28 14:29 2 -> /dev/pts/8
  lrwx------    1 root     root            64 Nov 28 14:29 3 -> socket:[2114856]
  lrwx------    1 root     root            64 Nov 28 14:29 4 -> /dev/pts/8
  l-wx------    1 root     root            64 Nov 28 14:29 5 -> /dev/null
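The difference between the unpatched and patched listings is the descriptor pointing at /run/runc/test. The manual check can be roughly automated with a short Python sketch like the one below. The function name and the allow-list of "expected" prefixes are assumptions made for illustration, not part of runC or of the reproducer patch; the function simply reads the symlinks in a /proc/<pid>/fd-style directory and reports descriptors that resolve to absolute paths outside the allow-list.

```python
import os


def find_suspicious_fds(fd_dir, allowed_prefixes=("/dev/",)):
    """Return {fd_number: target} for descriptors pointing outside allowed_prefixes.

    fd_dir is a directory laid out like /proc/<pid>/fd: one symlink per open
    descriptor, named after the descriptor number.
    """
    suspicious = {}
    for name in os.listdir(fd_dir):
        if not name.isdigit():
            continue
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor may have been closed between listdir and readlink.
            continue
        # Entries such as "socket:[2113990]" are not filesystem paths; skip them.
        if not target.startswith("/"):
            continue
        if not any(target.startswith(prefix) for prefix in allowed_prefixes):
            suspicious[int(name)] = target
    return suspicious


if __name__ == "__main__":
    import sys
    print(find_suspicious_fds(sys.argv[1]))
```

Run inside the container against /proc/18/fd from the unpatched session, a check like this would be expected to flag descriptor 4; the allow-list is only a heuristic and would need adjusting for other container setups.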
Created attachment 704071 [details] patch that applies cleanly on our runC package

This patch applies cleanly on top of Virtualization:containers/runc and Devel:Docker/runc.
bugbot adjusting priority
Created attachment 706119 [details] updated patch

This is an updated patch, which is much simpler and will be the one applied upstream.
There's an update. It looks like there's a kernel race between O_CLOEXEC and set_dumpable which means that further changes are needed. I'm discussing the patch upstream before posting it here. In addition, this vulnerability now has its own CVE (CVE-2016-9962).
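As background for the O_CLOEXEC part of that race: close-on-exec only takes effect at execve, so a descriptor opened with O_CLOEXEC remains open and usable in the process right up to that point. A minimal Python sketch of that behaviour follows, assuming a Unix-like system; the function names and the temporary directory standing in for a state directory are hypothetical, and this does not reproduce the kernel race itself.

```python
import fcntl
import os
import tempfile


def has_cloexec(fd):
    """Return True if the close-on-exec flag is set on the descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    return bool(flags & fcntl.FD_CLOEXEC)


def demo():
    """Open a directory with O_CLOEXEC and show that it is still usable before exec."""
    with tempfile.TemporaryDirectory() as state_dir:
        fd = os.open(state_dir, os.O_RDONLY | os.O_CLOEXEC)
        try:
            # The flag is set, yet the descriptor works normally until execve runs.
            usable = os.listdir(fd) == []
            return has_cloexec(fd), usable
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

So O_CLOEXEC on its own limits what survives the exec, but says nothing about who can look at the descriptor before then.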
Created attachment 708064 [details] patch v3

Here is the final version of the upstream patch, which also handles a CLOEXEC kernel race condition (which I've sent a patch upstream for).
Created attachment 709034 [details] f59ba3cdd76f 0001-libcontainer-nsenter-set-init-processes-as-non-dumpa.patch

So, I just figured out that the patch doesn't apply on top of the latest runC version we were planning on shipping for 1.12.X (f59ba3cdd76f). Attached are the necessary patches:

* 0001-libcontainer-nsenter-set-init-processes-as-non-dumpa.patch
* 0002-libcontainer-init-only-pass-stateDirFd-when-creating.patch
Created attachment 709039 [details] f59ba3cdd76f 0002-libcontainer-init-only-pass-stateDirFd-when-creating.patch
I've attached the two patches necessary for our new runC package (f59ba3cdd76f) for Docker 1.12.X. I'm not sure whether it works with the old version -- I will double check that both patches work. *BOTH* of the patches need to be applied. The first (0001-*) is the fix sent to the ML, the second (0002-*) is my fix which also ensures that containers with CAP_SYS_PTRACE are also "safe" (the attack won't work there either).
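Judging by the name of the 0001 patch and the earlier mention of set_dumpable, the first fix relies on the Linux "dumpable" process flag. A process that is not dumpable cannot be attached to with ptrace by a tracer lacking CAP_SYS_PTRACE, and its /proc/<pid> entries become owned by root, which is presumably why containers holding CAP_SYS_PTRACE needed the second patch. Below is a minimal Python sketch of toggling that flag via prctl(2). It assumes Linux with a libc that exports prctl; the helper names are hypothetical, the constants are taken from linux/prctl.h, and this is an illustration rather than the patch itself.

```python
import ctypes
import os

# Values from <linux/prctl.h>.
PR_GET_DUMPABLE = 3
PR_SET_DUMPABLE = 4

# Assumption: the running process is linked against a libc that exports prctl().
_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option, arg2=0):
    """Call prctl(option, arg2, 0, 0, 0) and raise OSError on failure."""
    result = _libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2),
                         ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def get_dumpable():
    """Return the current dumpable flag of this process."""
    return _prctl(PR_GET_DUMPABLE)


def set_dumpable(enabled):
    """Mark this process as dumpable (True) or non-dumpable (False)."""
    _prctl(PR_SET_DUMPABLE, 1 if enabled else 0)


def demo():
    """Clear the flag, read it back, then restore it if it was set before."""
    previous = get_dumpable()
    set_dumpable(False)
    observed = get_dumpable()
    if previous != 0:
        set_dumpable(True)
    return observed


if __name__ == "__main__":
    print(demo())
```

The flag only affects the calling process, so in a real fix it would have to be set by the bootstrap process itself before it enters the container.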
Okay, so I just tested it and the f59ba3cdd76f (0.1.1+gitr2818_f59ba3cdd76f) patches also work for the old 1.11.X package (0.1.1+gitr2816_02f8fa7). However, I will be updating the attached patches in a minute.
`docker run --pid=<another container>` is still not safe against this issue unfortunately (with --cap-add=CAP_SYS_PTRACE or --privileged for <another container>). I'm trying to fix the problem, but since I'm currently on holidays it's a bit hard to write the patch (I have kids to teach programming to :P).
Created attachment 709048 [details] 0001-libcontainer-nsenter-set-init-processes-as-non-dumpa.patch
Created attachment 709049 [details] 0002-libcontainer-init-only-pass-stateDirFd-when-creating.patch
(In reply to Aleksa Sarai from comment #19)
> `docker run --pid=<another container>` is still not safe against this issue
> unfortunately (with --cap-add=CAP_SYS_PTRACE or --privileged for <another
> container>). I'm trying to fix the problem, but since I'm currently on
> holidays it's a bit hard to write the patch (I have kids to teach
> programming to :P).

But note that Docker Inc decided that it wasn't an important security issue to be included in the CVE. I will do my best to fix the issue tonight.
Public at http://seclists.org/oss-sec/2017/q1/54

> Docker Engine version 1.12.6 has been released to address a vulnerability
> and is immediately available for all supported platforms. Users are advised
> to upgrade existing installations of the Docker Engine and use 1.12.6 for
> new installations.
> [...]
> ==============================================================
> [CVE-2016-9962] Insecure opening of file-descriptor allows privilege
> escalation
>
> ==============================================================
>
> RunC allowed additional container processes via `runc exec` to be ptraced
> by the pid 1 of the container. This allows the main processes of the
> container, if running as root, to gain access to file-descriptors of these
> new processes during the initialization and can lead to container escapes
> or modification of runC state before the process is fully placed inside the
> container
>
>
> Credit for this discovery goes to Aleksa Sarai from SUSE and Tõnis Tiigi

https://github.com/docker/docker/compare/v1.12.5...v1.12.6
https://github.com/opencontainers/runc/commit/50a19c6ff828c58e5dab13830bd3dacde268afe5
This is an autogenerated message for OBS integration: This bug (1012568) was mentioned in https://build.opensuse.org/request/show/450492 Factory / docker
SUSE-SU-2017:1964-1: An update that solves one vulnerability and has one errata is now available.

Category: security (moderate)
Bug References: 1012568,1019251
CVE References: CVE-2016-9962
Sources used:
SUSE OpenStack Cloud 6 (src): containerd-0.2.5+gitr569_2a5e70c-15.3, docker-1.12.6-87.2, runc-0.1.1+gitr2819_50a19c6-15.2
SUSE Linux Enterprise Module for Containers 12 (src): containerd-0.2.5+gitr569_2a5e70c-15.3, docker-1.12.6-87.2, runc-0.1.1+gitr2819_50a19c6-15.2
openSUSE-SU-2017:1966-1: An update that solves one vulnerability and has 6 fixes is now available.

Category: security (moderate)
Bug References: 1004490,1009961,1012568,1015661,1016307,1019251,988408
CVE References: CVE-2016-9962
Sources used:
openSUSE Leap 42.2 (src): containerd-0.2.5+gitr569_2a5e70c-8.1, docker-1.12.6-25.2, runc-0.1.1+gitr2819_50a19c6-8.1
openSUSE Leap 42.1 (src): containerd-0.2.5+gitr569_2a5e70c-10.1, docker-1.12.6-27.1, runc-0.1.1+gitr2819_50a19c6-10.1
closing as the fix is available now