Bugzilla – Bug 1190670
[Build 20210917][glibc2.34] docker blocks clone3 syscall
Last modified: 2022-01-20 14:29:34 UTC
## Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-extra_tests_textmode_docker_containers@64bit fails in [docker_image](https://openqa.opensuse.org/tests/1926064/modules/docker_image/steps/182)

docker again fights against glibc, where a new syscall, clone3, was added (kernel support since 5.3)

Running a TW image inside docker now fails with

terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted

We need at least https://github.com/moby/moby/pull/42836 in our docker packages

## Test suite description

Maintainer: dheidler. Extra tests about CLI software in container module

## Reproducible

Fails since (at least) Build [20210917](https://openqa.opensuse.org/tests/1925139)

## Expected result

Last good: [20210916](https://openqa.opensuse.org/tests/1921813) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=extra_tests_textmode_docker_containers&version=Tumbleweed)
As long as we don't at least fix our own docker package to handle the case, we cannot release a new snapshot (and reverting glibc is not easily feasible, hence => blocker/shipstopper)
Similar class of bug as was seen in https://bugzilla.suse.com/show_bug.cgi?id=1182451 with glibc 2.33
Andreas: Any chance to also get a workaround from glibc's side, so that we don't run into the same disasters as we had with bug 1182451, i.e. non-working TW images on all CI providers?
So, despite https://bugzilla.suse.com/show_bug.cgi?id=1182451#c20 docker (or libseccomp) are _still_ not fixed to return -ENOSYS for unknown syscalls? After years of that being known to create problems?

Putting a work-around for this into glibc (see below) would only further delay any working fix for docker it would seem, I think we have enough indication to say that. I don't think that would be a good idea at this point. Docker _needs_ fixing for good.

A work-around in glibc for this bug in containers would (as usual) be problematic: it wouldn't be acceptable upstream and hence not be really applicable for us either. The nature of the work-around is basically: "if this syscall doesn't work for whatever reason (including missing permissions), try with old syscall, even if the old syscall doesn't provide all features of the new one". That's obviously a bad idea: possibly the reason for the new syscall erroring out is exactly those new features that only that one has. In that situation we really don't want to call the old one.

Which of course is exactly the reason why Docker must be fixed and can't be worked around: genuine errors that are to be reported to the user can't be differentiated from docker-invented errors because syscall isn't emulated.

So, sorry, I don't see anything that glibc could responsibly do. That's all IMHO, I would let Andreas overrule me of course.
Nope.
Trying to avoid rehashing the discussion we had at the beginning of the year, the issue boils down to Docker profiles being written assuming that any excluded syscalls get EPERM. runc has had support for setting the default errno to ENOSYS for a while, but Docker hasn't yet migrated.

The problem is that you cannot rewrite every profile with default EPERM to a profile that has explicit EPERM and default ENOSYS due to limitations in libseccomp. The non-existence of an inverse of SCMP_MASKED_EQ is one issue[1], but the main issue is that you cannot have more than one rule applied to the same argument with libseccomp -- meaning it's not possible to invert any OR rules that apply to an argument. The net result is that clone(2) or socket(2) will return -ENOSYS for some arguments when it should be returning -EPERM.

The solution I put in runc at the beginning of the year was the best I could think of (after trying many alternatives that would've been more robust but were not possible to do due to libseccomp limitations), but it relies on Docker being careful when they update their syscall table -- they weren't careful and now we're in this situation.

On a more practical note, I've backported the patch to unbreak clone3. It's already in Factory and I am submitting a MR for SLES, which is the most we can do at the moment unfortunately. I opened [2] to re-open discussions on this topic with Docker upstream.

[1]: https://github.com/seccomp/libseccomp/issues/310
[2]: https://github.com/moby/moby/issues/42871
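To make the "careful when they update their syscall table" point concrete, here is a toy model of an allowlist profile with a default errno. It is only a sketch: it assumes the runc workaround treats any syscall number above the highest one named in the profile as unknown and answers -ENOSYS for it, which is not spelled out in this comment. The names `toy_profile`, `toy_highest_known` and `toy_filter_errno` are made up for illustration and are not runc or libseccomp API.

```c
#include <errno.h>
#include <stddef.h>

/* Toy seccomp profile: an allowlist plus an errno for everything else. */
struct toy_profile {
    const long *allowed;  /* syscall numbers the profile permits */
    size_t n_allowed;
    int default_errno;    /* EPERM for Docker-style profiles */
};

/* Highest syscall number the profile mentions, or -1 if it is empty. */
long toy_highest_known(const struct toy_profile *p)
{
    long max = -1;
    for (size_t i = 0; i < p->n_allowed; i++)
        if (p->allowed[i] > max)
            max = p->allowed[i];
    return max;
}

/* Returns 0 if the syscall is allowed, otherwise the errno the caller sees. */
int toy_filter_errno(const struct toy_profile *p, long nr)
{
    for (size_t i = 0; i < p->n_allowed; i++)
        if (p->allowed[i] == nr)
            return 0;

    /* ASSUMPTION: numbers beyond the profile's table count as "unknown"
     * and get ENOSYS, so libc can fall back to an older syscall. */
    if (nr > toy_highest_known(p))
        return ENOSYS;

    /* Anything inside the table's range but not allowed gets the default. */
    return p->default_errno;
}
```

With x86_64 numbers as an example, a profile allowing only read (0), write (1) and clone (56) would answer ENOSYS for clone3 (435) in this model. Add a higher-numbered syscall such as faccessat2 (439) to the allowlist without also deciding what to do about clone3, and clone3 falls inside the table's range and gets EPERM again. The choice of faccessat2 is purely illustrative and not a claim about which entry Docker actually added.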
(In reply to Michael Matz from comment #4)
> So, despite https://bugzilla.suse.com/show_bug.cgi?id=1182451#c20 docker (or
> libseccomp) are _still_ not fixed to return -ENOSYS for unknown syscalls?
> After years of that being known to create problems?

Mainly summarising comment 6: libseccomp isn't, runc has an ugly workaround but docker doesn't enable that yet while podman does.

> Putting a work-around for this into glibc (see below) would only further
> delay any working fix for docker it would seem, I think we have enough
> indication to say that. I don't think that would be a good idea at this
> point. Docker _needs_ fixing for good.

I don't think that would make any difference. Users care that the end result works, and if everything except TW and Fedora 35 works, they'll just use something else.

> A work-around in glibc for this bug in containers would (as usual) be
> problematic: it wouldn't be acceptable upstream and hence not be really
> applicable for us either.

Which unfortunately sounds like "not upstreamable" has a higher priority than "the product works". We lost more than a handful of users because of the last avoidable issue.

And TBH, I don't know why it wouldn't be acceptable. glibc has workarounds for lots of kernel and compiler bugs, this would surely fit. It's not just our distribution that's affected, but all users of glibc 2.34. This would benefit all of them.

> The nature of the work-around is basically: "if this syscall
> doesn't work for whatever reason (including missing permissions), try with
> old syscall, even if the old syscall doesn't provide all features of the
> new one". That's obviously a bad idea: possibly the reason for the new
> syscall erroring out is exactly those new features that only that one has.
> In that situation we really don't want to call the old one.

Which is what the real fix also does: returning -ENOSYS prevents glibc from using clone3 and it falls back to the old ones. So I don't see a difference in that regard.

> Which of course is exactly the reason why Docker must be fixed and can't be
> worked around: genuine errors that are to be reported to the user can't be
> differentiated from docker-invented errors because syscall isn't emulated.

Not necessarily. There are parameters to clone which are guaranteed to not return -EPERM, and glibc could use those to probe whether a received -EPERM is caused by docker. I don't think that single extra syscall (the result can be cached) violates any interface guarantees. It's more than just a single one-line bandaid though.

> So, sorry, I don't see anything that glibc could responsibly do. That's all
> IMHO, I would let Andreas overrule me of course.
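A rough sketch of what such a probe could look like, to illustrate the idea rather than propose an actual glibc change. The comment does not say which parameters to use. This sketch assumes that calling clone3 with a NULL argument pointer and size 0 is rejected by a kernel that implements it with -EINVAL before any process is created, so an -EPERM from that call would point at a filter. The names `clone3_state`, `classify_probe_errno` and `probe_clone3` are invented here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_clone3
#define SYS_clone3 435  /* x86_64 value, only used if the headers are too old */
#endif

enum clone3_state {
    CLONE3_USABLE,    /* kernel inspected the arguments and rejected them */
    CLONE3_MISSING,   /* kernel (or filter) says the syscall does not exist */
    CLONE3_FILTERED,  /* something answered EPERM before argument checking */
    CLONE3_UNKNOWN    /* anything else; a caller should be conservative */
};

/* Map the errno of the deliberately invalid probe call to a state. */
enum clone3_state classify_probe_errno(int err)
{
    switch (err) {
    case EINVAL: return CLONE3_USABLE;
    case ENOSYS: return CLONE3_MISSING;
    case EPERM:  return CLONE3_FILTERED;
    default:     return CLONE3_UNKNOWN;
    }
}

/* Issue the probe: size 0 is too small for any struct clone_args, so the
 * call is ASSUMED to fail without ever creating a child. */
enum clone3_state probe_clone3(void)
{
    long ret = syscall(SYS_clone3, NULL, (size_t)0);
    if (ret != -1)
        return CLONE3_UNKNOWN;  /* not expected for this argument combination */
    return classify_probe_errno(errno);
}
```

A caller would run `probe_clone3()` once, cache the result, and only fall back to clone on -EPERM when the cached state is `CLONE3_FILTERED`. That keeps a genuine -EPERM from a working clone3 visible to the user, which is the distinction the quoted objection is about.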
(In reply to Fabian Vogt from comment #8)
> (In reply to Michael Matz from comment #4)
> > So, despite https://bugzilla.suse.com/show_bug.cgi?id=1182451#c20 docker (or
> > libseccomp) are _still_ not fixed to return -ENOSYS for unknown syscalls?
> > After years of that being known to create problems?
>
> Mainly summarising comment 6: libseccomp isn't, runc has an ugly workaround
> but docker doesn't enable that yet while podman does.

Just to clarify: runc has two workarounds for this issue -- one really ugly one, and one okay (but not ideal) one. Docker uses the really ugly one (in fact you can't not use it -- runc applies it for all EPERM-default profiles) but it is quite brittle and if Docker isn't careful (which they weren't) it can break as has happened here. Podman uses the other workaround which won't break but has other minor issues.

This issue I just opened[1] is trying to get Docker to switch to the less ugly workaround.

[1]: https://github.com/moby/moby/issues/42871
FYI, any project which uses a Tumbleweed container on GitHub Actions (e.g. snapper [1], LTP [2], iputils [3]) is hit by this issue, because GitHub Actions does not allow the use of podman [4].

Sometimes it feels like keeping up the CI is harder than the project itself.

[1] https://github.com/pevik/snapper/runs/3686187874
[2] https://github.com/linux-test-project/ltp/runs/3685832145
[3] https://github.com/iputils/iputils/runs/3684329716
[4] https://github.com/actions/runner/issues/505
SUSE-RU-2021:3204-1: An update that has one recommended fix can now be installed.

Category: recommended (important)
Bug References: 1190670
CVE References:
JIRA References:
Sources used:
SUSE Linux Enterprise Module for Containers 12 (src): docker-20.10.6_ce-98.69.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
*** Bug 1190985 has been marked as a duplicate of this bug. ***
SUSE-RU-2021:3245-1: An update that has one recommended fix can now be installed.

Category: recommended (important)
Bug References: 1190670
CVE References:
JIRA References:
Sources used:
SUSE MicroOS 5.1 (src): docker-20.10.6_ce-153.1
SUSE MicroOS 5.0 (src): docker-20.10.6_ce-153.1
SUSE Linux Enterprise Server for SAP 15-SP1 (src): docker-20.10.6_ce-153.1
SUSE Linux Enterprise Server for SAP 15 (src): docker-20.10.6_ce-153.1
SUSE Linux Enterprise Server 15-SP1-LTSS (src): docker-20.10.6_ce-153.1
SUSE Linux Enterprise Server 15-SP1-BCL (src): docker-20.10.6_ce-153.1
SUSE Linux Enterprise Server 15-LTSS (src): docker-20.10.6_ce-153.1
SUSE Linux Enterprise Module for Containers 15-SP3 (src): docker-20.10.6_ce-153.1
SUSE Linux Enterprise Module for Containers 15-SP2 (src): docker-20.10.6_ce-153.1
SUSE Linux Enterprise High Performance Computing 15-SP1-LTSS (src): docker-20.10.6_ce-153.1
SUSE Linux Enterprise High Performance Computing 15-SP1-ESPOS (src): docker-20.10.6_ce-153.1
SUSE Linux Enterprise High Performance Computing 15-LTSS (src): docker-20.10.6_ce-153.1
SUSE Linux Enterprise High Performance Computing 15-ESPOS (src): docker-20.10.6_ce-153.1
SUSE Enterprise Storage 6 (src): docker-20.10.6_ce-153.1
SUSE CaaS Platform 4.0 (src): docker-20.10.6_ce-153.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
openSUSE-RU-2021:3245-1: An update that has one recommended fix can now be installed.

Category: recommended (important)
Bug References: 1190670
CVE References:
JIRA References:
Sources used:
openSUSE Leap 15.3 (src): docker-20.10.6_ce-153.1, docker-kubic-20.10.6_ce-6.52.1
Then I suspect Bug 1190985 was not a duplicate as that occurs on fully updated tw + docker with latest tw images, so the issue still exists somewhere ....
I'll add myself to the list of victims of this bug here: https://github.com/cobbler/cobbler/issues/2811 The problem is that I can't use the workaround here since GitHub Actions is using Ubuntu as a base OS and thus a fix inside docker is not feasible for this. Any ideas how to progress this?
Same as https://bugzilla.suse.com/show_bug.cgi?id=1190670#c19: Ubuntu with a TW docker image, zypper fails to run.
Ubuntu now disabled clone3 (unconditionally even) in its glibc package: https://launchpad.net/ubuntu/+source/glibc/2.34-0ubuntu3
openSUSE-RU-2021:1324-1: An update that has one recommended fix can now be installed.

Category: recommended (important)
Bug References: 1190670
CVE References:
JIRA References:
Sources used:
openSUSE Leap 15.2 (src): docker-20.10.6_ce-lp152.2.15.1
*** Bug 1191052 has been marked as a duplicate of this bug. ***
I was asked to add this additional detail, ubuntu host, running opensuse/tumbleweed:latest gives an error running zypper:

yaleman@buildmonkey:~$ docker pull opensuse/tumbleweed:latest
latest: Pulling from opensuse/tumbleweed
Digest: sha256:357b19900641c6ec899ade58beca42f7bfff5f62364707b9d4f07f7153d93911
Status: Image is up to date for opensuse/tumbleweed:latest
docker.io/opensuse/tumbleweed:latest
yaleman@buildmonkey:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 21.04
Release: 21.04
Codename: hirsute
yaleman@buildmonkey:~$ docker run --rm -it opensuse/tumbleweed:latest bash zypper re
cd719e175275:/ # zypper re
terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted
Aborted (core dumped)
Fetched the image:
$ podman pull opensuse/tumbleweed

Tried to run:
$ podman run -it <image id> /bin/bash

Everything seems alright, except that as soon as I do:
$ zypper up

I get:
"""
terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted
Aborted (core dumped)
"""
I meant: # zypper up
*** Bug 1191586 has been marked as a duplicate of this bug. ***
Note one can use the following workaround for GitHub Actions:
https://github.com/stanislavlevin/tox-console-scripts/commit/9b2d6e7f1fc9414fc965a2679afddb2fccf1d698

container:
  image: opensuse/tumbleweed
  options: --privileged
Also *sometimes* this worked [1]:

options: --security-opt seccomp=unconfined

(but it timed out for other projects).

Also, some distros just add a workaround to glibc to fall back from clone3 to clone in case it's EPERM [2]. It'd be really nice if we could also be pragmatic and accept it.

--- a/sysdeps/unix/sysv/linux/clone-internal.c
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -52,7 +52,7 @@ __clone_internal (struct clone_args *cl_args,
   /* Try clone3 first. */
   int saved_errno = errno;
   ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
-  if (ret != -1 || errno != ENOSYS)
+  if (ret != -1 || (errno != ENOSYS && errno != EPERM))
     return ret;

   /* NB: Restore errno since errno may be checked against non-zero

[1] https://github.com/stanislavlevin/tox-console-scripts/commit/9b2d6e7f1fc9414fc965a2679afddb2fccf1d698
[2] http://git.altlinux.org/gears/g/glibc.git?p=glibc.git;a=commitdiff;h=09e37c7111e39b7c70846aea30941c03c43e6f54
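For readers who want the effect of that one-line change spelled out, the two conditions can be written as standalone predicates. This is only a restatement of the `if` in the diff above, negated to read "do we fall back to clone?". The function names `fallback_upstream` and `fallback_patched` are made up for illustration and are not glibc symbols.

```c
#include <errno.h>
#include <stdbool.h>

/* Upstream check: fall back to clone only if clone3 failed with ENOSYS. */
bool fallback_upstream(long ret, int err)
{
    return ret == -1 && err == ENOSYS;
}

/* Patched check from the diff: also fall back if clone3 failed with EPERM. */
bool fallback_patched(long ret, int err)
{
    return ret == -1 && (err == ENOSYS || err == EPERM);
}
```

The only input on which the two differ is a failed clone3 with EPERM. That covers the Docker case, but also any genuine EPERM from the kernel, which is the concern raised in comment #4.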
The workaround commands are fine if you're only using "docker run", but they aren't compatible with "docker build"...
This is an autogenerated message for openQA integration by the openqa_review script: This bug is still referenced in a failing openQA test: containers_tw_image_on_centos_host https://openqa.opensuse.org/tests/2028935 To prevent further reminder comments one of the following options should be followed: 1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted 2. The openQA job group is moved to "Released" or "EOL" (End-of-Life) 3. The bugref in the openQA scenario is removed or replaced, e.g. `label:wontfix:boo1234`
This is an autogenerated message for openQA integration by the openqa_review script: This bug is still referenced in a failing openQA test: containers_tw_image_on_centos_host https://openqa.opensuse.org/tests/2057725 To prevent further reminder comments one of the following options should be followed: 1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted 2. The openQA job group is moved to "Released" or "EOL" (End-of-Life) 3. The bugref in the openQA scenario is removed or replaced, e.g. `label:wontfix:boo1234`