Bug 1190670 - [Build 20210917][glibc2.34] docker blocks clone3 syscall
Summary: [Build 20210917][glibc2.34] docker blocks clone3 syscall
Status: NEW
Duplicates: 1191052 1191586
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Version: Current
Hardware: Other Other
Importance: P1 - Urgent, Major, with 21 votes
Target Milestone: ---
Assignee: Michael Matz
QA Contact: E-mail List
URL: https://openqa.opensuse.org/tests/192...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-20 13:08 UTC by Dominique Leuenberger
Modified: 2022-01-20 14:29 UTC
CC List: 23 users

See Also:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---
dleuenberger: SHIP_STOPPER+


Description Dominique Leuenberger 2021-09-20 13:08:00 UTC
## Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-extra_tests_textmode_docker_containers@64bit fails in
[docker_image](https://openqa.opensuse.org/tests/1926064/modules/docker_image/steps/182)

Docker is again at odds with glibc, which now uses a new syscall, clone3 (kernel support since 5.3).

Running a TW image inside docker now fails with

terminate called after throwing an instance of 'std::system_error'
  what():  Operation not permitted

We need at least https://github.com/moby/moby/pull/42836 in our docker packages

## Test suite description
Maintainer: dheidler. Extra tests about CLI software in container module


## Reproducible

Fails since (at least) Build [20210917](https://openqa.opensuse.org/tests/1925139)


## Expected result

Last good: [20210916](https://openqa.opensuse.org/tests/1921813) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=extra_tests_textmode_docker_containers&version=Tumbleweed)
Comment 1 Dominique Leuenberger 2021-09-20 13:09:28 UTC
As long as we don't at least fix our own docker package to handle the case, we cannot release a new snapshot (and reverting glibc is not easily feasible, hence => blocker/shipstopper)
Comment 2 Dominique Leuenberger 2021-09-20 13:11:23 UTC
Similar class of bug as was seen in https://bugzilla.suse.com/show_bug.cgi?id=1182451 with glibc 2.33
Comment 3 Dominique Leuenberger 2021-09-20 14:05:35 UTC
Andreas: Any chance to also get a workaround from glibc's side, so that we don't run into the same disaster as we had with bug 1182451, i.e. non-working TW images on all CI providers?
Comment 4 Michael Matz 2021-09-20 15:25:38 UTC
So, despite https://bugzilla.suse.com/show_bug.cgi?id=1182451#c20 docker (or
libseccomp) are _still_ not fixed to return -ENOSYS for unknown syscalls?  After
years of that being known to create problems?

Putting a work-around for this into glibc (see below) would only further delay
any working fix for docker it would seem, I think we have enough indication to
say that.  I don't think that would be a good idea at this point.  Docker
_needs_ fixing for good.

A work-around in glibc for this bug in containers would (as usual) be
problematic: it wouldn't be acceptable upstream and hence not be really
applicable for us either.  The nature of the work-around is basically: "if this
syscall doesn't work for whatever reason (including missing permissions), try
with old syscall, even if the old syscall doesn't provide all features of the
new one".  That's obviously a bad idea: possibly the reason for the new syscall
erroring out is exactly those new features that only that one has.  In that
situation we really don't want to call the old one.  Which of course is exactly
the reason why Docker must be fixed and can't be worked around: genuine errors
that are to be reported to the user can't be differentiated from
docker-invented errors because syscall isn't emulated.

So, sorry, I don't see anything that glibc could responsibly do.  That's all IMHO,
I would let Andreas overrule me of course.
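
To make the ambiguity concrete, here is a minimal Python simulation of the fallback rule discussed in this comment. It is only a sketch under stated assumptions, not glibc code: `clone_with_fallback`, `old_kernel_clone3`, `docker_filtered_clone3`, `legacy_clone` and `outcome` are hypothetical stand-ins for the real syscalls, and only the errno handling is modeled.

```python
import errno


def clone_with_fallback(clone3, clone, fallback_errnos=(errno.ENOSYS,)):
    """Try clone3 first; fall back to clone only for errnos in fallback_errnos.

    Both callables are hypothetical stand-ins that return a child PID on
    success or raise OSError on failure.
    """
    try:
        return clone3()
    except OSError as exc:
        if exc.errno not in fallback_errnos:
            raise  # treated as a genuine error and reported to the caller
    return clone()


def old_kernel_clone3():
    # A kernel without clone3 answers ENOSYS ("not implemented").
    raise OSError(errno.ENOSYS, "Function not implemented")


def docker_filtered_clone3():
    # A default-EPERM seccomp profile answers EPERM for the unlisted syscall.
    raise OSError(errno.EPERM, "Operation not permitted")


def legacy_clone():
    return 4242  # pretend child PID


def outcome(clone3, **kwargs):
    """Return the child PID, or the errno name if the call failed."""
    try:
        return clone_with_fallback(clone3, legacy_clone, **kwargs)
    except OSError as exc:
        return errno.errorcode[exc.errno]
```

With the ENOSYS-only rule, `outcome(docker_filtered_clone3)` surfaces `EPERM`, which matches the "Operation not permitted" failure in the description. Passing `fallback_errnos=(errno.ENOSYS, errno.EPERM)` makes the filtered case succeed, but the same rule would also hide an EPERM that the kernel raised for a real reason.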
Comment 5 Andreas Schwab 2021-09-21 10:07:03 UTC
Nope.
Comment 6 Aleksa Sarai 2021-09-22 01:21:29 UTC
Trying to avoid rehashing the discussion we had at the beginning of the year, the issue boils down to Docker profiles being written assuming that any excluded syscalls get EPERM. runc has had support for setting the default errno to ENOSYS for a while, but Docker hasn't yet migrated.

The problem is that you cannot rewrite every profile with default EPERM to a profile that has explicit EPERM and default ENOSYS due to limitations in libseccomp. The non-existence of an inverse of SCMP_MASKED_EQ is one issue[1], but the main issue is that you cannot have more than one rule applied to the same argument with libseccomp -- meaning it's not possible to invert any OR rules that apply to an argument. The net result is that clone(2) or socket(2) will return -ENOSYS for some arguments when it should be returning -EPERM.

The solution I put in runc at the beginning of the year was the best I could think of (after trying many alternatives that would've been more robust but were not possible to do due to libseccomp limitations), but it relies on Docker being careful when they update their syscall table -- they weren't careful and now we're in this situation.

On a more practical note, I've backported the patch to unbreak clone3. It's already in Factory and I am submitting an MR for SLES, which is unfortunately the most we can do at the moment. I opened [2] to re-open discussions on this topic with Docker upstream.

[1]: https://github.com/seccomp/libseccomp/issues/310
[2]: https://github.com/moby/moby/issues/42871
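
The difference between the two profile styles can be illustrated with a toy model. This is a sketch in Python, not libseccomp or runc code: the `Profile` class, the syscall lists and the `mount` rule are made up for illustration, and argument-dependent rules (the part that makes the real conversion hard) are deliberately left out.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Toy seccomp profile: allowed names, explicit errno rules, default errno."""
    default_errno: int
    allowed: frozenset = frozenset()
    explicit: dict = field(default_factory=dict)

    def evaluate(self, syscall):
        """Return 0 if the syscall is allowed, else the errno it fails with."""
        if syscall in self.allowed:
            return 0
        return self.explicit.get(syscall, self.default_errno)


# Hypothetical allow list, written before clone3 existed.
ALLOWED = frozenset({"read", "write", "clone"})

# Default-EPERM style: every unlisted syscall fails with EPERM.
default_eperm = Profile(default_errno=errno.EPERM, allowed=ALLOWED)

# Default-ENOSYS style: unlisted syscalls look unimplemented, and deliberate
# denials are spelled out ("mount" is an arbitrary example).
default_enosys = Profile(
    default_errno=errno.ENOSYS,
    allowed=ALLOWED,
    explicit={"mount": errno.EPERM},
)
```

In this model `default_eperm.evaluate("clone3")` yields `errno.EPERM`, while `default_enosys.evaluate("clone3")` yields `errno.ENOSYS`, the one value that lets a caller conclude the syscall is simply unknown.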
Comment 8 Fabian Vogt 2021-09-22 14:56:54 UTC
(In reply to Michael Matz from comment #4)
> So, despite https://bugzilla.suse.com/show_bug.cgi?id=1182451#c20 docker (or
> libseccomp) are _still_ not fixed to return -ENOSYS for unknown syscalls?
> After years of that being known to create problems?

Mainly summarising comment 6: libseccomp isn't, runc has an ugly workaround
but docker doesn't enable that yet while podman does.

> Putting a work-around for this into glibc (see below) would only further
> delay any working fix for docker it would seem, I think we have enough
> indication to say that.  I don't think that would be a good idea at this
> point.  Docker _needs_ fixing for good.

I don't think that would make any difference. Users care that the end
result works, and if everything except TW and Fedora 35 works, they'll just
use something else.

> A work-around in glibc for this bug in containers would (as usual) be
> problematic: it wouldn't be acceptable upstream and hence not be really
> applicable for us either.

Which unfortunately sounds like "not upstreamable" has a higher priority than
"the product works". We lost more than a handful of users because of the last
avoidable issue.

And TBH, I don't know why it wouldn't be acceptable. glibc has workarounds
for lots of kernel and compiler bugs, this would surely fit. It's not just our
distribution that's affected, but all users of glibc 2.34. This would benefit
all of them.

> The nature of the work-around is basically: "if this syscall doesn't work
> for whatever reason (including missing permissions), try with old syscall,
> even if the old syscall doesn't provide all features of the new one".
> That's obviously a bad idea: possibly the reason for the new syscall
> erroring out is exactly those new features that only that one has.  In that
> situation we really don't want to call the old one.

Which is what the real fix also does: returning -ENOSYS prevents glibc from
using clone3 and it falls back to the old ones. So I don't see a difference
in that regard.

> Which of course is exactly the reason why Docker must be fixed and can't be
> worked around: genuine errors that are to be reported to the user can't be
> differentiated from docker-invented errors because syscall isn't emulated.

Not necessarily. There are parameters to clone which are guaranteed to not
return -EPERM, and glibc could use those to probe whether a received -EPERM is
caused by docker. I don't think that single extra syscall (the result can be
cached) violates any interface guarantees. It's more than just a single
one-line bandaid though.
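
A rough sketch of such a cached probe, written in Python with ctypes rather than inside glibc. The choice of probe arguments is an assumption, not something specified in this thread: a NULL argument pointer with size 0, which a kernel that implements clone3 rejects as an invalid size before creating any process. `SYS_CLONE3`, `classify` and `probe_clone3` are illustrative names, and 435 is the clone3 number on x86_64 (other architectures may differ).

```python
import ctypes
import errno
import functools
import sys

SYS_CLONE3 = 435  # clone3 syscall number on x86_64 (assumed target)


def classify(err):
    """Map the errno of the probe call to a diagnosis string."""
    if err == errno.EPERM:
        return "blocked"      # rejected before the kernel saw the arguments
    if err == errno.ENOSYS:
        return "unavailable"  # kernel or filter reports it as not implemented
    return "available"        # the kernel inspected the arguments


@functools.lru_cache(maxsize=None)
def probe_clone3():
    """Issue one harmless clone3 call and cache the diagnosis. Linux only."""
    if not sys.platform.startswith("linux"):
        raise RuntimeError("the clone3 probe is only meaningful on Linux")
    libc = ctypes.CDLL(None, use_errno=True)
    ctypes.set_errno(0)
    # NULL args with size 0: the size check fails first, so nothing is cloned.
    ret = libc.syscall(ctypes.c_long(SYS_CLONE3), None, ctypes.c_size_t(0))
    if ret != -1:
        return "available"
    return classify(ctypes.get_errno())
```

Calling `probe_clone3()` inside an affected container should report `blocked`, whereas on an unfiltered kernel with clone3 support the same call should report `available`. Because of `lru_cache`, the syscall is issued only once per process.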

> So, sorry, I don't see anything that glibc could responsibly do.  That's all
> IMHO, I would let Andreas overrule me of course.
Comment 9 Aleksa Sarai 2021-09-22 15:09:24 UTC
(In reply to Fabian Vogt from comment #8)
> (In reply to Michael Matz from comment #4)
> > So, despite https://bugzilla.suse.com/show_bug.cgi?id=1182451#c20 docker (or
> > libseccomp) are _still_ not fixed to return -ENOSYS for unknown syscalls?
> > After years of that being known to create problems?
> 
> Mainly summarising comment 6: libseccomp isn't, runc has an ugly workaround
> but docker doesn't enable that yet while podman does.

Just to clarify: runc has two workarounds for this issue -- one really ugly one, and one okay (but not ideal) one. Docker uses the really ugly one (in fact you can't not use it -- runc applies it for all EPERM-default profiles) but it is quite brittle and if Docker isn't careful (which they weren't) it can break as has happened here. Podman uses the other workaround which won't break but has other minor issues. This issue I just opened[1] is trying to get Docker to switch to the less ugly workaround.

[1]: https://github.com/moby/moby/issues/42871
Comment 10 Petr Vorel 2021-09-23 10:34:19 UTC
FYI any project which uses a Tumbleweed container on GitHub Actions (e.g. snapper [1], LTP [2], iputils [3]) is hit by this issue, because GitHub Actions does not allow using podman [4].

Sometimes it feels like keeping up the CI is harder than the project itself.

[1] https://github.com/pevik/snapper/runs/3686187874
[2] https://github.com/linux-test-project/ltp/runs/3685832145
[3] https://github.com/iputils/iputils/runs/3684329716
[4] https://github.com/actions/runner/issues/505
Comment 11 Swamp Workflow Management 2021-09-23 16:19:58 UTC
SUSE-RU-2021:3204-1: An update that has one recommended fix can now be installed.

Category: recommended (important)
Bug References: 1190670
CVE References: 
JIRA References: 
Sources used:
SUSE Linux Enterprise Module for Containers 12 (src):    docker-20.10.6_ce-98.69.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 12 Fabian Vogt 2021-09-28 09:36:25 UTC
*** Bug 1190985 has been marked as a duplicate of this bug. ***
Comment 13 Swamp Workflow Management 2021-09-28 16:20:28 UTC
SUSE-RU-2021:3245-1: An update that has one recommended fix can now be installed.

Category: recommended (important)
Bug References: 1190670
CVE References: 
JIRA References: 
Sources used:
SUSE MicroOS 5.1 (src):    docker-20.10.6_ce-153.1
SUSE MicroOS 5.0 (src):    docker-20.10.6_ce-153.1
SUSE Linux Enterprise Server for SAP 15-SP1 (src):    docker-20.10.6_ce-153.1
SUSE Linux Enterprise Server for SAP 15 (src):    docker-20.10.6_ce-153.1
SUSE Linux Enterprise Server 15-SP1-LTSS (src):    docker-20.10.6_ce-153.1
SUSE Linux Enterprise Server 15-SP1-BCL (src):    docker-20.10.6_ce-153.1
SUSE Linux Enterprise Server 15-LTSS (src):    docker-20.10.6_ce-153.1
SUSE Linux Enterprise Module for Containers 15-SP3 (src):    docker-20.10.6_ce-153.1
SUSE Linux Enterprise Module for Containers 15-SP2 (src):    docker-20.10.6_ce-153.1
SUSE Linux Enterprise High Performance Computing 15-SP1-LTSS (src):    docker-20.10.6_ce-153.1
SUSE Linux Enterprise High Performance Computing 15-SP1-ESPOS (src):    docker-20.10.6_ce-153.1
SUSE Linux Enterprise High Performance Computing 15-LTSS (src):    docker-20.10.6_ce-153.1
SUSE Linux Enterprise High Performance Computing 15-ESPOS (src):    docker-20.10.6_ce-153.1
SUSE Enterprise Storage 6 (src):    docker-20.10.6_ce-153.1
SUSE CaaS Platform 4.0 (src):    docker-20.10.6_ce-153.1

Comment 14 Swamp Workflow Management 2021-09-28 16:23:10 UTC
openSUSE-RU-2021:3245-1: An update that has one recommended fix can now be installed.

Category: recommended (important)
Bug References: 1190670
CVE References: 
JIRA References: 
Sources used:
openSUSE Leap 15.3 (src):    docker-20.10.6_ce-153.1, docker-kubic-20.10.6_ce-6.52.1
Comment 18 William Brown 2021-09-29 09:20:21 UTC
Then I suspect Bug 1190985  was not a duplicate as that occurs on fully updated tw + docker with latest tw images, so the issue still exists somewhere ....
Comment 19 Enno Gotthold 2021-09-29 09:27:04 UTC
I'll add myself to the list of victims of this bug here:
https://github.com/cobbler/cobbler/issues/2811

The problem is that I can't use the workaround here since GitHub Actions is using Ubuntu as a base OS and thus a fix inside docker is not feasible for this. Any ideas how to progress this?
Comment 20 James Hodgkinson 2021-09-29 09:33:21 UTC
Same as https://bugzilla.suse.com/show_bug.cgi?id=1190670#c19

Ubuntu with TW docker image, zypper's failing to run.
Comment 21 Fabian Vogt 2021-09-29 12:07:07 UTC
Ubuntu now disabled clone3 (unconditionally even) in its glibc package:

https://launchpad.net/ubuntu/+source/glibc/2.34-0ubuntu3
Comment 22 Swamp Workflow Management 2021-09-30 10:17:46 UTC
openSUSE-RU-2021:1324-1: An update that has one recommended fix can now be installed.

Category: recommended (important)
Bug References: 1190670
CVE References: 
JIRA References: 
Sources used:
openSUSE Leap 15.2 (src):    docker-20.10.6_ce-lp152.2.15.1
Comment 23 Andreas Schwab 2021-09-30 10:17:57 UTC
*** Bug 1191052 has been marked as a duplicate of this bug. ***
Comment 24 Andreas Schwab 2021-09-30 11:03:28 UTC
*** Bug 1190985 has been marked as a duplicate of this bug. ***
Comment 25 James Hodgkinson 2021-10-04 23:31:33 UTC
I was asked to add this additional detail: on an Ubuntu host, running opensuse/tumbleweed:latest gives an error when running zypper:

yaleman@buildmonkey:~$ docker pull opensuse/tumbleweed:latest
latest: Pulling from opensuse/tumbleweed
Digest: sha256:357b19900641c6ec899ade58beca42f7bfff5f62364707b9d4f07f7153d93911
Status: Image is up to date for opensuse/tumbleweed:latest
docker.io/opensuse/tumbleweed:latest
yaleman@buildmonkey:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 21.04
Release:	21.04
Codename:	hirsute
yaleman@buildmonkey:~$ docker run --rm -it opensuse/tumbleweed:latest bash
zypper re
cd719e175275:/ # zypper re
terminate called after throwing an instance of 'std::system_error'
  what():  Operation not permitted
Aborted (core dumped)
Comment 26 Adrien Glauser 2021-10-09 08:15:02 UTC
Fetched the image:
$ podman pull opensuse/tumbleweed

Tried to run:
$ podman run -it <image id> /bin/bash

Everything seems alright, except that as soon as I do:
$ zypper up

I get:
"""
terminate called after throwing an instance of 'std::system_error'
  what():  Operation not permitted
Aborted (core dumped)
"""
Comment 27 Adrien Glauser 2021-10-09 08:15:40 UTC
I meant:

# zypper up
Comment 28 Andreas Schwab 2021-10-18 08:34:52 UTC
*** Bug 1191586 has been marked as a duplicate of this bug. ***
Comment 29 Martin Liška 2021-10-18 12:35:47 UTC
Note one can use the following workaround for GitHub Actions:
https://github.com/stanislavlevin/tox-console-scripts/commit/9b2d6e7f1fc9414fc965a2679afddb2fccf1d698

    container:
      image: opensuse/tumbleweed
      options: --privileged
Comment 30 Petr Vorel 2021-10-26 22:15:31 UTC
Also *sometimes* this worked [1]:

options: --security-opt seccomp=unconfined

(but it timed out for other projects).


Also, some distros just add a workaround to glibc that falls back from clone3 to
clone when the error is EPERM [2]. It'd be really nice if we could also be pragmatic and accept it.

--- a/sysdeps/unix/sysv/linux/clone-internal.c
+++ b/sysdeps/unix/sysv/linux/clone-internal.c
@@ -52,7 +52,7 @@ __clone_internal (struct clone_args *cl_args,
   /* Try clone3 first.  */
   int saved_errno = errno;
   ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
-  if (ret != -1 || errno != ENOSYS)
+  if (ret != -1 || (errno != ENOSYS && errno != EPERM))
     return ret;
 
   /* NB: Restore errno since errno may be checked against non-zero

[1] https://github.com/stanislavlevin/tox-console-scripts/commit/9b2d6e7f1fc9414fc965a2679afddb2fccf1d698
[2] http://git.altlinux.org/gears/g/glibc.git?p=glibc.git;a=commitdiff;h=09e37c7111e39b7c70846aea30941c03c43e6f54
Comment 31 James Hodgkinson 2021-10-26 22:47:14 UTC
The workaround commands are fine if you're only using "docker run", but they aren't compatible with docker build...
Comment 32 openQA Review 2021-11-12 02:37:03 UTC
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: containers_tw_image_on_centos_host
https://openqa.opensuse.org/tests/2028935

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
3. The bugref in the openQA scenario is removed or replaced, e.g. `label:wontfix:boo1234`
Comment 33 openQA Review 2021-11-27 00:01:05 UTC
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: containers_tw_image_on_centos_host
https://openqa.opensuse.org/tests/2057725

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
3. The bugref in the openQA scenario is removed or replaced, e.g. `label:wontfix:boo1234`