|
Bugzilla – Full Text Bug Listing |
| Summary: | [Build 20240314] test docker_firewall: fails to restart docker | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Dominique Leuenberger <dimstar> |
| Component: | Containers | Assignee: | Containers Team <containers-bugowner> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | dimstar, fbui, rbranco, systemd-maintainers |
| Version: | Current | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| URL: | https://openqa.opensuse.org/tests/4015303/modules/docker_firewall/steps/143 | ||
| Whiteboard: | |||
| Found By: | openQA | Services Priority: | |
| Business Priority: | Blocker: | Yes | |
| Marketing QA Status: | --- | IT Deployment: | --- |
|
Description
Dominique Leuenberger
2024-03-15 09:56:51 UTC
earlier in the test suite, the restarting of docker service still worked it's only once the firewall setup is being done we fail to restart the service (In reply to Dominique Leuenberger from comment #1) > earlier in the test suite, the restarting of docker service still worked > > it's only once the firewall setup is being done we fail to restart the > service Do you have the journal or an error message from that test run? (In reply to Dan Čermák from comment #2) > (In reply to Dominique Leuenberger from comment #1) > > earlier in the test suite, the restarting of docker service still worked > > > > it's only once the firewall setup is being done we fail to restart the > > service > > Do you have the journal or an error message from that test run? based on comment#0: unfortunately, that job does not (yet) export a journal, making debugging a bit harder than it should be I filed https://progress.opensuse.org/issues/157330 for openQA to fix that (In reply to Dan Čermák from comment #2) > (In reply to Dominique Leuenberger from comment #1) > > earlier in the test suite, the restarting of docker service still worked > > > > it's only once the firewall setup is being done we fail to restart the > > service > > Do you have the journal or an error message from that test run? https://openqa.opensuse.org/tests/4015685#step/docker_firewall/138 × docker.service - Docker Application Container Engine Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; preset: disabled) Active: failed (Result: start-limit-hit) since Fri 2024-03-15 10:26:33 UTC; 60ms ago Duration: 20.308s TriggeredBy: × docker.socket Docs: http://docs.docker.com Process: 6825 ExecStart=/usr/bin/dockerd --add-runtime oci=/usr/sbin/docker-runc $DOCKER_NETWORK_OPTIONS $DOCKER_OPTS (code=exited, status=0/SUCCESS) Main PID: 6825 (code=exited, status=0/SUCCESS) CPU: 1.373s Mar 15 10:26:32 localhost.localdomain dockerd[6825]: time="2024-03-15T10:26:32.444638010Z" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd Mar 15 10:26:32 localhost.localdomain dockerd[6825]: time="2024-03-15T10:26:32.444820153Z" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby Mar 15 10:26:32 localhost.localdomain dockerd[6831]: time="2024-03-15T10:26:32.445037306Z" level=debug msg="received signal" signal=terminated Mar 15 10:26:32 localhost.localdomain dockerd[6831]: time="2024-03-15T10:26:32.445154898Z" level=debug msg="sd notification" notified=false state="STOPPING=1" Mar 15 10:26:33 localhost.localdomain systemd[1]: docker.service: Deactivated successfully. Mar 15 10:26:33 localhost.localdomain systemd[1]: Stopped Docker Application Container Engine. Mar 15 10:26:33 localhost.localdomain systemd[1]: docker.service: Consumed 1.373s CPU time. Mar 15 10:26:33 localhost.localdomain systemd[1]: docker.service: Start request repeated too quickly. Mar 15 10:26:33 localhost.localdomain systemd[1]: docker.service: Failed with result 'start-limit-hit'. Mar 15 10:26:33 localhost.localdomain systemd[1]: Failed to start Docker Application Container Engine. --- ░░ Subject: Unit succeeded ░░ Defined-By: systemd ░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel ░░ ░░ The unit docker.service has successfully entered the 'dead' state. Mar 15 10:26:33 localhost.localdomain systemd[1]: Stopped Docker Application Container Engine. ░░ Subject: A stop job for unit docker.service has finished ░░ Defined-By: systemd ░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel ░░ ░░ A stop job for unit docker.service has finished. ░░ ░░ The job identifier is 10208 and the job result is done. Mar 15 10:26:33 localhost.localdomain systemd[1]: docker.service: Consumed 1.373s CPU time. ░░ Subject: Resources consumed by unit runtime ░░ Defined-By: systemd ░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel ░░ ░░ The unit docker.service completed and consumed the indicated resources. Mar 15 10:26:33 localhost.localdomain systemd[1]: docker.service: Start request repeated too quickly. Mar 15 10:26:33 localhost.localdomain systemd[1]: docker.service: Failed with result 'start-limit-hit'. ░░ Subject: Unit failed ░░ Defined-By: systemd ░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel ░░ ░░ The unit docker.service has entered the 'failed' state with result 'start-limit-hit'. Mar 15 10:26:33 localhost.localdomain systemd[1]: Failed to start Docker Application Container Engine. ░░ Subject: A start job for unit docker.service has failed ░░ Defined-By: systemd ░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel ░░ ░░ A start job for unit docker.service has finished with a failure. ░░ ░░ The job identifier is 10208 and the job result is failed. > Active: failed (Result: start-limit-hit) since Fri 2024-03-15 10:26:33 UTC; 60ms ago
Makes me wonder if systemd (updated to 255.3 in this snapshot) is being more aggressive in stopping us from repeatedly restarting the docker service.
@Franck: are you already aware of any issues around rate-limiting service restarts?
docker.service comes with its own set of limits: # Restart the docker process if it exits prematurely. Restart=on-failure StartLimitBurst=3 StartLimitInterval=60s Seems we might just be too fast now (or systemd properly honoring it finally) https://openqa.opensuse.org/tests/4017525/file/docker_firewall-journal.log There we now have a full journal https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18900 Adding a drop-in configuration to change the service definition. Allow 'more frequent restarts' as performed by the test Fixed and confirmed: https://openqa.opensuse.org/tests/4018060 |