Bug 1227713 - docker containers lose socket communication and fail to restart after restart of docker service
Status: NEW
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Containers
Version: Leap 15.5
Hardware: x86-64 openSUSE Leap 15.5
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Containers Team
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-12 14:07 UTC by Oliver Jakobi
Modified: 2024-07-12 14:07 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Oliver Jakobi 2024-07-12 14:07:50 UTC
I have a couple of machines running docker, installed from https://build.opensuse.org/project/show/Virtualization:containers, with the following package versions:


| docker                 | 26.1.4_ce-150500.1.3 |
| docker-bash-completion | 26.1.4_ce-150500.1.3 |
| docker-buildx          | 0.15.1-150500.39.3   |
| docker-compose         | 2.28.1-150500.93.3   |
| docker-rootless-extras | 26.1.4_ce-150500.1.3 |
| containerd             | 1.7.17-150500.185.3  |
| catatonit              | 0.2.0-150500.26.5    |

After updating these hosts to a new docker release and rebooting, I had to stop, remove and redeploy all my docker compose services: docker failed to start the existing containers and reported that the secret store was not initialized.

At first, I dismissed this as update-related, but on other OSes I can see containers being restarted after an update without problems.
All hosts have "live-restore" set to "true".

Also, I can replicate the issue by just restarting docker.service.
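
For anyone trying to reproduce this, the "live-restore" setting mentioned above can be checked with something like the sketch below. It assumes the daemon configuration lives at the default path /etc/docker/daemon.json; the function name is only illustrative, and the grep is a rough text match, not a real JSON parser.

```shell
# Sketch: report whether "live-restore": true appears in a daemon.json file.
# Assumption: the config is at /etc/docker/daemon.json unless a path is given.
live_restore_configured() {
    config="${1:-/etc/docker/daemon.json}"
    # Match "live-restore": true, tolerating whitespace around the colon.
    # A missing or unreadable file counts as "not configured".
    grep -Eq '"live-restore"[[:space:]]*:[[:space:]]*true' "$config" 2>/dev/null
}

# Usage:
#   live_restore_configured && echo "live-restore is set in daemon.json"
# The running daemon can also be asked directly:
#   docker info --format '{{.LiveRestoreEnabled}}'
```

The file check only shows what is configured; the `docker info` query shows what the running daemon actually reports.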


Example: 

traefik is running from a docker-compose.yml, with `/var/run/docker.sock` bind-mounted into the traefik container for automatic service discovery.

dockertesthost:/data/traefik # docker-compose logs traefik
traefik  | time="2024-07-12T12:46:31Z" level=info msg="Configuration loaded from file: /traefik.yml"



---- Restarting docker service ----
dockertesthost:/data/traefik # systemctl restart docker.service
dockertesthost:/data/traefik # docker-compose logs traefik
traefik  | time="2024-07-12T12:46:31Z" level=info msg="Configuration loaded from file: /traefik.yml"
traefik  | time="2024-07-12T13:52:53Z" level=error msg="Provider connection error unexpected EOF, retrying in 689.748336ms" providerName=docker
traefik  | time="2024-07-12T13:52:53Z" level=error msg="Failed to retrieve information of the docker client and server host: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?" providerName=docker
traefik  | time="2024-07-12T13:52:53Z" level=error msg="Provider connection error Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?, retrying in 666.847158ms" providerName=docker
traefik  | time="2024-07-12T13:52:54Z" level=error msg="Failed to retrieve information of the docker client and server host: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?" providerName=docker
traefik  | time="2024-07-12T13:52:54Z" level=error msg="Provider connection error Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?, retrying in 1.252701859s" providerName=docker
[...]
traefik  | time="2024-07-12T13:54:15Z" level=error msg="Provider connection error Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?, retrying in 2.436617494s" providerName=docker
traefik  | time="2024-07-12T13:54:17Z" level=error msg="Failed to retrieve information of the docker client and server host: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?" providerName=docker
traefik  | time="2024-07-12T13:54:17Z" level=error msg="Provider connection error Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?, retrying in 4.026284383s" providerName=docker



---- Restarting the container with docker compose ----
dockertesthost:/data/traefik # docker compose restart 
[+] Restarting 0/1
 ⠙ Container traefik  Restarting                                                                                                                                                                                1.2s 
Error response from daemon: Cannot restart container 93a045ff9c3c4af18614c44e173f3d95d11fe2ccfaf0fbefb8428d569d04a075: secret store is not initialized


---- Resolving the issue ----
dockertesthost:/data/traefik # docker compose rm
? Going to remove traefik Yes
[+] Removing 1/0
 ✔ Container traefik  Removed                                                                                                                                                                                   0.0s 
dockertesthost:/data/traefik # docker compose up -d
[+] Running 1/1
 ✔ Container traefik  Started


This keeps the container alive only until the next restart of docker.service (which happens on every package upgrade) or until the machine is rebooted:

dockertesthost:/data/traefik #  uptime
 14:03:20  up   0:00,  1 user,  load average: 0.65, 0.18, 0.06

dockertesthost:/data/traefik #  docker ps -a
CONTAINER ID   IMAGE          COMMAND                  CREATED         STATUS                        PORTS                                                                       NAMES
f3965f6d6046   traefik:2.10   "/entrypoint.sh trae…"   3 minutes ago   Exited (255) 29 seconds ago   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp    traefik

dockertesthost:/data/traefik # docker compose up -d
[+] Running 0/1
 ⠴ Container traefik  Starting                                                                                                                                                                                  0.5s 
Error response from daemon: secret store is not initialized


As before, removing the container and running "docker compose up" again resolves the issue.
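
Until the underlying problem is fixed, the manual steps above could be scripted roughly as follows. This is a minimal sketch of the workaround only, not a fix; the function names and the DRY_RUN variable are hypothetical helpers introduced here, and /data/traefik is simply the project directory from the example above.

```shell
# Helper: run a command, or only print it when DRY_RUN=1 is set.
# DRY_RUN is a hypothetical variable used by this sketch, not a docker option.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch of the workaround: remove the stale containers of one compose
# project, then create and start them again.
recreate_compose_project() {
    project_dir="$1"
    # --stop stops containers that are still running before removal,
    # --force skips the interactive confirmation prompt.
    run_or_print docker compose --project-directory "$project_dir" rm --stop --force &&
        run_or_print docker compose --project-directory "$project_dir" up -d
}

# Usage (prints the commands without touching any containers):
#   DRY_RUN=1 recreate_compose_project /data/traefik
```

Running it without DRY_RUN executes the same two commands shown in the transcript, so it would have to be repeated after every restart of docker.service or reboot.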