Bugzilla – Bug 1149603
nscd startup fails if /var/run in tmpfs (regression related to bug #600804 ?)
Last modified: 2019-10-09 12:28:23 UTC
## Observation As observed on three aarch64 machines but I think this is not architecture-specific, the nscd service fails after bootup where /var/run is a symlink to /run which is mounted as tmpfs. This looks just like https://bugzilla.suse.com/show_bug.cgi?id=600804 Maybe the problem is related to order of setting up services and mount points. ## Workaround Restarting the service usually works. Most likely creating the directory in the systemd service if it does not exist would work.
Does /run already exist at this point?
I mean, does /run/nscd already exist? It should be created by /usr/lib/tmpfiles.d/nscd.conf, and nscd does not touch anything else under /run.
I can check that. For now I have implemented a workaround with bug detection in https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/162 . Maybe after the next days and some reboots I can give an answer to your question.
I could not find this error again. Let's treat this is WORKSFORME unless I find this again.
On the three ARM machines we have I have found this problem on all at least once: ``` sudo salt '*arm*' cmd.run 'journalctl -u nscd | grep 1149603' openqaworker-arm-1.suse.de: Sep 28 14:23:40 linux-rz6p mkdir_nscd[1143]: boo#1149603: /run/nscd does not exist openqaworker-arm-2.suse.de: Sep 19 18:38:39 linux-79cz mkdir_nscd[1576]: boo#1149603: /run/nscd does not exist openqaworker-arm-3.suse.de: Sep 17 20:20:37 openqaworker-arm-3 mkdir_nscd[1662]: boo#1149603: /run/nscd does not exist Sep 26 12:07:28 openqaworker-arm-3 mkdir_nscd[1546]: boo#1149603: /run/nscd does not exist ``` So to answer your question: Clearly the directory `/run/ncsd` does not exist when the above scripts are triggered as workaround. What else can I check? I should have the full journals available. The file `/usr/lib/tmpfiles.d/nscd.conf` exists correctly and has content: ``` d /run/nscd 0755 root root ``` On openqaworker-arm-1.suse.de where /run/nscd was not existing on "Sep 28 14:23:40" I would see the following output from systemd-tmpfiles-setup.servic ``` > sudo journalctl -u systemd-tmpfiles-setup.service -- Reboot -- Sep 27 07:58:20 linux-rz6p systemd[1]: Starting Create Volatile Files and Directories... Sep 27 07:58:20 linux-rz6p systemd-tmpfiles[1132]: [/usr/lib/tmpfiles.d/net-snmp.conf:1] Line references path below legacy directory /var/run/, updating /var/run/net-snmp → /run/net-snmp; please update the tmpfiles.d/ drop-i> Sep 27 07:58:20 linux-rz6p systemd-tmpfiles[1132]: [/usr/lib/tmpfiles.d/salt.conf:2] Line references path below legacy directory /var/run/, updating /var/run/salt → /run/salt; please update the tmpfiles.d/ drop-in file accor> Sep 27 07:58:20 linux-rz6p systemd-tmpfiles[1132]: [/usr/lib/tmpfiles.d/salt.conf:3] Line references path below legacy directory /var/run/, updating /var/run/salt/master → /run/salt/master; please update the tmpfiles.d/ drop> Sep 27 07:58:20 linux-rz6p systemd-tmpfiles[1132]: [/usr/lib/tmpfiles.d/salt.conf:4] Line references path below legacy directory /var/run/, updating /var/run/salt/minion → /run/salt/minion; please update the tmpfiles.d/ drop> Sep 27 07:58:20 linux-rz6p systemd-tmpfiles[1132]: [/usr/lib/tmpfiles.d/samba.conf:1] Line references path below legacy directory /var/run/, updating /var/run/samba → /run/samba; please update the tmpfiles.d/ drop-in file ac> Sep 27 07:58:20 linux-rz6p systemd-tmpfiles[1132]: [/usr/lib/tmpfiles.d/tmp.conf:13] Duplicate line for path "/var/tmp", ignoring. Sep 27 07:58:20 linux-rz6p systemd-tmpfiles[1132]: [/usr/lib/tmpfiles.d/var.conf:21] Duplicate line for path "/var/lib", ignoring. Sep 27 07:58:20 linux-rz6p systemd-tmpfiles[1132]: [/usr/lib/tmpfiles.d/var.conf:23] Duplicate line for path "/var/spool", ignoring. Sep 27 07:58:20 linux-rz6p systemd[1]: Started Create Volatile Files and Directories. -- Reboot -- Sep 28 14:23:29 linux-rz6p systemd[1]: systemd-tmpfiles-setup.service: Job systemd-tmpfiles-setup.service/start deleted to break ordering cycle starting with sysinit.target/start ``` One can see that the bootup on "Sep 27" was fine – except for many warnings – but on "Sep 28" it actually failed and I can confirm the same on another machine. systemd-tmpfiles-setup.service actually fails to startup, that's why we have this problem. So it seems the problem is actually in systemd-tmpfiles-setup.service so let me assign the bug to systemd maintainers. The warnings seem to be "acceptable" according to my research, e.g. following https://lists.freedesktop.org/archives/systemd-devel/2018-May/040726.html and https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1484027
(In reply to Oliver Kurz from comment #5) > Sep 28 14:23:29 linux-rz6p systemd[1]: systemd-tmpfiles-setup.service: Job > systemd-tmpfiles-setup.service/start deleted to break ordering cycle > starting with sysinit.target/start > ``` Do you still have the complete journal logs of the boot when this happened so we can see what other units are involved in this ordering cycle ? If so can you attach it ? Thanks.
Created attachment 820804 [details] system journal of openqaworker-arm-1 form 2019-09-28, covering "ordering cycles" Please find the full log attached All mentions of "ordering cycle" are as follows: ``` Sep 28 14:23:29 linux-rz6p systemd[1]: haveged.service: Current command vanished from the unit file, execution of the command list won't be resumed. Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found ordering cycle on sysinit.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on systemd-journal-catalog-update.service/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on local-fs.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on var-lib-openqa-nvme.mount/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on openqa_nvme_format.service/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on basic.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on sockets.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on dbus.socket/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on sysinit.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Breaking ordering cycle by deleting job systemd-journal-catalog-update.service/start Sep 28 14:23:29 linux-rz6p systemd[1]: systemd-journal-catalog-update.service: Job systemd-journal-catalog-update.service/start deleted to break ordering cycle starting with sysinit.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found ordering cycle on sysinit.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on systemd-tmpfiles-setup.service/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on local-fs.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on var-lib-openqa-nvme.mount/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on openqa_nvme_format.service/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on basic.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on sockets.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on dbus.socket/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on sysinit.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Breaking ordering cycle by deleting job systemd-tmpfiles-setup.service/start Sep 28 14:23:29 linux-rz6p systemd[1]: systemd-tmpfiles-setup.service: Job systemd-tmpfiles-setup.service/start deleted to break ordering cycle starting with sysinit.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found ordering cycle on sysinit.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on systemd-machine-id-commit.service/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on local-fs.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on var-lib-openqa-nvme.mount/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on openqa_nvme_format.service/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on basic.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on sockets.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on dbus.socket/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on sysinit.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Breaking ordering cycle by deleting job systemd-machine-id-commit.service/start Sep 28 14:23:29 linux-rz6p systemd[1]: systemd-machine-id-commit.service: Job systemd-machine-id-commit.service/start deleted to break ordering cycle starting with sysinit.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: basic.target: Found ordering cycle on basic.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: basic.target: Found dependency on paths.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: basic.target: Found dependency on ca-certificates.path/start Sep 28 14:23:29 linux-rz6p systemd[1]: basic.target: Found dependency on local-fs.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: basic.target: Found dependency on var-lib-openqa-nvme.mount/start Sep 28 14:23:29 linux-rz6p systemd[1]: basic.target: Found dependency on openqa_nvme_format.service/start Sep 28 14:23:29 linux-rz6p systemd[1]: basic.target: Found dependency on basic.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: basic.target: Breaking ordering cycle by deleting job paths.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: paths.target: Job paths.target/start deleted to break ordering cycle starting with basic.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found ordering cycle on sysinit.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on local-fs.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on var-lib-openqa-nvme.mount/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on openqa_nvme_format.service/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on basic.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on sockets.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on dbus.socket/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Found dependency on sysinit.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: sysinit.target: Breaking ordering cycle by deleting job local-fs.target/start Sep 28 14:23:29 linux-rz6p systemd[1]: local-fs.target: Job local-fs.target/start deleted to break ordering cycle starting with sysinit.target/start Sep 28 14:23:29 linux-rz6p auditd[878]: Couldn't open log file /var/log/audit/audit.log (Read-only file system) ``` I have the suspicion that our custom service "openqa_nvme_format.service" is the culprit. It's also mentioned within the above log excerpt but that is not obvious for me from the log messages. Maybe you can teach me how to read them properly in this regard? :)
(In reply to Oliver Kurz from comment #7) > I have the suspicion that our custom service "openqa_nvme_format.service" is > the culprit. Agreed. > It's also mentioned within the above log excerpt but that is > not obvious for me from the log messages. Maybe you can teach me how to read > them properly in this regard? :) I can try... > sysinit.target: Found ordering cycle on sysinit.target/start So sysinit.target depends (maybe indirectly) on a unit which in its turn depends on sysinit.target hence the cycle. Then systemd lists the potential culprit which is likely to be openqa_nvme_format.service. Where can the unit file can be found ?
(In reply to Franck Bui from comment #8) > (In reply to Oliver Kurz from comment #7) > > I have the suspicion that our custom service "openqa_nvme_format.service" is > > the culprit. > > Agreed. > > > It's also mentioned within the above log excerpt but that is > > not obvious for me from the log messages. Maybe you can teach me how to read > > them properly in this regard? :) > > I can try... > > > sysinit.target: Found ordering cycle on sysinit.target/start > > So sysinit.target depends (maybe indirectly) on a unit which in its turn > depends on sysinit.target hence the cycle. > > Then systemd lists the potential culprit which is likely to be > openqa_nvme_format.service. > > Where can the unit file can be found ? https://gitlab.suse.de/openqa/salt-states-openqa/blob/master/openqa/nvme_store/openqa_nvme_format.service ``` [Unit] Description=Format NVMe before mounting it Before=var-lib-openqa-nvme.mount [Service] Type=oneshot ExecStart=/sbin/mkfs.ext2 -F /dev/nvme0n1p1 [Install] WantedBy=multi-user.target ```
(In reply to Oliver Kurz from comment #9) > [Unit] > Description=Format NVMe before mounting it > Before=var-lib-openqa-nvme.mount > > [Service] So this unit wants to be started before var-lib-openqa-nvme.mount which is supposed to be started before local-fs.target which in its turn is supposed to be started before sysinit.target. OTOH it also implicitly uses DefaultDependencies=yes (the default) which means that it wants to be started after basic.target (see man system.unit). However basic.target is run after sysinit.target. Hence the issue. So the unit probably needs to disable the default dependencies with "DefaultDependencies=no". Let me know if it helps.
Of course! I just read the past days about this flag as well. This helped when I also added the same line to other service files we have created. Thanks a lot. Fixed in https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/192