Bug 936321

Summary: A start job is running for LSB: NFS client services (2min 40s / 6min 50s)
Product: [openSUSE] openSUSE Distribution
Reporter: Anton vd haterd <antonvdh>
Component: Basesystem
Assignee: Neil Brown <nfbrown>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P5 - None
CC: antonvdh, chcao, Curt.Blank, ladislav.mate
Version: 13.2
Flags: ladislav.mate: needinfo? (antonvdh)
       nfbrown: needinfo? (Curt.Blank)
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 13.2
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Anton vd haterd 2015-06-28 08:13:17 UTC
When booting an openSUSE 13.2 NFS server, it takes more than 6 minutes to start up:
A start job is running for LSB: NFS client services (2min 40s / 6min 50s)
Comment 1 Ladislav Mate 2015-06-29 14:10:01 UTC
Hi,
Can you please provide more information? For example, the output of:

systemctl status nfs
grep nfs /etc/fstab
systemd-analyze blame | head
systemd-analyze critical-chain

Cheers,
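
For anyone gathering the same information, here is a minimal sketch that collects the output of the four commands above into one file that can be attached to the bug. The function name collect_nfs_diag and the default file name nfs-diag.txt are made up for this example, and the -l flag on systemctl status is an addition so that long log lines are not ellipsized.

```shell
# Sketch only: run each diagnostic command and append its output,
# including errors, to a single file. Names here are hypothetical.
collect_nfs_diag() {
    out="${1:-nfs-diag.txt}"
    : > "$out" || return 1
    for cmd in \
        'systemctl status -l nfs' \
        'grep nfs /etc/fstab' \
        'systemd-analyze blame | head' \
        'systemd-analyze critical-chain'
    do
        # Header line so each section of the report is easy to find.
        printf '### %s\n' "$cmd" >> "$out"
        # A failing command must not abort the collection.
        sh -c "$cmd" >> "$out" 2>&1 || true
        printf '\n' >> "$out"
    done
}
```

Usage would be something like: collect_nfs_diag /tmp/nfs-diag.txt, run as root.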
Comment 2 Anton vd haterd 2015-06-29 18:37:17 UTC
I could not attach the output you asked for as quotes, so I am pasting it here:

linux-nfs:/home/anton # systemctl status nfs
nfs.service - LSB: NFS client services
   Loaded: loaded (/etc/init.d/nfs)
  Drop-In: /run/systemd/generator/nfs.service.d
           └─50-insserv.conf-$remote_fs.conf
   Active: active (running) since ma 2015-06-29 20:31:41 CEST; 2min 48s ago
  Process: 4678 ExecStart=/etc/init.d/nfs start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nfs.service
           └─4695 /usr/sbin/rpc.gssd -D -p /var/lib/nfs/rpc_pipefs

jun 29 20:31:41 linux-nfs rpc.gssd[4705]: ERROR: gssd_refresh_krb5_machine_credential: no usab...ost
jun 29 20:31:41 linux-nfs rpc.gssd[4706]: ERROR: gssd_refresh_krb5_machine_credential: no usab...ost
jun 29 20:31:41 linux-nfs rpc.gssd[4707]: ERROR: gssd_refresh_krb5_machine_credential: no usab...ost
jun 29 20:31:41 linux-nfs rpc.gssd[4713]: ERROR: gssd_refresh_krb5_machine_credential: no usab...ost
jun 29 20:31:41 linux-nfs rpc.gssd[4714]: ERROR: gssd_refresh_krb5_machine_credential: no usab...ost
jun 29 20:31:41 linux-nfs rpc.gssd[4717]: ERROR: gssd_refresh_krb5_machine_credential: no usab...ost
jun 29 20:31:41 linux-nfs rpc.gssd[4718]: ERROR: gssd_refresh_krb5_machine_credential: no usab...ost
jun 29 20:31:41 linux-nfs rpc.gssd[4718]: ERROR: No credentials found for connection to server...ost
jun 29 20:31:41 linux-nfs rpc.gssd[4719]: ERROR: gssd_refresh_krb5_machine_credential: no usab...ost
jun 29 20:31:41 linux-nfs nfs[4678]: Mounting network file systems .....done
Hint: Some lines were ellipsized, use -l to show in full.

linux-nfs:/home/anton # grep nfs /etc/fstab
linux-nfs:/exports/renderfarm/  /mnt    nfs     rw 0 0 
linux-nfs:/exports/renderfarm   /home/anton/NFS nfs     defaults 0 0 
linux-nfs:/home/anton # 

linux-nfs:/home/anton # systemd-analyze blame | head
         14.176s wicked.service
          2.090s apparmor.service
          1.908s mysql.service
          1.821s apache2.service
          1.412s systemd-udev-settle.service
          1.339s postfix.service
           921ms lvm2-activation.service
           874ms ModemManager.service
           828ms lvm2-activation-early.service
           670ms polkit.service

linux-nfs:/home/anton # systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.
                                                                                                    
graphical.target @1min 55.542s                                                                      
└─multi-user.target @1min 55.542s                                                                   
  └─after-local.service @1min 55.542s
    └─getty.target @1min 55.541s
      └─getty@tty1.service @1min 55.541s
        └─apache2.service @1min 53.717s +1.821s
          └─nss-lookup.target @1min 53.573s
            └─named.service @1min 53.341s +231ms
              └─remote-fs.target @1min 53.341s
Comment 3 Ladislav Mate 2015-06-29 21:03:12 UTC
I tried this and saw the same behavior, but it occurred during shutdown of the box, not during boot. It happened when the 'nfs' service was enabled with systemctl before the 'nfsserver' service for local filesystems.

In the meantime, maybe you could try the following and test:
vm1:~ # systemctl disable nfsserver
vm1:~ # systemctl disable nfs
vm1:~ # systemctl enable nfsserver
vm1:~ # systemctl enable nfs
Comment 4 Neil Brown 2015-07-01 00:25:55 UTC
Does running
  systemctl enable rpcbind
remove the boot-time delay?
Comment 5 Anton vd haterd 2015-07-01 05:39:55 UTC
Makes no difference.
For the moment I have disabled the NFS client service with the YaST Services Manager,
and after boot I start it manually with: /etc/init.d/nfs start
so I do not have to wait 6 minutes during boot.
Comment 6 Neil Brown 2015-07-10 03:41:44 UTC
Anton: could you please confirm if you are seeing a "Start job" problem at startup time or a "stop job" problem at shutdown like Ladislav mentions?

I can only reproduce a "Stop job" problem.  This happens because you have a local filesystem NFS mounted.  The NFS server gets shut down before the NFS unmount happens, and this deadlocks.

This can be fixed by adding a line to /etc/init.d/nfsserver

 # X-Start-Before: remote-fs-pre.target

just before

 ### END INIT INFO

For good measure it would help to change the line in /etc/init.d/nfs:

  umount -at nfs,nfs4
to
  umount -aft nfs,nfs4

This will remove a similar hang if you shut down while a remote NFS server is unavailable.
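
The umount change could be applied with sed, roughly as sketched below. The function name force_nfs_umount is made up for this example, and the file is passed as a parameter so the edit can be tried on a copy of /etc/init.d/nfs first. The -f flag is umount's force option, which is intended for unreachable NFS servers.

```shell
# Sketch only: add -f to the NFS umount call in the given init script.
# A backup of the original is kept next to it as <file>.bak.
force_nfs_umount() {
    sed -i.bak 's/umount -at nfs,nfs4/umount -aft nfs,nfs4/' "$1"
}
```

Running it a second time changes nothing, because the pattern no longer matches once -f is present.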

I will try to arrange a maintenance update.
Comment 7 Neil Brown 2015-07-10 03:57:40 UTC
Sorry, that first change isn't right - I didn't test as properly as I thought.
When you specify a dependency like that it adds ".service" to the end.  Arg.

New approach: create a directory /etc/systemd/system/nfsserver.service.d
and in there create a file called premount.conf.
In the file put two lines:

[Unit]
Before=remote-fs-pre.target

Then "systemctl daemon-reload" and it should all be good.

Do also make the change to "umount -at" in /etc/init.d/nfs.
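
The drop-in steps above can be sketched as a small shell function. The function name install_premount_dropin and its optional prefix argument are made up for this example. The prefix only exists so the file can be written under a scratch directory for inspection, and the daemon reload is skipped in that case.

```shell
# Sketch only: create the nfsserver drop-in described above.
# With no argument it writes to the live system (run as root);
# with a directory argument it writes below that directory instead.
install_premount_dropin() {
    prefix="${1:-}"
    dir="$prefix/etc/systemd/system/nfsserver.service.d"
    mkdir -p "$dir" || return 1
    printf '%s\n' '[Unit]' 'Before=remote-fs-pre.target' \
        > "$dir/premount.conf" || return 1
    # Reload systemd only when the real configuration was changed.
    if [ -z "$prefix" ]; then
        systemctl daemon-reload
    fi
}
```

After a real install, "systemctl status nfsserver" should list premount.conf under Drop-In.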
Comment 8 Neil Brown 2015-10-08 06:58:34 UTC
Hi Anton did you get this working?
Any update?
Thanks.
Comment 9 Curtis J Blank 2015-12-09 20:30:23 UTC
I've had this same problem ever since going to 13.2. /etc/init.d/nfs holds up the boot for 5 minutes and 13 seconds:

A start job is running for LSB: NFS client services (1min 37s / 5min 13s)

Then afterwards reports that what it's trying to mount is already mounted which is really stupid! Why is it wasting time trying to mount them if they're already mounted!!

I too disabled it in yast.

Is this going to be fixed?

5 months have already gone by since this bug was created.


.:~ # systemctl status nfs.service
nfs.service - LSB: NFS client services
   Loaded: loaded (/etc/init.d/nfs)
  Drop-In: /run/systemd/generator/nfs.service.d
           └─50-insserv.conf-$remote_fs.conf
   Active: failed (Result: timeout) since Tue 2015-12-08 20:36:39 CST; 17h ago

Dec 08 20:31:39 router nfs[4225]: Starting NFS client services: sm-notify gssd idmapd..done
Dec 08 20:36:39 router systemd[1]: nfs.service start operation timed out. Terminating.
Dec 08 20:36:39 router systemd[1]: Failed to start LSB: NFS client services.
Dec 08 20:36:39 router systemd[1]: Unit nfs.service entered failed state.
Dec 08 20:36:39 router nfs[4225]: Mounting network file systems ...mount.nfs: /xxxxx1 is busy or already mounted
Dec 08 20:36:39 router nfs[4225]: mount.nfs: /xxxxx2 is busy or already mounted
Dec 08 20:36:39 router nfs[4225]: mount.nfs: /xxxxx3 is busy or already mounted
Dec 08 20:36:39 router nfs[4225]: mount.nfs: /xxxxx4 is busy or already mounted
Dec 08 20:36:39 router nfs[4225]: mount.nfs: /xxxxx5 is busy or already mounted
Dec 08 20:36:39 router nfs[4225]: mount.nfs: /xxxxx6 is busy or already mounted
Comment 10 Neil Brown 2016-02-17 03:42:34 UTC
(sorry for delays .. summer was busy)

Are you running named?  If so can you edit /etc/init.d/named and replace 

# Required-Start:    $network $remote_fs $syslog
# Required-Stop:     $network $remote_fs $syslog

with

# Required-Start:    $network $syslog
# Required-Stop:     $network $syslog

i.e. remove $remote_fs.

Also, please edit  /usr/lib/systemd/system/nfs.service 
and remove

ExecStartPost=/usr/bin/mount -at nfs,nfs4
ExecStop=/usr/bin/umount -aft nfs,nfs4

and then see if that makes a difference
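
The edit to the LSB header of /etc/init.d/named could be scripted roughly as follows. The function name drop_remote_fs_dep is made up for this example, and the file is a parameter so the change can be tried on a copy first.

```shell
# Sketch only: remove $remote_fs from the Required-Start and
# Required-Stop lines of an init script. A backup is kept as <file>.bak.
drop_remote_fs_dep() {
    sed -i.bak -E \
        '/^# Required-(Start|Stop):/ s/[[:space:]]\$remote_fs//' "$1"
}
```

Only the two Required-* header lines are touched; any other mention of $remote_fs in the script is left alone.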
Comment 11 Anton vd haterd 2016-03-06 11:39:56 UTC
edited /etc/init.d/named and replaced

# Required-Start:    $network $remote_fs $syslog
# Required-Stop:     $network $remote_fs $syslog

with

# Required-Start:    $network $syslog
# Required-Stop:     $network $syslog

There is no  /usr/lib/systemd/system/nfs.service 

Restarted but did not change anything 
Bug is never solved.
Comment 12 Neil Brown 2016-03-08 21:35:50 UTC
Sorry, I was thinking that 13.2 had full systemd support, but that is only in Tumbleweed/Leap.  And I didn't look at your report properly :-(

Looking again, it could be that the "udevadm settle" in /etc/init.d/nfs is the problem.  That was needed before we had systemd at all, but should be irrelevant for 13.2.

So please remove both calls to "udevadm settle" from /etc/init.d/nfs and see if that fixes the problem
The important one is in:

        if test -n "$mnt" ; then
            # If network devices are not yet discovered, mounts
            # might fail, so we might need to 'udevadm settle' to
            # wait for the interfaces.
            # We cannot try the mount and on failure: 'settle' and try again
            # as if there are 'bg' mounts, we could get multiple copies
            # of them.  So always 'settle' if there is any mounting to do.
            echo -n "Mounting network file systems ..."
            udevadm settle
            mount -at nfs,nfs4 || rc_failed 1
            rc_status -v
        fi
 
And that whole section could probably be removed as systemd should do all the mounting.  But I think the "udevadm settle" is probably what is holding it up.
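
Removing the "udevadm settle" calls could be done with sed, roughly as sketched below. The function name remove_udevadm_settle is made up for this example. Each call is replaced with the shell no-op ":" rather than deleted, on the assumption that one of the calls might be the only command inside an if block, where deleting it would leave the script syntactically invalid.

```shell
# Sketch only: neutralise every line that consists solely of
# "udevadm settle" in the given init script, preserving indentation.
# Comment lines that merely mention the command are not matched.
# A backup of the original is kept as <file>.bak.
remove_udevadm_settle() {
    sed -i.bak -E \
        's/^([[:space:]]*)udevadm settle[[:space:]]*$/\1: # udevadm settle (call removed)/' \
        "$1"
}
```

Trying it on a copy of /etc/init.d/nfs and checking the result with "sh -n" before replacing the real script would be a sensible precaution.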
Comment 13 Anton vd haterd 2016-03-13 12:40:58 UTC
I removed the whole section from /etc/init.d/nfs:

if test -n "$mnt" ; then
            # If network devices are not yet discovered, mounts
            # might fail, so we might need to 'udevadm settle' to
            # wait for the interfaces.
            # We cannot try the mount and on failure: 'settle' and try again
            # as if there are 'bg' mounts, we could get multiple copies
            # of them.  So always 'settle' if there is any mounting to do.
            echo -n "Mounting network file systems ..."
            udevadm settle
            mount -at nfs,nfs4 || rc_failed 1
            rc_status -v
        fi

Then I restarted. I still see
A start job is running for LSB: NFS client services
but the computer now starts up in less than 2 minutes.
For me the problem is solved. Many Thanks