Bug 384481

Summary: Can't lock file on NFS from openSUSE-11.0
Product: [openSUSE] openSUSE 11.0
Reporter: Petr Mladek <pmladek>
Component: Basesystem
Assignee: Neil Brown <nfbrown>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Blocker
Priority: P5 - None
CC: coolo, mabrand, mmeeks, nfbrown
Version: Beta 1
Target Milestone: ---
Hardware: All
OS: openSUSE 11.0
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Bug Depends on:
Bug Blocks: 383390
Attachments: Testcase.
strace from 10.3
strace from SLED10-SP1
strace from 11.0
output from ls -lR /var/lib/nfs
Patch to start statd properly

Description Petr Mladek 2008-04-28 19:05:43 UTC
OpenOffice.org behaves strangely on openSUSE-11.0; see bug #383390.

It turned out that locking over NFS does not work as expected.

OOo uses fcntl F_SETLK to get the lock. I'll attach a small testcase that locks a file the way OOo does, and two strace logs showing the different behavior on openSUSE-11.0.
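For readers without the attachment, here is a minimal C sketch of the locking pattern described above. It opens the file and requests a non-blocking, whole-file write lock with fcntl(F_SETLK). This is not the attached test-lock.c, and the helper name try_write_lock is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: take a non-blocking, whole-file write lock the way
 * the report describes OOo doing it (fcntl with F_SETLK).
 * Returns 0 on success, otherwise the errno value of the failing call. */
int try_write_lock(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return errno;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "to end of file", i.e. the whole file */

    int err = 0;
    if (fcntl(fd, F_SETLK, &fl) < 0)    /* F_SETLK fails instead of waiting */
        err = errno;

    close(fd);              /* closing the descriptor releases the lock */
    return err;
}
```

On an affected 11.0 client with a file on NFS, this sketch would be expected to return ENOLCK, which is the error Neil identifies from the strace below.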
Comment 1 Petr Mladek 2008-04-28 19:08:31 UTC
Created attachment 210952 [details]
Testcase.

You might try the following steps:

gcc test-lock.c
echo hello >~/test.txt
./a.out ~/test.txt
Comment 2 Petr Mladek 2008-04-28 19:10:35 UTC
Created attachment 210953 [details]
strace from 10.3

The locking did not work.
Comment 3 Petr Mladek 2008-04-28 19:13:40 UTC
Created attachment 210954 [details]
strace from SLED10-SP1

I actually just booted SLED10-SP1 on the same machine. Then I chrooted into the 11.0 system and started exactly the same binary on exactly the same file from exactly the same nfs server.
Comment 4 Petr Mladek 2008-04-28 19:21:02 UTC
Created attachment 210955 [details]
strace from 11.0

Urgh, please ignore the strace from 10.3. It was actually the strace from 11.0; I just put the wrong version in the file name and comment :-(

Locking works on 10.3 the same way as on SLED10-SP1. Only 11.0 does not work.
Comment 5 Neil Brown 2008-04-29 00:44:26 UTC
From the trace, the problem is that on OpenSUSE-11 you are getting the
error 'ENOLCK' when trying to get a lock.

If the server is known to work correctly (as seems to be the case), this
suggests that 'statd' isn't running on your OpenSUSE-11 client.
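The errno values involved can be told apart as in the sketch below. Per POSIX, if another process holds a conflicting lock, F_SETLK fails with EACCES or EAGAIN. ENOLCK means that no lock could be obtained at all. The Linux fcntl(2) man page lists a failed remote locking protocol (for example, locking over NFS) as one cause of ENOLCK.

The helper lock_error_hint is hypothetical. The statd hint in it reflects the diagnosis in this bug rather than a general rule.

```c
#include <errno.h>

/* Hypothetical helper: turn the errno from a failed F_SETLK into a hint. */
const char *lock_error_hint(int err)
{
    switch (err) {
    case 0:
        return "lock acquired";
    case EACCES:
    case EAGAIN:
        /* POSIX allows either value when a conflicting lock is held. */
        return "file is already locked by another process";
    case ENOLCK:
        /* On an NFS client this can mean the remote lock request failed,
         * e.g. because rpc.statd is not running (as in this bug). */
        return "no locks available (on NFS: check that statd is running)";
    default:
        return "unexpected error";
    }
}
```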

Please check if statd is running:

  ps axgu | grep statd
  rpcinfo -p
  ls -lR /var/lib/nfs
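On Linux, the first check (ps | grep statd) can also be done programmatically by scanning /proc. This sketch assumes a /proc filesystem that provides a per-process /proc/<pid>/comm name file. Later Linux kernels provide that file, but kernels of this report's era may not.

The helper process_running is hypothetical. comm holds at most 15 characters, which is enough for "rpc.statd".

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Rough equivalent of "ps axgu | grep statd": scan /proc/<pid>/comm for an
 * exact process name. Returns 1 if found, 0 if not, -1 if /proc is unreadable. */
int process_running(const char *name)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    int found = 0;
    struct dirent *de;
    while (!found && (de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                   /* not a PID directory */

        char path[300];
        snprintf(path, sizeof path, "/proc/%s/comm", de->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;                   /* process may have exited meanwhile */

        char comm[64] = "";
        if (fgets(comm, sizeof comm, f) != NULL) {
            comm[strcspn(comm, "\n")] = '\0';   /* drop trailing newline */
            found = (strcmp(comm, name) == 0);
        }
        fclose(f);
    }
    closedir(proc);
    return found;
}
```

Calling process_running("rpc.statd") on the affected client would be expected to return 0, matching the empty grep output in comment 6.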

Thanks.
Comment 6 Petr Mladek 2008-04-29 09:46:52 UTC
Good catch, rpc.statd really was not running on 11.0:

root@golem:/> ps axgu | grep statd
root      3273  0.0  0.0   2288   792 pts/2    S+   11:39   0:00 grep statd

root@golem:/> rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100021    1   tcp  50936  nlockmgr
    100021    3   tcp  50936  nlockmgr
    100021    4   tcp  50936  nlockmgr


I wonder if it might be somewhat related to the new installation image magic.
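The rpcinfo listing is the clearest symptom. rpc.statd registers with the portmapper as RPC program 100024 ("status"), and that program is absent above, while 100021 (nlockmgr) is present.

Below is a sketch of checking a captured listing for a given program number. The helper rpcinfo_has_program is hypothetical. It assumes the "rpcinfo -p" layout shown, with the program number first on each data line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: report whether an RPC program number appears in
 * text captured from "rpcinfo -p". Returns 1 if present, 0 otherwise. */
int rpcinfo_has_program(const char *listing, long program)
{
    const char *line = listing;
    while (line != NULL && *line != '\0') {
        long prog;
        /* The header line ("program vers proto port") does not start with
         * a number, so sscanf fails on it and it is skipped. */
        if (sscanf(line, "%ld", &prog) == 1 && prog == program)
            return 1;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;             /* advance to the start of the next line */
    }
    return 0;
}
```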
Comment 7 Petr Mladek 2008-04-29 09:48:41 UTC
Created attachment 211089 [details]
output from ls -lR /var/lib/nfs
Comment 8 Petr Mladek 2008-04-29 09:53:51 UTC
OOo and the locking works correctly after I started "rpc.statd --no-notify" by hand.
Comment 9 Greg Kroah-Hartman 2008-04-29 15:48:27 UTC
Ok, closing out, this is not a kernel bug...
Comment 10 Petr Mladek 2008-04-29 16:22:36 UTC
I agree that it is not a kernel bug, but we still need to find out why rpc.statd was not running => REOPENING

There is a note in /etc/init.d/nfs that statd should get started by mount.nfs when needed. Also, the rpcinfo output looks suspicious. I am not an expert in this area, so I am not sure what to check, ...
Comment 11 Petr Mladek 2008-04-29 16:23:00 UTC
really REOPEN
Comment 12 Greg Kroah-Hartman 2008-04-29 16:36:58 UTC
reassigning to a different group then, as this isn't a kernel issue...
Comment 13 Petr Mladek 2008-04-29 18:32:41 UTC
Added Neil to CC because he maintains nfs-client. /etc/init.d/nfs and mount.nfs are part of this package, ...
Comment 14 Neil Brown 2008-05-02 03:32:59 UTC
Statd should be started when you first mount an NFS filesystem.
The mount.nfs program will run 
  /usr/sbin/start-statd

Could you please check that this script is installed and executable?

If you kill statd, then run

   /usr/sbin/start-statd

does statd start?
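The "installed and executable" check can be expressed as a small C helper. is_executable is a hypothetical name. access(2) checks permissions for the caller's real user ID, which matches running start-statd by hand as root.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Does the path (e.g. /usr/sbin/start-statd) exist as a regular file that
 * the caller may execute? Returns 1 if so, 0 otherwise. */
int is_executable(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;                   /* missing or inaccessible path */
    if (!S_ISREG(st.st_mode))
        return 0;                   /* e.g. a directory is not a script */
    return access(path, X_OK) == 0; /* execute permission for the real UID */
}
```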

Thanks,

Comment 15 Petr Mladek 2008-05-02 16:12:05 UTC
Everything seems to be fine. /usr/sbin/start-statd is on the system and is executable. If I kill statd and run start-statd, statd is started again.
Comment 16 Neil Brown 2008-05-04 11:55:42 UTC
So maybe mount isn't running start-statd like it should...

Can you kill statd, unmount the NFS filesystem, then mount it again and
see if statd gets started?

Can you tell me more about the NFS filesystem that is causing problems?
Is it automounted, or mounted by /etc/fstab, or mounted by hand?
What are the mount options?
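The mount-option questions can also be answered from the mount table instead of reading /proc/mounts by eye. The sketch below uses glibc's setmntent/getmntent to print every NFS entry of a mount table together with its options.

list_nfs_mounts is a hypothetical helper. Matching only the types nfs and nfs4 is an assumption about which entries are of interest.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Print each NFS entry of a mount table (e.g. /proc/mounts or /etc/fstab)
 * with its options. Returns the number of NFS entries, or -1 on error. */
int list_nfs_mounts(const char *table)
{
    FILE *f = setmntent(table, "r");
    if (f == NULL)
        return -1;

    int count = 0;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        /* exact match, so the unrelated "nfsd" filesystem is skipped */
        if (strcmp(m->mnt_type, "nfs") == 0 || strcmp(m->mnt_type, "nfs4") == 0) {
            printf("%s on %s type %s (%s)\n",
                   m->mnt_fsname, m->mnt_dir, m->mnt_type, m->mnt_opts);
            count++;
        }
    }
    endmntent(f);
    return count;
}
```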
Comment 17 Petr Mladek 2008-05-05 15:36:10 UTC
I tried to mount it via YaST and it did not start statd.
I tried it by hand, "mount -t nfs nfs.suse.cz:/home /home", and it did not start statd either.

I did not use any special mount options.
Comment 19 Neil Brown 2008-05-06 06:07:03 UTC
OK, I've figured it out.

There are two quite separate branches of code in mount.nfs.  One performs
the mount using the 'old style' binary data structure to pass options to
the kernel.  The other uses the 'new style' text string to pass options
to the kernel.

The code for checking and starting statd was only in the 'old style'
branch.
I have committed a patch to STABLE which moves that code into common code.

I'll attach the patch for completeness.  It has been sent upstream.
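The structure of the fix can be illustrated with a toy model. Every name below is invented and does not reflect the real mount.nfs sources; the attached patch shows those. The nolock test is an assumption, based on statd only being needed when NLM locking is in use.

```c
#include <string.h>

/* Hypothetical sketch of the fix in comment 19; NOT the nfs-utils code. */

static int statd_start_attempts = 0;   /* counts calls, for demonstration */

/* Stand-in for running /usr/sbin/start-statd. */
static void start_statd(void)
{
    statd_start_attempts++;
}

/* Assumption: statd is needed unless "nolock" is given.
 * A real option parser would split the string on commas. */
static int needs_statd(const char *opts)
{
    return strstr(opts, "nolock") == NULL;
}

/* Stand-ins for the two branches: binary data vs. text option string. */
static int mount_old_style(const char *opts) { (void)opts; return 0; }
static int mount_new_style(const char *opts) { (void)opts; return 0; }

/* Before the fix, the statd check lived only in the old-style branch, so
 * text-option mounts never started statd. Here it runs in common code
 * ahead of the branch, so both paths get it. */
int do_nfs_mount(const char *opts, int use_text_options)
{
    if (needs_statd(opts))
        start_statd();
    return use_text_options ? mount_new_style(opts)
                            : mount_old_style(opts);
}
```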
Comment 20 Neil Brown 2008-05-06 06:07:51 UTC
Created attachment 212522 [details]
Patch to start statd properly
Comment 21 Jan Holesovsky 2008-05-12 14:54:54 UTC
*** Bug 385289 has been marked as a duplicate of this bug. ***
Comment 22 Petr Mladek 2008-05-16 09:59:29 UTC
It works for me on 11.0-beta2 => FIXED
Comment 23 Jan Holesovsky 2008-07-01 08:03:05 UTC
*** Bug 221193 has been marked as a duplicate of this bug. ***