Bug 153271 - coreutils: cp -{p,a,--preserve=timestamps} to NFS destination fails to preserve timestamps
Status: RESOLVED DUPLICATE of bug 149807
Alias: None
Product: SUSE Linux 10.1
Classification: openSUSE
Component: Kernel
Version: Beta 4
Hardware: Other SuSE Linux 10.1
: P5 - None : Critical (vote)
Target Milestone: ---
Assignee: Olaf Kirch
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-24 01:07 UTC by Bernhard Kaindl
Modified: 2006-02-27 12:59 UTC
0 users

See Also:
Found By: Development
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Proposed patch (891 bytes, patch)
2006-02-27 12:52 UTC, Olaf Kirch

Description Bernhard Kaindl 2006-02-24 01:07:54 UTC
System: 10.1 Beta4 Kernel, x86_64,i386, glibc and coreutils of Beta4.

This bug was initially reported in the Debian bug tracking system
against coreutils 5.93, which is the version to which we updated
for 10.1:

http://bugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=340236

cp -a and cp --preserve=timestamps are also affected.

It can also be reproduced here with "cp -av /etc/passwd ~/tmp.$$" if your
home is on an NFS server (reproduced with wotan and hilbert). For example,
when submitting packages, the timestamps of the files are not copied
(using cpio to copy the files to hilbert, however, works correctly)!

After some debugging, first findings were reported to bug-coreutils:

http://lists.gnu.org/archive/html/bug-coreutils/2005-12/msg00196.html

An interesting thread started, but it did not come to a solution; instead,
a kernel or glibc problem was finally assumed, but they didn't dig as deep
as I dug.

This finding was correct, but was not honored in the thread:

> 1. cp executes utimes before closing the file.  This might be a problem,
> although it seems to work on local destinations.

I also initially figured that NFS should behave the same but I'm
filing this bug to:

a) document the bug here in case somebody is looking here and
b) find out if:

* this is just something which luckily works mostly but is not
  required by POSIX (or other standards which may apply)

                             - or -

* if close() after calling utime is perfectly legal and linux-NFS
  (at least as of kernel 2.6) is broken in this regard.

This is the situation:

cp versions prior to 5.92 closed the destination file before
triggering the utime system call to set the modification time
of the destination file, while 5.92 to 5.94 and the current CVS
close the file _*_after_*_ setting the modification time
by issuing utime(), utimes() or futimesat().

Here is a sample of how the newer versions do it (from cp -p):

> open("~", O_WRONLY|O_TRUNC|O_LARGEFILE) = 4
> fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> write(4, "foo\n", 4)                    = 4
> utimes("/proc/self/fd/4", {1134767841, 0}) = 0
> chmod("baz", 0100644)                   = 0
> close(4)                                = 0

But this minimized code is too simple, usually more than 4 bytes
are written - anyway, this is one possible testcase.
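
A minimal C sketch of that call order, for comparing a local path with an
NFS path. The function name write_then_set_mtime and its sync_first flag
are made up for illustration. The strace shows utimes() on /proc/self/fd/4;
this sketch substitutes the POSIX two-argument futimens() on the descriptor,
which is an assumption and not what cp 5.93 itself calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical test helper: write, set times, chmod, then close, in the
   order seen in the strace. If sync_first is nonzero, fsync() runs before
   the times are set. Returns the mtime seen after close(), or (time_t)-1
   on any error. */
time_t write_then_set_mtime(const char *path, time_t wanted, int sync_first)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return (time_t)-1;

    int ok = write(fd, "foo\n", 4) == 4;
    if (ok && sync_first)
        ok = fsync(fd) == 0;          /* flush data before touching times */

    struct timespec ts[2] = { { wanted, 0 }, { wanted, 0 } };
    ok = ok && futimens(fd, ts) == 0; /* times are set while fd is open */
    ok = ok && chmod(path, 0644) == 0;
    ok = (close(fd) == 0) && ok;      /* close() comes last, as in 5.92+ */

    struct stat st;
    if (!ok || stat(path, &st) != 0)
        return (time_t)-1;
    return st.st_mtime;
}
```

On a local file system the returned value should equal wanted with or
without sync_first. On an affected NFS mount, the report above suggests the
sync_first = 0 case may come back with the current time instead.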

> 2. Timestamps of empty files are preserved, which reinforces the above
> suspicion (no writes for empty file)

Which is an interesting finding.

> 3. touch -r works correctly, and it also uses /proc/self/fd to refer to
> the file, so that should be okay.

Indeed, newer code uses /proc/self/fd to set the utime, but this is
perfectly OK and also works with NFS, it's not the problem, I tested.

What I found:

When I debug cp and set breakpoints on the utime function and the
chmod/fchmod functions which are called (see the sample strace
output above), and wait a bit before I hit "c" to continue, thereby
delaying the utime call, the correct timestamp shows up on the
server after letting utime run. However, after continuing from the
breakpoint before the chmod/fchmod and letting the chmod and close
run, it was set to the current time. It almost looked like a race.

But:

close(dest_desc) before calling utimes fixes the problem for me,
but that in itself is only a test, since fchmod and fchown are
called afterwards and they would need to, or preferably should,
operate on the file descriptor.

calling fsync(dest_desc) before running utimes also appears to
fix it for me.

So I assume the race is fixed by simply ensuring that all data
gets written to the remote NFS server before utime/utimes is
called, so that on the close() after the utime() call
no further data is passed to the NFS server's filesystem layer,
which would of course update the modification time again
whenever it hits the filesystem layer.

NFS should be synchronous, but AFAIK some servers can cheat
or may have races.

Simply adding fsync() would fix it, but is it safe to assume that the
modification time will not be updated on the final close()
even if fsync has already been called?

Otherwise close() should be added, and the fchmod afterwards
would need to be converted to chmod (or the file reopened),
but that would of course reopen the path.

A possible optimisation could be to change the order to call
utime last after all other operations (chmod/chown/setting acls)
which can work on the open file even if there could be dirty
buffers or data packets still on the way and at the end call
fsync() or close the file and then trigger the utime system call.
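
The reordering described above could look roughly like this. It is only a
sketch: finish_copy is a hypothetical helper, not a function in coreutils,
and it uses the POSIX two-argument futimens() rather than the
three-argument wrapper that copy.c calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: apply the mode first, flush pending data, set the
   times last, then close. The descriptor is always closed.
   Returns 0 on success, -1 if any step failed. */
int finish_copy(int fd, mode_t mode, const struct timespec times[2])
{
    assert(times != NULL);

    int rc = 0;
    if (fchmod(fd, mode) != 0)                 /* attributes via the fd */
        rc = -1;
    if (rc == 0 && fsync(fd) != 0)             /* no dirty data left behind */
        rc = -1;
    if (rc == 0 && futimens(fd, times) != 0)   /* timestamps go last */
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

With this order, fchmod can still work on the open descriptor, and nothing
is written after the times are set. Whether close() after fsync() can still
touch the modification time on NFS is the open question raised above.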

Minimum fix:

--- src/copy.c  2006/02/24 00:52:42     1.1
+++ src/copy.c  2006/02/24 00:53:38
@@ -420,6 +420,8 @@
       timespec[0] = get_stat_atime (src_sb);
       timespec[1] = get_stat_mtime (src_sb);

+      fsync(dest_desc); /* needed for NFS with Linux-2.6 */
+
       if (futimens (dest_desc, dst_name, timespec) != 0)
        {
          error (0, errno, _("preserving times for %s"), quote (dst_name));

Since timestamps are also data and loss of data belongs to severity critical,
I'm reporting it with severity critical. Your mileage on what constitutes data
may vary, but IMHO it would still constitute at least major.

I guess agruen could help on the NFS side, but with likely at least the
full prior Linux-2.6 series exhibiting this, I think something like this
will have to be done for cp even if it could maybe be demanded as part of
sync NFS operation, but...

Finally somebody should inform bug-coreutils with the updated info.
Comment 1 Andreas Schwab 2006-02-24 09:25:40 UTC

*** This bug has been marked as a duplicate of 149807 ***
Comment 3 Bernhard Kaindl 2006-02-24 16:46:26 UTC
I made a report to the kernel bugzilla since coreutils 5.93 worked on
2.4.18 with a Solaris NFS server:
http://lists.gnu.org/archive/html/bug-coreutils/2005-12/msg00219.html

It has been assigned to Trond so we'll hear what he has to say.
http://bugzilla.kernel.org/show_bug.cgi?id=6127
Comment 4 Andreas Schwab 2006-02-25 09:43:28 UTC
It's a clear NFS bug.  SLES8 works fine.
Comment 5 Olaf Kirch 2006-02-27 12:52:17 UTC
Created attachment 70393 [details]
Proposed patch

Please try this patch and let me know if it makes
the problem go away.
Comment 6 Olaf Kirch 2006-02-27 12:56:37 UTC
Bug #149807 has this patch, which does the same thing.

http://client.linux-nfs.org/Linux-2.6.x/2.6.16-rc4/linux-2.6.16-08-fix_setattr_clobber.dif

I'll add that patch to our kernel now.
Comment 7 Olaf Kirch 2006-02-27 12:59:14 UTC

*** This bug has been marked as a duplicate of 149807 ***