Bug 133486 - drpmsync does delete the whole tree in error case
Summary: drpmsync does delete the whole tree in error case
Status: VERIFIED NORESPONSE
Alias: None
Product: openSUSE.org
Classification: openSUSE
Component: BuildService
Version: unspecified
Hardware: All SUSE Other
Importance: P5 - None, Major
Target Milestone: ---
Assignee: Michael Schröder
QA Contact: Adrian Schröter
Reported: 2005-11-11 15:53 UTC by Forgotten User OS1JNCFbCX
Modified: 2011-01-10 08:45 UTC
2 users

Found By: Beta-Customer
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
alternative drpmsync client (2.54 KB, text/plain)
2005-11-29 20:40 UTC, Forgotten User OS1JNCFbCX

Description Forgotten User OS1JNCFbCX 2005-11-11 15:53:21 UTC
drpmsync behaves fatally when the server sends a permission-denied message. Instead of bailing out with an error message, it deletes _all_ files in the local tree, because it believes that the files no longer exist on the server.
Comment 1 Michael Schröder 2005-11-11 17:17:23 UTC
I guess I should make the opendir error in findfiles fatal?
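A minimal sketch of what making the opendir error fatal could look like, in Perl like drpmsync itself. `findfiles_strict` is a hypothetical stand-in, not drpmsync's real findfiles: it walks a tree and dies as soon as a directory cannot be opened, so an unreadable directory produces an error instead of an empty file list that a client would then "mirror" by deleting everything. Skipping symlinks is an added assumption to avoid directory cycles.

```perl
use strict;
use warnings;

# Hypothetical stand-in for drpmsync's findfiles (not its real code):
# list all regular files below $root, relative to $root, and die as soon
# as any directory cannot be opened instead of treating it as empty.
sub findfiles_strict {
    my ($root) = @_;
    my @files;
    my @todo = ('');                    # directories still to scan, relative to $root
    while (@todo) {
        my $rel = shift @todo;
        my $dir = $rel eq '' ? $root : "$root/$rel";
        opendir(my $dh, $dir) or die "cannot open directory $dir: $!\n";
        my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
        closedir($dh);
        for my $entry (sort @entries) {
            my $path = $rel eq '' ? $entry : "$rel/$entry";
            next if -l "$root/$path";   # do not follow symlinks (avoids cycles)
            if (-d _)    { push @todo, $path }
            elsif (-f _) { push @files, $path }
        }
    }
    return @files;
}
```

Because the error is raised before any listing is returned, the caller never sees a partial tree and has nothing to compare against, which is exactly the property missing in the behaviour reported above.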
Comment 2 Forgotten User OS1JNCFbCX 2005-11-11 17:50:49 UTC
If the client actually bails out when the server dies, that should solve the problem.

The current situation is actually the worst thing that could happen. Today, for the second time, I lost 24GB in a few seconds because of this problem. Fortunately I have backups.

Unfortunately, the public drpmsync service is currently completely useless anyway, because drpmsync.opensuse.org is way too slow to serve a current repository, even when sending only deltas. As it seems that nobody with a public server is willing to run a drpmsync server to balance the load, the protocol definitely _must_ be changed to serve only references to the packages and deltas on standard ftp/http mirrors (as mentioned in my other bug on that topic) instead of the files themselves. Otherwise the whole drpmsync project will die of uselessness.
Comment 3 Michael Schröder 2005-11-11 18:01:09 UTC
Yes, the client aborts if the server dies (the server sends an ERR package).
Sorry about your 24GB, but why was there an unreadable directory on the server?
I'll have to ask Adrian if this was a problem on our side.

And yes, drpmsync.suse.de is much too slow. But I don't see how the protocol can be changed: the deltas must be transported to the ftp/http mirrors, and drpmsync itself is the best way to accomplish this.
Comment 4 Forgotten User OS1JNCFbCX 2005-11-11 18:20:21 UTC
Adrian (or someone else) changed the name of the directory and obviously did not change the policy file.

But if you change the protocol so that the actual _data_ is transferred via ftp/http, you don't need any more bandwidth. You just use a well-established protocol to _do_ the transport of the very same data. The drpmsync server will then only stream the control data that tells the client where to fetch the patches.

Consider the following example:

Client asks the server to fetch inst-source/suse/i586/xyz-42-1.i586.rpm with have=123456789abc...

Server says:
- apply delta drpmsync/delta/xyz.i586/123456789abc...
- apply delta drpmsync/delta/xyz.i586/098def873826...
- name the result inst-source/suse/i586/xyz-42-2.i586.rpm

With the current protocol you would have transferred the same data as well, except that the two deltas would have been combined into one; but that should not really be a deal-breaker.
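To make the proposed exchange concrete, here is a sketch of how a client might turn such a reply into a list of mirror URLs to fetch plus the final file name. The line format is only the illustrative one from the example above, and `parse_reply` and the mirror base URL are hypothetical; none of this is part of the actual drpmsync protocol.

```perl
use strict;
use warnings;

# Parse the control reply sketched in this comment (hypothetical format):
#   "apply delta <path>"      - one line per delta, in application order
#   "name the result <path>"  - the final file name, exactly once, last
# Returns { urls => [mirror URLs of the deltas], target => result path }.
sub parse_reply {
    my ($mirror_base, @lines) = @_;
    my (@urls, $target);
    for my $line (@lines) {
        $line =~ s/^\s*-\s*//;              # tolerate the "- " bullets of the example
        if ($line =~ /^apply delta (\S+)$/) {
            die "delta after result name\n" if defined $target;
            push @urls, "$mirror_base/$1";  # data comes from any ftp/http mirror
        } elsif ($line =~ /^name the result (\S+)$/) {
            die "result named twice\n" if defined $target;
            $target = $1;
        } else {
            die "unexpected reply line: $line\n";
        }
    }
    die "reply names no result\n" unless defined $target;
    return { urls => \@urls, target => $target };
}
```

The client would then download each URL, apply the deltas in order to the rpm it already has, and store the result under `target`; the drpmsync server itself only ever sends these few lines.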

You see what I mean?

BTW: You could still serve the old protocol but offer the new one as an alternative.
Comment 5 Forgotten User OS1JNCFbCX 2005-11-11 18:26:15 UTC
Another BTW: If you want to talk to me about my protocol idea in detail personally you can reach me by phone on Nuremberg 30836623.
Comment 6 Michael Schröder 2005-11-11 18:37:17 UTC
Yes, I see what you mean, but I don't think this solves anything. It is much easier if the mirrors also "speak" drpmsync. Running drpmsync in cgi mode is sufficient.
I mean the data has to be transferred to the mirrors first. Or should the mirrors just mirror the deltas and no rpms?
And combining the deltas really saves a lot of bandwidth...
Comment 7 Forgotten User OS1JNCFbCX 2005-11-11 18:55:09 UTC
Sure, it is easier if they speak drpmsync, but they don't, do they?

The mirrors could use the new protocol as well, or they could mirror the traditional way, whichever they prefer. Obviously they need the full RPMs as well, in case someone does not have an old version.
Comment 8 Michael Schröder 2005-11-11 19:27:59 UTC
They don't yet, but they hopefully will some day. And it makes a lot of sense for the mirrors to get the rpms/deltas with drpmsync as well. The opensuse tree is much too big and changes too fast otherwise.
Comment 9 Forgotten User OS1JNCFbCX 2005-11-11 19:57:13 UTC
Sure it does make sense. You don't have to convince me. You have to convince the mirror admins.

BUT:

1. The proposed new protocol does not hinder them from syncing with drpmsync.

2. Until you have convinced them, you could at least provide a usable service; without this new protocol you have _nothing_ usable as long as you can't convince some of the well-connected mirrors.

Note that so far my only way to sync up with current packages has been to do something similar to what I described as the new protocol: I fetched many patches from gwdg.de. And note that syncing the _full_ packages from gwdg is currently way faster than syncing only the deltas from the one small server you use to serve ftp, rsync and drpmsync to the public.
Comment 10 Michael Schröder 2005-11-11 20:02:11 UTC
Yeah, but that just means that gwdg.de needs to provide a drpmsync service... ;-)
Comment 11 Forgotten User OS1JNCFbCX 2005-11-11 20:11:57 UTC
Exactly. The question is: do you know that Eberhard will do this in the near future, or is this just some dream world you are living in?
Comment 12 Forgotten User OS1JNCFbCX 2005-11-15 19:49:23 UTC
Adrian, can you comment on whether having some mirrors in the near future is still a dream, or whether there is a chance that the current protocol becomes usable because real mirrors start running it?

I mean, if there is no real chance of getting some, an alternative solution should be considered.
Comment 13 Michael Schröder 2005-11-16 14:41:42 UTC
Btw, I could add code so that it works with an rsync server. This is probably a much cleaner way than your suggestion. Of course it will be a bit slower than a real drpmsync server, as the delta directory contents have to be transmitted and combinedeltarpm is run on the client instead of on the server.
Comment 14 Forgotten User OS1JNCFbCX 2005-11-16 14:57:51 UTC
That would also be a better solution than the current one, if only because it works with the current infrastructure.

It is not perfectly clear to me what exactly you want to do. Transfer the whole delta-directory and then work locally? That would even transfer deltas you don't actually need. But at least it should work.

What exactly do you consider unclean about my solution? It is in fact the same protocol you have at the moment, but with keep_uncombined set and with references transferred instead of the actual data. I don't see what is unclean about this.
Comment 15 Michael Schröder 2005-11-16 15:40:08 UTC
not clean: The different delta servers may not be up to date.

Hmm, maybe I spoke too soon. The big question is whether I can get rsync to transfer only the first bytes of a file...
Comment 16 Forgotten User OS1JNCFbCX 2005-11-16 15:57:15 UTC
As far as I know, you can't do that with the rsync tool; you would need your own or a modified implementation. You could do it with http/ftp.

But what's your idea?
Comment 17 Michael Schröder 2005-11-16 16:14:29 UTC
To update an rpm, I would fetch its first bytes to get the lead md5sum and the package md5sum. Then I can list the available deltas and construct a path from my local rpm to the remote one. If such a path exists, I can fetch the deltas and apply them.
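Constructing such a path is a shortest-path search over a graph whose nodes are md5sums and whose edges are deltas. A breadth-first search sketch in Perl follows; the delta record layout and `find_delta_path` are hypothetical, not drpmsync code, and how the md5sums are obtained is left to the caller.

```perl
use strict;
use warnings;

# Breadth-first search for the shortest delta chain from the md5sum we
# have locally ($have) to the md5sum of the remote rpm ($want).
# Each delta is a hash ref { from => md5, to => md5, file => name }
# (hypothetical layout). Returns an array ref of delta file names in the
# order they must be applied, [] if we already have $want, or undef if
# no chain exists (the caller would then fetch the full rpm instead).
sub find_delta_path {
    my ($have, $want, @deltas) = @_;
    return [] if $have eq $want;
    my %edges;
    push @{ $edges{ $_->{from} } }, $_ for @deltas;
    my %prev  = ($have => undef);   # md5 -> delta that reached it; doubles as "visited"
    my @queue = ($have);
    while (@queue) {
        my $cur = shift @queue;
        for my $d (@{ $edges{$cur} || [] }) {
            next if exists $prev{ $d->{to} };
            $prev{ $d->{to} } = $d;
            if ($d->{to} eq $want) {
                # Walk back to $have, collecting delta names in forward order.
                my @path;
                for (my $e = $d; defined $e; $e = $prev{ $e->{from} }) {
                    unshift @path, $e->{file};
                }
                return \@path;
            }
            push @queue, $d->{to};
        }
    }
    return undef;
}
```

Breadth-first order means the chain with the fewest deltas wins, and the visited set also keeps the search from cycling if the deltas ever form a loop.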
Comment 18 Forgotten User OS1JNCFbCX 2005-11-16 18:28:23 UTC
Just in case you are interested: to update to the most recent version, I currently use an ugly hack that exploits the fact that the deltas typically form a straight chain. I apply all deltas that match an RPM I have, which yields every version produced after it (including the latest). Obviously this does not get me new packages or non-RPM files. But once that step is done, there is not much left, and rsync can fix up my repository.

My hack is derived from drpmsync: it uses the functions rpminfo_f and rpminfo from there, changes the return statement in rpminfo_f to "return($sigmd5, $rpmmd5, $buildtime, $name, $arch);" and then adds the following code:

# rpminfo() and rpminfo_f() are taken from drpmsync, with rpminfo_f's
# return statement changed as described above.
my $base = 'http://ftp4.gwdg.de/pub/opensuse/distribution/SL-OSS-factory/drpmsync/deltas';
opendir(my $dh, '.') or die "opendir: $!\n";
my @files = grep { /\.rpm$/ } readdir($dh);
closedir($dh);
for my $start (@files) {
    print "$start\n";
    my $file = $start;
    # Keep applying matching deltas until no newer delta is found.
    while (1) {
        my ($sighash, $payloadhash, $buildtime, $name, $arch) = rpminfo($file);
        print "$sighash $payloadhash $buildtime $name $arch\n";
        # The sed pattern assumes delta file names start with the payload hash
        # of their base rpm; take only the first match from the mirror listing.
        my ($filename) = split /\n/, `w3m -dump $base/$name.$arch/ | sed -ne 's/^\\[   \\] \\(${payloadhash}[0-9a-f]*\\) .*\$/\\1/p'`;
        last unless defined $filename && length $filename;
        system('wget', "$base/$name.$arch/$filename") == 0 or last;
        my $ok = system('applydeltarpm', '-r', $file, $filename, '.rpm.tmp') == 0;
        unlink $filename;
        last unless $ok;    # stop instead of looping on a failed apply
        my $newname = `rpm -qp --qf '$name-%{version}-%{release}.$arch.rpm' .rpm.tmp`;
        rename('.rpm.tmp', $newname) or last;
        $file = $newname;
    }
}

This truly is not nice, but it does enable successful transfers.
Comment 19 Forgotten User OS1JNCFbCX 2005-11-29 20:40:41 UTC
Created attachment 59033
alternative drpmsync client

I have now implemented a proof-of-concept drpmsync client that does not suffer from the overloaded server when rsync mirrors are available.

The client currently does the following:
1. Fetch contents file from drpmsync server.
2. Order the contents file by file date, because if the rsync mirror lags behind, it is more likely to have at least the older files. Since the client fetches information from the rsync server only on demand, it is also likely that the rsync server has caught up by the time the client gets to the newer files.
3. For each file:
   a) Check whether the file is current.
   b) If not, update the delta cache for this package.
   c) Check whether there is a patch chain to build the new package.
   d) Apply the patches if possible, or fetch the full package from the rsync server.
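Steps 2 and 3 could be sketched roughly as below. The fields of a contents entry and the `$chain_for` callback (which would refresh the delta cache for the package and look for a chain) are assumptions for illustration, not the attached client's actual code; the sketch only produces a plan and leaves the fetching and patching to the caller.

```perl
use strict;
use warnings;

# Sketch of steps 2 and 3. Each contents entry is assumed to look like
#   { name => 'suse/i586/xyz-42-2.i586.rpm', mtime => <epoch>, md5 => <hex> }
# $local maps file names to the md5 of the local copy (if any).
# $chain_for is a caller-supplied code ref that would refresh the delta
# cache for the entry's package and return a delta chain (array ref) or undef.
sub plan_updates {
    my ($contents, $local, $chain_for) = @_;
    my @plan;
    # Step 2: oldest first, as a lagging rsync mirror most likely has those.
    for my $e (sort { $a->{mtime} <=> $b->{mtime} } @$contents) {
        my $have = $local->{ $e->{name} };
        if (defined $have && $have eq $e->{md5}) {          # 3a: already current
            push @plan, [ $e->{name}, 'current' ];
            next;
        }
        my $chain = $chain_for->($e);                        # 3b + 3c
        push @plan, $chain ? [ $e->{name}, 'delta', @$chain ]   # 3d: apply patches
                           : [ $e->{name}, 'full' ];            # 3d: fetch full rpm
    }
    return @plan;
}
```

Keeping the chain lookup behind a callback lets the same loop work whether the deltas are listed from an rsync mirror, an http directory index, or a real drpmsync server.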

After running this client, some minor glitches may remain if the rsync and the drpmsync server were extremely out of sync. You can fix these glitches by running a normal rsync or drpmsync client afterwards. This should be pretty fast, as most files should already be up to date.

Currently the client does _not_:
- delete obsolete files, because this is left to a subsequent rsync/drpmsync run
- do proper error checking in every case, although there should be no error that accidentally destroys the repository
- sync non-rpm files (that code is just commented out because it is not tested yet; comment it back in if you like)
- check for a loop in the patch chain
- have a security audit
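The missing loop check can be as simple as remembering every md5sum already reached while following the chain. A sketch, assuming a hypothetical `%$next` layout that maps an md5sum to its single outgoing delta (the "straight" chains described in comment 18); `follow_chain` is illustrative, not code from the attached client.

```perl
use strict;
use warnings;

# Follow a straight delta chain starting at $start. %$next maps an md5 to
# [ $delta_file, $target_md5 ] (hypothetical layout). A %seen set makes
# the walk die if the chain returns to an md5 it already visited, instead
# of applying deltas forever.
sub follow_chain {
    my ($start, $next) = @_;
    my %seen = ($start => 1);
    my @files;
    my $cur = $start;
    while (my $step = $next->{$cur}) {
        my ($file, $to) = @$step;
        die "delta loop detected at $to\n" if $seen{$to}++;
        push @files, $file;
        $cur = $to;
    }
    return \@files;
}
```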

Feel free to give comments.

Feel free to use any of this code under the conditions of the GPL if you like.
Comment 20 Michael Schröder 2005-11-30 08:44:16 UTC
Thanks for your work! Small problem: drpmsync is BSD 3-clause, I won't put any GPL code in it!
Comment 21 Forgotten User OS1JNCFbCX 2005-11-30 08:47:48 UTC
Ok, didn't think about that. You can also use it under the same license drpmsync is. I don't really care that much about the license of that small script.
Comment 22 Michael Schröder 2005-11-30 09:03:15 UTC
Thanks, I'll have a look at it.
Comment 23 Forgotten User OS1JNCFbCX 2005-12-03 17:12:39 UTC
I have now made my client multi-threaded, so that it can download the next patch while the previous one is still being processed. This uses system resources more effectively. You can find the current version at http://pi3.informatik.uni-mannheim.de/~schiele/mydrpmsync/.
Comment 24 Michael Schröder 2005-12-04 16:54:30 UTC
Yes, that's also on my TODO list...
Comment 25 Christoph Thiel 2006-04-23 10:57:16 UTC
Adrian, what's the status on having mirrors out there that offer drpmsync service? 
Comment 27 Christoph Thiel 2006-04-23 11:02:40 UTC
Actually, this one is not really SL-related any more -> moving to openSUSE.org.
Comment 28 Christian Boltz 2006-05-08 20:00:17 UTC
(In reply to comment #17)
> To update an rpm, I would get the first bytes, to get the lead md5sum and the
> package md5sum.

Why do you want to do such a strange thing?

Simply create an MD5SUM file in the directory; it will be small (a few hundred kB at most) and avoids the ugly method you thought of ;-)
Comment 29 Michael Schröder 2006-05-08 20:12:41 UTC
1) the MD5SUM file can get out of sync, so using the real files is preferable.
2) the md5sum of the complete rpm changes when the rpm is (re)signed.
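Point 2 is why the md5sum worth comparing sits inside the rpm rather than being computed over the whole file. A sketch of reading it from an rpm's first bytes: in the rpm file format, the 96-byte lead is followed by the signature header, whose RPMSIGTAG_MD5 entry (tag 1004, 16 bytes) holds an MD5 over header and payload only, so re-signing leaves it unchanged. `sig_md5_from_bytes` is a hypothetical helper, not drpmsync's rpminfo, and it ignores every other tag.

```perl
use strict;
use warnings;

use constant RPMSIGTAG_MD5 => 1004;

# Return the hex MD5 stored in the signature header of an rpm, given the
# rpm's first bytes (lead + signature header), or undef if the tag is
# absent. Dies on input that does not look like an rpm.
sub sig_md5_from_bytes {
    my ($buf) = @_;
    die "short read\n" if length($buf) < 96 + 16;
    die "not an rpm (bad lead magic)\n"
        unless substr($buf, 0, 4) eq "\xed\xab\xee\xdb";
    my $sig = substr($buf, 96);                      # the lead is 96 bytes
    die "bad signature header magic\n"
        unless substr($sig, 0, 3) eq "\x8e\xad\xe8";
    # Header layout: magic(3) version(1) reserved(4) nindex(4) hsize(4),
    # then nindex 16-byte index entries, then the hsize-byte data store.
    my ($nindex, $hsize) = unpack('N N', substr($sig, 8, 8));
    my $store = 16 + 16 * $nindex;                   # offset of the data store
    die "short read\n" if length($sig) < $store + $hsize;
    for my $i (0 .. $nindex - 1) {
        my ($tag, $type, $off, $count) =
            unpack('N N N N', substr($sig, 16 + 16 * $i, 16));
        next unless $tag == RPMSIGTAG_MD5 && $count == 16;
        return unpack('H32', substr($sig, $store + $off, 16));
    }
    return undef;
}
```

Only a few hundred bytes are needed in practice, which is why fetching just the start of each remote rpm (as in comment 17) would be enough to identify it.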
Comment 30 Adrian Schröter 2006-06-08 07:44:13 UTC
We do not have any further drpmsync mirrors yet.
Comment 31 Sascha Peilicke 2011-01-10 08:45:49 UTC
Really old and dead.