Bugzilla – Full Text Bug Listing
| Summary: | drpmsync does delete the whole tree in error case | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE.org | Reporter: | Forgotten User OS1JNCFbCX <forgotten_OS1JNCFbCX> |
| Component: | BuildService | Assignee: | Michael Schröder <mls> |
| Status: | VERIFIED NORESPONSE | QA Contact: | Adrian Schröter <adrian.schroeter> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | adrian.schroeter, suse-beta |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | SUSE Other | ||
| Whiteboard: | |||
| Found By: | Beta-Customer | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | alternative drpmsync client | ||
Description
Forgotten User OS1JNCFbCX
2005-11-11 15:53:21 UTC
I guess I should make the opendir error in findfiles fatal?
If the client actually bails out when the server dies, that should solve the problem. The current situation is actually the worst thing that could happen: today I lost 24GB in a few seconds, for the second time, because of this problem. Fortunately I have backups. Currently the public drpmsync service is unfortunately useless anyway, because drpmsync.opensuse.org is way too slow to serve a current repository, even when sending only deltas. As nobody with a public server seems willing to run a drpmsync server to balance the load, the protocol definitely _must_ be changed to serve only references to the packages and deltas on standard ftp/http mirrors (as mentioned in my other bug on that topic) instead of the files themselves. Otherwise the whole drpmsync project will die of uselessness.
Yes, the client aborts if the server dies (the server sends an ERR package). Sorry about your 24GB, but why was there an unreadable directory on the server? I'll have to ask Adrian whether this was a problem on our side. And yes, drpmsync.suse.de is much too slow. But I don't see how the protocol can be changed: the deltas must be transported to the ftp/http mirrors, and drpmsync itself is the best way to accomplish this.
Adrian (or someone else) changed the name of the directory and obviously did not change the policy file.
But if you change the protocol to stream the actual _data_ via ftp/http, you don't need any more bandwidth. You just use a well-established protocol to _do_ the transport of the very same data. The drpmsync server would then only stream the control data that tells the client where to fetch the patches. Consider the following example:
Client asks the server to fetch inst-source/suse/i586/xyz-42-1.i586.rpm with have=123456789abc...
Server says:
- apply delta drpmsync/delta/xyz.i586/123456789abc...
- apply delta drpmsync/delta/xyz.i586/098def873826...
- name the result inst-source/suse/i586/xyz-42-2.i586.rpm
You would have transferred the same data with the current protocol, although there the two deltas would have been transferred combined; but this should not really be a deal-breaker. You see what I mean? BTW: you could still serve the old protocol and offer the new one as an alternative. Another BTW: if you want to discuss my protocol idea in detail personally, you can reach me by phone at Nuremberg 30836623.
Yes, I see what you mean, but I don't think this solves anything. It is much easier if the mirrors also "speak" drpmsync; running drpmsync in CGI mode is sufficient. I mean, the data has to be transferred to the mirrors first. Or should the mirrors mirror just the deltas and no rpms? And combining the deltas really saves a lot of bandwidth...
Sure, it is easier if they speak drpmsync, but they don't, do they? The mirrors could use the new protocol as well, or they could mirror the traditional way, whatever they prefer. Obviously they need the full RPMs as well, for the case where someone does not have an old version.
They don't yet, but hopefully they will some day. And it makes a lot of sense for the mirrors to get the rpms/deltas with drpmsync as well; otherwise the opensuse tree is much too big and changes too fast.
Sure it makes sense. You don't have to convince me; you have to convince the mirror admins. BUT:
1. The proposed new protocol does not prevent them from syncing with drpmsync.
2. Until you have convinced them, you could at least provide a usable service, because without this new protocol you have _nothing_ usable as long as you can't convince some of the mirrors with a good connection.
Note that so far my only way to sync up with current packages was actually to do something similar to what I described as the new protocol: I fetched many patches from gwdg.de.
And note that currently syncing the _full_ packages from gwdg is way faster than syncing only the deltas from the one small server you have for streaming ftp, rsync and drpmsync to the public.
Yeah, but that just means that gwdg.de needs to provide a drpmsync service... ;-)
Exactly. The question is: do you know that Eberhard will do this in the near future, or is this just some dream world you are living in?
Adrian, can you comment on whether having some mirrors in the near future is still a dream, or whether there is a chance that the current protocol becomes usable by getting real mirrors? If there is no real chance of getting some, an alternative solution should be considered.
Btw, I could add code so that it works with an rsync server. This is probably a much cleaner way than your suggestion. Of course it will be a bit slower than a real drpmsync server, as the delta directory contents have to be transmitted and combinedeltarpm is run on the client instead of on the server.
That would be a better solution than the current one as well, if only because it at least works with the current infrastructure. It is not perfectly clear to me what exactly you want to do. Transfer the whole delta directory and then work locally? That would even transfer deltas you don't actually need. But at least it should work. What exactly do you consider not clean about my solution? It is in fact the same protocol you have at the moment, but with keep_uncombined set and transferring references instead of the actual data. I don't see what is not clean about this.
Not clean: the different delta servers may not be up to date. Hmm, maybe I spoke too soon. The big question is whether I can get rsync to transfer only the first bytes of a file...
As far as I know you can't do that with the tool; you would need your own or a modified implementation. You could do it with http/ftp. But what's your idea?
To update an rpm, I would get the first bytes to obtain the lead md5sum and the package md5sum. Then I can list the available deltas and construct a path from my local rpm to the remote one. If such a path exists, I can get the deltas and apply them.
Just in case you are interested: what I currently do to update to the most recent version is an ugly hack that exploits the fact that the deltas typically form a straightforward chain. Starting from an RPM I have, I apply every delta that matches it, which gets me all versions produced after it (including the latest). Obviously this does not get me new packages or non-RPM files. But once I have done that step there is not much left, and rsync can fix up my repository.
My ugly hack is derived from drpmsync by using the functions rpminfo_f and rpminfo from there, changing the return statement in rpminfo_f to "return($sigmd5, $rpmmd5, $buildtime, $name, $arch);" and then adding the following code:
opendir(my $dh, '.') or die "cannot open .: $!\n";   # fail loudly instead of silently skipping
my @files = grep { /\.rpm$/ } readdir($dh);
closedir($dh);
my $deltas = 'http://ftp4.gwdg.de/pub/opensuse/distribution/SL-OSS-factory/drpmsync/deltas';
for my $start (@files) {
  print "$start\n";
  my $file = $start;
  while (1) {   # follow the delta chain; note: no protection against a cyclic chain
    my ($sighash, $payloadhash, $buildtime, $name, $arch) = rpminfo($file);   # modified drpmsync rpminfo
    print "$sighash $payloadhash $buildtime $name $arch\n";
    my $listing = `w3m -dump $deltas/$name.$arch/ | sed -ne 's/^\\[ \\] \\(${payloadhash}[0-9a-f]*\\) .*\$/\\1/p'`;
    my ($filename) = split /\n/, $listing;   # use only the first delta starting with our payload md5
    last unless $filename;                   # no matching delta: this file is as new as it gets
    system('wget', "$deltas/$name.$arch/$filename") == 0 or last;
    my $ok = system('applydeltarpm', '-r', $file, $filename, '.rpm.tmp') == 0;
    unlink $filename;
    last unless $ok;                         # do not rename a missing or broken result
    my $newname = `rpm -qp --qf '$name-%{version}-%{release}.$arch.rpm' .rpm.tmp`;
    rename('.rpm.tmp', $newname) or die "cannot rename .rpm.tmp to '$newname': $!\n";
    $file = $newname;                        # continue from the newly built package
  }
}
This truly is not nice, but it does enable successful transfers.
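Both the "first bytes" idea discussed earlier and the hack above start from an md5 that sits near the beginning of the package (the hack obtains it through drpmsync's rpminfo). The following is a minimal Python sketch of reading such a digest from a byte prefix, assuming that what the thread calls the "package md5sum" corresponds to the MD5 entry in the RPM signature header (tag 1004, RPMSIGTAG_MD5, which covers header plus payload). The function name `signature_md5` is hypothetical; this is not drpmsync's rpminfo.

```python
import struct
from typing import Optional

RPM_LEAD_SIZE = 96                    # the fixed-size lead precedes the signature header
HEADER_MAGIC = b"\x8e\xad\xe8\x01"    # header structure magic + version 1
RPMSIGTAG_MD5 = 1004                  # MD5 digest of header + payload


def signature_md5(prefix: bytes) -> Optional[str]:
    """Return the hex MD5 stored in the RPM signature header, or None if absent.

    `prefix` only needs to cover the lead and the signature header, so a
    client could fetch just the first few kilobytes of a remote package.
    """
    sig = prefix[RPM_LEAD_SIZE:]
    if len(sig) < 16 or sig[:4] != HEADER_MAGIC:
        raise ValueError("no signature header after the RPM lead")
    nindex, hsize = struct.unpack(">II", sig[8:16])  # entry count, data store size
    store_start = 16 + 16 * nindex
    if len(sig) < store_start + hsize:
        raise ValueError("prefix too short; fetch more bytes")
    store = sig[store_start:store_start + hsize]
    for i in range(nindex):
        # each index entry: tag, type, offset into the data store, count
        tag, _type, offset, count = struct.unpack(">iiii", sig[16 + 16 * i:32 + 16 * i])
        if tag == RPMSIGTAG_MD5 and count == 16:
            return store[offset:offset + 16].hex()
    return None
```

Over http, the prefix could be fetched with a Range request; raising on a short prefix (instead of guessing) lets the caller simply request more bytes and retry.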
Created attachment 59033: alternative drpmsync client
I have now implemented a proof-of-concept drpmsync client that does not suffer from an overloaded server when rsync mirrors are available.
The client currently does the following:
1. Fetch the contents file from the drpmsync server.
2. Sort the contents file by file date: if the rsync mirror lags behind, it is more likely to have at least the older files. And because the client fetches information from the rsync server only on demand, the rsync server has likely caught up by the time the client reaches the newer files.
3. For each file:
a) Check whether the file is current.
b) If not, update the delta cache for this package.
c) Check whether there is a patch chain to build the new package.
d) Apply the patches if possible; otherwise fetch the full package from the rsync server.
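Step c) above, and the missing "check for a loop in the patch chain" listed below, can both be handled by a breadth-first search over the cached delta names. A minimal Python sketch, assuming each delta file is named `<source md5><target md5>` (32 hex digits each); the hack earlier only relies on names starting with the source md5, so the second half is an assumption, as is the function name `find_patch_chain`.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional

MD5_HEX_LEN = 32


def find_patch_chain(deltas: Iterable[str], have: str, want: str) -> Optional[List[str]]:
    """Shortest chain of deltas turning md5 `have` into md5 `want`.

    Returns [] if the file is already current and None if no chain exists.
    Assumes (hypothetically) that delta names are <source md5><target md5>.
    """
    edges: Dict[str, List[str]] = {}
    for name in deltas:
        if len(name) != 2 * MD5_HEX_LEN:
            continue  # skip entries that do not follow the assumed naming scheme
        edges.setdefault(name[:MD5_HEX_LEN], []).append(name)

    previous: Dict[str, Optional[str]] = {have: None}  # md5 -> delta used to reach it
    queue = deque([have])
    while queue:
        current = queue.popleft()
        if current == want:
            chain: List[str] = []
            step = previous[current]
            while step is not None:        # walk back to `have`
                chain.append(step)
                step = previous[step[:MD5_HEX_LEN]]
            return chain[::-1]
        for delta in edges.get(current, []):
            target = delta[MD5_HEX_LEN:]
            if target not in previous:     # visited set: a cyclic chain cannot loop forever
                previous[target] = delta
                queue.append(target)
    return None
```

Breadth-first order also prefers the chain with the fewest deltas, which keeps the number of applydeltarpm runs low when several paths exist.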
After running this client there may still be some minor glitches if the rsync and drpmsync servers were extremely out of sync. You can fix these by running a normal rsync or drpmsync client afterwards. This should be fast, as most files should already be up to date.
The client currently does _not_ (or is _not_):
- delete obsolete files (this is left to an rsync/drpmsync run afterwards)
- do proper error checking in every case, although no error should accidentally destroy the repository
- sync non-rpm files (the code is commented out because it is untested; uncomment it if you like)
- check for a loop in the patch chain
- security audited
Feel free to give comments.
Feel free to use any of this code under the conditions of the GPL if you like.
Thanks for your work! Small problem: drpmsync is BSD 3-clause, so I won't put any GPL code in it!
Ok, I didn't think about that. You can also use it under the same license as drpmsync. I don't really care that much about the license of that small script.
Thanks, I'll have a look at it.
I have now made my client multi-threaded, so that it downloads the next patch while the previous one is still being processed. This uses system resources more effectively. You can find the current version at http://pi3.informatik.uni-mannheim.de/~schiele/mydrpmsync/.
Yes, that's also on my TODO list...
Adrian, what's the status of having mirrors out there that offer a drpmsync service?
Actually this one is not really SL related any more -> moving to openSUSE.org.
(In reply to comment #17)
> To update an rpm, I would get the first bytes, to get the lead md5sum and the
> package md5sum.
Why do you want to do such a strange thing? Simply create an MD5SUM file in the directory; it will be small (a few hundred kB max.) and avoids the ugly method you thought of ;-)
1) The MD5SUM file can get out of sync, so using the real files is preferable.
2) The md5sum of the complete rpm changes when the rpm is (re)signed.
We do not have further drpmsync mirrors yet.
Really old and dead.
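The multi-threaded client mentioned in the thread downloads the next patch while the previous one is still being processed, which is a two-stage pipeline. A minimal Python sketch, where `download` and `apply` are hypothetical callables standing in for the mirror fetch and the applydeltarpm run; this is not the code behind the mydrpmsync link.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def pipelined(items: Iterable[T],
              download: Callable[[T], R],
              apply: Callable[[R], None]) -> None:
    """Apply item n while item n+1 is downloaded in a background thread.

    `download` and `apply` are hypothetical hooks, e.g. fetching a delta from
    a mirror and running applydeltarpm on it. Items are applied in order.
    """
    work = list(items)
    if not work:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:   # a single downloader thread
        pending = pool.submit(download, work[0])
        for next_item in work[1:]:
            current = pending.result()                  # waits; re-raises download errors
            pending = pool.submit(download, next_item)  # start the next download...
            apply(current)                              # ...while this one is applied
        apply(pending.result())
```

The downloader runs at most one item ahead, so only one extra delta is ever waiting on disk; if `apply` raises, leaving the `with` block waits for the in-flight download instead of abandoning it.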