Bugzilla – Bug 118717
remove duplicate files from ftp servers
Last modified: 2008-11-12 09:50:02 UTC
This is actually not an issue specific to SUSE Linux 10.1 but a generic issue with the content of the ftp servers ftp.suse.com and ftp.opensuse.org. On both servers many files are stored multiple times in various directories. The number of duplicate files is likely to increase further in the near future, since you created two separate update trees for the almost identical releases 10.0 and 10.0-OSS.

I have hacked up a little script that scans the specified directories and replaces duplicates of regular files with hard links. If you run this script on a regular basis on the primary staging servers of suse.com and opensuse.org, you could avoid a lot of unnecessary sync load and disk usage on the mirror network. I have tested the script on a mirror of ftp.suse.com/pub/suse and removed 3GB(!) of duplicate files that way; for ftp.opensuse.org/pub/opensuse the saving was about 700MB.

The first run of the script takes some time because MD5 sums must be calculated for every file. Further runs are much faster because the tool stores already-calculated MD5 sums in a cache file.
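For illustration, the approach described above can be sketched roughly as follows. This is not the attached script, just a minimal Python sketch of the same idea: walk the given directories, MD5-hash each regular file, hard-link duplicates to the first copy seen, and cache hashes keyed on path, size, and mtime so later runs skip unchanged files. The cache file name `.md5cache.json` is an assumption.

```python
#!/usr/bin/env python3
# Hypothetical sketch (not the attached script): deduplicate regular files
# within the given trees by replacing byte-identical copies with hard links.
import hashlib
import json
import os
import sys

CACHE_FILE = ".md5cache.json"  # assumed cache location/name

def md5sum(path):
    """MD5 of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def dedup(roots):
    """Hard-link duplicates under the given roots; return bytes saved."""
    cache = {}
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            cache = json.load(f)
    first_seen = {}  # digest -> canonical path kept on disk
    saved = 0
    for root in roots:
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                # Only plain files; skip symlinks and anything else.
                if not os.path.isfile(path) or os.path.islink(path):
                    continue
                st = os.lstat(path)
                # Re-hash only when path, size, or mtime changed.
                key = "%s:%d:%d" % (path, st.st_size, int(st.st_mtime))
                digest = cache.get(key) or md5sum(path)
                cache[key] = digest
                canon = first_seen.setdefault(digest, path)
                if canon != path and os.lstat(canon).st_ino != st.st_ino:
                    os.unlink(path)       # drop the duplicate copy ...
                    os.link(canon, path)  # ... and hard-link the original
                    saved += st.st_size
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f)
    return saved

if __name__ == "__main__":
    print("saved %d bytes" % dedup(sys.argv[1:]))
```

Note that hard links only work within a single filesystem, so such a pass has to run on the staging server itself, not across mounts; an MD5 collision would also silently merge distinct files, which is acceptable here but worth knowing.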
Created attachment 50785 [details] The mentioned script
Fixed, for the most part.