Bug 118717

Summary: remove duplicate files from ftp servers
Product: [openSUSE] SUSE Linux 10.1 Reporter: Forgotten User OS1JNCFbCX <forgotten_OS1JNCFbCX>
Component: Other Assignee: Roman Drahtmueller <draht>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Enhancement    
Priority: P5 - None CC: adrian.schroeter, aj
Version: unspecified   
Target Milestone: ---   
Hardware: All   
OS: All   
Whiteboard:
Found By: Beta-Customer Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: The mentioned script

Description Forgotten User OS1JNCFbCX 2005-09-24 18:48:30 UTC
This is actually not a specific issue with SUSE Linux 10.1 but a generic issue
with contents on ftp servers ftp.suse.com and ftp.opensuse.org.

On both servers many files are stored multiple times in various
directories. The number of duplicate files is likely to keep growing
in the near future, since you created two separate update trees for the
almost identical releases 10.0 and 10.0-OSS.

I have hacked together a small script that scans the specified directories and
replaces duplicates of regular files with hard links. If you run this script on
a regular basis on the primary staging servers of suse.com and opensuse.org, you
could prevent much unnecessary sync load and disk usage on the mirror network.
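
For readers without access to the attachment, the approach described above can
be sketched roughly as follows. This is not the attached script; it is a minimal
Python illustration, and the names md5_of and hardlink_duplicates as well as the
".dedup-tmp" suffix are hypothetical. It assumes all scanned directories live on
one filesystem and that identical content may safely share one inode (owner,
mode and mtime then become shared too).

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hardlink_duplicates(roots):
    """Replace duplicate regular files under roots with hard links.

    Returns the total size in bytes of the files that were replaced.
    """
    first_seen = {}  # (size, md5) -> path of the first copy found
    replaced_bytes = 0
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                # Only regular files are considered; symlinks are skipped.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                info = os.stat(path)
                key = (info.st_size, md5_of(path))
                original = first_seen.setdefault(key, path)
                if original == path:
                    continue  # first copy with this content
                orig_info = os.stat(original)
                if orig_info.st_dev != info.st_dev:
                    continue  # hard links cannot cross filesystems
                if orig_info.st_ino == info.st_ino:
                    continue  # already hard-linked to the first copy
                # Link under a temporary name (hypothetical suffix), then
                # rename over the duplicate so the path never disappears.
                tmp = path + ".dedup-tmp"
                os.link(original, tmp)
                os.replace(tmp, path)
                replaced_bytes += info.st_size
    return replaced_bytes
```

Note that mirrors only benefit if their sync tool preserves hard links; with
rsync that means passing -H (--hard-links).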

I have tested the script on a mirror of ftp.suse.com/pub/suse and removed 3GB(!)
of duplicate files that way. For ftp.opensuse.org/pub/opensuse the saving was
about 700MB.

The first run of the script will take some time because MD5 sums must
be calculated for every file. Further runs are much faster because the tool
stores already calculated MD5 sums in a cache file.
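
The cache format used by the attached script is not described here. As a rough
sketch of how such a cache could work, the following Python example keys each
entry by path and reuses the stored sum only while size and modification time
are unchanged. The JSON layout and the names load_cache, save_cache and
cached_md5 are assumptions made for illustration.

```python
import hashlib
import json
import os


def load_cache(cache_file):
    """Load the checksum cache; a missing or corrupt file yields an empty one."""
    try:
        with open(cache_file, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except (FileNotFoundError, ValueError):
        return {}


def save_cache(cache_file, cache):
    """Write the cache to a temporary file, then rename it into place."""
    tmp = cache_file + ".tmp"
    with open(tmp, "w", encoding="utf-8") as handle:
        json.dump(cache, handle)
    os.replace(tmp, cache_file)


def cached_md5(path, cache):
    """Return the MD5 of path, recomputing only if size or mtime changed."""
    info = os.stat(path)
    entry = cache.get(path)
    if (
        entry is not None
        and entry["size"] == info.st_size
        and entry["mtime_ns"] == info.st_mtime_ns
    ):
        return entry["md5"]  # cache hit: the file is not read at all
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    cache[path] = {
        "size": info.st_size,
        "mtime_ns": info.st_mtime_ns,
        "md5": digest.hexdigest(),
    }
    return cache[path]["md5"]
```

A run would call load_cache once, use cached_md5 for every file, and call
save_cache at the end, so that only new or changed files are read again.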
Comment 1 Forgotten User OS1JNCFbCX 2005-09-24 19:59:28 UTC
Created attachment 50785
The mentioned script
Comment 2 Roman Drahtmueller 2008-11-12 09:50:02 UTC
Fixed, for the most part.