Bug 551678

Summary: zypper dup heavily mis-predicting used disk space
Product: [openSUSE] openSUSE 11.2 Reporter: Bernhard Wiedemann <novellbmw>
Component: libzypp Assignee: E-mail List <zypp-maintainers>
Status: RESOLVED DUPLICATE QA Contact: E-mail List <qa-bugs>
Severity: Minor    
Priority: P3 - Medium CC: koenig, ma
Version: RC 1   
Target Milestone: ---   
Hardware: i686   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Bernhard Wiedemann 2009-10-31 20:01:07 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.23) Gecko/20090912 SUSE/1.1.18-1.2 SeaMonkey/1.1.18

As part of the testing core team, I tested zypper dup from 11.1 to 11.2 and found that "zypper dup" used 69% more disk space than predicted (1080 MiB instead of 636 MiB).

Reproducible: Always

Steps to Reproduce:
1. install 11.1-KDE-LiveCD
2. run 11.1 updates
3. notice output of df /
4. follow http://en.opensuse.org/index.php?title=Upgrade/11.2&oldid=108417
5. notice output of df / again
Actual Results:  
799 packages to upgrade, 4 to downgrade, 263 new, 101 to remove, 1 to change
vendor, 7 to change arch.
Overall download size: 846.9 MiB. After the operation, additional 636.4 MiB
will be used.

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2              7234296   2307212   4559600  34% /

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2              7234296   3359212   3507600  49% /


Expected Results:  
The prediction could be somewhat closer to the real result.
This might actually be quite hard, given the different filesystems, block sizes and options (e.g. reiserfs with tail-packing). At the very least, zypper could tell the user to expect some 50-70% extra disk usage on most filesystems.

If needed, I have the full "du /" output for both the before and after states. Only 65 MB extra are used in /var.
Maybe zypper dup does not take the wasted space in ext3's 4k blocks into account?
Comment 1 Jan Kupec 2009-11-02 10:58:52 UTC
I'm not sure, I'm not an expert on ext3. The calculation is just a simple difference between the 'installed size' reported by the packages to be removed and that of the packages to be installed. This installed size does not take block sizes into account either.

Just in case: did you take the cache of the downloaded packages into account? Looks like the difference you're reporting could easily be the sum of the sizes of the downloaded rpms. Try 'zypper clean' after the install.
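
For illustration, the calculation described in this comment could be sketched roughly as below. This is not libzypp code; the function name and the (name, installed size) tuples are hypothetical and only mirror the description above.

```python
def predicted_delta(to_install, to_remove):
    """Naive prediction: installed sizes added minus installed sizes removed.

    Both arguments are lists of hypothetical (name, installed_size_bytes)
    tuples. Block sizes and per-file overhead are deliberately ignored,
    as in the calculation described in this comment.
    """
    added = sum(size for _, size in to_install)
    removed = sum(size for _, size in to_remove)
    return added - removed


# Made-up example: two packages come in, one goes away.
print(predicted_delta([("new-a", 3000), ("new-b", 1)], [("old-c", 1000)]))
```

A one-byte file contributes exactly one byte here, whatever the filesystem allocates for it.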
Comment 2 Bernhard Wiedemann 2009-11-02 17:53:54 UTC
This is from du / | sort -n | tail -13

before zypper dup:
96928   /var
97732   /lib
108120  /usr/share/doc/packages
152776  /usr/lib/ooo3/basis3.0/program
154096  /usr/share/doc
154712  /opt/kde3
154716  /opt
179072  /usr/lib/ooo3/basis3.0
186220  /usr/lib/ooo3
636416  /usr/share
927532  /usr/lib
1677456 /usr
2104864 /

after zypper dup
132860  /lib
152620  /usr/lib/ooo3/basis3.1/program
161704  /var
167376  /opt/kde3
167380  /opt
180064  /usr/lib/ooo3/basis3.1
190132  /usr/lib/ooo3
222320  /usr/share/doc
257548  /usr/share/icons
1192256 /usr/share
1336864 /usr/lib
2670176 /usr
3211144 /


15 MB of the extra /var size comes from files in /var/log that were written during the upgrade.

With the method as you describe it, a one-byte file will probably be counted as only one byte, even though it often takes up a whole block (e.g. 4096 bytes).
The average overhead would be $numberoffiles*$filesystemblocksize/2,
though even that is an underestimate if there are many small files.
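
A minimal sketch of the block-rounding effect described above, assuming a filesystem that allocates whole blocks per file with no tail-packing. The function names are made up for illustration; inodes, directory entries and sparse files are ignored.

```python
def allocated_size(file_size, block_size=4096):
    """Bytes occupied on disk when a file is rounded up to whole blocks."""
    blocks = (file_size + block_size - 1) // block_size
    return blocks * block_size


def exact_overhead(file_sizes, block_size=4096):
    """Space lost to partially filled last blocks, summed over all files."""
    return sum(allocated_size(s, block_size) - s for s in file_sizes)


def average_overhead(number_of_files, block_size=4096):
    """The rule of thumb from this comment: half a block per file."""
    return number_of_files * block_size // 2


sizes = [1, 4096, 4097]            # made-up file sizes in bytes
print(exact_overhead(sizes))       # slack for these three files
print(average_overhead(len(sizes)))
```

For these three made-up files the exact slack is larger than the half-block average, which matches the remark that the average underestimates when small files dominate.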
Comment 3 Michael Andres 2009-11-03 12:27:54 UTC
(In reply to comment #1)
> I'm not sure, i'm not an expert on ext3. The calculation is just a simple
> difference between the 'installed size' reported by the packages to be removed
> and that of the packages to be installed. This installed size does not take the
> block sizes into account as well.

We'd actually be able to estimate the size per partition, including block sizes.
libzypp/satsolver provide interfaces for this. 

BUT not all repositories supply the necessary disk usage information. It's missing in rpmmd repos. AFAIK we don't even know the number of files included in a package there. All this could be derived from the fileindex.xml, but we currently consider the pain of downloading this huge file bigger than the inaccuracy in the size computation.

Susetags repos contain an abstract that allows estimating the size below individual directories, up to depth 3 in the filesystem (suse/setup/descr/packages.DU.gz in the repo). For 16510 packages in factory the download size is ~600K. If we got something similar into rpmmd, it could be worth thinking about a (hopefully) closer calculation per partition.
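
As a rough sketch of what a per-partition calculation could look like: assume each directory entry carries a size in KiB and a file count, and that entries do not overlap (a parent entry does not include its children). The input layout and all names below are hypothetical. This does not parse the real packages.DU format and does not use the libzypp/satsolver interfaces.

```python
def mount_point_for(path, mount_points):
    """Return the longest mount point that is a path prefix of `path`."""
    best = "/"
    for mp in mount_points:
        prefix = mp.rstrip("/") + "/"
        if (path == mp or path.startswith(prefix)) and len(mp) > len(best):
            best = mp
    return best


def estimate_per_partition(dir_usage, mount_points, block_size=4096):
    """Sum per-directory usage into per-mount-point byte estimates.

    dir_usage maps a directory path to (kbytes, number_of_files).
    Half a block of slack is added per file, following the rule of
    thumb from comment 2.
    """
    totals = {}
    for path, (kbytes, files) in dir_usage.items():
        mp = mount_point_for(path, mount_points)
        estimate = kbytes * 1024 + files * block_size // 2
        totals[mp] = totals.get(mp, 0) + estimate
    return totals


# Made-up directory summary and partition layout.
usage = {"/usr/share/doc": (100, 10), "/var/log": (5, 2), "/etc": (1, 1)}
print(estimate_per_partition(usage, ["/", "/usr", "/var"]))
```

The file count is what makes the slack term possible, which is why the missing per-package file counts in rpmmd matter.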
Comment 4 Bernhard Wiedemann 2009-11-03 15:02:22 UTC
So is a better prediction possible with repositories like http://download.opensuse.org/distribution/11.1/repo/oss/ ?
3rd party repos usually only make up a small part of packages and thus the error in prediction would not be as big.

The current prediction can be seen as an optimistic (minimal) prediction, but there should also be something like
typical/expected
pessimistic (could use +50-70% on installed size on repos without extra metainfo)


Are there other zypper operations as big as zypper dup? zypper in openoffice?
The main reason for a good prediction is to tell humans and programs whether the packages will fit on disk. And if the pessimistic guess exceeds free space, there should at least be some warning.
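
A minimal sketch of the three-level prediction and warning proposed in this comment. The +70% pessimistic factor is the upper end of the range mentioned above. The +50% "typical" factor and all names are placeholders chosen for illustration, not anything zypper implements.

```python
def prediction_range(predicted_kib, typical_pct=50, pessimistic_pct=70):
    """Derive optimistic/typical/pessimistic sizes from the plain prediction."""
    return {
        "optimistic": predicted_kib,
        "typical": predicted_kib * (100 + typical_pct) // 100,
        "pessimistic": predicted_kib * (100 + pessimistic_pct) // 100,
    }


def space_warning(predicted_kib, available_kib):
    """Return a message if the packages may not fit, otherwise None."""
    sizes = prediction_range(predicted_kib)
    if sizes["optimistic"] > available_kib:
        return "error: not enough free space even for the minimal prediction"
    if sizes["pessimistic"] > available_kib:
        return "warning: free space may be insufficient in the pessimistic case"
    return None


print(prediction_range(1000))
print(space_warning(1000, 1600))
```

A program could treat the warning case as "ask the user" and the error case as "refuse", while a human would simply see both numbers.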
Comment 5 Michael Andres 2010-02-11 17:58:00 UTC
*** Bug 557209 has been marked as a duplicate of this bug. ***
Comment 6 Michael Andres 2012-09-26 07:20:43 UTC
Still no satisfying space computation.

*** This bug has been marked as a duplicate of bug 410897 ***