Bug 946067 - Corruption in RPM DB after fresh 42.1M2 install
Summary: Corruption in RPM DB after fresh 42.1M2 install
Status: NEW
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem (show other bugs)
Version: Current
Hardware: x86-64 openSUSE 42.1
: P5 - None : Minor (vote)
Target Milestone: ---
Assignee: Michael Schröder
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-09-16 14:24 UTC by Bob Rogers
Modified: 2016-03-31 17:27 UTC (History)
4 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
rpmdumpheader output (677.50 KB, application/x-rpmdumpheader)
2015-09-22 19:57 UTC, Bob Rogers
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Bob Rogers 2015-09-16 14:24:31 UTC
Immediately after installing, I get the following persistent error when querying
all installed RPMs:

    thebes:~ # rpm -qa | wc
    error: rpmdbNextIterator: skipping h#    1558 Header V3 RSA/SHA256 Signature, key ID 3dbdc284: BAD
       1998    1998   63954
    thebes:~ #

The output looks normal; if anything is missing, it is not obvious to me.
I have seen similar errors before, but never after a fresh install.

For the save_y2logs tarball from this installation, see bug#945929.
Comment 1 Michael Schröder 2015-09-18 10:19:17 UTC
Well, what is header 1558? Please do:

rpm -qi --querybynumber 1558 --nodigest --nosignature
Comment 2 Michael Schröder 2015-09-18 10:31:37 UTC
(The output basically means that the pubkey that was used to sign the package was not imported into the rpmdb)
Comment 3 Bob Rogers 2015-09-18 17:01:06 UTC
Here's what I get (with description omitted):

    rogers@thebes> rpm -qi --querybynumber 1558 --nodigest --nosignature
    rpm -qi --querybynumber 1558 --nodigest --nosignature
    Name        : libreoffice
    Version     : 4.3.5.2
    Release     : 12.1
    Architecture: x86_64
    Install Date: Mon 14 Sep 2015 03:56:27 PM EDT
    Group       : Productivity/Office/Suite
    Size        : 222033984
    License     : Apache-2.0 and Artistic-1.0 and BSD-3-Clause and BSD-4-Clause and GPL-2.0+ and LPPL-1.3c and LGPL-2.1+ and LGPL-3.0 and MPL-1.1 and MIT and SUSE-Public-Domain and W3C
    Signature   : RSA/SHA256, Thu 27 Aug 2015 12:22:01 PM EDT, Key ID b88b2fd43dbdc284
    Source RPM  : libreoffice-4.3.5.2-12.1.src.rpm
    Build Date  : Thu 27 Aug 2015 11:49:58 AM EDT
    Build Host  : cloud138
    Relocations : (not relocatable)
    Vendor      : obs://build.opensuse.org/openSUSE:42
    URL         : http://www.documentfoundation.org/
    Summary     : A Free Office Suite (Framework)
    Description :

    . . .

    Distribution: openSUSE:42
    rogers@thebes>
Comment 4 Michael Schröder 2015-09-21 09:57:07 UTC
Hmm, that's the standard key, so it *is* in the database. The problem must be somewhere else.

Do you get more information when you do:

rpm -q -v -v -v  --querybynumber 1558

Thanks!
Comment 5 Michael Schröder 2015-09-21 10:00:25 UTC
Could you please also attach the output of:

/usr/lib/rpm/rpmdumpheader libreoffice-4.3.5.2-12.1

(This assumes you have the deltarpm package installed)
Comment 6 Bob Rogers 2015-09-22 19:54:53 UTC
Here's the rpm output:

    thebes:~ # rpm -q -v -v -v  --querybynumber 1558
    D: package record number: 1558
    D: loading keyring from pubkeys in /var/lib/rpm/pubkeys/*.key
    D: couldn't find any keys in /var/lib/rpm/pubkeys/*.key
    D: loading keyring from rpmdb
    D: opening  db environment /var/lib/rpm cdb:private:0x201
    D: opening  db index       /var/lib/rpm/Packages 0x400 mode=0x0
    D: locked   db index       /var/lib/rpm/Packages
    D: opening  db index       /var/lib/rpm/Name nofsync:0x400 mode=0x0
    D:  read h#    2007 Header SHA1 digest: OK (121869ebdc8e8ddf647915a929aa17a985500257)
    D: added key gpg-pubkey-3dbdc284-53674dd4 to keyring
    D: Using legacy gpg-pubkey(s) from rpmdb
    D:  read h#    1558 Header V3 RSA/SHA256 Signature, key ID 3dbdc284: OK
    libreoffice-4.3.5.2-12.1.x86_64
    D: closed   db index       /var/lib/rpm/Name
    D: closed   db index       /var/lib/rpm/Packages
    D: closed   db environment /var/lib/rpm
    thebes:~ #

The rpmdumpheader output is almost 700K, so I will add it as an
attachment.
Comment 7 Bob Rogers 2015-09-22 19:57:14 UTC
Created attachment 648623 [details]
rpmdumpheader output
Comment 8 Michael Schröder 2015-09-23 10:46:39 UTC
Wait a sec, why didn't you get that 'Signature, key ID 3dbdc284: BAD' error when you did the --querybynumber?

So you get an error with 'rpm -qa', but no error if you query just the single header??
Comment 9 Bob Rogers 2015-09-24 16:48:33 UTC
Actually, it gets even more wierd:  I'm now seeing two errors, fingering
different headers:

    thebes:~ # rpm -qa --last | head
    error: rpmdbNextIterator: skipping h#     624 Header V3 RSA/SHA256 Signature, key ID 3dbdc284: BAD
    error: rpmdbNextIterator: skipping h#     638 Header V3 RSA/SHA256 Signature, key ID 3dbdc284: BAD
    baloo5-tools-5.9.2-5.1.x86_64                 Wed Sep 16 10:08:45 2015
    yast2-nfs-client-3.1.16-4.2.noarch            Wed Sep 16 10:08:44 2015
    yast2-metapackage-handler-3.1.4-5.1.noarch    Wed Sep 16 10:07:57 2015
    libKF5WindowSystem5-lang-5.12.0-3.2.noarch    Wed Sep 16 10:07:57 2015
    avahi-lang-0.6.31-11.1.noarch                 Wed Sep 16 10:07:56 2015
    gpg-pubkey-3dbdc284-53674dd4                  Wed Sep 16 10:07:07 2015
    perl-Bio-Graphics-2.24-5.noarch               Tue Sep 15 15:21:36 2015
    perl-Data-Stag-0.11-2.noarch                  Tue Sep 15 15:21:19 2015
    perl-DBIx-Roles-1.04-3.noarch                 Tue Sep 15 15:00:52 2015
    perl-SQL-Abstract-1.22-1.x86_64               Tue Sep 15 14:52:13 2015
    thebes:~ #

This also shows that I haven't installed or upgraded any packages since
I submitted the bug report (nor do I recall removing any).

And if I do "rpm -q -v -v -v --querybynumber 624" (or 638), it keeps
complaining about the same header until I stop it with C-c.
Comment 10 Michael Schröder 2015-09-25 09:20:34 UTC
Weird. Does 'valgrind rpm -qa' show anything suspicious?
Comment 11 Bob Rogers 2015-09-25 16:31:40 UTC
Nothing that looks suspicious to me (though I have no experience
interpreting valgrind output).  Here are the summaries:

    ==2648== HEAP SUMMARY:
    ==2648==     in use at exit: 3,755 bytes in 4 blocks
    ==2648==   total heap usage: 88,648 allocs, 88,644 frees, 119,589,832 bytes allocated
    ==2648== 
    ==2648== LEAK SUMMARY:
    ==2648==    definitely lost: 0 bytes in 0 blocks
    ==2648==    indirectly lost: 0 bytes in 0 blocks
    ==2648==      possibly lost: 0 bytes in 0 blocks
    ==2648==    still reachable: 3,755 bytes in 4 blocks
    ==2648==         suppressed: 0 bytes in 0 blocks
    ==2648== Rerun with --leak-check=full to see details of leaked memory
    ==2648== 
    ==2648== For counts of detected and suppressed errors, rerun with: -v
    ==2648== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Running with greater levels of detail doesn't show anything more that
jumps out at me.  Note that I am still getting the same warnings about
skipping headers 624 and 638 after doing "zypper install valgrind".
Comment 12 Stanislav Brabec 2015-09-30 14:07:27 UTC
I experienced a similar problem:
oct:~ # zypper dup --from openSUSE-Tumbleweed-Oss --from openSUSE-Tumbleweed-Non-Oss
...
(   8/1039) Instaluje se: bundle-lang-kde-en-13.2-12.4 ..............................................................[chyba]
Instalace bundle-lang-kde-en-13.2-12.4 selhala:
Chyba: Subprocess failed. Error: RPM selhalo: Příkaz byl ukončen s kódem 1.

(A) Přerušit, (R) opakovat, (I) ignorovat? [a/r/i] (a): r
Instalace bundle-lang-kde-en-13.2-12.4 selhala:
Chyba: Subprocess failed. Error: RPM selhalo: error: unpacking of archive failed on file /usr/share/doc-bundle/kde/HTML/en/amarok/Face-smile.png: cpio: rename
error: bundle-lang-kde-en-13.2-12.4.noarch: install failed
error: bundle-lang-kde-en-13.2-12.2.noarch: erase skipped


(A) Přerušit, (R) opakovat, (I) ignorovat? [a/r/i] (a): i
(   9/1039) Instaluje se: geda-doc-1.8.2-5.4 ........................................................................[chyba]
Instalace geda-doc-1.8.2-5.4 selhala:
Chyba: Subprocess failed. Error: RPM selhalo: error: unpacking of archive failed on file /usr/share/doc/packages/geda-gaf/wiki/geda-bom_readme.ru.html: cpio: rename
error: geda-doc-1.8.2-5.4.noarch: install failed
Instalace geda-symbols-1.8.2-5.4 selhala:
Chyba: Subprocess failed. Error: RPM selhalo: error: unpacking of archive failed on file /usr/share/gEDA/sym/4000/4000-1.sym: cpio: rename
error: geda-symbols-1.8.2-5.4.noarch: install failed
error: geda-symbols-1.8.2-5.3.noarch: erase skipped


(A) Přerušit, (R) opakovat, (I) ignorovat? [a/r/i] (a): i
(  11/1039) Instaluje se: glibc-i18ndata-2.22-2.2 ...................................................................[chyba]
Instalace glibc-i18ndata-2.22-2.2 selhala:
Chyba: Subprocess failed. Error: RPM selhalo: error: unpacking of archive failed on file /usr/share/i18n/charmaps/ANSI_X3.110-1983.gz: cpio: rename
error: glibc-i18ndata-2.22-2.2.noarch: install failed
error: glibc-i18ndata-2.22-1.1.noarch: erase skipped


(A) Přerušit, (R) opakovat, (I) ignorovat? [a/r/i] (a): 
Došlo k chybě v průběhu instalace nebo odstraňování nebo po instalaci či odstranění balíčků:
Installation aborted by user

...

oct:~ # LANG=C zypper dup --from openSUSE-Tumbleweed-Oss --from openSUSE-Tumbleweed-Non-Oss
Target initialization failed:
Failed to cache rpm database (1).
Comment 13 Stanislav Brabec 2015-09-30 14:11:26 UTC
In my case, it has something to do with the file system, brtfs or snapper:

oct:~ # cp -a /var/lib/rpm /var/lib/rpm.save
cp: cannot create directory '/var/lib/rpm.save': No space left on device
oct:~ # df /var/lib
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sdb1       41945088 37520524   4374356  90% /
Comment 14 Michael Schröder 2015-09-30 14:14:13 UTC
Well, rpm can't do much against broken file systems...
Comment 15 Stanislav Brabec 2015-10-01 13:41:22 UTC
David: What do you think about comment 12?

Real disk space taken: 58%

Disk space taken due to existing automatic snapshots: 90%

However it reported 10% free space, it was not possible to add any more files.

rpm failed on mv.
Comment 16 David Sterba 2015-10-05 11:37:06 UTC
Do you have output of 'btrfs fi df' ? This looks like metadata is exhausted while there's still space in the data.
Comment 17 Stanislav Brabec 2015-10-05 13:33:13 UTC
David: No, I don't. After snapper delete, everything works again.

I'll do that if it happens again.
Comment 18 Stanislav Brabec 2015-10-22 13:03:19 UTC
It happened again now. But today df shows 100%, so it is not exactly the same case as before.

Disk full problem is apparently caused by snapper/zypp configuration keeping lots of obsolete snapshots.


And I again seen obscure RPM error:

oct:~ # zypper dup
Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
Download (curl) error for 'http://download.opensuse.org/tumbleweed/repo/oss/content':
Error code: Write error
Error message: Failed writing body (2636 != 2920)

Abort, retry, ignore? [a/r/i/? shows all options] (a): 
ABORT request: Aborting requested by user
oct:~ # df /
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sdb1       41945088 41854180        28 100% /
oct:~ # btrfs fi df /
Data, single: total=37.22GiB, used=37.22GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=2.75GiB, used=2.20GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
oct:~ # for i in $(seq 100) ; do snapper delete $i ; done 2>/dev/null
oct:~ # sleep 120
oct:~ # df /
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sdb1       41945088 26101036  14139828  65% /
oct:~ # btrfs fi df /
Data, single: total=37.22GiB, used=23.73GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=2.75GiB, used=880.70MiB
GlobalReserve, single: total=304.00MiB, used=0.00B
oct:~ #
Comment 19 Stanislav Brabec 2016-03-31 14:04:56 UTC
And reproduced with bad free report:

oct:~ # df /
Souborový systém 1K bloků    Užito   Volné Uži% Připojeno do
/dev/sdb1        41945088 40815764 1087420  98% /
oct:~ # btrfs fi df /
Data, single: total=37.72GiB, used=36.68GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=2.25GiB, used=1.74GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
oct:~ # uname -r
4.5.0-1-default
oct:~ # btrfs fi us /
Overall:
    Device size:		  40.00GiB
    Device allocated:		  40.00GiB
    Device unallocated:		   1.00MiB
    Device missing:		     0.00B
    Used:			  38.42GiB
    Free (estimated):		   1.04GiB	(min: 1.04GiB)
    Data ratio:			      1.00
    Metadata ratio:		      1.00
    Global reserve:		 512.00MiB	(used: 0.00B)

Data,single: Size:37.72GiB, Used:36.68GiB
   /dev/sdb1	  37.72GiB

Metadata,single: Size:2.25GiB, Used:1.74GiB
   /dev/sdb1	   2.25GiB

System,single: Size:32.00MiB, Used:16.00KiB
   /dev/sdb1	  32.00MiB

Unallocated:
   /dev/sdb1	   1.00MiB
oct:~ # 


=> 2 bugs are here

1) btrfs reports bad free

2) snapper is ill configured and repeatedly fills the disk. After removing all unneeded snapshots, I am back at 65%.
Comment 20 Stanislav Brabec 2016-03-31 17:27:33 UTC
This time, the system left unbootable after the error, and I had to re-install.

But rpm seems to have no chance to prevent it, as it gets incorrect information about free space.