Bug 309354 - rpmdb crashes
Summary: rpmdb crashes
Status: VERIFIED FIXED
Alias: None
Product: openSUSE 10.3
Classification: openSUSE
Component: Basesystem
Version: Beta 3
Hardware: x86 openSUSE 10.3
Importance: P5 - None : Critical
Target Milestone: ---
Assignee: Michael Schröder
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-09-10 22:00 UTC by Juan Erbes
Modified: 2010-11-27 13:08 UTC
CC List: 2 users

See Also:
Found By: Beta-Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
coolo: SHIP_STOPPER-


Attachments

Description Juan Erbes 2007-09-10 22:00:32 UTC
In many cases involving a package upgrade, the rpmdb crashes.

With opensuse updater, I got:
rpmdb: page 802: illegal page type or format
rpmdb: PANIC: Invalid argument
error: db4 error(-30977) from dbcursor->c_get: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->cursor: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->get: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->cursor: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->get: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from dbenv->close: DB_RUNRECOVERY: Fatal error, run database recovery


If I install packages with the rpm command, I get:
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->close: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from dbenv->close: DB_RUNRECOVERY: Fatal error, run database recovery

I have to run "rpm --rebuilddb" from time to time, but in many cases it has no effect, and the crashes continue immediately after the rebuilddb.

The problem appeared with beta 2, and after many rebuilds via smart I upgraded to beta 3, and the problem continues.
Comment 1 Andreas Jaeger 2007-09-11 05:26:48 UTC
Looking at your comments in bug 299575, it could be hardware problems.
Comment 2 Stephan Kulow 2007-09-11 06:03:25 UTC
You seem to be the only one with this problem. So I very much suspect your hardware. Now that you checked your memory, I guess something else is broken. Do you have another version installed on that machine that works fine?
Comment 3 Juan Erbes 2007-09-11 12:35:52 UTC
(In reply to comment #1 from Andreas Jaeger)
> Looking at your comments in bug 299575, it could be hardware problems.
> 

This is not a hardware problem. It may be caused by a buggy ahci driver.

(In reply to comment #2 from Stephan Kulow)
> You seem to be the only one with this problem. So I very much suspect your
> hardware. Now that you checked your memory, I guess something else is broken.
> Do you have another version installed on that machine that works fine?
> 
I have installed the 64-bit version (the KDE version, without additional packages) on another partition, and I believe the problem does not appear there.

The memory was tested for more than 30 minutes with zero errors.

When I resized one partition to install the 64-bit version, the hard disk controller was reset from time to time.

I posted this problem in Bug 299575.

How can I verify if the cause of the problem is the buggy ahci driver? 
Comment 4 Juan Erbes 2007-09-16 00:47:47 UTC
(In reply to comment #2 from Stephan Kulow)
> You seem to be the only one with this problem. So I very much suspect your
> hardware. Now that you checked your memory, I guess something else is broken.
> Do you have another version installed on that machine that works fine?
> 

Did you mean that I'm the only one with this problem?
What do you think about bug https://bugzilla.novell.com/show_bug.cgi?id=220155


Or https://bugzilla.novell.com/show_bug.cgi?id=308352 in which this appears:

------- Comment #4 From Michael Andres 2007-09-07 08:34:33 MST -------

The update fails due to a broken rpmdb. The error screen is full of:

  rpmdb: PANIC: fatal region error detected; run recovery
  error: db4 error(-30977) from db->close: DB_RUNRECOVERY: 
  Fatal error, run database recovery


@Sergio: You should run 'rpm --rebuilddb' (as root).

------- Comment #5 From Michael Andres 2007-09-07 08:40:03 MST -------

Zypp does no rebuildDatabase on update. Code is disabled.

------- Comment #7 From Stephan Kulow 2007-09-07 13:15:04 MST -------

I don't want to risk this for 10.3. Did the old code run for 10.2?

------- Comment #8 From Michael Andres 2007-09-09 14:26:20 MST -------

No, rebuilddb is disabled since 10.1.

------- Comment #9 From Stephan Kulow 2007-09-10 01:25:19 MST -------

OK, then move to 11.0 and change it early in the process.

Comment 5 Stephan Kulow 2007-09-16 05:37:33 UTC
Bug 308352 is about doing an update that fails. You're talking about having already updated and running openSUSE Updater. Different bug. Your rpmdb crashes all of a sudden; 308352 is about an rpmdb that is already broken and YaST not repairing it when update is selected.
Comment 6 Juan Erbes 2007-09-16 14:44:05 UTC
How do I rebuild the rpmdb if the rebuild is disabled (as Michael Andres said on 2007-09-09 14:26:20: "rebuilddb is disabled since 10.1")?

One of the problems I have observed in the rpmdb is that many versions of the same package appear. Whenever I install a package by hand I use "rpm -Uvh package.rpm", and I believe that in this case the reference to the old package should be removed from the db.

Because of the known failures of the openSUSE Updater and the sorry state of zmd, I did the update via smart, and in many cases smart failed at 99% of the total package download (for example about 2 GB), because the packages had changed on the ftp server. In these cases I go to /var/lib/smart/packages and run "rpm -Uh --nodeps --force --replacepkgs *.rpm"

With the --replacepkgs parameter, the references to the old packages should be removed from the rpmdb when a new version of the same package is installed.

Comment 7 Stephan Kulow 2007-09-16 16:55:35 UTC
You can run rpm --rebuilddb.
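
A minimal sketch of that rebuild with a safety copy taken first, assuming the database lives in /var/lib/rpm and the commands run as root. The helper names (run, rebuild_rpmdb), the DRY_RUN switch and the ".backup" suffix are made up for illustration; only "rpm --rebuilddb" and the general "--dbpath" option come from rpm itself.

```shell
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rebuild_rpmdb [DBPATH] [BACKUP]: copy the rpm database directory aside,
# then rebuild it in place. Both default paths are assumptions.
rebuild_rpmdb() {
    dbpath=${1:-/var/lib/rpm}
    backup=${2:-$dbpath.backup}        # hypothetical backup location
    # Refuse to reuse an existing backup so an older copy is never clobbered.
    if [ -e "$backup" ]; then
        echo "backup path already exists: $backup" >&2
        return 1
    fi
    run cp -a "$dbpath" "$backup" || return 1
    run rpm --rebuilddb --dbpath "$dbpath"
}
```

Setting DRY_RUN=1 before calling rebuild_rpmdb prints the two commands without touching the database, which is a cheap way to check the paths first.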
Comment 8 Juan Erbes 2007-09-16 18:41:07 UTC
Yes, I do.
But after installing 2 or 3 packages, it crashes again.
Comment 9 Juan Erbes 2007-09-23 13:50:30 UTC
How can I edit the rpmdb to remove the references to all the packages that are not installed? The rpmdb lists about 4 different versions of the same package as installed.
Two days ago I began downloading the RC1 DVD via torrent (the only way to download the DVD) to make a clean installation, but the connection is very slow, and I will have to wait about a week for the download to complete. For an update via smart I can download 2.2 GB in about 10 hours.
When a package is updated via rpm -U, in many cases with the --replacepkgs parameter (or via smart), the updated or replaced package doesn't disappear from the rpmdb, and I think this is the cause of the crashes.
Comment 10 Michael Schröder 2007-09-28 14:34:22 UTC
I doubt that this is the cause. It's just that rpm first does all the installs of new packages and after that it does the deletes. So if it crashes in the middle of a transaction, you'll get lots of bogus packages in the rpm database.
You can remove those bad entries with standard 'rpm -e' commands.
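
One possible way to find those leftover entries, as a sketch: list the names that occur more than once in the database, then inspect and erase the stale versions by hand. The function name list_duplicate_names is made up for illustration; the rpm calls in the comments are standard query and erase options. Some names (for example gpg-pubkey or the kernel) can legitimately appear several times, so the output is a list of candidates, not a list of things to delete.

```shell
#!/bin/sh
# Print every package name that occurs more than once on stdin.
# Input: one package name per line.
list_duplicate_names() {
    sort | uniq -d
}

# Usage on a live system (not executed here):
#   rpm -qa --qf '%{NAME}\n' | list_duplicate_names
#   rpm -q <name>                         # show every recorded version of <name>
#   rpm -e <name>-<version>-<release>     # erase one specific stale entry
```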

Regarding your original problem (the many crashes in the rpm database) I'm
afraid I can't help you much.
Comment 11 Willem Meens 2007-10-09 12:03:29 UTC
I also just had this problem (with openSUSE 10.3 final).

Could it have something to do with installing the restricted formats from http://opensuse-community.org/Restricted_Formats/10.3 (choosing ALL possible options within advanced configuration)?

After the restricted formats I installed Wine 0.9.45; the install failed and the PANIC errors appeared.


rpm --rebuilddb successfully repaired the problem and new (un)installs run successfully again. I did remove Packman and the other unknown repos that had been added with the restricted format install.

But when starting the software manager I get this message:

There was an error in the repository initialization.
Record not found in the cache
History:
 - SQL logic error or missing database

(Could still be a bad repository link?)


Comment 12 Michael Schröder 2007-10-09 12:06:00 UTC
(CCing Duncan because of the SQL error message)
Comment 13 Alexander Naumov 2010-11-27 13:08:09 UTC
Can't reproduce it in openSUSE 11.1 (x86_64) and openSUSE 11.3 (i686)