Bugzilla – Full Text Bug Listing

| Summary: | [Build D.65.3] journal_check: pam_wtmpdb: database locked | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Dominique Leuenberger <dimstar> |
| Component: | Basesystem | Assignee: | Stefan Schubert <schubi> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | | |
| Priority: | P5 - None | CC: | kukuk, schubi |
| Version: | Current | | |
| Target Milestone: | --- | | |
| Hardware: | Other | | |
| OS: | Other | | |
| URL: | https://openqa.opensuse.org/tests/3578202/modules/journal_check/steps/32 | | |
| Whiteboard: | | | |
| Found By: | openQA | Services Priority: | |
| Business Priority: | | Blocker: | Yes |
| Marketing QA Status: | --- | IT Deployment: | --- |
Description
Dominique Leuenberger
2023-09-15 07:21:52 UTC
The errors are coming from SQLite, and my suspicion is parallel access to wtmpdb. That would be the same problem utmp/wtmp have, which is not surprising at all: https://sourceware.org/bugzilla/show_bug.cgi?id=24492

Maybe we can make the timeout for acquiring the SQLite lock longer? Or retry acquiring the lock a few times? In any case, we should not block because of this. An error in the log is much better than a normal user being able to block root, other users, or processes from logging in. And, if possible, we should extend the error message so that we know in which phase this happens.

This also begs the question then if it should be logged as error, warning, or information in the journal

Hm, I have tried an installation with this ISO but:
journalctl --no-pager --quiet -p err -o short-precise
does not return this error.
Maybe it is a timing issue. Anyway I will have a look to the source code and will follow Thorsten's suggestions....

It seems that the login has gone successfully, right ?

I am wondering where

Sep 15 06:30:49.316433 localhost.localdomain systemctl[1568]: systemctl: invalid option -- '.'

does come from so often and why

Sep 15 06:54:00.970453 localhost.localdomain root[942]: ERROR: "/usr/libexec/health-checker/fail.sh check" failed
Sep 15 06:54:01.168261 localhost.localdomain root[989]: Machine didn't come up correctly, do a rollback

Added Thorsten to CC

(In reply to Stefan Schubert from comment #3)
> Hm, I have tried an installation with this ISO but:
> journalctl --no-pager --quiet -p err -o short-precise
> does not return this error.
> Maybe it is a timing issue. Anyway I will have a look to the source
> code and will follow Thorsten's suggestions....

The error can only happen, if either:
1. the wtmp.db is broken, because a process writing to it got killed. I don't think that's the case here.
2. the wtmp.db file is broken for other reasons
3. two or more processes try to write to the database at the same time.

I think it's 3.
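Cause 3 can be shown in isolation. The following is a minimal sketch, not wtmpdb code: it assumes Python's standard-library `sqlite3` module (a wrapper around the SQLite C library) and a simplified, hypothetical table `demo_wtmp` that does not match the real wtmpdb schema. While one connection holds an open write transaction, a second connection with no busy timeout fails its write immediately with "database is locked".

```python
import os
import sqlite3
import tempfile
from typing import Optional


def concurrent_write_error(db_path: str) -> Optional[str]:
    """Return SQLite's error text if a second, concurrent writer is blocked."""
    # isolation_level=None: autocommit, transactions only where we issue BEGIN
    holder = sqlite3.connect(db_path, isolation_level=None)
    holder.execute(
        "CREATE TABLE IF NOT EXISTS demo_wtmp (id INTEGER PRIMARY KEY, user TEXT)"
    )
    # timeout=0: no busy timeout, so a lock conflict is reported at once
    writer = sqlite3.connect(db_path, isolation_level=None, timeout=0)

    # First (hypothetical) login opens a write transaction and keeps it open
    holder.execute("BEGIN IMMEDIATE")
    holder.execute("INSERT INTO demo_wtmp (user) VALUES ('alice')")
    try:
        # Second, concurrent login tries to write at the same moment
        writer.execute("INSERT INTO demo_wtmp (user) VALUES ('bob')")
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        writer.close()            # drop the blocked connection first
        holder.execute("COMMIT")  # then finish the first transaction
        holder.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(concurrent_write_error(os.path.join(tmp, "wtmp.db")))
```

Here both connections live in one process for simplicity; in the real system they would be separate processes (for example two simultaneous logins), but SQLite applies the same locking rules between connections either way.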
What you could try:
write a small C program which opens wtmp.db in read-write mode, sleeps 30 seconds (or whatever you need) and closes it.

During the sleep time, you should try to login on the console.
I expect:
1. login on console should work
2. you will see the error message in the journal

(In reply to Stefan Schubert from comment #4)
> It seems that the login has gone successfully, right ?
>
> I am wondering where
>
> Sep 15 06:30:49.316433 localhost.localdomain systemctl[1568]: systemctl:
> invalid option -- '.'
>
> does come from so often

I think Fabian has fixed that by now.

> and why
>
> Sep 15 06:54:00.970453 localhost.localdomain root[942]: ERROR:
> "/usr/libexec/health-checker/fail.sh check" failed
> Sep 15 06:54:01.168261 localhost.localdomain root[989]: Machine didn't come
> up correctly, do a rollback

This looks like a QA test case: make health-checker fail and check whether a rollback happens.

(In reply to Dominique Leuenberger from comment #2)
> This also begs the question then if it should be logged as error, warning,
> or information in the journal

We don't know why wtmp.db is locked (it could also be another problem), and afterwards wtmp.db either has incomplete entries or is missing some. So I would keep logging it as an error.

(In reply to Thorsten Kukuk from comment #6)
> (In reply to Stefan Schubert from comment #3)
> > Hm, I have tried an installation with this ISO but:
> > journalctl --no-pager --quiet -p err -o short-precise
> > does not return this error.
> > Maybe it is a timing issue. Anyway I will have a look to the source
> > code and will follow Thorsten's suggestions....
>
> The error can only happen, if either:
> 1. the wtmp.db is broken, because a process writing to it got killed. I
> don't think that's the case here.
> 2. the wtmp.db file is broken for other reasons
> 3. two or more processes try to write to the database at the same time.
>
> I think it's 3.
>
> What you could try:
> write a small C program which opens wtmp.db in read-write mode, sleeps 30
> seconds (or whatever you need) and closes it.
>
> During the sleep time, you should try to login on the console.
> I expect:
> 1. login on console should work
> 2. you will see the error message in the journal

I stopped a process right after sqlite3_open_v2 (read-write open) and was able to log in on another terminal without an error.

I will check the code against the possibilities listed here, especially around the db rotate call: https://www2.sqlite.org/cvstrac/wiki?p=DatabaseIsLocked

I cannot reproduce the error in all cases. I have now added a 5-second timeout and increased the logging to see where it happens: https://github.com/thkukuk/wtmpdb/pull/9

I will take this bug so that I can follow the results.

As far as I remember, the bug went away some time ago with this fix. Dominique, please reopen it if there is still a problem.
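A likely explanation for the experiment of stopping a process after sqlite3_open_v2, offered as a hedged note rather than a confirmed diagnosis: SQLite takes its file locks lazily, when a transaction first reads or writes, not when the connection is opened. A process paused right after the open therefore holds no lock, which fits the login succeeding. A reproducer along the lines suggested in comment #6 would need to keep a write transaction open while it sleeps. The sketch below assumes Python's `sqlite3` module again; `hold_write_lock`, `try_login_write`, `reproduce` and the `demo_wtmp` table are hypothetical, and the 30-second hold is shortened to 1 second. It also illustrates why a bounded busy timeout helps (the same idea as the 5-second timeout mentioned above, though not necessarily how wtmpdb implements it): a writer with no timeout fails at once, while one willing to wait 5 seconds succeeds once the lock is released.

```python
import os
import sqlite3
import tempfile
import threading
import time

# Hypothetical stand-in table, not the real wtmpdb schema
SCHEMA = "CREATE TABLE IF NOT EXISTS demo_wtmp (id INTEGER PRIMARY KEY, user TEXT)"


def hold_write_lock(db_path: str, seconds: float, locked: threading.Event) -> None:
    """Reproducer: open read-write, take the lock, sleep, then release it."""
    conn = sqlite3.connect(db_path, isolation_level=None)
    # connect() alone takes no lock; in the default rollback-journal mode,
    # BEGIN EXCLUSIVE locks the database file immediately
    conn.execute("BEGIN EXCLUSIVE")
    locked.set()
    time.sleep(seconds)
    conn.execute("COMMIT")
    conn.close()


def try_login_write(db_path: str, busy_timeout: float) -> str:
    """Simulated login record write; returns "ok" or SQLite's error text."""
    # timeout= is SQLite's busy timeout in seconds (0 means fail immediately)
    conn = sqlite3.connect(db_path, isolation_level=None, timeout=busy_timeout)
    try:
        conn.execute("INSERT INTO demo_wtmp (user) VALUES ('root')")
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()


def reproduce(db_path: str, busy_timeout: float, hold_seconds: float = 1.0) -> str:
    """Run one 'login' while another thread holds the database lock."""
    setup = sqlite3.connect(db_path, isolation_level=None)
    setup.execute(SCHEMA)
    setup.close()
    locked = threading.Event()
    holder = threading.Thread(
        target=hold_write_lock, args=(db_path, hold_seconds, locked)
    )
    holder.start()
    locked.wait()  # make sure the lock is held before "logging in"
    result = try_login_write(db_path, busy_timeout)
    holder.join()
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "wtmp.db")
        print(reproduce(db, busy_timeout=0))    # expected: database is locked
        print(reproduce(db, busy_timeout=5.0))  # expected: ok
```

The timeout keeps the behaviour asked for in the description: a blocked writer waits only a bounded time and then reports an error, instead of letting one process block logins indefinitely.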