Bug 1171858 - journal corruption: "tag/entry realtime timestamp out of synchronization"
Summary: journal corruption: "tag/entry realtime timestamp out of synchronization"
Status: NEW
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem (show other bugs)
Version: Current
Hardware: x86-64 Other
: P5 - None : Normal (vote)
Target Milestone: Current
Assignee: systemd maintainers
QA Contact: E-mail List
URL: http://phoenix-openqa.qam.suse.de/tes...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-19 07:28 UTC by Felix Niederwanger
Modified: 2021-03-11 06:05 UTC (History)
3 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Felix Niederwanger 2020-05-19 07:28:00 UTC
## Observation

journalctl --verify yields a journal corruption after enabling FSS and rebooting the system.

openQA test in scenario sle-15-SP1-Server-DVD-Updates-x86_64-mau-extratests@64bit fails in
[journalctl fail - SLE15-SP1](http://phoenix-openqa.qam.suse.de/tests/1218/modules/journalctl/steps/55)
[journalctl fail - SLE15](http://phoenix-openqa.qam.suse.de/tests/1219#step/journalctl/55)
[journalctl fail - SLE12SP5](http://phoenix-openqa.qam.suse.de/tests/1220#step/journalctl/55)
[journalctl fail - SLE12SP4](http://phoenix-openqa.qam.suse.de/tests/1221#step/journalctl/55)
[journalctl fail - SLE12SP3](http://phoenix-openqa.qam.suse.de/tests/1222#step/journalctl/55)
[journalctl fail - SLE12SP2](http://phoenix-openqa.qam.suse.de/tests/1223#step/journalctl/55)
[journalctl fail - Leap 15.2](http://phoenix-openqa.qam.suse.de/tests/1225#step/journalctl/56)
[journalctl fail - Tumbleweed](http://phoenix-openqa.qam.suse.de/tests/1226#step/journalctl/56)

## Test suite description

journalctl regression test (in development)


## Reproducible

Fails on all current builds, reproducible in local VM (tested on Tumbleweed, SLES15-SP1, Leap 15.1, Leap 15.2 - all failing)

    sed -i 's/.*Storage=.*/Storage=persistent/' /etc/systemd/journald.conf
    sed -i 's/.*Seal=.*/Seal=yes/' /etc/systemd/journald.conf
    systemctl restart systemd-journald
    sleep 1
    journalctl --interval=10s --setup-keys | tee journalctl-setup-keys.txt
    sleep 1
    journalctl --rotate
    sleep 1
    journalctl --verify --verify-key=`cat journalctl-setup-keys.txt`
    
    # reboot system
    reboot
    
    journalctl --verify --verify-key=`cat journalctl-setup-keys.txt`


## Expected result

journalctl --verify is passing.


## Further details

Always latest result in this scenario: [latest](http://phoenix-openqa.qam.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-Updates&machine=64bit&test=mau-extratests&version=15-SP1)
Comment 1 Franck Bui 2020-05-29 14:40:08 UTC
(In reply to Felix Niederwanger from comment #0)
> Fails on all current builds, reproducible in local VM (tested on Tumbleweed,
> SLES15-SP1, Leap 15.1, Leap 15.2 - all failing)
> 
>     sed -i 's/.*Storage=.*/Storage=persistent/' /etc/systemd/journald.conf

Hmm on SLE, persistent journals are disabled on purpose: rsyslog is supposed to be the logging system...
Comment 3 Felix Niederwanger 2020-06-18 07:36:36 UTC
(In reply to Franck Bui from comment #1)
> (In reply to Felix Niederwanger from comment #0)
> > Fails on all current builds, reproducible in local VM (tested on Tumbleweed,
> > SLES15-SP1, Leap 15.1, Leap 15.2 - all failing)
> > 
> >     sed -i 's/.*Storage=.*/Storage=persistent/' /etc/systemd/journald.conf
> 
> Hmm on SLE, persistent journals are disabled on purpose: rsyslog is supposed
> to be the logging system...

Hi Frank,

Hmmmm, this is true, yet we found the bug in a non-default/non-suggested configuration.

If we are sure, that is is not becoming a problem I can remove this part from the journalctl test.
Comment 4 Franck Bui 2020-06-18 07:48:53 UTC
Your test is still relevant for Leap/Factory anyway, so please keep it.
Comment 5 Felix Niederwanger 2020-06-18 08:00:21 UTC
Thanks for the clarification we'll do!
Comment 8 Felix Niederwanger 2020-07-31 09:45:15 UTC
If there are no complains, I'm gonna remove the part of the openQA test that results in this softfailure from all SLE variants, as persistent storage is not recommended.

Still leaving it on for Tumbleweed/Leap though, so it still gets noticed in Factory.
Comment 9 Franck Bui 2020-08-04 07:21:08 UTC
Maybe you could also change the product target for this bug to point to an openSUSE distro, preferably TW ?
Comment 10 Felix Niederwanger 2020-08-06 07:59:58 UTC
Hi Frank,

I am going through the SLES documentation and find back to SLE12-SP2 that persistent journal is supported: https://documentation.suse.com/sles/12-SP2/html/SLES-all/cha-journalctl.html#journalctl-persistent. So, while not enabled by default, it is still supported.

What is not documented is Forward Secure Sealing (FSS), so I would skip this part on SLES products, as those are causing the softfailures in openQA. The basic persistence checks remains untouched though.

Best,
Felix
Comment 13 Felix Niederwanger 2020-10-28 09:43:34 UTC
https://bugzilla.suse.com/show_bug.cgi?id=1178193 is similar to this bug, but not the same. There we have a race condition for `journalctl --verify` after `systemctl restart systemd-journald.service`.
I could not see the same race condition for this bug on a Leap 15.2 VM:

    leap15_2:~ # sleep 60 && journalctl --verify --verify-key=`cat journalctl-setup-keys.txt`
    PASS: /var/log/journal/da8648fdf8e84058a7870135ef77c294/system@d200a2194a874b40b31b8b77eb9a8faa-0000000000000001-0005b2b7dcadd597.journal
    448048: tag/entry realtime timestamp out of synchronization                                                                   
    File corruption detected at /var/log/journal/da8648fdf8e84058a7870135ef77c294/system.journal:448048 (of 8388608 bytes, 53%).  
    FAIL: /var/log/journal/da8648fdf8e84058a7870135ef77c294/system.journal (Bad message)
Comment 17 Franck Bui 2020-12-14 12:55:42 UTC
I reported this issue to upstream with some additional info: https://github.com/systemd/systemd/issues/17833
Comment 23 Oliver Kurz 2021-03-11 06:05:31 UTC
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_in_textmode
https://openqa.opensuse.org/tests/1319492

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released"
3. The label in the openQA scenario is removed