Bugzilla – Bug 1221307
[Build 5.32] clocksource warning appears after VM is paused in QEMU
Last modified: 2024-03-28 16:49:10 UTC
Created attachment 873448 [details]
journal

## Observation

openQA test in scenario sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-jeos-apparmor@uefi-virtio-vga fails in [journal_check](https://openqa.suse.de/tests/13713261/modules/journal_check/steps/9)

A journal scan found a warning message

Mar 05 00:30:28.028244 localhost kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 18210140184 wd_nsec: 512016532

that appears in openQA tests when openQA initiates creation of a VM snapshot.

I have tried to manually boot the SLES15-SP6-Minimal-VM.x86_64-kvm-and-xen-Build5.32.qcow2 image in QEMU, paused the VM for several seconds, resumed, and the message was triggered. I am not sure whether this is expected, or whether it can possibly be a symptom of a more severe issue.

openQA log:

[2024-03-05T01:30:27.987090+01:00] [debug] [pid:102123] ||| finished grub2_gfxmode jeos (runtime: 1 s)
[2024-03-05T01:30:27.987847+01:00] [debug] [pid:102123] Creating a VM snapshot lastgood
[2024-03-05T01:30:27.988261+01:00] [debug] [pid:102313] EVENT {"data":{"offset":-1,"qom-path":"\/machine\/unattached\/device[8]"},"event":"RTC_CHANGE","timestamp":{"microseconds":34814,"seconds":1709598324}}
[2024-03-05T01:30:27.988342+01:00] [debug] [pid:102313] EVENT {"data":{"offset":-1,"qom-path":"\/machine\/unattached\/device[8]"},"event":"RTC_CHANGE","timestamp":{"microseconds":156075,"seconds":1709598324}}
[2024-03-05T01:30:27.988424+01:00] [debug] [pid:102313] EVENT {"data":{"path":"\/machine\/peripheral-anon\/device[3]\/virtio-backend"},"event":"NIC_RX_FILTER_CHANGED","timestamp":{"microseconds":809687,"seconds":1709598382}}
[2024-03-05T01:30:27.988491+01:00] [debug] [pid:102313] EVENT {"data":{"offset":-1,"qom-path":"\/machine\/unattached\/device[8]"},"event":"RTC_CHANGE","timestamp":{"microseconds":115460,"seconds":1709598397}}
[2024-03-05T01:30:27.988577+01:00] [debug] [pid:102313] Saving snapshot (Current VM state is running)
[2024-03-05T01:30:28.012499+01:00] [debug] [pid:102313] EVENT {"event":"STOP","timestamp":{"microseconds":12257,"seconds":1709598628}}
[2024-03-05T01:30:28.020851+01:00] [debug] [pid:102313] <<< consoles::serial_screen::peak()
[2024-03-05T01:30:28.020966+01:00] [info] [pid:102313] ::: consoles::serial_screen::peak: Peaked 8 bytes
[2024-03-05T01:30:28.038599+01:00] [debug] [pid:102313] blockdev-snapshot-sync({'arguments' => {'format' => 'qcow2','node-name' => 'hd0-overlay0','snapshot-file' => '/var/lib/openqa/pool/22/raid/hd0-overlay1','snapshot-node-name' => 'hd0-overlay1'},'execute' => 'blockdev-snapshot-sync'}) -> {'return' => {}}
[2024-03-05T01:30:28.042213+01:00] [debug] [pid:102313] blockdev-snapshot-sync({'arguments' => {'format' => 'qcow2','node-name' => 'pflash-code-overlay0','snapshot-file' => '/var/lib/openqa/pool/22/raid/pflash-code-overlay1','snapshot-node-name' => 'pflash-code-overlay1'},'execute' => 'blockdev-snapshot-sync'}) -> {'error' => {'class' => 'GenericError','desc' => 'Cannot find device=\'\' nor node-name=\'pflash-code-overlay0\''}}
[2024-03-05T01:30:28.051676+01:00] [debug] [pid:102313] blockdev-snapshot-sync({'arguments' => {'device' => 'pflash-code-overlay0','format' => 'qcow2','snapshot-file' => '/var/lib/openqa/pool/22/raid/pflash-code-overlay1','snapshot-node-name' => 'pflash-code-overlay1'},'execute' => 'blockdev-snapshot-sync'}) -> {'return' => {}}
[2024-03-05T01:30:28.053846+01:00] [debug] [pid:102313] blockdev-snapshot-sync({'arguments' => {'format' => 'qcow2','node-name' => 'pflash-vars-overlay0','snapshot-file' => '/var/lib/openqa/pool/22/raid/pflash-vars-overlay1','snapshot-node-name' => 'pflash-vars-overlay1'},'execute' => 'blockdev-snapshot-sync'}) -> {'error' => {'class' => 'GenericError','desc' => 'Cannot find device=\'\' nor node-name=\'pflash-vars-overlay0\''}}
[2024-03-05T01:30:28.062631+01:00] [debug] [pid:102313] blockdev-snapshot-sync({'arguments' => {'device' => 'pflash-vars-overlay0','format' => 'qcow2','snapshot-file' => '/var/lib/openqa/pool/22/raid/pflash-vars-overlay1','snapshot-node-name' => 'pflash-vars-overlay1'},'execute' => 'blockdev-snapshot-sync'}) -> {'return' => {}}
[2024-03-05T01:30:28.070529+01:00] [debug] [pid:102313] EVENT {"data":{"status":"setup"},"event":"MIGRATION","timestamp":{"microseconds":70303,"seconds":1709598628}}
[2024-03-05T01:30:28.571045+01:00] [debug] [pid:102313] EVENT {"data":{"pass":1},"event":"MIGRATION_PASS","timestamp":{"microseconds":127074,"seconds":1709598628}}
[2024-03-05T01:30:28.571204+01:00] [debug] [pid:102313] EVENT {"data":{"status":"active"},"event":"MIGRATION","timestamp":{"microseconds":127166,"seconds":1709598628}}
[2024-03-05T01:30:28.571759+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:28.571826+01:00] [debug] [pid:102313] Migrating remaining bytes: 1063395328
[2024-03-05T01:30:29.072531+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:29.072636+01:00] [debug] [pid:102313] Migrating remaining bytes: 1043554304
[2024-03-05T01:30:29.573312+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:29.573391+01:00] [debug] [pid:102313] Migrating remaining bytes: 1014689792
[2024-03-05T01:30:30.074163+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:30.074241+01:00] [debug] [pid:102313] Migrating remaining bytes: 989831168
[2024-03-05T01:30:30.574959+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:30.575032+01:00] [debug] [pid:102313] Migrating remaining bytes: 963866624
[2024-03-05T01:30:31.075697+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:31.075763+01:00] [debug] [pid:102313] Migrating remaining bytes: 940011520
[2024-03-05T01:30:31.576416+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:31.576486+01:00] [debug] [pid:102313] Migrating remaining bytes: 919400448
[2024-03-05T01:30:32.077131+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:32.077199+01:00] [debug] [pid:102313] Migrating remaining bytes: 901033984
[2024-03-05T01:30:32.577803+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:32.577876+01:00] [debug] [pid:102313] Migrating remaining bytes: 884588544
[2024-03-05T01:30:33.078546+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:33.078612+01:00] [debug] [pid:102313] Migrating remaining bytes: 867962880
[2024-03-05T01:30:33.579267+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:33.579346+01:00] [debug] [pid:102313] Migrating remaining bytes: 855597056
[2024-03-05T01:30:34.080037+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:34.080107+01:00] [debug] [pid:102313] Migrating remaining bytes: 841744384
[2024-03-05T01:30:34.580777+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:34.580862+01:00] [debug] [pid:102313] Migrating remaining bytes: 826003456
[2024-03-05T01:30:35.081509+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:35.081575+01:00] [debug] [pid:102313] Migrating remaining bytes: 808022016
[2024-03-05T01:30:35.582262+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:35.582339+01:00] [debug] [pid:102313] Migrating remaining bytes: 794443776
[2024-03-05T01:30:36.083036+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:36.083105+01:00] [debug] [pid:102313] Migrating remaining bytes: 771125248
[2024-03-05T01:30:36.583796+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:36.583875+01:00] [debug] [pid:102313] Migrating remaining bytes: 757317632
[2024-03-05T01:30:37.084614+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:37.084686+01:00] [debug] [pid:102313] Migrating remaining bytes: 739586048
[2024-03-05T01:30:37.585385+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:37.585455+01:00] [debug] [pid:102313] Migrating remaining bytes: 719380480
[2024-03-05T01:30:38.086159+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:38.086236+01:00] [debug] [pid:102313] Migrating remaining bytes: 696000512
[2024-03-05T01:30:38.586904+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:38.586970+01:00] [debug] [pid:102313] Migrating remaining bytes: 673296384
[2024-03-05T01:30:39.087651+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:39.087720+01:00] [debug] [pid:102313] Migrating remaining bytes: 651853824
[2024-03-05T01:30:39.588395+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:39.588481+01:00] [debug] [pid:102313] Migrating remaining bytes: 635936768
[2024-03-05T01:30:40.089124+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:40.089191+01:00] [debug] [pid:102313] Migrating remaining bytes: 428040192
[2024-03-05T01:30:40.589866+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:40.589959+01:00] [debug] [pid:102313] Migrating remaining bytes: 413315072
[2024-03-05T01:30:41.090632+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:41.090699+01:00] [debug] [pid:102313] Migrating remaining bytes: 395087872
[2024-03-05T01:30:41.591402+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:41.591475+01:00] [debug] [pid:102313] Migrating remaining bytes: 372772864
[2024-03-05T01:30:42.092176+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:42.092243+01:00] [debug] [pid:102313] Migrating remaining bytes: 196149248
[2024-03-05T01:30:42.592907+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:42.592974+01:00] [debug] [pid:102313] Migrating remaining bytes: 133181440
[2024-03-05T01:30:43.093590+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:43.093658+01:00] [debug] [pid:102313] Migrating remaining bytes: 112726016
[2024-03-05T01:30:43.594320+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:43.594387+01:00] [debug] [pid:102313] Migrating remaining bytes: 93110272
[2024-03-05T01:30:44.095043+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:44.095109+01:00] [debug] [pid:102313] Migrating remaining bytes: 65236992
[2024-03-05T01:30:44.595780+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:44.595847+01:00] [debug] [pid:102313] Migrating remaining bytes: 42442752
[2024-03-05T01:30:45.096539+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:45.096617+01:00] [debug] [pid:102313] Migrating remaining bytes: 17788928
[2024-03-05T01:30:45.596919+01:00] [debug] [pid:102313] EVENT {"data":{"pass":2},"event":"MIGRATION_PASS","timestamp":{"microseconds":278348,"seconds":1709598645}}
[2024-03-05T01:30:45.597026+01:00] [debug] [pid:102313] EVENT {"data":{"pass":3},"event":"MIGRATION_PASS","timestamp":{"microseconds":327458,"seconds":1709598645}}
[2024-03-05T01:30:45.597110+01:00] [debug] [pid:102313] EVENT {"data":{"status":"completed"},"event":"MIGRATION","timestamp":{"microseconds":382518,"seconds":1709598645}}
[2024-03-05T01:30:45.597246+01:00] [debug] [pid:102313] Migrating total bytes: 1084956672
[2024-03-05T01:30:45.597314+01:00] [debug] [pid:102313] Migrating remaining bytes: 0
[2024-03-05T01:30:45.597875+01:00] [debug] [pid:102313] Snapshot complete
[2024-03-05T01:30:45.710598+01:00] [debug] [pid:102313] EVENT {"event":"RESUME","timestamp":{"microseconds":710448,"seconds":1709598645}}
[2024-03-05T01:30:45.720737+01:00] [debug] [pid:102313] QEMU: Formatting '/var/lib/openqa/pool/22/raid/hd0-overlay1', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=25769803776 backing_file=/var/lib/openqa/pool/22/raid/hd0-overlay0 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
[2024-03-05T01:30:45.720826+01:00] [debug] [pid:102313] QEMU: Formatting '/var/lib/openqa/pool/22/raid/pflash-code-overlay1', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1966080 backing_file=/var/lib/openqa/pool/22/raid/pflash-code-overlay0 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
[2024-03-05T01:30:45.720916+01:00] [debug] [pid:102313] QEMU: Formatting '/var/lib/openqa/pool/22/raid/pflash-vars-overlay1', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=131072 backing_file=/var/lib/openqa/pool/22/raid/pflash-vars-overlay0 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
[2024-03-05T01:30:45.720927+01:00] [debug] [pid:102123] ||| starting diskusage tests/jeos/diskusage.pm

## Test suite description

Apparmor test suite for JeOS. It is enabled from 15-SP4, where JeOS (MinimalVM) comes with apparmor enabled by default.

## Reproducible

Fails since (at least) Build [4.44](https://openqa.suse.de/tests/13452158)

## Expected result

Last good: [4.23](https://openqa.suse.de/tests/13450480) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=JeOS-for-kvm-and-xen&machine=uefi-virtio-vga&test=jeos-apparmor&version=15-SP6)
Created attachment 873449 [details]
journal_tw

The same warning has been spotted in TW as well.
This message comes from a recent fix from Jiri, upstream commit 644649553508b9bacf0fc7a5bdc4f9e0165576a5 ("clocksource: Skip watchdog check for large watchdog intervals"). It has been backported to the SLE15-SP6 branch. Adding Jiri to Cc.
(In reply to Martin Loviska from comment #0)
> I have tried manually boot
> SLES15-SP6-Minimal-VM.x86_64-kvm-and-xen-Build5.32.qcow2 image in QEMU,
> paused the VM for several seconds, resumed and the message was triggered.
> I am not sure whether this is expected, or it can possibly be a symptom of a
> more severe issue.

Considering that the VM was paused, it is expected. The kernel runs a clocksource watchdog every 500 milliseconds. The tsc clocksource is checked by the kvm-clock clocksource. The clocksources available on the system:

> $ grep 'PM: RTC\|clocksource' journal_clock_warning.log
> Mar 05 00:25:44.753828 localhost kernel: clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> Mar 05 00:25:44.754036 localhost kernel: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
> Mar 05 00:25:44.754065 localhost kernel: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1fb63109b96, max_idle_ns: 440795265316 ns
> Mar 05 00:25:44.754339 localhost kernel: clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
> Mar 05 00:25:44.754356 localhost kernel: PM: RTC time: 00:25:43, date: 2024-03-05
> Mar 05 00:25:44.766554 localhost kernel: clocksource: Switched to clocksource kvm-clock
> Mar 05 00:25:44.766659 localhost kernel: clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
> Mar 05 00:25:44.767349 localhost kernel: clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1fb63109b96, max_idle_ns: 440795265316 ns

The clocksource watchdog measures the time interval since the last watchdog execution.
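The effect of the pause can be illustrated numerically from the values in the reported message. A minimal sketch (plain Python; the interpretation that the tsc kept advancing across the pause while the watchdog clocksource did not is my reading of the numbers, not something stated in the journal):

```python
# Values taken from the reported journal message, in nanoseconds.
cs_nsec = 18_210_140_184  # interval measured on the checked clocksource (tsc)
wd_nsec = 512_016_532     # interval measured on the watchdog clocksource

# The watchdog fires roughly every 500 ms, and wd_nsec is indeed close to
# that period.  cs_nsec, however, spans the whole VM pause, so the
# difference between the two approximates the pause length.
pause_estimate_s = (cs_nsec - wd_nsec) / 1e9
print(f"approximate pause length: {pause_estimate_s:.1f} s")  # ~17.7 s
```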
The message notifies the user that one watchdog check was skipped because the measured interval was too long for either the clocksource being checked (cs_nsec) or the watchdog clocksource (wd_nsec):

> Mar 05 00:30:28.028244 localhost kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 18210140184 wd_nsec: 512016532

The clocksource watchdog on the tsc clocksource keeps running after reporting the message. Commit 644649553508 ("clocksource: Skip watchdog check for large watchdog intervals") makes sure that the margin used for evaluating time skew is adequate by limiting the range of intervals that are checked. Without the patch, the tsc clocksource would be marked as unstable (and watchdog checks stopped). Marking the tsc clocksource as unstable has no bearing on kvm-clock, which is the currently active clocksource. The "Long readout interval, skipping watchdog check" message is of little significance when VMs are paused or migrated.
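A simplified model of the interval limiting described above (plain Python, not the kernel code; the 2x-period cap, i.e. 1 s, is my reading of the upstream commit, and the function name is mine):

```python
# Sketch of the check added by commit 644649553508 ("clocksource: Skip
# watchdog check for large watchdog intervals"): when either measured
# interval exceeds a cap (assumed here to be twice the nominal 500 ms
# watchdog period), the skew check is skipped instead of marking the
# clocksource unstable, because the error margin would not be meaningful.
WATCHDOG_PERIOD_NS = 500_000_000
INTERVAL_MAX_NS = 2 * WATCHDOG_PERIOD_NS

def watchdog_check_skipped(cs_nsec: int, wd_nsec: int) -> bool:
    """Return True when the watchdog skips the skew check for this pass."""
    return cs_nsec > INTERVAL_MAX_NS or wd_nsec > INTERVAL_MAX_NS

# Values from the reported message: the tsc interval spans the VM pause.
print(watchdog_check_skipped(18_210_140_184, 512_016_532))  # True: skipped
print(watchdog_check_skipped(500_123_456, 500_120_000))     # False: normal tick
```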
Is this fix coming soon? We have an SR deadline on 17.04.
I do not think this can be fixed. The TSC counter is a piece of hardware; when a VM is paused, it makes no sense to pause the TSC counter as well. The behaviour is expected. A suitable workaround to prevent the checks and the message would be passing tsc=nowatchdog on the kernel command line.
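For reference, the workaround can be applied like any other kernel command-line option; a sketch for a grub2-based system (the file path and update command are the usual SLE/openSUSE ones, adjust for your setup):

```shell
# Append tsc=nowatchdog to the kernel command line, e.g. in /etc/default/grub:
#   GRUB_CMDLINE_LINUX_DEFAULT="... tsc=nowatchdog"
# Then regenerate the bootloader configuration and reboot:
grub2-mkconfig -o /boot/grub2/grub.cfg
```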