Bug 1129476

Summary: Shutdown takes too much time when lvmetad doesn't respond to SIGKILL
Product: [openSUSE] openSUSE Distribution
Reporter: Bruno Pesavento <mail>
Component: Basesystem
Assignee: heming zhao <heming.zhao>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P2 - High
CC: axel.wein, evaessen, heming.zhao, mischa.salle, slindomansilla, terjejhanssen
Version: Leap 15.1
Target Milestone: ---
Hardware: x86-64
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Journal for a correct reboot
Journal for a reboot delayed 1m30s after "Reached target Shutdown"
# journalctl -b -1 | egrep -i 'ailed|atchd|boot|down' | egrep -vi 'sata|bootl'

Description Bruno Pesavento 2019-03-16 14:49:53 UTC
Created attachment 800285 [details]
Journal for a correct reboot

Often, but not always, shutdown is delayed by 1m30s, apparently capped by a timeout, on a fairly default GNOME install on an "Optimus" laptop.
Seen on all tested builds of 15.1 beta, currently at 429.2.
This seems to be a regression; it was not seen in Leap 15.0 or earlier openSUSE releases, IIRC.
Please find attached the journals for a correct reboot and for a delayed one; I don't see any meaningful clue in them.
Feel free to ask for additional logs or tests. The usual suspects (USB, external filesystems, network, Plymouth) are apparently not involved.
Comment 1 Bruno Pesavento 2019-03-16 14:51:03 UTC
Created attachment 800286 [details]
Journal for a reboot delayed 1m30s after "Reached target Shutdown"
Comment 2 Bruno Pesavento 2019-03-19 09:18:49 UTC
I can also reproduce the bug with a lightdm+lxqt desktop (3 out of 4 tries), so the problem is apparently not limited to the GNOME desktop.
So far I have been unable to reproduce it when booting to a console (multi-user target), so it might be related to graphical targets.
Comment 3 Bruno Pesavento 2019-04-24 21:20:13 UTC
While trying to debug this, I saw that the problem doesn't show up when booting with the following sequence:
> boot to multi-user.target (just adding '3' to the boot command line)
> login as root and issue 'systemctl isolate graphical.target' on VT1
> login as normal desktop user via GDM (desktop shows on VT2)
> logout as 'root' from VT1
> work on the desktop (VT2) as usual, reboot or shutdown from desktop when finished.
To me it looks like something goes wrong in the 'default' boot sequence: maybe the graphical target is started too soon on this HW (fast SSD and i7 4700HQ here), and starting it manually a few seconds later sorts things out?
I tried that 5 times with no problem; then I tried the 'default' boot directly to the graphical desktop again and the nasty 1m30s delay at shutdown was still there...
I hope this helps somebody make sense of this annoying situation.
Comment 4 Ed Vaessen 2019-05-28 17:24:14 UTC
I had the same problem.
Among many lines, the shutdown log showed these:

[ 77.019608] systemd-journald[473]: systemd-journald stopped as pid 473
[ 167.020886] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[ 167.027376] systemd-shutdown[1]: Sending SIGKILL to PID 484 (lvmetad). 
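
The excerpt above can be checked mechanically. The following is a minimal shell sketch, assuming the log text has the bracketed timestamp format shown here; the function names `sigkilled_processes` and `shutdown_gap_seconds` are made up for illustration and are not part of any tool.

```shell
# List the processes that systemd-shutdown had to SIGKILL.
# Reads log text on stdin and prints "PID name", one per line.
sigkilled_processes() {
    sed -n 's/.*Sending SIGKILL to PID \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Print the seconds between "systemd-journald stopped" and the first
# "Sending SIGKILL to remaining processes" line, based on the
# bracketed timestamps at the start of each line.
shutdown_gap_seconds() {
    awk '
        { ts = $0; sub(/^\[ */, "", ts); sub(/\].*/, "", ts) }
        /systemd-journald stopped/ { start = ts }
        /Sending SIGKILL to remaining processes/ && start != "" {
            printf "%.1f\n", ts - start
            exit
        }'
}
```

Fed the three lines above, the first function reports PID 484 (lvmetad) and the second reports a gap of about 90 seconds, which matches the 1m30s delay described in this bug.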

As I do not need LVM, I removed the following packages:

liblvm2app2_2
liblvm2cmd2_02
lvm2

The service has vanished and shutdown is no longer delayed.
Comment 5 Bruno Pesavento 2019-05-30 14:26:22 UTC
Setting the lvm2-monitor service to start "manually" and possibly stopping the lvm2-lvmetad and lvm2-lvmpolld services seems to be enough to solve the problem.
I did 10 reboots without problems; with the default services config I would have expected 6 or 7 delayed reboots.
BTW, updated to the current released version in the meantime.
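
A minimal shell sketch of the workaround described above. The unit names `lvm2-monitor.service`, `lvm2-lvmetad.service` and `lvm2-lvmpolld.service` are assumptions inferred from the service names in this comment (check them with `systemctl list-unit-files 'lvm2-*'`), and the helpers `run` and `lvm_shutdown_workaround` are hypothetical names used only for illustration.

```shell
# Run a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumed unit names; adjust to whatever your installation actually ships.
lvm_shutdown_workaround() {
    # "Start manually": do not start lvm2-monitor automatically at boot.
    run systemctl disable lvm2-monitor.service
    # Stop the metadata and polling daemons for the current session.
    run systemctl stop lvm2-lvmetad.service lvm2-lvmpolld.service
}
```

Calling the function with `DRY_RUN=1` set prints the two `systemctl` commands without changing anything; running it for real needs root. If the daemons are socket-activated on your system, the matching `.socket` units may start them again, which this sketch does not handle. It is only suitable for machines that do not depend on LVM.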

lvm as the culprit seems to be confirmed by at least one other user, see:
https://forums.opensuse.org/showthread.php/536145-OpenSuSE-Leap-15-1-shutdown-reboot-delay-of-90-seconds
Comment 6 Felix Miata 2019-08-19 06:21:31 UTC
Created attachment 814397 [details]
# journalctl -b -1 | egrep -i 'ailed|atchd|boot|down' | egrep -vi 'sata|bootl'

I'm seeing the watchdog not stopping painfully often. The latest instance is from a 15.1 installation that had last been zypper up'd two months ago, running kernel 28.7, updated on this boot to 28.13. I removed the three lvm2 rpms, and shutdown became like Fedora's normal: virtually instant.

The previous instance of the watchdog not stopping, with a long shutdown delay, was on the same PC booted to 15.0, only minutes before the first boot of the day to 15.1.

Neither of these produced a countdown timer reporting the delay, as often appears when stopping a service that never started in the first place.
Comment 7 Terje J. Hanssen 2019-08-21 20:24:34 UTC
I confirm that the same "slow shutdown" and "slow reboot" occur with 15.1 on my Dell XPS 13 ultrabook and on Asus and Supermicro workstations. After the console message "Reached target shutdown" there is a long delay, which at first made me think the system had hung.

Only "init 0" from the command line powers down the machines immediately after "Reached target shutdown".
Comment 8 Mischa Salle 2019-08-26 08:25:29 UTC
The better solution is the one-line fix from https://github.com/lvmteam/lvm2/issues/17 since it works *with* LVM.
I had the same issue (on a new 15.1 install that actually uses LVM, so I can't just remove the package) and this fixes it.

Note that bugs https://bugzilla.opensuse.org/show_bug.cgi?id=1096241 and
https://bugzilla.opensuse.org/show_bug.cgi?id=1142587 (both for Tumbleweed) are most probably the same.

The required upstream patch can be found at https://sourceware.org/git/?p=lvm2.git;a=blobdiff;f=scripts/lvm2_lvmetad_systemd_red_hat.service.in;h=960f32dab714e09012a63e2bd23f4261be34c655;hp=92e6d695f157a28739f37a1afafdd2f471466a8e;hb=0a726a7e268b31856615491809af73bda5d4d6f9;hpb=b79f1e176f013167ca9798efb55eaf048d64e042
Comment 9 heming zhao 2019-11-06 03:14:04 UTC
Hello,

Bug #1096241 (Tumbleweed), bug #1129476 (Leap 15.1), bug #1142587 (Tumbleweed) and bug #1155668 (SLES 15 SP1) are the same issue.

I will close the bugs as follows:
1096241 "resolved->fix" <== just mark as fixed
1129476 "resolved->dup->1096241"
1142587 "resolved->dup->1096241"
1155668 "resolved->fix" <== use this bug number for the fix

openSUSE Leap 15.1 will automatically get the backport from SLES 15 SP1. Please wait.
openSUSE Tumbleweed has lvm2 version 2.3.05+, which already contains these patches, so it needs no fix.

Thanks.

*** This bug has been marked as a duplicate of bug 1096241 ***
Comment 10 Sergio Lindo Mansilla 2020-07-27 15:17:01 UTC
Reopening as resolving https://bugzilla.suse.com/show_bug.cgi?id=1096241 didn't resolve the problem for Leap 15.1

See: https://bugzilla.suse.com/show_bug.cgi?id=1096241#c43
Comment 12 Axel Wein 2020-08-16 16:08:38 UTC
I think I have the same issue with Leap 15.2 (German), but it is more extreme. Shutdown is often delayed by 1:30 min, yes. Starting up takes 2:40 min (with openSUSE 13.2 it was 1:25 min on the same machine).

I can't delete LVM!

systemd-analyze plot shows about 1:31 min for the initrd. But the terrible thing is that booting up (not only rebooting) very often ends with "Enter administrator password for maintenance (or press Ctrl + D to continue)", so I can't really use the OS.

I'm not yet proficient enough with Linux to change the initrd as described above.
Comment 13 heming zhao 2020-08-17 01:06:01 UTC
(In reply to Axel Wein from comment #12)
> I think I have the same issue with Leap 15.2 (German), but it is more
> extreme. Shutdown is often delayed by 1:30 min, yes. Starting up takes
> 2:40 min (with openSUSE 13.2 it was 1:25 min on the same machine).
> 
> I can't delete LVM!
> 
> systemd-analyze plot shows about 1:31 min for the initrd. But the terrible
> thing is that booting up (not only rebooting) very often ends with
> "Enter administrator password for maintenance (or press Ctrl + D to
> continue)", so I can't really use the OS.
> 
> I'm not yet proficient enough with Linux to change the initrd as
> described above.

This bug's title relates to lvmetad. In Leap 15.2, lvm2 no longer contains the lvmetad code, so your boot/shutdown issue is a different one. If this issue concerns you, please file a new bug.

When you file the bug, please upload the following info:
- supportconfig

- systemd-analyze plot

- system booting/rebooting or shutting down log.

```
add to kernel cmdline: splash=verbose systemd.show_status=0 rd.break=pre-shutdown rd.break=shutdown
delete "splash=silent quiet"

use ctrl-d to break out of the hang when booting/rebooting/shutting down.
```
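
The command-line change requested above can be sketched as a small shell function. This is only an illustration of the string rewrite, assuming the current kernel command line is passed in as one argument; the function name `debug_cmdline` is hypothetical, and how the result is applied (for example by editing the boot entry in the bootloader menu) is left out.

```shell
# Drop "splash=silent" and "quiet" from a kernel command line and append
# the debug options requested above. Prints the resulting line.
# The body runs in a subshell so that `set -f` and the variables stay local.
debug_cmdline() (
    set -f    # no filename expansion while word-splitting $1
    out=""
    for arg in $1 splash=verbose systemd.show_status=0 \
               rd.break=pre-shutdown rd.break=shutdown; do
        case "$arg" in
            splash=silent|quiet) ;;            # options to delete
            *) out="${out:+$out }$arg" ;;      # keep everything else
        esac
    done
    printf '%s\n' "$out"
)
```

For example, `debug_cmdline "$(cat /proc/cmdline)"` prints the adjusted line for the running kernel without changing any configuration.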
Comment 14 heming zhao 2020-09-01 06:42:32 UTC
Hello all,

For c#12, please see comment c#13.

Based on c#11, I think this bug is completely fixed, so I am closing it as fixed.