Bug 954908 - Boot after installation with AutoYaST hangs
Status: VERIFIED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Network
Version: Current
Hardware: Other Other
Priority: P2 - High
Severity: Major
Assigned To: Stefan Schubert
https://trello.com/c/w7QH2Pgx
Reported: 2015-11-13 08:45 UTC by Fabian Vogt
Modified: 2021-03-09 12:25 UTC (History)


Attachments
y2log (468.38 KB, application/gzip), 2015-11-16 12:41 UTC, Fabian Vogt
AutoYaST XML (12.92 KB, text/xml), 2015-11-16 12:45 UTC, Fabian Vogt
mt-test1.tgz -- my test results (313.31 KB, application/x-gzip-compressed), 2015-12-02 13:00 UTC, Marius Tomaschewski
screenshot of hanging reboot (16.55 KB, image/png), 2015-12-09 07:43 UTC, Thomas Blume

Description Fabian Vogt 2015-11-13 08:45:19 UTC
After installing the latest tumbleweed snapshot with an AutoYaST XML,
the system hangs on the first boot during the YaST start:

> Probing connected terminal...
> 
> Initializing virtual console...
>                                                                        
> Found a Linux console terminal on /dev/console (122 columns x 55 lines).

and nothing happens.
Adding " 3" to the kernel cmdline and running "systemctl disable" for YaST2-Firstboot.service and Yast2-Second-Stage.service makes it boot again.
Comment 1 Stefan Schubert 2015-11-16 12:08:06 UTC
Could you attach a log file and the corresponding autoinst.xml?
Please use save_y2logs for generating the logfile.
Comment 2 Fabian Vogt 2015-11-16 12:41:25 UTC
Created attachment 656051 [details]
y2log
Comment 3 Fabian Vogt 2015-11-16 12:45:11 UTC
Created attachment 656052 [details]
AutoYaST XML
Comment 4 Stefan Schubert 2015-12-01 08:55:32 UTC
Thanks, I can reproduce the bug. The installation workflow hangs while starting the network via "rcnetwork start". So I assume that is around wickedd.
Comment 5 Thomas Renninger 2015-12-01 16:46:25 UTC
Pawel, Marius.
Could someone have a look at this one, please.

Schubi: Afaik this only happens with normal auto installation.
Can you provide a hint how to run into this and how to debug this remotely if possible or does one have to reproduce on a local machine and then switch to an available console?
Comment 6 Marius Tomaschewski 2015-12-02 11:02:12 UTC
(In reply to Stefan Schubert from comment #4)
> Thanks, I can reproduce the bug. The installation workflow hangs while
> starting the network via "rcnetwork start".

I cannot see anything like this -- where? I cannot see any usable log
files attached -- there is not even y2log (only ./YaST2/y2log-1.gz).

> So I assume that is around wickedd.

I don't think so. I'd say it is about systemd dependencies in the
Firstboot.service and Yast2-Second-Stage.service services (causing some
loop/deadlock).

IMO there was already another (SLE-12-SP1) bug about this. [Further, yast2
should IMO use a Yast2-Second-Stage.target, so the system boots into this
special target. When the configuration is done, it should boot default.target
and not break start dependencies that were scheduled in systemd but canceled
due to random calls of "systemctl restart --ignore-dependencies ..." by yast2.]

(In reply to Thomas Renninger from comment #5)
> Pawel, Marius.
> Could someone have a look at this one, please.
> 
> Schubi: Afaik this only happens with normal auto installation.
> Can you provide a hint how to run into this and how to debug this remotely
> if possible or does one have to reproduce on a local machine and then switch
> to an available console?

Add systemd.log_level=debug systemd.log_target=kmsg to the kernel boot
parameters, then call "journalctl -b -o short-precise > journal.txt" and
attach journal.txt.

To get wicked logs, set WICKED_DEBUG=all in /etc/sysconfig/network/config.
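
A minimal sketch of the debugging steps above. The helper name set_sysconfig_var is hypothetical (it is not part of wicked or YaST), and the sketch assumes GNU sed for "sed -i". The journalctl call and the WICKED_DEBUG setting are the ones named in this comment.

```shell
#!/bin/sh
# Set KEY="VALUE" in a sysconfig-style file: replace an existing
# assignment, or append one if the key is not present yet.
# set_sysconfig_var is a hypothetical helper written for this sketch.
set_sysconfig_var() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$file"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# On the affected system (as root), after booting with
#   systemd.log_level=debug systemd.log_target=kmsg
# on the kernel command line:
#   set_sysconfig_var /etc/sysconfig/network/config WICKED_DEBUG all
#   journalctl -b -o short-precise > journal.txt
```

The two commands at the end are left as comments on purpose, so that the helper can be tried on a scratch copy of the config file first.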
Comment 8 Stefan Schubert 2015-12-02 11:49:17 UTC
(In reply to Marius Tomaschewski from comment #6)
> (In reply to Stefan Schubert from comment #4)
> > Thanks, I can reproduce the bug. The installation workflow hangs while
> > starting the network via "rcnetwork start".
> 
> I cannot see anything like this -- where? I cannot see any usable log
> files attached -- there is not even y2log (only ./YaST2/y2log-1.gz).

That's because YaST is not really started at all.
Just look at y2start; that is where it is hanging.

> 
> > So I assume that is around wickedd.
> 
> I don't think so. I'd say it is about systemd dependencies in the
> Firstboot.service and Yast2-Second-Stage.service services (causing some
> loop/deadlock).
> 
> IMO there was already another (SLE-12-SP1) bug about this. [Further, yast2
> should IMO use a Yast2-Second-Stage.target, so the system boots into this
> special target. When the configuration is done, it should boot default.target
> and not break start dependencies that were scheduled in systemd but canceled
> due to random calls of "systemctl restart --ignore-dependencies ..." by yast2.]
> 
> (In reply to Thomas Renninger from comment #5)
> > Pawel, Marius.
> > Could someone have a look at this one, please.
> > 
> > Schubi: Afaik this only happens with normal auto installation.
> > Can you provide a hint how to run into this and how to debug this remotely
> > if possible or does one have to reproduce on a local machine and then switch
> > to an available console?
> 
> Add systemd.log_level=debug systemd.log_target=kmsg to the kernel boot
> parameters, then call "journalctl -b -o short-precise > journal.txt" and
> attach journal.txt.
> 
> To get wicked logs, set WICKED_DEBUG=all in /etc/sysconfig/network/config.

I think that's a question to the reporter too:-)
Comment 9 Fabian Vogt 2015-12-02 11:55:50 UTC
I'm not entirely sure how to run those commands if no shell is running...
Adjusting needinfo to thomas, as he has access to the test environment.
Comment 10 Stefan Schubert 2015-12-02 12:10:56 UTC
(In reply to Fabian Vogt from comment #9)
> I'm not entirely sure how to run those commands if no shell is running...
> Adjusting needinfo to thomas, as he has access to the test environment.

With <Alt> <cursor right> you can switch to a console.
Comment 11 Fabian Vogt 2015-12-02 13:00:26 UTC
Those are empty, no getty@ service is running IIRC.
Comment 12 Marius Tomaschewski 2015-12-02 13:00:57 UTC
Created attachment 658084 [details]
mt-test1.tgz -- my test results

As visible in ps axfwww output, systemctl started tty-ask hooks for some reason.
Comment 13 Marius Tomaschewski 2015-12-02 14:11:04 UTC
The "rcnetwork start" -> "systemctl start --full network.service" job:

  725 ttyS0    Ss+    0:00 /bin/bash /usr/lib/YaST2/startup/YaST2.Second-Stage
 2815 ttyS0    S+     0:00  \_ /bin/bash /sbin/rcnetwork start
 2839 ttyS0    S+     0:00      \_ systemctl start --full network.service
 2845 ttyS0    S+     0:00          \_ /usr/bin/systemd-tty-ask-password-agent --watch
 2846 ttyS0    Sl+    0:00          \_ /usr/bin/pkttyagent --notify-fd 5 --fallback

has been merged with the normal job scheduled to start at boot:

Dec 02 13:45:10.992806 linux-jh53 systemd[1]: wicked.service: Installed new job wicked.service/start as 199
[...]
Dec 02 13:45:31.332172 linux-jh53 systemd[1]: wicked.service: Trying to enqueue job wicked.service/start/replace
Dec 02 13:45:31.332195 linux-jh53 systemd[1]: wicked.service: Merged into installed job wicked.service/start as 199
Dec 02 13:45:31.332216 linux-jh53 systemd[1]: network.target: Merged into installed job network.target/start as 202
Dec 02 13:45:31.332239 linux-jh53 systemd[1]: wicked.service: Enqueued job wicked.service/start as 199

_after_ YaST2-Firstboot and YaST2-Second-Stage.service as defined by their
"Before=...network.service" ordering definition.
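
To find such merges in a longer journal, a filter along the following lines may help. It is only a sketch: the function name merged_jobs is made up for illustration, and the pattern assumes the message wording shown in the excerpt above ("Merged into installed job").

```shell
#!/bin/sh
# Print "UNIT -> JOB" for every start request that systemd (PID 1) merged
# into an already installed job. Reads journal text on stdin, for example
# the journal.txt written by "journalctl -b -o short-precise".
# merged_jobs is a hypothetical helper name.
merged_jobs() {
    sed -n 's/.*systemd\[1\]: \([^:]*\): Merged into installed job \(.*\)$/\1 -> \2/p'
}

# Usage:
#   merged_jobs < journal.txt
```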
Comment 15 Stefan Schubert 2015-12-03 13:12:34 UTC
I have tried Marius' suggestions, but the result is the same.
Marius, which information is needed to go on?
Comment 16 Marius Tomaschewski 2015-12-04 07:51:32 UTC
(In reply to Stefan Schubert from comment #15)
> I have tried Marius' suggestions, but the result is the same.
> Marius, which information is needed to go on?

This is a question for systemd-maintainers how to solve this loop properly.
I just wrote how I _think_ it could work, but I'm not a systemd expert.
Comment 17 Stefan Schubert 2015-12-04 08:56:46 UTC
Added Ancor and Martin to that bug. They have made some changes in the past
around that area. Perhaps they have an idea too. :-)
Comment 18 Dr. Werner Fink 2015-12-04 09:45:49 UTC
(In reply to Marius Tomaschewski from comment #16)

Dependency rules are not systemd specific; they are ordinary math, to be exact the theory of directed acyclic graphs. This is the same as what physicists mean by the order of cause and effect.

The difficulty is normally to verify that a dependency chain remains acyclic.

With the list-dependencies command of systemctl it should be possible to detect the loop:

       list-dependencies NAME
           Shows required and wanted units of the specified unit. If no
           unit is specified, default.target is implied. Target units are
           recursively expanded. When --all is passed, all other units are
           recursively expanded as well.

       --reverse
           Show reverse dependencies between units with list-dependencies,
           i.e. units with dependencies of type Wants= or Requires= on the
           given unit.

       --after, --before
           Show which units are started after or before with
           list-dependencies, respectively.

So it is up to the reporter to show himself/herself and us the dependency chains, so that we can search for a loop.
Comment 19 Martin Vidner 2015-12-04 09:50:13 UTC
"`systemd-analyze dump` outputs a (usually very long) human-readable serialization of the complete server state."
It helped me debug the problem with the serial console (bsc#935965), maybe it helps here too.
Comment 21 Thomas Blume 2015-12-09 07:43:30 UTC
Created attachment 658788 [details]
screenshot of hanging reboot

See attached screenshot.
The loop is here:

YaST2-Second-Stage service executes: rcnetwork start
This gets translated to a start of wicked.
But wicked is waiting for YaST2-Second-Stage service to finish.

IMHO, YaST2-Second-Stage shouldn't do an explicit network start.
Instead, some dependencies should be added that start YaST2-Second-Stage.service after wicked.
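
The loop described above can be checked mechanically once the relations are written down as "first second" pairs. The sketch below uses tsort for that and assumes GNU coreutils tsort, which exits non-zero when its input contains a loop. The two edges are hand-written from the description in this bug, not taken from systemd output, and the function name has_ordering_loop is hypothetical.

```shell
#!/bin/sh
# Return 0 (true) if the "first second" pairs on stdin, each meaning
# "first has to finish before second", contain a loop.
# Assumes GNU coreutils tsort, which exits non-zero on a loop.
has_ordering_loop() {
    ! tsort >/dev/null 2>&1
}

# Hand-written edges (an assumption made for this sketch):
#   1. "Before=...network.service" orders YaST2-Second-Stage.service
#      ahead of wicked.service.
#   2. The blocking "rcnetwork start" inside YaST2-Second-Stage.service
#      waits for wicked.service to be started.
if printf '%s\n' \
    'YaST2-Second-Stage.service wicked.service' \
    'wicked.service YaST2-Second-Stage.service' | has_ordering_loop
then
    echo "loop detected"
fi
```

Note that the second edge is a runtime wait, not a unit file dependency, so it would not show up in "systemctl list-dependencies" output and has to be added by hand.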
Comment 22 Thomas Blume 2015-12-09 12:27:32 UTC
(In reply to Thomas Blume from comment #21)
> IMHO, YaST2-Second-Stage shouldn't do an explicit network start.
> Instead, some dependencies should be added that start
> YaST2-Second-Stage.service after wicked.

After killing the network start processes (pid 3915 and 3938 in the screenshot), the installation finishes and the network is up.
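
A sketch of how the blocked processes could be picked out of "ps axfwww" output instead of reading the PIDs off a screenshot. The function name stuck_network_start_pids is hypothetical, and the patterns assume the command lines shown in comment 13 ("rcnetwork start" and "systemctl start --full network.service").

```shell
#!/bin/sh
# Print the PIDs of the blocked network start processes.
# Reads "ps axfwww" style output on stdin; the PID is the first column.
# The "[ ]" in the patterns keeps the awk process from matching its own
# command line when it is fed from a live ps call.
# stuck_network_start_pids is a hypothetical helper name.
stuck_network_start_pids() {
    awk '/rcnetwork[ ]start|systemctl[ ]start --full network\.service/ { print $1 }'
}

# On the hanging system, from a root shell (review the PIDs before killing):
#   ps axfwww | stuck_network_start_pids
#   kill <pid> ...
```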
Comment 24 Stefan Schubert 2015-12-17 09:29:31 UTC
current state:
- I have removed all After= and Before= entries from the YaST2*.service files
  and the network has worked. So I can confirm that it is a YaST problem and
  not a wickedd problem.
- I have taken older versions of YaST2*.service and have been able to 
  reproduce this error. So, the changes made by Martin and Ancor in the past
  have no influence here.

going on....
Comment 25 Stefan Schubert 2015-12-17 15:05:16 UTC
I have cleaned up the dependencies, but all cases have to be tested.
So far I have tested an ssh installation.

These fixes are needed:

https://github.com/yast/yast-installation/pull/332

https://github.com/yast/yast-autoinstallation/pull/173
Comment 26 Stefan Schubert 2016-01-11 15:29:38 UTC
I have tested ssh, vnc and normal installation even under SLES12 and have not seen any error.
Comment 27 Stefan Schubert 2016-01-11 15:31:32 UTC
merged with master
Comment 28 Lars Marowsky-Bree 2016-01-19 11:05:16 UTC
Just confirming that this works for me again now too (using virt-install and autoyast2). Thanks!