Bugzilla – Bug 1155545
[LiveInst] Upgrade from Leap 15.0 using the live upgrade fails
Last modified: 2020-01-21 14:30:07 UTC
Created attachment 822884 [details]
YaST logs (from the assets page)

I see two issues:

- It runs "grub2-editenv list" apparently in the wrong context (/ is livecd)
- It thinks the system was booted with EFI, but it wasn't (EFI: 0 in install.inf)

## Observation

openQA test in scenario opensuse-Tumbleweed-KDE-Live-x86_64-kde_live_upgrade_leap_15.0@64bit fails in
[disable_grub_timeout](https://openqa.opensuse.org/tests/1068933/modules/disable_grub_timeout/steps/14)

## Test suite description

Uses the live installer on the kde live media for upgrading the system. +QEMURAM=2048 is necessary as the HDD_1 is only available for aarch64/64-bit and more RAM is necessary.

## Reproducible

Fails since (at least) Build [20190921](https://openqa.opensuse.org/tests/1038441)

## Expected result

Last good: (unknown) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=KDE-Live&machine=64bit&test=kde_live_upgrade_leap_15.0&version=Tumbleweed)
*** Bug 1155515 has been marked as a duplicate of this bug. ***
Fails for Leap 15.2 now as well, while it worked two weeks ago.
Josef, could you check it? BTW keep in mind that the Live installer is a normal installer, there is just a small wrapper. But it contains some EFI related workaround, see https://build.opensuse.org/package/view_file/openSUSE:Factory/live-net-installer/start-install.sh?expand=1
(In reply to Ladislav Slezák from comment #3)
> Josef, could you check it? BTW keep in mind that the Live installer is a
> normal installer, there is just a small wrapper. But it contains some EFI
> related workaround, see
> https://build.opensuse.org/package/view_file/openSUSE:Factory/live-net-installer/start-install.sh?expand=1

Well, that code only sets EFI when there is already an EFI variable, so nothing strange there. I will check the failed tests.
(In reply to Fabian Vogt from comment #0)
> Created attachment 822884 [details]
> YaST logs (from the assets page)
>
> I see two issues:
>
> - It runs "grub2-editenv list" apparently in the wrong context (/ is livecd)

Yes, as SCR is not yet switched. We probably need to switch it there so that the bootloader is read from the selected target disk.

> - It thinks the system was booted with EFI, but it wasn't (EFI: 0 in
> install.inf)

Well, the bootloader uses its own method to detect EFI for live installation (do not ask me why, but it has been there for a very long time), see
https://github.com/yast/yast-bootloader/blob/master/src/lib/bootloader/bootloader_factory.rb#L91

So can you check what happens on your live system when you run the following? Thanks

modprobe efivars
ls /sys/firmware/efi/systab
(In reply to Josef Reidinger from comment #5)
> (In reply to Fabian Vogt from comment #0)
> > Created attachment 822884 [details]
> > YaST logs (from the assets page)
> >
> > I see two issues:
> >
> > - It runs "grub2-editenv list" apparently in the wrong context (/ is livecd)
>
> yes, as SCR is not yet switched. We probably need to switch it for reading
> bootloader there for selected target disk.

Likely. When the error message appears, "chroot /mnt grub2-editenv list" works fine.

> > - It thinks the system was booted with EFI, but it wasn't (EFI: 0 in
> > install.inf)
> >
>
> Well, bootloader use own method to detect EFI for live installation ( do not
> ask me why, but it is there already for very long time)
>
> see
> https://github.com/yast/yast-bootloader/blob/master/src/lib/bootloader/bootloader_factory.rb#L91
>
> So can you check on your live system what happens when you do? Thanks
>
> modprobe efivars
> ls /sys/firmware/efi/systab

/sys/firmware/efi doesn't exist after modprobe.
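The two EFI signals discussed in this exchange can be compared side by side. The following is a minimal Python sketch, not the yast-bootloader implementation: it assumes that the presence of the /sys/firmware/efi directory indicates an EFI boot, and that install.inf consists of `Key: value` lines such as `EFI: 0`. The function names `efi_from_sysfs` and `efi_from_install_inf` are hypothetical.

```python
from pathlib import Path


def efi_from_sysfs(sysfs_root="/sys"):
    """Return True if <sysfs_root>/firmware/efi exists as a directory.

    Assumption: the kernel only exposes this directory on EFI boots.
    """
    return (Path(sysfs_root) / "firmware" / "efi").is_dir()


def efi_from_install_inf(text):
    """Parse the 'EFI: <0|1>' entry from install.inf-style text.

    Returns True or False, or None when the key is missing.
    """
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "EFI":
            return value.strip() == "1"
    return None


if __name__ == "__main__":
    # On the live system from this report both signals should agree on
    # "not EFI"; a mismatch would point at a detection problem.
    print("sysfs says EFI:", efi_from_sysfs())
    print("install.inf says EFI:", efi_from_install_inf("EFI: 0\n"))
```

Passing the sysfs root as a parameter keeps the check testable without touching the real /sys.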
(In reply to Fabian Vogt from comment #6)
> (In reply to Josef Reidinger from comment #5)
> > > - It thinks the system was booted with EFI, but it wasn't (EFI: 0 in
> > > install.inf)
> > >
> >
> > Well, bootloader use own method to detect EFI for live installation ( do not
> > ask me why, but it is there already for very long time)
> >
> > see
> > https://github.com/yast/yast-bootloader/blob/master/src/lib/bootloader/bootloader_factory.rb#L91

Note that "Mode.live_installation" is not true for the live net installation AFAIK...
It looks like this is now broken in Leap 15.1 as well - so a maintenance update caused a regression there. This makes this a more serious bug now as it blocks updates of 15.1 images. https://openqa.opensuse.org/tests/1089084#step/disable_grub_timeout/14
Please remind me where I can find a FATE or JIRA entry for this KDE live installation thing. I can't remember ever seeing requirements or anything. IIRC we had a live installer many, many years ago, and then it was dropped because there was a neverending flood of problems. How exactly was that thing revived? And who exactly decided that the YaST team now has to deal with all those problems (that were almost all problems of the "broken by design" kind)? And how come problems in that area are now mysteriously uprated to P2?
(In reply to Stefan Hundhammer from comment #10)
> Please remind me where I can find a FATE or JIRA entry for this KDE live
> installation thing. I can't remember ever seeing requirements or anything.

That's because there isn't one.

> IIRC we had a live installer many, many years ago, and then it was dropped
> because there was a neverending flood of problems. How exactly was that
> thing revived?

This has absolutely nothing to do with "yast2-live-installation". This is running the plain YaST installation, from a live system instead of installation-images.
I see that openQA test booting into a KDE live system. https://openqa.opensuse.org/tests/1068933#step/opensuse_welcome/5
(In reply to Stefan Hundhammer from comment #12)
> I see that openQA test booting into a KDE live system.
>
> https://openqa.opensuse.org/tests/1068933#step/opensuse_welcome/5

That's the live system I'm referring to:

> This is running the plain YaST installation, from a live system instead of installation-images.
(In reply to Fabian Vogt from comment #11)
> (In reply to Stefan Hundhammer from comment #10)
> > IIRC we had a live installer many, many years ago, and then it was dropped
> > because there was a neverending flood of problems. How exactly was that
> > thing revived?
>
> This has absolutely nothing to do with "yast2-live-installation". This is
> running the plain YaST installation, from a live system instead of
> installation-images.

So, Fabian, who took the decision to just take the Installation/Upgrade tool and use it in a completely unsupported scenario? Who takes the responsibility for bugs that logically have to happen just because the well-defined installation environment is silently replaced with a running system?

Yes, this is open source, so we definitely support it when someone takes the code and uses it somewhere else. But it is a bit problematic to take on and fix all the bugs that happen because of that. The team is not inflatable, and we seem to have far more important tasks to do. Is this the new major way to install openSUSE? And what do we drop instead?

Jiri, do we have anything about supporting this scenario? We usually have a FATE/JIRA, some maintenance plan, ... is there anything? Thanks in advance.
(In reply to Lukas Ocilka from comment #14)
> (In reply to Fabian Vogt from comment #11)
> > (In reply to Stefan Hundhammer from comment #10)
> > > IIRC we had a live installer many, many years ago, and then it was dropped
> > > because there was a neverending flood of problems. How exactly was that
> > > thing revived?
> >
> > This has absolutely nothing to do with "yast2-live-installation". This is
> > running the plain YaST installation, from a live system instead of
> > installation-images.
>
> So, Fabian, who took the decision to just use the Installation/Upgrade tool
> and used that in a completely unsupported scenario? Who takes the
> responsibility for bugs that logically have to happen just because the
> well-defined installation environment is silently replaced with a running
> system?
>
> Yes, this is open-source, so we definitely support when someone takes the
> code and uses it somewhere else. But is a bit problematic to take and fix all
> bugs that happen because of that. The team is not inflatable, we seem to have far
> more important tasks to do. Is this the new major way how to install
> openSUSE?
> And what do we drop instead?
>
> Jiri, do we have anything about supporting this scenario? We usually have
> a FATE/JIRA, some maintenance plan, ... is there anything? Thanks in advance.

I'm not aware of any item to track it.

While the idea of live installation and upgrade is nice, it is hardly possible to expect that it will work out of the box, as it puts the YaST installer into a completely new environment. If anyone wants to get it working, they will need to make quite a few changes in the existing YaST code. And while the YaST team should help (reviewing and merging submit requests, pointing to the code, explaining design decisions etc.), no one can expect that they will just go after the bugs and provide fixes.

I would expect whoever decided to introduce the scenario to review all steps and provide the necessary patches (including unit tests in order to avoid future regressions). This installation workflow in this environment was never discussed with the YaST team (correct me if I'm wrong), therefore expecting the YaST team (which has other duties too) to fix bugs of a completely new scenario is quite unfair until such an agreement is made first.
(In reply to Lukas Ocilka from comment #14)
> (In reply to Fabian Vogt from comment #11)
> > (In reply to Stefan Hundhammer from comment #10)
> > > IIRC we had a live installer many, many years ago, and then it was dropped
> > > because there was a neverending flood of problems. How exactly was that
> > > thing revived?
> >
> > This has absolutely nothing to do with "yast2-live-installation". This is
> > running the plain YaST installation, from a live system instead of
> > installation-images.
>
> So, Fabian, who took the decision to just use the Installation/Upgrade tool
> and used that in a completely unsupported scenario? Who takes the
> responsibility for bugs that logically have to happen just because the
> well-defined installation environment is silently replaced with a running
> system?
>
> Yes, this is open-source, so we definitely support when someone takes the
> code and uses it somewhere else. But is a bit problematic to take and fix all
> bugs that happen because of that.

This has been used since Leap 15.0, and without any major problems so far, just the issue with YaST breaking the network during install a while ago.

The (trivial) support for upgrades was added by
https://build.opensuse.org/request/show/617833
and made visible as a desktop icon after openQA tests passed.

> The team is not inflatable, we seem to have far
> more important tasks to do. Is this the new major way how to install
> openSUSE?

There are quite a few users who download the live media and expect an installer on it.

> And what do we drop instead?

No dropping should be necessary - the difference in environments should be negligible, anything else is a bug in the environment.

> Jiri, do we have anything about supporting this scenario? We usually have
> a FATE/JIRA, some maintenance plan, ... is there anything? Thanks in advance.
(In reply to Fabian Vogt from comment #16)
> The (trivial) support for upgrades was added by
> https://build.opensuse.org/request/show/617833
> and made visible as desktop icon after openQA tests passed.

It says:

--- cut ---
This is just a proof of concept I wanted to register somewhere, even if it's in a rejected SR.
...
Moreover, performing a system upgrade from the live image is something that would need quite some testing that I have not done.
--- cut ---

Proof of concept --> best effort

There was no Bugzilla number or FATE number in that submission, so I guess no request existed.

Ancor, do you have some, PLS?

> No dropping should be necessary - the difference in environments should be
> negligible, anything else is a bug in the environment.

The point is that we

1. Don't plan our new features (or bugfixes) with this in mind
2. Don't test it
3. Don't document it
(In reply to Lukas Ocilka from comment #17)
> > (In reply to Fabian Vogt from comment #11)
>
> There was no Bugzilla nr or FATE nr in that submission so I guess no request
> existed.
>
> Ancor, do you have some, PLS?

I can't find any. My memory is faulty, but I'm pretty sure it was not my idea, just something I submitted because somebody asked whether it was possible. But I cannot find the original conversation.

> > No dropping should be necessary - the difference in environments should be
> > negligible, anything else is a bug in the environment.

I don't follow the reasoning here.

1) We develop the installer for a known and controlled environment (the standard inst-sys).

2) Then you introduce a new environment (the live image) claiming that things should work and, if not, it's a bug in the environment.

3) Then it turns out that, although the YaST team has been trying to help when possible and reasonable, the whole thing actually does not work as smoothly as expected.

4) But instead of fixing the buggy environment (that was the premise, "difference in environments should be negligible, anything else is a bug in the environment") we end up with a bunch of bug reports about things we have to adapt on the YaST side for it to work in this new environment. That was not the deal.

As a cherry on top, the installation and upgrade process from the live image now seems to be considered critical for releasing a new Tumbleweed snapshot.

So something that on our side was "hey, let's support that guy in his brave crazy idea" has step by step become "it's our responsibility and it's urgent when it breaks".
The way I see it, the live net installer has multiple advantages:

* Using this, several bugs in YaST code were found and subsequently fixed
* Substantial gain for not much effort
* Other distros have installable live CDs, it's what users expect

I regularly use it myself for debugging, as having a more complete environment available on the same system is quite valuable.

(In reply to Ancor Gonzalez Sosa from comment #18)
> 3) Then turns out that, although the YaST team has been trying to help when
> possible and reasonable, the whole thing actually does not work as smooth as
> expected.
>
> 4) But instead of fixing the buggy environment (that was the premise,
> "difference in environments should be negligible, anything else is a bug in
> the environment") we end up with a bunch of bug reports about thing we have
> to adapt in the YaST side for it to work in this new environment. That was
> not the deal.

Please tell me which bugs in the environment you're talking about here, to my knowledge there aren't any open ones. The only adaptation that had to happen AFAIK (but only because of another bug, YaST touching the outside /run) was to ignore wicked setup if NM is running.
(In reply to Fabian Vogt from comment #19)
> >
> > 4) But instead of fixing the buggy environment (that was the premise,
> > "difference in environments should be negligible, anything else is a bug in
> > the environment") we end up with a bunch of bug reports about thing we have
> > to adapt in the YaST side for it to work in this new environment. That was
> > not the deal.

First of all, let me clarify something. I talked about "buggy environment" as a direct application of your own sentence: "anything else is a bug in the environment". As far as I understood that sentence, it means that if something works in the normal official installation but does not work in the live media, it ought to be considered a bug in the live media itself, not in YaST.

> Please tell me which bugs in the environment you're talking about here, to
> my knowledge there aren't any open ones.

Let's perform a very quick search to find some bugs about something being broken in the live media although it works in the normal inst-sys (there are many more examples... including several that have been fixed on the YaST side):

- bug#1151291
- bug#1151148
- bug#1089823
- bug#1059298
- bug#1155516
- bug#1155687
- bug#1157686

None of those are a problem when the installer runs in the official inst-sys, which is the environment for which we develop and test the installer. The problems only arise when it is executed in that different environment that is the live media. That's why I dared to call them "bugs in the environment" as a direct application of the rule above.

All those bugs are currently assigned to yast2-maintainers. So you can say they are not bugs in the environment, but things the YaST Team should fix by modifying YaST. But, as said, that was not the deal.
(In reply to Ancor Gonzalez Sosa from comment #20)
> All those bugs are currently assigned to yast2-maintainers. So you can say
> they are not bugs in the environment, but things the YaST Team should fix by
> modifying YaST. But, as said, that was not the deal.

Just to clarify my words once more. The YaST Team wants to continue being supportive of the idea of running the net installer on top of live media. It's indeed a neat initiative.

But "being supportive" has meant, so far:

- allocating resources only when other priorities allow it
- doing only changes that don't risk the supported scenarios
- making no commitment about keeping it working in the future

We simply fear the expectations on us have changed recently (little by little) without our having been part of any conversation about it.
Briefly back to the original topic:

I debugged this a bit and found the culprit.

To make the proposal, the bootloader module ends up comparing BootloaderFactory.current with the bootloader type of the target system:
https://github.com/yast/yast-bootloader/blob/8630995161b572eac13326e38400438cfb5d500d/src/lib/bootloader/proposal_client.rb#L199

current is defined as (system || proposed), which means that if a system bootloader is detected, it tries to migrate to that one instead of performing an upgrade. In the live environment, /etc/sysconfig/bootloader has LOADER_TYPE set to grub2-efi.

So this is effectively just another manifestation of bnc#874646:
https://github.com/yast/yast-bootloader/blob/8630995161b572eac13326e38400438cfb5d500d/src/lib/bootloader/proposal_client.rb#L181

As LOADER_TYPE on the live medium is meaningless anyway, I'll try to just remove that line. Then system evaluates to nil and current == proposed, which works as expected: https://openqa.opensuse.org/tests/1104031

I did notice that the upgrade fails in the bootloader proposal (also tested with the standard NET .iso) if the driver for the /boot drive changed. In the first test I accidentally used the QEMU default of ata_piix instead of the virtio-blk used by openQA, so YaST aborted the upgrade with "Unknown udev device '/dev/disk/by-path/pci-0000:00:08.0'". Does that count as a bug? If so, I'll file a report with logs.
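The selection rule described in this comment can be illustrated with a small Python sketch. It is not the yast-bootloader code (which is Ruby): `read_loader_type`, `current_bootloader` and `needs_migration` are hypothetical names, and the values "grub2" for the proposed and target bootloaders are assumptions chosen for a non-EFI machine.

```python
def read_loader_type(sysconfig_text):
    """Return the LOADER_TYPE value from sysconfig-style text.

    Returns None when the key is absent or set to an empty string.
    """
    for line in sysconfig_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "LOADER_TYPE":
            value = value.strip().strip('"').strip("'")
            return value or None
    return None


def current_bootloader(system, proposed):
    """Mimic the 'system || proposed' rule quoted in the comment."""
    return system or proposed


def needs_migration(current, target):
    """A differing type makes the proposal migrate instead of upgrade."""
    return current != target


if __name__ == "__main__":
    proposed = "grub2"   # assumption: proposal for a BIOS-booted machine
    target = "grub2"     # assumption: bootloader of the system being upgraded

    # Live medium as shipped: LOADER_TYPE leaks into the comparison.
    live = read_loader_type('LOADER_TYPE="grub2-efi"\n')
    print(needs_migration(current_bootloader(live, proposed), target))

    # Live medium with the line removed: system is None, current == proposed.
    fixed = read_loader_type("")
    print(needs_migration(current_bootloader(fixed, proposed), target))
```

With the LOADER_TYPE line present, the live medium's value wins over the proposal and no longer matches the target. With the line removed, the comparison falls back to the proposed bootloader.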
(In reply to Ancor Gonzalez Sosa from comment #20)
> (In reply to Fabian Vogt from comment #19)
> > >
> > > 4) But instead of fixing the buggy environment (that was the premise,
> > > "difference in environments should be negligible, anything else is a bug in
> > > the environment") we end up with a bunch of bug reports about thing we have
> > > to adapt in the YaST side for it to work in this new environment. That was
> > > not the deal.
>
> First of all, let me clarify something. I talked about "buggy environment"
> as a direct application of your own sentence: "anything else is a bug in the
> environment". As far as I understood that sentence, it means that if
> something works in the normal official installation but it does not work in
> the live media, it ought to be considered a bug in the live media itself,
> not in YaST.

If it's something that can be fixed in the environment, yes. Like /etc/install.inf contents or YaST calling "extend" for feature checks, etc.

> > Please tell me which bugs in the environment you're talking about here, to
> > my knowledge there aren't any open ones.
>
> Let's perform a very quick search to find some bugs about something being
> broken in the live media, although it works in the normal inst-sys (there are
> many more examples... including several ones that has been fixed in the YaST
> side):
>
> - bug#1151291

Yup - this was clearly caused by a major difference in the environment and discussed earlier. Arguably YaST should avoid modifying the outside system's /run; that might cause issues in the inst-sys as well.

> - bug#1151148

Cannot possibly be fixed in the environment; it only worked at all because the inst-sys doesn't have an rpm database. IMO a YaST bug which the live installer uncovered.

> - bug#1089823

I opened this report after debugging the issue and fixing the environment. It's about YaST completely ignoring the failure of snapper during installation, no matter the environment.

> - bug#1059298

That bug didn't really have much to do with YaST - it did exactly what it was supposed to. A workaround/fix filtering out the unexpected cmdline entries was added to live-net-installer.

> - bug#1155516

That's actually a dup of boo#993885 - long known, not that nice to fix, but harmless.

> - bug#1155687

I haven't seen that one yet - seems to be more fallout from the /run bind-mount. A fixed live-net-installer is building currently.

> - bug#1157686

That happened in the inst-sys as well - the entire snapshot was broken.

> None of those are a problem when the installer runs in the official inst-sys,
> which is the environment for which we develop and test the installer. The
> problems only arise when executed in that different environment that is the
> live media. That's why I dared to call them "bugs in the environment" as a
> direct application of the rule above.

Looking at the kind of bugs listed above, the fixes (excluding the network one, which was a known topic back when the live installer was introduced, but didn't cause any fatal issues except for an empty dialog until recently) are mostly relatively small.

So the main blocker here is that it's simply not tested before the full TW snapshot openQA, leading to reports of severe regressions, which are more frustrating for you. Having it tested continuously would improve this situation quite a bit for everyone involved here AFAICT. That should be doable. Testing (JeOS, MicroOS, Live) images in stagings has been on my ToDo list for a while already.

> All those bugs are currently assigned to yast2-maintainers. So you can say
> they are not bugs in the environment, but things the YaST Team should fix by
> modifying YaST. But, as said, that was not the deal.

Yes - please CC/assign me if such bugs end up assigned to you and seem to be caused by the environment.
(In reply to Ancor Gonzalez Sosa from comment #21)
> (In reply to Ancor Gonzalez Sosa from comment #20)
> > All those bugs are currently assigned to yast2-maintainers. So you can say
> > they are not bugs in the environment, but things the YaST Team should fix by
> > modifying YaST. But, as said, that was not the deal.
>
> Just to clarify my words once more. The YaST Team wants to continue being
> supportive with the idea of running the net installer on top of live media.
> It's indeed a neat initiative.
>
> But "being supportive" has meant, so far:
>
> - allocating resources only when other priorities allow it
> - doing only changes that don't risk the supported scenarios
> - making no commitment about keeping it working in the future
>
> We simply fear the expectations on us has changed recently (little by
> little) without we having been part of any conversation about it.

It's easier to explain: Until recently, the live installer worked really well, so that the sudden failures were more noticeable and unexpected :-)
(In reply to Fabian Vogt from comment #22)
> Shortly back to the original topic:
>
> I debugged this a bit and found the culprit.
>
> To make the proposal, the bootloader module ends up comparing
> BootloaderFactory.current with the bootloader type of the target system:
> https://github.com/yast/yast-bootloader/blob/8630995161b572eac13326e38400438cfb5d500d/src/lib/bootloader/proposal_client.rb#L199
>
> current is defined as (system || proposed), which means that if there is a
> system bootloader detected, it tries to migrate to that one instead of
> performing an upgrade. In the live environment, /etc/sysconfig/bootloader
> has LOADER_TYPE set to grub2-efi.
>
> So this is effectively just another manifestation of bnc#874646:
> https://github.com/yast/yast-bootloader/blob/8630995161b572eac13326e38400438cfb5d500d/src/lib/bootloader/proposal_client.rb#L181
>
> As LOADER_TYPE on the live medium is meaningless anyway, I'll try to just
> remove that line. Then system evaluates to nil and current == proposed,
> which works as expected: https://openqa.opensuse.org/tests/1104031
>
> I did notice that upgrade installation fails in the bootloader proposal
> (also tested the standard NET .iso) if the driver for the /boot drive
> changed. In the first test I accidentally used the QEMU default of ata_piix
> instead of virtio-blk used by openQA, so YaST aborted the upgrade with
> "Unknown udev device '/dev/disk/by-path/pci-0000:00:08.0'". Does that count
> as a bug? If so, I'll file a report with logs.

Sadly this driver thing is unavoidable. It is basically an issue with the feature of having persistent names (i.e. names resistant to kernel names switching between e.g. sda and sdb). But udev names are often generated from various inputs, such as kernel names. Generally speaking, the bootloader uses a quite complex strategy to pick the most stable udev link [1], but for disks the options are quite limited, and that driver change causes trouble.

And without udev it is not much better, as I remember issues with vda vs. sda vs. hda names.

So I worry there is not much we can do when the kernel driver changes and the udev name changes with it.

[1] https://github.com/yast/yast-bootloader/blob/master/src/lib/bootloader/udev_mapping.rb#L39
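To make the failure mode concrete, here is a minimal Python sketch of choosing a persistent link and then failing to resolve it after a driver change. It is not the logic of udev_mapping.rb: the preference order, the function names `pick_stable_link` and `resolve`, and the ata-style link name are all assumptions for illustration. Only the by-path name from the error message comes from this report.

```python
# Hypothetical preference order, most stable first (an assumption).
PREFERENCE = ("by-uuid", "by-label", "by-id", "by-path")


def pick_stable_link(links, preference=PREFERENCE):
    """Pick the first available link kind from a {kind: path} mapping."""
    for kind in preference:
        if kind in links:
            return links[kind]
    return None


def resolve(stored_link, present_links):
    """Map a stored udev link to a kernel device, or fail like the report."""
    if stored_link not in present_links:
        raise LookupError(f"Unknown udev device '{stored_link}'")
    return present_links[stored_link]


if __name__ == "__main__":
    # Assumption: the whole disk only offered a by-path link when the
    # configuration was written under virtio-blk.
    stored = pick_stable_link({"by-path": "/dev/disk/by-path/pci-0000:00:08.0"})

    # After booting with ata_piix the link set differs (hypothetical name).
    present = {"/dev/disk/by-path/pci-0000:00:01.1-ata-1": "/dev/sda"}
    try:
        resolve(stored, present)
    except LookupError as err:
        print(err)
```

The sketch shows why a fallback to by-path is fragile: the name encodes the bus location, so it changes whenever the controller or driver does.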
(In reply to Josef Reidinger from comment #25)
>
> So I worry there is not much we can do when kernel driver change and udev
> name also changed.

Should we then close this as WONTFIX?
(In reply to Ancor Gonzalez Sosa from comment #26)
> (In reply to Josef Reidinger from comment #25)
> >
> > So I worry there is not much we can do when kernel driver change and udev
> > name also changed.
>
> Should we then close this as WONTFIX?

Well, if Fabian made the fix he mentions in comment #22, I think it can be closed as FIXED (the live env should not have the bootloader type set, so that the proposal for installation is forced).
(In reply to Josef Reidinger from comment #27)
> (In reply to Ancor Gonzalez Sosa from comment #26)
> > (In reply to Josef Reidinger from comment #25)
> > >
> > > So I worry there is not much we can do when kernel driver change and udev
> > > name also changed.
> >
> > Should we then close this as WONTFIX?
>
> well, if Fabian did fix he mentions in comment#22 I think it can be closed
> as FIXED ( live env should not have set bootloader type to force proposal
> for installation ).

Well, reassigning then.
(In reply to Ancor Gonzalez Sosa from comment #28)
> (In reply to Josef Reidinger from comment #27)
> > (In reply to Ancor Gonzalez Sosa from comment #26)
> > > (In reply to Josef Reidinger from comment #25)
> > > >
> > > > So I worry there is not much we can do when kernel driver change and udev
> > > > name also changed.
> > >
> > > Should we then close this as WONTFIX?
> >
> > well, if Fabian did fix he mentions in comment#22 I think it can be closed
> > as FIXED ( live env should not have set bootloader type to force proposal
> > for installation ).
>
> Well, reassigning then.

IMO it's still a bug that YaST even reads /etc/sysconfig of the installation system in an upgrade scenario, but sure.
This is an autogenerated message for OBS integration: This bug (1155545) was mentioned in https://build.opensuse.org/request/show/763986 15.1 / livecd-openSUSE
This is an autogenerated message for OBS integration: This bug (1155545) was mentioned in https://build.opensuse.org/request/show/764359 15.1 / livecd-openSUSE
This is an autogenerated message for OBS integration: This bug (1155545) was mentioned in https://build.opensuse.org/request/show/766079 15.1 / livecd-openSUSE