Bugzilla – Bug 1225352
[Build 13.199] openQA test fails in prepare_firstboot: RPi3 not booting?
Last modified: 2024-06-18 06:52:07 UTC
The Leap 15.5 JeOS image has been failing to boot in openQA for some time now. Initially it looked to me like an issue with the test infrastructure, but the TW and 15.6 images boot while 15.5 fails consistently. I tried to reproduce the issue locally, and apart from a slow boot due to the missing ethernet cable, it worked.

The package diff between the last working build and the latest build shows changes in aaa_base, coreutils, kernel, less, protobuf, libsemanage, perl, rpm and yast. Could this be a kernel issue?

## Observation

openQA test in scenario opensuse-15.5-JeOS-for-RPi-aarch64-jeos@RPi3 fails in
[prepare_firstboot](https://openqa.opensuse.org/tests/4217406/modules/prepare_firstboot/steps/1)

## Test suite description

Maintainer: fvogt, mnowak
Start JeOS from the HDD image, configure it using the firstboot wizard and then run basic tests. console=tty0 added as needed for aarch64.

## Reproducible

Fails since (at least) Build [13.183](https://openqa.opensuse.org/tests/4120363)

## Expected result

Last good: [13.182](https://openqa.opensuse.org/tests/4119085) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=aarch64&distri=opensuse&flavor=JeOS-for-RPi&machine=RPi3&test=jeos&version=15.5)
Could you have a look what the SUT is doing when the test fails?
I would guess a problem with network. I will have deeper look.
Created attachment 875139 [details] 15.5 boot log on serial (from openQA run)
Created attachment 875148 [details]
15.5 serial boot log with a longer timeout

With a longer timeout, it manages to boot to the login prompt and starts jeos-firstboot on serial.

I think jeos-firstboot should not be started, as it blocks the start of the ssh server.
(In reply to Guillaume GARDET from comment #4)
> Created attachment 875148 [details]
> 15.5 serial boot log with a longer timeout
>
> With a longer timeout, it manages to boot until login prompt and starts
> jeos-firstboot on serial.
>
> I think jeos-firstboot should not be started as it blocks the start of the
> ssh server.

Why does it start jeos-firstboot? AFAIK it shouldn't be enabled in this image.

In my local test run it did not.
(In reply to Fabian Vogt from comment #5)
> Why does it start jeos-firstboot? AFAIK it shouldn't be enabled in this
> image?
>
> In my local test run it did not.

Ah no, sorry, an additional boot from a Tumbleweed test polluted the serial log.

It hung after:
**********
[ OK ] Listening on Load/Save RF …itch Status /dev/rfkill Watch.
Starting Security Auditing Service...
Starting Rebuild Journal Catalog...
[ OK ] Finished Commit a transient machine-id on disk.
Starting Load/Save RF Kill Switch Status...
**********
(In reply to Guillaume GARDET from comment #6)
> Ah no, sorry, an additional boot from a Tumbleweed test polluted the serial
> log.
>
> It hung after:
> **********
> [ OK ] Listening on Load/Save RF …itch Status /dev/rfkill Watch.
> Starting Security Auditing Service...
> Starting Rebuild Journal Catalog...
> [ OK ] Finished Commit a transient machine-id on disk.
> Starting Load/Save RF Kill Switch Status...
> **********

Did it completely hang or just take a long time? Can you get the full journal?

Smells like a kernel issue.
(In reply to Fabian Vogt from comment #7)
> Did it completely hang or just take a long time? Can you get the full
> journal?

Looks like a hang. I waited more than 10 min after the boot and the serial console was unresponsive after those lines (no login prompt).

I will try to get more traces on serial.
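When comparing serial logs from several runs, it can help to list which systemd jobs were announced but never reported as done. The following is only a minimal sketch, not something used in this report: the function name `pending_units` is hypothetical, and it assumes the usual systemd console wording ("Starting X...", "[ OK ] Started X." / "[ OK ] Finished X.").

```shell
# List systemd jobs that were announced on the console but never completed.
# Reads a serial/console log on stdin, prints one unit description per line.
# Assumption: the log uses the standard systemd status-line wording.
pending_units() {
    awk '
        # "Starting <description>..." : a job has begun.
        /Starting .*\.\.\.[ \t\r]*$/ {
            d = $0
            sub(/^.*Starting /, "", d)
            sub(/\.\.\.[ \t\r]*$/, "", d)
            if (!(d in seen)) { seen[d] = 1; order[++n] = d }
            pending[d] = 1
        }
        # "[ OK ] Started <description>." or "[ OK ] Finished <description>." : done.
        /\] (Started|Finished) / {
            d = $0
            sub(/^.*\] (Started|Finished) /, "", d)
            sub(/\.[ \t\r]*$/, "", d)
            pending[d] = 0
        }
        END {
            # Print still-pending jobs in the order they were started.
            for (i = 1; i <= n; i++)
                if (pending[order[i]]) print order[i]
        }
    '
}
```

Usage would be along the lines of `pending_units < serial.log`, where `serial.log` stands for whatever file holds the captured serial output. A job listed here is not necessarily the culprit: anything started shortly before the hang shows up as pending, so the list only narrows down where to look.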
*** Bug 1225787 has been marked as a duplicate of this bug. ***
To echo the comments from bug #1225787:

- kernel 5.14.21-150500.55.52-default was the last known working kernel for me
- the same system does not boot with kernel-default-5.14.21-150500.55.59.1.aarch64
Ping. Raising severity as well.
I tested today on both RPi 3 and 4, no issue at all.

https://paste.opensuse.org/pastes/6733655a325c

or

Mac-mini:.ssh Zaoliang$ ssh zaoliang@192.168.8.187
The authenticity of host '192.168.8.187 (192.168.8.187)' can't be established.
ED25519 key fingerprint is SHA256:xWM/Nt3SdkZ7W1YyOnG4FEbB/WdAXrGDMlGfFopQWYM.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.8.187' (ED25519) to the list of known hosts.
(zaoliang@192.168.8.187) Password:
Have a lot of fun...
zaoliang@localhost:~> cat /etc/os-release
NAME="openSUSE Leap"
VERSION="15.5"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.5"
PRETTY_NAME="openSUSE Leap 15.5"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.5"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Leap"
LOGO="distributor-logo-Leap"
openSUSE-Leap-15.5-ARM-JeOS-raspberrypi.aarch64-2023.03.31-Build13.217.raw.xz is used.
I checked this on another device, in this case an RPi 400. It boots from the GRUB menu, but never reaches a login prompt or desktop. This is quite strange.
Hi, is there at least one report of a non-booting Rpi4, or does it hang only on Rpi3? Furthermore, does running it on a CM4 make any difference compared to running it on a plain model B?

Thanks
(In reply to Andrea della Porta from comment #15)
> Hi, is there at least one report of non-booting Rpi4 or does it hang only
> on Rpi3? Furthermore, does running it on CM4 make any difference wrt
> running it on straight model B?
>
> Thanks

I'm not aware of any issues on RPi 4, but on my RPi 3 the image works fine, so it might just be random...
Rpi4 works just fine, rpi3 is hanging somewhat randomly. Some investigation is needed.
For the record, I'm testing openSUSE-Leap-15.5-ARM-JeOS-raspberrypi.aarch64-2023.03.31-Build13.217.raw.xz.
(In reply to Robert Munteanu from comment #10)
> To echo the comments from bug #1225787
>
> - kernel 5.14.21-150500.55.52-default was the last known working kernel from
> me
> - same system does not boot with
> kernel-default-5.14.21-150500.55.59.1.aarch64

5.14.21-150500.55.52-default does not work for me either. May I ask how you tested the older kernel? Did you burn an older Leap raw image to the SD card, or did you downgrade the kernel from the command line with something like:

zypper install --oldpackage kernel-default=5.14.21-150500.55.52.1

Many thanks
(In reply to Andrea della Porta from comment #19)
> 5.14.21-150500.55.52-default does not work for me either. May I ask you how
> did you test the older kernel? Did you just burn an older Leap raw image on
> SD or did you just downgrade the kernel via commandline with something like:
>
> zypper install --oldpackage kernel-default=5.14.21-150500.55.52.1
>
> Many thanks

openSUSE-Leap-15.5-ARM-JeOS-raspberrypi.aarch64-2023.03.31-Build13.217.raw.xz is working fine with 5.14.21-150500.55.65-default. Maybe an issue with the SD card?
> openSUSE-Leap-15.5-ARM-JeOS-raspberrypi.aarch64-2023.03.31-Build13.217.raw.xz
> is working fine, 5.14.21-150500.55.65-default.

Did you test it only on rpi4, or also on rpi3?
(In reply to Andrea della Porta from comment #21)
> you tested it on rpi4 or also on rpi3?

Yes, both.
(In reply to Andrea della Porta from comment #19)
> 5.14.21-150500.55.52-default does not work for me either. May I ask you how
> did you test the older kernel? Did you just burn an older Leap raw image on
> SD or did you just downgrade the kernel via commandline with something like:
>
> zypper install --oldpackage kernel-default=5.14.21-150500.55.52.1
>
> Many thanks

I still had the old kernel installed; it had not been cleaned up. There was no manipulation of the SD card or reinstallation.
FWIW, this is fixed by upgrading the system to openSUSE Leap 15.6 which updates the kernel to version 6.4.x.
I confirm that build 13.224 does not work (it hangs as described in this ticket), just like 13.217; in fact, they share the exact same kernel.

Some further info: adding modprobe.blacklist=vc4 avoids the issue and lets the Rpi3 boot correctly (albeit without monitor support). On rpi4, everything is OK and the vc4 module does not hang.

Also, on rpi3, this is the ftrace stack when it hangs (on the vc4 module only: "echo '*:mod:vc4' > set_ftrace_filter"):

2) | vc4_drm_register [vc4]() {
2) + 12.813 us | vc4_hvs_dev_probe [vc4]();
2) + 30.521 us | vc4_hdmi_dev_probe [vc4]();
2) + 54.844 us | vc4_vec_dev_probe [vc4]();
2) + 21.146 us | vc4_txp_probe [vc4]();
2) + 18.855 us | vc4_crtc_dev_probe [vc4]();
2) 7.656 us | vc4_crtc_dev_prob

There are reports in this ticket that 15.6 works (I have not tried it myself yet), and between 15.5 (kernel 5.14) and 15.6 (kernel 6.4) there are plenty of kernel commits related to vc4, at least two of which may be worth our attention since they fix crashes: 797d72ce8e0f8 and c86b41214362e8e.

Reassigning to the HW enablement team.

Many thanks,
Andrea
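For anyone who wants to capture a similar trace, the following is a rough sketch of one way to set it up. It is not necessarily how the trace above was obtained. The filter string and the `modprobe.blacklist=vc4` parameter come from this ticket; everything else is an assumption: the helper name `setup_vc4_trace` is hypothetical, tracefs is assumed to be mounted at /sys/kernel/tracing, the kernel is assumed to provide the function_graph tracer, and the commands need root.

```shell
# Sketch: restrict ftrace to functions of the vc4 module and enable the
# function_graph tracer (the tracer that prints per-call durations).
# Assumptions: tracefs at /sys/kernel/tracing, run as root.
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"

setup_vc4_trace() {
    echo 0 > "$TRACEFS/tracing_on"                    # pause tracing while configuring
    echo '*:mod:vc4' > "$TRACEFS/set_ftrace_filter"   # filter quoted in this ticket
    echo function_graph > "$TRACEFS/current_tracer"   # call-graph output
    echo 1 > "$TRACEFS/tracing_on"                    # resume tracing
}

# Possible usage on a system booted with modprobe.blacklist=vc4,
# so that the module is only loaded once tracing is active:
#   setup_vc4_trace
#   cat "$TRACEFS/trace_pipe" &
#   modprobe vc4
```

Loading the module by hand after the filter is in place is a design choice of this sketch. It relies on the kernel applying a `:mod:` filter to a module that is loaded later, which should be checked against the running kernel.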
At a quick glance this might be a mismatch between kernel and DTB. Perhaps the RPi firmware package changed (or needs to be updated)? The firmware lives on the SD-card so I'm not sure how this is updated in QA. Ivan, since you know the platform better, do you have any thoughts?
Devicetrees for RPis are provided by the raspberrypi-firmware-dt package, which hasn't been updated in a while (5 months).