Bugzilla – Bug 1225150
[Build 6.71] MinimalVM combustion fails to unmount /sysroot/dev/shm in AMD workers
Last modified: 2024-06-12 07:07:12 UTC
Created attachment 875056 [details]
Intel worker ok

## Observation

openQA test in scenario sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-jeos-main-combustion@64bit-virtio-vga fails in [image_checks](https://openqa.suse.de/tests/14337468/modules/image_checks/steps/2)

## Reproducible

Fails since (at least) Build [6.37](https://openqa.suse.de/tests/14104622)

## Expected result

Last good: [6.39](https://openqa.suse.de/tests/14094642) (or more recent)

## Further details

The test runs properly on Intel machines but fails on AMD ones. After analyzing both logs, I've come to the conclusion that something is preventing the filesystem /sysroot/dev/shm from being unmounted, which causes the combustion script to fail and drops the system into emergency mode. Attached are the `journalctl --no-pager` outputs from a NOK AMD worker and an OK Intel one.
Created attachment 875057 [details] AMD worker NOK
Might be similar to https://bugzilla.suse.com/show_bug.cgi?id=1222411
Please try with a "wait" command at the end of the combustion script.
I've created a new combustion image with `sleep 5` at the end and the tests have started to pass. Which would be the proper way to proceed? https://openqa.suse.de/tests/14442171#
(In reply to Pablo Herranz Ramírez from comment #4)
> I've created a new combustion image with `sleep 5` at the end and the tests
> have started to pass. Which would be the proper way to proceed?
>
> https://openqa.suse.de/tests/14442171#

Have you tried with "wait"?
I've tried `wait` alone but the test fails. Do I need to specify the PID of the job? https://openqa.suse.de/tests/14454009#
(In reply to Pablo Herranz Ramírez from comment #6)
> I've tried `wait` alone but the test fails. Do I need to specify the PID of
> the job?
>
> https://openqa.suse.de/tests/14454009#

Does it fail with the same error? "wait" without arguments should wait for all jobs.

I just realized that this can't really work by design, as tee only quits once the script has finished, but the script waits for tee to quit... That should result in a deadlock though, not failure.

What happens with "exec 1>&- 2>&-; wait"?
Yes, that works fine :) https://openqa.suse.de/tests/14454510#
(In reply to Pablo Herranz Ramírez from comment #8)
> Yes, that works fine :)
>
> https://openqa.suse.de/tests/14454510#

Great, can you do that 3x to make sure it's not a random success?

If this works, I'll add it to the combustion README and we should probably mention it in the product documentation as well.
5 of the 10 jobs are failing, but this seems like a different failure. I'll continue investigating tomorrow:

```
1 job has been created:
 - sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-Build6.73-jeos-main-combustion@64bit-virtio-vga -> https://openqa.suse.de/tests/14454945
1 job has been created:
 - sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-Build6.73-jeos-main-combustion@64bit-virtio-vga -> https://openqa.suse.de/tests/14454946
1 job has been created:
 - sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-Build6.73-jeos-main-combustion@64bit-virtio-vga -> https://openqa.suse.de/tests/14454947
1 job has been created:
 - sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-Build6.73-jeos-main-combustion@64bit-virtio-vga -> https://openqa.suse.de/tests/14454948
1 job has been created:
 - sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-Build6.73-jeos-main-combustion@64bit-virtio-vga -> https://openqa.suse.de/tests/14454949
1 job has been created:
 - sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-Build6.73-jeos-main-combustion@64bit-virtio-vga -> https://openqa.suse.de/tests/14454950
1 job has been created:
 - sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-Build6.73-jeos-main-combustion@64bit-virtio-vga -> https://openqa.suse.de/tests/14454951
1 job has been created:
 - sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-Build6.73-jeos-main-combustion@64bit-virtio-vga -> https://openqa.suse.de/tests/14454952
1 job has been created:
 - sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-Build6.73-jeos-main-combustion@64bit-virtio-vga -> https://openqa.suse.de/tests/14454953
1 job has been created:
 - sle-15-SP6-JeOS-for-kvm-and-xen-x86_64-Build6.73-jeos-main-combustion@64bit-virtio-vga -> https://openqa.suse.de/tests/14454954
```
I've restarted all 10 and now the tests are all green. It seems some sporadic issue (ssh-ing to a s390x machine?!?!) was going on yesterday.

The fix suggested by @fvogt works like a charm:

https://openqa.suse.de/tests/14461909
https://openqa.suse.de/tests/14461910
https://openqa.suse.de/tests/14461911
https://openqa.suse.de/tests/14461912
https://openqa.suse.de/tests/14461913
https://openqa.suse.de/tests/14461914
https://openqa.suse.de/tests/14461915
https://openqa.suse.de/tests/14461916
https://openqa.suse.de/tests/14461917
https://openqa.suse.de/tests/14461918
(In reply to Pablo Herranz Ramírez from comment #11)
> I've restarted them 10 and now the tests are all green. Seems like some
> sporadic issue (ssh-ing to a s390x machine?!?!) was going on yesterday.

Maybe a conflict with VNC ports?

> The fix suggested by @fvogt works like a charm:
>
> https://openqa.suse.de/tests/14461909
> https://openqa.suse.de/tests/14461910
> https://openqa.suse.de/tests/14461911
> https://openqa.suse.de/tests/14461912
> https://openqa.suse.de/tests/14461913
> https://openqa.suse.de/tests/14461914
> https://openqa.suse.de/tests/14461915
> https://openqa.suse.de/tests/14461916
> https://openqa.suse.de/tests/14461917
> https://openqa.suse.de/tests/14461918

Perfect! Reassigning to documentation.

Can you please mention in the combustion sections that the script should ensure that all processes complete before it ends, like this:

    # Close outputs and wait for tee to finish.
    exec 1>&- 2>&-; wait;

IMO it's mostly a workaround until tukit handles this better but it's good practice anyway so it can be recommended in general.
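
For readers following along, here is a minimal sketch of the recommended pattern in context. It assumes the combustion script sends its output through `tee` via a bash process substitution, which is what the `tee` deadlock discussion above implies. The temporary files `LOG` and `SCRIPT` are hypothetical stand-ins so the example can run anywhere. A real combustion script would write to a console device, and that path is not given in this report.

```shell
#!/bin/sh
# Stand-ins (assumptions): LOG replaces the console device, SCRIPT replaces the
# combustion script file. Neither path comes from the bug report.
LOG="$(mktemp)"
SCRIPT="$(mktemp)"

cat > "$SCRIPT" <<'EOF'
#!/bin/bash
# Send stdout and stderr through tee, which runs as a background child.
exec > >(exec tee -a "$1") 2>&1

echo "configuring system"
echo "a warning on stderr" >&2

# Close outputs and wait for tee to finish.
exec 1>&- 2>&-; wait
EOF

bash "$SCRIPT" "$LOG"
```

Closing file descriptors 1 and 2 first is what lets `wait` return: `tee` only exits once it sees end-of-file on its input, and that cannot happen while the script still holds the write end of the pipe open. Once the inner script exits, `tee` is gone too, so no stray process is left behind to keep a mount busy.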
Thank you for reporting this bug! It is being tracked and processed as part of our queue.
Fixed by: https://github.com/SUSE/doc-modular/pull/343