Bugzilla – Bug 1203566
[Build 21.1] openQA test fails in ibft - smartctl -i /dev/sda failed for 'mandatory smart cmd failure'
Last modified: 2023-01-09 10:10:38 UTC
## Observation

openQA test in scenario sle-15-SP5-Online-x86_64-cryptlvm_iscsi@64bit fails in [ibft](https://openqa.suse.de/tests/9536843/modules/ibft/steps/40)

## Test suite description

Conducts installation on iSCSI device relying on iBFT with encrypted LVM.

## Reproducible

Fails since (at least) Build [21.1](https://openqa.suse.de/tests/9518762)

## Expected result

Last good: [19.1](https://openqa.suse.de/tests/9418168) (or more recent)

## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Online&machine=64bit&test=cryptlvm_iscsi&version=15-SP5)
Not sure about the root cause, but it could be related to the iSCSI setup in YaST2. We haven't had a kernel change so far. For now I am setting the YaST2 component. One note on the bug report: the Observation links to a different issue than the Reproducible part.
Could you maybe collect and attach the y2logs of that system? They are not available in openQA.
smartctl -i /dev/sda failed with: "A mandatory SMART command failed: exiting"

When successful, that command ("-i" for "--info") should result in something like this:

smartctl 7.2 2021-09-14 r5237 [x86_64-linux-5.14.21-150400.24.21-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F3
Device Model:     SAMSUNG HD103SJ
Serial Number:    S246JD2Z921835
LU WWN Device Id: 5 0024e9 0040754bf
Firmware Version: 1AJ10001
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Wed Oct 5 13:22:20 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
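For whoever debugs this from the test side, here is a minimal sketch for interpreting the exit status of smartctl, which smartctl(8) documents as a bitmask. The function name `decode_smartctl_status` is made up for illustration, and the bit descriptions are paraphrased from the man page rather than quoted. I would expect the "mandatory SMART command failed" case to show up as bit 2, but that is an assumption that has not been verified on the failing system.

```shell
# Decode the bitmask exit status described in the EXIT STATUS section of
# smartctl(8). The function name is hypothetical; the texts are paraphrased.
# Usage: smartctl -i /dev/sda; decode_smartctl_status $?
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok"
        return 0
    fi
    bit=0
    for meaning in \
        "command line did not parse" \
        "device open failed, or device is in a low-power mode" \
        "a SMART or other ATA command failed, or checksum error" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Print one line for every bit that is set in the status.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $meaning"
        fi
        bit=$((bit + 1))
    done
    return 0
}
```

Logging the decoded status next to the smartctl output in the test would at least distinguish "device could not be opened" from "a SMART command failed".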
cat 07-committed.yml
# 2022-09-19 05:00:13 -0400
---
- disk:
    name: "/dev/sda"
    size: 20971540 KiB (20.00 GiB)
    block_size: 0.5 KiB
    io_size: 0 B
    min_grain: 1 MiB
    align_ofs: 0 B
    partition_table: gpt
    partitions:
    - free:
        size: 1 MiB
        start: 0 B
    - partition:
        size: 8 MiB
        start: 1 MiB
        name: "/dev/sda1"
        type: primary
        id: bios_boot
    - partition:
        size: 20962307.5 KiB (19.99 GiB)
        start: 9 MiB
        name: "/dev/sda2"
        type: primary
        id: lvm
        encryption:
          type: luks
          name: "/dev/mapper/cr_scsi-1IET_00010001-part2"
          password: "***"
    - free:
        size: 16.5 KiB
        start: 20971523.5 KiB (20.00 GiB)
- disk:
    name: "/dev/vda"
    size: 20 GiB
    block_size: 0.5 KiB
    io_size: 0 B
    min_grain: 1 MiB
    align_ofs: 0 B
- lvm_vg:
    vg_name: system
    extent_size: 4 MiB
    lvm_lvs:
    - lvm_lv:
        lv_name: home
        size: 5864 MiB (5.73 GiB)
        stripes: 1
        file_system: xfs
        mount_point: "/home"
    - lvm_lv:
        lv_name: root
        size: 13400 MiB (13.09 GiB)
        stripes: 1
        file_system: btrfs
        mount_point: "/"
        btrfs:
          default_subvolume: "@"
          subvolumes:
          - subvolume:
              path: "@"
          - subvolume:
              path: "@/boot/grub2/i386-pc"
          - subvolume:
              path: "@/boot/grub2/x86_64-efi"
          - subvolume:
              path: "@/opt"
          - subvolume:
              path: "@/root"
          - subvolume:
              path: "@/srv"
          - subvolume:
              path: "@/tmp"
          - subvolume:
              path: "@/usr/local"
          - subvolume:
              path: "@/var"
              nocow: true
    - lvm_lv:
        lv_name: swap
        size: 1204 MiB (1.18 GiB)
        stripes: 1
        file_system: swap
        mount_point: swap
    lvm_pvs:
    - lvm_pv:
        blk_device: "/dev/mapper/cr_scsi-1IET_00010001-part2"
The iSCSI disk in question is /dev/sda. While configuring iSCSI, the screenshots of the failing test case look exactly the same as those of the "last good" one. Yet later, after installation, the iSCSI disk in the failing case does not accept SMART commands.

But is that really something that can be influenced from the client side? Shouldn't that be set up on the iSCSI server, and that's it? Are we absolutely sure that the iSCSI server did not change in the meantime? Is that a physical server, or also a virtual machine? On physical machines, I remember that SMART support needs to be enabled in the BIOS. Is it plausible that anything changed there, on the iSCSI server side?

I also briefly checked the yast-iscsi-client code; the last pull request is from June 21st, well before the "last good" test case. And even that PR does not appear to be even remotely related. https://github.com/yast/yast-iscsi-client/pull/120/files

So, please check the iSCSI server side first.
Created attachment 862728 [details] ibft y2log
(In reply to Stefan Weiberg from comment #2)
> Could you maybe collect and attach the y2logs of that system? They are not
> available in openQA.

A problem with the ibft worker blocked me from reproducing this bug for a while, but I have now reproduced it and collected the ibft y2log. https://openqa.nue.suse.com/tests/9898913#step/ibft/41
I never got an answer to my question in comment #5: Are you sure that SMART support is enabled on the server side?
BTW I don't think that's something that can be extracted from an y2log on the server side; you'll have to run "smartctl" commands there.
(In reply to Stefan Hundhammer from comment #10)
> BTW I don't think that's something that can be extracted from an y2log on
> the server side; you'll have to run "smartctl" commands there.

Hi, I think you mean running the command on the ibft worker directly. I have a recent job that passed, please check it: https://openqa.nue.suse.com/tests/9891296#step/ibft/40

It ran on openqaworker6:1, which is an instance of the qemu_x86_64_ibft worker, so I think the iBFT setup on the server side is correct, at least when the command runs without failure. I agree with you that something is wrong on the qemu_x86_64_ibft worker when the failure happens. In fact, our openQA tests run on the SUT, which is a VM hosted on the worker, so maybe we can check it on the SUT as well. Please give me some instructions for checking it, thanks.
(In reply to Stefan Hundhammer from comment #10)
> BTW I don't think that's something that can be extracted from an y2log on
> the server side; you'll have to run "smartctl" commands there.

Hi, I just checked the iSCSI server; I can't access it.

# iscsiadm --mode discovery --op update --type sendtargets --portal x.x.x.x
iscsiadm: cannot make connection to x.x.x.x: No route to host
iscsiadm: cannot make connection to x.x.x.x: No route to host
iscsiadm: cannot make connection to x.x.x.x: No route to host
iscsiadm: cannot make connection to x.x.x.x: No route to host
iscsiadm: connection login retries (reopen_max) 5 exceeded
iscsiadm: Could not perform SendTargets discovery: iSCSI PDU timed out

For security reasons, I haven't pasted the IP of the iSCSI server here; it is in the autoinst-log, and I can also send it to you via e-mail.
I cannot check that remotely from here. Somebody who has access to that server will need to log in and issue "smartctl" commands to check if the machine has SMART enabled. From our investigations in this bug so far, it looks very much like it's not. Not every problem in the world is a YaST installer problem. We cannot do system administration for the server infrastructure in the QA labs.
(In reply to Stefan Hundhammer from comment #13)
> I cannot check that remotely from here. Somebody who has access to that
> server will need to log in and issue "smartctl" commands to check if the
> machine has SMART enabled. From our investigations in this bug so far, it
> looks very much like it's not.
>
> Not every problem in the world is a YaST installer problem. We cannot do
> system administration for the server infrastructure in the QA labs.

I can access the machine now, but I don't know how to set it up; there seems to be no smartctl command.

leli@worker2:~> smartctl
-bash: smartctl: command not found
leli@worker2:~>

I don't know who maintains the iSCSI server.
https://openqa.nue.suse.com/tests/9898914#step/ibft/48
There is no issue with this iSCSI server. [It seems we have more than one iSCSI server?]
I can give the iSCSI tgt server configuration:

# tgt-admin -s
Target 1: iqn.2016-02.openqa.de:for.openqa
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET 00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET 00010001
            SCSI SN: beaf11
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: rdwr
            Backing store path: /opt/openqa-iscsi-disk
            Backing store flags:
    Account information:
    ACL information:
        ALL
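To compare servers quickly, the interesting part of that output is which LUNs exist and what backs them. The following is a rough sketch that condenses `tgt-admin -s` output read from stdin; the function name `list_tgt_luns` is invented here, and it assumes the output keeps the "LUN:", "Type:" and "Backing store path:" labels seen above. Here LUN 1 is backed by a path under /opt rather than a raw device node, which may be relevant to what SMART can report, though that is not confirmed in this bug.

```shell
# Summarize LUNs from `tgt-admin -s` output given on stdin.
# Prints one line per LUN: "LUN <number> <type> <backing store path>".
# Hypothetical helper; relies on the field labels tgt-admin printed above.
list_tgt_luns() {
    awk '
        $1 == "LUN:"                     { lun = $2 }   # start of a LUN entry
        $1 == "Type:"                    { type = $2 }  # controller or disk
        $1 == "Backing" && $3 == "path:" { print "LUN", lun, type, $4 }
    '
}
```

On the server this would be run as `tgt-admin -s | list_tgt_luns`.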
The issue does not seem to originate from a configuration change on the iSCSI server.

This is the original failure: https://openqa.suse.de/tests/9536843#step/ibft/34 with target: iqn.2016-02.openqa.de:for.openqa and portal: 10.160.1.93

This is a recent test that passed: https://openqa.suse.de/tests/9680734#step/ibft/34 It has the same iSCSI target: iqn.2016-02.openqa.de:for.openqa and portal: 10.160.1.93, and there has been no configuration change there since the issue.

Since SMART support seems to be enabled in the run that passes, there could be a different underlying issue.
Timeout in new Build 40.1: https://openqa.suse.de/tests/9919949#step/ibft/40
So, what is the status of this? Do we know now whether or not SMART is enabled on that server? Is the smartmontools package installed on that server? Please note that its binaries are in /usr/sbin, which you may not have in your $PATH as a normal user.
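A small sketch of how to look for the binary without depending on the user's $PATH. The helper name `find_admin_tool` and the default directory list are assumptions made for illustration; only /usr/sbin is known from this bug.

```shell
# Print the full path of a tool, checking $PATH first and then sbin
# directories that a normal user's $PATH often lacks.
# Usage: find_admin_tool smartctl [dir ...]
# The helper name and the default directories are illustrative assumptions.
find_admin_tool() {
    tool=$1
    shift
    if [ $# -eq 0 ]; then
        set -- /usr/sbin /sbin /usr/local/sbin
    fi
    # Found through the normal $PATH lookup.
    if command -v "$tool" >/dev/null 2>&1; then
        command -v "$tool"
        return 0
    fi
    # Otherwise try each fallback directory in turn.
    for dir in "$@"; do
        if [ -x "$dir/$tool" ]; then
            echo "$dir/$tool"
            return 0
        fi
    done
    return 1
}
```

If `find_admin_tool smartctl` prints nothing and returns non-zero, the package is most likely not installed; `rpm -q smartmontools` would confirm that.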
Please also notice that it's not YaST that tries to use the smartctl command, it's additional tests in your test setup. I don't know to what extent SMART works over iSCSI, and what the requirements are for it. I don't see any option that looks even remotely related to SMART in yast-iscsi-client, and that code has not changed for a long time (see comment #5). I don't see ANY indication that this should be a YaST bug.
As for SMART over iSCSI: https://www.smartmontools.org/wiki/FAQ#SmartmontoolsforFireWireUSBandSATAdiskssystems

"SCSI commands can be conveyed by many transports: the veteran SCSI Parallel Interface (SPI), Fibre Channel (FC), Infiniband (SRP), Serial Attached SCSI (SAS), IP (iSCSI and iSER), USB (mass storage), and IEEE 1394 (SBP) to name some."

Maybe this is helpful to debug from your test client's side what is going on:

"The '-d sat' option instructs smartctl and smartd to assume a SATL is in place and act accordingly."

It might even be useful to always use that "-d sat" option in that test when it is known that the target disk is iSCSI.
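A minimal sketch of what such a fallback could look like in the test, assuming nothing about whether "-d sat" actually helps for this particular iSCSI target. The `SMARTCTL` variable and the `smart_info` function are hypothetical names; "-i" and "-d sat" are regular smartctl options.

```shell
# SMARTCTL is a hypothetical override so the sketch can be tried with a stub
# or with a full path such as /usr/sbin/smartctl.
SMARTCTL=${SMARTCTL:-smartctl}

# Query device info with auto-detection first; if that fails, retry while
# telling smartctl to assume a SCSI-to-ATA translation layer ("-d sat").
# Returns the exit status of the last smartctl call that was made.
smart_info() {
    dev=$1
    "$SMARTCTL" -i "$dev" && return 0
    echo "plain 'smartctl -i $dev' failed, retrying with '-d sat'" >&2
    "$SMARTCTL" -d sat -i "$dev"
}
```

Called as `smart_info /dev/sda`, this keeps the original check for disks where auto-detection works and only changes behavior in the failing case.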
No feedback. Besides, as mentioned multiple times, there is no hint that this might be a YaST bug. YaST does not change anything related to SMART. It's either the test server (which quite possibly might not have SMART support enabled in the BIOS) or the iSCSI transport layer.
We still hit this bug in Build 64.1: https://openqa.suse.de/tests/10220394#step/ibft/41
On the worker where this is running, SMART support is available and enabled:

smartctl --all /dev/sda
smartctl 7.2 2021-09-14 r5237 [x86_64-linux-5.14.21-150400.24.38-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Constellation ES.3
Device Model:     ST1000NM0033-9ZM173
Serial Number:    Z1W5P5JM
LU WWN Device Id: 5 000c50 091d250d3
Firmware Version: SN06
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jan 9 09:59:55 2023 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

In the VM, where the command sporadically works or fails, it looks like this: https://openqa.suse.de/tests/10187388#step/ibft/42

When trying to add options, as in `smartctl -d sat -T permissive -a /dev/sda`, I run into trouble like that described here: https://www.smartmontools.org/wiki/SAT-with-UAS-Linux
Searching further, I always end up at the same uas issue, but lsmod does not show that kernel module on either the worker or the VM.

To summarize: we have an installation in a VM using an iSCSI disk from the worker where the VM is running: https://openqa.suse.de/tests/10226693#step/iscsi_configuration/3 (screenshot of configuring iSCSI during installation), and sporadically, when we query the disk info with the smartctl command in the running system produced by that installation, we don't get an answer.

This is definitely not related to YaST, and our plan is to disable this check for the test, as there seems to be no guarantee that you can run that command and get a reliable answer.
Please forward this bug to Kernel or another component if you think it makes sense; for our testing scope, and given the information in https://www.smartmontools.org/wiki/FAQ#SmartmontoolsforFireWireUSBandSATAdiskssystems, it does not seem worth it to us.

Thanks, Stefan Hundhammer, for the information provided; it helps us understand the issue and be aware of what makes sense for us to test. Sorry for the late response.
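For reference, one way the check could be skipped only for iSCSI disks instead of being removed everywhere. This is just a sketch under assumptions: the function name `smart_check_applicable` is made up, and it relies on `lsblk -dno TRAN <device>` reporting "iscsi" for such disks, which should be confirmed on the SUT before relying on it.

```shell
# Decide from a transport name whether running a SMART check makes sense.
# The transport is expected to come from `lsblk -dno TRAN <device>`.
# Returns 0 (run the check) or 1 (skip it). Hypothetical helper.
smart_check_applicable() {
    case "$1" in
        iscsi|"") return 1 ;;  # network-backed or unknown transport: skip
        *)        return 0 ;;  # e.g. sata, sas, nvme: run the check
    esac
}

# Possible use in a test script (not executed here):
#   tran=$(lsblk -dno TRAN /dev/sda | tr -d '[:space:]')
#   if smart_check_applicable "$tran"; then smartctl -i /dev/sda; fi
```

Treating an empty transport as "skip" is a deliberate choice in this sketch, so that an unexpected lsblk result does not turn into a test failure.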