Bug 1203566 - [Build 21.1] openQA test fails in ibft - smartctl -i /dev/sda failed for 'mandatory smart cmd failure'
Status: NEW
Classification: openSUSE
Product: PUBLIC SUSE Linux Enterprise Server 15 SP5
Component: YaST2
Version: unspecified
Hardware: x86-64 SLES 15
Priority: P2 - High
Severity: Normal
Assigned To: E-mail List
https://openqa.suse.de/tests/9536843/...
Reported: 2022-09-20 11:41 UTC by Ming Li
Modified: 2022-11-16 02:45 UTC (History)

See Also:
Found By: openQA
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---
shundhammer: needinfo? (leli)


Attachments
ibft y2log (7.82 MB, application/x-bzip)
2022-11-08 11:04 UTC, Ming Li

Description Ming Li 2022-09-20 11:41:19 UTC
## Observation

openQA test in scenario sle-15-SP5-Online-x86_64-cryptlvm_iscsi@64bit fails in
[ibft](https://openqa.suse.de/tests/9536843/modules/ibft/steps/40)

## Test suite description
Conducts installation on iSCSI device relying on iBFT with encrypted LVM.


## Reproducible

Fails since (at least) Build [21.1](https://openqa.suse.de/tests/9518762)


## Expected result

Last good: [19.1](https://openqa.suse.de/tests/9418168) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Online&machine=64bit&test=cryptlvm_iscsi&version=15-SP5)
Comment 1 Stefan Weiberg 2022-09-26 15:08:49 UTC
Not sure about the root cause, but it could be related to the iscsi setup in yast2. We didn't have a kernel change so far. For now I am setting the YaST2 component.

One note on the bug report: the Observation section links to a different issue than the Reproducible section.
Comment 2 Stefan Weiberg 2022-09-26 15:09:44 UTC
Could you maybe collect and attach the y2logs of that system? They are not available in openQA.
Comment 3 Stefan Hundhammer 2022-10-05 11:25:46 UTC
  smartctl -i /dev/sda

failed with:

  "A mandatory SMART command failed: exiting"

When successful, that command ("-i" for "--info") should result in
something like this:


smartctl 7.2 2021-09-14 r5237 [x86_64-linux-5.14.21-150400.24.21-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F3
Device Model:     SAMSUNG HD103SJ
Serial Number:    S246JD2Z921835
LU WWN Device Id: 5 0024e9 0040754bf
Firmware Version: 1AJ10001
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Wed Oct  5 13:22:20 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
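
To check such output automatically, the two "SMART support is:" lines can be picked out of the text. This is only a minimal sketch: the function name `parse_smart_support` is hypothetical, and it assumes the line format shown in the sample above.

```python
import re


def parse_smart_support(info_text: str) -> dict:
    """Extract the 'SMART support is:' lines from `smartctl -i` style output."""
    values = re.findall(r"^SMART support is:\s*(.+)$", info_text, flags=re.MULTILINE)
    return {
        # "Unavailable ..." does not start with "Available", so it yields False.
        "available": any(v.startswith("Available") for v in values),
        "enabled": any(v.startswith("Enabled") for v in values),
    }


# Excerpt of the sample output quoted above.
sample = """\
=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD103SJ
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
"""

print(parse_smart_support(sample))
```

If neither line is present, as would be the case when the command aborts early, both values are False.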
Comment 4 Stefan Hundhammer 2022-10-05 11:28:56 UTC
cat 07-committed.yml 

# 2022-09-19 05:00:13 -0400
---
- disk:
    name: "/dev/sda"
    size: 20971540 KiB (20.00 GiB)
    block_size: 0.5 KiB
    io_size: 0 B
    min_grain: 1 MiB
    align_ofs: 0 B
    partition_table: gpt
    partitions:
    - free:
        size: 1 MiB
        start: 0 B
    - partition:
        size: 8 MiB
        start: 1 MiB
        name: "/dev/sda1"
        type: primary
        id: bios_boot
    - partition:
        size: 20962307.5 KiB (19.99 GiB)
        start: 9 MiB
        name: "/dev/sda2"
        type: primary
        id: lvm
        encryption:
          type: luks
          name: "/dev/mapper/cr_scsi-1IET_00010001-part2"
          password: "***"
    - free:
        size: 16.5 KiB
        start: 20971523.5 KiB (20.00 GiB)
- disk:
    name: "/dev/vda"
    size: 20 GiB
    block_size: 0.5 KiB
    io_size: 0 B
    min_grain: 1 MiB
    align_ofs: 0 B
- lvm_vg:
    vg_name: system
    extent_size: 4 MiB
    lvm_lvs:
    - lvm_lv:
        lv_name: home
        size: 5864 MiB (5.73 GiB)
        stripes: 1
        file_system: xfs
        mount_point: "/home"
    - lvm_lv:
        lv_name: root
        size: 13400 MiB (13.09 GiB)
        stripes: 1
        file_system: btrfs
        mount_point: "/"
        btrfs:
          default_subvolume: "@"
          subvolumes:
          - subvolume:
              path: "@"
          - subvolume:
              path: "@/boot/grub2/i386-pc"
          - subvolume:
              path: "@/boot/grub2/x86_64-efi"
          - subvolume:
              path: "@/opt"
          - subvolume:
              path: "@/root"
          - subvolume:
              path: "@/srv"
          - subvolume:
              path: "@/tmp"
          - subvolume:
              path: "@/usr/local"
          - subvolume:
              path: "@/var"
              nocow: true
    - lvm_lv:
        lv_name: swap
        size: 1204 MiB (1.18 GiB)
        stripes: 1
        file_system: swap
        mount_point: swap
    lvm_pvs:
    - lvm_pv:
        blk_device: "/dev/mapper/cr_scsi-1IET_00010001-part2"
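
As a sanity check on the layout above, the segment sizes of /dev/sda can be added up and compared with the reported disk size. This is a small sketch using only the numbers from the YAML; the variable names are made up for illustration.

```python
from fractions import Fraction

KIB_PER_MIB = 1024

# Segment sizes of /dev/sda from the YAML above, converted to KiB.
segments_kib = [
    Fraction(1 * KIB_PER_MIB),  # leading free space: 1 MiB
    Fraction(8 * KIB_PER_MIB),  # /dev/sda1 (bios_boot): 8 MiB
    Fraction("20962307.5"),     # /dev/sda2 (lvm, luks)
    Fraction("16.5"),           # trailing free space
]
disk_kib = Fraction(20971540)   # reported size of /dev/sda

total_kib = sum(segments_kib)
print("segments add up to disk size:", total_kib == disk_kib)

# Logical volumes in VG "system", sizes in MiB.
lv_mib = {"home": 5864, "root": 13400, "swap": 1204}
vg_used_mib = sum(lv_mib.values())
print("LVs fit into /dev/sda2:", vg_used_mib * KIB_PER_MIB <= Fraction("20962307.5"))
```

Both checks pass for the numbers above, so the committed layout is consistent in itself.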
Comment 5 Stefan Hundhammer 2022-10-05 11:49:25 UTC
The iSCSI disk in question is /dev/sda.

In both test cases, the screenshots taken while configuring iSCSI look exactly the same; the failing run matches the "last good" one.

Yet later, after installation, in the failing case the iSCSI disk does not accept SMART commands. But is that really something that can be influenced from the client side? Shouldn't that be set up on the iSCSI server, and that's it?

Are we absolutely sure that the iSCSI server did not change in the meantime? Is that a physical server, or also a virtual machine? On physical machines, I remember that SMART support needs to be enabled in the BIOS. Is it plausible that anything changed there; on the iSCSI server side?

I also briefly checked the yast-iscsi-client code; the last pull request is from June 21st, well before the "last good" test case. And even that PR does not appear to be even remotely related.

  https://github.com/yast/yast-iscsi-client/pull/120/files

So, please check the iSCSI server side first.
Comment 6 Ming Li 2022-11-08 11:04:04 UTC
Created attachment 862728 [details]
ibft y2log
Comment 7 Ming Li 2022-11-08 11:05:59 UTC
(In reply to Stefan Weiberg from comment #2)
> Could you maybe collect and attach the y2logs of that system? They are not
> available in openQA.

An issue with the ibft worker blocked me from reproducing this bug, but I finally reproduced it and collected the ibft y2log: https://openqa.nue.suse.com/tests/9898913#step/ibft/41
Comment 8 Stefan Hundhammer 2022-11-08 12:31:03 UTC
I never got an answer to my question in comment #5: Are you sure that SMART support is enabled on the server side?
Comment 10 Stefan Hundhammer 2022-11-08 12:33:05 UTC
BTW I don't think that's something that can be extracted from an y2log on the server side; you'll have to run "smartctl" commands there.
Comment 11 Ming Li 2022-11-09 01:29:58 UTC
(In reply to Stefan Hundhammer from comment #10)
> BTW I don't think that's something that can be extracted from an y2log on
> the server side; you'll have to run "smartctl" commands there.

Hi, I think you mean running the command directly on the ibft worker.
I have a recent job that passed, please check it: https://openqa.nue.suse.com/tests/9891296#step/ibft/40 It ran on openqaworker6:1, which is an instance of the worker qemu_x86_64_ibft, so I think the ibft setup on the server side is correct, at least when the command runs without failure.
I agree with you that something is wrong on the qemu_x86_64_ibft worker when the failure happens. In fact, our openQA tests run on the SUT, which is a VM hosted on the worker, so maybe we can check on the SUT as well. Please give me some instructions on how to check it, thanks.
Comment 12 Ming Li 2022-11-09 02:28:24 UTC
(In reply to Stefan Hundhammer from comment #10)
> BTW I don't think that's something that can be extracted from an y2log on
> the server side; you'll have to run "smartctl" commands there.

Hi, I just tried to check the iSCSI server, but I can't access it.

# iscsiadm --mode discovery --op update --type sendtargets --portal x.x.x.x
iscsiadm: cannot make connection to x.x.x.x: No route to host
iscsiadm: cannot make connection to x.x.x.x: No route to host
iscsiadm: cannot make connection to x.x.x.x: No route to host
iscsiadm: cannot make connection to x.x.x.x: No route to host
iscsiadm: connection login retries (reopen_max) 5 exceeded
iscsiadm: Could not perform SendTargets discovery: iSCSI PDU timed out

For security reasons, I haven't pasted the IP address of the iSCSI server here. It is in the autoinst-log, and I can also send it to you by e-mail.
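
A quick way to tell a routing problem from an iSCSI-level problem is a plain TCP connect to the portal. The sketch below assumes the target listens on the standard iSCSI port 3260; `portal_reachable` is a hypothetical helper name.

```python
import socket

ISCSI_DEFAULT_PORT = 3260  # standard iSCSI target port (assumed for this portal)


def portal_reachable(host: str, port: int = ISCSI_DEFAULT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the portal can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", connection refused, and timeouts.
        return False
```

Calling `portal_reachable("x.x.x.x")` with the real portal address in place of the placeholder would return False in the situation shown above, since the connection cannot even be routed.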
Comment 13 Stefan Hundhammer 2022-11-09 09:36:55 UTC
I cannot check that remotely from here. Somebody who has access to that server will need to log in and issue "smartctl" commands to check if the machine has SMART enabled. From our investigations in this bug so far, it looks very much like it's not.
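
Whoever runs those commands may find the exit status of smartctl useful, since it is a bit mask. The following sketch decodes it; the bit meanings are paraphrased from the exit status section of the smartctl(8) man page, and `decode_smartctl_status` is a hypothetical helper. Which bits the failing openQA run actually set is not known from this bug.

```python
from typing import List

# Bit meanings paraphrased from the smartctl(8) man page.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, no IDENTIFY DEVICE data, or device in low-power mode",
    2: "a SMART or other ATA command failed, or a SMART data checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains error records",
    7: "device self-test log contains error records",
}


def decode_smartctl_status(status: int) -> List[str]:
    """Return the descriptions of all bits set in a smartctl exit status."""
    return [text for bit, text in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


for line in decode_smartctl_status(4):
    print(line)
```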

Not every problem in the world is a YaST installer problem. We cannot do system administration for the server infrastructure in the QA labs.
Comment 14 Ming Li 2022-11-10 07:08:15 UTC
(In reply to Stefan Hundhammer from comment #13)
> I cannot check that remotely from here. Somebody who has access to that
> server will need to log in and issue "smartctl" commands to check if the
> machine has SMART enabled. From our investigations in this bug so far, it
> looks very much like it's not.
> 
> Not every problem in the world is a YaST installer problem. We cannot do
> system administration for the server infrastructure in the QA labs.

I can access the machine now, but I don't know how to set it up; there seems to be no smartctl command installed.

leli@worker2:~> smartctl
-bash: smartctl: command not found
leli@worker2:~>

I don't know who maintains the iSCSI server.
Comment 15 Richard Fan 2022-11-10 08:13:27 UTC
https://openqa.nue.suse.com/tests/9898914#step/ibft/48

There is no issue with this iSCSI server. [It seems we have more than one iSCSI server?]

I can give the iscsi tgt server configuration:

#tgt-admin -s
Target 1: iqn.2016-02.openqa.de:for.openqa
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags: 
        LUN: 1
            Type: disk
            SCSI ID: IET     00010001
            SCSI SN: beaf11
            Size: 21475 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: rdwr
            Backing store path: /opt/openqa-iscsi-disk
            Backing store flags: 
    Account information:
    ACL information:
        ALL
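
To compare this configuration between servers or over time, the LUN sections can be turned into dictionaries. This is only a sketch that assumes the indented "key: value" layout printed above; `parse_tgt_luns` is a hypothetical helper, not part of tgt.

```python
import re
from typing import Dict, List


def parse_tgt_luns(show_output: str) -> List[Dict[str, object]]:
    """Collect per-LUN key/value pairs from `tgt-admin -s` style output."""
    luns: List[Dict[str, object]] = []
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()
        match = re.match(r"LUN:\s*(\d+)$", line)
        if match:
            # A new "LUN: N" line starts a new record.
            current = {"LUN": int(match.group(1))}
            luns.append(current)
        elif line.startswith(("Target ", "Account information:", "ACL information:")):
            # These sections are not part of any LUN.
            current = None
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return luns


# Excerpt of the output quoted above.
sample = """\
        LUN: 1
            Type: disk
            SCSI SN: beaf11
            Backing store type: rdwr
            Backing store path: /opt/openqa-iscsi-disk
    Account information:
    ACL information:
        ALL
"""

for lun in parse_tgt_luns(sample):
    print(lun["LUN"], lun["Type"], lun["Backing store path"])
```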
Comment 16 George Gkioulis 2022-11-14 09:34:33 UTC
The issue does not seem to originate from a configuration change on the iSCSI server.

This is the original failure: https://openqa.suse.de/tests/9536843#step/ibft/34 with target: iqn.2016-02.openqa.de:for.openqa and portal: 10.160.1.93

This is a recent test that passed: https://openqa.suse.de/tests/9680734#step/ibft/34
It has the same iscsi target: iqn.2016-02.openqa.de:for.openqa and portal: 10.160.1.93 and there has been no configuration change there since the issue.

Since SMART support seems to be enabled in the run that PASSES, it could be that there is a different underlying issue.
Comment 17 Liu Shukui 2022-11-16 02:45:07 UTC
Timeout in new build 40.1:

https://openqa.suse.de/tests/9919949#step/ibft/40