Bug 1175105 - mdadm doesn't accept partition for --add because of "dirt" in that partition of a fresh gpt table
Status: REOPENED
Classification: openSUSE
Product: openSUSE Distribution
Component: Kernel
Version: Leap 15.1
Hardware: x86-64 Other
Priority: P3 - Medium
Severity: Normal
Assigned To: openSUSE Kernel Bugs E-mail List
 
Reported: 2020-08-11 10:48 UTC by Ralf Czekalla
Modified: 2021-08-03 20:30 UTC
CC List: 4 users

Description Ralf Czekalla 2020-08-11 10:48:16 UTC
I'm about to migrate an important central private nextcloud server to 15.2.

I use RAID1 for all my partitions and benefit from that with easily made clones with the help of mdadm, by growing the number of raid-devices and adding a refreshed SSD with a new gpt partition table and cloned partition descriptions.

This worked for all but one of the six partitions. (It's not the first installation I have transformed this way; I have been through this process several times already.)

The faulty one showed weird error messages I couldn't find a really good reason for, like ...

...on console:
# mdadm --verbose /dev/md2 --add /dev/sdc5
mdadm: add new device failed for /dev/sdc5 as 4: Invalid argument

# mdadm -E /dev/sdc5
/dev/sdc5:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x9
     Array UUID : 94bbbcd3:ad4d1b0b:dcd4d548:1af16050
           Name : any:2
  Creation Time : Fri Feb  2 20:09:03 2018
     Raid Level : raid1
   Raid Devices : 3

 Avail Dev Size : 83892192 sectors (40.00 GiB 42.95 GB)
     Array Size : 41945984 KiB (40.00 GiB 42.95 GB)
  Used Dev Size : 83891968 sectors (40.00 GiB 42.95 GB)
   Super Offset : 83892208 sectors
   Unused Space : before=0 sectors, after=224 sectors
          State : active
    Device UUID : c9e17312:0069d580:42782958:8817e0f7

Internal Bitmap : -16 sectors from superblock
    Update Time : Mon Aug 10 18:09:21 2020
  Bad Block Log : 512 entries available at offset -8 sectors - bad blocks present.
       Checksum : 7b5d72a - correct
         Events : 0


   Device Role : spare
   Array State : AA. ('A' == active, '.' == missing, 'R' == replacing)

First I was blinded by the "bad blocks present", but all devices checked out perfectly healthy in SMART, and extended device scans didn't show any problems either. It took me several hours to work through all this.
 
dmesg:
md: sdc5 does not have a valid v1.0 superblock, not importing!
md: md_import_device returned -22

Of course I also tried to --zero-superblock the partition. No change in behavior, and the above error message in dmesg remained.

In the end I found a surprising hint on serverfault.com (https://serverfault.com/questions/696392/how-to-add-disk-back-to-raid-and-replace-removed) from Oct. 2015, describing similar behavior, that suggested cleaning up the partition first by writing zeros with dd (dd if=/dev/zero of=/dev/sdc5 status=progress) and then trying again to add the seemingly faulty partition.

And long story short, this really worked.

Somehow mdadm - and md behind it - is choking on some dirt inside the partition blocks, i.e. old content from before the disk/SSD was wiped with a new gpt partition table. (Of course, with SSDs you try to avoid unnecessary writes, so you don't wipe every block of the disk first.)

I'm using md metadata version 1.0 here (in contrast to the serverfault.com case), where the RAID/md setup data is stored at the end of the partition. I had to wipe exactly this area at the end of the partition to be able to add it successfully afterwards. Doing this at the beginning of the partition (as mentioned in the serverfault.com case), where metadata version 1.2 stores the md setup data, of course had no effect.
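
For illustration, here is roughly the kind of targeted wipe I mean. The sector count below is only an example, not the exact value I used; metadata version 1.0 keeps the superblock within the last 8 KiB of the device, with the internal bitmap and bad block log just in front of it, so zeroing the last few MiB covers all of them:

# size of the partition in 512-byte sectors
SECTORS=$(blockdev --getsz /dev/sdc5)
# zero the last 8 MiB (16384 sectors of 512 bytes) where the v1.0
# superblock, internal bitmap and bad block log live
dd if=/dev/zero of=/dev/sdc5 bs=512 seek=$((SECTORS - 16384)) count=16384
# then retry the add
mdadm --verbose /dev/md2 --add /dev/sdc5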

I think after 5 years this might need a cleanup in mdadm or md device management.

At least the error message should suggest cleaning the partition first, instead of just this message in dmesg. Also, --zero-superblock should wipe out the dirt around the superblock and not seemingly let some stuff leak through here.

Thanks
Ralf
Comment 1 Ralf Czekalla 2020-08-11 10:51:17 UTC
Version of mdadm 
v4.1 - 2018-10-01

Kernel 5.7.12 from Kernel:stable
Comment 2 Miroslav Beneš 2020-12-23 12:05:00 UTC
Forgotten one, sorry about that.

Coly, could you take a look if this is something we want to fix/improve?
Comment 3 Coly Li 2020-12-24 11:51:50 UTC
(In reply to Miroslav Beneš from comment #2)
> Forgotten one, sorry about that.
> 
> Coly, could you take a look if this is something we want to fix/improve?

The --zero-superblock option does not zero-erase the whole superblock area; it only zero-fills certain attributes (e.g. array name, homehost name, UUID, data offset from start, etc.) of the superblock (it depends on the superblock version and format).

Maybe wipefs(8) can help in this case? Normally "wipefs -fa <dev>" works for me.
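
For example (using the partition name from this report; run it without options first to see what would be erased):

# list the signatures wipefs detects, without erasing anything
wipefs /dev/sdc5
# force-erase all detected signatures
wipefs -fa /dev/sdc5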

Coly Li
Comment 4 Miroslav Beneš 2020-12-30 11:14:00 UTC
Thanks Coly.

Yes, wipefs sounds like a good approach here. So let me close this as INVALID, as there is basically nothing to fix.
Comment 5 Ralf Czekalla 2021-04-04 14:53:51 UTC
Seriously guys? 

=> Misleading error messages from an important tool - we are talking about mdadm here, right?
First issue to fix! At the very least, mdadm should give an appropriate error message and recommend the possibility of fixing this problem with wipefs.

=> wipefs should be called automatically by mdadm here!
Second necessary fix: if mdadm recognizes that it cannot cope with the bits and bytes of dirt during md creation, it should automatically run wipefs on the device it is supposed to use!
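
Just to illustrate the behavior I mean (only a made-up sketch, not a patch proposal), something along the lines of this wrapper:

#!/bin/sh
# add-clean.sh <md-device> <partition>  (hypothetical helper, name made up)
set -e
wipefs -fa "$2"                        # clear any stale signatures first
mdadm --zero-superblock "$2" || true   # ignore errors if no superblock exists
mdadm --verbose "$1" --add "$2"        # then do the actual add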

Really nothing to fix? Really?
Comment 6 Coly Li 2021-04-04 15:55:21 UTC
(In reply to Ralf Czekalla from comment #5)
> Seriously guys? 
> 
> => Misleading error messages by an important tool - we are talking about
> mdadm here, right? 
> First to-be-fixed issue! At least it should bring an appropriate error
> message and recommend the possibility to fix this problem with wipefs.
> 

Adding a proper error message might be a practical fix which could be acceptable to the upstream maintainer. Could you please provide a suggested error message to avoid misunderstanding in this case?

> => wipefs should automatically be called by mdadm here! 
> Second necessary fix: Call wipefs automatically if mdadm recognizes that it
> can not cope with the bits and bytes of dirt during md creation and run it
> on the device mdadm is supposed to use!

This is hard and complicated: making mdadm correctly recognize the error condition is really not easy, and erasing the metadata automatically is risky and won't have full support from all developers.

I do need your help with your opinion and suggestions on how to provide the error information and what the preferred message content would be. Give me at least the information which you think would have helped you, then I can try to compose a patch.

Thanks in advance.

Coly Li
Comment 7 Ralf Czekalla 2021-08-03 20:30:56 UTC
Sorry Coly to come back to this so late. 

I'm fine with the proposal to fix this with an appropriate error message giving better suggestions about what the error is and how to fix it.

I would like to help you with this, but I'm not sure how to reproduce the behavior. The problematic state was wiped away during the successful manual process and is gone with it.

It probably depends heavily on the byte/bit patterns at the places where mdadm looks into the offered partition/device. If the data there is not cleansed of unrecognized byte patterns, it will probably end up in misinterpretations and misguiding error messages.
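
For what it is worth, here is an untested sketch of how one might try to recreate such leftover "dirt" with a loop device (file name, size and the target array /dev/md2 are just placeholders):

# fill a backing file with random data so old "dirt" remains on the device
dd if=/dev/urandom of=/tmp/dirty.img bs=1M count=1024
LOOP=$(losetup --find --show /tmp/dirty.img)
# lay a fresh GPT with one partition on top of the random data
parted -s "$LOOP" mklabel gpt mkpart primary 1MiB 100%
partprobe "$LOOP"
# the new partition still contains the old random bytes;
# try adding it to an existing degraded RAID1 and watch dmesg
mdadm --verbose /dev/md2 --add "${LOOP}p1"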

I would assume that --zero-superblock should take care that the places on the device which mdadm afterwards checks for a proper device are cleaned.

Do you have a list of the places where mdadm officially and unofficially checks a new device before adding it to an existing md? These steps need to be checked for reads beyond the intended places.

This is probably a security risk anyway if mdadm reads data from unofficial places and misinterprets it for later operations. There is no telling what else can happen when this erratic data is taken for real.

Best Regards
Ralf