Bug 1222465 - fdisk creates broken partition table
Summary: fdisk creates broken partition table
Status: CONFIRMED
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Installation
Version: Leap 15.5
Hardware: Other Other
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Stanislav Brabec
QA Contact: Jiri Srain
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-04-08 12:32 UTC by Volker Kuhlmann
Modified: 2024-06-24 20:00 UTC
CC List: 3 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Tar file containing sparse 2TB disk image (20.00 KB, application/x-tar)
2024-04-08 12:32 UTC, Volker Kuhlmann
Partition table (fdisk -l) (934 bytes, text/plain)
2024-04-08 12:35 UTC, Volker Kuhlmann
Hex dump of 2TB disk image (1.38 KB, text/plain)
2024-04-08 12:36 UTC, Volker Kuhlmann
Reproducer shell script (764 bytes, application/x-shellscript)
2024-06-24 20:00 UTC, Stanislav Brabec

Description Volker Kuhlmann 2024-04-08 12:32:09 UTC
Created attachment 874128
Tar file containing sparse 2TB disk image

I always partition my disks myself, in the way that works best for me, and these disks survive multiple re-installations of openSUSE versions. In yast I assign filesystems manually to the existing partitions. Yast is not allowed to change the partition layout.

Leap 15.5 cannot be installed on this already partitioned disk (15.4 had no problems) because when yast is told to read the partition table, nothing shows up in yast.

Yast runs parted -l -s -m (or similar) for this, and parted crashes while reading the partition table(!) and doesn't produce any output yast can use.

To reproduce the crash, it is sufficient to create a sparse file "disk" and partition it.

The disk has this size:

Disk SSD-sparse-crash.diskimg: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
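A minimal Python sketch of creating an image of exactly this size, doing what `truncate -s` does: extending a file with `truncate()` sets its apparent size without writing data blocks. It assumes a filesystem with sparse-file support; `make_sparse_image` is an illustrative name, not part of any tool mentioned here.

```python
import os

SECTOR_SIZE = 512
DISK_SECTORS = 3907029168                 # from the fdisk -l header above
DISK_BYTES = DISK_SECTORS * SECTOR_SIZE   # 2000398934016 bytes, about 1.82 TiB

def make_sparse_image(path, size_bytes=DISK_BYTES):
    """Create a file with apparent size size_bytes and (ideally) no allocated data.

    Opening with "wb" empties any existing file first; truncate() then extends it
    without writing, so a sparse-capable filesystem allocates almost nothing.
    Returns (apparent_size, allocated_bytes); st_blocks is POSIX-only.
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

For example, `make_sparse_image("SSD-sparse-crash.diskimg")` produces a file that fdisk and parted see as a 2 TB disk.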

Partition with fdisk, as per the attached table created after the fact with fdisk -l. All filesystem boundaries in that table are intended to start at a multiple of a large number of MiB. Partitions overlap and there's no error in the table.

Incredibly, parted can be made not to crash just by introducing a very small gap between partition boundaries. It's still a parted bug.

I'll try to attach a tar file (20k) of the 2TB sparse disk image.
Comment 1 Volker Kuhlmann 2024-04-08 12:35:11 UTC
Created attachment 874129
Partition table (fdisk -l)

parted crashes reading from a disk with this partition table. This file was created with fdisk -l, and the table was created with fdisk (Leap 15.4 or 15.5).
Comment 2 Volker Kuhlmann 2024-04-08 12:36:14 UTC
Created attachment 874130
Hex dump of 2TB disk image
Comment 3 Lukas Ocilka 2024-04-08 12:38:25 UTC
Ah, Arvin, you seem to be the one taking care of parted. Reassigning.
Comment 4 Volker Kuhlmann 2024-04-08 12:59:45 UTC
The crash can be avoided by moving the end of the previous partition down by 1 sector (512 bytes).

I've seen parted create partitions starting on an odd(!!) sector number, which is plain stupid.
Comment 5 Arvin Schnell 2024-04-08 15:59:24 UTC
For each logical partition there is an extended boot record (EBR).
Typically this EBR is placed between the logical partitions. That is why,
AFAIS, parted needs a gap of at least one sector between logical partitions.

Since there is no gap between the logical partitions in your partition
table, it is at least atypical. But AFAIS it is actually broken, in that
the EBR is located *inside* the partition. I gathered this data by
adding extra logging to parted, since I am not aware of a program that
shows the EBR locations.

  Partition 5 spans sectors 86507520 to 1135083519.

  The EBR of partition 6 is located at 1135081472 (so at the end
  of partition 5 but *inside* partition 5).
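This finding can be expressed as a simple check: for every logical partition, test whether its EBR sector falls inside the data range of any logical partition. A minimal Python sketch; `Logical` and `ebr_conflicts` are illustrative names. Partition 5's EBR is assumed to be at the start of the extended partition (86245376, from the reproduction in comment 13), and partition 6's last sector is also taken from comment 13.

```python
from typing import NamedTuple

class Logical(NamedTuple):
    number: int     # partition number as shown by fdisk (5, 6, ...)
    ebr_lba: int    # sector holding this partition's EBR
    first_lba: int  # first data sector
    last_lba: int   # last data sector (inclusive)

def ebr_conflicts(parts):
    """Return (ebr_owner, victim) pairs where an EBR lies inside a partition's data area."""
    return [(owner.number, victim.number)
            for owner in parts
            for victim in parts
            if victim.first_lba <= owner.ebr_lba <= victim.last_lba]

# Sector numbers from comment 5 (partition 5, EBR of 6) and comment 13 (the rest)
P5 = Logical(5, 86245376, 86507520, 1135083519)
P6 = Logical(6, 1135081472, 1135083520, 3904897023)
```

`ebr_conflicts([P5, P6])` returns `[(6, 5)]`: the EBR of partition 6 lies within the last 2048 sectors (1 MiB) of partition 5.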

To investigate further, could you please provide the exact steps used
to produce the partition table?
Comment 6 Volker Kuhlmann 2024-04-10 21:10:32 UTC
Steps to create this partition table:

* Create a 2TB sparse file with truncate.
* Using fdisk of openSUSE 15.4, enter the numbers of this table.

Alignment is critical for SSD performance. Partitions, or maybe more importantly filesystems, should be at least 1MiB aligned, or aligned to whatever the disk's internal block size is. Equally important is the erase block size, but manufacturers treat it as a secret. It may be on the order of 128MiB now. One could test with e.g. flashbench.
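A minimal Python sketch for checking a start sector against a given alignment (1 MiB, or a larger guess at the erase block size such as 128 MiB), assuming 512-byte logical sectors and an alignment that is a multiple of the sector size; the helper names are illustrative.

```python
MIB = 1024 * 1024
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

def is_aligned(lba, align_bytes, sector_size=SECTOR_SIZE):
    """True if the byte offset of sector `lba` is a multiple of align_bytes."""
    return (lba * sector_size) % align_bytes == 0

def align_up(lba, align_bytes, sector_size=SECTOR_SIZE):
    """Smallest sector >= lba whose byte offset is a multiple of align_bytes."""
    step = align_bytes // sector_size  # sectors per alignment unit
    return -(-lba // step) * step      # ceiling division, then scale back
```

With the sector numbers that appear later in this report, the logical partition starts 86507520 and 1135083520 are both 128 MiB aligned, while the EBR position 1135081472 is 1 MiB aligned but not 128 MiB aligned.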

If the yast partitioner could meet all these constraints, maybe with a user-settable alignment size, and optionally operate in sectors so one can be sure and/or verify, that'd be great. Until then I'll need to partition manually with other tools.

flashbench
https://github.com/bradfa/flashbench
Comment 7 Arvin Schnell 2024-04-11 06:43:58 UTC
Then fdisk is creating a broken partition table. I cannot say whether fdisk
expects the user to make sure there is room for the EBRs or not.

You can use hexdump on partition 5 and see the EBR for partition 6, esp. the
DOS signature "55 aa".
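To find that signature without scanning by hand, the EBR's byte offset within the partition device can be computed and the two bytes checked. A minimal Python sketch assuming 512-byte sectors; the function names are illustrative and the device path in the usage note is a placeholder.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

def ebr_offset_in_partition(ebr_lba, part_first_lba, sector_size=SECTOR_SIZE):
    """Byte offset of an EBR when reading the partition device rather than the whole disk."""
    if ebr_lba < part_first_lba:
        raise ValueError("EBR lies before the partition start")
    return (ebr_lba - part_first_lba) * sector_size

def has_dos_signature(f, byte_offset):
    """Check for 55 aa in the last two bytes of the 512-byte sector at byte_offset."""
    f.seek(byte_offset + 510)
    return f.read(2) == b"\x55\xaa"
```

With the numbers from comment 5, the EBR of partition 6 sits 536869863424 bytes into partition 5, so something like `has_dos_signature(open("/dev/sdX5", "rb"), ebr_offset_in_partition(1135081472, 86507520))` (with the real device name) should find it.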

With that partition table, data loss is imminent. E.g. copy from /dev/zero
to the last GiB of partition 5, and fdisk can no longer find partitions 6
and 7 (without any warning). parted complains about the broken signature.
Simply running fstrim might cause the same data loss.
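The arithmetic behind this warning, as a small Python sketch using the sector numbers from comment 5 (512-byte sectors assumed): zeroing the last GiB of partition 5 covers the EBR of partition 6, and in fact the last 2048 sectors (1 MiB) are already enough.

```python
SECTORS_PER_MIB = 2048                    # 512-byte sectors
SECTORS_PER_GIB = 1024 * SECTORS_PER_MIB

P5_FIRST, P5_LAST = 86507520, 1135083519  # partition 5 (comment 5)
EBR6 = 1135081472                         # EBR of partition 6 (comment 5)

def tail_range(last_lba, n_sectors):
    """Inclusive sector range covering the last n_sectors of a partition."""
    return last_lba - n_sectors + 1, last_lba

def hits(lba, rng):
    """True if sector lba falls inside the inclusive range rng."""
    lo, hi = rng
    return lo <= lba <= hi
```

`hits(EBR6, tail_range(P5_LAST, SECTORS_PER_MIB))` is true, while shrinking the range to 2047 sectors no longer reaches the EBR.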

BTW: YaST has taken care of 1 MiB alignment for at least 10 years. And with
GPT the whole problem of placing EBRs does not exist.
Comment 8 Steffen Winterfeldt 2024-04-11 07:52:16 UTC
> Alignment is critical for SSD performance.

Your partitions are aligned perfectly. They just overlap. While this
is great for performance, it is not so great for the data.

It's a bug in fdisk that it allows you to do this.

The problem is that you leave a gap before the first logical partition.

So after you created the 1st logical partition, fdisk keeps
suggesting a range that starts at the first available block
in the original extended partition and ends at the end of free space.

This is clearly not true (since there's already a partition in between).
This allows you to enter basically any random partition borders without
fdisk validating them properly.

And yes, as Arvin already mentioned, it is technically impossible
to have logical partitions seamlessly one after another. There MUST be
space between them. And if you keep the start values fdisk
suggests (that is, no artificial gap), everything is fine.
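What a correct suggestion could look like, as a Python sketch. The rule here is an assumption, not libfdisk's actual code: put the next EBR in the first free sector after the last logical partition (or at the start of the extended partition if there is none yet), and start the data on the next 1 MiB boundary after it. With this report's numbers it yields the same defaults fdisk printed in comment 13 (86247424) and, after recomputing, in comment 15 (1135085568). It only appends after the last logical partition and ignores holes in between.

```python
GRAIN = 2048  # sectors; assumption: 1 MiB grain with 512-byte sectors

def align_up(lba, grain=GRAIN):
    """Round lba up to the next multiple of grain."""
    return -(-lba // grain) * grain

def next_logical_start(ext_start, used_last_lbas, grain=GRAIN):
    """Return (ebr_lba, first_lba) for a logical partition appended after existing ones.

    The EBR takes the first free sector; the data area starts at the next grain
    boundary after it, which leaves the mandatory gap between logical partitions.
    """
    ebr = max([ext_start] + [last + 1 for last in used_last_lbas])
    return ebr, align_up(ebr + 1, grain)
```

For the extended partition starting at 86245376, `next_logical_start(86245376, [])` gives `(86245376, 86247424)`, and once partition 5 ends at 1135083519 it gives `(1135083520, 1135085568)`.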
Comment 9 Volker Kuhlmann 2024-04-12 03:25:57 UTC
OK, so I didn't take care of a correct layout, and fdisk didn't show a warning (as long as it's a warning, not an error).

That's still no reason for parted to crash (and installation to fail)!

For that matter, parted can't validate a partition table either. It only has align-checks for min and opt, which all pass.

Yast may align partitions to 1MiB, but AFAIR it doesn't show any information confirming that it does so ("166.5GB" is completely useless for checking what it's doing regarding alignment).

And as I explained, alignment to anything less than the erase block size (which is much larger than the block size) will degrade performance. The Yast partitioner isn't able to take care of this. With today's disk sizes, losing a few 10s of MB between partitions doesn't matter, performance does, and aiming a bit higher than needed doesn't hurt. 1MiB alignment is no longer good enough.

So fdisk, parted, yast all need fixing...
Comment 10 Volker Kuhlmann 2024-04-12 03:30:59 UTC
The start values fdisk suggests for logical partitions are not useful for increasing alignment: the suggested start is the beginning of a 35MB gap between the second-to-last and last logical partition, or similar, so one is forced to manually enter a value after the end of the last logical partition.
Comment 11 Steffen Winterfeldt 2024-04-12 08:19:36 UTC
> as long as it's a warning, not an error).

I know I'm talking to a wall but I'll try one last time: your partitions overlap
and you will see data corruption eventually. Your layout is wrong
and you should fix it asap if you value your data.

> 1MiB alignment is no longer good enough.

One person's alignment gap is the other person's free usable space.

So in light of this, am I correct assuming you tried to create a layout
with 256 MiB manually with fdisk? Why aren't you using GPT where you can
put your partitions one after another without gap, perfectly aligned?

> So fdisk, parted, yast all need fixing...

Yes. And world peace, too!

There's two real issues: (1) fdisk should not allow such partitioning and
(2) parted should probably not just crash when it sees it.
Comment 12 Volker Kuhlmann 2024-04-12 11:11:52 UTC
> I know I'm talking to a wall but I'll try one last time:

Calm down. I am fixing it with urgency.

> One person's alignment gap is the other person's free usable space.

Correct. Yast doesn't give options (it doesn't even say what it's doing), hence I use other tools. Bad luck that fdisk didn't flag the user error. That's unix: everyone's allowed to be an idiot but doesn't have to be. With Redmond OS you must be an idiot, big brother always knows what you need.

> So in light of this, am I correct assuming you tried to create a layout
> with 256 MiB manually with fdisk?

128MiB alignment, yes. Information I found suggests erase block sizes are much larger than 1MiB and flashbench testing indicates 128MiB to be a good choice.

> Why aren't you using GPT where you can
> put your partitions one after another without gap, perfectly aligned?

If I knew everything, I could tell whether OtherOS on the earlier partitions still boots with GPT, but I don't. DOS tables are supposed to work on a 2TB disk so that seems the safe option. Plus it's always worked so far, so no change is safest.

> There's two real issues: (1) fdisk should not allow such partitioning and

I prefer a warning, not an error. Tools that ultimately do what they are told are often useful in unexpected ways. But whatever.

> (2) parted should probably not just crash when it sees it.

Nothing is allowed to crash on bad input, especially when it causes an install failure. Show a warning, fine; a completely blank result with no reason given is not useful. I can recover now regardless of whether anything gets fixed. Thanks for looking at it, I appreciate it.
Comment 13 Stanislav Brabec 2024-06-24 19:06:08 UTC
Well, I have checked the fdisk output.

The first EBR is placed in the first sector mentioned in the line with Type == Extended.

The EBRs of further logical partitions inside the Extended space are not visible in the fdisk output.
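Since neither fdisk nor parted prints the EBR locations, a minimal Python sketch of a walker over the EBR chain of a raw disk image may help. It assumes 512-byte sectors and the classic DOS layout: in each EBR, entry 1 describes the logical partition relative to that EBR, and entry 2 points to the next EBR relative to the start of the extended partition. The function names are illustrative, not taken from parted or fdisk.

```python
import struct

SECTOR_SIZE = 512                    # assumption: 512-byte logical sectors
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # DOS extended partition type codes

def read_sector(f, lba):
    """Read one sector from a binary file object (image or block device)."""
    f.seek(lba * SECTOR_SIZE)
    data = f.read(SECTOR_SIZE)
    if len(data) != SECTOR_SIZE:
        raise ValueError(f"short read at LBA {lba}")
    return data

def table_entries(sector):
    """Return the four (type, start, count) entries of an MBR or EBR sector."""
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 55 aa signature")
    entries = []
    for i in range(4):
        off = 446 + 16 * i
        ptype = sector[off + 4]
        start, count = struct.unpack_from("<II", sector, off + 8)
        entries.append((ptype, start, count))
    return entries

def walk_ebrs(f):
    """List (ebr_lba, first_lba, last_lba) for every logical partition."""
    ext_start = next((start for ptype, start, count in table_entries(read_sector(f, 0))
                      if ptype in EXTENDED_TYPES and count), None)
    logicals, ebr, seen = [], ext_start, set()
    while ebr is not None:
        if ebr in seen:
            raise ValueError(f"EBR chain loops at LBA {ebr}")
        seen.add(ebr)
        part, link = table_entries(read_sector(f, ebr))[:2]
        if part[2]:  # entry 1: logical partition, relative to this EBR
            first = ebr + part[1]
            logicals.append((ebr, first, first + part[2] - 1))
        # entry 2: next EBR, relative to the extended partition start
        ebr = ext_start + link[1] if link[0] in EXTENDED_TYPES and link[2] else None
    return logicals
```

Usage: `with open("SSD-sparse-crash.diskimg", "rb") as f: print(walk_ebrs(f))`. Comparing each EBR position with the ranges of the preceding logical partitions shows the overlap described in comment 5.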

Anyway, trying to reproduce on 15.6 with randomly sized partitions did not reproduce the bug. I got a correct proposal, and trying to enter a bad number reports "Value out of range.".


But there is apparently a different problem that allows creating overlapping partitions:

truncate -s 2000398934016 SSD-sparse-crash.diskimg2
fdisk SSD-sparse-crash.diskimg2
n
p

200703
n
p

84107519
n
p
84148224
86245375
n
e
86245376

n
86507520
1135083519
Here you get the correct behavior:

Command (m for help): n
All primary partitions are in use.
Adding logical partition 5
First sector (86247424-3907029167, default 86247424): 86507520
Last sector, +/-sectors or +/-size{K,M,G,T,P} (86507520-3907029167, default 3907029167): 1135083519

Created a new partition 5 of type 'Linux' and of size 500 GiB.

But now the error appears:
n
Command (m for help): n
All primary partitions are in use.
Adding logical partition 6
First sector (86247424-3907029167, default 86247424): 1135083520
Last sector, +/-sectors or +/-size{K,M,G,T,P} (1135083520-3907029167, default 3907029167): 3904897023

Created a new partition 6 of type 'Linux' and of size 1.3 TiB.

The proposed range allows creating fully overlapping partitions, as if the previously allocated sectors were ignored.

But this happens specifically with the reporter's data. Creating a random layout with different sizes correctly sets the lower limit.
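A hypothetical Python driver that feeds the keystrokes above to fdisk on a fresh sparse image (this is not the attached reproducer script). The listing above does not show every Enter that accepts a default, so the blank answers are assumptions based on util-linux fdisk asking for partition type, partition number, first sector and last sector in that order; the trailing `p` and `w` are additions to print and write the table.

```python
import subprocess

IMAGE_SIZE = 2000398934016  # bytes, same as the reporter's disk

# One answer per fdisk prompt; "" presses Enter to accept the default.
# Assumption: fdisk asks for type, number, first and last sector.
KEYS = [
    "n", "p", "", "", "200703",            # primary 1
    "n", "p", "", "", "84107519",          # primary 2
    "n", "p", "", "84148224", "86245375",  # primary 3
    "n", "e", "86245376", "",              # extended 4 (number is auto-selected)
    "n", "86507520", "1135083519",         # logical 5
    "n", "1135083520", "3904897023",       # logical 6, accepted as in comment 13
    "p",                                   # print the resulting table
    "w",                                   # write it to the image and quit
]

def fdisk_input(keys=KEYS):
    """Join the answers into the text fdisk reads from stdin."""
    return "\n".join(keys) + "\n"

def reproduce(path="SSD-sparse-crash.diskimg2"):
    """Create a fresh sparse image and feed the keystrokes to fdisk."""
    with open(path, "wb") as f:
        f.truncate(IMAGE_SIZE)
    result = subprocess.run(["fdisk", path], input=fdisk_input(),
                            text=True, capture_output=True, check=False)
    return result.stdout
```

Running `reproduce()` needs fdisk installed and writes only to the image file; its output can then be compared with the transcript above.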
Comment 14 Stanislav Brabec 2024-06-24 19:53:14 UTC
But if I try to abuse this bug to create fully overlapping partitions, it does not work:

Command (m for help): n

All primary partitions are in use.
Adding logical partition 6
First sector (86247424-3907029167, default 86247424): 86507520

Sector 86507520 is already allocated.

So something is wrong with the suggested lowest sector, and something else is wrong with the sector that is actually accepted.
Comment 15 Stanislav Brabec 2024-06-24 19:56:23 UTC
And yet another attempt:
Command (m for help): n

All primary partitions are in use.
Adding logical partition 6
First sector (86247424-3907029167, default 86247424): 86507520

Sector 86507520 is already allocated.
First sector (1135085568-3907029167, default 1135085568): 1135083520
Value out of range.

So entering a bad sector number forces fdisk to recompute the lowest sector, and then it proposes a working value.
Comment 16 Stanislav Brabec 2024-06-24 20:00:38 UTC
Created attachment 875682
Reproducer shell script