Bugzilla – Full Text Bug Listing
| Summary: | EXT4-fs error (device dm-5) in ext4_mb_clear_bb:6517: error 95 | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Ruediger Oertel <ro> |
| Component: | Kernel | Assignee: | Luis Henriques <lhenriques> |
| Status: | NEW --- | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | ada.lovelace, dmueller, ihno, jack, marcela.maslanova, rgoldwyn, ro, tiwai |
| Version: | Current | ||
| Target Milestone: | --- | ||
| Hardware: | S/390-64 | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
Description
Ruediger Oertel
2024-02-12 10:43:12 UTC
> # worker creates a filesystem:
> mke2fs -t ext4 -O ^has_journal -F /dev/mapper/$DMTARG

Can you provide details on what this $DMTARG is? Is it a real device, luks, ...?

> [ 171.720733] EXT4-fs (dm-5): mounting with "discard" option, but the device does not support discard
> [ 171.720743] EXT4-fs (dm-5): mounted filesystem b286e1d2-cc6a-4990-9f4a-6a66bb7df2ca r/w without journal. Quota mode: none.
> [ 171.722381] sysrq: Changing Loglevel
> [ 171.722385] sysrq: Loglevel set to 7
> [ 225.920112] EXT4-fs error (device dm-5) in ext4_mb_clear_bb:6517: error 95
> [ 225.920565] EXT4-fs error (device dm-5) in ext4_mb_clear_bb:6517: error 95
> [ 225.957830] EXT4-fs error (device dm-5) in ext4_mb_clear_bb:6517: error 95
> [ 225.958456] EXT4-fs error (device dm-5) in ext4_mb_clear_bb:6517: error 95
> [ 225.958901] EXT4-fs error (device dm-5) in ext4_mb_clear_bb:6517: error 95
> [ 225.959304] EXT4-fs error (device dm-5) in ext4_mb_clear_bb:6517: error 95
> [ 225.968457] EXT4-fs error (device dm-5) in ext4_mb_clear_bb:6517: error 95
> [ 226.056499] EXT4-fs error (device dm-5) in ext4_mb_clear_bb:6517: error 95
> [ 226.118631] EXT4-fs error (device dm-5) in ext4_mb_clear_bb:6517: error 95
> [ 226.134514] EXT4-fs error (device dm-5) in ext4_mb_clear_bb:6517: error 95
> [ 226.654747] loop0: detected capacity change from 0 to 104857600

If $DMTARG is this^^^ loop device, then this looks odd, because the line above should have been seen *before* anything else, right? Anyway, I'll see if I can reproduce it locally.

> [ 226.657919] EXT4-fs (loop0): mounted filesystem 92fdfcd8-945b-46e6-9851-d28bbbb112f7 r/w without journal. Quota mode: none.
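The first kernel message above already says that dm-5 does not support discard, and "error 95" is the Linux errno value EOPNOTSUPP ("Operation not supported"). A worker could check this before adding "discard" to its mount options. The following is a minimal sketch, not the change made in this bug: it assumes the standard sysfs attribute `/sys/block/<dev>/queue/discard_max_bytes` (which reads 0 when a device accepts no discards), and the helper names `discard_supported`, `mount_options` and `is_eopnotsupp` are hypothetical.

```python
import errno
from pathlib import Path
from typing import Iterable


def discard_supported(dev_name: str, sysfs_root: str = "/sys/block") -> bool:
    """True if the kernel reports that block device `dev_name` (e.g. "dm-5")
    accepts discard requests; discard_max_bytes is 0 when it does not."""
    limit = Path(sysfs_root) / dev_name / "queue" / "discard_max_bytes"
    return int(limit.read_text().strip()) > 0


def mount_options(base: Iterable[str], dev_name: str,
                  sysfs_root: str = "/sys/block") -> str:
    """Join mount options, appending "discard" only if the device supports it."""
    opts = [opt for opt in base if opt != "discard"]
    if discard_supported(dev_name, sysfs_root):
        opts.append("discard")
    return ",".join(opts)


def is_eopnotsupp(code: int) -> bool:
    """True for the error number that ext4 prints as "error 95" on Linux."""
    return code == errno.EOPNOTSUPP
```

For example, on a device whose `discard_max_bytes` is 0, `mount_options(["noatime", "nodiratime", "discard", "nobarrier", "async"], "dm-5")` yields `"noatime,nodiratime,nobarrier,async"`. The `sysfs_root` parameter only exists so the check can be exercised against a fake directory tree.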
No, the loop device resides in files on top of the filesystem on dm-5 ($DMTARG). The layout looks like this:

    s390zl31:~ # multipath -ll 3600507630bffd216000000000000201a
    3600507630bffd216000000000000201a dm-3 IBM,2107900
    size=512G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
    `-+- policy='service-time 0' prio=50 status=active
      |- 0:0:2:2 sdd 8:48  active ready running
      `- 1:0:2:2 sdl 8:176 active ready running
    s390zl31:~ # df -hT /var/cache/obs/worker/
    Filesystem                                    Type  Size  Used Avail Use% Mounted on
    /dev/mapper/3600507630bffd216000000000000201a ext4  504G   42G  438G   9% /var/cache/obs/worker

There are 2-4 paths to a multipath device map, and the filesystem lives directly on that block device. Inside the filesystem we create directories and files, and some of these then serve as loopback files (used as root/swap for the VMs of the build processes).

Well, this one looks simple (and harmless). You have the mount options "noatime,nodiratime,discard,nobarrier,async"; in particular, the "discard" option is important. It means that, without a journal, ext4_mb_clear_bb() tries to issue a discard request for each freed extent. The underlying storage apparently doesn't support discard, so we end up with error 95, which is EOPNOTSUPP. I agree ext4 should not flood the log with these errors, but ultimately the easiest fix is to drop 'discard' from the mount options. Generally, I seriously doubt 'discard' is a good choice for your storage, because for most storage types these small discards hurt performance instead of helping it. Calling fstrim once a day or so tends to be a much better choice.

Well, we added discard across the board, but since s390 has its own file that creates the fs and writes the mount options, I can just drop it there. So the only real change is that the EOPNOTSUPP is now logged, whereas before it was just ignored... Thanks for looking at this!

OK, I'd question why "discard" was added across the board.
Do you have some evaluation showing it actually benefits anything? I have a hard time remembering a case where the "discard" mount option was actually a net win in all those years. Regarding EOPNOTSUPP not being logged before: I'm not sure what the "before" state was. Without the "discard" option, no discard was sent to the underlying device, so of course no error was reported. Similarly, if the ext4 filesystem uses journalling, the discard happens at a different place and the EOPNOTSUPP error happens to be silent. I'll send a fix upstream to make this consistent in ext4 (i.e. silence the EOPNOTSUPP error).

Well, our usage pattern is that this filesystem is mostly write-only: we copy all the packages into that fs and have loopback files on there, and at the end of the build job just the resulting RPMs are extracted and everything else is thrown away. Depending on the hardware, the "physical device" underneath is one of:

- a multipath SCSI device from some storage, as on s390
- a single or multipath SCSI disk, for machines where we have nothing better
- these days usually an NVMe, on all platforms where we could get one
- tmpfs (basically gone due to slower performance than NVMe and high RAM prices; almost all replaced by NVMe today)

As far as I remember, Dirk Mueller was the one who proposed using "discard" for our "build" filesystems. Dirk, do you remember the background?

The issue was that on aarch64 and x86_64 machines, the underlying storage devices were rate-limiting writes to reach the MTBF endurance ratings. Without discard, we were down to 3-4 MB/s of write performance. After mounting all the layers with discard (which AFAIK is meanwhile the default in newer kernels anyway), it went back up to the expected 500+ MB/s of write performance. Now fstrim could in *theory* do something similar; however, the usage pattern here is that we create huge filesystems every few minutes to seconds.
If they're not trimmed, the NVMe sees a completely full disk all the time, which isn't true. I guess we could do a more clever discard once the build job is completed and zap the entire filesystem that was allocated, but I never got the time to implement that.

(In reply to Dirk Mueller from comment #7)
> The issue was that on aarch64 and x86_64 machines, the underlying storage
> devices were rate-limiting writes to reach the MTBF endurance ratings.
> without discard, we were down to 3-4MB/s of write performance. after
> mounting all the layers with discard (which afaik is the default anyhow
> meanwhile in newer kernels), it went back up to the expected 500MB/s+ write
> performance.

Doh, weird. I've never heard of such behavior before :) The 'discard' mount option is definitely not the default in any recent kernel, as there were quite a few reports of it being detrimental to performance.

> now a fstrim could in *theory* do something similar, however the usage
> pattern here is that we create huge filesystems every few minutes to
> seconds. if they're not trimmed, then the NVME sees a completely full disk
> all the time. which isn't true.

So mkfs.ext4 can discard the whole device before creating the filesystem, but I guess this is not very useful for you, because AFAIU you create the big filesystem, fill it with the build, and then it gets emptied as the RPMs are copied out and the build artifacts are removed. That is the moment when you'd like to tell the disk, via discard, that most of the blocks are actually uninteresting.

> I guess we could make a more clever discard once the build job is completed
> and zap the entire filesystem that was allocated, but I never got the time
> to implement that.

Yeah, e.g. running mkfs on the device once you're done with the filesystem will do the job, as mkfs.ext4 discards the device by default.
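The end-of-job discard suggested above could be wired into a worker's cleanup step roughly as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the `mke2fs` variant simply reuses the worker's command from the original report, and by default the command is only built (dry run). Both variants destroy all data on the device, so it must already be unmounted with the build results copied out. On storage without discard support, such as the s390 multipath device above, `blkdiscard` (util-linux) fails with "Operation not supported" instead of helping.

```python
import shlex
import subprocess
from typing import List


def cleanup_command(device: str, recreate_fs: bool = False) -> List[str]:
    """Return a command that discards every block of `device`.

    recreate_fs=False: blkdiscard discards the whole device.
    recreate_fs=True:  rerun the worker's mke2fs command; mke2fs discards
                       the device before writing the new filesystem unless
                       "-E nodiscard" is given.
    """
    if recreate_fs:
        return ["mke2fs", "-t", "ext4", "-O", "^has_journal", "-F", device]
    return ["blkdiscard", device]


def discard_after_job(device: str, recreate_fs: bool = False,
                      dry_run: bool = True) -> str:
    """Run (or, in dry-run mode, only describe) the end-of-job discard.

    WARNING: both variants destroy all data on `device`.
    """
    cmd = cleanup_command(device, recreate_fs)
    if not dry_run:
        # check=True raises CalledProcessError if the tool exits non-zero,
        # e.g. blkdiscard on a device that rejects discard (EOPNOTSUPP).
        subprocess.run(cmd, check=True)
    # Shell-quoted form of the command, useful for logging.
    return shlex.join(cmd)
```

For instance, `discard_after_job("/dev/loop0")` returns `blkdiscard /dev/loop0` without touching the device. Going through mke2fs instead has the side effect of leaving a fresh filesystem ready for the next job.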