Bug 1216415 - Ext4 performance regression in fsmark with 6.6-rc1
Status: NEW
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: Other Other
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assignee: Jan Kara
QA Contact: E-mail List
Reported: 2023-10-19 11:18 UTC by Jan Kara
Modified: 2023-10-19 11:19 UTC


Description Jan Kara 2023-10-19 11:18:29 UTC
Our performance testing grid has noticed a performance regression in the fsmark benchmark on the ext4 filesystem, caused by commit:

590a809ff743 ("jbd2: check 'jh->b_transaction' before removing it from checkpoint")

Bisection results look like:
Comparison
==========
                                   initial              initial                 last                penup                 last                penup                first
                                 good-v6.5     bad-58720809f527         bad-7ca4b085         bad-768d612f        good-373ac521        good-4eea9fbe         bad-590a809f
Hmean     1-files/sec  75111.58 (   0.00%)  53329.29 * -29.00%*  50981.18 * -32.13%*  54359.72 * -27.63%*  69390.67 *  -7.62%*  72971.96 (  -2.85%)  51931.20 * -30.86%*
1st-qrtle 1-files/sec  82564.00 (   0.00%)  63280.90 ( -23.36%)  60071.30 ( -27.24%)  62820.30 ( -23.91%)  74996.20 (  -9.17%)  77740.60 (  -5.84%)  62483.80 ( -24.32%)
2nd-qrtle 1-files/sec  75635.50 (   0.00%)  51772.30 ( -31.55%)  50621.20 ( -33.07%)  54163.10 ( -28.39%)  70801.70 (  -6.39%)  73161.40 (  -3.27%)  51862.40 ( -31.43%)
3rd-qrtle 1-files/sec  69357.70 (   0.00%)  46942.60 ( -32.32%)  44497.50 ( -35.84%)  48781.00 ( -29.67%)  65062.80 (  -6.19%)  68952.00 (  -0.58%)  47086.20 ( -32.11%)
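
For reference, the percentages in parentheses are consistent with a plain relative change against the good-v6.5 baseline. A minimal Python sketch that recomputes a few of the Hmean deltas from the numbers above (the helper name percent_change is only illustrative and is not part of any benchmark tooling):

```python
# Hmean files/sec values copied from the comparison table above.
baseline = 75111.58  # good-v6.5

hmean = {
    "bad-58720809f527": 53329.29,
    "good-373ac521": 69390.67,
    "good-4eea9fbe": 72971.96,
    "bad-590a809f": 51931.20,
}


def percent_change(value, base):
    """Relative change of value against base, in percent."""
    return (value - base) / base * 100.0


# Round to two decimals, matching the precision used in the table.
deltas = {name: round(percent_change(v, baseline), 2) for name, v in hmean.items()}

for name, delta in deltas.items():
    print(f"{name:>18}: {delta:+.2f}%")
```

Throughput stays within about 3% of the baseline at good-4eea9fbe and drops by roughly 31% at bad-590a809f.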

The commit fixes a data consistency issue where a buffer could be removed from a checkpointing transaction too early, so the filesystem could become inconsistent after a crash at an unfortunate moment. I suspect that the corrected check for whether a buffer can be removed from the checkpoint makes the cleanup of checkpointed transactions stall more often while waiting for a transaction commit. The cleanup thus gets slower and, as a result, our journal throughput drops (which is what generally determines fsmark performance). If that is the case, there's not much we can do (data consistency trumps performance), but it needs verification. We could also investigate schemes to clean up checkpointed transactions more eagerly when the journal is getting full.
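
To make the suspected mechanism concrete, here is a toy Python model. It is not kernel code: ToyBuffer, removable and scan are hypothetical names, and the in_transaction flag merely stands in for a non-NULL jh->b_transaction. It only illustrates that a stricter removability check lets a single scan of the checkpoint list free fewer buffers, under the assumption that some clean buffers are still attached to a transaction.

```python
from dataclasses import dataclass


@dataclass
class ToyBuffer:
    dirty: bool           # buffer still needs writeback
    in_transaction: bool  # stand-in for jh->b_transaction != NULL


def removable(buf, check_transaction):
    """Decide whether a buffer may be dropped from the checkpoint list."""
    if buf.dirty:
        return False
    # Assumed effect of the fix: a buffer still attached to a
    # transaction must stay on the checkpoint list.
    if check_transaction and buf.in_transaction:
        return False
    return True


def scan(checkpoint_list, check_transaction):
    """Return (number of buffers freed, buffers left on the list)."""
    kept = [b for b in checkpoint_list if not removable(b, check_transaction)]
    return len(checkpoint_list) - len(kept), kept


buffers = [
    ToyBuffer(dirty=False, in_transaction=False),
    ToyBuffer(dirty=False, in_transaction=True),
    ToyBuffer(dirty=True, in_transaction=False),
    ToyBuffer(dirty=False, in_transaction=True),
]

freed_old, _ = scan(buffers, check_transaction=False)
freed_new, _ = scan(buffers, check_transaction=True)
print(freed_old, freed_new)
```

In this toy setup the scan without the extra check frees three buffers, while the scan with it frees only one; the other two have to wait until their transaction commits. Whether this is what actually happens on the test machine still needs the verification mentioned above.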
Comment 1 Jan Kara 2023-10-19 11:19:42 UTC
Forgot to note in comment 0 that the bisection result comes from simba2.