Bugzilla – Bug 1216415
Ext4 performance regression in fsmark with 6.6-rc1
Last modified: 2023-10-19 11:19:42 UTC
Our performance testing grid has noticed a performance regression in the fsmark benchmark on the ext4 filesystem due to:

590a809ff743 ("jbd2: check 'jh->b_transaction' before removing it from checkpoint")

Bisection results look like:

Comparison
==========
                                   initial              initial                 last                penup                 last                penup                first
                                 good-v6.5     bad-58720809f527         bad-7ca4b085         bad-768d612f        good-373ac521        good-4eea9fbe         bad-590a809f
Hmean     1-files/sec  75111.58 (   0.00%)  53329.29 * -29.00%*  50981.18 * -32.13%*  54359.72 * -27.63%*  69390.67 *  -7.62%*  72971.96 (  -2.85%)  51931.20 * -30.86%*
1st-qrtle 1-files/sec  82564.00 (   0.00%)  63280.90 ( -23.36%)  60071.30 ( -27.24%)  62820.30 ( -23.91%)  74996.20 (  -9.17%)  77740.60 (  -5.84%)  62483.80 ( -24.32%)
2nd-qrtle 1-files/sec  75635.50 (   0.00%)  51772.30 ( -31.55%)  50621.20 ( -33.07%)  54163.10 ( -28.39%)  70801.70 (  -6.39%)  73161.40 (  -3.27%)  51862.40 ( -31.43%)
3rd-qrtle 1-files/sec  69357.70 (   0.00%)  46942.60 ( -32.32%)  44497.50 ( -35.84%)  48781.00 ( -29.67%)  65062.80 (  -6.19%)  68952.00 (  -0.58%)  47086.20 ( -32.11%)

The commit fixes a data consistency issue where a buffer could be removed from a checkpointing transaction too early, so the filesystem could become inconsistent if a crash happened at an unfortunate moment. I suspect that the corrected check for whether a buffer can be removed from the checkpoint makes the cleanup of checkpointed transactions stall more often while waiting for a transaction commit. The cleanup thus gets slower and, as a result, our journal throughput is lowered (which is what generally determines fsmark performance). If that is the case, there's not much we can do (data consistency trumps performance), but it needs verification. We could also investigate schemes to clean up checkpointed transactions more eagerly when the journal is getting full.
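For anyone re-checking the numbers, the percentages in the comparison appear to be the relative change of each kernel's result against the good-v6.5 baseline. A minimal sketch of that arithmetic follows; the function name `relative_change` is made up for illustration and is not part of any benchmarking tool, and the two inputs are the Hmean values quoted above.

```python
def relative_change(baseline: float, value: float) -> float:
    """Return the change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Hmean 1-files/sec values copied from the comparison above.
baseline_v6_5 = 75111.58   # good-v6.5
first_bad = 51931.20       # bad-590a809f

# Negative result means fewer files/sec than the baseline.
print(f"{relative_change(baseline_v6_5, first_bad):.2f}%")
```

Running the same calculation on the other Hmean entries should give the remaining percentages in that row.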
Forgot to note in comment 0 that the bisection result comes from simba2.