Bug 1185570 - New kernel breaks bcache mounted rootfs
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Priority: P5 - None
Severity: Critical
Assigned To: Coly Li
Reported: 2021-05-03 13:01 UTC by Diego Ercolani
Modified: 2023-01-18 16:47 UTC (History)
Attachments
dmesg file carrying a single kernel oops (19.94 KB, application/gzip)
2021-05-03 13:42 UTC, Diego Ercolani
systemd-journal file showing a bunch of kernel oops while changing from runlevel 1 to runlevel 3 see with journalctl --file=./system.journal (798.50 KB, application/gzip)
2021-05-03 13:45 UTC, Diego Ercolani
dmesg booting from bsc1185570 kernel (20.29 KB, application/gzip)
2021-05-05 12:46 UTC, Diego Ercolani
bcache: avoid oversized read request in cache missing code path (8.30 KB, application/mbox)
2021-05-19 15:41 UTC, Coly Li
boot log 5.15.5 (47.00 KB, application/gzip)
2021-12-03 11:10 UTC, Diego Ercolani

Description Diego Ercolani 2021-05-03 13:01:10 UTC
kernel-devel-5.12.0-1.2.noarch
completely breaks the bcache-mounted rootfs.
I verified that rolling back to 5.11.16 makes bcache work again.
Comment 1 Takashi Iwai 2021-05-03 13:09:54 UTC
Could you elaborate on what exactly is broken and how?

Anyway, adding the bcache maintainer to Cc.
Comment 2 Diego Ercolani 2021-05-03 13:42:26 UTC
Created attachment 848980 [details]
dmesg file carrying a single kernel oops

OK, sorry, but it's very difficult to get a complete kernel dump because the system starts printing kernel panics at full speed.
Normally, if I switch to runlevel 1 the system seems to start correctly; as soon as I switch to another runlevel (e.g. 3), the system starts printing kernel panics on the console and it is not possible to get any information.
On one occasion, when I switched to runlevel 1, there was a kernel panic in dmesg, which I can include (dmesg.gz).
Comment 3 Diego Ercolani 2021-05-03 13:45:47 UTC
Created attachment 848981 [details]
systemd-journal file showing a bunch of kernel oops while changing from runlevel 1 to runlevel 3 see with journalctl --file=./system.journal
Comment 4 Takashi Iwai 2021-05-03 14:35:38 UTC
Thanks!

The Oops in comment 2 indicates that the bcache tries to call bio_alloc_bioset() with too many nr_vecs.  In 5.11.x kernel, bio_alloc_bioset() returned NULL in such a case without complaints, but now it hits the kernel panic instead.  The BUG() call is intentional, but it doesn't look like the most helpful way...

The call pattern is via cached_dev_cache_miss(), and it calculates the nr_vecs like
  DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS)
and this is likely over BIO_MAX_VECS (=256).

Dropping the BUG() call in bio.c should restore the old behavior (although there is still another WARN_ON()), but I suppose the real fix is needed on the caller side, in the bcache code.
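
The arithmetic behind the oversized request can be sketched in userspace C. This is only an illustration, assuming 4 KiB pages and 512-byte sectors (so PAGE_SECTORS is 8); the helper names nr_vecs_for() and exceeds_bio_limit() are made up for this sketch and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: 4 KiB pages and 512-byte sectors, as on x86-64. */
#define SECTOR_BYTES  512u
#define PAGE_BYTES    4096u
#define PAGE_SECTORS  (PAGE_BYTES / SECTOR_BYTES)   /* 8 sectors per page */
#define BIO_MAX_VECS  256u                          /* limit quoted in this comment */

/* Same rounding as the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of page-sized vectors needed to hold `sectors` sectors. */
unsigned int nr_vecs_for(unsigned int sectors)
{
    return DIV_ROUND_UP(sectors, PAGE_SECTORS);
}

/* True when the request would ask for more vectors than one bio can hold. */
bool exceeds_bio_limit(unsigned int sectors)
{
    return nr_vecs_for(sectors) > BIO_MAX_VECS;
}
```

Under these assumptions, a cache-miss read of 2048 sectors (1 MiB) needs exactly 256 vectors and still fits, while anything larger goes over the limit.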
Comment 5 Takashi Iwai 2021-05-04 16:59:06 UTC
A test kernel with the drop of BUG() call is being built in OBS home:tiwai:bsc1185570 repo.  It'll be available later at
  http://download.opensuse.org/repositories/home:/tiwai:/bsc1185570/standard/

Please give it a try later.

Note that the kernel will likely show a WARNING with a stack trace once in your case.  That is expected behavior, and this kernel is not meant to be the proper fix.  The only point here is to check whether it can get past the BUG() call.
Comment 6 Diego Ercolani 2021-05-05 12:46:02 UTC
Created attachment 849061 [details]
dmesg booting from bsc1185570 kernel

As said, now I have a single kernel oops
Comment 7 Takashi Iwai 2021-05-05 12:54:27 UTC
Thanks.  It's no Oops but the normal kernel warning with stack trace, as expected.  So far, so good.

Usually this can be fixed by capping nr_iovecs via bio_max_segs().  But as I don't know the details of bcache, I reassign this bug to Coly.
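
As a rough userspace model of the clamp mentioned here: bio_max_segs() limits a requested vector count to BIO_MAX_VECS. The function name bio_max_segs_model() is hypothetical, and the value 256 is taken from comment 4.

```c
#include <assert.h>

#define BIO_MAX_VECS 256u   /* value quoted in comment 4 */

/* Userspace model of the clamp: never ask for more than BIO_MAX_VECS. */
unsigned int bio_max_segs_model(unsigned int nr_segs)
{
    return nr_segs < BIO_MAX_VECS ? nr_segs : BIO_MAX_VECS;
}
```

A request for 100 vectors passes through unchanged, while a request for 4096 is reduced to 256.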
Comment 8 Coly Li 2021-05-06 01:59:18 UTC
(In reply to Takashi Iwai from comment #7)
> Thanks.  It's no Oops but the normal kernel warning with stack trace, as
> expected.  So far, so good.
> 
> Usually this can be fixed by capping nr_iovecs via bio_max_segs().  But as I
> don't know the details of bcache, I reassign this bug to Coly.

I am back from the public holidays and am now looking into the bcache part.

Thanks.

Coly Li
Comment 9 Coly Li 2021-05-06 02:52:29 UTC
(In reply to Coly Li from comment #8)
> (In reply to Takashi Iwai from comment #7)
> > Thanks.  It's no Oops but the normal kernel warning with stack trace, as
> > expected.  So far, so good.
> > 
> > Usually this can be fixed by capping nr_iovecs via bio_max_segs().  But as I
> > don't know the details of bcache, I reassign this bug to Coly.
> 
> I am back from public days, now I look into the bcache part.

There are similar reports on the mailing list. A test patch has been posted to the linux-bcache mailing list for other reporters to test and verify.

Coly Li
Comment 10 Diego Ercolani 2021-05-13 10:44:38 UTC
5.12.2 has been released by openSUSE but it has the same problem. The difference is that I now get only a single kernel oops, but the system is slowed down to the point of being unusable.
Comment 11 Takashi Iwai 2021-05-17 11:52:34 UTC
Coly, given the severity of the bug, could you at least put a temporary workaround into the stable branch (e.g. just drop the BUG() call)?  Once the proper upstream fix arrives, we can replace the workaround with it.
Comment 12 Coly Li 2021-05-17 13:47:12 UTC
(In reply to Takashi Iwai from comment #11)
> Coly, given the severity of the bug, could you put a temporary workaround to
> stable branch at least (e.g. just drop BUG() call)?  Once after the proper
> upstream fix arrives, we can replace with it.

The current bcache code exceeds two size limitations in the cache miss code path. Over the past few days I have been working on the fixes, and today a better solution seems to have taken proper shape.

It looks like this:

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 29c231758293..cd0431fd9d20 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -515,18 +515,25 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)
        struct search *s = container_of(op, struct search, op);
        struct bio *n, *bio = &s->bio.bio;
        struct bkey *bio_key;
-       unsigned int ptr;
+       unsigned int ptr, max_cache_miss_size;

        if (bkey_cmp(k, &KEY(s->iop.inode, bio->bi_iter.bi_sector, 0)) <= 0)
                return MAP_CONTINUE;

+       /*
+        * Make sure the cache missing size won't exceed the restrictions of
+        * max bkey size and max bio's bi_max_vecs.
+        */
+       max_cache_miss_size = min_t(uint64_t,
+               (1 << KEY_SIZE_BITS) - 1, BIO_MAX_VECS * PAGE_SECTORS);
+
        if (KEY_INODE(k) != s->iop.inode ||
            KEY_START(k) > bio->bi_iter.bi_sector) {
                unsigned int bio_sectors = bio_sectors(bio);
                unsigned int sectors = KEY_INODE(k) == s->iop.inode
-                       ? min_t(uint64_t, INT_MAX,
+                       ? min_t(uint64_t, max_cache_miss_size,
                                KEY_START(k) - bio->bi_iter.bi_sector)
-                       : INT_MAX;
+                       : max_cache_miss_size;
                int ret = s->d->cache_miss(b, s, bio, sectors);

                if (ret != MAP_CONTINUE)
@@ -547,7 +554,7 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)
        if (KEY_DIRTY(k))
                s->read_dirty_data = true;

-       n = bio_next_split(bio, min_t(uint64_t, INT_MAX,
+       n = bio_next_split(bio, min_t(uint64_t, max_cache_miss_size,
                                      KEY_OFFSET(k) - bio->bi_iter.bi_sector),
                           GFP_NOIO, &s->d->bio_split);

But I need 1 or 2 days to test and verify. I hope it can be finalized in the next 2 days.

P.S. The previous workaround only avoids the panic and won't cache the missing data; this is why I didn't post it.
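
The cap computed in the patch above can be worked through in a small userspace sketch. It assumes KEY_SIZE_BITS is 16 and PAGE_SECTORS is 8 (4 KiB pages, 512-byte sectors); both values are assumptions for illustration, as is the helper miss_sectors(), which only mirrors the capped expression passed to cache_miss().

```c
#include <assert.h>
#include <stdint.h>

#define KEY_SIZE_BITS 16u   /* assumed width of the bkey size field */
#define PAGE_SECTORS  8u    /* assumed: 4 KiB pages, 512-byte sectors */
#define BIO_MAX_VECS  256u  /* value quoted in comment 4 */

uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Upper bound, in sectors, for one cache-miss read under both limits. */
uint64_t max_cache_miss_size(void)
{
    uint64_t max_bkey_sectors = (UINT64_C(1) << KEY_SIZE_BITS) - 1;
    uint64_t max_bio_sectors  = (uint64_t)BIO_MAX_VECS * PAGE_SECTORS;

    return min_u64(max_bkey_sectors, max_bio_sectors);
}

/* Sectors handed to cache_miss(): the gap up to the next key, capped. */
uint64_t miss_sectors(uint64_t key_start, uint64_t bio_sector)
{
    assert(key_start > bio_sector);
    return min_u64(max_cache_miss_size(), key_start - bio_sector);
}
```

With these assumed values the bkey limit is 65535 sectors and the bio limit is 2048 sectors, so the bio limit is the binding one and a single cache-miss read is capped at 2048 sectors (1 MiB), where the old code allowed up to INT_MAX.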

Coly Li
Comment 13 Coly Li 2021-05-19 15:41:07 UTC
Created attachment 849484 [details]
bcache: avoid oversized read request in cache missing code path

This is the patch I posted to linux-bcache (Cc linux-block and linux-kernel) as a fix for the reported issue.

This is also related to another bug, which our customer is testing now. The patch survived my stress testing; once we have a positive response from the customer (or from community users), I will do the backport.

Coly Li
Comment 14 Coly Li 2021-05-21 04:11:25 UTC
(In reply to Coly Li from comment #13)
> Created attachment 849484 [details]
> bcache: avoid oversized read request in cache missing code path
> 
> The is the patch I posted to linux-bcache (Cc linux-block and linux-kernel)
> as a fix for the reported issue.
> 
> This is also related to another bug and our customer is testing now. This
> patch survived from my pressure testing, once we have positive response from
> customer (or community users as well), I will do the back port.

The fix currently seems to work. Although there are code review comments asking for a better patch, IMHO we can take this patch now and replace it with the upstream version once the final patch is merged into the mainline kernel.

Coly Li
Comment 15 Bodo Eggert 2021-05-25 12:26:47 UTC
I'd be a potential tester, too.
Comment 16 Coly Li 2021-05-25 13:16:10 UTC
(In reply to Bodo Eggert from comment #15)
> I'd be a potential tester, too.

I will add the fast fix to our kernel very soon, and it will be replaced with the upstream version once that is merged into the kernel.

Coly Li
Comment 17 Diego Ercolani 2021-05-25 15:25:25 UTC
New kernel 5.12.4, same issue so we are waiting
Comment 18 Coly Li 2021-05-25 15:28:04 UTC
(In reply to Diego Ercolani from comment #17)
> New kernel 5.12.4, same issue so we are waiting

OK, working on the fast fix backport now. Please note: this is not the final upstream version.

Coly Li
Comment 19 Coly Li 2021-06-19 13:33:44 UTC
(In reply to Coly Li from comment #18)
> (In reply to Diego Ercolani from comment #17)
> > New kernel 5.12.4, same issue so we are waiting
> 
> OK, working on the fast fix backport now. Please notice: this is not final
> upstream version.
> 

Patches are submitted and accepted into SLE15-SP3 kernel.

Coly Li
Comment 21 Diego Ercolani 2021-06-23 08:29:49 UTC
Installed kernel-default-5.12.12-1.1 via the "zypper dup" command and suse repositories
issue seems resolved

Thank you
Comment 27 Swamp Workflow Management 2021-06-28 19:26:05 UTC
openSUSE-SU-2021:2184-1: An update that solves four vulnerabilities and has 107 fixes is now available.

Category: security (important)
Bug References: 1087082,1152489,1154353,1174978,1176447,1176771,1177666,1178134,1178378,1178612,1179610,1182999,1183712,1184259,1184436,1184631,1185195,1185428,1185497,1185570,1185589,1185675,1185701,1186155,1186286,1186460,1186463,1186472,1186501,1186672,1186677,1186681,1186752,1186885,1186928,1186949,1186950,1186951,1186952,1186953,1186954,1186955,1186956,1186957,1186958,1186959,1186960,1186961,1186962,1186963,1186964,1186965,1186966,1186967,1186968,1186969,1186970,1186971,1186972,1186973,1186974,1186976,1186977,1186978,1186979,1186980,1186981,1186982,1186983,1186984,1186985,1186986,1186987,1186988,1186989,1186990,1186991,1186992,1186993,1186994,1186995,1186996,1186997,1186998,1186999,1187000,1187001,1187002,1187003,1187038,1187039,1187050,1187052,1187067,1187068,1187069,1187072,1187143,1187144,1187167,1187334,1187344,1187345,1187346,1187347,1187348,1187349,1187350,1187351,1187357,1187711
CVE References: CVE-2020-26558,CVE-2020-36385,CVE-2020-36386,CVE-2021-0129
JIRA References: 
Sources used:
openSUSE Leap 15.3 (src):    kernel-64kb-5.3.18-59.10.1, kernel-debug-5.3.18-59.10.1, kernel-default-5.3.18-59.10.1, kernel-default-base-5.3.18-59.10.1.18.4.2, kernel-docs-5.3.18-59.10.1, kernel-kvmsmall-5.3.18-59.10.1, kernel-obs-build-5.3.18-59.10.1, kernel-obs-qa-5.3.18-59.10.1, kernel-preempt-5.3.18-59.10.1, kernel-source-5.3.18-59.10.1, kernel-syms-5.3.18-59.10.1, kernel-zfcpdump-5.3.18-59.10.1
Comment 28 Swamp Workflow Management 2021-06-28 19:58:33 UTC
SUSE-SU-2021:2184-1: An update that solves four vulnerabilities and has 107 fixes is now available.

Category: security (important)
Bug References: 1087082,1152489,1154353,1174978,1176447,1176771,1177666,1178134,1178378,1178612,1179610,1182999,1183712,1184259,1184436,1184631,1185195,1185428,1185497,1185570,1185589,1185675,1185701,1186155,1186286,1186460,1186463,1186472,1186501,1186672,1186677,1186681,1186752,1186885,1186928,1186949,1186950,1186951,1186952,1186953,1186954,1186955,1186956,1186957,1186958,1186959,1186960,1186961,1186962,1186963,1186964,1186965,1186966,1186967,1186968,1186969,1186970,1186971,1186972,1186973,1186974,1186976,1186977,1186978,1186979,1186980,1186981,1186982,1186983,1186984,1186985,1186986,1186987,1186988,1186989,1186990,1186991,1186992,1186993,1186994,1186995,1186996,1186997,1186998,1186999,1187000,1187001,1187002,1187003,1187038,1187039,1187050,1187052,1187067,1187068,1187069,1187072,1187143,1187144,1187167,1187334,1187344,1187345,1187346,1187347,1187348,1187349,1187350,1187351,1187357,1187711
CVE References: CVE-2020-26558,CVE-2020-36385,CVE-2020-36386,CVE-2021-0129
JIRA References: 
Sources used:
SUSE Linux Enterprise Workstation Extension 15-SP3 (src):    kernel-default-5.3.18-59.10.1, kernel-preempt-5.3.18-59.10.1
SUSE Linux Enterprise Module for Live Patching 15-SP3 (src):    kernel-default-5.3.18-59.10.1, kernel-livepatch-SLE15-SP3_Update_2-1-7.5.1
SUSE Linux Enterprise Module for Legacy Software 15-SP3 (src):    kernel-default-5.3.18-59.10.1
SUSE Linux Enterprise Module for Development Tools 15-SP3 (src):    kernel-docs-5.3.18-59.10.1, kernel-obs-build-5.3.18-59.10.1, kernel-preempt-5.3.18-59.10.1, kernel-source-5.3.18-59.10.1, kernel-syms-5.3.18-59.10.1
SUSE Linux Enterprise Module for Basesystem 15-SP3 (src):    kernel-64kb-5.3.18-59.10.1, kernel-default-5.3.18-59.10.1, kernel-default-base-5.3.18-59.10.1.18.4.2, kernel-preempt-5.3.18-59.10.1, kernel-source-5.3.18-59.10.1, kernel-zfcpdump-5.3.18-59.10.1
SUSE Linux Enterprise High Availability 15-SP3 (src):    kernel-default-5.3.18-59.10.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 29 Swamp Workflow Management 2021-06-30 13:31:41 UTC
SUSE-SU-2021:2202-1: An update that solves four vulnerabilities and has 98 fixes is now available.

Category: security (important)
Bug References: 1152489,1154353,1174978,1176447,1176771,1178134,1178612,1179610,1183712,1184259,1184436,1184631,1185195,1185570,1185589,1185675,1185701,1186155,1186286,1186463,1186472,1186672,1186677,1186752,1186885,1186928,1186949,1186950,1186951,1186952,1186953,1186954,1186955,1186956,1186957,1186958,1186959,1186960,1186961,1186962,1186963,1186964,1186965,1186966,1186967,1186968,1186969,1186970,1186971,1186972,1186973,1186974,1186976,1186977,1186978,1186979,1186980,1186981,1186982,1186983,1186984,1186985,1186986,1186987,1186988,1186989,1186990,1186991,1186992,1186993,1186994,1186995,1186996,1186997,1186998,1186999,1187000,1187001,1187002,1187003,1187038,1187039,1187050,1187052,1187067,1187068,1187069,1187072,1187143,1187144,1187167,1187334,1187344,1187345,1187346,1187347,1187348,1187349,1187350,1187351,1187357,1187711
CVE References: CVE-2020-26558,CVE-2020-36385,CVE-2020-36386,CVE-2021-0129
JIRA References: 
Sources used:
SUSE Linux Enterprise Module for Public Cloud 15-SP3 (src):    kernel-azure-5.3.18-38.8.1, kernel-source-azure-5.3.18-38.8.1, kernel-syms-azure-5.3.18-38.8.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 30 Swamp Workflow Management 2021-06-30 14:05:52 UTC
openSUSE-SU-2021:2202-1: An update that solves four vulnerabilities and has 98 fixes is now available.

Category: security (important)
Bug References: 1152489,1154353,1174978,1176447,1176771,1178134,1178612,1179610,1183712,1184259,1184436,1184631,1185195,1185570,1185589,1185675,1185701,1186155,1186286,1186463,1186472,1186672,1186677,1186752,1186885,1186928,1186949,1186950,1186951,1186952,1186953,1186954,1186955,1186956,1186957,1186958,1186959,1186960,1186961,1186962,1186963,1186964,1186965,1186966,1186967,1186968,1186969,1186970,1186971,1186972,1186973,1186974,1186976,1186977,1186978,1186979,1186980,1186981,1186982,1186983,1186984,1186985,1186986,1186987,1186988,1186989,1186990,1186991,1186992,1186993,1186994,1186995,1186996,1186997,1186998,1186999,1187000,1187001,1187002,1187003,1187038,1187039,1187050,1187052,1187067,1187068,1187069,1187072,1187143,1187144,1187167,1187334,1187344,1187345,1187346,1187347,1187348,1187349,1187350,1187351,1187357,1187711
CVE References: CVE-2020-26558,CVE-2020-36385,CVE-2020-36386,CVE-2021-0129
JIRA References: 
Sources used:
openSUSE Leap 15.3 (src):    kernel-azure-5.3.18-38.8.1, kernel-source-azure-5.3.18-38.8.1, kernel-syms-azure-5.3.18-38.8.1
Comment 32 Coly Li 2021-08-23 02:44:40 UTC
The fixes are in the stable kernel and in our products, and people have confirmed that the reported issue is fixed. I am closing this report.
Comment 33 Diego Ercolani 2021-11-23 12:38:44 UTC
Hello, last upgrade (kernel vmlinuz-5.15.2-1-default) broke bcache again
Comment 34 Coly Li 2021-11-23 13:08:37 UTC
(In reply to Diego Ercolani from comment #33)
> Hello, last upgrade (kernel vmlinuz-5.15.2-1-default) broke bcache again

This comes from a different regression. My current solution involves three changes:

1, Revert commit 2fd3e5efe791946be0957c8e1eed9560b541fe46
2, Revert commit f8b679a070c536600c64a78c83b96aa617f8fa71
3, Make the following change in drivers/md/bcache/super.c,
@@ -885,9 +885,9 @@ static void bcache_device_free(struct bcache_device *d)

         bcache_device_detach(d);
 
     if (disk) {
-        blk_cleanup_disk(disk);
         ida_simple_remove(&bcache_device_idx,
                   first_minor_to_idx(disk->first_minor));
+        blk_cleanup_disk(disk);
     }
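
A userspace sketch of the ordering in the change above. The thread does not state the reason for the reordering; one plausible reading is that disk->first_minor must be read before blk_cleanup_disk() releases the disk. Everything below (struct disk_model, release_idx(), cleanup_disk(), device_free_model()) is a hypothetical stand-in, and the first_minor_to_idx() conversion is omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct gendisk; only the field used here. */
struct disk_model {
    int first_minor;
};

/* Records the last index handed back to the (modelled) ID allocator. */
static int last_released_idx = -1;

/* Hypothetical stand-in for ida_simple_remove(). */
void release_idx(int idx)
{
    last_released_idx = idx;
}

/* Hypothetical stand-in for blk_cleanup_disk(): the object is gone afterwards. */
void cleanup_disk(struct disk_model *disk)
{
    free(disk);
}

/* Ordering from the diff: read disk->first_minor first, free the disk last. */
void device_free_model(struct disk_model *disk)
{
    if (disk) {
        release_idx(disk->first_minor);
        cleanup_disk(disk);
    }
}
```

In this model, swapping the two calls inside device_free_model() would read first_minor from memory that has already been freed.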


Coly Li
Comment 35 Coly Li 2021-12-03 08:05:48 UTC
(In reply to Diego Ercolani from comment #33)
> Hello, last upgrade (kernel vmlinuz-5.15.2-1-default) broke bcache again

Hi Diego,

Does the suggested fix in comment #34 work?

Coly Li
Comment 36 Diego Ercolani 2021-12-03 11:08:47 UTC
(In reply to Coly Li from comment #35)
Hello,
I didn't understand that you were suggesting I recompile the kernel.
By the way, with kernel subreleases -2 and -3 (vmlinuz-5.15.2-3-default) and with 5.15.5-1-default the problem disappeared.
I have not had time to investigate or verify the log details, but there seems to be no evidence of an oops:

5.15.5-1-default dmesg:
[   20.739222] bcache: register_bcache() error : device already registered
[   20.770803] bcache: register_bcache() error : device already registered
[   24.760848] e1000e 0000:00:19.0 eno1: NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[   24.760855] e1000e 0000:00:19.0 eno1: 10/100 speed: disabling TSO
[   24.760898] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[   24.888044] NET: Registered PF_PACKET protocol family
[  641.887295] BTRFS info (device bcache1): qgroup scan completed (inconsistency flag cleared)
[11498.952152] perf: interrupt took too long (2517 > 2500), lowering kernel.perf_event_max_sample_rate to 79250
[17032.148265] perf: interrupt took too long (3159 > 3146), lowering kernel.perf_event_max_sample_rate to 63250
[18126.766311] perf: interrupt took too long (4011 > 3948), lowering kernel.perf_event_max_sample_rate to 49750
[19895.734321] perf: interrupt took too long (5039 > 5013), lowering kernel.perf_event_max_sample_rate to 39500

I attach the boot log since yesterday evening
Comment 37 Diego Ercolani 2021-12-03 11:10:32 UTC
Created attachment 854277 [details]
boot log 5.15.5

NAME    FSTYPE FSVER LABEL  UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                             
sda1                                                                            
sda2    bcache              7d42c6d3-4087-405e-842c-483103d627f4                
sda3    bcache              e36884b0-361e-43ff-8534-08834789484b                
sda4    ext4   1.0   bootfs 8396a4bf-5694-4afb-97c9-5649e4dd461e    4.3G     2% /boot
sdb                                                                             
sdb2    swap   1            26c8a8e3-a3eb-4fe0-aded-c2d60abcea50                [SWAP]
sdb4    bcache              8b269e3d-bf8a-47af-9884-6aa432385c15                
sdb5    bcache              6dc60da4-e73c-4e64-8b52-5f19f0cd7d92                
sr0                                                                             
bcache0 btrfs        homefs b3c77e21-e124-46fa-855b-90b5b75fe166   95.8G     1% /home
bcache1 btrfs        rootfs 2d0a1196-d6e5-42bf-b251-523a4c32d586   70.8G    23% /root
                                                                                /opt
                                                                                /var
                                                                                /usr/local
                                                                                /srv
                                                                                /.snapshots
                                                                                /
Comment 38 Coly Li 2021-12-03 15:32:55 UTC
(In reply to Diego Ercolani from comment #36)
> (In reply to Coly Li from comment #35)
> Hello,
> I didn't understood that you was suggesting me recompile the kernel.
> By the way with kernel subrelease -2 & -3 (vmlinuz-5.15.2-3-default) and
> 5.15.5-1-default the problem disappeared... 
> I had not time to investigate or verify log details but it seem 
> there are no oops evidences:
> 
> 5.15.5-1-default dmesg:
> [   20.739222] bcache: register_bcache() error : device already registered
> [   20.770803] bcache: register_bcache() error : device already registered
> [   24.760848] e1000e 0000:00:19.0 eno1: NIC Link is Up 100 Mbps Full
> Duplex, Flow Control: Rx/Tx
> [   24.760855] e1000e 0000:00:19.0 eno1: 10/100 speed: disabling TSO
> [   24.760898] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
> [   24.888044] NET: Registered PF_PACKET protocol family
> [  641.887295] BTRFS info (device bcache1): qgroup scan completed
> (inconsistency flag cleared)
> [11498.952152] perf: interrupt took too long (2517 > 2500), lowering
> kernel.perf_event_max_sample_rate to 79250
> [17032.148265] perf: interrupt took too long (3159 > 3146), lowering
> kernel.perf_event_max_sample_rate to 63250
> [18126.766311] perf: interrupt took too long (4011 > 3948), lowering
> kernel.perf_event_max_sample_rate to 49750
> [19895.734321] perf: interrupt took too long (5039 > 5013), lowering
> kernel.perf_event_max_sample_rate to 39500
> 
> I attach the boot log since yesterday evening

OK, maybe the fixes are in the stable kernel now. Since you no longer encounter the panic, I plan to close this report again.

Thanks.

Coly Li