Bugzilla – Bug 1178280
Libvirt segfaults and causes kernel NULL pointer dereference on 5.9.1-1-default
Last modified: 2023-04-26 13:55:06 UTC
Created attachment 843144 [details] logs and output of dmidecode -t processor Just upgraded to Tumbleweed 20201026 and rebooted. Libvirt was running but didn't respond to anything - "virsh capabilities", "virsh list --all" just froze. Please, take a look at the attached logs. After rebooting to previous Kernel, 5.8.15-1-default everything works fine, I don't see anything suspicious in journal. I'm on AMD Ryzen 7 3800X.
It's calling with NULL vcpu and leads to Oops at printing an error / debug message via vcpu_unimpl() in kvm_msr_ignored_check(). A simple fix would be to replace vcpu_unimpl() with kvm_pr_unimpl() (also vcpu_debug_ratelimited() with kvm_debug_ratelimited() as well) in kvm_msr_ignored_check(). Adding Joerg to Cc.
Created attachment 843153 [details] Test fix patch
The test kernel is being built in OBS home:tiwai:bsc1178280 repo. Please give it a try later.
(In reply to Takashi Iwai from comment #2) > Created attachment 843153 [details] > Test fix patch Yeah, that should fix it. Alternativly we can handle the vcpu == NULL case in these debug printouts: diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 7f2e2a09ebbd..eca48467d35c 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -521,17 +521,19 @@ struct kvm { task_tgid_nr(current), ## __VA_ARGS__) /* The guest did something we don't support. */ +#define vcpu_id(vcpu) \ + (vcpu) ? (vcpu)->vcpu_id : -1 #define vcpu_unimpl(vcpu, fmt, ...) \ kvm_pr_unimpl("vcpu%i, guest rIP: 0x%lx " fmt, \ - (vcpu)->vcpu_id, kvm_rip_read(vcpu), ## __VA_ARGS__) + vcpu_id(vcpu), kvm_rip_read(vcpu), ## __VA_ARGS__) #define vcpu_debug(vcpu, fmt, ...) \ - kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__) + kvm_debug("vcpu%i " fmt, vcpu_id(vcpu), ## __VA_ARGS__) #define vcpu_debug_ratelimited(vcpu, fmt, ...) \ - kvm_debug_ratelimited("vcpu%i " fmt, (vcpu)->vcpu_id, \ + kvm_debug_ratelimited("vcpu%i " fmt, vcpu_id(vcpu), \ ## __VA_ARGS__) #define vcpu_err(vcpu, fmt, ...) \ - kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__) + kvm_err("vcpu%i " fmt, vcpu_id(vcpu), ## __VA_ARGS__) static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm) {
(In reply to Joerg Roedel from comment #4) > (In reply to Takashi Iwai from comment #2) > > Created attachment 843153 [details] > > Test fix patch > > Yeah, that should fix it. Alternativly we can handle the vcpu == NULL case > in these debug printouts: > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 7f2e2a09ebbd..eca48467d35c 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -521,17 +521,19 @@ struct kvm { > task_tgid_nr(current), ## __VA_ARGS__) > > /* The guest did something we don't support. */ > +#define vcpu_id(vcpu) \ > + (vcpu) ? (vcpu)->vcpu_id : -1 > #define vcpu_unimpl(vcpu, fmt, ...) \ > kvm_pr_unimpl("vcpu%i, guest rIP: 0x%lx " fmt, \ > - (vcpu)->vcpu_id, kvm_rip_read(vcpu), ## __VA_ARGS__) > + vcpu_id(vcpu), kvm_rip_read(vcpu), ## __VA_ARGS__) > > #define vcpu_debug(vcpu, fmt, ...) \ > - kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__) > + kvm_debug("vcpu%i " fmt, vcpu_id(vcpu), ## __VA_ARGS__) > #define vcpu_debug_ratelimited(vcpu, fmt, ...) \ > - kvm_debug_ratelimited("vcpu%i " fmt, (vcpu)->vcpu_id, \ > + kvm_debug_ratelimited("vcpu%i " fmt, vcpu_id(vcpu), \ > ## __VA_ARGS__) > #define vcpu_err(vcpu, fmt, ...) \ > - kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__) > + kvm_err("vcpu%i " fmt, vcpu_id(vcpu), ## __VA_ARGS__) > > static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm > *kvm) > { Yes, that should work, too. I just chose a shorter change :)
(In reply to Takashi Iwai from comment #3) > The test kernel is being built in OBS home:tiwai:bsc1178280 repo. Please > give it a try later. Just booted it (5.9.2-1.g1706d43-default). No problems whatsoever, everything works. There is just a few messages: "Oct 30 14:30:32 HAL9001 kernel: kvm [3474]: ignored rdmsr: 0x48b data 0x0" Thank you! Also, with your patch there is no messages like "Failed to open file '/sys/kernel/security/apparmor/profiles': Permission denied" which I had on 5.9.1-1-default. By the way, that does it mean, "NULL vcpu"? Do I have a misconfigured virtualization stack?
(In reply to Pavel Artemyev from comment #6) > By the way, that does it mean, "NULL vcpu"? Do I have a misconfigured > virtualization stack? I meant a NULL pointer is passed to vcpu argument of the function, and it was the cause of Oops. It's just about the code, and not about configuration or such :)
(In reply to Takashi Iwai from comment #5) > Yes, that should work, too. I just chose a shorter change :) And I noticed that vcpu isn't used any longer in that function after my patch, so the argument itself can be dropped -- which will lead to a bigger change, after all. In anyway, now the fix was confirmed. Let's toss to upstream.
My fixed merged to upstream, and backported to stable git branch now. It'll be included in TW update, likely after 5.9.3 update.