Bug 1178280 - Libvirt segfaults and causes kernel NULL pointer dereference on 5.9.1-1-default
Summary: Libvirt segfaults and causes kernel NULL pointer dereference on 5.9.1-1-default
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel (show other bugs)
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
: P5 - None : Critical (vote)
Target Milestone: ---
Assignee: openSUSE Kernel Bugs
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-29 21:46 UTC by Pavel Artemyev
Modified: 2023-04-26 13:55 UTC (History)
2 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
logs and output of dmidecode -t processor (14.00 KB, text/plain)
2020-10-29 21:46 UTC, Pavel Artemyev
Details
Test fix patch (1.36 KB, patch)
2020-10-30 09:21 UTC, Takashi Iwai
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Pavel Artemyev 2020-10-29 21:46:05 UTC
Created attachment 843144 [details]
logs and output of dmidecode -t processor

Just upgraded to Tumbleweed 20201026 and rebooted. Libvirt was running but didn't respond to anything - "virsh capabilities", "virsh list --all" just froze.
Please, take a look at the attached logs.

After rebooting to previous Kernel, 5.8.15-1-default everything works fine, I don't see anything suspicious in journal. I'm on AMD Ryzen 7 3800X.
Comment 1 Takashi Iwai 2020-10-30 09:13:10 UTC
It's calling with NULL vcpu and leads to Oops at printing an error / debug message via vcpu_unimpl() in kvm_msr_ignored_check().

A simple fix would be to replace vcpu_unimpl() with kvm_pr_unimpl() (also vcpu_debug_ratelimited() with kvm_debug_ratelimited() as well) in kvm_msr_ignored_check().

Adding Joerg to Cc.
Comment 2 Takashi Iwai 2020-10-30 09:21:24 UTC
Created attachment 843153 [details]
Test fix patch
Comment 3 Takashi Iwai 2020-10-30 09:32:59 UTC
The test kernel is being built in OBS home:tiwai:bsc1178280 repo.  Please give it a try later.
Comment 4 Joerg Roedel 2020-10-30 10:33:58 UTC
(In reply to Takashi Iwai from comment #2)
> Created attachment 843153 [details]
> Test fix patch

Yeah, that should fix it. Alternativly we can handle the vcpu == NULL case in these debug printouts:

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7f2e2a09ebbd..eca48467d35c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -521,17 +521,19 @@ struct kvm {
                           task_tgid_nr(current), ## __VA_ARGS__)
 
 /* The guest did something we don't support. */
+#define vcpu_id(vcpu)                                                  \
+               (vcpu) ? (vcpu)->vcpu_id : -1
 #define vcpu_unimpl(vcpu, fmt, ...)                                    \
        kvm_pr_unimpl("vcpu%i, guest rIP: 0x%lx " fmt,                  \
-                       (vcpu)->vcpu_id, kvm_rip_read(vcpu), ## __VA_ARGS__)
+                       vcpu_id(vcpu), kvm_rip_read(vcpu), ## __VA_ARGS__)
 
 #define vcpu_debug(vcpu, fmt, ...)                                     \
-       kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
+       kvm_debug("vcpu%i " fmt, vcpu_id(vcpu), ## __VA_ARGS__)
 #define vcpu_debug_ratelimited(vcpu, fmt, ...)                         \
-       kvm_debug_ratelimited("vcpu%i " fmt, (vcpu)->vcpu_id,           \
+       kvm_debug_ratelimited("vcpu%i " fmt, vcpu_id(vcpu),             \
                              ## __VA_ARGS__)
 #define vcpu_err(vcpu, fmt, ...)                                       \
-       kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
+       kvm_err("vcpu%i " fmt, vcpu_id(vcpu), ## __VA_ARGS__)
 
 static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm)
 {
Comment 5 Takashi Iwai 2020-10-30 11:21:48 UTC
(In reply to Joerg Roedel from comment #4)
> (In reply to Takashi Iwai from comment #2)
> > Created attachment 843153 [details]
> > Test fix patch
> 
> Yeah, that should fix it. Alternativly we can handle the vcpu == NULL case
> in these debug printouts:
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7f2e2a09ebbd..eca48467d35c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -521,17 +521,19 @@ struct kvm {
>                            task_tgid_nr(current), ## __VA_ARGS__)
>  
>  /* The guest did something we don't support. */
> +#define vcpu_id(vcpu)                                                  \
> +               (vcpu) ? (vcpu)->vcpu_id : -1
>  #define vcpu_unimpl(vcpu, fmt, ...)                                    \
>         kvm_pr_unimpl("vcpu%i, guest rIP: 0x%lx " fmt,                  \
> -                       (vcpu)->vcpu_id, kvm_rip_read(vcpu), ## __VA_ARGS__)
> +                       vcpu_id(vcpu), kvm_rip_read(vcpu), ## __VA_ARGS__)
>  
>  #define vcpu_debug(vcpu, fmt, ...)                                     \
> -       kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
> +       kvm_debug("vcpu%i " fmt, vcpu_id(vcpu), ## __VA_ARGS__)
>  #define vcpu_debug_ratelimited(vcpu, fmt, ...)                         \
> -       kvm_debug_ratelimited("vcpu%i " fmt, (vcpu)->vcpu_id,           \
> +       kvm_debug_ratelimited("vcpu%i " fmt, vcpu_id(vcpu),             \
>                               ## __VA_ARGS__)
>  #define vcpu_err(vcpu, fmt, ...)                                       \
> -       kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
> +       kvm_err("vcpu%i " fmt, vcpu_id(vcpu), ## __VA_ARGS__)
>  
>  static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm
> *kvm)
>  {

Yes, that should work, too.  I just chose a shorter change :)
Comment 6 Pavel Artemyev 2020-10-30 12:18:53 UTC
(In reply to Takashi Iwai from comment #3)
> The test kernel is being built in OBS home:tiwai:bsc1178280 repo.  Please
> give it a try later.

Just booted it (5.9.2-1.g1706d43-default). No problems whatsoever, everything works.
There is just a few messages:
"Oct 30 14:30:32 HAL9001 kernel: kvm [3474]: ignored rdmsr: 0x48b data 0x0"
Thank you!

Also, with your patch there is no messages like "Failed to open file '/sys/kernel/security/apparmor/profiles': Permission denied" which I had on 5.9.1-1-default.

By the way, that does it mean, "NULL vcpu"? Do I have a misconfigured virtualization stack?
Comment 7 Takashi Iwai 2020-10-30 12:53:55 UTC
(In reply to Pavel Artemyev from comment #6)
> By the way, that does it mean, "NULL vcpu"? Do I have a misconfigured
> virtualization stack?

I meant a NULL pointer is passed to vcpu argument of the function, and it was the cause of Oops.  It's just about the code, and not about configuration or such :)
Comment 8 Takashi Iwai 2020-10-30 13:17:48 UTC
(In reply to Takashi Iwai from comment #5)
> Yes, that should work, too.  I just chose a shorter change :)

And I noticed that vcpu isn't used any longer in that function after my patch, so the argument itself can be dropped -- which will lead to a bigger change, after all.

In anyway, now the fix was confirmed.  Let's toss to upstream.
Comment 9 Takashi Iwai 2020-11-03 17:23:09 UTC
My fixed merged to upstream, and backported to stable git branch now.
It'll be included in TW update, likely after 5.9.3 update.