Bug 1212174

Summary: libvirtd[8873]: Unable to read from monitor: Connection reset by peer
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Mark Petersen <petersenmde>
Component: Virtualization:Other
Assignee: E-mail List <kvm-bugs>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: dfaggioli, jfehlig, petersenmde
Version: Current
Flags: dfaggioli: needinfo? (petersenmde), dfaggioli: SHIP_STOPPER?
Target Milestone: ---
Hardware: x86-64
OS: openSUSE Tumbleweed
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Mark Petersen 2023-06-09 13:07:22 UTC
I am unable to run any VMs under virsh:

localhost:~ # virsh start debian11
error: Failed to start domain 'debian11'
error: An error occurred, but the cause is unknown


localhost:~ # systemctl status libvirtd
○ libvirtd.service - Virtualization daemon
     Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; preset: disabled)
     Active: inactive (dead) since Thu 2023-06-08 08:11:44 CDT; 23h ago
   Duration: 2min 114ms
TriggeredBy: ● libvirtd-ro.socket
             ● libvirtd.socket
             ● libvirtd-admin.socket
       Docs: man:libvirtd(8)
             https://libvirt.org
    Process: 8873 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 8873 (code=exited, status=0/SUCCESS)
      Tasks: 2 (limit: 32768)
        CPU: 996ms
     CGroup: /system.slice/libvirtd.service
             ├─1906 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec>
             └─1907 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec>

Jun 08 08:09:44 localhost libvirtd[8873]: hostname: localhost
Jun 08 08:09:44 localhost libvirtd[8873]: Unable to read from monitor: Connection reset by peer
Jun 08 08:09:44 localhost libvirtd[8873]: Failed to probe capabilities for /usr/bin/qemu-system-x86_64: Unable to read from monitor: >
Jun 08 08:09:45 localhost libvirtd[8873]: Unable to read from monitor: Connection reset by peer
Jun 08 08:09:45 localhost libvirtd[8873]: Failed to probe capabilities for /usr/bin/qemu-system-x86_64: Unable to read from monitor: >
Jun 08 08:09:46 localhost libvirtd[8873]: Unable to read from monitor: Connection reset by peer
Jun 08 08:09:46 localhost libvirtd[8873]: Failed to probe capabilities for /usr/bin/qemu-system-x86_64: Unable to read from monitor: >
Jun 08 08:11:44 localhost systemd[1]: libvirtd.service: Deactivated successfully.
Jun 08 08:11:44 localhost systemd[1]: libvirtd.service: Unit process 1906 (dnsmasq) remains running after unit stopped.
Jun 08 08:11:44 localhost systemd[1]: libvirtd.service: Unit process 1907 (dnsmasq) remains running after unit stopped.


localhost:~ # virt-host-validate
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking if device /dev/kvm exists                                   : PASS
  QEMU: Checking if device /dev/kvm is accessible                            : PASS
  QEMU: Checking if device /dev/vhost-net exists                             : PASS
  QEMU: Checking if device /dev/net/tun exists                               : PASS
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
  QEMU: Checking for device assignment IOMMU support                         : PASS
  QEMU: Checking if IOMMU is enabled by kernel                               : PASS
  QEMU: Checking for secure guest support                                    : WARN (AMD Secure Encrypted Virtualization appears to be disabled in firmware.)
   LXC: Checking for Linux >= 2.6.26                                         : PASS
   LXC: Checking for namespace ipc                                           : PASS
   LXC: Checking for namespace mnt                                           : PASS
   LXC: Checking for namespace pid                                           : PASS
   LXC: Checking for namespace uts                                           : PASS
   LXC: Checking for namespace net                                           : PASS
   LXC: Checking for namespace user                                          : PASS
   LXC: Checking for cgroup 'cpu' controller support                         : PASS
   LXC: Checking for cgroup 'cpuacct' controller support                     : PASS
   LXC: Checking for cgroup 'cpuset' controller support                      : PASS
   LXC: Checking for cgroup 'memory' controller support                      : PASS
   LXC: Checking for cgroup 'devices' controller support                     : PASS
   LXC: Checking for cgroup 'freezer' controller support                     : FAIL (Enable 'freezer' in kernel Kconfig file or mount/enable cgroup controller in your system)
   LXC: Checking for cgroup 'blkio' controller support                       : PASS
   LXC: Checking if device /sys/fs/fuse/connections exists                   : PASS


localhost:~ # cat /proc/cpuinfo
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 23
model		: 8
model name	: AMD Ryzen 7 2700X Eight-Core Processor
stepping	: 2
microcode	: 0x800820d
cpu MHz		: 2194.541
cache size	: 512 KB
physical id	: 0
siblings	: 16
core id		: 0
cpu cores	: 8
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
bugs		: sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb
bogomips	: 7385.61
TLB size	: 2560 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

localhost:~ # zypper info libvirt-daemon
Loading repository data...
Reading installed packages...


Information for package libvirt-daemon:
---------------------------------------
Repository     : Main Repository (OSS)
Name           : libvirt-daemon
Version        : 9.4.0-1.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 569.6 KiB
Installed      : Yes (automatically)
Status         : up-to-date
Source package : libvirt-9.4.0-1.1.src

    
localhost:~ # zypper info qemu-x86
Loading repository data...
Reading installed packages...


Information for package qemu-x86:
---------------------------------
Repository     : Main Repository (OSS)
Name           : qemu-x86
Version        : 8.0.2-1.1
Arch           : x86_64
Vendor         : openSUSE
Installed Size : 38.9 MiB
Installed      : Yes
Status         : up-to-date
Source package : qemu-8.0.2-1.1.src
Comment 1 James Fehlig 2023-06-09 16:57:14 UTC
(In reply to Mark Petersen from comment #0)
> I am unable to run any VMs under virsh:

Is this a regression caused by a regular update of your Tumbleweed system? If so, can you determine which component update (qemu or libvirt) caused it?
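
One way to answer this on Tumbleweed is to look at the zypper history log. The following is a minimal sketch, assuming the log lives at /var/log/zypp/history and uses a pipe-separated layout (timestamp|action|name|version|...). The function name `updates_on` is made up for illustration.

```shell
# Hypothetical helper: list qemu/libvirt packages installed or updated on a given day.
# Assumes zypp history lines of the form
#   YYYY-MM-DD HH:MM:SS|install|<name>|<version-release>|<arch>|...
updates_on() {
    day=$1
    log=${2:-/var/log/zypp/history}
    awk -F'|' -v day="$day" '
        index($1, day) == 1 && $2 == "install" && $3 ~ /^(qemu|libvirt)/ {
            print $3, $4
        }' "$log"
}

# Example (as root): updates_on 2023-06-05
```

Each output line is a package name followed by the version that was installed on that day, which should make it easier to see whether qemu, libvirt, or both changed.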

> Jun 08 08:09:44 localhost libvirtd[8873]: Unable to read from monitor:
> Connection reset by peer
> Jun 08 08:09:44 localhost libvirtd[8873]: Failed to probe capabilities for
> /usr/bin/qemu-system-x86_64: Unable to read from monitor: >

This usually indicates qemu has crashed or aborted. Do you see any indication of that? E.g. 'coredumpctl list' or any coredumps in /var/lib/systemd/coredump/?
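
Both checks can be sketched as follows. `coredumpctl list` accepts an executable path as a match. The helper `qemu_core_files` is a hypothetical name, and its glob assumes that systemd-coredump names files `core.<comm>.<uid>...` with the command name truncated to `qemu-system-x86`.

```shell
# Hypothetical helper: print qemu core files found in a coredump directory.
qemu_core_files() {
    dir=${1:-/var/lib/systemd/coredump}
    for f in "$dir"/core.qemu-system-x86*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Typical use on the affected host (as root):
#   coredumpctl list /usr/bin/qemu-system-x86_64
#   qemu_core_files
```

An empty result from both suggests qemu did not dump core, in which case the libvirtd debug log is the next place to look.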

Enabling more debug info from the libvirtd qemu driver could be helpful too. For example, add a log filter like the following to /etc/libvirt/libvirtd.conf:

log_filters="1:qemu"
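
A filter only selects which messages are logged; `log_outputs` in the same file controls where they go. The sketch below appends both settings to a config file. The log file path and the function name `enable_qemu_debug` are assumptions for illustration, not something prescribed in this report.

```shell
# Hypothetical helper: append qemu-driver debug logging to a libvirtd.conf-style file.
# The log file path is an assumption; adjust as needed.
enable_qemu_debug() {
    conf=${1:-/etc/libvirt/libvirtd.conf}
    {
        echo 'log_filters="1:qemu"'
        echo 'log_outputs="1:file:/var/log/libvirt/libvirtd.log"'
    } >> "$conf"
}

# Afterwards, restart the daemon so the settings take effect:
#   systemctl restart libvirtd
```

After the restart, reproducing the failure with `virsh start` should leave the qemu driver's debug messages in the chosen log file.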
Comment 2 Mark Petersen 2023-06-09 18:29:46 UTC
The problem began after the 5 June update.

On 5 June both libvirt & qemu were updated:
libvirt-* was updated from 9.3.0-1.1 to 9.4.0-1.1
qemu-* was updated from 8.0.0-1.1 to 8.0.2-1.1


localhost:~ # coredumpctl dump 9913 --output=core.dump
           PID: 9913 (qemu-system-x86)
           UID: 458 (qemu)
           GID: 458 (qemu)
        Signal: 6 (ABRT)
     Timestamp: Fri 2023-06-09 08:05:35 CDT (5h 20min ago)
  Command Line: /usr/bin/qemu-system-x86_64 -S -no-user-config -nodefaults -nographic -machine none,accel=kvm:tcg -qmp unix:/var/lib/libvirt/qemu/qmp-0R2P61/qmp.monitor,server=on,wait=off -pidfile /var/lib/libvirt/qemu/qmp-0R2P61/qmp.pid -daemonize
    Executable: /usr/bin/qemu-system-x86_64
 Control Group: /system.slice/libvirtd.service
          Unit: libvirtd.service
         Slice: system.slice
       Boot ID: e87af9eca069493e8b745fa5ea5f5909
    Machine ID: bc01512e69d44962bd005a9930dabc3a
      Hostname: localhost
       Storage: /var/lib/systemd/coredump/core.qemu-system-x86.458.e87af9eca069493e8b745fa5ea5f5909.9913.1686315935000000.zst (present)
  Size on Disk: 1.4M
       Message: Process 9913 (qemu-system-x86) of user 458 dumped core.
                
                Stack trace of thread 9913:
                #0  0x00007f0428f4aa7c __pthread_kill_implementation (libc.so.6 + 0x8fa7c)
                #1  0x00007f0428ef9226 raise (libc.so.6 + 0x3e226)
                #2  0x00007f0428ee1897 abort (libc.so.6 + 0x26897)
                #3  0x00007f0428ee17ab __assert_fail_base.cold (libc.so.6 + 0x267ab)
                #4  0x00007f0428ef14b6 __assert_fail (libc.so.6 + 0x364b6)
                #5  0x000056539bf465fe module_load (qemu-system-x86_64 + 0xa955fe)
                #6  0x000056539bf4613c module_load (qemu-system-x86_64 + 0xa9513c)
                #7  0x000056539bf46833 module_load_qom_all (qemu-system-x86_64 + 0xa95833)
                #8  0x000056539bf07832 qmp_marshal_qom_list_types (qemu-system-x86_64 + 0xa56832)
                #9  0x000056539bf35398 n/a (qemu-system-x86_64 + 0xa84398)
                #10 0x000056539bf4caf5 aio_bh_poll (qemu-system-x86_64 + 0xa9baf5)
                #11 0x000056539bf4055d aio_dispatch (qemu-system-x86_64 + 0xa8f55d)
                #12 0x000056539bf560ee n/a (qemu-system-x86_64 + 0xaa50ee)
                #13 0x00007f04294ca8d8 g_main_context_dispatch (libglib-2.0.so.0 + 0x5d8d8)
                #14 0x000056539bf599c8 main_loop_wait (qemu-system-x86_64 + 0xaa89c8)
                #15 0x000056539b91ecef qemu_default_main (qemu-system-x86_64 + 0x46dcef)
                #16 0x00007f0428ee2bb0 __libc_start_call_main (libc.so.6 + 0x27bb0)
                #17 0x00007f0428ee2c79 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c79)
                #18 0x000056539b91ab55 _start (qemu-system-x86_64 + 0x469b55)
                
                Stack trace of thread 9914:
                #0  0x00007f0428fc83dd syscall (libc.so.6 + 0x10d3dd)
                #1  0x000056539bf3bdaa qemu_event_wait (qemu-system-x86_64 + 0xa8adaa)
                #2  0x000056539bf51071 n/a (qemu-system-x86_64 + 0xaa0071)
                #3  0x000056539bf37178 n/a (qemu-system-x86_64 + 0xa86178)
                #4  0x00007f0428f48c24 start_thread (libc.so.6 + 0x8dc24)
                #5  0x00007f0428fd0510 __clone3 (libc.so.6 + 0x115510)
                
                Stack trace of thread 9917:
                #0  0x00007f0428fc244f __poll (libc.so.6 + 0x10744f)
                #1  0x00007f04294cac5e n/a (libglib-2.0.so.0 + 0x5dc5e)
                #2  0x00007f04294caf9f g_main_loop_run (libglib-2.0.so.0 + 0x5df9f)
                #3  0x000056539bde73d1 n/a (qemu-system-x86_64 + 0x9363d1)
                #4  0x000056539bf37178 n/a (qemu-system-x86_64 + 0xa86178)
                #5  0x00007f0428f48c24 start_thread (libc.so.6 + 0x8dc24)
                #6  0x00007f0428fd0510 __clone3 (libc.so.6 + 0x115510)
                ELF object binary architecture: AMD x86-64
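
In the trace above, thread 9913 aborts via `__assert_fail` inside `module_load`, reached from `qmp_marshal_qom_list_types`, so the assertion fires while qemu handles a QMP request during libvirt's capability probe. To pull just the function names of the aborting thread out of such output, a small filter can help. This is a sketch with a hypothetical function name, assuming the first listed thread is the one that aborted, as it is here.

```shell
# Hypothetical helper: print the function names of the first listed thread
# from `coredumpctl info` output read on stdin.
crash_frames() {
    awk '
        /Stack trace of thread/ { n++; next }
        n == 1 && $1 ~ /^#[0-9]+$/ { print $3 }
    '
}

# Example: coredumpctl info 9913 | crash_frames
```

Frames without symbol information are printed as `n/a`, exactly as they appear in the coredumpctl output.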
Comment 3 James Fehlig 2023-06-09 20:02:34 UTC
Reassigning to kvm-bugs to investigate the qemu abort.
Comment 4 Dario Faggioli 2023-06-19 20:43:17 UTC
(In reply to James Fehlig from comment #3)
> Reassigning to kvm-bugs to investigate the qemu abort.

Right. One question, though. Comment 0 shows that libvirtd is being used... Shouldn't it be virtqemud?
Comment 5 Dario Faggioli 2023-06-19 20:47:44 UTC
(In reply to Mark Petersen from comment #2)
> The problem began after the 5 June update.
> 
> On 5 June both libvirt & qemu were updated:
> libvirt-* was updated from 9.3.0-1.1 to 9.4.0-1.1
> qemu-* was updated from 8.0.0-1.1 to 8.0.2-1.1
> 
I still have to try an AMD machine, but for now, I cannot reproduce this.

That is, I have a fully updated Tumbleweed system on which VMs start just fine.

Can you share more about your host setup and VMs configuration?

Can you roll back to a snapshot from before the latest QEMU update, confirm that everything works there and, if so, update just qemu (`zypper ref && zypper in qemu-x86`, I think) and try again?
Comment 6 Mark Petersen 2023-06-19 21:02:15 UTC
I apologize for not posting back here sooner - Life...

On the 12th of June, an update of qemu to 8.0.2-1.2 and libvirt to 9.4.0-2.1 fixed this for me on my AMD machine.

Whatever the issue was (it seemed like a qemu problem), it did not affect my work computer, which has an Intel CPU.
Comment 7 James Fehlig 2023-06-19 21:40:05 UTC
(In reply to Dario Faggioli from comment #4)
> Right. One question, though. Comment 0 shows that libvirtd is being used...
> Shouldn't it be virtqemud?

ATM, only on fresh installs and only if the user enabled virtqemud. But e.g. someone running TW before modular daemons were preferred would be using libvirtd, and subsequent updates of TW should not change that.
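
Which daemon a given host actually uses can be checked through systemd. The sketch below assumes the unit names libvirtd.service/libvirtd.socket and virtqemud.service/virtqemud.socket. The `SYSTEMCTL` variable and the function name are hypothetical, added only so the logic can be exercised without a running systemd.

```shell
# SYSTEMCTL is a hypothetical override hook; it defaults to the real systemctl.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Print "virtqemud" or "libvirtd" depending on which units are active, else "none".
active_qemu_daemon() {
    for name in virtqemud libvirtd; do
        # Check the socket too: the service may be idle but socket-activated.
        for unit in "$name.socket" "$name.service"; do
            if "$SYSTEMCTL" is-active --quiet "$unit"; then
                echo "$name"
                return 0
            fi
        done
    done
    echo none
    return 1
}
```

The socket units are checked as well because, as in comment 0, the service itself can be inactive while its sockets are still listening and will start it on demand.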
Comment 8 Dario Faggioli 2023-06-20 14:45:37 UTC
(In reply to Mark Petersen from comment #6)
> I apologize for not posting back here sooner - Life...
> 
> On the 12th of June, an update of qemu to 8.0.2-1.2 and libvirt to 9.4.0-2.1
> fixed this for me on my AMD machine.
> 
>
Well, this is particularly "interesting", as that update is, as far as I know, "just" a rebuild (or something like that, I wasn't even there as I was on time off :-P).

Anyway, good to know that it works now. I'm closing this (as FIXED, although we're not sure what was wrong and what and how it got fixed...), but do let us know if something breaks again.