View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0003533||Rocky-Linux-8||kernel||public||2023-06-02 20:25||2023-06-02 20:36|
|Reporter||Stuart Gathman||Assigned To||Louis Abel|
|Platform||X86_64||OS||Rocky Linux||OS Version||8.8|
|Summary||0003533: kernel-4.18.0-477.10.1.el8_8.x86_64 locks up qemu-kvm with heavy virtio|
|Description||qemu-kvm becomes frozen and unkillable under heavy virtio I/O with multiple devices. kill -9 is ineffective. ps -ef hangs; ps -el does not (because it doesn't need to access process memory). atop on the host hangs and requires kill -9.
Naturally, virsh destroy doesn't work either.
All other VMs and the host continue to work normally (unless you manage to lock up another VM).
top on the host shows state 'D' and 0.0% CPU for the wedged qemu-kvm process.
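For anyone checking whether they are hitting the same symptom, here is a minimal sketch that lists tasks in state 'D' by reading only /proc/<pid>/stat. It assumes, in line with the ps -el observation above, that reading this file does not touch the wedged process's memory; that has not been verified against this particular kernel bug. The function names are made up for illustration, and only the main thread of each process is inspected.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process whose main thread is in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or the file is unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling find_uninterruptible() on the host and looking for a qemu-kvm entry that stays in the list across repeated calls would match the behaviour described here; a process that only appears briefly is probably just waiting on normal disk I/O.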
1. Shut down all working VMs.
2. systemctl poweroff
3. There will be a long wait while systemd tries to kill qemu-kvm. Finally, after maybe 10 minutes, it will time out and report on the console that it can't unmount /oldroot (because qemu-kvm still has it open for its process image). There should be no more disk activity (hopefully your server has an activity light). At this point, you can hold the power button down to force power off.
4. Power on again.
Install kernel-4.18.0-425.19.2.el8_7.x86_64 and make it the default boot entry. This is reported as stable. I am testing it now.
From a Fedora 36 guest:
May 29 20:35:20 matrix.gathman.org kernel: Sending NMI from CPU 1 to CPUs 0:
May 29 20:38:19 matrix.gathman.org kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
May 29 20:38:19 matrix.gathman.org kernel: rcu: 0-...0: (6 ticks this GP) idle=85fc/1/0x4000000000000000 softirq=274821528/274821529 fqs=1289617
May 29 20:38:19 matrix.gathman.org kernel: (detected by 1, t=5460152 jiffies, g=584688457, q=2153 ncpus=2)
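To read the numbers in the stall report, a small sketch like the following can pull out the fields of the "detected by" line and convert the jiffies count to seconds. The regular expression is written against the line format shown above only, and the tick rate of 1000 Hz is an assumption about the guest kernel's CONFIG_HZ, not something stated in this report.

```python
import re

# Matches the "(detected by N, t=... jiffies, g=..., q=... ncpus=...)" line
# as it appears in the guest journal quoted above.
STALL_RE = re.compile(
    r"detected by (?P<cpu>\d+), t=(?P<t>\d+) jiffies, "
    r"g=(?P<g>\d+), q=(?P<q>\d+) ncpus=(?P<ncpus>\d+)"
)


def parse_stall(line):
    """Return the numeric fields of an RCU stall summary line, or None."""
    match = STALL_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def jiffies_to_seconds(jiffies, hz=1000):
    """Convert jiffies to seconds; hz=1000 is an assumed CONFIG_HZ."""
    return jiffies / hz
```

Under that assumed tick rate, t=5460152 jiffies would correspond to roughly 91 minutes, which would mean the guest vCPU had been stalled for a long time before this message was logged.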
|Steps To Reproduce||Boot from kernel-4.18.0-477.10.1.el8_8.x86_64 and run a qemu-kvm guest OS with 2 or more virtio disks.|
Then do something I/O-heavy, such as running a backup of 10+ GB from one disk to the other.
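If no real backup job is at hand, a plain streaming copy inside the guest should generate a comparable load. The sketch below is only a stand-in for the backup mentioned above: the function name, chunk size, and any paths passed to it are hypothetical, and it has not been confirmed to trigger the lockup.

```python
import os


def copy_between_disks(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Stream src_path to dst_path in chunks and return the bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of source file
            dst.write(chunk)
            copied += len(chunk)
        # Force the data out to the second disk instead of leaving it
        # in the guest page cache.
        dst.flush()
        os.fsync(dst.fileno())
    return copied
```

For the load to resemble the report, the source should be a file of 10 GB or more on one virtio disk and the destination a path on a second virtio disk.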
This issue is occurring on an older build of the -477 kernel. Please try to reproduce it with kernel 4.18.0-477.13.1.el8_8.
Setting to need info.
The journal from the Fedora 36 guest was obtained after resetting the host and restarting the VM, then running journalctl -b-3 to see what the last journal entries were. (There were a few more reboots of the VM to fsck filesystems, etc.)
The host had a stack trace in dmesg from about when the VM froze, but I forgot to save it. :-( (I was in a hurry to get services back up.)
|2023-06-02 20:25||Stuart Gathman||New Issue|
|2023-06-02 20:25||Stuart Gathman||Tag Attached: kernel|
|2023-06-02 20:25||Stuart Gathman||Tag Attached: qemu-kvm|
|2023-06-02 20:31||Louis Abel||Assigned To||=> Louis Abel|
|2023-06-02 20:31||Louis Abel||Status||new => needinfo|
|2023-06-02 20:31||Louis Abel||Note Added: 0003637|
|2023-06-02 20:32||Stuart Gathman||Note Added: 0003638|