Intro
Before you go down the rabbit hole, I have to warn you: this does not really pay off.
Optimization
As I said in Libvirt Chap4#3D Acceleration, my Windows VM ran fairly slowly at first. I then referred to QEMU/KVM virt-manager windows vm slow and did a series of optimizations:
- Enable all Hyper-V enlightenments
- Disable all timers except hypervclock
- CPU pinning: in short, map each virtual CPU one-to-one onto a host CPU thread (instead of the default behavior, where the kernel keeps moving virtual CPUs between threads). Watch out for the shared L3 cache issue
- virtio disk. I did not take this approach as it is a bit demanding: you have to load the virtio ISO and inject the drivers when partitioning the disk during Windows installation
The r/VFIO thread "How do I do CPU pinning and isolation?" is also worth reading:
- kvm-qemu-virtualization-guide: poorly written and hard to follow. Just skim it.
- CPU Pinning Benchmarks: skip to the conclusion, which is that CPU pinning has little effect. To maximize CPU performance, though, you have to choose between pinning none or all. And IMO the gaming performance boost is probably just because PUBG is terribly optimized…
Hyper-V enlightenments:
<hyperv>
  <relaxed state='on'/>
  <vapic state='on'/>
  <spinlocks state='on' retries='8191'/>
  <vpindex state='on'/>
  <synic state='on'/>
  <stimer state='on'>
    <direct state='on'/>
  </stimer>
  <reset state='on'/>
  <frequencies state='on'/>
  <reenlightenment state='on'/>
  <tlbflush state='on'/>
  <ipi state='on'/>
</hyperv>
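To see which of these enlightenments a domain is still missing, a small script can compare its `<hyperv>` block against the list above. This is only a minimal sketch: `DOMAIN_XML`, `WANTED` and `missing_enlightenments` are names made up for illustration, and the sample XML simply mimics a config where only three flags are on.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a domain with only three enlightenments enabled.
DOMAIN_XML = """
<domain type="kvm">
  <features>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
    </hyperv>
  </features>
</domain>
"""

# The enlightenments listed in this post, in the same order.
WANTED = [
    "relaxed", "vapic", "spinlocks", "vpindex", "synic", "stimer",
    "reset", "frequencies", "reenlightenment", "tlbflush", "ipi",
]


def missing_enlightenments(domain_xml, wanted=WANTED):
    """Return the wanted enlightenments that are not state='on' in the XML."""
    root = ET.fromstring(domain_xml)
    hyperv = root.find("./features/hyperv")
    enabled = set()
    if hyperv is not None:
        # Only direct children of <hyperv> count; <direct> lives under <stimer>.
        enabled = {el.tag for el in hyperv if el.get("state") == "on"}
    return [name for name in wanted if name not in enabled]


print(missing_enlightenments(DOMAIN_XML))
```

In practice the XML string would come from `virsh dumpxml` for the domain in question rather than from a hard-coded sample.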
Disable all timers except hypervclock:
<clock offset='localtime'>
  <timer name='rtc' present='no' tickpolicy='catchup'/>
  <timer name='pit' present='no' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
  <timer name='kvmclock' present='no'/>
  <timer name='hypervclock' present='yes'/>
</clock>
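The same change can be applied programmatically instead of by hand. The sketch below is an illustration under assumptions, not the way this config was actually edited: `CLOCK_XML` mimics a `<clock>` element before the change, and `keep_only` is a hypothetical helper that marks every timer except one as absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical starting point: rtc and pit are still present by default.
CLOCK_XML = """<clock offset="localtime">
  <timer name="rtc" tickpolicy="catchup"/>
  <timer name="pit" tickpolicy="delay"/>
  <timer name="hpet" present="no"/>
  <timer name="hypervclock" present="yes"/>
</clock>"""

KNOWN_TIMERS = ("rtc", "pit", "hpet", "kvmclock", "hypervclock")


def keep_only(clock, keep="hypervclock", known=KNOWN_TIMERS):
    """Set present='no' on every known timer except `keep`, adding missing ones."""
    timers = {t.get("name"): t for t in clock.findall("timer")}
    for name in known:
        timer = timers.get(name)
        if timer is None:
            # Timer not mentioned yet (kvmclock here): add it explicitly.
            timer = ET.SubElement(clock, "timer", {"name": name})
        timer.set("present", "yes" if name == keep else "no")
    return clock


clock = keep_only(ET.fromstring(CLOCK_XML))
for t in clock.findall("timer"):
    print(t.get("name"), t.get("present"))
```

Existing attributes such as `tickpolicy` are left untouched, so the result matches the block above apart from element order.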
CPU pinning. This is quite difficult to comprehend. I happened to find the exact same CPU topology in CPU Pinning understanding, so I just used that, added a virtio-scsi controller, and finally added two lines for the iothread.
Output of lscpu (lstopo has better readability but requires hwloc to be installed):
$ lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ
  0    0      0    0 0:0:0:0          yes 4200.0000 400.0000
  1    0      0    1 1:1:1:0          yes 4200.0000 400.0000
  2    0      0    2 2:2:2:0          yes 4200.0000 400.0000
  3    0      0    3 3:3:3:0          yes 4200.0000 400.0000
  4    0      0    0 0:0:0:0          yes 4200.0000 400.0000
  5    0      0    1 1:1:1:0          yes 4200.0000 400.0000
  6    0      0    2 2:2:2:0          yes 4200.0000 400.0000
  7    0      0    3 3:3:3:0          yes 4200.0000 400.0000
From my understanding, if I want to use CPU pinning and assign 6 logical CPUs to the guest, I should keep CPUs 0 and 4 (the two threads of core 0) away from the guest, then assign the remaining six in sibling pairs. But I just could not figure out how to pair them, as the Windows VM keeps showing only 4 cores:
<vcpu placement="static">6</vcpu>
<iothreads>1</iothreads>
<cputune>
  <vcpupin vcpu="0" cpuset="1" />
  <vcpupin vcpu="1" cpuset="2" />
  <vcpupin vcpu="2" cpuset="3" />
  <vcpupin vcpu="3" cpuset="5" />
  <vcpupin vcpu="4" cpuset="6" />
  <vcpupin vcpu="5" cpuset="7" />
  <emulatorpin cpuset="0,4" />
  <iothreadpin iothread="1" cpuset="0,4" />
</cputune>
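For the "in pairs" part, the sibling threads can be read straight off the CORE column of the lscpu output. Here is a rough sketch that groups logical CPUs by core and emits `vcpupin` lines with siblings next to each other. Note that this ordering is an assumption and not what the config above uses; it presumably only matters if the guest is told about the thread layout (for example through a `<topology>` element under `<cpu>`), which this post does not set up. `siblings_by_core` and `pinning_xml` are made-up helper names.

```python
from collections import defaultdict

# The lscpu -e output from this post, pasted in as text.
LSCPU_OUTPUT = """\
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ
0 0 0 0 0:0:0:0 yes 4200.0000 400.0000
1 0 0 1 1:1:1:0 yes 4200.0000 400.0000
2 0 0 2 2:2:2:0 yes 4200.0000 400.0000
3 0 0 3 3:3:3:0 yes 4200.0000 400.0000
4 0 0 0 0:0:0:0 yes 4200.0000 400.0000
5 0 0 1 1:1:1:0 yes 4200.0000 400.0000
6 0 0 2 2:2:2:0 yes 4200.0000 400.0000
7 0 0 3 3:3:3:0 yes 4200.0000 400.0000
"""


def siblings_by_core(lscpu_text):
    """Map each physical core ID to the sorted list of its logical CPUs."""
    lines = lscpu_text.strip().splitlines()
    header = lines[0].split()
    cpu_col, core_col = header.index("CPU"), header.index("CORE")
    cores = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        cores[int(fields[core_col])].append(int(fields[cpu_col]))
    return {core: sorted(cpus) for core, cpus in sorted(cores.items())}


def pinning_xml(cores, host_core=0):
    """Leave host_core to the host; pin vCPUs to the rest, siblings adjacent."""
    guest_cpus = [cpu for core, cpus in cores.items()
                  if core != host_core for cpu in cpus]
    host_cpus = ",".join(str(cpu) for cpu in cores[host_core])
    lines = [f'<vcpupin vcpu="{vcpu}" cpuset="{cpu}" />'
             for vcpu, cpu in enumerate(guest_cpus)]
    lines.append(f'<emulatorpin cpuset="{host_cpus}" />')
    return lines


cores = siblings_by_core(LSCPU_OUTPUT)
print(cores)
for line in pinning_xml(cores):
    print(line)
```

On this CPU the sibling pairs come out as (1, 5), (2, 6) and (3, 7), with 0 and 4 left for the emulator thread.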
If I ever decide to reinstall and utilize virtio for maximum disk performance, Optimizing Windows VM performance on QEMU/KVM mentions quite a few tricks.
Patch
Generated diffs (a.k.a. patches) for nerds, which also serve as a TL;DR.
Optimized:
--- win10-ltsc-orig.xml 2024-06-09 19:19:11.000000000 +0000
+++ win10-ltsc-optimized.xml 2024-06-09 19:18:48.000000000 +0000
@@ -25,14 +25,25 @@
<relaxed state="on" />
<vapic state="on" />
<spinlocks state="on" retries="8191" />
+ <vpindex state="on" />
+ <synic state="on" />
+ <stimer state="on">
+ <direct state="on" />
+ </stimer>
+ <reset state="on" />
+ <frequencies state="on" />
+ <reenlightenment state="on" />
+ <tlbflush state="on" />
+ <ipi state="on" />
</hyperv>
<vmport state="off" />
</features>
<cpu mode="host-passthrough" check="none" migratable="on" />
<clock offset="localtime">
- <timer name="rtc" tickpolicy="catchup" />
- <timer name="pit" tickpolicy="delay" />
+ <timer name="rtc" present="no" tickpolicy="catchup" />
+ <timer name="pit" present="no" tickpolicy="delay" />
<timer name="hpet" present="no" />
+ <timer name="kvmclock" present="no" />
<timer name="hypervclock" present="yes" />
</clock>
<on_poweroff>destroy</on_poweroff>
Fully optimized:
--- win10-ltsc-optimized.xml 2024-06-09 19:18:48.000000000 +0000
+++ win10-ltsc-fully-optimized.xml 2024-06-09 19:18:31.000000000 +0000
@@ -13,7 +13,18 @@
<source type="memfd" />
<access mode="shared" />
</memoryBacking>
- <vcpu placement="static">4</vcpu>
+ <vcpu placement="static">6</vcpu>
+ <iothreads>1</iothreads>
+ <cputune>
+ <vcpupin vcpu="0" cpuset="1" />
+ <vcpupin vcpu="1" cpuset="2" />
+ <vcpupin vcpu="2" cpuset="3" />
+ <vcpupin vcpu="3" cpuset="5" />
+ <vcpupin vcpu="4" cpuset="6" />
+ <vcpupin vcpu="5" cpuset="7" />
+ <emulatorpin cpuset="0,4" />
+ <iothreadpin iothread="1" cpuset="0,4" />
+ </cputune>
<os>
<type arch="x86_64" machine="pc-q35-9.0">hvm</type>
<boot dev="hd" />
@@ -143,6 +154,9 @@
<controller type="virtio-serial" index="0">
<address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0" />
</controller>
+ <controller type="scsi" index="0" model="virtio-scsi">
+ <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0" />
+ </controller>
<filesystem type="mount" accessmode="passthrough">
<driver type="virtiofs" />
<source dir="/run/media/user/NT/Game/" />
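Patches like the two above can be regenerated from any pair of domain XML dumps. The sketch below uses Python's `difflib` as a stand-in, which is not necessarily the tool that produced the diffs in this post; `xml_patch`, `BEFORE` and `AFTER` are illustrative names, and the two strings are tiny excerpts rather than full configs.

```python
import difflib


def xml_patch(before, after, before_name, after_name):
    """Return a unified diff between two XML documents given as strings."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=before_name,
        tofile=after_name,
    ))


# Tiny excerpts standing in for the full dumps.
BEFORE = '<vcpu placement="static">4</vcpu>\n<os>\n'
AFTER = '<vcpu placement="static">6</vcpu>\n<iothreads>1</iothreads>\n<os>\n'

patch = xml_patch(BEFORE, AFTER,
                  "win10-ltsc-optimized.xml",
                  "win10-ltsc-fully-optimized.xml")
print(patch, end="")
```

Unlike the diffs above, this output carries no timestamps in the file headers, since none are passed in.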