Virt-Manager A-Z: Chapter 5 (VM Optimization)

Intro

Before you go down the rabbit hole, I have to warn you: this does not really pay off.

Optimization

As I said in Libvirt Chap4#3D Acceleration, my Windows VM ran fairly slowly at first. I then referred to QEMU/KVM virt-manager windows vm slow and went through a series of optimizations:

  • Enable all Hyper-V enlightenments
  • Disable all timers except hypervclock
  • CPU pinning: in short, a one-to-one mapping of virtual CPUs to host CPU threads (instead of the default behavior, where the kernel constantly moves virtual CPUs between threads). Take care of the shared L3 cache issue
  • virtio disk. I did not take this approach as it is a bit demanding: you have to load the virtio ISO and inject the drivers at the disk partitioning step of Windows setup

How do I do CPU pinning and isolation? from r/VFIO is also worth reading. Related material:

  • kvm-qemu-virtualization-guide: poorly written and hard to follow. Just skim it.
  • CPU Pinning Benchmarks: skip to the conclusion; CPU pinning has little effect. But to maximize CPU performance, it is all or nothing: pin either none or all of the vCPUs. And IMO the gaming performance boost is probably just because PUBG is terribly optimized…

Hyper-V enlightenments:

<hyperv>
  <relaxed state='on'/>
  <vapic state='on'/>
  <spinlocks state='on' retries='8191'/>
  <vpindex state='on'/>
  <synic state='on'/>
  <stimer state='on'>
    <direct state='on'/>
  </stimer>
  <reset state='on'/>
  <frequencies state='on'/>
  <reenlightenment state='on'/>
  <tlbflush state='on'/>
  <ipi state='on'/>
</hyperv>

Disable all timers except hypervclock:

<clock offset='localtime'>
  <timer name='rtc' present='no' tickpolicy='catchup'/>
  <timer name='pit' present='no' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
  <timer name='kvmclock' present='no'/>
  <timer name='hypervclock' present='yes'/>
</clock>

CPU pinning. This is quite difficult to comprehend. I happened to find the exact same CPU topology in CPU Pinning understanding, so I reused that configuration, added a virtio-scsi controller, and finally added two lines for the iothread.

Output of lscpu -e (lstopo is more readable but requires hwloc to be installed):

$ lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ
  0    0      0    0 0:0:0:0          yes 4200.0000 400.0000
  1    0      0    1 1:1:1:0          yes 4200.0000 400.0000
  2    0      0    2 2:2:2:0          yes 4200.0000 400.0000
  3    0      0    3 3:3:3:0          yes 4200.0000 400.0000
  4    0      0    0 0:0:0:0          yes 4200.0000 400.0000
  5    0      0    1 1:1:1:0          yes 4200.0000 400.0000
  6    0      0    2 2:2:2:0          yes 4200.0000 400.0000
  7    0      0    3 3:3:3:0          yes 4200.0000 400.0000

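From my understanding, if I want to use CPU pinning and assign 6 logical CPUs (threads) to the guest, I should keep host CPUs 0 and 4 (the two threads of core 0) away from the guest, then pin the remaining six in sibling pairs (1 and 5, 2 and 6, 3 and 7 according to the CORE column above). But I just could not figure out how to do the pairing, as the Windows VM keeps showing only 4 cores: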

<vcpu placement="static">6</vcpu>
<iothreads>1</iothreads>
<cputune>
  <vcpupin vcpu="0" cpuset="1" />
  <vcpupin vcpu="1" cpuset="2" />
  <vcpupin vcpu="2" cpuset="3" />
  <vcpupin vcpu="3" cpuset="5" />
  <vcpupin vcpu="4" cpuset="6" />
  <vcpupin vcpu="5" cpuset="7" />
  <emulatorpin cpuset="0,4" />
  <iothreadpin iothread="1" cpuset="0,4" />
</cputune>
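
A possible way to make the pairing explicit, as an untested sketch rather than a confirmed fix: declare a guest topology of 3 cores with 2 threads each, so that consecutive vCPUs are presented to the guest as sibling threads, and pin each pair to a pair of host siblings from the lscpu output above. The `<topology>` values and the reordered `cpuset` numbers are assumptions and are not part of the config above; whether this changes the core count Windows reports is not verified here.

```xml
<!-- Fragments of the <domain> definition; everything else stays unchanged. -->
<vcpu placement="static">6</vcpu>
<iothreads>1</iothreads>
<cpu mode="host-passthrough" check="none" migratable="on">
  <!-- Assumed topology: 1 socket x 3 cores x 2 threads = 6 vCPUs -->
  <topology sockets="1" cores="3" threads="2" />
</cpu>
<cputune>
  <!-- Guest core 0 (vCPU 0,1) on host core 1 (host CPUs 1 and 5) -->
  <vcpupin vcpu="0" cpuset="1" />
  <vcpupin vcpu="1" cpuset="5" />
  <!-- Guest core 1 (vCPU 2,3) on host core 2 (host CPUs 2 and 6) -->
  <vcpupin vcpu="2" cpuset="2" />
  <vcpupin vcpu="3" cpuset="6" />
  <!-- Guest core 2 (vCPU 4,5) on host core 3 (host CPUs 3 and 7) -->
  <vcpupin vcpu="4" cpuset="3" />
  <vcpupin vcpu="5" cpuset="7" />
  <!-- Host core 0 (host CPUs 0 and 4) is left for the emulator and I/O thread -->
  <emulatorpin cpuset="0,4" />
  <iothreadpin iothread="1" cpuset="0,4" />
</cputune>
```

The product of `sockets`, `cores` and `threads` has to equal the `<vcpu>` count, which is why 3 cores with 2 threads each is used for 6 vCPUs.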

If I ever decide to reinstall and utilize virtio for maximum disk performance, Optimizing Windows VM performance on QEMU/KVM mentions quite a few tricks.
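
For reference, a minimal sketch of what a disk on the virtio-scsi controller could look like. One thing worth knowing: `<iothreads>` and `<iothreadpin>` only create and pin the thread; a device has to reference it through its `<driver>` element before the thread does any work. The image path, the `qcow2` format, the `queues` value and the cache settings below are placeholders and assumptions, not values from my setup, and Windows still needs the SCSI driver from the virtio ISO to see the disk.

```xml
<!-- Fragments that go inside <devices>; requires <iothreads>1</iothreads> in <domain>. -->
<controller type="scsi" index="0" model="virtio-scsi">
  <!-- Route this controller's I/O through iothread 1; queues="6" is an assumed value -->
  <driver iothread="1" queues="6" />
</controller>
<disk type="file" device="disk">
  <!-- cache/io/discard are commonly suggested settings, shown here as assumptions -->
  <driver name="qemu" type="qcow2" cache="none" io="native" discard="unmap" />
  <!-- Hypothetical image path -->
  <source file="/var/lib/libvirt/images/win10-ltsc.qcow2" />
  <!-- bus="scsi" attaches the disk to the virtio-scsi controller above -->
  <target dev="sda" bus="scsi" />
</disk>
```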

Patch

Generated diff (a.k.a. patch) for nerds, which also serves as a TL;DR.

Optimized:

--- win10-ltsc-orig.xml	2024-06-09 19:19:11.000000000 +0000
+++ win10-ltsc-optimized.xml	2024-06-09 19:18:48.000000000 +0000
@@ -25,14 +25,25 @@
             <relaxed state="on" />
             <vapic state="on" />
             <spinlocks state="on" retries="8191" />
+            <vpindex state="on" />
+            <synic state="on" />
+            <stimer state="on">
+                <direct state="on" />
+            </stimer>
+            <reset state="on" />
+            <frequencies state="on" />
+            <reenlightenment state="on" />
+            <tlbflush state="on" />
+            <ipi state="on" />
         </hyperv>
         <vmport state="off" />
     </features>
     <cpu mode="host-passthrough" check="none" migratable="on" />
     <clock offset="localtime">
-        <timer name="rtc" tickpolicy="catchup" />
-        <timer name="pit" tickpolicy="delay" />
+        <timer name="rtc" present="no" tickpolicy="catchup" />
+        <timer name="pit" present="no" tickpolicy="delay" />
         <timer name="hpet" present="no" />
+        <timer name="kvmclock" present="no" />
         <timer name="hypervclock" present="yes" />
     </clock>
     <on_poweroff>destroy</on_poweroff>

Fully optimized:

--- win10-ltsc-optimized.xml	2024-06-09 19:18:48.000000000 +0000
+++ win10-ltsc-fully-optimized.xml	2024-06-09 19:18:31.000000000 +0000
@@ -13,7 +13,18 @@
         <source type="memfd" />
         <access mode="shared" />
     </memoryBacking>
-    <vcpu placement="static">4</vcpu>
+    <vcpu placement="static">6</vcpu>
+    <iothreads>1</iothreads>
+    <cputune>
+        <vcpupin vcpu="0" cpuset="1" />
+        <vcpupin vcpu="1" cpuset="2" />
+        <vcpupin vcpu="2" cpuset="3" />
+        <vcpupin vcpu="3" cpuset="5" />
+        <vcpupin vcpu="4" cpuset="6" />
+        <vcpupin vcpu="5" cpuset="7" />
+        <emulatorpin cpuset="0,4" />
+        <iothreadpin iothread="1" cpuset="0,4" />
+    </cputune>
     <os>
         <type arch="x86_64" machine="pc-q35-9.0">hvm</type>
         <boot dev="hd" />
@@ -143,6 +154,9 @@
         <controller type="virtio-serial" index="0">
             <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0" />
         </controller>
+        <controller type="scsi" index="0" model="virtio-scsi">
+            <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0" />
+        </controller>
         <filesystem type="mount" accessmode="passthrough">
             <driver type="virtiofs" />
             <source dir="/run/media/user/NT/Game/" />

Vinfall's Geekademy
