Processor and memory virtualization
Machine virtualization involves all of the major server components. Proper configuration and tuning of each component is important to maximize the server utilization. The four major areas of a virtualized system involve CPU, memory, network, and storage.
This chapter covers CPU and memory on IBM PowerKVM and includes the following topics:
Resource overcommitment
CPU compatibility mode
SMT support
Dynamic and static Micro-Threading mode
CPU pinning
CPU shares
NUMA
Huge pages
CPU and memory hotplug
Chapter 6, “I/O virtualization” on page 163 covers the I/O subsystem, which includes networking and storage.
5.1 CPU virtualization
CPU virtualization is a technique that allows a virtual CPU to run over another CPU (virtual or physical). The process of running a virtual CPU on top of another virtual CPU is called nested virtualization, and that topic is outside the scope of this publication. This chapter covers only CPU virtualization over a physical CPU.
In the early days of CPU virtualization, most of the instructions that ran on the virtual CPU were emulated. With recent virtualization technologies, most guest instructions run directly on the physical CPU, which avoids the translation overhead.
The different ways to virtualize CPUs are covered in the sections that follow.
5.1.1 Types of virtualization
When an operating system runs inside a virtual machine, it can work in two different ways, depending on how it interacts with the hypervisor layer: Full virtualization or paravirtualization.
Full virtualization
In full virtualization mode, the guest operating system runs inside the virtual machine and does not know that it is running in a virtualized environment. Because the guest operating system issues instructions as though it were running on real hardware, the hypervisor needs to emulate that hardware.
In this mode, the hypervisor emulates the full hardware, such as registers, timing, and hardware limitations. The guest operating system thinks it is interacting with real hardware. However, emulation is complex and inefficient.
Paravirtualization
In paravirtualization, the guest operating system knows that it is running inside a virtual machine, so it helps the hypervisor whenever possible. The advantage is the better performance of the virtual machine, mainly because the communication between hypervisor and guest can be shortened, which reduces overhead. With PowerKVM, all of the supported guests can run in paravirtualized mode.
Much of the paravirtualization optimization happens when the virtual machine operating system (OS) needs to do input and output (I/O) operations, which are processed by the hypervisor. One example is when the guest operating system needs to send a network packet outside of the server. When the guest OS sends the packet in full virtualization mode, it operates in the same way that it would when interacting with a physical NIC, using the same memory space, interruptions, and so on.
However, when the guest uses the paravirtualization approach, the guest operating system knows it is virtualized, knows that its I/O is handled by the hypervisor (not by physical hardware), and cooperates with the hypervisor. This cooperation is what provides most of the performance benefits of paravirtualization.
In the context of KVM, this set of device drivers is called Virtio device drivers (see 1.3.11, “Virtio drivers” on page 20). There is also a set of paravirtualized device drivers that was initially used on IBM PowerVM and that is supported on PowerKVM as well, including ibmveth, ibmvscsi, and others.
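As an illustration, the following is a minimal libvirt XML sketch of a paravirtualized (Virtio) network interface and disk for a guest. The network name and image path are placeholders, not values taken from this publication:
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
</interface>
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/linux-guest.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>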
Hardware-assisted virtualization
Hardware-assisted virtualization is a platform feature that enables the hypervisor to take advantage of hardware facilities when running guests. One of the main benefits is that the guest code does not need to be changed, so the guest binary code can run without any translation.
IBM Power Systems introduced virtualization assistance hardware with the POWER5 family of servers. At that time, Power Systems did much of the assistance by cooperating with the hypervisor for certain functions, such as fast page movement, micropartitioning, and Micro-Threading.
5.2 CPU overcommitment
CPU overcommitment allows an under-used CPU to be shared among other virtual machines. The CPU overcommit is usually enabled when the virtual machines are not expected to use all of the CPU resources at the same time. Therefore, when one virtual machine is not using its share of the CPU, another virtual machine can use it.
A CPU assigned to a virtual machine is called virtual CPU (vCPU). In an overcommitment scenario, the number of vCPUs is larger than the number of CPUs available.
For example, Figure 5-1 shows a hypervisor with four CPUs that is hosting two virtual machines (VMs) that are using three vCPUs each. This means that the guest operating system can use up to three CPUs if another VM is not using more than one CPU.
If the vCPUs are fully used at the same time, the hypervisor multiplexes the vCPUs onto the real CPUs according to its scheduling policies.
Figure 5-1 CPU overcommitment scenario
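As a hedged sketch of the scenario in Figure 5-1, each of the two guests would simply define three vCPUs in its libvirt XML, even though the host has only four CPUs; the hypervisor then schedules the six vCPUs on the four physical CPUs:
<vcpu placement='static'>3</vcpu>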
5.3 CPU configuration
There are many ways to configure CPUs in a PowerKVM environment, which are discussed in the following sections.
5.3.1 CPU compatibility mode
It is possible to run a guest in compatibility mode with IBM POWER8, POWER7®, and POWER6® modes.
To enable POWER7 compatibility mode, add or edit the XML element in the domain element of the guest XML configuration file, as shown in Example 5-1.
Example 5-1 Enable POWER7 compatibility mode
<cpu mode=’host-model’>
<model>power7</model>
</cpu>
 
Note: POWER7 compatibility mode is limited to up to four threads per core.
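One way to apply the compatibility mode is to edit the guest definition with virsh and then restart the guest so that the new mode takes effect. This is a minimal sketch, assuming a guest named linux-guest:
# virsh edit linux-guest       (add the <cpu> element shown in Example 5-1)
# virsh shutdown linux-guest
# virsh start linux-guest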
Example 5-2 shows how to verify the compatibility mode inside the guest, in this case for POWER7.
Example 5-2 Guest in POWER7 compatibility mode
# cat /proc/cpuinfo
processor : 0
cpu : POWER7 (architected), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 1
cpu : POWER7 (architected), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 2
cpu : POWER7 (architected), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 3
cpu : POWER7 (architected), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
 
Note: The XML tag for the compatibility mode has been changed. In PowerKVM V2.1, it was <cpu mode=’custom’>. In PowerKVM V3.1.0, the tag is <cpu mode=’host-model’>. For the host migration from PowerKVM V2.1 to PowerKVM V3.1, scripts will take care of that change, as described in section 2.3, “Install over existing IBM PowerKVM and host migration” on page 52.
To enable POWER6 compatibility mode, add or edit the XML element shown in Example 5-3 on the domain element of the guest XML configuration file.
Example 5-3 Enable POWER6 compatibility mode
<cpu mode=’host-model’>
<model>power6</model>
</cpu>
 
Note: POWER6 compatibility mode is limited to up to two threads per core.
5.3.2 Simultaneous multithreading
To run PowerKVM on Power Systems servers, the SMT option needs to be turned off in the hypervisor. The simultaneous multithreading (SMT) feature is visible only inside the guests, not on the hypervisor. In this scenario, a single-core VM can use the SMT feature and have up to eight threads activated in the virtual machine.
To disable SMT on the hypervisor, run the following command:
ppc64_cpu --smt=off
PowerKVM disables SMT in the hypervisor during the boot. Each virtual machine that needs to use the SMT feature should enable it in the virtual machine configuration.
To check whether SMT is disabled on the cores, run the ppc64_cpu command with the --smt or --info parameter. The ppc64_cpu --info command lists the CPUs and marks each enabled thread with an asterisk (*) next to the thread number. Example 5-4 shows that in a six-core machine, only one thread per core is enabled.
Example 5-4 SMT disabled on the hypervisor
# ppc64_cpu --info
Core 0: 0* 1 2 3 4 5 6 7
Core 1: 8* 9 10 11 12 13 14 15
Core 2: 16* 17 18 19 20 21 22 23
Core 3: 24* 25 26 27 28 29 30 31
Core 4: 32* 33 34 35 36 37 38 39
Core 5: 40* 41 42 43 44 45 46 47
If you want to start the VM using SMT, you need to specify that manually. For example, if you want to use only one core with SMT 8, the virtual machine should be assigned eight vCPUs, which will use just one core and eight threads, as covered in “SMT on the guests” on page 136.
To enable SMT support on a guest, the XML configuration file needs to set the number of threads per core. This number must be a power of 2, that is: 1, 2, 4, or 8. The number of vCPUs must also be the product of the number of threads per core and the number of cores.
Example 5-5 demonstrates how to set these numbers for four threads per core and two cores, resulting in eight vCPUs.
Example 5-5 Setting the number of threads per core
<vcpu placement=’static’>8</vcpu>
<cpu>
<topology sockets=’1’ cores=’2’ threads=’4’/>
</cpu>
Example 5-6 shows the CPU information for the guest defined in Example 5-5. The guest is running with four threads per core and two cores. The example includes the information with SMT enabled and disabled.
Example 5-6 CPU information about a guest with SMT
# ppc64_cpu --smt
SMT=4
 
# cat /proc/cpuinfo
processor : 0
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 1
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 2
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 3
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 4
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 5
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 6
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 7
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
 
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 4
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
 
# ppc64_cpu --smt=off
 
# cat /proc/cpuinfo
processor : 0
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 4
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
 
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0,4
Off-line CPU(s) list: 1-3,5-7
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0,4
SMT on the guests
To enable SMT on the guests, the virtual machine needs to be assigned the number of threads that will run on the operating system. Keep the following formula in mind:
vCPU = sockets x cores x threads
Table 5-1 shows the relation between the number of vCPUs in a guest and the number of sockets, cores, and threads configured in the guest XML definition in libvirt.
Table 5-1 The relation between vCPU, cores, and threads on guest configuration
vCPU   Cores   SMT   Guest XML definition
32     4       8     <topology sockets=’1’ cores=’4’ threads=’8’ />
16     4       4     <topology sockets=’1’ cores=’4’ threads=’4’ />
8      4       2     <topology sockets=’1’ cores=’4’ threads=’2’ />
4      4       off   <topology sockets=’1’ cores=’4’ threads=’1’ />
16     2       8     <topology sockets=’1’ cores=’2’ threads=’8’ />
8      2       4     <topology sockets=’1’ cores=’2’ threads=’4’ />
4      2       2     <topology sockets=’1’ cores=’2’ threads=’2’ />
2      2       off   <topology sockets=’1’ cores=’2’ threads=’1’ />
8      1       8     <topology sockets=’1’ cores=’1’ threads=’8’ />
4      1       4     <topology sockets=’1’ cores=’1’ threads=’4’ />
2      1       2     <topology sockets=’1’ cores=’1’ threads=’2’ />
1      1       off   <topology sockets=’1’ cores=’1’ threads=’1’ />
5.3.3 Micro-Threading
Micro-Threading is an IBM POWER8 feature that enables each POWER8 core to be split into two or four subcores. Each subcore also supports a limited number of threads, as listed in Table 5-2.
Table 5-2 Threads per subcore
Subcores per core   Threads per subcore
2                   1, 2, 4
4                   1, 2
This type of configuration provides performance advantages for some types of workloads.
Figure 5-2 shows the architecture of a POWER8 core using the Micro-Threading feature. In this scenario, the core is configured with four subcores, and each subcore is configured with two threads.
Figure 5-2 Example of a POWER8 core with four subcores and two threads each subcore
Another way to demonstrate how Micro-Threading works is to define a scenario where a user wants to start four virtual machines on a single core. The virtual machines can be started without Micro-Threading or with Micro-Threading enabled.
Figure 5-3 shows that four virtual machines are running in the same core, and each VM can access up to eight threads. The core switches among the four virtual machines, and each virtual machine runs only about one-fourth of the time. This indicates that the CPU is overcommitted.
Figure 5-3 Four virtual machines running in a single core without Micro-Threading enabled
Figure 5-4 shows the same four virtual machines running on four different subcores in the same core. Each virtual machine can have up to two SMT threads. In this case, the guest is always running in the CPU.
Figure 5-4 Four virtual machines running in a single core with Micro-Threading enabled
Micro-Threading benefits:
Better use of CPU resources
More virtual machines per core
Micro-Threading limitations:
SMT limited to 2 or 4 depending on the number of subcores
Guests in single thread (SMT 1) mode cannot use the full core
Dynamic Micro-Threading
PowerKVM V3.1 introduces dynamic Micro-Threading, which is enabled by default. Dynamic Micro-Threading allows virtual processors from several guests to run concurrently on the processor core. The processor core is split on guest entry and then made whole again on guest exit.
If the static Micro-Threading mode is set to anything other than whole core (in other words, set to 2 or 4 subcores) as described in “Enabling static Micro-Threading on the PowerKVM hypervisor” on page 139, dynamic Micro-Threading is disabled.
Along with dynamic Micro-Threading, PowerKVM V3.1 also implements a related feature called subcore sharing. Subcore sharing allows multiple virtual CPUs from the same guest to run concurrently on one subcore. Subcore sharing applies only to guests that are running in SMT 1 (whole core) mode and to virtual CPUs in the same guest. It applies in any Micro-Threading mode (static or dynamic).
Dynamic Micro-Threading can also be disabled, or restricted to a mode that allows the core to be dynamically split only into two subcores or only into four subcores. This is done by using the dynamic_mt_modes parameter.
Example 5-7 sets the parameter from the default 6 to 4, which means that only splitting into four subcores is allowed (not into two).
Example 5-7 Only 4-way dynamic Micro-Threading
# cat /sys/module/kvm_hv/parameters/dynamic_mt_modes
6
 
# echo 4 > /sys/module/kvm_hv/parameters/dynamic_mt_modes
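The value written to the sysfs file is not persistent across host reboots. A common way on Linux hosts to make a module parameter persistent is a modprobe option file; this is a sketch under the assumption that kvm_hv is loaded as a module on the PowerKVM host:
# echo "options kvm_hv dynamic_mt_modes=4" > /etc/modprobe.d/kvm_hv.conf
If kvm_hv is built into the kernel instead, the same setting can be passed on the kernel command line as kvm_hv.dynamic_mt_modes=4.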
Table 5-3 shows the supported values for dynamic_mt_modes.
Table 5-3 Supported values for dynamic_mt_modes
dynamic_mt_modes value   Result
0                        Disables dynamic Micro-Threading
2                        Allows 2-way Micro-Threading (but not 4-way Micro-Threading)
4                        Allows 4-way Micro-Threading (but not 2-way Micro-Threading)
6 (= 4 + 2, default)     Allows both 2-way and 4-way Micro-Threading
 
Note: The documentation of dynamic Micro-Threading in the IBM Knowledge Center contains a table that shows the maximum number of virtual CPUs that can run on one core for the various Micro-Threading modes:
http://www.ibm.com/support/knowledgecenter/SSZJY4_3.1.0/liabp/liabpdynamicsplit.htm
Enabling static Micro-Threading on the PowerKVM hypervisor
To enable static Micro-Threading on the PowerKVM hypervisor, follow this procedure (ideally after a fresh reboot):
1. Ensure that no guests are running.
2. Set the number of subcores to 1:
# ppc64_cpu --subcores-per-core=1
3. Enable SMT on the host:
# ppc64_cpu --smt=on
4. Set the number of subcores to 4:
# ppc64_cpu --subcores-per-core=4
5. Turn the SMT off on the host:
# ppc64_cpu --smt=off
 
Note: To configure two subcores per core, specify --subcores-per-core=2.
To verify that the machine has Micro-Threading enabled, use the ppc64_cpu command with the --info parameter. Example 5-8 on page 140 shows the output of the ppc64_cpu command, indicating that the server has six cores and each core has four subcores.
Example 5-8 Checking if Micro-Threading is enabled
# ppc64_cpu --info
Core 0:
Subcore 0: 0* 1
Subcore 1: 2* 3
Subcore 2: 4* 5
Subcore 3: 6* 7
Core 1:
Subcore 4: 8* 9
Subcore 5: 10* 11
Subcore 6: 12* 13
Subcore 7: 14* 15
Core 2:
Subcore 8: 16* 17
Subcore 9: 18* 19
Subcore 10: 20* 21
Subcore 11: 22* 23
Core 3:
Subcore 12: 24* 25
Subcore 13: 26* 27
Subcore 14: 28* 29
Subcore 15: 30* 31
Core 4:
Subcore 16: 32* 33
Subcore 17: 34* 35
Subcore 18: 36* 37
Subcore 19: 38* 39
Core 5:
Subcore 20: 40* 41
Subcore 21: 42* 43
Subcore 22: 44* 45
Subcore 23: 46* 47
 
Note: If Micro-Threading is turned on with four subcores, and a guest is started that uses more than two threads, this results in the error Cannot support more than 2 threads on PPC with KVM. A four-thread configuration would be possible by activating Micro-Threading with only two subcores.
Disabling static Micro-Threading
To disable the static Micro-Threading feature, follow these steps in the PowerKVM hypervisor:
1. Ensure that all guests are stopped.
2. Set the hypervisor cores back to full core mode:
ppc64_cpu --subcores-per-core=1
3. Turn SMT on to “reset” the online thread topology:
ppc64_cpu --smt=on
4. Turn the SMT off before starting the guests:
ppc64_cpu --smt=off
To verify that the Micro-Threading feature is disabled, check with the ppc64_cpu --info command, as shown previously in Example 5-4 on page 133.
5.3.4 Configuring NUMA
NUMA stands for Non-Uniform Memory Access. It describes an environment where processors on different sockets, boards, or nodes have local memory that they can access directly, but also have access to the memory attached to the other processors in the system. That far memory is also referred to as remote or distant memory. Local memory can be accessed faster than remote or distant memory. Therefore, from a performance point of view, it is best if a guest works only with local memory.
Within PowerKVM, it is possible to define a NUMA environment on a guest. If that NUMA environment fits the physical architecture of the system, it can result in better performance. To link the processors of a NUMA guest to the physical environment, CPU pinning can be used, as described in 5.3.5, “CPU pinning” on page 142. The memory can also be linked to the physical environment of the system. This is done by restricting a guest to allocate memory from a set of NUMA nodes, as described in 5.5.5, “Restrict NUMA memory allocation” on page 156.
A guest’s NUMA environment is defined in the CPU section of the domain in the XML file. Example 5-9 shows an environment of a system with two sockets and four cores in each socket. The guest should run in SMT8 mode. The NUMA section shows that the first 32 vCPUs (0 - 31) are placed in NUMA cell 0 and the other 32 vCPUs (32 - 63) are assigned to NUMA cell 1. The attribute current=’8’ in the vcpu element makes sure that the guest starts with only eight vCPUs, which is one core with eight threads. More CPUs can be added later by using CPU Hotplug, as described in 5.4, “CPU Hotplug” on page 145.
For the memory part of the guest, the XML file shown in Example 5-9 defines that each cell has 4 GB of memory, equally spread over the two NUMA cells. The sum of the memory in the cells is also the maximum memory stated by the memory tag. If you try to set the maximum memory higher than the sum of the cells, PowerKVM automatically adjusts the maximum memory to the sum of the cells. Nevertheless, it is possible to have a higher maximum than the sum of the memory in the cells by adding (virtual) dual inline memory modules (DIMMs) to the NUMA cells, as described in “Memory Hotplug in a NUMA configuration” on page 159.
Example 5-9 Definition of a NUMA guest
<memory unit='KiB'>8388338</memory>
<currentMemory unit='KiB'>8388338</currentMemory>
...
<vcpu placement='static' current='8'>64</vcpu>
...
<cpu>
<topology sockets='2' cores='4' threads='8'/>
<numa>
<cell id='0' cpus='0-31' memory='4194304' unit='KiB'/>
<cell id='1' cpus='32-63' memory='4194034' unit='KiB'/>
</numa>
</cpu>
To verify the result inside the guests, the lscpu and numactl commands can be used as shown in Example 5-10.
Example 5-10 Verification of a NUMA configuration inside the guest
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s):
 
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 4096 MB
node 0 free: 2329 MB
node 1 cpus:
node 1 size: 4096 MB
node 1 free: 4055 MB
node distances:
node 0 1
0: 10 10
1: 10 10
5.3.5 CPU pinning
CPU pinning allows a guest virtual machine to be pinned to a given CPU or set of CPUs. This means that the hypervisor schedules the guest’s vCPUs only on the CPUs that the guest is pinned to. By default, a guest can be scheduled on any CPU.
The advantage of pinning is that it can improve data locality. Two threads on the same core using the same data are able to share it on a local cache. The same thing happens for two cores on the same NUMA node.
Example 5-11 shows a configuration with four vCPUs without SMT turned on (SMT=1), where the four vCPUs are pinned to the first four cores in the first socket of the host.
Example 5-11 CPU pinning without SMT
<vcpu placement=’static’ cpuset=’0,8,16,24’>4</vcpu>
<cpu>
<topology sockets=’2’ cores=’6’ threads=’1’/>
</cpu>
If the topology fits the system layout, for example on a Power System S812L with two physical sockets and six cores in each socket, this configuration ensures that this guest runs only in the first socket of the system.
To verify whether the pinning works correctly, the commands shown in Example 5-12 can be used on the PowerKVM host.
Example 5-12 Verification on CPU pinning
# ppc64_cpu --info
Core 0: 0* 1 2 3 4 5 6 7   <-- Physical Socket 1
Core 1: 8* 9 10 11 12 13 14 15
Core 2: 16* 17 18 19 20 21 22 23
Core 3: 24* 25 26 27 28 29 30 31
Core 4: 32* 33 34 35 36 37 38 39
Core 5: 40* 41 42 43 44 45 46 47
Core 6: 48* 49 50 51 52 53 54 55   <-- Physical Socket 2
Core 7: 56* 57 58 59 60 61 62 63
Core 8: 64* 65 66 67 68 69 70 71
Core 9: 72* 73 74 75 76 77 78 79
Core 10: 80* 81 82 83 84 85 86 87
Core 11: 88* 89 90 91 92 93 94 95
 
# ps -ef | grep qemu | grep linux-guest
qemu 30179 1 5 17:44 ? 00:00:48 /usr/bin/qemu-system-ppc64 -name linux-guest-1 -S -machine pseries-2.4,accel=kvm,usb=off -m
...
 
# taskset -cp 30179
pid 30179's current affinity list: 0,8,16,24
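The pinning can also be listed directly with libvirt; a minimal sketch, assuming the guest is named linux-guest-1 (the output format depends on the libvirt version):
# virsh vcpupin linux-guest-1       (lists the CPU affinity of every vCPU)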
 
With SMT turned on in the guest, pinning CPUs works the same way, because SMT is not activated on the host. In an example with SMT 4, the first four guest vCPUs are mapped to threads 0, 1, 2, and 3 of core 0 on the host. The second four guest vCPUs are mapped to threads 8, 9, 10, and 11 of core 1 on the host, and so on. Example 5-13 shows the same configuration as in the previous example, but with SMT 4.
Example 5-13 CPU pinning with SMT
<vcpu placement=’static’ cpuset=’0,8,16,24’>16</vcpu>
<cpu>
<topology sockets=’2’ cores=’6’ threads=’4’/>
</cpu>
 
Note: All threads of a core must be running on the same physical core. It is not supported to activate SMT on the PowerKVM host and pin single threads to different cores.
CPU pinning can also be used with subcores, which are explained in detail in 5.3.3, “Micro-Threading” on page 136. In this case, the pinning works in the same manner. In Example 5-14 on page 144, a guest using four subcores with two threads each is pinned to the first physical core.
Example 5-14 CPU pinning with subcores
# ppc64_cpu --info
Core 0:
Subcore 0: 0* 1
Subcore 1: 2* 3
Subcore 2: 4* 5
Subcore 3: 6* 7
Core 1:
Subcore 4: 8* 9
Subcore 5: 10* 11
Subcore 6: 12* 13
Subcore 7: 14* 15
Core 2:
Subcore 8: 16* 17
Subcore 9: 18* 19
Subcore 10: 20* 21
Subcore 11: 22* 23
...
 
<vcpu placement=’static’ cpuset=’0,2,4,8’>48</vcpu>
<cpu>
<topology sockets=’2’ cores=’6’ threads=’4’/>
</cpu>
5.3.6 CPU shares
In a kernel-based virtual machine (KVM), the virtual machines run as processes on the host. This means that they are scheduled to run on host CPUs just like any other process. The implication is that CPUs are shared by default. This CPU sharing allows CPU overcommitment, that is, creating more vCPUs than there are CPUs on the system.
The Linux scheduler spreads the vCPUs among the CPU cores. However, when there is overcommitment, multiple vCPUs can share a CPU core. To balance the amount of time that one virtual machine gets compared to another, you can configure shares.
Example 5-15 demonstrates how to configure the relative share time for a guest. By default, guests have a relative share time of 1024. Two guests with a share time of 1024 share the CPU for the same amount of time. If a third guest has a share time of 256, it runs a quarter of the time relative to the other guests. A guest with a share time of 2048 runs twice as long as the other guests.
Example 5-15 CPU shares
<cputune>
<shares>256</shares>
</cputune>
This share time applies only when there is sharing, either because of CPU pinning or because of CPU overcommitment. If vCPUs are idle or only a few vCPUs have been allocated, it is possible that a guest with a share time of 256 will be able to run on a CPU without sharing. If another guest needs to run on that same CPU, the configured share time takes effect.
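The share value can also be queried or changed at run time with virsh instead of editing the XML; a minimal sketch, assuming a guest named linux-guest:
# virsh schedinfo linux-guest                         (shows cpu_shares and other scheduler parameters)
# virsh schedinfo linux-guest --set cpu_shares=2048 --live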
5.4 CPU Hotplug
Starting with PowerKVM V3.1, CPU Hotplug is supported. CPU Hotplug allows you to add or remove CPUs in a running guest operating system. To support CPU Hotplug, the operating system needs at least the minimum required versions of the packages listed in Table 5-4.
Table 5-4 Required packages to support CPU and memory Hotplug
Package         Minimum required version
powerpc-utils   1.2.26
ppc64-diag      2.6.8
librtas         1.3.9
The addition or removal of CPUs is done on a per-socket basis, as defined in the CPU section of the guest’s XML file. A socket in this sense is not necessarily a physical socket of the Power System; it is just a virtual definition.
Before you start a hotplug operation, ensure that the rtas_errd daemon is running inside the guest:
# ps -ef | grep rtas
root 1367 1 0 09:22 ? 00:00:00 /usr/sbin/rtas_errd
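If the daemon is not running, it can usually be started through the init system of the guest distribution, for example with systemd. The service name rtas_errd used here is an assumption and can differ between distributions:
# systemctl start rtas_errd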
The following examples were created on a Power System S812L with six cores on two sockets, giving a total of 12 cores in the system. The XML file of the guest system contains the configuration shown in Example 5-16.
Example 5-16 Base definition of sockets, cores, and threads for CPU Hotplug
<vcpu placement=’static’ current=’8’>96</vcpu>
...
<cpu>
<topology sockets=’12’ cores=’1’ threads=’8’/>
</cpu>
In Example 5-16, we defined a guest with 12 sockets, each with one core and eight threads, giving a total of 96 vCPUs. The guest starts with eight vCPUs, which is one socket with one core and eight threads, as defined with the current attribute in the vcpu element. From a CPU Hotplug perspective, the guest can be increased in steps of eight vCPUs up to 96 vCPUs (12 cores with eight threads).
The Hotplug task itself works in the same manner as PCI Hotplug. An XML snippet is needed to define a sequence number for the additional socket. The snippet defining the first socket to be added uses sequence number 0, as shown in Example 5-17.
Example 5-17 XML snippet for Hotplugging a socket
# cat cpu_hot_0.xml
<spapr-cpu-socket id="0">
<alias name="spaprcpusock0"/>
</spapr-cpu-socket>
 
Note: spapr-cpu-socket stands for Server IBM Power Architecture® Platform Reference CPU socket.
This snippet can be attached to the running guest with a virsh attach-device command as described in Example 5-18.
Example 5-18 CPU Hotplug example
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
 
[powerkvm-host]# virsh attach-device linux-guest cpu_hot_0.xml --live
Device attached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-15
 
Note: A persistent attachment of CPUs in the XML file by using the --config attribute is not supported.
In Example 5-18 on page 146, we added another socket with one core and eight threads. This can be repeated with a snippet containing the next available sequence number, for example, 1. If the SMT mode was changed in the meantime, CPU Hotplug is still possible. In that case, only the vCPUs matching the SMT mode are brought online. The other vCPUs remain offline.
Example 5-19 continues Example 5-18 on page 146 by changing the SMT mode to 4 and adding another socket.
Example 5-19 Change SMT mode and Hotplug another socket
[linux-guest]# ppc64_cpu --smt
SMT=8
 
[linux-guest]# ppc64_cpu --smt=4
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-3,8-11
Off-line CPU(s) list: 4-7,12-15
Thread(s) per core: 4
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-3,8-11
 
[powerkvm-host]# cat cpu_hot_1.xml
<spapr-cpu-socket id="1">
<alias name="spaprcpusock1"/>
</spapr-cpu-socket>
 
[powerkvm-host]# virsh attach-device linux-guest cpu_hot_1.xml --live
Device attached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-3,8-11,16-23
Off-line CPU(s) list: 4-7,12-15
Thread(s) per core: 5
Core(s) per socket: 1
Socket(s): 3
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-3,8-11,16-23
 
Note: It is not possible to attach two or more sockets with one snippet. However, this can be done by using several snippets in a loop.
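The following is a minimal shell sketch of such a loop, assuming a guest named linux-guest and that sockets with these sequence numbers have not been attached yet (the sequence numbers are placeholders):
for i in 2 3 4; do
  cat > cpu_hot_${i}.xml <<EOF
<spapr-cpu-socket id="${i}">
  <alias name="spaprcpusock${i}"/>
</spapr-cpu-socket>
EOF
  virsh attach-device linux-guest cpu_hot_${i}.xml --live
done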
Removing sockets by using CPU Hotplug is also supported. To remove sockets, the same snippets are needed. The snippets must be applied by using virsh detach-device in the reverse order of the addition of the sockets. It is not possible to remove a lower sequence number before a higher sequence number. Example 5-20 shows the removal of one socket, continuing the previous example.
Example 5-20 Removal of one socket using CPU Hotplug
[powerkvm-host]# virsh detach-device linux-guest cpu_hot_0.xml --live
error: Failed to detach device from cpu_hot_0.xml
error: unsupported configuration: Non-contiguous socket index '0' not allowed. Expecting : 1
 
[powerkvm-host]# virsh detach-device linux-guest cpu_hot_1.xml --live
Device detached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-3,8-11
Off-line CPU(s) list: 4-7,12-15
Thread(s) per core: 4
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-3,8-11
5.4.1 CPU Hotplug with a NUMA configuration
For a NUMA guest, the NUMA node to which a CPU is hotplugged is determined by the NUMA topology defined in the guest XML. Example 5-21 shows a configuration with two NUMA cells of two sockets each, and how the sockets are populated by using CPU Hotplug according to the NUMA definition in the XML file.
Example 5-21 CPU Hotplug with a NUMA configuration
[powerkvm-host]# virsh edit linux-guest
...
<vcpu placement='static' current='8'>32</vcpu>
...
<cpu>
<topology sockets='4' cores='1' threads='8'/>
<numa>
<cell id='0' cpus='0-7,16-23' memory='2097152' unit='KiB'/>
<cell id='1' cpus='8-15,24-31' memory='2097152' unit='KiB'/>
</numa>
</cpu>
...
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s):
 
[powerkvm-host]# virsh attach-device linux-guest cpu_hot_0.xml --live
Device attached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
 
[powerkvm-host]# virsh attach-device linux-guest cpu_hot_1.xml --live
Device attached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 3
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15
 
[powerkvm-host]# virsh attach-device linux-guest cpu_hot_2.xml --live
Device attached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
5.4.2 Considerations for CPU Hotplug
There are some considerations when using CPU Hotplug.
No removal of sockets that were present at the time of starting the guest
Only sockets that were added by using a Hotplug action can be removed by a Hotplug action. If, for example, the guest was started with two sockets (as defined in the XML definition) and you try to remove one of those sockets by using virsh detach-device, this results in an error.
No CPU Hotplug with unfilled sockets
CPU Hotplug is not possible in a configuration where a socket was not completely used at start time. The configuration in Example 5-22 does not support CPU Hotplug: the guest starts with only four vCPUs, which means that only one of the two cores of the first socket is used.
Example 5-22 Unsupported configuration for CPU Hotplug
<vcpu placement='static' current="4">32</vcpu>
<cpu>
<topology sockets='4' cores='2' threads='4'/>
</cpu>
5.5 Memory
With virtualization, memory is basically static, which means it is not multiplexed like the CPU; a block of memory is mapped directly to a single (and only one) virtual machine.
Because each virtual machine is also a Linux process on the hypervisor, the memory can be overcommitted.
This section covers methods to improve the performance of PowerKVM memory management. These methods involve resizing the guest memory dynamically and merging identical guest pages on the hypervisor.
5.5.1 Memory allocation
Guest memory is allocated by the host according to the guest configuration. It is possible to set a maximum amount of memory and a current amount. The guest will have the maximum amount of memory available, but it can choose to use only the current amount and release the remaining amount to the host. See 5.5.2, “Memory ballooning” on page 151.
Example 5-23 shows the configuration of the maximum amount of memory allocated to the guest (the memory element) and the current amount of memory (the currentMemory element). Since PowerKVM V3.1, it is also possible to increase the memory beyond the maximum amount by using memory hotplug, as described in 5.6, “Memory Hotplug” on page 157.
Example 5-23 Memory allocation
<memory unit=’KiB’>4194304</memory>
<currentMemory unit=’KiB’>2097152</currentMemory>
 
Note: On the guest, you might notice that the total amount of memory is less than what is set as the current amount. This might happen because the guest kernel has reserved an amount of memory for some reason. One example is the crashkernel boot parameter, which reserves memory for a kernel dump.
5.5.2 Memory ballooning
Memory ballooning is a technique that allows the guest memory to be increased or decreased cooperatively, depending on the amount of free memory available on the guests and hypervisor.
When memory ballooning is enabled on the guest, the hypervisor can remove and add memory to the guest dynamically.
This technique can be used if the memory should be overcommitted, which means assigning the guests, in total, more memory than the system provides. If one guest needs more memory while another guest needs less memory at the same time, the memory is used more efficiently. But if all guests need their assigned overcommitted memory at once, this can cause poor performance, because in that case the host starts to swap pages to disk.
How to enable and manage memory ballooning on PowerKVM
Memory ballooning is enabled by default by using the virtio memballoon model, as shown in Example 5-24. If you want to disable ballooning, change the model to none.
Example 5-24 Enable memory balloon on the guest
<devices>
..
<memballoon model=’virtio’>
..
</devices>
When a guest is configured to support ballooning, memory can be added to and removed from the virtual machine by using the virsh setmem command. The total memory allocated to the virtual machine can be seen with the virsh dommemstat command.
Example 5-25 shows a virtual machine that initially has 2 GB of memory. After the virsh setmem linux-guest 1048576 --config --live command, the memory assigned to that guest goes down to 1 GB. The --live flag changes the amount of memory in the running guest, and the --config flag changes the currentMemory tag in the XML file. The two flags can be used individually or together.
Example 5-25 Decreasing the virtual machine memory to 1 GB
# virsh dommemstat linux-guest
actual 2097152
swap_in 46312795184
rss 1954752
 
# virsh setmem linux-guest 1048576 --config --live
 
# virsh dommemstat linux-guest
actual 1048576
swap_in 46312795184
rss 1955200
 
Note: If the virtual machine or the guest operating system is not configured properly to support virtio ballooning, the following message displays on the hypervisor:

Error: Requested operation is not valid: Unable to change memory of active domain without the balloon device and guest OS balloon driver.
Monitoring
To check whether memory ballooning is working on the guest, you can query the QEMU monitor by running the command shown in Example 5-26. If the balloon device is not available in the virtual machine, the output is “Device balloon has not been activated.”
Example 5-26 Output of memory available on the balloon
# virsh qemu-monitor-command --domain linux-guest --hmp ‘info balloon’
balloon: actual=3559
To change the amount of memory in the guest, the ‘balloon <memory in MB>’ command is used, as in Example 5-27, which changes the memory from 3559 MB to 1024 MB. After this command, only 1024 MB of memory is available to the guest.
Example 5-27 Changing the memory allocated to the virtual machine
# virsh qemu-monitor-command --domain linux-guest --hmp ‘info balloon’
balloon: actual=3559
# virsh qemu-monitor-command --domain linux-guest --hmp ‘balloon 1024’
# virsh qemu-monitor-command --domain linux-guest --hmp ‘info balloon’
balloon: actual=1024
 
Note: Most operating systems have the virtio-balloon driver built into the kernel. If you are using an operating system that does not include the virtio-balloon device driver in its kernel, you need to install it manually.
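To check inside the guest whether the driver is present, you can look for the virtio_balloon module; this is a hedged sketch, and the command produces no output if the driver is built into the kernel rather than loaded as a module:
# lsmod | grep virtio_balloon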
5.5.3 Kernel SamePage Merging
Kernel SamePage Merging (KSM) is a KVM technology that merges blocks of memory pages with the same content to reduce the memory use in the hypervisor.
KSM can detect that two virtual machines have identical memory pages. In that case, it merges both pages into the same physical memory page, which reduces the amount of memory used. To do so, a certain number of CPU cycles is spent scanning for and spotting these pages.
For example, Figure 5-5 shows that all three virtual machines have pages that contain the same content. In this case, when KSM is enabled, all four pages that contain the same content will use only one physical memory block.
Figure 5-5 KSM mapping when VM uses the same page
There is a similar feature in the PowerVM hypervisor, called Active Memory Deduplication. For more information about this feature, see Power Systems Memory Deduplication, REDP-4827.
How to enable Kernel SamePage Merging on PowerKVM
KSM is supported in PowerKVM server virtualization, but it is not enabled automatically.
To verify whether KSM is running and to enable and disable it, you need to interact with the /sys/kernel/mm/ksm/run file.
 
Important: The ksmtuned daemon must be running for KSM to work. PowerKVM already runs this daemon automatically, so you do not need to turn it on. To verify that the daemon is running, see Example 5-28.
Example 5-28 Verify that the ksmtuned daemon is running
# systemctl status ksmtuned
ksmtuned.service - Kernel Samepage Merging (KSM) Tuning Daemon
Loaded: loaded (/usr/lib/systemd/system/ksmtuned.service; enabled)
Active: active (running) since Sat 2014-05-10 10:55:52 EDT; 2 days ago
Main PID: 18420 (ksmtuned)
CGroup: name=systemd:/system/ksmtuned.service
17510 sleep 60
18420 /bin/bash /usr/sbin/ksmtuned
 
Example 5-29 shows that KSM is disabled and how to enable it.
Example 5-29 Enable KSM in PowerKVM
# cat /sys/kernel/mm/ksm/run
0
# echo 1 > /sys/kernel/mm/ksm/run
# cat /sys/kernel/mm/ksm/run
1
Monitoring KSM
To monitor the pages being merged by KSM, check the /sys/kernel/mm/ksm files. The subsections that follow explain some of the status files.
Pages shared
The /sys/kernel/mm/ksm/pages_shared file shows how many merged pages exist in the system. Example 5-30 shows that 2976 pages are shared by two or more virtual machines in the system.
Example 5-30 Number of pages shared in the hypervisor
# cat /sys/kernel/mm/ksm/pages_shared
2976
Pages sharing
The /sys/kernel/mm/ksm/pages_sharing file shows how many pages on the virtual machines are using a page that is shared and merged in the hypervisor. Example 5-31 shows the number of pages in the virtual machines that are linked to a shared page in the hypervisor.
Example 5-31 Number of pages that are linked to a shared page
# cat /sys/kernel/mm/ksm/pages_sharing
6824
Looking at both of the previous examples, you see that 6824 virtual pages are using 2976 physical pages, which means that 3848 pages are saved. Considering 64 KB pages, this means that approximately 246 MB of memory was saved by using this feature.
There are some other monitoring and tuning options for KSM, as shown in Table 5-5.
Table 5-5 KSM options
/sys/kernel/mm/ksm option   Description
pages_unshared              How many pages are candidates to be shared but are not shared at the moment
pages_volatile              The number of pages that are candidates to be shared but are changing so frequently that they will not be merged
full_scans                  How many times KSM has scanned the pages looking for duplicated content
merge_across_nodes          Option to enable merging across NUMA nodes (disable it for better performance)
pages_to_scan               How many pages the KSM algorithm scans per cycle before sleeping
sleep_millisecs             How many milliseconds ksmd sleeps before the next scan
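These options are plain sysfs files and can be adjusted at run time. For example, the following hedged sketch makes KSM scan more pages per cycle and sleep less between cycles (the values are illustrative only, not recommendations from this publication):
# echo 200 > /sys/kernel/mm/ksm/pages_to_scan
# echo 50 > /sys/kernel/mm/ksm/sleep_millisecs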
5.5.4 Huge pages
Huge pages are a Linux feature that takes advantage of the processor capability to support multiple page sizes. POWER processors have supported multiple page sizes since POWER5. Some workloads benefit from using a larger page size. IBM Power Systems that run Linux can use 16 MiB page sizes.
On IBM PowerKVM, a guest must have its memory backed by huge pages on the host to be able to use them. You need to enable huge pages on the host and configure the guest to use huge pages before you start it.
Example 5-32 demonstrates how to enable huge pages on the host. Run the command on a host shell. The number of pages to use depends on the total amount of memory for guests that are backed by huge pages. In this example, 4 GB of memory is reserved for huge pages (256 pages with 16384 KB each).
Example 5-32 Setting huge pages on the host
# echo 256 > /proc/sys/vm/nr_hugepages
# grep -i hugepage /proc/meminfo
HugePages_Total: 256
HugePages_Free: 256
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
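The value written to /proc/sys/vm/nr_hugepages is not persistent across host reboots. A common way to make it persistent on Linux is the vm.nr_hugepages sysctl setting; this is a sketch under the assumption that the host applies /etc/sysctl.conf at boot:
# echo "vm.nr_hugepages = 256" >> /etc/sysctl.conf
# sysctl -p                         (applies the setting immediately)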
Example 5-33 shows an excerpt from an XML configuration file for a guest, demonstrating how to enable huge pages. The memoryBacking element must be inside the domain element of the XML configuration file.
Example 5-33 Enabling huge pages on a guest
<memoryBacking>
<hugepages/>
</memoryBacking>
If there are not enough huge pages to back your guest memory, you will see the error shown in Example 5-34. Try increasing the number of huge pages on the host.
Example 5-34 Error starting a guest with huge pages
# virsh start linux-guest
error: Failed to start domain linux-guest
error: internal error: early end of file from monitor: possible problem:
2015-11-04T17:46:01.720148Z qemu-system-ppc64: unable to map backing store for hugepages: Cannot allocate memory
5.5.5 Restrict NUMA memory allocation
It is possible to restrict a guest to allocate memory from a set of NUMA nodes. If the guest vCPUs are also pinned to a set of cores on that same set of NUMA nodes, memory access will be local, which improves memory access performance.
Example 5-35 presents the output of a command on the PowerKVM host that shows how many pages have been allocated on every node before restricting the guest to only one NUMA node.
Example 5-35 Memory allocation to NUMA nodes before restricting it to one node
# cat /sys/fs/cgroup/memory/machine.slice/machine-qemux2dlinux-guestx2d1.scope/memory.numa_stat
total=27375 N0=23449 N1=3926
file=0 N0=0 N1=0
anon=27375 N0=23449 N1=3926
unevictable=0 N0=0 N1=0
hierarchical_total=27375 N0=23449 N1=3926
hierarchical_file=0 N0=0 N1=0
hierarchical_anon=27375 N0=23449 N1=3926
hierarchical_unevictable=0 N0=0 N1=0
The output shows that most of the memory is assigned to NUMA node 0 (N0), but some memory is on NUMA node 1 (N1).
 
Note: The path in the command contains the name of the guest (in Example 5-35 linux-guest) and is only available when the guest is running.
Example 5-36 presents a possible configuration to restrict a guest to NUMA node 0.
Example 5-36 NUMA node set
<numatune>
<memory nodeset=’0’/>
</numatune>
 
Note: To find out how many nodes a system contains, use the numactl -H command. An example output is contained in Example 5-42 on page 159.
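The node set can also be inspected or changed with virsh instead of editing the XML; a minimal sketch, assuming a guest named linux-guest (a strict memory placement typically requires a guest restart to take full effect):
# virsh numatune linux-guest                        (shows numa_mode and numa_nodeset)
# virsh numatune linux-guest --nodeset 0 --config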
After the guest is restarted, and if the system has enough free memory on NUMA node 0, the command shows that all memory now fits into NUMA node 0, as shown in Example 5-37 on page 157.
Example 5-37 Memory allocation to NUMA nodes after restricting it to one node
# cat /sys/fs/cgroup/memory/machine.slice/machine-qemux2dlinux-guestx2d1.scope/memory.numa_stat
total=24751 N0=24751 N1=0
file=0 N0=0 N1=0
anon=24751 N0=24751 N1=0
unevictable=0 N0=0 N1=0
hierarchical_total=24751 N0=24751 N1=0
hierarchical_file=0 N0=0 N1=0
hierarchical_anon=24751 N0=24751 N1=0
hierarchical_unevictable=0 N0=0 N1=0
 
Note: The number of memory pages shown here is the number of pages currently used by the guest. Therefore, the number changes over time.
5.6 Memory Hotplug
Memory Hotplug was introduced in PowerKVM V3.1 and allows you to increase the memory beyond the maximum amount of memory that is defined by the memory attribute in the XML file. Memory Hotplug uses up to 32 (virtual) hotpluggable DIMM modules that can be added to a domain. The hotplugged DIMM modules can differ in size and are not limited to a maximum amount. Only the guest definition limits the maximum size of a DIMM that can be added.
Only adding memory is supported; it is not possible to remove DIMMs that were added by using memory hotplug. Memory Hotplug assigns contiguous chunks of memory to the guest. When memory is added by using memory ballooning, this is not necessarily the case, which can result in memory fragmentation. However, it is possible to reduce the memory with memory ballooning if the guest supports it, as described in 5.5.2, “Memory ballooning” on page 151.
Before using memory hotplug, ensure that the guest operating system has the required packages installed as listed in Table 5-4 on page 145.
Like CPU Hotplug, a memory DIMM can be added by using an XML snippet that defines the size of the DIMM to be added. Example 5-38 shows a snippet for a DIMM of 4 GB.
Example 5-38 XML snippet for a DIMM with 4 GB
<memory model='dimm'>
<target>
<size unit='KiB'>4194304</size>
</target>
</memory>
 
Note: In comparison to CPU Hotplug, there is no sequence number needed. That means a snippet can be used several times for one running guest.
To attach a memory DIMM to a running domain, use the virsh attach-device command with the snippet file and the --live flag, in the same way as for adding CPUs or other devices. Example 5-39 on page 158 shows how to increase the memory of a guest that starts with 4 GB of memory, first by 2 GB and then by another 4 GB, giving 10 GB in total. After that, we reduce the memory back to 4 GB by using ballooning.
Example 5-39 Example of how to increase the memory by using memory hotplug
[powerkvm-host]# virsh dumpxml linux-guest
...
<maxMemory slots='32' unit='KiB'>67108864</maxMemory>
<memory unit='KiB'>4194304</memory>
<currentMemory unit='KiB'>4194304</currentMemory>
...
 
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 3558 587 2378 19 592 2812
Swap: 1023 0 1023
 
[powerkvm-host]# cat mem_hot_2G.xml
<memory model='dimm'>
<target>
<size unit='KiB'>2097152</size>
</target>
</memory>
 
[powerkvm-host]# virsh attach-device linux-guest mem_hot_2G.xml --live
Device attached successfully
 
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 5606 615 4367 19 623 4817
Swap: 1023 0 1023
 
[powerkvm-host]# cat mem_hot_4G.xml
<memory model='dimm'>
<target>
<size unit='KiB'>4194304</size>
</target>
</memory>
 
[powerkvm-host]# virsh attach-device linux-guest mem_hot_4G.xml --live
Device attached successfully
 
 
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 9702 635 8442 19 625 8883
Swap: 1023 0 1023
 
[powerkvm-host]# virsh dommemstat linux-guest
actual 10485760
swap_in 0
rss 1879744
 
[powerkvm-host]# virsh setmem linux-guest 4194304 --live
 
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 3558 618 2315 19 625 2756
Swap: 1023 0 1023
In Example 5-39 on page 158, we reduced the amount of memory back to its original value, but remember that two DIMMs were nevertheless added to the running guest and remain attached. That means only 30 DIMM slots (out of 32) are left for hotplugging.
 
Remember: It is not possible to remove the added DIMMs by using the memory hotplug function.
Memory DIMMs can also be added persistently to the configuration of the guest by adding --config to the attach command, as shown in Example 5-40. The DIMMs are added to the devices section of the guest XML.
Example 5-40 Persistent attachment of DIMMs
# virsh attach-device linux-guest mem_hot_1G.xml --live --config
 
# virsh edit linux-guest
...
<devices>
...
<memory model='dimm'>
<target>
<size unit='KiB'>1048576</size>
<node>0</node>
</target>
</memory>
...
</devices>
The remainder of this section describes additional options and possibilities for memory hotplug.
Memory Hotplug in a NUMA configuration
Memory Hotplug can also be used within a NUMA configuration. In this case, the NUMA node is specified in the XML snippet. Example 5-41 shows a snippet defining that the memory DIMM should be added to node 1.
Example 5-41 Memory Hotplug snippet in a NUMA environment
<memory model='dimm'>
<target>
<size unit='KiB'>1048576</size>
<node>1</node>
</target>
</memory>
Example 5-42 shows how to attach 1 GB of memory to just NUMA node 1 by using the snippet as shown in Example 5-41.
Example 5-42 Memory Hotplug within a NUMA configuration
[linux-guest]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 2048 MB
node 0 free: 1074 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 2048 MB
node 1 free: 1152 MB
node distances:
node 0 1
0: 10 40
1: 40 10
 
[powerkvm-host]# virsh attach-device linux-guest mem_hot_1G_numa1.xml --live
Device attached successfully
 
[linux-guest]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 2048 MB
node 0 free: 1036 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 3072 MB
node 1 free: 2166 MB
node distances:
node 0 1
0: 10 40
1: 40 10
In a NUMA environment, the DIMMs can also be added persistently by adding --config to the virsh attach-device command. As a result, the DIMMs are added with the correct cell (node) definition, as shown in Example 5-43. The example also shows that, in this case, the maximum memory of the guest is higher than the sum of the memory defined in the NUMA section of the XML file.
Example 5-43 Persistent Hotplug Memory DIMMs
# virsh attach-device linux-guest mem_hot_1G_numa1.xml --live --config
 
# virsh edit linux-guest
...
<memory unit='KiB'>9436914</memory>
 
<cpu>
<topology sockets='2' cores='4' threads='8'/>
<numa>
<cell id='0' cpus='0-31' memory='4194304' unit='KiB'/>
<cell id='1' cpus='32-63' memory='4194034' unit='KiB'/>
</numa>
</cpu>
...
<device>
<memory model='dimm'>
<target>
<size unit='KiB'>1048576</size>
<node>1</node>
</target>
</memory>
</device>
Huge pages support
Guests that use huge pages are also supported by memory hotplug. If enough huge pages are available, they can be added to a guest by using the same methodology described in this chapter. For more information about huge pages, see 5.5.4, “Huge pages” on page 155.
Live migration support
Guests with virtual DIMMs added by using memory hotplug can also be migrated to a different host.