Processor and memory virtualization
Machine virtualization involves all of the major server components. Proper configuration and tuning of each component is important to maximize the server utilization. The four major areas of a virtualized system involve CPU, memory, network, and storage.
This chapter covers CPU and memory on IBM PowerKVM and includes the following topics:
Resource overcommitment
CPU compatibility mode
SMT support
Dynamic and static Micro-Threading mode
CPU pinning
CPU shares
NUMA
Huge pages
CPU and memory hotplug
Chapter 6, “I/O virtualization” on page 163 covers the I/O subsystem, which includes networking and storage.
5.1 CPU virtualization
CPU virtualization is a technique that allows a virtual CPU to run over another CPU (virtual or physical). The process of running a virtual CPU on top of another virtual CPU is called nested virtualization, and that topic is outside the scope of this publication. This chapter covers only CPU virtualization over a physical CPU.
In the early days of CPU virtualization, most of the instructions that ran on the virtual CPU were emulated. With recent virtualization technologies, most guest instructions run directly on the physical CPU, which avoids the translation overhead.
The different ways to virtualize CPUs are covered in the sections that follow.
5.1.1 Types of virtualization
When an operating system runs inside a virtual machine, it can work in two different ways, depending on how it interacts with the hypervisor layer: Full virtualization or paravirtualization.
Full virtualization
In full virtualization mode, the guest operating system runs inside the virtual machine and does not know that it is running in a virtualized environment. Because the guest operating system issues instructions as though it were running on real hardware, the hypervisor needs to emulate that hardware.
In this mode, the hypervisor emulates the full hardware, such as registers, timing, and hardware limitations. The guest operating system thinks it is interacting with real hardware. However, emulation is complex and inefficient.
Paravirtualization
In paravirtualization, the guest operating system knows that it is running inside a virtual machine, so it helps the hypervisor whenever possible. The advantage is the better performance of the virtual machine, mainly because the communication between hypervisor and guest can be shortened, which reduces overhead. With PowerKVM, all of the supported guests can run in paravirtualized mode.
Much of the paravirtualization optimization happens when the virtual machine operating system (OS) needs to do input and output (I/O) operations, which are processed by the hypervisor. One example is when the guest operating system needs to send a network packet outside of the server. When the guest OS sends the packet in full virtualization mode, it operates in the same way that it would when interacting with a physical NIC, using the same memory space, interruptions, and so on.
However, when the guest uses the paravirtualization approach, the guest operating system knows it is virtualized, knows that its I/O is handled by the hypervisor (not by physical hardware), and cooperates with the hypervisor. This cooperation is what provides most of the performance benefits of paravirtualization.
In the context of KVM, this set of device drivers is called Virtio device drivers (see 1.3.11, “Virtio drivers” on page 20). There is also a set of paravirtualized device drivers that was initially used on IBM PowerVM and that is supported on PowerKVM as well, including ibmveth, ibmvscsi, and others.
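As an illustration, the following is a minimal libvirt XML sketch of a paravirtualized (Virtio) network interface and disk for a guest. The network name and image path are placeholders, not values taken from this publication:
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
</interface>
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/linux-guest.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>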
Hardware-assisted virtualization
Hardware-assisted virtualization is a platform feature that enables the hypervisor to take advantage of hardware facilities when running guests. One of the main benefits is that the guest code does not need to be changed, so the guest binary code can run without any translation.
IBM Power Systems introduced virtualization assistance hardware with the POWER5 family of servers. At that time, Power Systems did much of the assistance by cooperating with the hypervisor for certain functions, such as fast page movement, micropartitioning, and Micro-Threading.
5.2 CPU overcommitment
CPU overcommitment allows an under-used CPU to be shared among other virtual machines. The CPU overcommit is usually enabled when the virtual machines are not expected to use all of the CPU resources at the same time. Therefore, when one virtual machine is not using its share of the CPU, another virtual machine can use it.
A CPU assigned to a virtual machine is called virtual CPU (vCPU). In an overcommitment scenario, the number of vCPUs is larger than the number of CPUs available.
For example, Figure 5-1 shows a hypervisor with four CPUs that is hosting two virtual machines (VMs) that are using three vCPUs each. This means that the guest operating system can use up to three CPUs if another VM is not using more than one CPU.
If the vCPUs are fully used at the same time, the hypervisor multiplexes the vCPUs onto the real CPUs according to its scheduling policies.
Figure 5-1 CPU overcommitment scenario
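As a hedged sketch of the scenario in Figure 5-1, each of the two guests would simply define three vCPUs in its libvirt XML, even though the host has only four CPUs; the hypervisor then schedules the six vCPUs on the four physical CPUs:
<vcpu placement='static'>3</vcpu>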
5.3 CPU configuration
There are many ways to configure CPUs in a PowerKVM environment, which are discussed in the following sections.
5.3.1 CPU compatibility mode
It is possible to run a guest in compatibility mode with IBM POWER8, POWER7®, and POWER6® modes.
To enable POWER7 compatibility mode, add or edit the XML element in the domain element of the guest XML configuration file, as shown in Example 5-1.
Example 5-1 Enable POWER7 compatibility mode
<cpu mode=’host-model’>
<model>power7</model>
</cpu>
 
Note: POWER7 compatibility mode is limited to up to four threads per core.
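One way to apply the compatibility mode is to edit the guest definition with virsh and then restart the guest so that the new mode takes effect. This is a minimal sketch, assuming a guest named linux-guest:
# virsh edit linux-guest       (add the <cpu> element shown in Example 5-1)
# virsh shutdown linux-guest
# virsh start linux-guest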
Example 5-2 shows how to verify the compatibility mode inside the guest, in this case for POWER7.
Example 5-2 Guest in POWER7 compatibility mode
# cat /proc/cpuinfo
processor : 0
cpu : POWER7 (architected), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 1
cpu : POWER7 (architected), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 2
cpu : POWER7 (architected), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 3
cpu : POWER7 (architected), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
 
Note: The XML tag for the compatibility mode has been changed. In PowerKVM V2.1, it was <cpu mode=’custom’>. In PowerKVM V3.1.0, the tag is <cpu mode=’host-model’>. For the host migration from PowerKVM V2.1 to PowerKVM V3.1, scripts will take care of that change, as described in section 2.3, “Install over existing IBM PowerKVM and host migration” on page 52.
To enable POWER6 compatibility mode, add or edit the XML element shown in Example 5-3 on the domain element of the guest XML configuration file.
Example 5-3 Enable POWER6 compatibility mode
<cpu mode=’host-model’>
<model>power6</model>
</cpu>
 
Note: POWER6 compatibility mode is limited to up to two threads per core.
5.3.2 Simultaneous multithreading
To run PowerKVM on Power Systems servers, the SMT option needs to be turned off in the hypervisor. The simultaneous multithreading (SMT) feature is visible only inside the guests, not on the hypervisor. In this scenario, a single-core VM can use the SMT feature and have up to eight threads activated in the virtual machine.
To disable SMT on the hypervisor, run the following command:
ppc64_cpu --smt=off
PowerKVM disables SMT in the hypervisor during the boot. Each virtual machine that needs to use the SMT feature should enable it in the virtual machine configuration.
To check whether SMT is disabled on the cores, run the ppc64_cpu command with the --smt or --info parameter. The ppc64_cpu --info command lists the CPUs and marks each enabled thread with an asterisk (*) next to the thread number. Example 5-4 shows that in a six-core machine, only one thread per core is enabled.
Example 5-4 SMT disabled on the hypervisor
# ppc64_cpu --info
Core 0: 0* 1 2 3 4 5 6 7
Core 1: 8* 9 10 11 12 13 14 15
Core 2: 16* 17 18 19 20 21 22 23
Core 3: 24* 25 26 27 28 29 30 31
Core 4: 32* 33 34 35 36 37 38 39
Core 5: 40* 41 42 43 44 45 46 47
If you want to start the VM using SMT, you need to specify that manually. For example, if you want to use only one core with SMT 8, the virtual machine should be assigned eight vCPUs, which will use just one core and eight threads, as covered in “SMT on the guests” on page 136.
To enable SMT support on a guest, the XML configuration file needs to set the number of threads per core. This number must be a power of 2, that is: 1, 2, 4, or 8. The number of vCPUs must also be the product of the number of threads per core and the number of cores.
Example 5-5 demonstrates how to set these numbers for four threads per core and two cores, resulting in eight vCPUs.
Example 5-5 Setting the number of threads per core
<vcpu placement=’static’>8</vcpu>
<cpu>
<topology sockets=’1’ cores=’2’ threads=’4’/>
</cpu>
Example 5-6 shows the CPU information for the guest defined in Example 5-5. The guest is running with four threads per core and two cores. The example includes the information with SMT enabled and disabled.
Example 5-6 CPU information about a guest with SMT
# ppc64_cpu --smt
SMT=4
 
# cat /proc/cpuinfo
processor : 0
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 1
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 2
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 3
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 4
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 5
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 6
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 7
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
 
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 4
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
 
# ppc64_cpu --smt=off
 
# cat /proc/cpuinfo
processor : 0
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
processor : 4
cpu : POWER8E (raw), altivec supported
clock : 3026.000000MHz
revision : 2.1 (pvr 004b 0201)
 
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
 
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0,4
Off-line CPU(s) list: 1-3,5-7
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0,4
SMT on the guests
To enable SMT on the guests, the virtual machine needs to be assigned the number of threads that will run on the operating system. Keep the following formula in mind:
vCPU = sockets x cores x threads
Table 5-1 shows the relation between the number of vCPUs in a guest and the number of sockets, cores, and threads configured in the guest XML definition in libvirt.
Table 5-1 The relation between vCPU, cores, and threads on guest configuration
vCPU   Cores   SMT   Guest XML definition
32     4       8     <topology sockets=’1’ cores=’4’ threads=’8’ />
16     4       4     <topology sockets=’1’ cores=’4’ threads=’4’ />
8      4       2     <topology sockets=’1’ cores=’4’ threads=’2’ />
4      4       off   <topology sockets=’1’ cores=’4’ threads=’1’ />
16     2       8     <topology sockets=’1’ cores=’2’ threads=’8’ />
8      2       4     <topology sockets=’1’ cores=’2’ threads=’4’ />
4      2       2     <topology sockets=’1’ cores=’2’ threads=’2’ />
2      2       off   <topology sockets=’1’ cores=’2’ threads=’1’ />
8      1       8     <topology sockets=’1’ cores=’1’ threads=’8’ />
4      1       4     <topology sockets=’1’ cores=’1’ threads=’4’ />
2      1       2     <topology sockets=’1’ cores=’1’ threads=’2’ />
1      1       off   <topology sockets=’1’ cores=’1’ threads=’1’ />
5.3.3 Micro-Threading
Micro-Threading is an IBM POWER8 feature that enables each POWER8 core to be split into two or four subcores. Each subcore also supports a limited number of threads, as listed in Table 5-2.
Table 5-2 Threads per subcore
Subcores per core   Threads per subcore
2                   1, 2, 4
4                   1, 2
This type of configuration provides performance advantages for some types of workloads.
Figure 5-2 shows the architecture of a POWER8 core using the Micro-Threading feature. In this scenario, the core is configured with four subcores, and each subcore is configured with two threads.
Figure 5-2 Example of a POWER8 core with four subcores and two threads each subcore
Another way to demonstrate how Micro-Threading works is to define a scenario where a user wants to start four virtual machines on a single core. The virtual machines can be started without Micro-Threading or with Micro-Threading enabled.
Figure 5-3 shows that four virtual machines are running in the same core, and each VM can access up to eight threads. The core switches among the four virtual machines, and each virtual machine runs only about one-fourth of the time. This indicates that the CPU is overcommitted.
Figure 5-3 Four virtual machines running in a single core without Micro-Threading enabled
Figure 5-4 shows the same four virtual machines running on four different subcores in the same core. Each virtual machine can have up to two SMT threads. In this case, the guest is always running in the CPU.
Figure 5-4 Four virtual machines running in a single core with Micro-Threading enabled
Micro-Threading benefits:
Better use of CPU resources
More virtual machines per core
Micro-Threading limitations:
SMT limited to 2 or 4 depending on the number of subcores
Guests in single thread (SMT 1) mode cannot use the full core
Dynamic Micro-Threading
PowerKVM V3.1 introduces dynamic Micro-Threading, which is enabled by default. Dynamic Micro-Threading allows virtual processors from several guests to run concurrently on the processor core. The processor core is split on guest entry and then made whole again on guest exit.
If the static Micro-Threading mode is set to anything other than whole core (in other words, set to 2 or 4 subcores) as described in “Enabling static Micro-Threading on the PowerKVM hypervisor” on page 139, dynamic Micro-Threading is disabled.
Along with dynamic Micro-Threading, PowerKVM V3.1 also implements a related feature called subcore sharing. Subcore sharing allows multiple virtual CPUs from the same guest to run concurrently on one subcore. Subcore sharing applies only to guests that are running in SMT 1 (whole core) mode and to virtual CPUs in the same guest. It applies in any Micro-Threading mode (static or dynamic).
Dynamic Micro-Threading can also be disabled, or restricted to a mode that allows the core to be dynamically split only into two subcores or only into four subcores. This is done by using the dynamic_mt_modes parameter.
Example 5-7 sets the parameter from the default 6 to 4, which means that only splitting into four subcores is allowed (not into two).
Example 5-7 Only 4-way dynamic Micro-Threading
# cat /sys/module/kvm_hv/parameters/dynamic_mt_modes
6
 
# echo 4 > /sys/module/kvm_hv/parameters/dynamic_mt_modes
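The value written to the sysfs file is not persistent across host reboots. A common way on Linux hosts to make a module parameter persistent is a modprobe option file; this is a sketch under the assumption that kvm_hv is loaded as a module on the PowerKVM host:
# echo "options kvm_hv dynamic_mt_modes=4" > /etc/modprobe.d/kvm_hv.conf
If kvm_hv is built into the kernel instead, the same setting can be passed on the kernel command line as kvm_hv.dynamic_mt_modes=4.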
Table 5-3 shows the supported values for dynamic_mt_modes.
Table 5-3 Supported values for dynamic_mt_modes
dynamic_mt_modes value   Result
0                        Disables dynamic Micro-Threading
2                        Allows 2-way Micro-Threading (but not 4-way Micro-Threading)
4                        Allows 4-way Micro-Threading (but not 2-way Micro-Threading)
6 (= 4 + 2, default)     Allows both 2-way and 4-way Micro-Threading
 
Note: The documentation of dynamic Micro-Threading in the IBM Knowledge Center contains a table that shows the maximum number of virtual CPUs that can run on one core for the various Micro-Threading modes:
http://www.ibm.com/support/knowledgecenter/SSZJY4_3.1.0/liabp/liabpdynamicsplit.htm
Enabling static Micro-Threading on the PowerKVM hypervisor
To enable static Micro-Threading on the PowerKVM hypervisor, follow this procedure (ideally after a fresh reboot):
1. Ensure that no guests are running.
2. Set the number of subcores to 1:
# ppc64_cpu --subcores-per-core=1
3. Enable SMT on the host:
# ppc64_cpu --smt=on
4. Set the number of subcores to 4:
# ppc64_cpu --subcores-per-core=4
5. Turn the SMT off on the host:
# ppc64_cpu --smt=off
 
Note: To configure two subcores per core, specify --subcores-per-core=2.
To verify that the machine has Micro-Threading enabled, use the ppc64_cpu command with the --info parameter. Example 5-8 on page 140 shows the output of the ppc64_cpu command, indicating that the server has six cores and each core has four subcores.
Example 5-8 Checking if Micro-Threading is enabled
# ppc64_cpu --info
Core 0:
Subcore 0: 0* 1
Subcore 1: 2* 3
Subcore 2: 4* 5
Subcore 3: 6* 7
Core 1:
Subcore 4: 8* 9
Subcore 5: 10* 11
Subcore 6: 12* 13
Subcore 7: 14* 15
Core 2:
Subcore 8: 16* 17
Subcore 9: 18* 19
Subcore 10: 20* 21
Subcore 11: 22* 23
Core 3:
Subcore 12: 24* 25
Subcore 13: 26* 27
Subcore 14: 28* 29
Subcore 15: 30* 31
Core 4:
Subcore 16: 32* 33
Subcore 17: 34* 35
Subcore 18: 36* 37
Subcore 19: 38* 39
Core 5:
Subcore 20: 40* 41
Subcore 21: 42* 43
Subcore 22: 44* 45
Subcore 23: 46* 47
 
Note: If Micro-Threading is turned on with four subcores, and a guest is started that uses more than two threads, this results in the error Cannot support more than 2 threads on PPC with KVM. A four-thread configuration would be possible by activating Micro-Threading with only two subcores.
Disabling static Micro-Threading
To disable the static Micro-Threading feature, follow these steps in the PowerKVM hypervisor:
1. Ensure that all guests are stopped.
2. Set the hypervisor cores back to full core mode:
ppc64_cpu --subcores-per-core=1
3. Turn SMT on to “reset” the online thread topology:
ppc64_cpu --smt=on
4. Turn the SMT off before starting the guests:
ppc64_cpu --smt=off
To verify that the Micro-Threading feature is disabled, check with the ppc64_cpu --info command, as shown previously in Example 5-4 on page 133.
5.3.4 Configuring NUMA
NUMA stands for Non-Uniform Memory Access. It describes an environment where processors on different sockets, boards, or nodes have local memory that they can access directly, but also have access to the memory attached to the other processors in the system. That far memory is also referred to as remote or distant memory. Local memory can be accessed faster than remote or distant memory. Therefore, from a performance point of view, it is best if a guest works only with local memory.
Within PowerKVM, it is possible to define a NUMA environment on a guest. If that NUMA environment fits the physical architecture of the system, it can result in better performance. To link the processors of a NUMA guest to the physical environment, CPU pinning can be used, as described in 5.3.5, “CPU pinning” on page 142. The memory can also be linked to the physical environment of the system. This is done by restricting a guest to allocate memory from a set of NUMA nodes, as described in 5.5.5, “Restrict NUMA memory allocation” on page 156.
A guest’s NUMA environment is defined in the CPU section of the domain in the XML file. Example 5-9 shows an environment of a system with two sockets and four cores in each socket. The guest should run in SMT8 mode. The NUMA section shows that the first 32 vCPUs (0 - 31) are placed in NUMA cell 0 and the other 32 vCPUs (32 - 63) are assigned to NUMA cell 1. The attribute current=’8’ in the vcpu element makes sure that the guest starts with only eight vCPUs, which is one core with eight threads. More CPUs can be added later by using CPU Hotplug, as described in 5.4, “CPU Hotplug” on page 145.
For the memory part of the guest, the XML file shown in Example 5-9 defines that each cell has 4 GB of memory, equally spread over the two NUMA cells. The sum of the memory in the cells is also the maximum memory stated by the memory tag. If you try to set the maximum memory higher than the sum of the cells, PowerKVM automatically adjusts the maximum memory to the sum of the cells. Nevertheless, it is possible to have a higher maximum than the sum of the memory in the cells by adding (virtual) dual inline memory modules (DIMMs) to the NUMA cells, as described in “Memory Hotplug in a NUMA configuration” on page 159.
Example 5-9 Definition of a NUMA guest
<memory unit='KiB'>8388338</memory>
<currentMemory unit='KiB'>8388338</currentMemory>
...
<vcpu placement='static' current='8'>64</vcpu>
...
<cpu>
<topology sockets='2' cores='4' threads='8'/>
<numa>
<cell id='0' cpus='0-31' memory='4194304' unit='KiB'/>
<cell id='1' cpus='32-63' memory='4194034' unit='KiB'/>
</numa>
</cpu>
To verify the result inside the guests, the lscpu and numactl commands can be used as shown in Example 5-10.
Example 5-10 Verification of a NUMA configuration inside the guest
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s):
 
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 4096 MB
node 0 free: 2329 MB
node 1 cpus:
node 1 size: 4096 MB
node 1 free: 4055 MB
node distances:
node 0 1
0: 10 10
1: 10 10
5.3.5 CPU pinning
CPU pinning allows a guest virtual machine to be pinned to a given CPU or set of CPUs. This means that the hypervisor schedules the guest’s vCPUs only on the CPUs that the guest is pinned to. By default, a guest can be scheduled on any CPU.
The advantage of pinning is that it can improve data locality. Two threads on the same core using the same data are able to share it on a local cache. The same thing happens for two cores on the same NUMA node.
Example 5-11 shows a configuration with four vCPUs without SMT turned on (SMT=1), where the four vCPUs are pinned to the first four cores in the first socket of the host.
Example 5-11 CPU pinning without SMT
<vcpu placement=’static’ cpuset=’0,8,16,24’>4</vcpu>
<cpu>
<topology sockets=’2’ cores=’6’ threads=’1’/>
</cpu>
If the topology fits the system layout, for example on a Power System S812L with two physical sockets and six cores in each socket, this configuration ensures that this guest runs only in the first socket of the system.
To verify whether the pinning works correctly, the commands shown in Example 5-12 can be used on the PowerKVM host.
Example 5-12 Verification on CPU pinning
# ppc64_cpu --info
Core 0: 0* 1 2 3 4 5 6 7   <-- Physical Socket 1
Core 1: 8* 9 10 11 12 13 14 15
Core 2: 16* 17 18 19 20 21 22 23
Core 3: 24* 25 26 27 28 29 30 31
Core 4: 32* 33 34 35 36 37 38 39
Core 5: 40* 41 42 43 44 45 46 47
Core 6: 48* 49 50 51 52 53 54 55   <-- Physical Socket 2
Core 7: 56* 57 58 59 60 61 62 63
Core 8: 64* 65 66 67 68 69 70 71
Core 9: 72* 73 74 75 76 77 78 79
Core 10: 80* 81 82 83 84 85 86 87
Core 11: 88* 89 90 91 92 93 94 95
 
# ps -ef | grep qemu | grep linux-guest
qemu 30179 1 5 17:44 ? 00:00:48 /usr/bin/qemu-system-ppc64 -name linux-guest-1 -S -machine pseries-2.4,accel=kvm,usb=off -m
...
 
# taskset -cp 30179
pid 30179's current affinity list: 0,8,16,24
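The pinning can also be listed directly with libvirt; a minimal sketch, assuming the guest is named linux-guest-1 (the output format depends on the libvirt version):
# virsh vcpupin linux-guest-1       (lists the CPU affinity of every vCPU)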
 
With SMT turned on in the guest, pinning CPUs works the same way, because SMT is not activated on the host. In an example with SMT 4, the first four guest vCPUs are mapped to threads 0, 1, 2, and 3 of core 0 on the host. The second four guest vCPUs are mapped to threads 8, 9, 10, and 11 of core 1 on the host, and so on. Example 5-13 shows the same configuration as in the previous example, but with SMT 4.
Example 5-13 CPU pinning with SMT
<vcpu placement=’static’ cpuset=’0,8,16,24’>16</vcpu>
<cpu>
<topology sockets=’2’ cores=’6’ threads=’4’/>
</cpu>
 
Note: All threads of a core must be running on the same physical core. It is not supported to activate SMT on the PowerKVM host and pin single threads to different cores.
CPU pinning can also be used with subcores, which are explained in detail in 5.3.3, “Micro-Threading” on page 136. In this case, the pinning works in the same manner. In Example 5-14 on page 144, a guest using four subcores with two threads each is pinned to the first physical core.
Example 5-14 CPU pinning with subcores
# ppc64_cpu --info
Core 0:
Subcore 0: 0* 1
Subcore 1: 2* 3
Subcore 2: 4* 5
Subcore 3: 6* 7
Core 1:
Subcore 4: 8* 9
Subcore 5: 10* 11
Subcore 6: 12* 13
Subcore 7: 14* 15
Core 2:
Subcore 8: 16* 17
Subcore 9: 18* 19
Subcore 10: 20* 21
Subcore 11: 22* 23
...
 
<vcpu placement=’static’ cpuset=’0,2,4,8’>48</vcpu>
<cpu>
<topology sockets=’2’ cores=’6’ threads=’4’/>
</cpu>
5.3.6 CPU shares
In a kernel-based virtual machine (KVM), the virtual machines run as processes on the host. This means that they are scheduled to run on host CPUs just like any other process. The implication is that CPUs are shared by default. This CPU sharing allows CPU overcommitment, that is, creating more vCPUs than there are CPUs on the system.
The Linux scheduler spreads the vCPUs among the CPU cores. However, when there is overcommitment, multiple vCPUs can share a CPU core. To balance the amount of time that one virtual machine gets compared to another, you can configure shares.
Example 5-15 demonstrates how to configure the relative share time for a guest. By default, guests have a relative share time of 1024. Two guests with a share time of 1024 share the CPU for the same amount of time. If a third guest has a share time of 256, it runs a quarter of the time relative to the other guests. A guest with a share time of 2048 runs twice as long as the other guests.
Example 5-15 CPU shares
<cputune>
<shares>256</shares>
</cputune>
This share time applies only when there is sharing, either because of CPU pinning or because of CPU overcommitment. If vCPUs are idle or only a few vCPUs have been allocated, it is possible that a guest with a share time of 256 will be able to run on a CPU without sharing. If another guest needs to run on that same CPU, the configured share time takes effect.
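The share value can also be queried or changed at run time with virsh instead of editing the XML; a minimal sketch, assuming a guest named linux-guest:
# virsh schedinfo linux-guest                         (shows cpu_shares and other scheduler parameters)
# virsh schedinfo linux-guest --set cpu_shares=2048 --live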
5.4 CPU Hotplug
Starting with PowerKVM V3.1, CPU Hotplug is supported. CPU Hotplug allows you to add or remove CPUs in a running guest operating system. To support CPU Hotplug, the operating system needs at least the minimum required versions of the packages listed in Table 5-4.
Table 5-4 Required packages to support CPU and memory Hotplug
Package         Minimum required version
powerpc-utils   1.2.26
ppc64-diag      2.6.8
librtas         1.3.9
The addition or removal of CPUs is done on a per-socket basis, as defined in the CPU section of the guest’s XML file. A socket in this sense is not necessarily a physical socket of the Power System; it is just a virtual definition.
Before you start a hotplug operation, ensure that the rtas_errd daemon is running inside the guest:
# ps -ef | grep rtas
root 1367 1 0 09:22 ? 00:00:00 /usr/sbin/rtas_errd
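If the daemon is not running, it can usually be started through the init system of the guest distribution, for example with systemd. The service name rtas_errd used here is an assumption and can differ between distributions:
# systemctl start rtas_errd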
The following examples were created on a Power System S812L with six cores on two sockets, giving a total of 12 cores in the system. The XML file of the guest system contains the configuration shown in Example 5-16.
Example 5-16 Base definition of sockets, cores, and threads for CPU Hotplug
<vcpu placement=’static’ current=’8’>96</vcpu>
...
<cpu>
<topology sockets=’12’ cores=’1’ threads=’8’/>
</cpu>
In Example 5-16, we defined a guest with 12 sockets, each with one core and eight threads, giving a total of 96 vCPUs. The guest starts with eight vCPUs, which is one socket with one core and eight threads, as defined with the current attribute in the vcpu element. From a CPU Hotplug perspective, the guest can be increased in steps of eight vCPUs up to 96 vCPUs (12 cores with eight threads).
The Hotplug task itself works in the same manner as PCI Hotplug. An XML snippet is needed to define a sequence number for the additional socket. The snippet defining the first socket to be added uses sequence number 0, as shown in Example 5-17.
Example 5-17 XML snippet for Hotplugging a socket
# cat cpu_hot_0.xml
<spapr-cpu-socket id="0">
<alias name="spaprcpusock0"/>
</spapr-cpu-socket>
 
Note: spapr-cpu-socket stands for Server IBM Power Architecture® Platform Reference CPU socket.
This snippet can be attached to the running guest with a virsh attach-device command as described in Example 5-18.
Example 5-18 CPU Hotplug example
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
 
[powerkvm-host]# virsh attach-device linux-guest cpu_hot_0.xml --live
Device attached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-15
 
Note: A persistent attachment of CPUs in the XML file by using the --config attribute is not supported.
In Example 5-18 on page 146, we added another socket with one core and eight threads. This can be repeated with a snippet containing the next available sequence number, for example, 1. If the SMT mode was changed in the meantime, CPU Hotplug is still possible. In that case, only the vCPUs matching the SMT mode are brought online. The other vCPUs remain offline.
Example 5-19 continues Example 5-18 on page 146 by changing the SMT mode to 4 and adding another socket.
Example 5-19 Change SMT mode and Hotplug another socket
[linux-guest]# ppc64_cpu --smt
SMT=8
 
[linux-guest]# ppc64_cpu --smt=4
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-3,8-11
Off-line CPU(s) list: 4-7,12-15
Thread(s) per core: 4
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-3,8-11
 
[powerkvm-host]# cat cpu_hot_1.xml
<spapr-cpu-socket id="1">
<alias name="spaprcpusock1"/>
</spapr-cpu-socket>
 
[powerkvm-host]# virsh attach-device linux-guest cpu_hot_1.xml --live
Device attached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-3,8-11,16-23
Off-line CPU(s) list: 4-7,12-15
Thread(s) per core: 5
Core(s) per socket: 1
Socket(s): 3
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-3,8-11,16-23
 
Note: It is not possible to attach two or more sockets with one snippet. However, this can be done by using several snippets in a loop.
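The following is a minimal shell sketch of such a loop, assuming a guest named linux-guest and that sockets with these sequence numbers have not been attached yet (the sequence numbers are placeholders):
for i in 2 3 4; do
  cat > cpu_hot_${i}.xml <<EOF
<spapr-cpu-socket id="${i}">
  <alias name="spaprcpusock${i}"/>
</spapr-cpu-socket>
EOF
  virsh attach-device linux-guest cpu_hot_${i}.xml --live
done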
Removing sockets by using CPU Hotplug is also supported. To remove sockets, the same snippets are needed. The snippets must be applied by using virsh detach-device in the reverse order of the addition of the sockets. It is not possible to remove a lower sequence number before a higher sequence number. Example 5-20 shows the removal of one socket, continuing the previous example.
Example 5-20 Removal of one socket using CPU Hotplug
[powerkvm-host]# virsh detach-device linux-guest cpu_hot_0.xml --live
error: Failed to detach device from cpu_hot_0.xml
error: unsupported configuration: Non-contiguous socket index '0' not allowed. Expecting : 1
 
[powerkvm-host]# virsh detach-device linux-guest cpu_hot_1.xml --live
Device detached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-3,8-11
Off-line CPU(s) list: 4-7,12-15
Thread(s) per core: 4
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-3,8-11
5.4.1 CPU Hotplug with a NUMA configuration
For a NUMA guest, the NUMA node to which a CPU is hotplugged is determined by the NUMA topology defined in the guest XML. Example 5-21 shows a configuration with two NUMA cells of two sockets each, and how the sockets are populated by using CPU Hotplug according to the NUMA definition in the XML file.
Example 5-21 CPU Hotplug with a NUMA configuration
[powerkvm-host]# virsh edit linux-guest
...
<vcpu placement='static' current='8'>32</vcpu>
...
<cpu>
<topology sockets='4' cores='1' threads='8'/>
<numa>
<cell id='0' cpus='0-7,16-23' memory='2097152' unit='KiB'/>
<cell id='1' cpus='8-15,24-31' memory='2097152' unit='KiB'/>
</numa>
</cpu>
...
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s):
 
[powerkvm-host]# virsh attach-device linux-guest cpu_hot_0.xml --live
Device attached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
 
[powerkvm-host]# virsh attach-device linux-guest cpu_hot_1.xml --live
Device attached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 3
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15
 
[powerkvm-host]# virsh attach-device linux-guest cpu_hot_2.xml --live
Device attached successfully
 
[linux-guest]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 2
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
5.4.2 Considerations for CPU Hotplug
There are some considerations when using CPU Hotplug.
No removal of sockets that were present at the time of starting the guest
Only sockets that were added by using a Hotplug action can be removed by a Hotplug action. If, for example, the guest was started with two sockets (as defined in the XML definition) and you try to remove one of those sockets by using virsh detach-device, this results in an error.
No CPU Hotplug with unfilled sockets
CPU Hotplug is not possible in a configuration where a socket was not completely used at start time. The configuration in Example 5-22 does not support CPU Hotplug: the guest starts with only four vCPUs, which means that only one of the two cores of the first socket is used.
Example 5-22 Unsupported configuration for CPU Hotplug
<vcpu placement='static' current="4">32</vcpu>
<cpu>
<topology sockets='4' cores='2' threads='4'/>
</cpu>
5.5 Memory
With virtualization, memory is basically static, which means it is not multiplexed like the CPU; a block of memory is mapped directly to a single (and only one) virtual machine.
Because each virtual machine is also a Linux process on the hypervisor, the memory can be overcommitted.
This section covers methods to improve the performance of PowerKVM memory management. These methods involve resizing the guest memory dynamically and merging identical guest pages on the hypervisor.
5.5.1 Memory allocation
Guest memory is allocated by the host according to the guest configuration. It is possible to set a maximum amount of memory and a current amount. The guest will have the maximum amount of memory available, but it can choose to use only the current amount and release the remaining amount to the host. See 5.5.2, “Memory ballooning” on page 151.
Example 5-23 shows the configuration of the maximum amount of memory allocated to the guest (the memory element) and the current amount of memory (the currentMemory element). Since PowerKVM V3.1, it is also possible to increase the memory beyond the maximum amount by using memory hotplug, as described in 5.6, “Memory Hotplug” on page 157.
Example 5-23 Memory allocation
<memory unit=’KiB’>4194304</memory>
<currentMemory unit=’KiB’>2097152</currentMemory>
 
Note: On the guest, you might notice that the total amount of memory is less than what is set as the current amount. This might happen because the guest kernel has reserved an amount of memory for some reason. One example is the crashkernel boot parameter, which reserves memory for a kernel dump.
5.5.2 Memory ballooning
Memory ballooning is a technique that allows the guest memory to be increased or decreased cooperatively, depending on the amount of free memory available on the guests and hypervisor.
When memory ballooning is enabled on the guest, the hypervisor can remove and add memory to the guest dynamically.
This technique can be used if the memory should be overcommitted, which means assigning the guests, in total, more memory than the system provides. If one guest needs more memory while another guest needs less memory at the same time, the memory is used more efficiently. But if all guests need their assigned overcommitted memory at once, this can cause poor performance, because in that case the host starts to swap pages to disk.
How to enable and manage memory ballooning on PowerKVM
Memory ballooning is enabled by default by using the virtio memballoon model, as shown in Example 5-24. If you want to disable ballooning, change the model to none.
Example 5-24 Enable memory balloon on the guest
<devices>
..
<memballoon model=’virtio’>
..
</devices>
When a guest is configured to support ballooning, memory can be added to and removed from the virtual machine by using the virsh setmem command. The total memory allocated to the virtual machine can be seen with the virsh dommemstat command.
Example 5-25 shows a virtual machine that initially has 2 GB of memory. After the virsh setmem linux-guest 1048576 --config --live command, the memory assigned to that guest goes down to 1 GB. The --live flag changes the amount of memory in the running guest, and the --config flag changes the currentMemory tag in the XML file. The two flags can be used individually or together.
Example 5-25 Decreasing the virtual machine memory to 1 GB
# virsh dommemstat linux-guest
actual 2097152
swap_in 46312795184
rss 1954752
 
# virsh setmem linux-guest 1048576 --config --live
 
# virsh dommemstat linux-guest
actual 1048576
swap_in 46312795184
rss 1955200
 
Note: If the virtual machine or the guest operating system is not configured properly to support virtio ballooning, the following message displays on the hypervisor:

Error: Requested operation is not valid: Unable to change memory of active domain without the balloon device and guest OS balloon driver.
Monitoring
To check whether memory ballooning is working on the guest, you can query the QEMU monitor by running the command shown in Example 5-26. If the balloon device is not available in the virtual machine, the output is “Device balloon has not been activated.”
Example 5-26 Output of memory available on the balloon
# virsh qemu-monitor-command --domain linux-guest --hmp ‘info balloon’
balloon: actual=3559
To change the amount of memory in the guest, the ‘balloon <memory in MB>’ command is used, as in Example 5-27, which changes the memory from 3559 MB to 1024 MB. After this command, only 1024 MB of memory is available to the guest.
Example 5-27 Changing the memory allocated to the virtual machine
# virsh qemu-monitor-command --domain linux-guest --hmp ‘info balloon’
balloon: actual=3559
# virsh qemu-monitor-command --domain linux-guest --hmp ‘balloon 1024’
# virsh qemu-monitor-command --domain linux-guest --hmp ‘info balloon’
balloon: actual=1024
 
Note: Most operating systems have the virtio-balloon driver built into the kernel. If you are using an operating system that does not include the virtio-balloon device driver in its kernel, you need to install it manually.
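To check inside the guest whether the driver is present, you can look for the virtio_balloon module; this is a hedged sketch, and the command produces no output if the driver is built into the kernel rather than loaded as a module:
# lsmod | grep virtio_balloon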
5.5.3 Kernel SamePage Merging
Kernel SamePage Merging (KSM) is a KVM technology that merges blocks of memory pages with the same content to reduce the memory use in the hypervisor.
KSM can detect that two virtual machines have identical memory pages. In that case, it merges both pages into the same physical memory page, which reduces the amount of memory used. To do so, a certain number of CPU cycles is spent scanning for and spotting these pages.
For example, Figure 5-5 shows that all three virtual machines have pages that contain the same content. In this case, when KSM is enabled, all four pages that contain the same content will use only one physical memory block.
Figure 5-5 KSM mapping when VM uses the same page
There is a similar feature in the PowerVM hypervisor, called Active Memory Deduplication. For more information about this feature, see Power Systems Memory Deduplication, REDP-4827.
How to enable Kernel SamePage Merging on PowerKVM
KSM is supported in PowerKVM server virtualization, but it is not enabled automatically.
To verify whether KSM is running and to enable and disable it, you need to interact with the /sys/kernel/mm/ksm/run file.
 
Important: The ksmtuned daemon must be running for KSM to work. PowerKVM already runs this daemon automatically, so you do not need to turn it on. To verify that the daemon is running, see Example 5-28.
Example 5-28 Verify that the ksmtuned daemon is running
# systemctl status ksmtuned
ksmtuned.service - Kernel Samepage Merging (KSM) Tuning Daemon
Loaded: loaded (/usr/lib/systemd/system/ksmtuned.service; enabled)
Active: active (running) since Sat 2014-05-10 10:55:52 EDT; 2 days ago
Main PID: 18420 (ksmtuned)
CGroup: name=systemd:/system/ksmtuned.service
17510 sleep 60
18420 /bin/bash /usr/sbin/ksmtuned
 
Example 5-29 shows that KSM is disabled and how to enable it.
Example 5-29 Enable KSM in PowerKVM
# cat /sys/kernel/mm/ksm/run
0
# echo 1 > /sys/kernel/mm/ksm/run
# cat /sys/kernel/mm/ksm/run
1
Monitoring KSM
To monitor the pages being merged by KSM, check the /sys/kernel/mm/ksm files. The subsections that follow explain some of the status files.
Pages shared
The /sys/kernel/mm/ksm/pages_shared file shows how many merged pages exist in the system. Example 5-30 shows that 2976 pages are shared by two or more virtual machines in the system.
Example 5-30 Number of pages shared in the hypervisor
# cat /sys/kernel/mm/ksm/pages_shared
2976
Pages sharing
The /sys/kernel/mm/ksm/pages_sharing file shows how many pages on the virtual machines are using a page that is shared and merged in the hypervisor. Example 5-31 shows the number of pages in the virtual machines that are linked to a shared page in the hypervisor.
Example 5-31 Number of pages that are linked to a shared page
# cat /sys/kernel/mm/ksm/pages_sharing
6824
Looking at both of the previous examples, you see that 6824 virtual pages are using 2976 physical pages, which means that 3848 pages are saved. Considering 64 KB pages, this means that approximately 246 MB of memory was saved by using this feature.
There are some other monitoring and tuning options for KSM, as shown in Table 5-5.
Table 5-5 KSM options
/sys/kernel/mm/ksm option   Description
pages_unshared              How many pages are candidates to be shared but are not shared at the moment
pages_volatile              The number of pages that are candidates to be shared but are changing so frequently that they will not be merged
full_scans                  How many times KSM has scanned the pages looking for duplicated content
merge_across_nodes          Option to enable merging across NUMA nodes (disable it for better performance)
pages_to_scan               How many pages the KSM algorithm scans per cycle before sleeping
sleep_millisecs             How many milliseconds ksmd sleeps before the next scan
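These options are plain sysfs files and can be adjusted at run time. For example, the following hedged sketch makes KSM scan more pages per cycle and sleep less between cycles (the values are illustrative only, not recommendations from this publication):
# echo 200 > /sys/kernel/mm/ksm/pages_to_scan
# echo 50 > /sys/kernel/mm/ksm/sleep_millisecs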
5.5.4 Huge pages
Huge pages are a Linux feature that takes advantage of the processor capability to support multiple page sizes. POWER processors have supported multiple page sizes since POWER5. Some workloads benefit from using a larger page size. IBM Power Systems that run Linux can use 16 MiB page sizes.
On IBM PowerKVM, a guest must have its memory backed by huge pages on the host to be able to use them. You need to enable huge pages on the host and configure the guest to use huge pages before you start it.
Example 5-32 demonstrates how to enable huge pages on the host. Run the command on a host shell. The number of pages to use depends on the total amount of memory for guests that are backed by huge pages. In this example, 4 GB of memory is reserved for huge pages (256 pages with 16384 KB each).
Example 5-32 Setting huge pages on the host
# echo 256 > /proc/sys/vm/nr_hugepages
# grep -i hugepage /proc/meminfo
HugePages_Total: 256
HugePages_Free: 256
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
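The value written to /proc/sys/vm/nr_hugepages is not persistent across host reboots. A common way to make it persistent on Linux is the vm.nr_hugepages sysctl setting; this is a sketch under the assumption that the host applies /etc/sysctl.conf at boot:
# echo "vm.nr_hugepages = 256" >> /etc/sysctl.conf
# sysctl -p                         (applies the setting immediately)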
Example 5-33 shows an excerpt from an XML configuration file for a guest, demonstrating how to enable huge pages. The memoryBacking element must be inside the domain element of the XML configuration file.
Example 5-33 Enabling huge pages on a guest
<memoryBacking>
<hugepages/>
</memoryBacking>
If there are not enough huge pages to back your guest memory, you will see the error shown in Example 5-34. Try increasing the number of huge pages on the host.
Example 5-34 Error starting a guest with huge pages
# virsh start linux-guest
error: Failed to start domain linux-guest
error: internal error: early end of file from monitor: possible problem:
2015-11-04T17:46:01.720148Z qemu-system-ppc64: unable to map backing store for hugepages: Cannot allocate memory
5.5.5 Restrict NUMA memory allocation
It is possible to restrict a guest to allocate memory from a set of NUMA nodes. If the guest vCPUs are also pinned to a set of cores on that same set of NUMA nodes, memory access will be local, which improves memory access performance.
Example 5-35 presents the output of a command on the PowerKVM host that shows how many pages have been allocated on every node before restricting the guest to only one NUMA node.
Example 5-35 Memory allocation to NUMA nodes before restricting it to one node
# cat /sys/fs/cgroup/memory/machine.slice/machine-qemux2dlinux-guestx2d1.scope/memory.numa_stat
total=27375 N0=23449 N1=3926
file=0 N0=0 N1=0
anon=27375 N0=23449 N1=3926
unevictable=0 N0=0 N1=0
hierarchical_total=27375 N0=23449 N1=3926
hierarchical_file=0 N0=0 N1=0
hierarchical_anon=27375 N0=23449 N1=3926
hierarchical_unevictable=0 N0=0 N1=0
The output shows that most of the memory is assigned to NUMA node 0 (N0), but some memory is on NUMA node 1 (N1).
 
Note: The path in the command contains the name of the guest (in Example 5-35 linux-guest) and is only available when the guest is running.
Example 5-36 presents a possible configuration to restrict a guest to NUMA node 0.
Example 5-36 NUMA node set
<numatune>
<memory nodeset=’0’/>
</numatune>
 
Note: To find out how many nodes a system contains, use the numactl -H command. An example output is contained in Example 5-42 on page 159.
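The node set can also be inspected or changed with virsh instead of editing the XML; a minimal sketch, assuming a guest named linux-guest (a strict memory placement typically requires a guest restart to take full effect):
# virsh numatune linux-guest                        (shows numa_mode and numa_nodeset)
# virsh numatune linux-guest --nodeset 0 --config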
After the guest is restarted, and if the system has enough free memory on NUMA node 0, the command shows that all memory now fits into NUMA node 0, as shown in Example 5-37 on page 157.
Example 5-37 Memory allocation to NUMA nodes after restricting it to one node
# cat /sys/fs/cgroup/memory/machine.slice/machine-qemux2dlinux-guestx2d1.scope/memory.numa_stat
total=24751 N0=24751 N1=0
file=0 N0=0 N1=0
anon=24751 N0=24751 N1=0
unevictable=0 N0=0 N1=0
hierarchical_total=24751 N0=24751 N1=0
hierarchical_file=0 N0=0 N1=0
hierarchical_anon=24751 N0=24751 N1=0
hierarchical_unevictable=0 N0=0 N1=0
 
Note: The number of memory pages shown here is the number of pages currently used by the guest. Therefore, the number changes over time.
5.6 Memory Hotplug
Memory Hotplug was introduced in PowerKVM V3.1 and allows you to increase the memory beyond the maximum amount of memory that is defined by the memory attribute in the XML file. Memory Hotplug uses up to 32 (virtual) hotpluggable DIMM modules that can be added to a domain. The hotplugged DIMM modules can differ in size and are not limited to a maximum amount. Only the guest definition limits the maximum size of a DIMM that can be added.
Only adding memory is supported; it is not possible to remove DIMMs that were added by using memory hotplug. Memory Hotplug assigns contiguous chunks of memory to the guest. When memory is added by using memory ballooning, this is not necessarily the case, which can result in memory fragmentation. However, it is possible to reduce the memory with memory ballooning if the guest supports it, as described in 5.5.2, “Memory ballooning” on page 151.
Before using memory hotplug, ensure that the guest operating system has the required packages installed as listed in Table 5-4 on page 145.
Like CPU Hotplug, a memory DIMM can be added by using an XML snippet that defines the size of the DIMM to be added. Example 5-38 shows a snippet for a DIMM of 4 GB.
Example 5-38 XML snippet for a DIMM with 4 GB
<memory model='dimm'>
<target>
<size unit='KiB'>4194304</size>
</target>
</memory>
 
Note: In comparison to CPU Hotplug, there is no sequence number needed. That means a snippet can be used several times for one running guest.
To attach a memory DIMM to a running domain, use the virsh attach-device command with the snippet file and the --live flag, in the same way as for adding CPUs or other devices. Example 5-39 on page 158 shows how to increase the memory of a guest that starts with 4 GB of memory, first by 2 GB and then by another 4 GB, giving 10 GB in total. After that, we reduce the memory back to 4 GB by using ballooning.
Example 5-39 Example of how to increase the memory by using memory hotplug
[powerkvm-host]# virsh dumpxml linux-guest
...
<maxMemory slots='32' unit='KiB'>67108864</maxMemory>
<memory unit='KiB'>4194304</memory>
<currentMemory unit='KiB'>4194304</currentMemory>
...
 
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 3558 587 2378 19 592 2812
Swap: 1023 0 1023
 
[powerkvm-host]# cat mem_hot_2G.xml
<memory model='dimm'>
<target>
<size unit='KiB'>2097152</size>
</target>
</memory>
 
[powerkvm-host]# virsh attach-device linux-guest mem_hot_2G.xml --live
Device attached successfully
 
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 5606 615 4367 19 623 4817
Swap: 1023 0 1023
 
[powerkvm-host]# cat mem_hot_4G.xml
<memory model='dimm'>
<target>
<size unit='KiB'>4194304</size>
</target>
</memory>
 
[powerkvm-host]# virsh attach-device linux-guest mem_hot_4G.xml --live
Device attached successfully
 
 
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 9702 635 8442 19 625 8883
Swap: 1023 0 1023
 
[powerkvm-host]# virsh dommemstat linux-guest
actual 10485760
swap_in 0
rss 1879744
 
[powerkvm-host]# virsh setmem linux-guest 4194304 --live
 
[linux-guest]# free -m
total used free shared buff/cache available
Mem: 3558 618 2315 19 625 2756
Swap: 1023 0 1023
In Example 5-39 on page 158, we reduced the amount of memory back to its original value, but remember that two DIMMs were nevertheless added to the running guest and remain attached. That means only 30 DIMM slots (out of 32) are left for hotplugging.
 
Remember: It is not possible to remove the added DIMMs by using the memory hotplug function.
Memory DIMMs can also be added persistently to the configuration of the guest by adding --config to the attach command, as shown in Example 5-40. The DIMMs are added to the devices section of the guest XML.
Example 5-40 Persistent attachment of DIMMs
# virsh attach-device linux-guest mem_hot_1G.xml --live --config
 
# virsh edit linux-guest
...
<devices>
...
<memory model='dimm'>
<target>
<size unit='KiB'>1048576</size>
<node>0</node>
</target>
</memory>
...
</devices>
The remainder of this section describes additional options and possibilities for memory hotplug.
Memory Hotplug in a NUMA configuration
Memory Hotplug can also be used within a NUMA configuration. In this case, the NUMA node is specified in the XML snippet. Example 5-41 shows a snippet defining that the memory DIMM should be added to node 1.
Example 5-41 Memory Hotplug snippet in a NUMA environment
<memory model='dimm'>
<target>
<size unit='KiB'>1048576</size>
<node>1</node>
</target>
</memory>
Example 5-42 shows how to attach 1 GB of memory to just NUMA node 1 by using the snippet as shown in Example 5-41.
Example 5-42 Memory Hotplug within a NUMA configuration
[linux-guest]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 2048 MB
node 0 free: 1074 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 2048 MB
node 1 free: 1152 MB
node distances:
node 0 1
0: 10 40
1: 40 10
 
[powerkvm-host]# virsh attach-device linux-guest mem_hot_1G_numa1.xml --live
Device attached successfully
 
[linux-guest]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 2048 MB
node 0 free: 1036 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 3072 MB
node 1 free: 2166 MB
node distances:
node 0 1
0: 10 40
1: 40 10
In a NUMA environment, the DIMMs can also be added persistently by adding --config to the virsh attach-device command. As a result, the DIMMs are added with the correct cell (node) definition, as shown in Example 5-43. The example also shows that, in this case, the maximum memory of the guest is higher than the sum of the memory defined in the NUMA section of the XML file.
Example 5-43 Persistent Hotplug Memory DIMMs
# virsh attach-device linux-guest mem_hot_1G_numa1.xml --live --config
 
# virsh edit linux-guest
...
<memory unit='KiB'>9436914</memory>
 
<cpu>
<topology sockets='2' cores='4' threads='8'/>
<numa>
<cell id='0' cpus='0-31' memory='4194304' unit='KiB'/>
<cell id='1' cpus='32-63' memory='4194034' unit='KiB'/>
</numa>
</cpu>
...
<device>
<memory model='dimm'>
<target>
<size unit='KiB'>1048576</size>
<node>1</node>
</target>
</memory>
</device>
Huge pages support
Guests that use huge pages are also supported by memory hotplug. If enough huge pages are available, they can be added to a guest by using the same methodology described in this chapter. For more information about huge pages, see 5.5.4, “Huge pages” on page 155.
Live migration support
Guests with virtual DIMMs added by using memory hotplug can also be migrated to a different host.