1.6.3 Monitoring the CPU resource
This subsection explains how to monitor the CPU resource of a system that uses the logical partitioning feature.
(1) Overview
In a system with the logical partitioning feature, the CPUs on the host machine are allocated to LPARs. The CPU resource allocated to an LPAR is called a virtual CPU. The OS running on an LPAR recognizes a virtual CPU as a normal physical CPU.
The logical partitioning feature provides the following two ways to allocate physical CPUs to an LPAR:
- Dedicated mode
  In this mode, a single LPAR exclusively uses the specified number of physical CPUs. The rate of CPU resource usage per LPAR can be adjusted by changing the number of allocated CPUs.
- Shared mode
  In this mode, multiple LPARs share the specified number of physical CPUs. The rate of CPU resource usage per LPAR can be adjusted by setting the CPU service rate.
In addition, the hypervisor that manages a system with the logical partitioning feature uses all physical CPU resources. The hypervisor consists of a kernel component (called SYS1) and a communication and service component (called SYS2).
The following figure shows the relationship among LPARs, the hypervisor, and virtual CPUs.
If more than one LPAR is running in shared mode, the following problem may occur:
- A specific LPAR runs short of CPU resources even though free shared-mode CPU resources are available on the host machine as a whole.
If this problem occurs, revise the settings of the processor capping function and the idle detection function of the logical partitioning feature so that the CPU resources allocated in shared mode are used effectively.
In addition, if dedicated mode and shared mode are used together, the following problem may occur:
- The performance of LPARs running in shared mode drops when they run short of CPU resources, because they cannot use the physical CPU resources allocated in dedicated mode.
If this problem occurs, switch LPARs from dedicated mode to shared mode to distribute the workload of the LPARs.
By monitoring CPU performance data, you can detect such performance deterioration in the LPARs, and thus you can take an appropriate corrective action. The following four records are used to monitor the CPU resource. For details about the records, see 5. Records.
- PI record
  This record is used to monitor the performance data of the host machine's CPU.
- PI_HCI record
  This record is used to monitor the performance data of each core of the physical CPU.
- PI_VI record
  This record is used to monitor the performance data of the CPU that is being used by each LPAR.
- PI_VCI record
  This record is used to monitor the performance data of each virtual CPU.
The following figure shows the range of performance data collected in each record.
Note that a system with the logical partitioning feature uses the CPU resources allocated to SYS2 to provide the virtual NIC service. CPU performance is therefore affected by virtual NIC usage. By monitoring CPU resources and virtual NIC resources together, you can capture the performance of the system more effectively.
- Tip
  A system with the logical partitioning feature uses the HBA to access disks in LPARs. HBA processing consumes SYS1 CPU resources, but its effect is smaller than that of the CPU resources consumed by the virtual NIC service.
(2) Monitoring examples
Using CPU resource monitoring on LPARs vhost1 and vhost2 as an example, this subsection explains the factors that cause insufficient CPU resources, and how to solve this problem. The following figure shows the items monitored here and the flow of actions to take.
(a) Example of monitoring CPU insufficiency in an LPAR
You can monitor an LPAR's CPU insufficiency in the Insufficient % field of the PI_VI record. If sufficient CPU resources have been allocated to the LPAR, the CPU insufficiency approaches 0%. Note that you can monitor this item with an alarm provided in a monitoring template.
The figure below shows an example of such monitoring.
- Monitoring template report to be checked
- Monitoring template alarm to be checked
In this example, vhost2 appears to be severely short of CPU resources. In this case, check the CPU usage of SYS2.
(b) Example of monitoring CPU usage of the host machine
You can monitor the CPU usage of the host machine in the VM Used, VMM Kernel Used, and VMM Others Used fields of the PI record. The VM Used field indicates the CPU usage of each LPAR. The VMM Kernel Used field indicates the CPU usage of SYS1. The VMM Others Used field indicates the CPU usage of SYS2.
The figure below shows an example of such monitoring.
- Monitoring template report to be checked
If the CPU usage of SYS2 exceeds the threshold, the virtual NIC may be heavily loaded. For details about how to check and handle such a situation, see 1.6.6 Monitoring the network resources.
- Tip
  As a guideline, set the threshold for the CPU usage of SYS2 to the equivalent of two CPU cores. For example, on a system whose CPUs have eight cores in total, the threshold is 25% of the total CPU usage (2 / 8 = 25%).
Also, if the total CPU usage of SYS1 and SYS2 exceeds the threshold, the HBA may be heavily loaded when accessing disks. For details about how to check and handle such a situation, see 1.6.5 Monitoring disk resources.
- Tip
  As a guideline, set the threshold for the total CPU usage of SYS1 and SYS2 to 90% of the total usage.
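The two guideline thresholds above can be sketched in Python as follows. This is only an illustration: the function names, the result labels, and the assumption that the SYS1 and SYS2 usage values are percentages of the total host CPU are hypothetical and are not part of the product.

```python
def sys2_threshold_percent(total_cores: int, guideline_cores: int = 2) -> float:
    """Return the SYS2 threshold as a percentage of total CPU capacity.

    Follows the guideline that the threshold equals the usage of two cores.
    """
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return 100.0 * guideline_cores / total_cores


def check_hypervisor_cpu(sys1_used: float, sys2_used: float,
                         total_cores: int,
                         total_threshold: float = 90.0) -> list:
    """Return hypothetical labels for the resources that need a closer look.

    sys1_used and sys2_used are assumed to be percentages of total host CPU.
    """
    findings = []
    # SYS2 above its threshold suggests a heavily loaded virtual NIC (see 1.6.6).
    if sys2_used > sys2_threshold_percent(total_cores):
        findings.append("virtual_nic")
    # SYS1 + SYS2 above the total threshold suggests heavy HBA disk access (see 1.6.5).
    if sys1_used + sys2_used > total_threshold:
        findings.append("hba")
    return findings


print(sys2_threshold_percent(8))            # 25.0
print(check_hypervisor_cpu(5.0, 30.0, 8))   # ['virtual_nic']
```

For an eight-core host, the first call reproduces the 25% figure from the guideline.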
(c) Example of checking the maximum CPU allocation size and CPU allocation balancing value of an LPAR
You can check the maximum CPU allocation size of an LPAR in the Max field of the PI_VI record, and its CPU allocation balancing value in the Expectation field of the same record. Comparing these two values helps you investigate why an LPAR lacks CPU resources.
The figure below shows an example of such monitoring.
- Compound report to be checked (see 1.9)
  Monitoring of virtual machine CPU allocation upper limit
If CPU resources in an LPAR are insufficient, compare the values in the Max and Expectation fields. Depending on the result, you may be able to resolve the shortage as follows:
- If the value in the Max field is greater than the value in the Expectation field
  The CPU service rate of the LPAR is set to a low value. Check the service rate setting and revise it if necessary.
- If the value in the Max field is equal to the value in the Expectation field
  The processor capping function is restricting the amount of CPU resources allocated to the LPAR. Check the processor capping setting and revise it if necessary.
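The comparison described above can be expressed as a small decision function. The following Python sketch is an illustration only: the function name, the returned labels, and the optional tolerance parameter are hypothetical, and the case in which Max is less than Expectation is not covered by this manual, so the sketch reports it as undetermined.

```python
def diagnose_cpu_shortage(max_value: float, expectation: float,
                          tolerance: float = 0.0) -> str:
    """Suggest which setting to review for an LPAR that lacks CPU resources.

    max_value and expectation are the Max and Expectation fields of the
    PI_VI record; both must be in the same unit.
    """
    # Treat values within the (hypothetical) tolerance as equal.
    if abs(max_value - expectation) <= tolerance:
        # Max equals Expectation: processor capping limits the allocation.
        return "review processor capping"
    if max_value > expectation:
        # Max exceeds Expectation: the CPU service rate is set low.
        return "review CPU service rate"
    # Not described in this subsection.
    return "undetermined"


print(diagnose_cpu_shortage(80.0, 30.0))  # review CPU service rate
print(diagnose_cpu_shortage(30.0, 30.0))  # review processor capping
```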
(3) Settings of the CPU service rate, processor capping function, and CPU idle detection function
A system with the logical partitioning feature provides the following functions related to allocating CPUs to LPARs. Note that, depending on the settings of these functions, CPU resources might not be allocated to LPARs appropriately.
- CPU service rate and processor capping function
  The CPU service rate specifies, as a percentage, the CPU allocation for each LPAR. If the processor capping function is enabled, the CPU service rate becomes the upper limit of the allocation, even if the LPAR does not have enough CPU resources.
  If the CPU service rate is set to a low value and the processor capping function is enabled for an LPAR that consumes a large amount of CPU resources, sufficient CPU resources may not be allocated to that LPAR.
- CPU idle detection function
  This function detects whether the CPUs allocated to an LPAR are idle. In a system with the logical partitioning feature, if a CPU allocated to an LPAR is idle, its resources are allocated to LPARs that do not have enough CPU resources. When the CPU idle detection function is disabled, idle CPUs are not allocated to other LPARs, which can lead to ineffective use of CPU resources.
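To see how these settings interact, the following Python sketch uses a deliberately simplified model. It is not the hypervisor's actual scheduling algorithm. The Lpar class, the allocate function, the per-LPAR idle detection flag, and the rule that spare capacity is handed out in list order are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Lpar:
    name: str
    service_rate: float           # share of shared-mode CPU, in percent
    demand: float                 # CPU the LPAR wants, in percent
    capping: bool = False         # processor capping function enabled?
    idle_detection: bool = True   # CPU idle detection function enabled?


def allocate(lpars):
    """Return {name: allocated percent} under the simplified model."""
    # Each LPAR first receives up to its service rate.
    alloc = {l.name: min(l.demand, l.service_rate) for l in lpars}
    # Idle capacity is released only by LPARs with idle detection enabled.
    spare = sum(l.service_rate - alloc[l.name]
                for l in lpars if l.idle_detection)
    # Uncapped LPARs that are still short may receive the spare capacity.
    for l in lpars:
        if l.capping:
            continue  # capping keeps the service rate as the upper limit
        extra = min(l.demand - alloc[l.name], spare)
        if extra > 0:
            alloc[l.name] += extra
            spare -= extra
    return alloc


busy = Lpar("vhost2", service_rate=50.0, demand=70.0)
idle = Lpar("vhost1", service_rate=50.0, demand=20.0)
print(allocate([idle, busy]))  # {'vhost1': 20.0, 'vhost2': 70.0}
```

In this model, enabling capping on vhost2 or disabling idle detection on vhost1 leaves vhost2 at 50% even though vhost1 is mostly idle, which corresponds to the shortage described in this subsection.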