Hitachi

JP1 Version 12 JP1/Performance Management - Remote Monitor for Virtual Machine Description, User's Guide and Reference


1.9.3 Monitoring the CPU resource

This subsection explains how to monitor the CPU resource of a system with logical partitioning feature.


(1) Overview

In a system with logical partitioning feature, CPUs on the host machine are allocated to each LPAR and used accordingly. The CPU resource allocated to each LPAR is called a virtual CPU. The OS running on a LPAR recognizes a virtual CPU as a normal physical CPU.

In logical partitioning feature, there are two ways to allocate a physical CPU to a LPAR: dedicated mode and shared mode.

In addition, a hypervisor that manages the system with logical partitioning feature uses all physical CPU resources. The hypervisor can be divided into the kernel component (which is called SYS1), and the communication and service component (which is called SYS2).

The following figure shows the relationship among LPARs, the hypervisor, and virtual CPUs.

Figure 1‒92: Relationship among LPARs, the hypervisor, and virtual CPUs

[Figure]

If more than one LPAR running in shared mode exists, the following problem may occur:

If this problem occurs, you must revise the settings related to the processor capping function and idle detection function of logical partitioning feature to make effective use of CPU resources allocated in shared mode.

In addition, if a mix of dedicated and shared modes is used for operation, the following problem may occur:

If this problem occurs, you must switch the mode of LPARs from dedicated mode to shared mode to distribute the workload of the LPARs.

By monitoring CPU performance data, you can detect such performance deterioration in the LPARs and take appropriate corrective action. The following four records are used to monitor the CPU resource. For details about the records, see 5. Records.

  1. PI record

    This record is used to monitor the performance data of the host machine's CPU.

  2. PI_HCI record

    This record is used to monitor the performance data of each core of the physical CPU.

  3. PI_VI record

    This record is used to monitor the performance data of the CPU that is being used by each LPAR.

  4. PI_VCI record

    This record is used to monitor the performance data of each virtual CPU.

The following figure shows the range of performance data collected in each record.

Figure 1‒93: Correspondence between records and data collection ranges

[Figure]

Note that a system with logical partitioning feature uses CPU resources allocated to SYS2 when providing the virtual NIC service. Therefore, performance of CPU resources is affected by virtual NIC usage. By monitoring CPU resources and virtual NIC related resources simultaneously, you can capture the performance of the system with logical partitioning feature more effectively.

Tip

A system with logical partitioning feature uses the HBA for accessing disks in LPARs. HBA processing consumes CPU resources for SYS1. However, its effect on CPU resources is smaller than that of the virtual NIC service.

(2) Monitoring examples

Using CPU resource monitoring on LPARs vhost1 and vhost2 as an example, this subsection explains the factors that cause insufficient CPU resources, and how to solve this problem. The following figure shows the items monitored here and the flow of actions to take.

Figure 1‒94: Monitored items and flow of actions

[Figure]

(a) Example of monitoring CPU insufficiency in a LPAR

You can monitor the LPAR's CPU insufficiency in the Insufficient % field of the PI_VI record. If a sufficient amount of CPU resources has been allocated to the LPAR, the CPU insufficiency approaches 0%. Note that you can monitor this item with an alarm provided in a monitoring template.

The figure below shows an example of such monitoring.

Figure 1‒95: CPU insufficiency monitoring example

[Figure]

Monitoring template report to be checked

VM CPU Insufficient

Monitoring template alarm to be checked

VM CPU Insufficient

In this example, there appears to be severe insufficiency in the CPU resource of vhost2. In this case, check the CPU usage of SYS2.
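The check described above can be sketched in Python as follows. This is only an illustration: the sample values, the 10% threshold, and the function name find_insufficient_lpars are assumptions made for the example, not values taken from the PI_VI record or from the VM CPU Insufficient alarm definition.

```python
# Hypothetical samples of the Insufficient % field of the PI_VI record,
# keyed by LPAR name. The numbers are illustrative, not measured data.
insufficient_pct = {"vhost1": 1.5, "vhost2": 42.0}

# Assumed threshold; the actual alarm condition is defined in the
# monitoring template, not here.
ASSUMED_THRESHOLD_PCT = 10.0


def find_insufficient_lpars(samples, threshold_pct=ASSUMED_THRESHOLD_PCT):
    """Return the names of LPARs whose CPU insufficiency exceeds the threshold."""
    return sorted(name for name, pct in samples.items() if pct > threshold_pct)


print(find_insufficient_lpars(insufficient_pct))
```

With the sample values above, only vhost2 is reported, which corresponds to the LPAR for which the CPU usage of SYS2 should be checked next.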

(b) Example of monitoring CPU usage of the host machine

You can monitor the CPU usage of the host machine in the VM Used, VMM Kernel Used, and VMM Others Used fields of the PI record. The VM Used field indicates the CPU usage of each LPAR. The VMM Kernel Used field indicates the CPU usage of SYS1. The VMM Others Used field indicates the CPU usage of SYS2.

The figure below shows an example of such monitoring.

Figure 1‒96: Example of monitoring the CPU usage

[Figure]

Monitoring template report to be checked

Host CPU Used Status

If the CPU usage of SYS2 exceeds the threshold, the virtual NIC may be heavily loaded. For details about how to check and handle such a situation, see 1.9.6 Monitoring the network resources.

Tip

As a guideline, set the threshold for the CPU usage of SYS2 to the equivalent of two fully used CPU cores. For example, on a system whose CPUs have eight cores in total, the threshold is 25% of the total usage (2 cores / 8 cores).
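The arithmetic behind this guideline can be written as a short Python sketch. The function name sys2_threshold_pct and its parameters are hypothetical and exist only for this illustration.

```python
def sys2_threshold_pct(total_cores, guideline_cores=2):
    """Return the SYS2 CPU usage threshold as a percentage of all cores.

    guideline_cores defaults to 2, following the two-core guideline.
    """
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return 100.0 * guideline_cores / total_cores


print(sys2_threshold_pct(8))  # 25.0, matching the eight-core example
```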

Also, if the total CPU usage of SYS1 and SYS2 exceeds the threshold, the HBA may be heavily loaded when accessing disks. For details about how to check and handle such a situation, see 1.9.5 Monitoring disk resources.

Tip

As a guideline, set the threshold for the total CPU usage of SYS1 and SYS2 to 90% of the total usage.
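The two threshold checks for the host machine can be combined as in the following Python sketch. It assumes that the VMM Kernel Used (SYS1) and VMM Others Used (SYS2) values are available as percentages of the total usage, and that both thresholds are expressed on that same scale. The function name diagnose_host_cpu and the default of 25.0 (the eight-core example) are assumptions for illustration.

```python
def diagnose_host_cpu(sys1_pct, sys2_pct,
                      sys2_threshold_pct=25.0, total_threshold_pct=90.0):
    """Return a list of follow-up checks suggested by the guideline thresholds.

    sys1_pct: CPU usage of SYS1 (VMM Kernel Used), in percent (assumed scale)
    sys2_pct: CPU usage of SYS2 (VMM Others Used), in percent (assumed scale)
    """
    findings = []
    if sys2_pct > sys2_threshold_pct:
        # SYS2 provides the virtual NIC service.
        findings.append("SYS2 over threshold: check the virtual NIC load (1.9.6)")
    if sys1_pct + sys2_pct > total_threshold_pct:
        # HBA processing consumes CPU resources for SYS1.
        findings.append("SYS1 + SYS2 over threshold: check the HBA load (1.9.5)")
    return findings


for finding in diagnose_host_cpu(sys1_pct=10.0, sys2_pct=30.0):
    print(finding)
```

An empty list means that neither guideline threshold was exceeded for the given sample.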

(c) Example of checking the maximum CPU allocation size and CPU allocation balancing value of a LPAR

You can check the maximum CPU allocation size of a LPAR in the Max field of the PI_VI record. You can also check the CPU allocation balancing value of a LPAR in the Expectation field of the PI_VI record. Comparing the maximum CPU allocation size with the CPU allocation balancing value helps you investigate the cause of a lack of CPU resources in LPARs.

The figure below shows an example of such monitoring.

Figure 1‒97: Example of monitoring the maximum CPU allocation size and CPU allocation balancing value

[Figure]

Compound report to be checked (see 1.10)

Monitoring of virtual machine CPU allocation upper limit

If CPU resources in a LPAR are insufficient, compare the values in the Max and Expectation fields. By using the comparison results, you may be able to address a lack of CPU resources, as described below:

  • If the value in the Max field is greater than the one in the Expectation field

    The CPU service rate of the LPAR is set to a low value. Check and, if necessary, revise the setting value of the service rate.

  • If the value in the Max field is equal to the one in the Expectation field

    The processor capping function restricts the amount of CPU resources allocated to LPARs. Check and, if necessary, revise the setting value of the processor capping function.
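The comparison of the Max and Expectation fields can be expressed as a small decision function. The following Python sketch is an illustration only: the function name, the returned strings, and the use of a floating-point tolerance for the "equal" case are assumptions. The case in which Max is less than Expectation is not described above, so the sketch returns None for it.

```python
import math


def suggest_cpu_setting_to_review(max_value, expectation_value):
    """Map a Max/Expectation comparison to the setting to review.

    Intended for a LPAR whose CPU resources are already known to be insufficient.
    """
    if math.isclose(max_value, expectation_value, abs_tol=1e-9):
        # Max equals Expectation: capping restricts the allocation.
        return "processor capping function"
    if max_value > expectation_value:
        # Max exceeds Expectation: the service rate is set to a low value.
        return "CPU service rate"
    return None  # case not covered by the guidance above


print(suggest_cpu_setting_to_review(80.0, 50.0))
print(suggest_cpu_setting_to_review(50.0, 50.0))
```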

(3) Settings of the CPU service rate, processor capping function, and CPU idle detection function

A system with logical partitioning feature provides the functions associated with CPU allocations to LPARs that are described below. Note that in some cases, depending on the settings of these functions, CPU resources cannot be appropriately allocated to LPARs.

CPU service rate and processor capping function

The CPU service rate setting specifies, as a percentage, the amount of CPU resources to be allocated to each LPAR. If the processor capping function is enabled, the CPU service rate value is the upper limit of the amount allocated, even if the CPU allocation to the LPAR is insufficient.

If the CPU service rate is set to a low value and the processor capping function is enabled for a LPAR that consumes a large amount of CPU resources, sufficient CPU resources may not be allocated to the LPAR.
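The interaction between the CPU service rate and the processor capping function can be illustrated with a deliberately simplified Python model. This is not how the hypervisor actually schedules CPUs. The function name allocated_share, the spare_pct parameter, and the rule used when capping is disabled are all assumptions made for the example.

```python
def allocated_share(demand_pct, service_rate_pct, capping_enabled, spare_pct=0.0):
    """Toy model of the CPU share a LPAR receives, in percent.

    demand_pct:       CPU the LPAR would like to use
    service_rate_pct: share given by the CPU service rate setting
    capping_enabled:  whether the processor capping function is enabled
    spare_pct:        unused capacity the LPAR could borrow (assumed)
    """
    if capping_enabled:
        # With capping, the service rate is the upper limit.
        return min(demand_pct, service_rate_pct)
    # Without capping, the LPAR may exceed its service rate (assumed rule).
    return min(demand_pct, service_rate_pct + spare_pct)


demand, rate = 60.0, 20.0
capped = allocated_share(demand, rate, capping_enabled=True, spare_pct=50.0)
uncapped = allocated_share(demand, rate, capping_enabled=False, spare_pct=50.0)
print(capped, uncapped)
```

In this model, a LPAR that demands 60% with a service rate of 20% is held at 20% when capping is enabled, which corresponds to the insufficient allocation described above.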

CPU idle detection function

This function detects whether the CPUs allocated to a LPAR are in an idle state. In a system with logical partitioning feature, if a CPU for a LPAR is in an idle state, its resources are allocated to LPARs whose allocated CPU resources are insufficient. When the CPU idle detection function is disabled, idle CPUs are not allocated to other LPARs, which can lead to ineffective use of CPU resources.
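The effect of the CPU idle detection function can likewise be illustrated with a simplified Python model. Treating the setting as a per-LPAR flag, and the names and sample values used here, are assumptions for the example; the sketch only shows that idle capacity becomes available to other LPARs when idle detection is enabled.

```python
def redistributable_idle_pct(lpars):
    """Toy model: sum the idle CPU that could be given to other LPARs.

    lpars maps a LPAR name to a tuple (idle_pct, idle_detection_enabled).
    Idle CPU of a LPAR with idle detection disabled stays with that LPAR.
    """
    return sum(idle for idle, enabled in lpars.values() if enabled)


# Hypothetical sample: vhost1 is mostly idle, vhost2 is busy.
sample = {"vhost1": (30.0, True), "vhost2": (0.0, True)}
print(redistributable_idle_pct(sample))

# With idle detection disabled on vhost1, its idle CPU is not redistributed.
sample_disabled = {"vhost1": (30.0, False), "vhost2": (0.0, True)}
print(redistributable_idle_pct(sample_disabled))
```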