3.1.3 Disk monitoring examples

This subsection explains how to monitor disk performance.

(1) Overview

You can monitor disk performance to detect disk resource shortages and bottlenecks caused by a disk. Continuous monitoring of disk performance allows you to check for trends in increased disk space usage so that you can determine an appropriate configuration for the system or determine when the system configuration should be expanded.
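As a rough illustration of trend checking, the following Python sketch fits a straight line to periodic free-space samples and estimates how many days remain until the free space would reach zero. The function name `days_until_full` and the sample values are hypothetical and are not part of PFM - Agent for Platform. The sketch assumes that you have exported free-space values (for example, values of the Mbytes Free field) at regular intervals.

```python
def days_until_full(samples):
    """Estimate days until free space reaches zero from (day, free_mb) samples.

    Uses an ordinary least-squares line fit. Returns None when there are too
    few samples or when free space is not shrinking (slope >= 0).
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(free for _, free in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        return None  # all samples were taken on the same day
    sxy = sum((day - mean_x) * (free - mean_y) for day, free in samples)
    slope = sxy / sxx  # MB gained (+) or lost (-) per day
    if slope >= 0:
        return None
    intercept = mean_y - slope * mean_x
    zero_day = -intercept / slope  # day on which the fitted line reaches 0 MB
    last_day = samples[-1][0]
    return max(0.0, zero_day - last_day)


# Hypothetical daily samples: free space drops by 100 MB per day.
history = [(0, 1000.0), (1, 900.0), (2, 800.0), (3, 700.0)]
print(days_until_full(history))
```

A straight-line fit is only a first approximation. If disk usage grows in bursts, the estimate can be misleading, so treat the result as a prompt for review and not as a precise forecast.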

A disk stores programs, the data used by the programs, and other data. If the amount of free disk space becomes insufficient, data might be lost or system response might slow down.

If a program that is performing a disk I/O operation must pause (that is, wait for the response), the disk is becoming a bottleneck.

A disk bottleneck can cause any of several types of performance degradation, such as slow process response. For this reason, it is important to check that disk performance is not degrading.

When you monitor the number of disk I/O operations, note the following.

The disk I/O information that PFM - Agent for Platform acquires is the information that the OS has obtained from the disk device. It is not a direct measurement of the I/O operations performed on the actual disk. The following figure shows the I/O processing that occurs between an application and a disk.

Figure 3‒3: Conceptual figure for I/O processing

The fields related to disk I/O load are Avg Service Time and Busy %.

The Avg Service Time field indicates the average time required for one I/O operation. If a very large amount of information is input or output, or if an I/O operation is delayed, the value of this field becomes large.

The Busy % field indicates the percentage of time that the disk device was operating during a collection interval. The value of this field becomes large if I/O operations are concentrated in a short period of time.

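As described above, both the Avg Service Time and Busy % fields are related to the disk device load, but they indicate different things: Avg Service Time shows how long each I/O operation takes, and Busy % shows how much of the collection interval the device spent operating. Use whichever field, or combination of fields, best suits your monitoring requirements.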

The following figure explains what the Avg Service Time and Busy % field values indicate.

Figure 3‒4: Values of the Avg Service Time and Busy % fields
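The difference between the two fields can be sketched with simple arithmetic. The following Python example assumes conventional definitions (average service time as device busy time divided by the number of I/O operations, and busy rate as device busy time divided by the length of the collection interval). These formulas and the function name `disk_load` are assumptions made for illustration. They are not taken from the product, whose exact calculation is not described here.

```python
def disk_load(busy_seconds, io_ops, interval_seconds):
    """Return (avg_service_time_seconds, busy_percent) for one interval.

    Assumed definitions (illustrative only):
      avg service time = device busy time / number of I/O operations
      busy %           = device busy time / interval length * 100
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    avg_service = busy_seconds / io_ops if io_ops else 0.0
    busy_pct = 100.0 * busy_seconds / interval_seconds
    return avg_service, busy_pct


# A few slow operations: long service time, low busy rate.
print(disk_load(busy_seconds=6.0, io_ops=12, interval_seconds=60))

# Many fast operations concentrated in the interval:
# short service time, high busy rate.
print(disk_load(busy_seconds=54.0, io_ops=27000, interval_seconds=60))
```

The two calls show why one field alone can hide a problem: the first device looks idle by busy rate but each operation is slow, and the second is nearly saturated although each operation is fast.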

The Disk Service Time alarm and Disk Busy % alarm are provided by the monitoring templates. If you want to perform more detailed monitoring, see the following table, which lists and describes the principal records and fields related to the monitoring of disk performance.

Table 3‒4: Principal fields related to disk monitoring
Record	Field	Description (example)
`PI_DEVD`	`Avg Service Time`	The average time for I/O operations. If the value of this field is large, the amount of information being input or output might be very large.
	`Avg Wait Time`	The average wait time for I/O operations. If the value of this field is large, the amount of information being input or output might be very large.
	`Busy %`	The disk busy rate. If the value of this field is large, I/O operations might be concentrated on a specific disk.
	`I/O Mbytes`	The total amount of information input or output. If the value of this field is large, the amount of information being input or output might be very large.
	`Total I/O Ops`	The number of I/O operations that occurred. If the value of this field is large, I/O operations might be concentrated on a specific disk.
	`Queue Length`	The queue length. If the value of this field continues to be at or above the threshold, the device is congested.
`PD_FSL`	`Mbytes Free`	The amount of available disk space. If the value of this field is small, the amount of free disk space is insufficient.
`PD_FSL`	`Mbytes Free %`
`PD_FSR`	`Mbytes Free`
`PD_FSR`	`Mbytes Free %`

(a) Monitoring the disk free space

You can use the File System Free(L) alarm or File System Free(R) alarm provided by the monitoring template to monitor the amount of free disk space.

You can use an alarm to effectively monitor the percentage of free logical-disk space.

If the amount of free logical-disk space (the Mbytes Free or Mbytes Free % field of the PD_FSL or PD_FSR record) falls below the threshold, you might need to take appropriate action, such as deleting unnecessary files or adding a disk.
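The threshold comparison described above can be sketched outside the product as follows. This Python example uses the standard `shutil.disk_usage` function to obtain free and total space for a path, and then classifies the percentage of free space. The function names and the threshold values (15% and 5%) are hypothetical and do not reflect the thresholds defined in the monitoring template.

```python
import shutil

MB = 1024 * 1024


def free_space_status(free_mb, total_mb, warning_pct=15.0, abnormal_pct=5.0):
    """Classify free space by percentage (threshold values are hypothetical)."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    free_pct = 100.0 * free_mb / total_mb
    if free_pct < abnormal_pct:
        return "abnormal"
    if free_pct < warning_pct:
        return "warning"
    return "normal"


def check_path(path="."):
    """Check the file system that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return free_space_status(usage.free / MB, usage.total / MB)


print(check_path("."))
```

Monitoring the percentage instead of the absolute number of megabytes lets one threshold apply to file systems of different sizes. An absolute threshold can still be useful for very large file systems.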

For details, see 3.2.3(1) Monitoring template.

(b) Monitoring the disk I/O delay status

You can use the I/O Wait Time alarm provided by the monitoring template to monitor the disk I/O delay.

The alarm includes the Wait % field (of the PI record), with which you can monitor the disk I/O delay status. If the value of this field is large, you might need to take appropriate action, such as checking for a process that is performing too many I/O operations (for example, a process that is updating a database).

For details, see 3.2.3(1) Monitoring template.

(c) Monitoring the disk I/O status

You can use the Disk Service Time alarm provided by the monitoring template to monitor the disk I/O.

The alarm includes the Avg Service Time field (of the PI_DEVD record), which enables you to check for a process that is inputting or outputting a very large amount of information.

For details, see 3.2.3(1) Monitoring template.

(d) Monitoring the disk busy rate

You can use the Disk Busy % alarm provided by the monitoring template to monitor the disk busy rate.

You can monitor the disk busy rate by using an alarm to check whether excessive paging (reading and writing of pages by processes) is occurring.

If the disk busy rate (the Busy % field of the PI_DEVD record) continues to be at or above the threshold, you might need to take action. For example, you might need to identify the processes that frequently request disk I/O operations, and then distribute the processing of these processes.
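The condition "continues to be at or above the threshold" can be interpreted as several consecutive readings at or above the threshold, so that a single spike does not trigger action. The following Python sketch shows one such interpretation. The function name `sustained_breach`, the threshold of 90, and the requirement of three consecutive readings are assumptions for illustration and are not the alarm's actual evaluation logic.

```python
def sustained_breach(values, threshold, min_consecutive=3):
    """Return True if `values` contains at least `min_consecutive`
    consecutive readings that are at or above `threshold`."""
    run = 0
    for value in values:
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical Busy % readings from successive collection intervals.
spiky = [40, 95, 50, 92, 30]       # isolated peaks only
sustained = [40, 91, 95, 93, 60]   # three readings in a row at or above 90

print(sustained_breach(spiky, threshold=90))
print(sustained_breach(sustained, threshold=90))
```

The same check applies to any field that is evaluated against a threshold over time, such as the Queue Length field.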

When you monitor the disk busy rate, monitoring the disk I/O delay status, disk I/O status, and disk congestion is also recommended.

For details, see 3.2.3(1) Monitoring template.

(e) Monitoring disk congestion

You can use the Disk Queue alarm provided by the monitoring template to monitor disk congestion.

You can monitor disk congestion by using an alarm to check whether I/O requests have been excessive.

If the disk congestion level (the Queue Length field of the PI_DEVD record) continues to be at or above the threshold, you might need to take action. For example, you might need to identify those processes that frequently request disk I/O, and then distribute the processing of the processes.

When you monitor disk congestion, monitoring the disk I/O delay status, disk I/O status, and disk busy rate is also recommended.
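Because these items are best read together, the following Python sketch evaluates the Busy %, Avg Service Time, and Queue Length values of one device in a single pass and reports the interpretations given in Table 3‒4. The `DeviceSample` class, the `findings` function, and all limit values are hypothetical. The sketch assumes that the field values have already been exported from the PI_DEVD record.

```python
from dataclasses import dataclass


@dataclass
class DeviceSample:
    """One exported reading for a disk device (hypothetical structure)."""
    busy_pct: float          # Busy % field
    avg_service_time: float  # Avg Service Time field
    queue_length: float      # Queue Length field


def findings(sample, busy_limit=80.0, service_limit=0.05, queue_limit=3.0):
    """Return notes for each field at or above its (hypothetical) limit."""
    notes = []
    if sample.busy_pct >= busy_limit:
        notes.append("Busy %: I/O operations might be concentrated on this disk")
    if sample.avg_service_time >= service_limit:
        notes.append("Avg Service Time: the amount of I/O might be very large")
    if sample.queue_length >= queue_limit:
        notes.append("Queue Length: the device might be congested")
    return notes


sample = DeviceSample(busy_pct=92.0, avg_service_time=0.01, queue_length=5.0)
for note in findings(sample):
    print(note)
```

In this example the device is busy and congested although each operation is short, which suggests many small requests and not a few large transfers.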

For details, see 3.2.3(1) Monitoring template.
