Job Management Partner 1/Performance Management - Agent Option for Platform Description, User's Guide and Reference
This subsection provides, for each monitored resource, definition examples for the monitoring template and for items not included in the monitoring template. Note the following when reading the definition examples:
- In the examples, the PFM - Web Console check boxes are shown as follows: (selected) and (not selected)
- In the examples, the PFM - Web Console radio buttons are shown as follows: (selected) and (not selected)
- In the examples, xxx, yyy, zzz, and dummy are variables that the user replaces with the character strings appropriate for the system environment. For other definition items, the values should be changed as required.
- In the examples, the appropriate values for the frequency of occurrence settings (for example, m occurrence(s) during n interval(s)) depend on the system environment, so specify values that suit your system. For example, assume that, in your system environment, a threshold that remains exceeded for at least two minutes indicates a high-load status. Further assume that the collection interval is 60 seconds and that the threshold is allowed to be exceeded at most twice per five intervals. Under these conditions, an unacceptable high-load condition occurs when the threshold is exceeded at least three times per five intervals. The setting in this case is 3 occurrence(s) during 5 interval(s).
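The frequency of occurrence setting in the last note can be illustrated with a short Python sketch. It is only a rough model of "m occurrence(s) during n interval(s)" and is not taken from the PFM implementation; the function name make_frequency_check and the sample data are hypothetical.

```python
from collections import deque


def make_frequency_check(m, n):
    """Return a checker that is True once the threshold has been exceeded
    at least m times within the n most recent collection intervals."""
    window = deque(maxlen=n)  # older results drop out automatically

    def check(exceeded):
        window.append(bool(exceeded))
        return sum(window) >= m

    return check


# 3 occurrence(s) during 5 interval(s); one sample per 60-second interval.
check = make_frequency_check(m=3, n=5)

# Hypothetical samples: True means the threshold was exceeded in that interval.
samples = [True, False, True, False, False, True, True]
results = [check(s) for s in samples]
print(results)
```

With these samples the condition is first met at the seventh interval, because only then do three of the five most recent intervals exceed the threshold.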
- Organization of this subsection
- (1) Processor
- (2) Memory
- (3) Disks
- (4) Network
- (5) Processes
(1) Processor
The following shows definition examples for the monitoring template and for items not included in the monitoring template.
(a) Monitoring template
- Processor-related monitoring template alarms
Processor-related alarms in the monitoring template are stored in the alarm table for PFM UNIX Template Alarms [CPU] 09.00.
Table 1-8 Processor-related monitoring template alarms
| Monitoring template alarm | Record | Field | Error threshold | Warning threshold | Description |
| Kernel CPU | PI | Kernel CPU % | > 75 | > 50 | If the percentage of time during which the processor operates continues to be above the threshold, there might be a problem with the OS or the operation method.# |
| User CPU | PI | User CPU % | > 85 | > 65 | If the percentage of time during which the processor operates continues to be above the threshold, there might be a problem with a specific application.# |
| Run Queue | PI | 5-Minute Run Queue Avg | > 8 | > 4 | If the average number of threads is above the threshold, there might be a problem with the OS, operation method, or a specific application.# |
| CPU Per Processor(K) | PI_CPUP | Processor ID | >= 0 | >= 0 | If the CPU usage continues to be at or above the threshold, there might be a problem with the OS or operation method. |
| | | System % | > 75 | > 50 | |
| CPU Per Processor(U) | PI_CPUP | Processor ID | >= 0 | >= 0 | If the CPU usage continues to be at or above the threshold, there might be a problem with a specific application. |
| | | User % | > 85 | > 65 | |
- #
- Any processes that are excessively using the processor must be found, and appropriate action must be taken. If no such processes exist, operations that outstrip the kernel's scheduling capabilities, such as the generation and deletion of many processes in a short time, might be occurring. In such a case, because the system environment is not adequate for the processing, you might need to upgrade the processor or add processors.
If you want to monitor processor performance in more detail, you can use your own alarms or reports in addition to the existing alarms. For details about how to create your own alarms or reports, see 1.3.2(1)(b) Monitoring methods.
For details about the settings for the existing alarms, see 5. Monitoring Templates.
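As a reading aid for the threshold columns in Table 1-8, the following Python sketch classifies a measured value against the error and warning thresholds of the Kernel CPU and User CPU alarms. It is an illustration only: the classify function, the CPU_ALARMS dictionary, and the status labels are hypothetical and are not part of PFM.

```python
import operator

# Comparison operators as they appear in the threshold columns.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# (comparison, error threshold, warning threshold) copied from Table 1-8.
CPU_ALARMS = {
    "Kernel CPU": (">", 75, 50),
    "User CPU": (">", 85, 65),
}


def classify(alarm, value):
    """Return 'error', 'warning', or 'normal' for one measured value."""
    op, error, warning = CPU_ALARMS[alarm]
    compare = OPS[op]
    if compare(value, error):  # check the stricter condition first
        return "error"
    if compare(value, warning):
        return "warning"
    return "normal"


print(classify("Kernel CPU", 80))
print(classify("User CPU", 85))
```

Because both alarms use a strict comparison (>), a value equal to a threshold does not satisfy that threshold's condition.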
- Processor-related monitoring template reports
Table 1-9 Processor monitoring template reports
| Report name | Displayed information |
| CPU Per Processor Status | Real-time information about the processor status |
| CPU Per Processor Usage | Real-time information about the CPU usage for a processor |
| CPU Trend | Daily CPU usage for a host for the last month |
| CPU Usage Summary | A history of the CPU usage on a minute-by-minute basis for the last hour |
| CPU Status(Multi-Agent) | An hourly history of the CPU usage by multiple hosts for the last 24 hours |
| CPU Status | Real-time information about the CPU usage |
| CPU Trend(Multi-Agent) | A daily history of the CPU usage by multiple hosts for the last month |
For details about the settings for the existing reports, see 5. Monitoring Templates.
(b) Definition examples other than for monitoring templates
- Real-time report for checking processes whose processor usage is high
Table 1-10 Definition example
| Item | Explanation |
| Name and Type: Report name | PD_PDI - Memory |
| Name and Type: Product | UNIX (6.0) |
| Name and Type: Report type | Real-time (single agent): (Select); Historical (single agent): --; Historical (multiple agents): -- |
| Field: Record | PD_PDI |
| Field: Selected fields | Program, PID, CPU %, System CPU, User CPU |
| Filter: Conditional expression | (Select Simple or Complex.) PID <> "0" |
| Filter: Specify when displayed | (Clear) |
| Indication settings: Specify when displayed | (Select) |
| Indication settings: Indicate delta value | (Clear) |
| Refresh interval: Do not refresh automatically | (Clear) |
| Refresh interval: Initial value | 30 |
| Refresh interval: Minimum value | 30 |
| Display by ranking: Field | CPU % |
| Display by ranking: Display number | 10# |
| Display by ranking: In descending order | (Clear) |
| Components: Table | All fields |
| Components: List | -- |
| Components: Graph | System CPU, User CPU |
| Display key: Field | (None) |
| Display key: In descending order | -- |
| Graph: Graph type | Stacked bar graph |
| Graph: Series direction | Row |
| Graph: Axis labels, X-axis | Process name (process ID) |
| Graph: Axis labels, Y-axis | CPU usage time |
| Graph: Data label, Data label 1 | Process name |
| Graph: Data label, Data label 2 | Process ID |
| Drilldown: Report drilldown | Arbitrary |
| Drilldown: Field drilldown | Arbitrary |
- Legend:
- --: Do not specify this item.
- #
- Specify a value appropriate for the circumstances.
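The effect of the filter and ranking settings in Table 1-10 can be sketched in Python as follows. This is only a conceptual model of the report definition, not PFM code: the top_processes function and the sample records are invented for illustration, and the sketch assumes that the ranking shows the largest CPU % values first.

```python
def top_processes(records, field="CPU %", count=10):
    """Apply the example filter (PID <> "0"), then keep the highest-ranked rows."""
    # Filter from the definition example: exclude the record whose PID is 0.
    kept = [r for r in records if str(r["PID"]) != "0"]
    # Assumption: the ranking lists the largest values of the field first.
    return sorted(kept, key=lambda r: r[field], reverse=True)[:count]


# Hypothetical PD_PDI-style records (values are made up for the example).
sample = [
    {"Program": "sched", "PID": 0, "CPU %": 0.0},
    {"Program": "java", "PID": 4012, "CPU %": 41.5},
    {"Program": "sshd", "PID": 733, "CPU %": 0.3},
    {"Program": "oracle", "PID": 2250, "CPU %": 12.8},
]

top = top_processes(sample, count=2)
print([r["Program"] for r in top])
```

The count argument corresponds to the Display number item, which the definition example says should be set to a value appropriate for the circumstances.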
(2) Memory
The following shows definition examples for the monitoring template and for items not included in the monitoring template.
(a) Monitoring template
- Memory-related monitoring template alarms
Memory-related alarms in the monitoring template are stored in the alarm table for PFM UNIX Template Alarms [MEM] 09.00.
Table 1-11 Memory monitoring template alarms
| Monitoring template alarm | Record | Field | Error threshold | Warning threshold | Description |
| Pagescans | PI | Page Scans/sec | > 150 | > 100 | If the number of page scans that occurred is greater than the threshold, memory might be insufficient. |
| Swap Outs | PI | Swapped-Out Pages/sec | > 200 | > 100 | If the number of pages is greater than the threshold, memory might be insufficient. |
| Alloc Mem Mbytes | PI | Alloc Swap Mbytes | >= 1800 | >= 1024 | If the amount of used memory continues to be at or above the threshold (the Total Physical Mem Mbytes field value of the PI record), physical memory might be insufficient. |
For details about the settings for the existing alarms, see 5. Monitoring Templates.
- Memory-related monitoring template reports
Table 1-12 Memory monitoring template reports
| Report name | Displayed information |
| Memory Paging | A history of memory use on a minute-by-minute basis for the last hour# |
| Memory Paging Status | Real-time information about memory usage and the paging status# |
| Memory Paging Status (Multi-Agent) | An hourly history of memory usage by multiple hosts for the last 24 hours# |
| Paging Trend (Multi-Agent) | A daily history of page scans performed by multiple hosts for the last month# |
- #
- This report cannot be used in Linux.
- System-related monitoring template reports (for memory)
Table 1-13 System monitoring template reports
| Report name | Displayed information |
| I/O Overview | A history of the number of I/Os on a minute-by-minute basis for the last hour. This report cannot be used in Linux. |
| Process Trend | A daily history of the number of operated processes for the last month |
| System Overview (real-time report indicating the system operating status) | Real-time information about the system operating status |
| System Overview (historical report indicating the system operating status) | A history of the system operating status on a minute-by-minute basis for the last hour |
| System Utilization Status | Real-time information about the system operating status |
| Workload Status | Real-time information about the system workload |
| Workload Status (Multi-Agent) | An hourly history of the workload for multiple hosts for the last 24 hours |
For details about the settings for the existing reports, see 5. Monitoring Templates.
(b) Definition example other than for monitoring templates
- Real-time report for checking the memory usage of a process
Table 1-14 Definition example
| Item | Explanation |
| Name and Type: Report name | PD_PDI - Memory |
| Name and Type: Product | UNIX (6.0) |
| Name and Type: Report type | Real-time (single agent): (Select); Historical (single agent): --; Historical (multiple agents): -- |
| Field: Record | PD_PDI |
| Field: Selected fields | Select all fields. |
| Filter: Conditional expression | (Select Simple or Complex.) PID <> "0" |
| Filter: Specify when displayed | (Clear) |
| Indication settings: Specify when displayed | (Select) |
| Indication settings: Indicate delta value | (Clear) |
| Refresh interval: Do not refresh automatically | (Clear) |
| Refresh interval: Initial value | 30 |
| Refresh interval: Minimum value | 30 |
| Display by ranking: Field | Virtual Mem Kbytes#1 |
| Display by ranking: Display number | 30#2 |
| Display by ranking: In descending order | (Select) |
| Components: Table | Program, PID, Real Mem Kbytes, Virtual Mem Kbytes, Major Faults, Swaps, Context Switches, CPU % |
| Components: List | -- |
| Components: Graph | Virtual Mem Kbytes, Real Mem Kbytes |
| Display name | -- |
| Display key: Field | (None) |
| Display key: In descending order | -- |
| Graph: Graph type | Line graph |
| Graph: Series direction | Row |
| Graph: Axis labels, X-axis | Time |
| Graph: Axis labels, Y-axis | Memory usage |
| Graph: Data label, Data label 1 | (None) |
| Graph: Data label, Data label 2 | (None) |
| Drilldown: Report drilldown | Arbitrary |
| Drilldown: Field drilldown | Arbitrary |
- Legend:
- --: Do not specify this item.
- #1
- Set the fields that you want to monitor.
- #2
- Specify a value appropriate for the circumstances.
(3) Disks
The following shows definition examples for the monitoring template.
(a) Monitoring template
- Disk-related monitoring template alarms
Disk-related alarms in the monitoring template are stored in the alarm table for PFM UNIX Template Alarms [DSK] 09.00.
Table 1-15 Disk monitoring template alarms
| Monitoring template alarms | Record | Field | Error threshold | Warning threshold | Description |
| I/O Wait Time | PI | Wait % | > 80 | > 60 | If the percentage of time during which the processor is waiting for I/O is greater than the threshold, I/O operations, such as updating a database, might be delayed. |
| Disk Service Time | PI_DEVD | Avg Service Time | > 0.1 | > 0.06 | If the average operating time is greater than the threshold, the amount of information being input or output might be very large. |
| File System Free(L) | PD_FSL | File System | <> dummy | <> dummy | If there is little unused area, disk space is insufficient. |
| | | Mbytes Free | < 5120 | < 10240 | |
| File System Free(R) | PD_FSR | File System | <> dummy | <> dummy | If there is little unused area, disk space is insufficient. |
| | | Mbytes Free | < 5120 | < 10240 | |
| Disk Busy % | PI_DEVD | Device Name | <> dummy | <> dummy | If the disk busy rate continues to be at or above the threshold, I/O operations might be concentrated on a specific disk. |
| | | Busy % | >= 90 | >= 80 | |
| Disk Queue | PI_DEVD | Device Name | <> dummy | <> dummy | If the queue length continues to be at or above the threshold, the device is congested. |
| | | Queue Length | >= 5 | >= 3 | |
For details about the settings for the existing alarms, see 5. Monitoring Templates.
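Several alarms in Table 1-15 list two fields, for example File System Free(L), which has a condition on File System and a condition on Mbytes Free. The following Python sketch shows one way to read such an alarm, assuming that both conditions must hold at the same time. That AND combination is an assumption made for this illustration, and the function name and status labels are hypothetical.

```python
def file_system_free_status(file_system, mbytes_free, excluded="dummy"):
    """Evaluate the two File System Free(L) conditions from Table 1-15."""
    # Condition 1: File System <> dummy. The placeholder "dummy" is a variable
    # that the user replaces with a string appropriate for the system.
    if file_system == excluded:
        return "normal"
    # Condition 2: Mbytes Free < 5120 (error) or < 10240 (warning).
    if mbytes_free < 5120:
        return "error"
    if mbytes_free < 10240:
        return "warning"
    return "normal"


print(file_system_free_status("/var", 4000))
print(file_system_free_status("/var", 8000))
```

Under this reading, a file system whose name matches the excluded string never raises the alarm, regardless of how little free space it has.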
- Disk-related monitoring template reports
Table 1-16 Disk monitoring template reports
| Report name | Displayed information |
| Avg Service Time - Top 10 Devices | Real-time information about the 10 devices with the longest average operating time |
| Avg Service Time Status - Top 10 Devices | Real-time information about the 10 devices with the longest average operating time |
| Device Detail | Real-time information about the selected device |
| Device Usage Status | Real-time information about device usage |
| Device Usage Status (Multi-Agent) | An hourly history of device usage by multiple hosts for the last 24 hours |
| Free Space Mbytes - Top 10 Local File Systems | Real-time information about the 10 local file systems with the smallest amount of free space |
| Local File System Detail | Real-time information about the selected local file system |
| Remote File System Detail | Real-time information about the selected remote file system |
| Space Usage - Top 10 Local File Systems | Real-time information about the 10 local file systems with the highest usage |
| Space Usage - Top 10 Remote File Systems | Real-time information about the 10 remote file systems with the highest usage |
| NFS Activity Overview | A history of the operating status of the NFS clients and server on a minute-by-minute basis for the last hour# |
| NFS Load Trend | A daily history of the operating status of the NFS clients and server for the last month# |
| NFS Usage Status | Real-time information about the operating status of the NFS clients and server# |
| NFS Usage Status (Multi-Agent) | An hourly history of the operating status of the NFS clients and server for the last 24 hours# |
- #
- This report cannot be used in Linux.
For details about the settings for the existing reports, see 5. Monitoring Templates.
(4) Network
The following shows definition examples for the monitoring template.
(a) Monitoring template
- Network-related monitoring template alarms
Network-related alarms in the monitoring template are stored in the alarm table for PFM UNIX Template Alarms [NET] 09.00.
Table 1-17 Network monitoring template alarms
| Monitoring template alarm | Record used | Field used | Abnormal condition | Warning condition | Meaning |
| Network Rcvd/sec | PI_NINS | Pkts Rcvd/sec | >= 9 | >= 8 | If the number of packets is large, many packets have been received successfully. |
For details about the settings for the existing alarms, see 5. Monitoring Templates.
- Network-related monitoring template reports
Table 1-18 Network monitoring template reports
| Report name | Displayed information |
| Network Interface Detail | Real-time information about network usage of the selected system |
| Network Interface Summary (real-time report indicating the network usage) | Real-time information about network usage |
| Network Interface Summary (historical report indicating the network usage) | A history of network usage on a minute-by-minute basis for the last hour |
| Network Overview | A history of network usage on a minute-by-minute basis for the last hour |
| Network Status (Multi-Agent) | An hourly history of the network usage by multiple hosts for the last 24 hours |
| Network Status | Real-time information about network usage |
For details about the settings for the existing reports, see 5. Monitoring Templates.
(5) Processes
The following shows definition examples for the monitoring template.
(a) Monitoring template
- Process-related monitoring template alarms
Process-related monitoring template alarms are stored in the alarm table for PFM UNIX Template Alarms [PS] 09.00.
Table 1-19 Process monitoring template alarms
| Monitoring template alarms | Record used | Field used | Abnormal condition | Warning condition | Meaning |
| Process End | PD_PDI | Program | = jpcsto | = jpcsto | If performance data is not collected, the process has stopped. |
| Process Alive | PI_WGRP | Process Count | > 0 | > 0 | This indicates that the workgroup process is running. |
| | | Workgroup | = workgroup | = workgroup | |
For details about the settings for the existing alarms, see 5. Monitoring Templates.
- Process-related monitoring template reports
Table 1-20 Process monitoring template reports
| Report name | Displayed information |
| CPU Usage - Top 10 Processes | Real-time information about the 10 processes with the highest CPU usage |
| I/O Activity - Top 10 Processes | Real-time information about the 10 processes that processed the most I/O operations. This report cannot be used in HP-UX, AIX, and Linux. |
| Major Page Faults - Top 10 Processes | Real-time information about the 10 processes with the most page faults causing physical I/O operations |
| Process Detail | Real-time information about the processes on the selected host |
| Process Overview | A history of the process operating status on a minute-by-minute basis for the last hour |
| Process Summary Status | Real-time information about the process operating status |
For details about the settings for the existing reports, see 5. Monitoring Templates.
All Rights Reserved. Copyright (C) 2009, Hitachi, Ltd.