Job Management Partner 1/Performance Management Planning and Configuration Guide



3.4.3 Examining items to be monitored

The purpose of an operation monitoring system is to detect in advance that the system is approaching a critical state, and thereby to prevent problems. Examining the items to be monitored is therefore of the utmost importance. When doing so, you must decide which items to monitor and how to monitor them. For help selecting the items to monitor, see the manual for each PFM - Agent or PFM - RM.

The process flow for examining items to be monitored is as follows:

  1. Examine alarms.
    After deciding which items in the system are to be monitored, decide a threshold for each item. For example, to keep a shared server from malfunctioning, you can monitor the percentage of free space on the server's logical hard drive and set an appropriate threshold.
    Also decide how the system administrator is to be notified when a threshold is reached. For example, a notification can be sent by email or as an SNMP trap.
  2. Examine reports.
    To analyze the cause of a problem and understand the current status, decide how reports are to be displayed when a threshold is exceeded and an alarm occurs. For example, you might display a bar graph that shows the ten logical hard drives with the least free space.
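The two steps above can be illustrated with a short sketch. It is not a Performance Management API. The names `DriveSample`, `drives_below_threshold`, `least_free_drives`, and `text_bar_graph` are hypothetical, and the 10% threshold is only an example value. The sketch checks each logical drive against a free-space threshold, which corresponds to the alarm. It then selects the drives with the least free space, which is the data behind the top-ten bar-graph report.

```python
from dataclasses import dataclass


@dataclass
class DriveSample:
    name: str            # logical drive name (illustrative)
    free_percent: float  # percentage of free space on the logical drive


def drives_below_threshold(samples, threshold):
    """Return drives whose free space has fallen to or below the threshold.

    For a free-space metric, a *low* value is the critical condition,
    so the alarm fires when free_percent <= threshold.
    """
    return [s for s in samples if s.free_percent <= threshold]


def least_free_drives(samples, n=10):
    """Return the n drives with the least free space, lowest first."""
    return sorted(samples, key=lambda s: s.free_percent)[:n]


def text_bar_graph(samples, width=50):
    """Render a simple text bar graph of free space (one line per drive)."""
    lines = []
    for s in samples:
        bar = "#" * round(s.free_percent / 100 * width)
        lines.append(f"{s.name:<6} {bar} {s.free_percent:.1f}%")
    return "\n".join(lines)


if __name__ == "__main__":
    samples = [DriveSample(f"D{i}", p)
               for i, p in enumerate([55.0, 8.5, 30.0, 4.0, 72.0])]
    alerts = drives_below_threshold(samples, threshold=10.0)
    print("Alarm for:", [s.name for s in alerts])
    print(text_bar_graph(least_free_drives(samples)))
```

With the sample data, drives D1 and D3 are at or below 10% free space and would trigger an alarm. The report sorts all drives in ascending order of free space.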

When selecting the items to be monitored, you can use the monitoring templates provided by Performance Management. Using a monitoring template as is, or customizing part of its definition, reduces the work the system administrator must do to define the items to be monitored.

Point:
Selecting the parameters to monitor for a program is not purely a technical matter. Hitachi recommends that you also consider both the job characteristics and the operating structure of the system.

The following describes how to examine alarms and reports.

Organization of this subsection
(1) Examining alarms
(2) Examining reports

(1) Examining alarms

This subsection describes how to examine alarms.

(a) How to set the threshold

Performance Management can issue an alarm event when the performance data collected by PFM - Agent or PFM - RM reaches a predefined threshold. Using this function, the system administrator decides, for each monitored item, the value beyond which an alarm event is issued. Performance Management also allows the conditions for issuing an alarm event to be defined for specified periods of time.

For example, you could set up the following configurations:

The system administrator should decide the time periods during which the system is to be monitored.
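The following sketch shows how an alarm condition can be limited to specified monitoring periods. It does not reproduce how Performance Management stores or evaluates its alarm definitions. The period table `MONITORING_PERIODS` (weekdays, 08:00 to 20:00) and the function names are hypothetical examples. The sketch assumes a metric, such as CPU usage, for which a high value is critical.

```python
from datetime import datetime, time

# Hypothetical monitoring periods: (set of weekdays, start, end).
# weekday() numbering: Monday = 0 ... Sunday = 6.
MONITORING_PERIODS = [
    ({0, 1, 2, 3, 4}, time(8, 0), time(20, 0)),  # weekdays, 08:00 to 20:00
]


def in_monitoring_period(ts: datetime, periods=MONITORING_PERIODS) -> bool:
    """Return True if the timestamp falls inside any monitoring period."""
    t = ts.time()
    return any(ts.weekday() in days and start <= t < end
               for days, start, end in periods)


def should_raise_alarm(ts: datetime, value: float, threshold: float,
                       periods=MONITORING_PERIODS) -> bool:
    """Raise an alarm only if the value reaches the threshold
    (high value = critical) during a monitoring period."""
    return in_monitoring_period(ts, periods) and value >= threshold


if __name__ == "__main__":
    monday_10am = datetime(2024, 1, 8, 10, 0)    # a Monday
    saturday_10am = datetime(2024, 1, 6, 10, 0)  # a Saturday
    print(should_raise_alarm(monday_10am, 95, 90))
    print(should_raise_alarm(saturday_10am, 95, 90))
```

Keeping the periods in a table separate from the threshold check lets the administrator change monitoring hours without touching the alarm condition itself.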

Performance Management can be configured so that an alarm event does not occur when the threshold is exceeded only because of a sudden load increase. If Performance Management is configured to send a notification only after the threshold has been exceeded a certain number of times within a certain number of monitoring intervals, notification occurs only when, for example, the CPU is under a heavy, continuous load. Suppressing notifications for sudden, temporary load increases keeps alarms meaningful and lets you tailor when alarms are issued to the characteristics of the system.
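The "n occurrences within m intervals" rule described above can be sketched with a sliding window. This sketch shows only the counting logic. It does not claim to reproduce Performance Management's internal implementation. The class name `OccurrenceFilter` and the example values (threshold 80, 3 occurrences in 5 intervals) are hypothetical.

```python
from collections import deque


class OccurrenceFilter:
    """Notify only when the threshold is exceeded at least `occurrences`
    times within the most recent `intervals` monitoring intervals."""

    def __init__(self, threshold: float, occurrences: int, intervals: int):
        if not 1 <= occurrences <= intervals:
            raise ValueError("occurrences must be between 1 and intervals")
        self.threshold = threshold
        self.occurrences = occurrences
        # One boolean per monitoring interval; the oldest result drops
        # out automatically once the window is full.
        self.window = deque(maxlen=intervals)

    def update(self, value: float) -> bool:
        """Record one interval's value; return True if a notification is due."""
        self.window.append(value > self.threshold)
        return sum(self.window) >= self.occurrences


if __name__ == "__main__":
    f = OccurrenceFilter(threshold=80, occurrences=3, intervals=5)
    for cpu in [95, 40, 50, 90, 45, 92, 93]:
        print(cpu, f.update(cpu))
```

A single spike (95) does not trigger a notification. A notification is issued only at the last sample, when three of the last five intervals exceeded 80.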

(b) Measures to take when a threshold is reached

The system administrator needs to decide the following for when a program reaches a critical state: how the problem location is to be detected, what primary measures are to be taken and by whom, and how to resolve the problem if the primary measures fail. Performance Management can automatically execute the following actions when an alarm event occurs:

When examining the measures to take after the operation monitoring system issues an alarm, the system administrator should take these automatic actions into account.
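The decisions described above can be recorded as an ordered list of measures, each with an owner. Later measures are used only when earlier ones fail. The following sketch models that escalation order. It does not call any Performance Management function. The names `handle_alarm` and `Measure`, and the example owners and measures, are hypothetical.

```python
from typing import Callable, List, Tuple

# (owner, description, action); the action returns True if it resolved the problem.
Measure = Tuple[str, str, Callable[[], bool]]


def handle_alarm(alarm: str, measures: List[Measure]) -> Tuple[bool, List[str]]:
    """Try each measure in order and stop at the first that succeeds.

    Returns whether the problem was resolved and a log of every attempt.
    """
    log = []
    for owner, description, action in measures:
        ok = action()
        log.append(f"{alarm}: {owner} -> {description}: "
                   f"{'resolved' if ok else 'failed'}")
        if ok:
            return True, log
    log.append(f"{alarm}: unresolved, escalate to system administrator")
    return False, log


if __name__ == "__main__":
    resolved, log = handle_alarm("LowDiskSpace", [
        ("operator", "delete temporary files", lambda: False),  # primary measure
        ("administrator", "extend the volume", lambda: True),   # fallback measure
    ])
    print("\n".join(log))
```

Writing the order down this way makes explicit who acts first and what happens if the primary measure fails.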

The following figure shows an example of the process to follow when an alarm occurs.

Figure 3-9 Example of the process to follow when a monitored system reaches a critical state

[Figure]

Point:
If the system administrator wants a recovery program to run automatically when an alarm event occurs, returning the system to normal operation, Hitachi recommends configuring the system to issue a JP1 event and to link with other systems, such as a job management system.

(2) Examining reports

This subsection describes how to examine reports.

(a) What type of report to use

Performance Management can create real-time reports that indicate the current operation status and historical reports that show long-term trends in the operation status. The system administrator should examine which types of reports to create from the performance data in order to check the operation status. Easy-to-understand reports help ensure that problems with the system are understood correctly.

Performance Management can display reports on a daily, weekly, monthly, or yearly basis. You can configure Performance Management to display a report from the alarm icon when an alarm event occurs. You can also define associations between related reports so that you can open a different report by drilling down.

Performance Management can also display multiple reports combined on the same graph, allowing you to determine the operating status of the overall system in a comprehensive manner.

If reports need to be output regularly, for example to analyze operations over a long period of time, examine their output format in advance. In Performance Management, you can display reports in the PFM - Web Console GUI, or use an operation command (the jpcrpt command) to output a report to a text file in CSV or HTML format.
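Once reports are output regularly as CSV files, they can be post-processed for long-term analysis. The following sketch computes a daily average from such a file. The sketch makes several assumptions. It does not show any jpcrpt options. The column headers `Date and Time` and `CPU %` and the `YYYY/MM/DD HH:MM:SS` timestamp layout are hypothetical, because the actual headers and layout depend on the report definition. Adjust `time_col` and `value_col` to match your own output.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def daily_averages(csv_text: str,
                   time_col: str = "Date and Time",  # hypothetical header
                   value_col: str = "CPU %") -> dict:  # hypothetical header
    """Group CSV rows by the date part of time_col and average value_col.

    Assumes timestamps of the form 'YYYY/MM/DD HH:MM:SS'.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = row[time_col].split(" ")[0]
        groups[day].append(float(row[value_col]))
    return {day: mean(values) for day, values in sorted(groups.items())}


if __name__ == "__main__":
    sample = ("Date and Time,CPU %\n"
              "2009/04/01 10:00:00,20\n"
              "2009/04/01 11:00:00,40\n"
              "2009/04/02 10:00:00,70\n")
    print(daily_averages(sample))
```

To process a file on disk, read its contents with `open(path, newline="").read()` and pass the text to `daily_averages`.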

For details on how to define and output a report, see the chapter that explains the creation of reports for operation analysis in the Job Management Partner 1/Performance Management User's Guide.

(b) What items to save in the database

The system administrator should decide the following regarding the performance data collected by monitoring agents:

For details on how to record the performance data, see the chapter that describes the management of operation monitoring data in the Job Management Partner 1/Performance Management User's Guide.




All Rights Reserved. Copyright (C) 2009, Hitachi, Ltd.