Job Management Partner 1/Performance Management Planning and Configuration Guide
The purpose of an operation monitoring system is to detect at an early stage that the system has reached a critical state, and to prevent problems. As such, examining the items to be monitored in the system is of the utmost importance. When examining them, you must decide which items to monitor and how to monitor them. When selecting the items to be monitored, see the manual for each PFM - Agent or PFM - RM.
The process flow for examining items to be monitored is as follows:
- Examine alarms.
After deciding which items are to be monitored in the system, you need to decide the thresholds for those items. For example, to keep a shared server from malfunctioning, you can monitor the percentage of free space on the server's logical hard drive and decide on an appropriate threshold.
You also decide how the system administrator is to be notified when a threshold is reached. For example, a notification can be sent by email or as an SNMP trap.
- Examine reports.
To analyze the cause of a problem and understand the current status, decide how reports are to be displayed when a threshold is exceeded and an alarm occurs. For example, you might use a bar graph to show the ten logical hard drives with the least amount of free space.
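The two examples above (a free-space threshold for an alarm, and a report of the ten drives with the least free space) can be sketched as follows. This is only an illustration in Python, not a Performance Management API: the drive names, the sizes, the 10% threshold, and the function names are all assumptions made for the example.

```python
# Hypothetical sample data: drive name -> (free MB, total MB).
drives = {
    "C:": (12_000, 100_000),
    "D:": (45_000, 50_000),
    "E:": (900, 20_000),
}


def free_percent(free_mb, total_mb):
    """Return the free space as a percentage of the total size."""
    return 100.0 * free_mb / total_mb


def below_threshold(drives, threshold_pct):
    """Return the names of drives whose free-space percentage is under the threshold."""
    return sorted(
        name
        for name, (free_mb, total_mb) in drives.items()
        if free_percent(free_mb, total_mb) < threshold_pct
    )


def least_free(drives, count=10):
    """Return up to `count` (name, free MB) pairs, smallest free space first."""
    ranked = sorted(drives.items(), key=lambda item: item[1][0])
    return [(name, free_mb) for name, (free_mb, _total_mb) in ranked[:count]]


alarm_candidates = below_threshold(drives, threshold_pct=10.0)
report_rows = least_free(drives, count=10)
```

Here `alarm_candidates` corresponds to the items that would trigger an alarm, and `report_rows` to the data that a bar graph of the least-free drives would display.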
When selecting items to be monitored, you can use the monitoring template provided by Performance Management. Using the monitoring template as is, or customizing part of its definition, reduces the work the system administrator must do to define the items to be monitored.
- Point:
- Selecting parameters for programs being monitored is not just a technical matter. Hitachi recommends that you also consider both job characteristics and operating structure of the system.
The following describes examining alarms and reports.
(1) Examining alarms
This subsection describes details about examining alarms.
(a) How to set the threshold
Performance Management can issue an alarm event when the performance data collected by PFM - Agent or PFM - RM reaches a pre-defined threshold. The system administrator can use this function to decide which values, when exceeded for the items being monitored, will cause an alarm event. With Performance Management, the conditions that cause an alarm event to be issued can be defined for specified periods of time.
For example, you could set up the following configurations:
- Specify separate settings for the processes to be monitored during the day and during the night
- Because a system operator constantly monitors the system from a monitoring center during the daytime, specify that daytime notification is by a blinking icon on the monitoring console. For the night, specify that an email is sent to the system administrator's mobile phone.
The system administrator should decide the time periods during which the system is to be monitored.
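The day and night example above amounts to choosing a notification method from the time of day. The following Python sketch illustrates that decision only. The 09:00 to 18:00 daytime window and the method names `console_icon` and `email_to_mobile` are assumptions for the example, not Performance Management settings.

```python
from datetime import time

# Hypothetical schedule: (start, end, notification method).
# A time that falls in no window uses the default method.
SCHEDULE = [
    (time(9, 0), time(18, 0), "console_icon"),
]
DEFAULT_METHOD = "email_to_mobile"


def notification_method(now):
    """Return the notification method that applies at the given time of day."""
    for start, end, method in SCHEDULE:
        if start <= now < end:  # start inclusive, end exclusive
            return method
    return DEFAULT_METHOD


daytime = notification_method(time(10, 30))
night = notification_method(time(23, 30))
```

With these assumed windows, an alarm at 10:30 is shown on the console and an alarm at 23:30 is sent by email.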
Performance Management can be configured so that an alarm event does not occur if the threshold is exceeded due to a sudden load increase. By configuring Performance Management to send out a notification only after the threshold has been exceeded a certain number of times within a certain number of monitoring intervals, monitoring can be performed such that notification occurs only when the CPU has a heavy, continuous load. By suppressing notifications for sudden, temporary load increases, you can ensure that alarms are issued efficiently, taking into account the system attributes.
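The idea of notifying only after the threshold has been exceeded a certain number of times within a certain number of monitoring intervals can be sketched as a sliding window. The following Python class is a minimal illustration of that logic, not the product's implementation. The class name, the parameter names, and the sample CPU values are assumptions.

```python
from collections import deque


class DampedAlarm:
    """Report an alarm only when the threshold is exceeded at least
    `occurrences` times within the last `intervals` observations."""

    def __init__(self, threshold, occurrences, intervals):
        if not 1 <= occurrences <= intervals:
            raise ValueError("occurrences must be between 1 and intervals")
        self.threshold = threshold
        self.occurrences = occurrences
        # Keeps only the most recent `intervals` results.
        self._window = deque(maxlen=intervals)

    def observe(self, value):
        """Record one monitoring interval and return True if the alarm fires."""
        self._window.append(value > self.threshold)
        return sum(self._window) >= self.occurrences


# A single spike in CPU usage (%) does not fire the alarm.
spike_alarm = DampedAlarm(threshold=80.0, occurrences=3, intervals=5)
spike = [spike_alarm.observe(v) for v in [20, 95, 30, 25, 20, 22]]

# A sustained heavy load fires the alarm on the third exceedance.
load_alarm = DampedAlarm(threshold=80.0, occurrences=3, intervals=5)
sustained = [load_alarm.observe(v) for v in [85, 90, 70, 92, 95]]
```

In this sketch, `spike` contains only `False` values, while `sustained` becomes `True` from the fourth observation onward, when three of the last five values exceed the threshold.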
(b) Measures to take when a threshold is reached
The system administrator needs to decide the following for when a program reaches a critical state: how the problem location is to be detected, what primary measures are to be taken and by whom, and how to resolve the problem if the primary measures fail. Performance Management can automatically execute the following actions when an alarm event occurs:
- Notify the system administrator by email
- Send an SNMP trap
- Execute commands, including net send (messenger service) and wall, to notify the system administrator without sending an email or SNMP trap
- Issue a JP1 event to link with other JP1 products
The system administrator should examine measures to take when an alarm is sent by the operation monitoring system, including the above functions.
The following figure shows an example of the process to follow when an alarm occurs.
Figure 3-9 Example of the process to follow when a monitored system reaches a critical state
- Point:
- When an alarm event occurs, if the system administrator wants to automatically execute a recovery program and return the system to the normal operating status, Hitachi recommends that the system be set up to issue a JP1 event and link with systems such as a job management system.
(2) Examining reports
This subsection describes details about examining reports.
(a) What type of report to use
Performance Management can create a real-time report to indicate the current operation status and a historical report to show long-term trends in the operation status. The system administrator should examine which type of report should be created from the performance data in order to check the operation status. Creating easy-to-understand reports allows problems with the system to be understood correctly.
Performance Management can display reports on a daily, weekly, monthly, or yearly basis. You can configure Performance Management to display a report from the alarm icon when an alarm event occurs, or define an association between related reports that allows you to open a different report using a drill-down operation.
Performance Management can also display multiple reports combined on the same graph, allowing you to determine the operating status of the overall system in a comprehensive manner.
If reports need to be output regularly to analyze operations over a long period of time, the output format of the reports should also be examined. In Performance Management, you can display reports in the PFM - Web Console GUI, or use an operation command (the jpcrpt command) to output the report to a text file in CSV or HTML format.
For details on how to define and output a report, see the chapter that explains the creation of reports for operation analysis in the Job Management Partner 1/Performance Management User's Guide.
(b) What items to save in the database
The system administrator should decide the following regarding the performance data collected by monitoring agents:
- Whether to record the performance data in the Store database
To display the performance data as a historical report, the performance data to be displayed must be configured so that it is saved in the Store database.
- Performance data collection interval and timing
When many items are monitored, the system performance might decrease when data collection and recording occur. In such a case, the load on the system can be distributed over time if the performance data collection is staggered by item.
For example, if the performance data for two items is being collected every minute, the offset of one can be set to 0 seconds and the offset of the other to 20 seconds in order to stagger the collection start times by 20 seconds. When changing the offset, consider the load due to performance data collection and then specify the setting.
Figure 3-10 Example of the performance data collection interval and offset configuration
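The effect of the offset in the example above (two items collected every 60 seconds, with offsets of 0 and 20 seconds) can be checked with a short calculation. The following Python sketch is illustrative only. The function name and the 180-second horizon are assumptions, and the sketch assumes that the offset is smaller than the collection interval.

```python
def collection_times(interval, offset, horizon):
    """Return collection start times, in seconds, from 0 up to `horizon` (exclusive)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= offset < interval:
        raise ValueError("offset must be at least 0 and less than interval")
    return list(range(offset, horizon, interval))


# Two items collected every 60 seconds, staggered by 20 seconds.
item_a = collection_times(interval=60, offset=0, horizon=180)
item_b = collection_times(interval=60, offset=20, horizon=180)

# Start times shared by both items; empty when the stagger works.
overlap = sorted(set(item_a) & set(item_b))
```

With these values, item A starts at 0, 60, and 120 seconds and item B at 20, 80, and 140 seconds, so the two collections never start at the same moment.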
For details on how to record the performance data, see the chapter that describes the management of operation monitoring data in the Job Management Partner 1/Performance Management User's Guide.
All Rights Reserved. Copyright (C) 2009, Hitachi, Ltd.