Hitachi

JP1 Version 12 JP1/Performance Management Planning and Configuration Guide


1.2.2 Use in mission-critical systems that demand high reliability and availability

Performance Management can monitor system operation in mission-critical systems while maintaining high reliability and availability.


(1) Monitor the operation of an enterprise system without imposing a load on the monitored system

With Performance Management, the monitoring agent (rather than the monitoring manager) collects performance data about the operating status of database resources, business applications, and other resources, and stores the data in the Performance Management database (called the Store database). Because large amounts of performance data are not regularly sent over the network, the load on the network is reduced.

Performance Management also provides functions that prevent the data collected by monitoring from becoming excessively large: only the necessary operating information is collected, so monitoring does not take up too much disk space.

This functionality is illustrated by the following examples:

For details about managing data in Performance Management, see 3.2 Functions for collecting and managing performance data.
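As a rough illustration of the agent-side approach, the following Python sketch keeps only selected fields of each collected sample and retains a bounded number of records locally, discarding the oldest when the limit is reached. The class name `LocalStore`, the field names, and the retention limit are hypothetical and do not describe the Store database or its actual settings.

```python
from collections import deque

# Hypothetical selection of fields worth keeping, and a hypothetical retention limit.
KEEP_FIELDS = ("timestamp", "cpu_percent", "free_disk_percent")
MAX_RECORDS = 3


class LocalStore:
    """Agent-side store that keeps selected fields and a bounded number of records."""

    def __init__(self, keep_fields=KEEP_FIELDS, max_records=MAX_RECORDS):
        self.keep_fields = keep_fields
        # deque with maxlen silently drops the oldest record once it is full.
        self.records = deque(maxlen=max_records)

    def collect(self, raw: dict) -> None:
        """Store only the selected fields of one raw sample, locally on the agent."""
        self.records.append({k: raw[k] for k in self.keep_fields if k in raw})
```

Bounding the record count caps disk usage no matter how long the agent runs, and nothing in this sketch crosses the network. For the data management functions the product actually provides, see 3.2 Functions for collecting and managing performance data.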

(2) System administrators can be notified of problems before they have an effect on business

Performance Management can be configured to notify the system administrator when a monitored system reaches a critical state.

This allows problems in monitored systems to be detected at an early stage and resolved before they affect business operations.

For example, the system administrator can be notified by email when CPU usage of 80% or more is detected on a server in a specific system, or a system operator can be notified by a blinking icon on the console when the proportion of available disk space falls below 30%.
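The two threshold examples above can be expressed as a simple check, sketched below in Python. The `Sample` type, the `evaluate` function, and the notification strings are hypothetical; in Performance Management itself, such conditions are configured as alarms.

```python
from dataclasses import dataclass

# Thresholds taken from the examples in the text.
CPU_ALARM_PERCENT = 80.0        # email when CPU usage is 80% or more
FREE_DISK_ALARM_PERCENT = 30.0  # blink console icon when free disk space is below 30%


@dataclass
class Sample:
    """One hypothetical performance sample collected by an agent."""
    host: str
    cpu_percent: float
    free_disk_percent: float


def evaluate(sample: Sample) -> list[str]:
    """Return the notifications this sample should trigger (empty if none)."""
    notifications = []
    if sample.cpu_percent >= CPU_ALARM_PERCENT:
        notifications.append(f"email: {sample.host} CPU {sample.cpu_percent:.0f}%")
    if sample.free_disk_percent < FREE_DISK_ALARM_PERCENT:
        notifications.append(f"blink icon: {sample.host} free disk {sample.free_disk_percent:.0f}%")
    return notifications
```

Note the boundary handling: `>=` matches "80% or more", and `<` matches "falls below 30%", so a sample at exactly 30% free disk space triggers nothing.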

You can also configure Performance Management to execute a command automatically when a system reaches a critical state.
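Executing a command when an alarm fires amounts to launching a configured program and giving it details of the alarm. The sketch below uses Python's `subprocess.run`; the `on_alarm` function and the `ALARM_NAME` environment variable are hypothetical conventions for illustration, not the way Performance Management passes alarm information to commands.

```python
import os
import subprocess
import sys


def on_alarm(alarm_name: str, command: list[str]) -> int:
    """Run the command configured for an alarm and return its exit status.

    The alarm name is exposed to the command through a hypothetical
    ALARM_NAME environment variable.
    """
    env = {**os.environ, "ALARM_NAME": alarm_name}
    # check=False: a failing recovery command should not crash the monitor itself.
    result = subprocess.run(command, env=env, capture_output=True, text=True, check=False)
    return result.returncode


if __name__ == "__main__":
    # Example: run a tiny Python command that reads the alarm name.
    status = on_alarm("cpu-high", [sys.executable, "-c", "import os; print(os.environ['ALARM_NAME'])"])
    print(status)
```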

In this manner, with Performance Management, a small team can efficiently monitor even large-scale systems.

Figure 1‒11: Example of using email to notify a system administrator of a problem in the system

[Figure]

For details about the alarms issued when performance data exceeds a threshold, see 3.3 Functions for alerting the user when the system reaches a critical state.

(3) Performance Management can be used to achieve stable system operation 24 hours a day, 365 days a year

By incorporating Performance Management into a cluster system or a multiple-monitoring system, you can create a highly reliable system that continues to operate even in the event of a system error.

Figure 1‒12: Example of cluster system operation

[Figure]

Figure 1‒13: Example of multiple-monitoring system operation

[Figure]

For details about how to configure and operate Performance Management in a cluster system, see the chapter that describes the cluster system configuration and operation in the JP1/Performance Management User's Guide.

For details about how to configure and operate Performance Management in a multiple-monitoring system, see the chapter that describes multiple-monitoring system configuration and operation in the JP1/Performance Management User's Guide.

(4) Ensuring stable system operation by detecting faults in the monitoring system itself

Performance Management can monitor the operating status of the PFM - Agent or PFM - RM services and of the host on which PFM - Agent or PFM - RM is running. It can also monitor the operating status of the host that is monitored by PFM - RM. This monitoring is performed by the health check function, which lets you monitor the operating status of the host and confirm whether PFM - Agent or PFM - RM is correctly monitoring its targets.

You can view the results in the following windows in PFM - Web Console:

Because the health check function gives you the option of saving monitoring results, you can view a report showing the operating status over time.

If you define alarms for the health check results on the operating status of the host, the system can issue an alarm event when the health check function detects that the host is operating abnormally or that PFM - Agent or PFM - RM is malfunctioning. The system can also execute an action in this situation, such as sending an email.
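The way health check results feed into alarms can be pictured as a small classifier that raises an event whenever the combined host and agent status changes. In the Python sketch below, the `Health` states, the `classify` and `alarm_events` functions, and the rule of reporting only state changes are hypothetical simplifications; they do not reproduce the status values that the health check function actually reports.

```python
from enum import Enum


class Health(Enum):
    """Hypothetical combined status of a monitored host and its agent."""
    NORMAL = "normal"
    AGENT_ABNORMAL = "agent abnormal"
    HOST_DOWN = "host down"


def classify(host_reachable: bool, agent_responding: bool) -> Health:
    """A down host takes precedence, because its agent cannot respond either."""
    if not host_reachable:
        return Health.HOST_DOWN
    if not agent_responding:
        return Health.AGENT_ABNORMAL
    return Health.NORMAL


def alarm_events(results: list[tuple[bool, bool]]) -> list[str]:
    """Turn a saved series of (host_reachable, agent_responding) checks into events.

    An event is emitted only when the status changes, so a long outage
    produces one event rather than one per check.
    """
    events = []
    previous = Health.NORMAL
    for host_ok, agent_ok in results:
        current = classify(host_ok, agent_ok)
        if current is not previous:
            events.append(f"{previous.value} -> {current.value}")
        previous = current
    return events
```

Because the input is the saved series of check results, the same data could also back a report of operating status over time.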

Figure 1‒14: Overview of monitoring the operating status by using the health check function (for agent monitoring)

[Figure]

For remote monitoring, when abnormal operation is detected on the monitored host, you can trigger an alarm event and perform other actions, such as sending an email.

Additionally, if a PFM service terminates abnormally for any reason, Performance Management can restart the service automatically. This function is called the PFM service automatic restart function.

The PFM service automatic restart function is available for the following PFM services: PFM - Manager, Action Handler, PFM - Agent, and PFM - RM. This function cannot be used for the Status Server service. For details about services, see 3.1 Performance Management services.

This function allows you to continue monitoring, even when a PFM service abnormally terminates.
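Conceptually, automatic restart is a supervisor loop: start the service, wait for it to exit, and restart it if it terminated abnormally, up to some limit. The Python sketch below treats a nonzero exit status as abnormal termination; the `supervise` function, the restart limit, and the retry interval are hypothetical parameters, not the product's actual restart settings.

```python
import subprocess
import sys
import time


def supervise(argv: list[str], max_restarts: int = 3, interval_sec: float = 1.0) -> tuple[int, int]:
    """Run argv and restart it after abnormal termination.

    Returns (last exit status, number of restarts performed).
    A zero exit status is treated as a normal stop and is not restarted.
    """
    restarts = 0
    while True:
        status = subprocess.run(argv, check=False).returncode
        if status == 0 or restarts >= max_restarts:
            return status, restarts
        restarts += 1
        time.sleep(interval_sec)  # pause before restarting to avoid a tight crash loop


if __name__ == "__main__":
    # A child that always fails: prints (1, 2) after two restarts.
    print(supervise([sys.executable, "-c", "import sys; sys.exit(1)"], max_restarts=2, interval_sec=0.1))
```

Limiting the number of restarts and pausing between them keeps a service that fails immediately on startup from being restarted endlessly.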

Figure 1‒15: Overview of the PFM service automatic restart function

[Figure]