Hitachi

JP1 Version 12 JP1/Performance Management Planning and Configuration Guide


2.4.7 Examining measures to take when the operation monitoring system itself malfunctions

To keep the operation monitoring system itself running stably, the system administrator decides in advance how problems in the operation monitoring system will be detected and what measures will be taken when a malfunction occurs.

(1) Detecting problems in Performance Management

With Performance Management, you can use the health check function to monitor the status of the host where the monitoring agent is running and to check whether the monitoring agent is working correctly. The health check function provides the following two monitoring levels:

Monitoring the operating status of the host running the monitoring agent:

The health check function monitors the operating status of a host running PFM - Agent or PFM - RM, or the operating status of the hosts monitored by PFM - RM. You can check the operating status from PFM - Web Console.

Monitoring the operating status of the monitoring agent service:

In addition to monitoring the operating status of the host running PFM - Agent or PFM - RM, the health check function monitors the operating status of the Agent Collector, Remote Monitor Collector, Agent Store, and Remote Monitor Store services. You can check the operating statuses from PFM - Web Console.

You can change how the health check function operates according to what it is to monitor and the desired monitoring conditions. However, the prerequisites differ for each mode of operation. For details about the prerequisites for using the health check function, see the chapter that describes detecting problems in the JP1/Performance Management User's Guide.

You cannot use the health check function to monitor the operating status of PFM - Manager itself. However, by using the jpctool service list command, you can check the detailed status of the services of PFM - Manager, PFM - Agent, or PFM - RM. You can also detect errors by linking with another product in the JP1 series (JP1/Base).
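
As a rough illustration of checking service status from a script, the following Python sketch runs the jpctool service list command and reports the services whose status is not Active. The command options (-id and -host with wildcards), the column layout of the output, and the Active status string are assumptions made for this sketch. Confirm them in the command reference for your version before relying on it.

```python
import subprocess

# Assumption: a service that is running normally is reported with this status.
HEALTHY_STATUS = "Active"


def list_services() -> str:
    """Run the status listing and return its standard output as text."""
    # Assumption: "-id * -host *" lists every service on every host.
    # The arguments are passed as a list, so "*" is not expanded by a shell.
    result = subprocess.run(
        ["jpctool", "service", "list", "-id", "*", "-host", "*"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def find_unhealthy(listing: str) -> list[tuple[str, str, str]]:
    """Return (host, service_id, status) for each service that is not healthy.

    Assumes that each data row is whitespace-separated, starts with the host
    name and the service ID, and ends with the status.
    """
    unhealthy = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank or short lines, separator rows, and the header row.
        if len(fields) < 3 or set(line.strip()) <= set("- ") or fields[0] == "Host":
            continue
        host, service_id, status = fields[0], fields[1], fields[-1]
        if status != HEALTHY_STATUS:
            unhealthy.append((host, service_id, status))
    return unhealthy
```

For example, find_unhealthy(list_services()) could be called periodically from a job scheduler, and a non-empty result could be forwarded to whatever notification mechanism is in use.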

For details about detecting problems within Performance Management, see the chapter that describes detecting problems within Performance Management in the JP1/Performance Management User's Guide.

(2) Automatically restarting PFM services

If for some reason a PFM service abnormally terminates, the PFM service automatic restart function of Performance Management allows you to automatically restart the service. This ensures continuous system monitoring. If you are not using a cluster system that has high system availability, we recommend using the PFM service automatic restart function. You can automatically restart PFM services in the following ways:

Automatic restart functionality

If for some reason a PFM service abnormally terminates, this functionality automatically restarts the service.

Scheduled restart functionality

This functionality restarts a PFM service at scheduled intervals. This helps to limit the effects of memory leaks and handle leaks caused by problems with the OS or the PFM service itself.

For the prerequisite conditions and procedure to use the PFM service automatic restart function, see the chapter that describes detecting problems in Performance Management in the JP1/Performance Management User's Guide.
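
The difference between the two restart approaches can be pictured with a small Python sketch. It is only a conceptual model of the decision, not a description of how Performance Management implements the function. The names ServiceState and should_restart, and the idea of expressing the schedule as an interval in hours, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceState:
    running: bool              # False if the service terminated abnormally
    hours_since_start: float   # time elapsed since the last start or restart


def should_restart(state: ServiceState,
                   auto_restart: bool,
                   scheduled_interval_hours: Optional[float]) -> Optional[str]:
    """Return the reason for a restart, or None if no restart is needed."""
    # Automatic restart: react to an abnormal termination.
    if auto_restart and not state.running:
        return "automatic"
    # Scheduled restart: periodically restart a service that is still running.
    if (scheduled_interval_hours is not None
            and state.running
            and state.hours_since_start >= scheduled_interval_hours):
        return "scheduled"
    return None
```

In this model the automatic restart reacts to a failure that has already happened, whereas the scheduled restart is preventive and applies even when the service appears healthy.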

(3) Collecting the maintenance information when a problem occurs

To identify the cause of a problem, you might need information such as the OS logs and the internal logs output by Performance Management, in addition to the operating information. Performance Management provides operation commands (the jpcras and jpcwras commands) that collect this maintenance information in a single operation.

For details about collecting the maintenance information when a problem occurs, see the chapter that describes the error handling procedures in the JP1/Performance Management User's Guide.
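
As a hedged illustration, the collection step could be scripted in Python as follows. The sketch only assembles and runs the jpcras command line. The argument order shown (output directory, then a service key, then the kind of data to collect) and the value all are assumptions to be confirmed in the command reference, and build_ras_command and collect are hypothetical helper names.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def build_ras_command(output_dir: Path, service_key: str = "all",
                      kind: str = "all") -> list[str]:
    """Assemble the command line without running it."""
    # Assumed argument order: output directory, service key, kind of data.
    return ["jpcras", str(output_dir), service_key, kind]


def collect(base_dir: Path) -> Path:
    """Collect maintenance information into a new timestamped directory."""
    # A separate directory per run keeps the data for each incident apart.
    output_dir = base_dir / datetime.now().strftime("ras_%Y%m%d_%H%M%S")
    output_dir.mkdir(parents=True)
    # check=True raises CalledProcessError if the command reports a failure.
    subprocess.run(build_ras_command(output_dir), check=True)
    return output_dir
```

Keeping the command assembly separate from its execution makes it possible to review or log the exact command line before it is run on a production host.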