Hitachi

JP1 Version 12 JP1/Performance Management User's Guide


16.2.1 Configuring the health check function


(1) Setting up the health check function

The following prerequisites must be met before you can use the health check function.

Name resolution

On the PFM - Manager host, the monitoring host name# of each PFM - Agent host must resolve to an IP address that can be used for communication, through the jpchosts file, the hosts file, or a DNS server.

#: The host name returned by the hostname command on Windows or the uname -n command on UNIX, or the alias if the function for setting monitoring host names is used.
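The name-resolution prerequisite can be spot-checked from the PFM - Manager host with a short script. The following is a minimal Python sketch, not a Performance Management tool, and its function names are hypothetical. It asks the operating system resolver (the hosts file and DNS) for an IPv4 address, so it does not consult the jpchosts file and does not know about aliases set with the function for setting monitoring host names.

```python
import platform
import socket
from typing import Optional


def local_monitoring_host_name() -> str:
    # platform.node() normally matches `hostname` (Windows) or `uname -n` (UNIX).
    # It does NOT return an alias set with the monitoring-host-name function.
    return platform.node()


def resolve_host(host_name: str) -> Optional[str]:
    """Return an IPv4 address for host_name via the OS resolver, or None.

    Assumption: only the hosts file and DNS are consulted here; the
    PFM-specific jpchosts file is ignored by this check.
    """
    if not host_name:
        return None
    try:
        return socket.gethostbyname(host_name)
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # names that cannot be IDNA-encoded (for example, over-long labels).
        return None


if __name__ == "__main__":
    # Hypothetical list: replace with the monitoring host names of your
    # PFM - Agent hosts and run the script on the PFM - Manager host.
    agent_hosts = ["localhost"]
    for name in agent_hosts:
        address = resolve_host(name)
        print(f"{name}: {address if address else 'NOT RESOLVED'}")
```

In practice, run local_monitoring_host_name() on each agent host to learn the name to check, then run resolve_host() with that name on the PFM - Manager host. A successful lookup only shows that a name resolves; it does not prove that the address is reachable.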

Monitoring the operating status of the host running the monitoring agent:

  • Version 08-11 or later of PFM - Manager and PFM - Web Console

  • Any version of PFM - Agent or PFM - RM

To monitor the operating status of a host monitored by PFM - RM, you must enable the status management function on the PFM - RM host. If the status management function is not enabled, the status of the remote agent is not recognized correctly.

Monitoring the operating status of the monitoring agent service:

The health check function uses the status management function to monitor the operating status of PFM - Agent services, so the monitored product must support the status management function. The prerequisites are as follows (any version of PFM - RM can be used):

  • Version 08-11 or later of PFM - Manager and PFM - Web Console

  • The version of PFM - Agent used supports the status management function.

  • The status management function on the PFM - Agent or PFM - RM host is enabled.

If the second or third condition is not met, the health check function cannot check the service status of PFM - Agent or PFM - RM. For details on which versions of PFM - Agent support the status management function, see 16.3 Using the status management function to check service status. The following table shows, by PFM - Agent version, whether operating status monitoring of services is supported.

Table 16‒1: Support for operating status monitoring of services by PFM - Agent version

Status management function on monitored agent host | Version of monitored agent | Operating status monitoring of services

Enabled | 08-00 or later | Can be used

Enabled | 07-00 or earlier#1 | Cannot be used#2

Disabled | n/a | Cannot be used#3

Legend:

n/a: Not applicable.

#1

Applies when PFM - Agent 07-00 or earlier is installed on the same host as PFM - Agent 08-00 or later or PFM - Base 08-00 or later, and the status management function is enabled on that host.

#2

The operating status of the agent service appears as Not Supported.

#3

The operating status of the agent service appears as Unconfirmed.
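The decision in Table 16‒1, together with the PFM - Manager version prerequisite, can be summarized as a small function. This is a hypothetical Python sketch for illustration only. It assumes version strings in the VV-RR form used in this manual (for example, 08-11) and treats any PFM - Agent version earlier than 08-00 like the "07-00 or earlier" row.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a version string such as "08-11" into (8, 11)."""
    major, minor = version.split("-")[:2]
    return int(major), int(minor)


def manager_supports_health_check(manager_version: str) -> bool:
    # PFM - Manager and PFM - Web Console must be 08-11 or later.
    return parse_version(manager_version) >= (8, 11)


def agent_service_status(status_mgmt_enabled: bool,
                         agent_version: Optional[str]) -> Optional[str]:
    """Apply Table 16-1 to one monitored agent.

    Returns None when operating status monitoring of services can be used,
    otherwise the status shown for the agent service.
    """
    if not status_mgmt_enabled:
        return "Unconfirmed"                       # footnote #3
    if agent_version is None:
        raise ValueError("agent_version is required when status management is enabled")
    # Assumption: every version earlier than 08-00 behaves like 07-00 or earlier.
    if parse_version(agent_version) >= (8, 0):
        return None                                # can be used
    return "Not Supported"                         # footnote #2
```

For example, agent_service_status(True, "07-00") returns "Not Supported", matching footnote #2.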

For details on how to configure the status management function, see 16.3.1 Configuring the status management function.

To monitor the status of a host monitored by PFM - RM, you must enable polling with an appropriate PFM - RM property. For details on settings for PFM - RM polling, see 16.2.1(1)(d) Setting PFM - RM polling.

(a) Enabling the health check function

To enable the health check function on the PFM - Manager host:

  1. Stop Performance Management services.

    If Performance Management services are running on a physical host, stop the services by using the following command:

    jpcspm stop -key jp1pc

    To stop Performance Management services on a logical host, use the cluster software.

  2. Execute the jpcconf hc enable command.

    To enable the health check function, use the following command:

    jpcconf hc enable

  3. Check the status of the health check function.

    To confirm that the health check function is available, use the following command:

    jpcconf hc display

  4. Start Performance Management services.

    To start all Performance Management services on a physical host, use the following command:

    jpcspm start -key jp1pc

    To start all Performance Management services on a logical host, use the cluster software.

    The service IDs of the health check agent are 0A1host-name and 0S1host-name.

Note:

When the health check function is enabled on the PFM - Manager host, the health check agent starts as one of the PFM - Manager services when you start PFM - Manager with the jpcspm start command, and stops when you stop the PFM - Manager services with the jpcspm stop command.

You cannot specify agt0, the service key of the health check agent, as the service key in the jpcspm start or jpcspm stop command.

Tip

The health check function provides two monitoring levels. One level is Service, which allows monitoring of the operating status of services. The other level is Host, which allows monitoring of the operating status of agent hosts. The default level is Host.

For details on how to configure a monitoring level, see 16.2.1(2) Setting the health check agent properties.

(b) Disabling the health check function

To disable the health check function on the PFM - Manager host:

  1. Stop Performance Management services.

    If Performance Management services are running on a physical host, stop the services by using the following command:

    jpcspm stop -key jp1pc

    To stop Performance Management services on a logical host, use the cluster software.

  2. Execute the jpcconf hc disable command.

    To disable the health check function, use the following command:

    jpcconf hc disable

  3. Check the status of the health check function.

    To confirm that the health check function is unavailable, use the following command:

    jpcconf hc display

    If PFM - Manager is running in a logical host environment, execute the jpcconf hc display command on the PFM - Manager host on either the executing node or the standby node.

  4. Start Performance Management services.

    To start all Performance Management services on a physical host, use the following command:

    jpcspm start -key jp1pc

    To start all Performance Management services on a logical host, use the cluster software.

For details on each command, see the manual JP1/Performance Management Reference.
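Both procedures run the same four commands on a physical host and differ only in the argument passed to jpcconf hc. The following Python sketch prints that sequence and, if dry_run is set to False, runs it. It is a hypothetical helper, not part of Performance Management. It assumes that the commands are on the PATH, that a nonzero exit code means a command failed, and that PFM - Manager runs on a physical host; for a logical host, stop and start the services with the cluster software instead.

```python
import subprocess
from typing import List


def health_check_commands(action: str) -> List[List[str]]:
    """Return steps 1-4 for enabling or disabling the health check function
    on a physical PFM - Manager host."""
    if action not in ("enable", "disable"):
        raise ValueError("action must be 'enable' or 'disable'")
    return [
        ["jpcspm", "stop", "-key", "jp1pc"],    # 1. stop PFM services
        ["jpcconf", "hc", action],              # 2. enable or disable
        ["jpcconf", "hc", "display"],           # 3. show the current setting
        ["jpcspm", "start", "-key", "jp1pc"],   # 4. start PFM services
    ]


def run_procedure(action: str, dry_run: bool = True) -> None:
    for command in health_check_commands(action):
        print(" ".join(command))
        if not dry_run:
            # Assumption: a nonzero exit code means the command failed;
            # check=True raises CalledProcessError and stops the sequence.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_procedure("enable")  # dry run: only prints the commands
```

The sketch does not parse the jpcconf hc display output, because its format is not described in this section; confirm the displayed status yourself before relying on the result.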

(c) Checking the status of the health check agent

Use the jpctool service list command to check the status of the health check agent. You can also use the health check function itself to check the operating status of the health check agent. If the Agent Collector or Remote Monitor Collector service of the health check agent has terminated abnormally, incorrect health check results might be displayed.

(d) Setting PFM - RM polling

To monitor the status of a host monitored by PFM - RM, you must enable polling of the monitored host with an appropriate PFM - RM property. The following table describes the property to be set.

Table 16‒2: Setting the polling property

Folder name | Property name | Description

Health Check Configurations | Health Check for Target Hosts | Specifies whether to poll the monitored hosts. Yes: performs polling. No: does not perform polling. The default is No.

If polling is disabled, the operating status of the monitored host appears as Not Supported.

(2) Setting the health check agent properties

When the health check function is enabled, you can configure it, for example the collection interval for operation monitoring data and the monitoring level, by setting the health check agent properties in the Services tree of PFM - Web Console. The following table lists the health check agent properties you can set.

Table 16‒3: Health check agent properties

Folder name | Property name | Description

Detail Records - HC | Description | Displays Health Check Detail as a description for the record.

Detail Records - HC | Log | Specifies whether to collect performance data. Yes: collects performance data. No: does not collect performance data. The default is No.

Detail Records - HC | Collection Interval | Specifies the collection interval in seconds, as a value in the range from 0 to 2147483647. The default is 300. This value also serves as the polling interval of the health check function.#4

Detail Records - HC | Collection Offset | Specifies the offset of the collection start time in seconds, as a value in the range from 0 to 32767. The default is 0.

Detail Records - HC | LOGIF | Specifies the conditions for acquiring logs.

Health Check Configurations#1 | Monitoring Level#2 | Specifies the monitoring level. Specify Service to monitor the operating status of agent services, or Host to monitor the operating status of agent hosts. The default is Host. This property cannot be changed while monitoring is suspended for any host or agent.

Health Check Configurations#1 | Polling Interval | Displays the polling interval. This value is taken from the Collection Interval setting of the PD_HC record.

Health Check Configurations#1 | Incl. Action Handler | Specifies whether to include the Action Handler service when monitoring the operating status of services. Yes: the Action Handler service is monitored. No: the Action Handler service is not monitored. The default is No.

Health Check Configurations#1 | Busy as Inactive | Specifies whether an agent whose service status remains Busy for an extended period is considered inactive. Yes: the agent is considered inactive.#3 No: the agent is not considered inactive. The default is No. If you specify Yes, the Time to Busy as Inactive properties take effect. You can check the service status of an agent in the Status column of the jpctool service list command output.

Health Check Configurations#1 | Time to Busy as Inactive Collector | Specifies, in seconds, how long#3 the Agent Collector or Remote Monitor Collector service must remain Busy before it is considered inactive. The default is 300.

Health Check Configurations#1 | Time to Busy as Inactive Store | Specifies, in seconds, how long#3 the Agent Store or Remote Monitor Store service must remain Busy before it is considered inactive. The default is 300.

Health Check Configurations#1 | Time to Busy as Inactive AH | Specifies, in seconds, how long#3 the Action Handler service must remain Busy before it is considered inactive. The default is 300.

JP1 Event | - | Specifies whether to issue health check events as JP1 events. Yes: health check events are issued as JP1 events. No: health check events are not issued as JP1 events. The default is No.

JP1 Event | Not Supported | Specifies how the Not Supported health check event is issued as a JP1 event. None: the event is not issued as a JP1 event. Information, Warning, or Error: the event is issued as a JP1 event with the selected value as SEVERITY. The default is None.

JP1 Event | Running | Specifies how the Running health check event is issued as a JP1 event. None: the event is not issued as a JP1 event. Information, Warning, or Error: the event is issued as a JP1 event with the selected value as SEVERITY. The default is Information.

JP1 Event | Incomplete | Specifies how the Incomplete health check event is issued as a JP1 event. None: the event is not issued as a JP1 event. Information, Warning, or Error: the event is issued as a JP1 event with the selected value as SEVERITY. The default is Warning.

JP1 Event | Stopped | Specifies how the Stopped health check event is issued as a JP1 event. None: the event is not issued as a JP1 event. Information, Warning, or Error: the event is issued as a JP1 event with the selected value as SEVERITY. The default is Error.

JP1 Event | Unconfirmed | Specifies how the Unconfirmed health check event is issued as a JP1 event. None: the event is not issued as a JP1 event. Information, Warning, or Error: the event is issued as a JP1 event with the selected value as SEVERITY. The default is Error.

JP1 Event | Host Not Available | Specifies how the Host Not Available health check event is issued as a JP1 event. None: the event is not issued as a JP1 event. Information, Warning, or Error: the event is issued as a JP1 event with the selected value as SEVERITY. The default is Error.

#1

When you change a setting in the Health Check Configurations folder, the new setting takes effect from the next polling interval.

#2

When you change the Monitoring Level setting, the health check results displayed in a realtime report of the health check agent differ according to whether polling under the new setting had taken place by the time the report was displayed.

Monitoring agent | Displayed health check results

Monitoring agent for which polling under the new setting has completed | The newest health check results recorded under the new setting

Monitoring agent for which polling under the new setting has not yet taken place | The newest health check results recorded under the old setting

For this reason, the report may briefly display results from both the old and new settings.

#3

The length of time a service is in Busy status is calculated from the difference between the time when polling occurred (the time on the host running PFM - Manager) and the time when the status of the service changed to Busy (the time on the host running PFM - Agent or PFM - RM). Make sure that the clocks are synchronized on all hosts that run Performance Management services.

#4

Specify the default value, or a value that is at least 60 seconds and a divisor of 3,600. If you specify a record collection interval longer than 3,600 seconds (1 hour), make sure that the value is a multiple of 3,600 and a divisor of 86,400 (24 hours). If the record collection interval is shorter than the default value or shorter than 60 seconds, the load on the Agent Collector and Agent Store services on the health check agent host might become too heavy, and collected performance data might not be saved.
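Several of the property rules above can be expressed as checks: the Collection Interval guidance in footnote #4, the Busy duration calculation in footnote #3, and the default SEVERITY of each JP1 event. The following Python sketch is illustrative only and does not describe how the health check agent is implemented. The function names are hypothetical, and the Python value None stands for the None setting. It assumes that a service counts as inactive once its Busy duration reaches the threshold (whether the boundary is inclusive is not stated), and that the JP1 Event property (-) must be Yes for any JP1 event to be issued.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

DEFAULT_COLLECTION_INTERVAL = 300  # seconds (Collection Interval default)


def is_recommended_collection_interval(seconds: int) -> bool:
    """Check a Collection Interval value against footnote #4."""
    if not 0 <= seconds <= 2147483647:
        return False                          # outside the accepted range
    if seconds == DEFAULT_COLLECTION_INTERVAL:
        return True
    if seconds < 60:
        return False                          # data might not be saved
    if seconds <= 3600:
        return 3600 % seconds == 0            # divisor of 1 hour
    # Over 1 hour: a multiple of 3,600 and a divisor of 86,400 (24 hours).
    return seconds % 3600 == 0 and 86400 % seconds == 0


def busy_as_inactive(enabled: bool, busy_since: datetime,
                     polled_at: datetime, threshold_seconds: int = 300) -> bool:
    """Footnote #3: Busy time = polling time (PFM - Manager host clock)
    minus the time the service became Busy (agent host clock), so the
    result is only meaningful if the clocks are synchronized.
    Assumption: reaching the threshold exactly counts as inactive.
    """
    if not enabled:                           # Busy as Inactive = No
        return False
    return polled_at - busy_since >= timedelta(seconds=threshold_seconds)


# Default SEVERITY per health check event; None means "not issued".
DEFAULT_JP1_SEVERITY: Dict[str, Optional[str]] = {
    "Not Supported": None,
    "Running": "Information",
    "Incomplete": "Warning",
    "Stopped": "Error",
    "Unconfirmed": "Error",
    "Host Not Available": "Error",
}


def jp1_event_severity(event: str, jp1_event_enabled: bool,
                       overrides: Optional[Dict[str, Optional[str]]] = None
                       ) -> Optional[str]:
    """Return the SEVERITY of the JP1 event issued for a health check event,
    or None if no JP1 event is issued.
    Assumption: the JP1 Event property (-) must be Yes for any event."""
    if not jp1_event_enabled:
        return None
    settings = {**DEFAULT_JP1_SEVERITY, **(overrides or {})}
    return settings[event]
```

For example, 7200 (2 hours) passes the interval check because it is a multiple of 3,600 and divides 86,400, while 5400 does not.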