Job Management Partner 1/Base User's Guide



1.7.2 Remote host monitoring with the health check function

The health check function is designed to detect problems in JP1/Base, but it cannot do so if a hangup or other error occurs in the function itself. Also, in a system that uses JP1/IM - Manager, if an error occurs in the event service, JP1 events cannot be issued or forwarded, so the higher-level host cannot be notified even if an error is detected.

To cover cases in which a process error on the local host cannot be detected or reported, the JP1/Base health check function and the event service can be monitored from a remote host. One host can monitor a maximum of 1,024 remote hosts.

The following describes remote host monitoring and system operation with JP1/IM - Manager or JP1/AJS.

Organization of this subsection
(1) Remote host monitoring in a system that uses JP1/IM - Manager
(2) Remote host monitoring in a system that uses JP1/AJS
(3) System operation with remote host monitoring

(1) Remote host monitoring in a system that uses JP1/IM - Manager

You can monitor whether the JP1/Base health check function and event service are operating normally on the remote hosts.

Remote host monitoring in a system that uses JP1/IM - Manager is described below, using the following configuration example.

Figure 1-21 Example of remote host monitoring in a system that uses JP1/IM - Manager

[Figure]

The hosts in this example have the following settings.

Host    Purpose          Setting for remote host monitoring
hostA   Manager host     Monitor hostB and hostX.
hostB   Submanager host  Monitor hostA, hostY, and hostZ.
hostX   Agent host       None
hostY   Agent host       None
hostZ   Agent host       None

The following processing is performed if an error occurs in the health check function or event service at agent hostY or manager hostA.

Error in the health check function at hostY
The health check function at hostB detects the error and issues a JP1 event. The JP1 event is forwarded to hostA. At hostA, a message about the problem at hostY appears in JP1/IM - View.

Error in the event service at hostY
The health check function on hostY detects an error, but cannot issue a JP1 event. Therefore, the health check function at hostB detects the error and issues a JP1 event. The JP1 event issued by hostB is forwarded to hostA. At hostA, a message about the problem at hostY appears in JP1/IM - View.

Error in the health check function at hostA
The health check function at hostB detects the error and issues a JP1 event. The JP1 event is forwarded to hostA. At hostA, a message about the problem on the local host appears in JP1/IM - View.

Error in the event service at hostA
If the health check function is enabled at JP1/IM - Manager on hostA, the health check function at JP1/IM - Manager detects the error in the event service on the local host and a message appears in JP1/IM - View.
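
The monitoring relationships in this configuration example can be expressed as a small map to check which host would detect an error remotely. The following Python sketch is illustrative only: the MONITORS dictionary and the function names are hypothetical and are not part of JP1/Base. Only the host relationships are taken from the settings listed for this example.

```python
# Hypothetical model of the configuration example:
# each key is a monitoring host, each value lists the hosts it monitors.
MONITORS = {
    "hostA": ["hostB", "hostX"],
    "hostB": ["hostA", "hostY", "hostZ"],
    "hostX": [],
    "hostY": [],
    "hostZ": [],
}


def remote_detectors(failed_host, monitors=MONITORS):
    """Return the hosts that monitor failed_host remotely, sorted by name."""
    return sorted(
        monitor
        for monitor, targets in monitors.items()
        if failed_host in targets
    )


def unmonitored_hosts(monitors=MONITORS):
    """Return the hosts that no other host monitors, sorted by name."""
    monitored = {target for targets in monitors.values() for target in targets}
    return sorted(host for host in monitors if host not in monitored)
```

In this model, remote_detectors("hostY") and remote_detectors("hostA") both return hostB, which matches the cases described above, and unmonitored_hosts() returns an empty list because every host is monitored by at least one other host.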

(2) Remote host monitoring in a system that uses JP1/AJS

By specifying the target remote hosts, you can monitor whether the JP1/Base health check function is operating normally on those hosts.

To report JP1/Base process errors to the manager host in a system that uses JP1/AJS, monitor the messages output to the syslog or event log by the health check function, and notify the manager host using JP1/Cm2/OAA and NNM.

Remote host monitoring in a system that uses JP1/AJS is described below, using the following configuration example.

Figure 1-22 Example of remote host monitoring in a system that uses JP1/AJS

[Figure]

The hosts in this example have the following settings.

Host    Purpose       Setting for remote host monitoring
hostA   Manager host  Monitor hostX and hostY.
hostX   Agent host    Monitor hostA.
hostY   Agent host    None

The following processing is performed if an error occurs in the health check function at agent hostX or manager hostA.

Error in the health check function at hostX
The health check function at hostA detects the error and outputs a message to the syslog or event log. JP1/Cm2/OAA on hostA detects the output message and notifies NNM. A message about the problem at hostX appears in NNM.

Error in the health check function at hostA
The health check function at hostX detects the error and outputs a message to the syslog or event log. JP1/Cm2/OAA on hostX detects the output message and notifies NNM. A message about the problem at hostA appears in NNM.

(3) System operation with remote host monitoring

The following describes the system operation when monitoring remote hosts.

(a) Operation with a large number of monitored hosts

When two or more remote hosts are monitored from a single host, the health check function checks the status of the JP1/Base processes at each host in turn. Each check takes about 3 seconds per host, so a full pass can take a long time when a large number of hosts are monitored.

For example, it would take about 600 seconds (200 hosts × 3 seconds) for one host to check 200 remote hosts. You can reduce the monitoring time by splitting the target hosts into groups and setting a dummy manager host for each group.

Figure 1-23 Example of monitoring 200 hosts

[Figure]

In this example, the target hosts are split into groups of 20 hosts each. Manager hostA monitors the dummy manager hosts (host1, host21, and so on). As monitoring is by group rather than by individual host, the monitoring time can be cut to about 60 seconds.
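
The arithmetic behind these figures can be sketched as follows. This is a rough estimate under stated assumptions, not a formula from JP1/Base: the function names are hypothetical, the 3 seconds per host is the approximate value given above, and each dummy manager host is assumed to check the remaining hosts in its own group.

```python
import math

# Approximate check time per monitored host, as stated in the text.
SECONDS_PER_HOST = 3


def flat_check_time(num_hosts):
    """Seconds for a single host to check num_hosts remote hosts in turn."""
    return num_hosts * SECONDS_PER_HOST


def grouped_check_times(num_hosts, group_size):
    """Return (manager_time, group_time) in seconds.

    manager_time: the manager host checks one dummy manager host per group.
    group_time:   each dummy manager host checks the other hosts in its
                  group (an assumption about the grouping).
    """
    groups = math.ceil(num_hosts / group_size)
    manager_time = groups * SECONDS_PER_HOST
    group_time = (group_size - 1) * SECONDS_PER_HOST
    return manager_time, group_time
```

With the figures from this example, flat_check_time(200) gives 600 seconds, and grouped_check_times(200, 20) gives 30 seconds for the manager host and 57 seconds within each group, which is consistent with the estimate of about 60 seconds.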

(b) Operation when errors occur in a hierarchical configuration

The following describes error handling when the target hosts are arranged in a hierarchy, as in the figure below.

Figure 1-24 Example of error handling in a hierarchical configuration

[Figure]

If an error occurs in the health check function or event service at hostB, errors at hostD and hostE, which are monitored by hostB, cannot be detected or reported.

If hostB is restored quickly, any JP1 event issued because of an error at hostD or hostE while hostB was stopped will be forwarded when hostB retries the send operation at recovery. If recovery of hostB takes a long time, you must change the settings in the health check definition file (jbshc.conf) so that hostD and hostE are monitored directly by hostA until hostB is restored.

As illustrated in this example, in a hierarchical configuration, it is a good idea to prepare a health check definition file (jbshc.conf), specifying that the agent hosts are to be monitored directly from the manager host in the event of an error on the submanager host.
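
When preparing such a fallback definition in advance, it can help to work out which hosts the manager host would need to monitor if a submanager host fails. The following Python sketch only computes that list. It does not generate jbshc.conf, whose format is described in 14. Definition Files. The monitoring map and the function name are hypothetical, and the assumption that hostA monitors hostB is inferred from the description above rather than taken from the figure.

```python
def fallback_targets(monitors, manager, failed_submanager):
    """Return the hosts the manager should monitor while a submanager is down.

    The manager keeps its current targets and additionally takes over the
    hosts that the failed submanager was monitoring.
    """
    targets = list(monitors.get(manager, []))
    for host in monitors.get(failed_submanager, []):
        # Skip the manager itself and hosts it already monitors.
        if host != manager and host not in targets:
            targets.append(host)
    return targets


# Hypothetical map for the hierarchical example: monitor -> monitored hosts.
hierarchy = {
    "hostA": ["hostB"],
    "hostB": ["hostD", "hostE"],
}
```

In this model, fallback_targets(hierarchy, "hostA", "hostB") returns hostB, hostD, and hostE. Keeping hostB in the list is a design choice of this sketch, so that the manager host can continue checking the submanager host until it is restored.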

(c) Reviewing the monitoring interval

In the health check definition file (jbshc.conf), you can specify an interval for monitoring remote hosts. Perform a trial run before you start operations, and check whether the specified monitoring interval is appropriate. If message KAVA7219-W is output to the integrated trace log, the monitoring interval might be too short. In that case, change the interval by referring to the estimation formula given in Health check definition file in 14. Definition Files.
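
During a trial run, one simple way to check for this message is to count the log lines that contain the message ID. The following Python sketch assumes that the integrated trace log is available as text lines. The function name is hypothetical, and the log location and line format are not described here, so the function takes any iterable of lines rather than a specific file path.

```python
def count_message(lines, message_id="KAVA7219-W"):
    """Count the lines that contain the given message ID."""
    return sum(1 for line in lines if message_id in line)
```

For example, passing an open text file object to count_message returns the number of lines that mention KAVA7219-W. A nonzero count after the trial run suggests that the monitoring interval should be reviewed.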


All Rights Reserved. Copyright (C) 2009, Hitachi, Ltd.