Hitachi

JP1 Version 12 JP1/Integrated Management 2 - Manager Overview and System Design Guide


8.2.3 How the health check function works

The JP1/IM - Manager health check function works by having processes monitor one another.

The following table shows which processes perform monitoring in the JP1/IM - Manager health check function, and which processes each of them monitors.

Table 8‒3: Correspondence between monitoring processes and monitored processes

Monitoring processes              Monitored processes
--------------------------------  ------------------------------------
Event base service (evflow)       Event console service (evtcon)
                                  Automatic action service (jcamain)
                                  Event generation service (evgen)#1
                                  Event service (jevservice)#2
Event console service (evtcon)    Event base service (evflow)

#1: Applicable when not using the integrated monitoring database.

#2: A JP1/Base service that runs on the manager.
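The relationships in Table 8‒3 can be sketched as a small lookup. This is an illustrative model only, not part of JP1/IM - Manager: the names MONITORS and monitored_processes are hypothetical, and the only rule applied is note #1 (evgen is monitored only when the integrated monitoring database is not used).

```python
# Illustrative model of Table 8-3. The dictionary and function names are
# hypothetical; only the process names come from the table.
MONITORS = {
    "evflow": ["evtcon", "jcamain", "evgen", "jevservice"],
    "evtcon": ["evflow"],
}


def monitored_processes(monitor, uses_integrated_db):
    """Return the processes that `monitor` checks.

    Per note #1, evgen is included only when the integrated
    monitoring database is not used.
    """
    targets = MONITORS.get(monitor, [])
    if uses_integrated_db:
        targets = [p for p in targets if p != "evgen"]
    return list(targets)


if __name__ == "__main__":
    print(monitored_processes("evflow", uses_integrated_db=True))
    print(monitored_processes("evtcon", uses_integrated_db=True))
```

Note that evflow and evtcon appear in each other's lists, which is what allows a failure of either one to be detected by the other.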

(1) Detecting process errors

In the JP1/IM - Manager health check function, a process that performs monitoring communicates over the network with the processes it monitors, to check whether the processes are working normally.

To detect process errors, the health check function sends polling signals to the monitored processes at regular intervals. If a process has not responded to the signal within a set time, the health check function regards the process as being in an abnormal state.

The interval at which processes are polled, and the number of non-responses for a process to be judged abnormal, differ according to the monitored process, as follows:

Table 8‒4: Differences in non-response count

Monitored process                       Polling interval#1     Non-response count#2
--------------------------------------  ---------------------  --------------------
Event service (jevservice)              60 to 3,600 seconds    1 to 60
Process other than the event service    60 to 3,600 seconds    1 to 60

#1: Specify the interval with NO_RESPONSE_TIME (no response time) in the health check definition file.

#2: Specify the count with ERROR_THRESHOLD (no response count for a process to be judged abnormal) in the health check definition file.

For details about the definition, see Health check definition file (jcohc.conf) in the manual JP1/Integrated Management 2 - Manager Command, Definition File and API Reference.
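The ranges in Table 8‒4 and the non-response count can be illustrated with a minimal sketch. It does not read jcohc.conf and does not reproduce the product's internal logic; the function and class names are hypothetical. It also assumes that non-responses are counted consecutively and that a response resets the count, which this section does not state.

```python
from dataclasses import dataclass

# Permitted ranges, taken from Table 8-4.
NO_RESPONSE_TIME_RANGE = (60, 3600)  # seconds
ERROR_THRESHOLD_RANGE = (1, 60)      # number of non-responses


def validate_health_check_settings(no_response_time, error_threshold):
    """Raise ValueError if either value is outside the Table 8-4 range."""
    low, high = NO_RESPONSE_TIME_RANGE
    if not low <= no_response_time <= high:
        raise ValueError(f"NO_RESPONSE_TIME must be {low} to {high} seconds")
    low, high = ERROR_THRESHOLD_RANGE
    if not low <= error_threshold <= high:
        raise ValueError(f"ERROR_THRESHOLD must be {low} to {high}")


@dataclass
class NonResponseCounter:
    """Tracks polling results for one monitored process (hypothetical helper)."""

    error_threshold: int
    misses: int = 0

    def record(self, responded):
        """Record one polling result; return True if judged abnormal.

        Assumption: only consecutive non-responses count, so any
        response resets the counter to zero.
        """
        self.misses = 0 if responded else self.misses + 1
        return self.misses >= self.error_threshold


if __name__ == "__main__":
    validate_health_check_settings(no_response_time=60, error_threshold=3)
    counter = NonResponseCounter(error_threshold=3)
    for responded in (False, False, False):
        abnormal = counter.record(responded)
    print("abnormal:", abnormal)
```

Under these assumptions, a larger ERROR_THRESHOLD tolerates more missed polls before a process is judged abnormal, at the cost of slower detection.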

The following describes the JP1/IM - Manager operation after a non-response is detected:

The figure below shows in diagrammatic form how process errors are detected.

Figure 8‒3: Communication between processes

[Figure]

(2) Reporting process errors

When the JP1/IM - Manager health check function is enabled, on detection of a process error, JP1/IM - Manager executes the following processing to report that an error has occurred:

When the failed process returns to normal status, message KAVB8061-I (for a monitored process of JP1/IM - Manager) or message KAVB8063-I (for the event service of JP1/Base) is output to the integrated trace log and to the Windows event log or UNIX syslog. If JP1 event issuance is enabled, a JP1 event (event ID: 00002014) is also issued.
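The choice of recovery message described above can be summarized in a short sketch. The function name and the returned dictionary layout are hypothetical and serve only to illustrate the rule; the message IDs, log destinations, and event ID are those given in this section.

```python
def recovery_report(is_event_service, jp1_event_enabled):
    """Describe what is reported when a failed process recovers.

    Hypothetical helper: it only restates the rule in this section and
    does not call any JP1/IM - Manager interface.
    """
    # KAVB8063-I applies to the JP1/Base event service;
    # KAVB8061-I applies to monitored processes of JP1/IM - Manager.
    message_id = "KAVB8063-I" if is_event_service else "KAVB8061-I"
    return {
        "message_id": message_id,
        "destinations": [
            "integrated trace log",
            "Windows event log or UNIX syslog",
        ],
        # The JP1 event is issued only if JP1 event issuance is enabled.
        "jp1_event_id": "00002014" if jp1_event_enabled else None,
    }


if __name__ == "__main__":
    print(recovery_report(is_event_service=True, jp1_event_enabled=False))
```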

Note
  • The JP1 event with event ID 00002013 is a dummy event (an event not registered in the event database) issued to JP1/IM - View. A dummy event is issued when an error occurs in the event service in which JP1 events are registered.

  • We recommend that you set up the functionality for executing a notification command when using the JP1/IM - Manager health check function.

    A notification command is recommended because, if errors are reported only by issuing JP1 events, the user might not become aware that JP1/IM has detected an error, and so might fail to respond promptly. This can happen when the user is not monitoring services in JP1/IM - View, or when a problem occurs in the event console service.