9.2.3 How the health check function works
The JP1/IM - Manager health check function works by having processes monitor one another.
The following table shows which processes perform monitoring in the JP1/IM - Manager health check function, and which processes each of them monitors.
Monitoring process | Monitored process |
---|---|
Event base service (evflow) | Event console service (evtcon) |
Event base service (evflow) | Automatic action service (jcamain) |
Event base service (evflow) | Event generation service (evgen)#1 |
Event base service (evflow) | Event service (jevservice)#2 |
Event console service (evtcon) | Event base service (evflow) |
#1: Applicable when not using the integrated monitoring database.
#2: A JP1/Base service that runs on the manager.
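The monitoring relationships in the table can be written down as a small lookup structure. The following Python sketch is illustrative only: the names `MONITORS` and `monitored_by` are hypothetical and are not part of JP1/IM - Manager.

```python
# Monitoring relationships taken from the table above, keyed by the
# short name of the monitoring process. MONITORS and monitored_by are
# hypothetical names used only for this illustration.
MONITORS = {
    "evflow": ["evtcon", "jcamain", "evgen", "jevservice"],
    "evtcon": ["evflow"],
}


def monitored_by(process: str) -> list[str]:
    """Return the processes that monitor the given process."""
    return sorted(
        monitor for monitor, targets in MONITORS.items() if process in targets
    )


if __name__ == "__main__":
    for name in ("evflow", "evtcon", "jcamain", "evgen", "jevservice"):
        print(name, "is monitored by", monitored_by(name))
```

Written this way, it is easy to see that the event base service and the event console service monitor each other, while the remaining processes are monitored only by the event base service.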
(1) Detecting process errors
In the JP1/IM - Manager health check function, a process that performs monitoring communicates over the network with the processes it monitors, to check whether the processes are working normally.
To detect process errors, the health check function sends polling signals to the monitored processes at regular intervals. If a process has not responded to the signal within a set time, the health check function regards the process as being in an abnormal state.
The interval at which processes are polled, and the number of non-responses for a process to be judged abnormal, differ according to the monitored process, as follows:
Monitored process | Polling interval#1 | Non-response count#2 |
---|---|---|
Event service (jevservice) | 60 to 3,600 seconds | 1 to 60 |
Process other than the event service | 60 to 3,600 seconds | 1 to 60 |

#1: Specify the interval with NO_RESPONSE_TIME (no response time) in the health check definition file.
#2: Specify the count with ERROR_THRESHOLD (no response count for a process to be judged abnormal) in the health check definition file.
For details about the definition, see Health check definition file (jcohc.conf) in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
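As a rough illustration of the ranges in the table, the following Python sketch checks candidate values before they are written to the definition file. It is not part of JP1/IM - Manager: the function name `check_settings` is hypothetical, and the sketch assumes the values have already been read as integers. It does not parse jcohc.conf itself.

```python
# Permitted ranges taken from the table above.
NO_RESPONSE_TIME_RANGE = (60, 3600)  # polling interval, in seconds
ERROR_THRESHOLD_RANGE = (1, 60)      # non-response count


def check_settings(no_response_time: int, error_threshold: int) -> list[str]:
    """Return a list of problems; an empty list means both values are in range.

    check_settings is a hypothetical helper used only for illustration.
    """
    problems = []
    low, high = NO_RESPONSE_TIME_RANGE
    if not low <= no_response_time <= high:
        problems.append(
            f"NO_RESPONSE_TIME must be {low} to {high} seconds, "
            f"got {no_response_time}"
        )
    low, high = ERROR_THRESHOLD_RANGE
    if not low <= error_threshold <= high:
        problems.append(
            f"ERROR_THRESHOLD must be {low} to {high}, got {error_threshold}"
        )
    return problems


if __name__ == "__main__":
    print(check_settings(60, 3))
    print(check_settings(30, 61))
```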
The following describes the JP1/IM - Manager operation after a non-response is detected:
- When a non-response is detected, JP1/IM - Manager outputs the KAVB8064-W message until an error is detected.
- When the maximum non-response count is reached, JP1/IM - Manager outputs the KAVB8060-E or KAVB8062-E message.
- If a non-response is detected but the process later recovers, JP1/IM - Manager resets the non-response count.
- If a non-response is detected and the process does not recover, JP1/IM - Manager does not output the KAVB8064-W, KAVB8060-E, or KAVB8062-E message.
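The counting behavior described above can be sketched as a small state machine. The Python example below is a simplified model, not the actual JP1/IM - Manager implementation. The class name `NonResponseTracker` is hypothetical. The sketch also assumes that the last item in the list means no further messages are output once the error has been reported and the process stays unresponsive; that reading is an interpretation.

```python
from typing import Optional


class NonResponseTracker:
    """Simplified model of non-response counting for one monitored process."""

    def __init__(self, error_threshold: int, error_message: str = "KAVB8060-E"):
        # error_threshold corresponds to ERROR_THRESHOLD (1 to 60).
        # error_message is KAVB8060-E, or KAVB8062-E for the event service.
        self.error_threshold = error_threshold
        self.error_message = error_message
        self.count = 0
        self.error_reported = False

    def record_poll(self, responded: bool) -> Optional[str]:
        """Return the message ID to output for this poll, or None."""
        if responded:
            # Recovery resets the non-response count.
            self.count = 0
            self.error_reported = False
            return None
        if self.error_reported:
            # Assumption: nothing more is output while the process
            # remains unresponsive after the error was reported.
            return None
        self.count += 1
        if self.count >= self.error_threshold:
            self.error_reported = True
            return self.error_message
        # Warning is output until the error is detected.
        return "KAVB8064-W"


if __name__ == "__main__":
    tracker = NonResponseTracker(error_threshold=3)
    for responded in (False, False, True, False, False, False, False):
        print(responded, tracker.record_poll(responded))
```

In this model, a response that arrives before the threshold is reached returns the count to zero, so only uninterrupted runs of non-responses lead to an error message.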
The figure below shows in diagrammatic form how process errors are detected.
(2) Reporting process errors
When the JP1/IM - Manager health check function is enabled, on detection of a process error, JP1/IM - Manager executes the following processing to report that an error has occurred:
- If the error occurred in a process being monitored by JP1/IM - Manager (evtcon, jcamain, evflow, or evgen), message KAVB8060-E is output to the integrated trace log and to the Windows event log or UNIX syslog.
- If the error occurred in the JP1/Base event service, message KAVB8062-E is output to the integrated trace log and to the Windows event log or UNIX syslog.
- If a notification command has been set, the command is executed.
- If the FAILOVER parameter is enabled, any process for which an error has been detected ends abnormally. The abnormality of JP1/IM - Manager is then reported to the cluster system. If the cluster system is set to initiate a failover when an error occurs in JP1/IM - Manager, it can therefore initiate a failover when the health check function detects a process error.
When the failed process has been restored to normal status, message KAVB8061-I for a monitored process of JP1/IM - Manager or message KAVB8063-I for an event service of JP1/Base is output to the integrated trace log and to the Windows event log or UNIX syslog. If JP1 event issuance is enabled, a JP1 event (event ID: 00002014) is issued.
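The choice of message ID described in this subsection can be summarized in a short lookup. The Python sketch below is illustrative only: the function name `report_message` is hypothetical, and the sketch covers only the selection of the message ID, not log output, notification commands, or JP1 event issuance.

```python
# Processes monitored within JP1/IM - Manager, and the JP1/Base event service,
# identified by the short names used in this subsection.
JP1IM_PROCESSES = {"evtcon", "jcamain", "evflow", "evgen"}
EVENT_SERVICE = "jevservice"


def report_message(process: str, recovered: bool = False) -> str:
    """Return the message ID for an error or recovery of the given process.

    report_message is a hypothetical helper used only for illustration.
    """
    if process in JP1IM_PROCESSES:
        return "KAVB8061-I" if recovered else "KAVB8060-E"
    if process == EVENT_SERVICE:
        return "KAVB8063-I" if recovered else "KAVB8062-E"
    raise ValueError(f"not a monitored process: {process}")


if __name__ == "__main__":
    print(report_message("jcamain"))
    print(report_message("jevservice", recovered=True))
```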
- Note
  - The JP1 event with event ID 00002013 is a dummy event (an event not registered in the event database) issued to JP1/IM - View. A dummy event is issued when an error occurs in the event service in which JP1 events are registered.
  - We recommend that you set up execution of a notification command when using the JP1/IM - Manager health check function. If errors are reported only by issuing JP1 events, the user might not be made aware that JP1/IM has detected an error, and so might fail to respond promptly, when services are not being monitored in JP1/IM - View or when a problem occurs in the event console service.