Job Management Partner 1/Integrated Management - Manager Command and Definition File Reference
Format
[HEALTHCHECK]
ENABLE={true | false}
FAILOVER={true | false}
EVENT={true | false}
COMMAND=command
NO_RESPONSE_TIME=no-response-time
ERROR_THRESHOLD=no-response-count-treated-as-error
BASE_NO_RESPONSE_TIME=no-response-time
BASE_ERROR_THRESHOLD=no-response-count-treated-as-error
[End]
File
jcohc.conf (health check definition file)
jcohc.conf.model (model file for the health check definition file)
Storage directory
- In Windows
- Console-path\conf\health\
- shared-folder\jp1cons\conf\health\ (logical host)
- In UNIX
- /etc/opt/jp1cons/conf/health/
- shared-directory/jp1cons/conf/health/ (logical host)
Description
This file defines whether the health check function is to be enabled. If you enable the health check function, you can also define whether errors are to be notified by issuing a JP1 event or by executing a notification command.
You must specify this definition file by using the character encoding supported by JP1/IM - Manager.
If you have deleted the health check definition file (jcohc.conf), copy the model file for the health check definition file (jcohc.conf.model) under the name jcohc.conf and then edit the definition in the copy, if necessary.
The health check function cannot monitor Central Scope Service (jcsmain).
When you enable the health check function by using this definition file, you gain the capability to check whether each process of JP1/IM - Manager and the event service of JP1/Base on the local host is running normally.
The health check function can detect errors in the following processes:
- Event Console Service (evtcon)
- Automatic Action Service (jcamain)
- Event Base Service (evflow)
- Event Generation Service (evgen)
- Event service (jevservice)
If any of these processes hang up# or terminate abnormally, the health check function can issue a JP1 event or execute a specified notification command to prompt the operator to recover the process.
#: A process hang-up is a status in which a process can no longer accept processing requests due to deadlock or looping.
When the definitions are applied
The settings in the health check definition file take effect at the following times:
- When JP1/IM - Manager is started.
- When the file is reloaded by the jco_spmd_reload command.
Information that is specified
- ENABLE={true | false}
- Specifies whether the health check function is to be enabled.
- Specify either true or false. To enable the health check function, specify true; to disable the function, specify false. The default is false.
- When the health check function has been enabled and it detects an error, a message (KAVB8060-E or KAVB8062-E) is output to the integrated trace log and to the Windows event log (in UNIX, syslog), regardless of whether the EVENT setting in the health check definition file is true or false.
- FAILOVER={true | false}
- Specifies whether JP1/IM - Manager is to take the action described below when the health check function detects an error while you are operating in a cluster system. Specify true if the action is to be taken, or specify false if it is not to be taken. The default is false. If you do not use a cluster system, do not change the default setting.
- In Windows
When true is specified, JP1/IM - Manager is terminated when an error is detected. When false is specified, JP1/IM - Manager is not terminated when an error is detected. If the primary server is terminated because an error has been detected by the health check function, failover to the secondary server can occur.
- In UNIX
When true is specified, the JP1/IM - Manager process in which the error was detected is terminated. When false is specified, the process is not terminated. If an error is detected by the health check function and the processes constituting JP1/IM - Manager are terminated forcibly by the jco_killall.cluster command at the primary server, failover to the secondary server can take place.
- EVENT={true | false}
- Specifies whether JP1 events (event ID: 2012 and 2013) are to be issued when an error is detected by the health check function.
- Specify either true or false. If JP1 events are to be issued, specify true; otherwise, specify false.
- The default is true. When true is specified, a JP1 event (event ID: 2014) is also issued in the following case:
- The health check function detects recovery from an abnormal condition.
- For details about JP1 events, see 3.2.2 Details of JP1 events.
- COMMAND=command
- Specifies the notification command that is to be executed when an error is detected by the health check function.
- You can execute the following types of commands:
- When the host executing the command is Windows:
- Executable file (.com, .exe)
- Batch file (.bat)
- JP1/Script script file (.spt)
(An appropriate association must have been set so that an .spt file can be executed.)
- When the host executing the command is UNIX:
- Executable file (with execution permissions)
- Shell script (with execution permissions)
- The following notes apply to defining a notification command:
- Everything from COMMAND= to the linefeed code is defined as a single command.
- The maximum length of a command is 1,023 bytes. This length includes spaces, but does not include the linefeed code. If the length exceeds 1,023 bytes, the default value is assumed. If you specify variables and the character string obtained by expanding variables exceeds 1,023 bytes, the command will not execute. In such a case, the message KAVB8072-E The notification command command could not be executed. : maintenance-information is output to the integrated trace log.
- If you specify a variable, specify it immediately after $. The following table lists and describes the variables that can be specified.
Table 2-18 Variables that can be specified in the health check definition file
| Variable name | Description |
| --- | --- |
| HCHOST | Name of host resulting in the error |
| HCFUNC | Name of function resulting in the error (evflow, jcamain, evtcon, evgen, or jevservice) |
| HCPNAME | Name of process resulting in the error (evflow, jcamain, evtcon, evgen, or jevservice) |
| HCPID | Process ID of process resulting in the error (process ID of evflow, jcamain, evtcon, evgen, or jevservice#) |
| HCDATE | Date the error occurred (YYYY/MM/DD) |
| HCTIME | Time the error occurred (hh:mm:ss) |

#: If the error occurred in jevservice, the process ID value is replaced with -1.
- For the notification command, specify a command that will always terminate. If you set a batch file (Windows) or shell script (UNIX), make sure that it terminates with exit 0. If the specified command does not terminate or uses the GUI, the processes of the executed notification command will remain on the system without terminating.
- The notification command specified in COMMAND inherits the execution environment of JP1/IM - Manager.
- The notification command is executed with the execution permissions of JP1/IM - Manager (Windows: SYSTEM user; UNIX: root).
- Specify in COMMAND the full path of the notification command.
- Use the jcohctest command to test thoroughly whether the set notification command functions successfully. For details about the jcohctest command, see jcohctest in 1. Commands.
- The default is COMMAND=, in which case no notification command is executed.
- To use $ as a literal character in the command, specify $$.
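The variable and length rules above can be illustrated with a short Python sketch. It is not how JP1/IM - Manager implements expansion. It only models the documented rules: the variables in Table 2-18 follow $, $$ yields a literal $, and a command whose expanded form exceeds 1,023 bytes is not executed. The function name expand_command, the sample values, the /opt/notify.sh path, and the UTF-8 encoding used for the byte count are all assumptions made for illustration.

```python
import re

MAX_COMMAND_BYTES = 1023  # limit stated in the notes on COMMAND

# Variable names listed in Table 2-18
VARIABLES = ("HCHOST", "HCFUNC", "HCPNAME", "HCPID", "HCDATE", "HCTIME")

# Match "$$" first, then "$" immediately followed by a known variable name.
_TOKEN = re.compile(r"\$\$|\$(" + "|".join(VARIABLES) + r")")


def expand_command(template, values, encoding="utf-8"):
    """Return the expanded command, or None if it exceeds the byte limit.

    Hypothetical helper: the encoding is an assumption, because the real
    byte count depends on the character encoding used by the product.
    """
    def substitute(match):
        name = match.group(1)
        if name is None:  # the matched token was "$$"
            return "$"
        return str(values[name])

    expanded = _TOKEN.sub(substitute, template)
    if len(expanded.encode(encoding)) > MAX_COMMAND_BYTES:
        return None  # per the notes, such a command is not executed
    return expanded


# Illustrative values only
sample = {
    "HCHOST": "host01",
    "HCFUNC": "evflow",
    "HCPNAME": "evflow",
    "HCPID": 1234,
    "HCDATE": "2009/01/31",
    "HCTIME": "12:34:56",
}
print(expand_command("/opt/notify.sh $HCHOST $HCPNAME $HCPID cost=$$5", sample))
```

Because expansion can lengthen the command, a definition that is under the limit as written may still fail at run time. Testing with the jcohctest command remains the authoritative check.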
- NO_RESPONSE_TIME=no-response-time
- Specifies in seconds the amount of time to wait for a response to be sent from the JP1/IM - Manager process. The permitted value range is from 60 to 3,600 seconds. The default is 60 seconds.
- If the value that is specified is outside the permitted value range or the definition is omitted, the default value (60 seconds) is assumed.
- ERROR_THRESHOLD=no-response-count-treated-as-error
- Specifies how many times the set no-response time must elapse before an error is assumed to have occurred in the JP1/IM - Manager process. The permitted value range is from 1 to 60 times. The default is 3 times.
- If the value that is specified is outside the permitted value range or the definition is omitted, the default value (3 times) is assumed.
- BASE_NO_RESPONSE_TIME=no-response-time
- Specifies in seconds the no-response time to be used when checking the JP1/Base process. The permitted value range is from 60 to 3,600 seconds. The default is 300 seconds.
- If the value that is specified is outside the permitted value range or the definition is omitted, the default value (300 seconds) is assumed.
- BASE_ERROR_THRESHOLD=no-response-count-treated-as-error
- Specifies how many times the set no-response time must elapse before an error is assumed to have occurred in the JP1/Base process. The permitted value range is from 1 to 60 times. The default is 2 times.
- If the value that is specified is outside the permitted value range or the definition is omitted, the default value (2 times) is assumed.
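The range and default rules for the four numeric settings can be summarized in a small Python sketch. The names NUMERIC_SETTINGS, effective_value, and approx_detection_seconds are hypothetical and are not part of JP1/IM - Manager. Treating a non-numeric value the same as an out-of-range value is an assumption. The detection-time estimate is only a rough reading of the descriptions above (no-response time multiplied by the threshold), and actual timing may differ.

```python
# (minimum, maximum, default) for each numeric setting, as documented above
NUMERIC_SETTINGS = {
    "NO_RESPONSE_TIME": (60, 3600, 60),
    "ERROR_THRESHOLD": (1, 60, 3),
    "BASE_NO_RESPONSE_TIME": (60, 3600, 300),
    "BASE_ERROR_THRESHOLD": (1, 60, 2),
}


def effective_value(name, raw):
    """Return the value assumed for a setting.

    raw is the text after "NAME=", or None if the definition is omitted.
    """
    low, high, default = NUMERIC_SETTINGS[name]
    if raw is None:
        return default  # definition omitted
    try:
        value = int(raw)
    except ValueError:
        return default  # assumption: non-numeric is handled like out of range
    return value if low <= value <= high else default


def approx_detection_seconds(no_response_time, error_threshold):
    """Rough estimate of the time before an error is assumed (an assumption)."""
    return no_response_time * error_threshold


print(effective_value("NO_RESPONSE_TIME", "30"))         # below the minimum
print(effective_value("BASE_NO_RESPONSE_TIME", "4000"))  # above the maximum
print(approx_detection_seconds(60, 3))                   # default settings
```

Under this reading, the defaults give roughly 180 seconds for JP1/IM - Manager processes and roughly 600 seconds for the JP1/Base process before an error is assumed.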
Example definition
Issue a JP1 event and execute the jcohc01.exe notification command when an error is detected by the health check function:
[HEALTHCHECK]
ENABLE=true
FAILOVER=false
EVENT=true
COMMAND=C:\Command\jcohc01.exe
NO_RESPONSE_TIME=60
ERROR_THRESHOLD=3
BASE_NO_RESPONSE_TIME=300
BASE_ERROR_THRESHOLD=2
[End]
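For reviewing a definition before it is deployed, the example can be read with a few lines of Python. This is a minimal sketch, not a tool supplied with JP1/IM - Manager. It assumes one NAME=value setting per line between [HEALTHCHECK] and [End], and the function name parse_health_check is hypothetical. The value is split at the first = only, because everything from COMMAND= to the linefeed is treated as a single command.

```python
EXAMPLE = r"""[HEALTHCHECK]
ENABLE=true
FAILOVER=false
EVENT=true
COMMAND=C:\Command\jcohc01.exe
NO_RESPONSE_TIME=60
ERROR_THRESHOLD=3
BASE_NO_RESPONSE_TIME=300
BASE_ERROR_THRESHOLD=2
[End]
"""


def parse_health_check(text):
    """Collect NAME=value pairs found between [HEALTHCHECK] and [End]."""
    settings = {}
    in_section = False
    for line in text.splitlines():
        marker = line.strip()
        if marker == "[HEALTHCHECK]":
            in_section = True
        elif marker == "[End]":
            in_section = False
        elif in_section and "=" in line:
            # Split at the first "=" only; a command may itself contain "=".
            key, _, value = line.partition("=")
            settings[key.strip()] = value
    return settings


settings = parse_health_check(EXAMPLE)
for name, value in settings.items():
    print(f"{name} -> {value}")
```

A check like this only confirms that the file has the expected shape. Whether the notification command itself works should still be verified with the jcohctest command.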
All Rights Reserved. Copyright (C) 2009, Hitachi, Ltd.