Hitachi

JP1 Version 12 JP1/Integrated Management 2 - Manager Command and Definition File Reference


Health check definition file (jcohc.conf)

Format

[HEALTHCHECK]
ENABLE={true | false}
FAILOVER={true | false}
EVENT={true | false}
COMMAND=command
NO_RESPONSE_TIME=no-response-time
ERROR_THRESHOLD=no-response-count-treated-as-error
BASE_NO_RESPONSE_TIME=no-response-time
BASE_ERROR_THRESHOLD=no-response-count-treated-as-error
[End]

File

jcohc.conf (health check definition file)

jcohc.conf.model (model file for the health check definition file)

Storage directory

In Windows
For a physical host:

Console-path\conf\health\

For a logical host:

shared-folder\jp1cons\conf\health\

In UNIX
For a physical host:

/etc/opt/jp1cons/conf/health/

For a logical host:

shared-directory/jp1cons/conf/health/

Description

This file defines whether the health check function is to be enabled. If you enable the health check function, you can also define whether errors are to be reported by issuing a JP1 event or by executing a notification command.

You must specify this definition file by using the character encoding supported by JP1/IM - Manager.

If you have deleted the health check definition file (jcohc.conf), copy the model file for the health check definition file (jcohc.conf.model) under the name jcohc.conf and then edit the definition in the copy, if necessary.
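The recovery step above can be sketched in Python as follows. This is a minimal illustration, not a JP1 tool: the function name restore_health_check_definition is hypothetical, and the directory you pass must be the storage directory that applies to your host.

```python
import shutil
from pathlib import Path


def restore_health_check_definition(conf_dir: str) -> bool:
    """Copy jcohc.conf.model to jcohc.conf if jcohc.conf is missing.

    Returns True when a copy was made, False when jcohc.conf already exists.
    (Hypothetical helper; not part of JP1/IM - Manager.)
    """
    directory = Path(conf_dir)
    model = directory / "jcohc.conf.model"
    target = directory / "jcohc.conf"
    if target.exists():
        return False  # never overwrite an existing definition file
    shutil.copyfile(model, target)
    return True
```

For example, on a UNIX physical host you might call restore_health_check_definition("/etc/opt/jp1cons/conf/health") and then edit the copied file as needed.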

The health check function cannot monitor Central Scope Service (jcsmain).

When you enable the health check function by using this definition file, you can check whether each process of JP1/IM - Manager and the event service of JP1/Base on the local host is running normally.

The health check function can detect errors in the following processes:

  • evflow

  • jcamain

  • evtcon

  • evgen

  • jevservice (event service of JP1/Base)

If any of these processes hang up# or terminate abnormally, the health check function can issue a JP1 event or execute a specified notification command to prompt the operator to recover the process.

#

A process hang-up is a status in which a process can no longer accept processing requests due to deadlock or looping.

When the definitions are applied

The settings in the health check definition file take effect at the following times:

Information that is specified

ENABLE={true | false}

Specifies whether the health check function is to be enabled.

Specify either true or false. To enable the health check function, specify true; to disable the function, specify false. The default is false.

When the health check function is enabled and detects an error, a message (KAVB8060-E or KAVB8062-E) is output to the integrated trace log and to the Windows event log (syslog in UNIX), reporting whether the EVENT setting in the health check definition file is true or false.

FAILOVER={true | false}

Specifies whether JP1/IM - Manager is to perform the operation described below for each OS when the health check function detects an error while you are operating in a cluster system. Specify true if the operation is to be performed, or false if it is not. The default is false. If you do not use a cluster system, do not change the default setting.

  • In Windows

    When true is specified, JP1/IM - Manager is terminated when an error is detected. When the health check function detects an error, it notifies the cluster system of the error in JP1/IM - Manager by stopping JP1/IM - Manager. If you set the cluster system to fail over when a JP1/IM - Manager error occurs, failover can take place when an error is detected.

  • In UNIX

    When true is specified, the JP1/IM - Manager process in which the error was detected is terminated. When the health check function detects an error, it notifies a cluster system of the error in JP1/IM - Manager by stopping JP1/IM - Manager. If you set the cluster system so that, on detection of an error, it is stopped forcibly by the jco_killall.cluster command and then failed over, failover can take place when an error is detected.

EVENT={true | false}

Specifies whether JP1 events (event ID: 2012 and 2013) are to be issued when an error is detected by the health check function.

Specify either true or false. If JP1 events are to be issued, specify true; otherwise, specify false.

The default is true. When true is specified, a JP1 event (event ID: 2014) is also issued in the following case:

  • The health check function detects recovery from an abnormal status.

For details about JP1 events, see 3.2.2 Details of JP1 events.

COMMAND=command

Specifies the notification command that is to be executed when an error is detected by the health check function.

You can execute the following types of commands:

When the host executing the command is Windows:

  • Executable file (.com, .exe)

  • Batch file (.bat)

  • JP1/Script script file (.spt)

    (An appropriate association must have been set so that an .spt file can be executed.)

When the host executing the command is UNIX:

  • Executable file (with execution permissions)

  • Shell script (with execution permissions)

The following notes apply to defining a notification command:

  • Everything from COMMAND= to the linefeed code is defined as a single command.

  • The maximum length of a command is 1,023 bytes. This length includes spaces, but does not include the linefeed code. If the length exceeds 1,023 bytes, the default value is assumed. If you specify variables and the character string obtained by expanding variables exceeds 1,023 bytes, the command will not execute. In such a case, the message KAVB8072-E is output to the integrated trace log.

  • If you specify a variable, specify it immediately after $. The following table lists and describes the variables that can be specified.

    Table 2‒17: Variables that can be specified in notification commands

    Variable name   Description
    -------------   -----------
    HCHOST          Name of host resulting in the error
    HCFUNC          Name of function resulting in the error (evflow, jcamain, evtcon, evgen, or jevservice)
    HCPNAME         Name of process resulting in the error (evflow, jcamain, evtcon, evgen, or jevservice)
    HCPID           For evflow, jcamain, evtcon, or evgen: ID of process resulting in the error. For jevservice: -1
    HCDATE          Date the error occurred (YYYY/MM/DD)
    HCTIME          Time the error occurred (hh:mm:ss)

  • For the notification command, specify a command that will always terminate. If you set a batch file (Windows) or shell script (UNIX), make sure that it terminates with exit 0. If the specified command does not terminate or uses the GUI, the processes of the executed notification command will remain running.

  • The notification command specified in COMMAND inherits the execution environment of JP1/IM - Manager.

  • The notification command is executed with the execution permissions of JP1/IM - Manager (Windows: SYSTEM user; UNIX: root).

  • Specify in COMMAND the full path of the notification command.

  • Use the jcohctest command to test thoroughly whether the notification command you set works correctly. For details about the jcohctest command, see jcohctest in 1. Commands.

  • The default is COMMAND=, in which case no notification command is executed.

  • To use $, specify $$.

  • In Windows, if you execute a command in the %WINDIR%\System32 folder, the WOW64 redirect functionality redirects execution to the same command in the %WINDIR%\SysWow64 folder. If there is no applicable command in the destination folder, command execution might fail. Make sure that the applicable command is in the %WINDIR%\System32 folder when you specify it for execution.
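The variable, $$, and length rules in the notes above can be illustrated with a short Python sketch. It only models the documented rules; it is not how JP1/IM - Manager implements them. The function name expand_command is hypothetical, measuring the length on the UTF-8 encoding is an assumption, and leaving an unrecognized $ sequence untouched is also an assumption.

```python
import re

MAX_COMMAND_BYTES = 1023  # limit stated in the notes above

# Variable names taken from Table 2-17.
VARIABLES = ("HCHOST", "HCFUNC", "HCPNAME", "HCPID", "HCDATE", "HCTIME")

# "$$" is an escaped dollar sign; "$NAME" is a variable reference.
_TOKEN = re.compile(r"\$(\$|" + "|".join(VARIABLES) + r")")


def expand_command(template: str, values: dict) -> str:
    """Expand $-variables and $$ in a COMMAND value (illustrative only)."""

    def replace(match):
        name = match.group(1)
        return "$" if name == "$" else str(values[name])

    # Assumption: any other "$" sequence is left as written.
    expanded = _TOKEN.sub(replace, template)

    # Assumption: the byte length is measured on the UTF-8 encoding.
    if len(expanded.encode("utf-8")) > MAX_COMMAND_BYTES:
        raise ValueError("expanded command exceeds 1,023 bytes")
    return expanded
```

With a hypothetical script path, expand_command("/opt/scripts/notify.sh $HCHOST $HCPNAME $HCPID", {"HCHOST": "host01", "HCPNAME": "evflow", "HCPID": 1234}) yields "/opt/scripts/notify.sh host01 evflow 1234", and "$$" in the template yields a single literal "$".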

NO_RESPONSE_TIME=no-response-time

Specifies in seconds the amount of time to wait for a response to be sent from the JP1/IM - Manager process. The permitted value range is from 60 to 3,600 seconds. The default is 60 seconds.

If the value that is specified is outside the permitted value range or the definition is omitted, the default value (60 seconds) is assumed.

Note that this parameter is not included in the health check definition file (jcohc.conf) that is deployed when JP1/IM - Manager is installed. If you want to change the default value, you must add the parameter.

ERROR_THRESHOLD=no-response-count-treated-as-error

Specifies the number of times to wait for the set no-response time to elapse before assuming that an error has occurred in the JP1/IM - Manager process. The permitted value range is from 1 to 60 times. The default is 3 times.

If the value that is specified is outside the permitted value range or the definition is omitted, the default value (3 times) is assumed.

BASE_NO_RESPONSE_TIME=no-response-time

Specifies in seconds the no-response time used when checking the JP1/Base event service on the manager. The permitted value range is from 60 to 3,600 seconds. The default is 300 seconds.

If the value that is specified is outside the permitted value range or the definition is omitted, the default value (300 seconds) is assumed.

BASE_ERROR_THRESHOLD=no-response-count-treated-as-error

Specifies the number of times to wait for the set no-response time to elapse before assuming that an error has occurred in the JP1/Base event service on the manager. The permitted value range is from 1 to 60 times. The default is 2 times.

If the value that is specified is outside the permitted value range or the definition is omitted, the default value (2 times) is assumed.
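The range and default rules for the four timing parameters can be summarized in a small Python sketch. The function names are hypothetical, and treating a non-numeric value like an out-of-range value is an assumption. Multiplying the no-response time by the count is one reading of "the number of times to wait for the set no-response time to elapse"; the actual detection timing may differ.

```python
def effective_value(raw, minimum, maximum, default):
    """Return the value in effect: raw if it is within range, else default."""
    if raw is None:
        return default  # definition omitted
    try:
        value = int(raw)
    except ValueError:
        return default  # assumption: non-numeric input falls back to the default
    return value if minimum <= value <= maximum else default


def seconds_until_error(no_response_time=None, error_threshold=None, *, base=False):
    """Estimate seconds of no response before an error is assumed.

    base=False models NO_RESPONSE_TIME / ERROR_THRESHOLD (defaults 60 and 3);
    base=True models BASE_NO_RESPONSE_TIME / BASE_ERROR_THRESHOLD (300 and 2).
    """
    default_time, default_count = (300, 2) if base else (60, 3)
    wait = effective_value(no_response_time, 60, 3600, default_time)
    count = effective_value(error_threshold, 1, 60, default_count)
    return wait * count
```

Under this reading, the default settings give 60 x 3 = 180 seconds for a JP1/IM - Manager process and 300 x 2 = 600 seconds for the JP1/Base event service.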

Example definition

Issue a JP1 event and execute the jcohc01.exe notification command when an error is detected by the health check function:

[HEALTHCHECK]
ENABLE=true
FAILOVER=false
EVENT=true
COMMAND=C:\Command\jcohc01.exe
NO_RESPONSE_TIME=60
ERROR_THRESHOLD=3
BASE_NO_RESPONSE_TIME=300
BASE_ERROR_THRESHOLD=2
[End]
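To check a definition like the one above before deploying it, the file can be read with a few lines of Python. This is a rough sketch under assumptions: the function name parse_health_check_definition is hypothetical, and how JP1/IM - Manager itself treats whitespace, comments, or duplicate keys is not covered here.

```python
def parse_health_check_definition(text: str) -> dict:
    """Collect KEY=value pairs found between [HEALTHCHECK] and [End]."""
    settings = {}
    inside = False
    for line in text.splitlines():
        marker = line.strip()
        if marker == "[HEALTHCHECK]":
            inside = True
        elif marker == "[End]":
            break
        elif inside and "=" in line:
            # Split at the first "=" only: everything from COMMAND= to the
            # end of the line is a single command and may itself contain "=".
            key, value = line.split("=", 1)
            settings[key.strip()] = value
    return settings
```

Applied to the example definition, the result maps ENABLE to "true" and COMMAND to "C:\Command\jcohc01.exe". All values are returned as strings, so numeric parameters still need range checking.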