Hitachi

JP1 Version 12 JP1/Automatic Job Management System 3 Configuration Guide


15.2.12 Changing the wait time for recovery when an agent has failed

This subsection discusses JP1/AJS3 behavior when an agent host executing a job fails or a communication error occurs. The jobs concerned are PC jobs other than queueless jobs, Unix jobs other than queueless jobs, flexible jobs#, HTTP connection jobs, queue jobs running on JP1/AJS3, action jobs other than queueless jobs, and custom jobs. In such situations, JP1/AJS3 does not immediately assume a failure. Instead, it waits a specified time for recovery and then retries communication. The purpose of waiting is to prevent operation from stopping because of a temporary, recoverable failure. The default wait time is 10 minutes. However, depending on the operation, you might want to determine the failure location and take corrective action immediately rather than wait for recovery. You can do this by reducing the wait time for recovery.

#:

For a flexible job, read "agent host" as "relay agent".

The following describes how to change the wait time for recovery when an agent host has failed.

(1) Definition procedure

  1. Stop the JP1/AJS3 service.

    Execute the following commands to stop the service and confirm that all processes have stopped:

    # /etc/opt/jp1ajs2/jajs_stop #1
    # /opt/jp1ajs2/bin/jajs_spmd_status

    #1:

    Confirm that automatic termination has been set.

    In a cluster system, also stop the JP1/AJS3 service on each logical host.

  2. Execute the following command to set the environment setting parameters described in (2) below:

    jajs_config -k "definition-key" "parameter-name-1"=value-1 ["parameter-name-2"=value-2]

    Cautionary note:

    In a cluster system, perform this step on both the primary and secondary nodes.

  3. Restart JP1/AJS3.

    The new settings are applied.
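
As an illustration of step 2, the following shell sketch prints (rather than executes) a jajs_config command line that sets both parameters to 300 seconds for all scheduler services on a physical host. The helper names to_dword and build_jajs_config are hypothetical, the 300-second value is only an example, and the dword: hexadecimal value notation is an assumption that this subsection does not state. Check the parameter details before using the output.

```shell
#!/bin/sh
# Sketch only: prints a jajs_config command line instead of running it.

# Format a wait time given in seconds as a parameter value.
# ASSUMPTION: values are written as "dword:" followed by 8 hexadecimal digits.
to_dword() {
    printf 'dword:%08X' "$1"
}

# Build the command line for step 2 of the procedure.
#   $1 = definition key
#   $2 = wait time in seconds for queued jobs
#   $3 = wait time in seconds for jobs being executed
build_jajs_config() {
    printf 'jajs_config -k "%s" "QueuingJobRecoveryTime"=%s "ExecutingJobRecoveryTime"=%s\n' \
        "$1" "$(to_dword "$2")" "$(to_dword "$3")"
}

# Example: physical host, all scheduler services, 300 seconds for both values.
build_jajs_config '[JP1_DEFAULT\JP1AJS2\SCHEDULER\QUEUE\MANAGER\Job]' 300 300
```

Printing the command first makes it possible to review the key and values before running the command on each node, which matters in a cluster system where the same setting must be applied on both the primary and secondary nodes.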

(2) Environment setting parameters

Table 15‒30: Environment setting parameters used to set the amount of time to wait for recovery when an agent has failed

Definition key:

  • For all scheduler services

    [{JP1_DEFAULT|logical-host}\JP1AJS2\SCHEDULER\QUEUE\MANAGER\Job]#

  • For a specific scheduler service

    [{JP1_DEFAULT|logical-host}\JP1AJSMANAGER\scheduler-service\QUEUE\MANAGER\Job]#

  • For submit jobs

    [{JP1_DEFAULT|logical-host}\JP1NBQMANAGER\Job]#

Environment setting parameters and explanations:

  • "QueuingJobRecoveryTime"=

    Specifies, in seconds, how long to wait for recovery from an agent failure related to a queued job.

  • "ExecutingJobRecoveryTime"=

    Specifies, in seconds, how long to wait for recovery from an agent failure related to a job being executed.

#:

The specification of the {JP1_DEFAULT|logical-host} part depends on whether the host is a physical host or a logical host. For a physical host, specify JP1_DEFAULT. For a logical host, specify the logical host name.
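
The three definition-key forms and the physical/logical host rule can be sketched as a small shell helper. The function name definition_key, the scope keywords (all, service, submit), and the sample names node0 and AJSROOT1 are hypothetical and chosen only for illustration. The key layouts themselves are taken from the table above.

```shell
#!/bin/sh
# Sketch only: assembles a definition key string from the three forms in the table.

#   $1 = host part: JP1_DEFAULT for a physical host, or the logical host name
#   $2 = scope: all | service | submit   (hypothetical keywords)
#   $3 = scheduler service name (used only when the scope is "service")
definition_key() {
    case "$2" in
        all)     printf '[%s\\JP1AJS2\\SCHEDULER\\QUEUE\\MANAGER\\Job]\n' "$1" ;;
        service) printf '[%s\\JP1AJSMANAGER\\%s\\QUEUE\\MANAGER\\Job]\n' "$1" "$3" ;;
        submit)  printf '[%s\\JP1NBQMANAGER\\Job]\n' "$1" ;;
        *)       echo "unknown scope: $2" >&2; return 1 ;;
    esac
}

# Physical host, all scheduler services.
definition_key JP1_DEFAULT all

# Logical host "node0", scheduler service "AJSROOT1" (both names are examples).
definition_key node0 service AJSROOT1
```

The resulting string is what would be passed, in double quotes, as the "definition-key" argument of jajs_config.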

For details about the definition of these environment setting parameters, see the following: