15.2.19 Placing all running jobs in an end status when a communication error occurs

JP1/AJS3 periodically (at five-minute intervals) performs polling to monitor running jobs (PC jobs other than queueless jobs, Unix jobs other than queueless jobs, flexible jobs^#, HTTP connection jobs, queue jobs running on JP1/AJS3, action jobs other than queueless jobs, or custom jobs).

#:: For flexible jobs, polling is performed between the manager host and relay agent to monitor jobs.

If a communication error occurs during the monitoring on the agent host on which a job is to be executed, JP1/AJS3 does not immediately declare an abnormal end. Instead, it retries communication for a specified period of time (default: 10 minutes) while waiting for recovery from the system or communication error on the agent host. If the error is a temporary, recoverable error, then operation is not stopped needlessly.

If the agent host has not recovered when the polling period ends, jobs are placed in an end status^# one by one, in the order in which each job's wait time for recovery expires. If many jobs are running, it might therefore take a long time before all of them have been placed in an end status. Depending on the operation, recovering immediately can be more important than waiting for the error to clear. For such cases, you can specify settings so that all jobs running on the same agent host are immediately placed in an end status^# if the agent host has not recovered when the polling period ends. For jobs in an execution agent group, these settings apply to the jobs that are running on the same agent host. Placing jobs in an end status immediately enables recovery action to be taken sooner.

#:: For a job defined in a jobnet, the job status changes to Killed, and -1 is set as the return code. For a submit job executed by the jpqjobsub command, the job status changes to the status specified by the -rs option (the default is Hold).
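The difference between the two behaviors can be illustrated with a toy model. The sketch below is not how JP1/AJS3 is implemented. The function name `end_times`, the input format, and the minute values are hypothetical. It also assumes that, with the setting enabled, every job on an agent host ends at the earliest wait-time expiry among that host's jobs.

```shell
#!/bin/sh
# Toy model only; not the JP1/AJS3 implementation.
# Input lines: <job> <agent-host> <minute at which the job's wait for recovery expires>
# $1 = "each": a job ends when its own wait time expires (default behavior)
# $1 = "all" : every job on an agent host ends at that host's earliest expiry (assumption)
end_times() {
    awk -v mode="$1" '
        {
            job[NR] = $1; host[NR] = $2; expiry[NR] = $3 + 0
            # Track the earliest expiry seen for each agent host.
            if (!($2 in first) || expiry[NR] < first[$2]) first[$2] = expiry[NR]
        }
        END {
            for (i = 1; i <= NR; i++)
                print job[i], host[i], (mode == "all" ? first[host[i]] : expiry[i])
        }
    '
}

# Hypothetical sample: three jobs on agent1, one job on agent2.
sample='jobA agent1 10
jobB agent1 13
jobC agent1 17
jobD agent2 12'

echo "one by one:"
printf '%s\n' "$sample" | end_times each
echo "all at once per agent host:"
printf '%s\n' "$sample" | end_times all
```

In this model, jobC on agent1 ends at minute 17 by default but at minute 10 with the setting enabled, while jobD on agent2 is unaffected because it runs on a different agent host.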

The following describes how to specify the settings for placing all running jobs in an end status when a communication error occurs.

Organization of this subsection

(1) Definition procedure
(2) Environment setting parameter

(1) Definition procedure

Execute the following commands to stop JP1/AJS3 and confirm that all processes have stopped:
```
# /etc/opt/jp1ajs2/jajs_stop^#1
# /opt/jp1ajs2/bin/jajs_spmd_status
```
#1:: Confirm that automatic termination has been set.

Execute the following command to set the environment setting parameter described in (2) below:
```
jajs_config -k "definition-key" "parameter-name"=value
```
Restart JP1/AJS3.

The new settings are applied.
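As a rough aid, the procedure above can be written as a dry run that only prints the commands in order, so that they can be reviewed before being run by hand. This is a sketch under stated assumptions: the function name `print_procedure` and the variable `NEW_VALUE` are hypothetical, the definition key shown is the physical-host key for submit jobs from the table in (2), and the parameter value is deliberately left as a placeholder because it must be taken from 20.5.2(24).

```shell
#!/bin/sh
# Dry run: prints the commands of the definition procedure; nothing is executed.
print_procedure() {
    key=$1      # definition key from Table 15-38
    value=$2    # value for ExecutingJobChangeStatus; see 20.5.2(24)
    echo '/etc/opt/jp1ajs2/jajs_stop'
    echo '/opt/jp1ajs2/bin/jajs_spmd_status'
    # %s copies the key verbatim, so its backslashes are preserved.
    printf 'jajs_config -k "%s" "ExecutingJobChangeStatus"=%s\n' "$key" "$value"
    echo '# restart JP1/AJS3 here (the restart command is not shown in this section)'
}

# Single quotes keep the backslashes in the key literal in the shell.
print_procedure '[JP1_DEFAULT\JP1NBQMANAGER\Job]' "${NEW_VALUE:-<value>}"
```

Printing rather than executing keeps the sketch safe to run on any machine. On a real manager host, the printed commands would be run as a superuser after the placeholder has been replaced.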


(2) Environment setting parameter

Table 15‒38: Environment setting parameter used to place all running jobs in an end status when a communication error occurs
Definition key	Environment setting parameter	Explanation
For all scheduler services: `[{JP1_DEFAULT|logical-host}\JP1AJS2\SCHEDULER\QUEUE\MANAGER\Job]`^#; for a specific scheduler service: `[{JP1_DEFAULT|logical-host}\JP1AJSMANAGER\scheduler-service\QUEUE\MANAGER\Job]`^#; for submit jobs: `[{JP1_DEFAULT|logical-host}\JP1NBQMANAGER\Job]`^#	`"ExecutingJobChangeStatus"=`	Specifies that all running jobs are placed in an end status when a communication error occurs.

#:: The specification of the {JP1_DEFAULT|logical-host} part depends on whether the host is a physical host or a logical host. For a physical host, specify JP1_DEFAULT. For a logical host, specify the logical host name.
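The three definition keys differ only in the host part and in the scope they apply to, so a key can be assembled mechanically. The following sketch shows one way to do that. The function name `definition_key`, the scheduler service name `AJSROOT1`, and the logical host name `node0` are illustrative assumptions, not values taken from this section.

```shell
#!/bin/sh
# Build a definition key for ExecutingJobChangeStatus.
# $1: host part - JP1_DEFAULT for a physical host, or the logical host name
# $2: scope     - "all" (all scheduler services), "submit" (submit jobs),
#                 or a scheduler service name (a specific scheduler service)
definition_key() {
    host=$1
    scope=$2
    case "$scope" in
        all)    printf '[%s\\JP1AJS2\\SCHEDULER\\QUEUE\\MANAGER\\Job]\n' "$host" ;;
        submit) printf '[%s\\JP1NBQMANAGER\\Job]\n' "$host" ;;
        *)      printf '[%s\\JP1AJSMANAGER\\%s\\QUEUE\\MANAGER\\Job]\n' "$host" "$scope" ;;
    esac
}

definition_key JP1_DEFAULT all        # physical host, all scheduler services
definition_key JP1_DEFAULT AJSROOT1   # physical host, one scheduler service (example name)
definition_key node0 submit           # logical host "node0" (example name), submit jobs
```

The resulting string is what would be passed as the definition key to `jajs_config -k`. Enclose it in quotation marks on the command line so that the shell leaves the backslashes and brackets untouched.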

For details about the definition of this environment setting parameter, see 20.5.2(24) ExecutingJobChangeStatus.
