2.5.5 Considering reduction of job distribution delay

When a job is distributed simultaneously from the manager host to multiple execution agents, if a communication failure has occurred on three or more of those agents, job distribution to the other execution agents might be delayed. To reduce such delays, the manager host can check the status of communication with each execution agent and suppress job distribution to execution agents on which a communication failure has occurred. In JP1/AJS3, this function is called the job distribution delay reduction function.

For details about how to enable the job distribution delay reduction function, see 21.5 Setting up the job distribution delay reduction function in the JP1/Automatic Job Management System 3 Configuration Guide.

Organization of this subsection

(1) Overview of the job distribution delay reduction function
(2) Jobs subject to job distribution delay reduction
(3) Monitoring interval of the job distribution delay reduction function
(4) Forcibly terminating a job when the job distribution delay reduction function is used
(5) When you want to stop an execution agent for a scheduled purpose
(6) Relationship between the job transfer restriction and the job distribution delay reduction function
(7) Note on the job distribution delay reduction function

(1) Overview of the job distribution delay reduction function

If the job distribution delay reduction function is enabled, the manager host manages the status of each execution agent. The following table describes the statuses of execution agents.

Table 2‒42: List of statuses of execution agents
No.	Status	Description
1	Not checked	The status of the execution agent has not been identified yet. Job distribution to the execution agent is possible.
2	Connectable	The manager host can normally communicate with the execution agent. Job distribution to the execution agent is possible.
3	Unconnectable	The execution agent cannot receive jobs because a communication failure has occurred. Job distribution to the execution agent is not possible. (Jobs to be distributed to this execution agent enter the Queuing status on the manager host.)
4	Unavailable	Job distribution to the execution agent is explicitly suppressed by using the `ajsagtalt` command. Job distribution to the execution agent is not possible. (Jobs to be distributed to this execution agent enter the Queuing status on the manager host.)

When no communication failures have been detected on any execution hosts, the manager host assumes the status of each execution agent to be Not checked (the default status). The manager host distributes a job to execution agents with the Not checked status.

Upon detecting a failure in communication with an execution agent, the manager host starts monitoring the status of the execution agents. The following describes the status transitions of execution agents:

The manager host checks the communication status of every execution agent whose status is Not checked or Connectable and that either has queuing jobs or belongs to an execution agent group for which jobs are queuing. This is called the communication status check.

The status of the execution agents changes to one of the following according to the result of the communication status check:
- Connectable
- Unconnectable

The manager host distributes the job to the execution agents for which the status is Connectable, and does not distribute the job to the execution agents for which the status is Unconnectable.

The job distributed to Connectable execution agents is executed on the execution agents. One hour after the status of an execution agent changes to Connectable, the status changes to Not checked.

A job that could not be distributed to an execution agent for which the status is Unconnectable remains queuing, and waits for the communication failure on the execution agent to be corrected. If an execution agent has not recovered when the wait time for error recovery elapses# (in the case of an event job, even if the retransmission of the unreported information is complete), the job status changes to Failed to start.

#: For details about the wait time for error recovery of agents, see 6.2.12 Changing the wait time for recovery when an agent has failed in the JP1/Automatic Job Management System 3 Configuration Guide (for Windows) or see 15.2.12 Changing the wait time for recovery when an agent has failed in the JP1/Automatic Job Management System 3 Configuration Guide (for UNIX).

The following figure shows an overview of distributing a job according to the status of execution agents.

Figure 2‒51: Distribution of a job according to the status of execution agents

The manager host periodically polls Unconnectable execution agents to check the connection status. This is called the communication recovery check.

The communication recovery check is repeated until the execution agent recovers from the communication failure.
The status of each execution agent changes to one of the following statuses according to the result of repeated communication recovery checks:
- Connectable
  
  When an execution agent recovers from a communication failure, the execution agent's status changes to Connectable, and job distribution starts. If there are no execution agents whose status is Unconnectable, the communication recovery check ends.
- Not checked
  
  If an execution agent does not recover from a communication failure within 24 hours (at the default settings), the execution agent's status changes from Unconnectable to Not checked, and the communication recovery check stops.
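The status transitions described above can be summarized as a small state machine. The following Python sketch is illustrative only and is not JP1/AJS3 code: the class and method names (`Agent`, `on_status_check`, `on_recovery_check`, `on_tick`) are hypothetical, time is passed in as plain seconds, and the one-hour and 24-hour values are the defaults stated in this subsection.

```python
from dataclasses import dataclass
from enum import Enum


class AgentStatus(Enum):
    NOT_CHECKED = "Not checked"
    CONNECTABLE = "Connectable"
    UNCONNECTABLE = "Unconnectable"
    UNAVAILABLE = "Unavailable"


CONNECTABLE_RESET_SEC = 60 * 60         # Connectable -> Not checked after one hour
UNCONNECTABLE_RESET_SEC = 24 * 60 * 60  # default time before Unconnectable -> Not checked


@dataclass
class Agent:
    """Hypothetical model of one execution agent as seen by the manager host."""
    name: str
    status: AgentStatus = AgentStatus.NOT_CHECKED
    since: float = 0.0  # time (in seconds) at which the current status was entered

    def can_receive_jobs(self) -> bool:
        # Jobs are distributed only to Not checked or Connectable agents.
        return self.status in (AgentStatus.NOT_CHECKED, AgentStatus.CONNECTABLE)

    def _set(self, status: AgentStatus, now: float) -> None:
        # Record the entry time only when the status actually changes.
        if status is not self.status:
            self.status = status
            self.since = now

    def on_status_check(self, reachable: bool, now: float) -> None:
        """Communication status check (Not checked or Connectable agents only)."""
        if self.can_receive_jobs():
            self._set(
                AgentStatus.CONNECTABLE if reachable else AgentStatus.UNCONNECTABLE,
                now,
            )

    def on_recovery_check(self, reachable: bool, now: float) -> None:
        """Communication recovery check (Unconnectable agents only)."""
        if self.status is not AgentStatus.UNCONNECTABLE:
            return
        if reachable:
            self._set(AgentStatus.CONNECTABLE, now)
        elif now - self.since >= UNCONNECTABLE_RESET_SEC:
            # No recovery within the reset time: stop checking, revert to Not checked.
            self._set(AgentStatus.NOT_CHECKED, now)

    def on_tick(self, now: float) -> None:
        """Revert a Connectable agent to Not checked one hour after the change."""
        if (self.status is AgentStatus.CONNECTABLE
                and now - self.since >= CONNECTABLE_RESET_SEC):
            self._set(AgentStatus.NOT_CHECKED, now)
```

The Unavailable status is deliberately left out of the transitions because, according to the table above, it is set explicitly with the `ajsagtalt` command rather than by a check.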

Supplementary note

- If an execution agent group is specified as the job destination, the job is distributed to the execution agents with the highest priority among the Connectable or Not checked execution agents associated with the group.
- Even if JP1/AJS3 version 11-10 or earlier is installed on the job-destination agent hosts, the agent hosts are subject to communication status check and communication recovery check.
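
The selection rule for execution agent groups in the supplementary note can be sketched as a filter followed by a priority comparison. The Python below is a hypothetical illustration, not JP1/AJS3 code: `GroupMember` and `pick_targets` are invented names, and the sketch assumes that a larger priority number means a higher priority.

```python
from typing import List, NamedTuple, Sequence


class GroupMember(NamedTuple):
    """Hypothetical record for one execution agent in an execution agent group."""
    name: str
    priority: int  # assumption: a larger number means a higher priority
    status: str    # "Not checked", "Connectable", "Unconnectable", or "Unavailable"


DISTRIBUTABLE = {"Connectable", "Not checked"}


def pick_targets(members: Sequence[GroupMember]) -> List[str]:
    """Return the highest-priority agents among those that can receive jobs."""
    candidates = [m for m in members if m.status in DISTRIBUTABLE]
    if not candidates:
        return []  # nothing can receive the job, so it stays queuing
    top = max(m.priority for m in candidates)
    return [m.name for m in candidates if m.priority == top]
```

For example, if the highest-priority member of a group is Unconnectable, the sketch skips it and returns the members at the next priority level whose status is Connectable or Not checked.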


(2) Jobs subject to job distribution delay reduction

The job distribution delay reduction function can be applied to the following types of jobs:

- Unix job (with the exception of queueless jobs)
- PC job (with the exception of queueless jobs)
- Flexible job
  
  The function only affects the communications between JP1/AJS3 - Manager and the JP1/AJS3 that is working as a relay agent. The function does not affect the communications between JP1/AJS3 and the destination agents.
- Event job
- Action job (with the exception of queueless jobs)
- Custom job
- Passing information setting job
- HTTP connection job
  
  The function only affects the communications between JP1/AJS3 - Manager and the JP1/AJS3 - Agent hosts that execute HTTP connection jobs. The function does not affect the communications between JP1/AJS3 - Agent and a web server.


(3) Monitoring interval of the job distribution delay reduction function

By using the environment setting parameters for the job distribution delay reduction function, you can set the connection timeout period, the interval at which to perform communication recovery checks, and the amount of time after which communication checks are to stop. The following figure shows the time periods that can be set by using environment setting parameters.

Figure 2‒52: Time periods that can be set by using environment setting parameters

- Connection timeout: the time allowed for a communication status check or communication recovery check. This value is specified by using the AGMCONNECTTIMEOUT environment setting parameter. The default is 10 seconds.
- Recovery check interval: the interval at which to perform a communication recovery check. This value is specified by using the AGMINTERVALFORRECOVER environment setting parameter. The default is 180 seconds.
- Status reset time: the time before the status of an execution agent changes from Unconnectable to Not checked. This value is specified by using the AGMERRAGTSTATRESETTIME environment setting parameter. The default is 24 hours.
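
To see how these three time periods interact, the following Python sketch estimates how many communication recovery checks fit into the reset time at the default values. It is a rough calculation under stated assumptions, not a description of the product's scheduler: the function name is hypothetical, and the lower bound assumes that every check runs for the full connection timeout before the next interval starts.

```python
# Defaults stated above, expressed in seconds.
AGMCONNECTTIMEOUT_SEC = 10
AGMINTERVALFORRECOVER_SEC = 180
AGMERRAGTSTATRESETTIME_SEC = 24 * 60 * 60


def recovery_check_bounds(timeout: int, interval: int, reset_time: int) -> tuple:
    """Return (fewest, most) recovery checks that fit into reset_time.

    most:   each check returns immediately, so one cycle takes `interval`.
    fewest: each check uses the whole timeout, so one cycle takes
            `interval + timeout` (an assumption made for this sketch).
    """
    most = reset_time // interval
    fewest = reset_time // (interval + timeout)
    return fewest, most


print(recovery_check_bounds(
    AGMCONNECTTIMEOUT_SEC,
    AGMINTERVALFORRECOVER_SEC,
    AGMERRAGTSTATRESETTIME_SEC,
))  # prints (454, 480)
```

Under these assumptions, an agent that never recovers is polled roughly 450 to 480 times before its status reverts to Not checked. Shortening the interval detects recovery sooner at the cost of more checks.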


(4) Forcibly terminating a job when the job distribution delay reduction function is used

If the job distribution delay reduction function is used, job distribution is not delayed even when a communication failure occurs during distribution of a forced-termination request for a job. In the same way that jobs are distributed, the request is distributed to the execution agents whose status is Connectable or Not checked. Distribution of the request is suppressed for execution agents whose status is Unconnectable or Unavailable. If the forced-termination request for a job is suppressed, the status of the job changes to Ended abnormally, and the message KAVU4221-E is output to the integrated trace log. Note that user programs executed by the job are not forcibly terminated.
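
The handling of a forced-termination request can be sketched as a single decision on the execution agent's status. The Python below is a hypothetical illustration only: the function name and the returned dictionary are invented for this example, while the statuses, the Ended abnormally result, and message KAVU4221-E come from the paragraph above.

```python
DISTRIBUTABLE = {"Connectable", "Not checked"}
SUPPRESSED = {"Unconnectable", "Unavailable"}


def route_forced_termination(agent_status: str) -> dict:
    """Decide whether a forced-termination request is sent or suppressed."""
    if agent_status in DISTRIBUTABLE:
        # The request is distributed in the same way as a job.
        return {"request_sent": True, "job_status": None, "message": None}
    if agent_status in SUPPRESSED:
        # The request is suppressed; the user program on the agent keeps running.
        return {
            "request_sent": False,
            "job_status": "Ended abnormally",
            "message": "KAVU4221-E",
        }
    raise ValueError(f"unknown execution agent status: {agent_status!r}")
```

In the suppressed case, the job status alone does not show whether the user program has stopped, so the program might need to be checked on the agent host after communication recovers.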


(5) When you want to stop an execution agent for a scheduled purpose

If the job distribution delay reduction function is enabled and you want to stop an execution agent for a scheduled purpose such as maintenance, specify the following settings for the execution agent you want to stop, in order to prevent delays in job distribution to other running execution agents:

- Use the ajsagtalt command to change the job transfer restriction status to Hold or Blockade.
- Use the ajsagtalt command to change the status of the execution agent to Unavailable.

If you do not change the status of the execution agent to Unavailable, the communication status check reports the status to be Unconnectable. In this case, because the communication recovery check continues for the execution agent that has stopped, recovery detection might be delayed until the status of the execution agent changes to Not checked. To redistribute a job to an execution agent whose status you changed to Unavailable, use the ajsagtalt command to change the status of the execution agent to Not checked.

Note that you can also perform these operations from JP1/AJS3 - Web Console.


(6) Relationship between the job transfer restriction and the job distribution delay reduction function

The following table describes the job status transitions that occur based on the job transfer restriction status and the status of the execution agent according to the job distribution delay reduction function. For details about job transfer restriction, see 5.2 Restricting job transfer in the manual JP1/Automatic Job Management System 3 Overview.

Table 2‒43: Job status transitions that occur when job transfer restriction is used in conjunction with the job distribution delay reduction function
Job transfer restriction status	Execution agent status	Job status transition	Event job status transition
Effective	Not checked or Connectable	Now queuing → Now running → Ended	Now queuing → Now running → Ended (The transfer of event jobs cannot be restricted.)
Effective	Unconnectable or Unavailable	Now queuing → Failed to start (The status changes to Failed to start after the wait time for the error recovery of agents has elapsed.)	Now queuing → Failed to start
Ineffective	Not checked or Connectable	Immediately Failed to start (Now queuing jobs have the same status as Effective jobs.)	Now queuing → Now running → Ended (The transfer of event jobs cannot be restricted.)
Ineffective	Unconnectable or Unavailable	Immediately Failed to start (The status of Now queuing jobs changes to Failed to start after the wait time for the error recovery of agents has elapsed.)	Now queuing → Failed to start
Hold	Not checked or Connectable	Now queuing	Now queuing → Now running → Ended (The transfer of event jobs cannot be restricted.)
Hold	Unconnectable or Unavailable	Now queuing	Now queuing → Failed to start
Blockade	Not checked or Connectable	Immediately Failed to start (Now queuing jobs have the same status as Hold jobs.)	Now queuing → Now running → Ended (The transfer of event jobs cannot be restricted.)
Blockade	Unconnectable or Unavailable	Immediately Failed to start (Now queuing jobs have the same status as Hold jobs.)	Now queuing → Failed to start
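
The table can also be read as a lookup keyed on the job transfer restriction status, the execution agent status, and whether the job is an event job. The following Python sketch encodes that reading for newly registered jobs only. It is a hypothetical aid for checking expectations, not JP1/AJS3 code: the function name is invented, and the parenthesized cases for jobs that are already queuing are intentionally not modeled.

```python
DISTRIBUTABLE = {"Not checked", "Connectable"}
BLOCKED = {"Unconnectable", "Unavailable"}
RESTRICTIONS = {"Effective", "Ineffective", "Hold", "Blockade"}

RUNS = ("Now queuing", "Now running", "Ended")
FAILS_AFTER_QUEUING = ("Now queuing", "Failed to start")


def expected_transition(restriction: str, agent_status: str,
                        event_job: bool = False) -> tuple:
    """Return the status transition of a newly registered job per the table."""
    if restriction not in RESTRICTIONS:
        raise ValueError(f"unknown job transfer restriction: {restriction!r}")
    if agent_status not in DISTRIBUTABLE | BLOCKED:
        raise ValueError(f"unknown execution agent status: {agent_status!r}")
    reachable = agent_status in DISTRIBUTABLE
    if event_job:
        # The transfer of event jobs cannot be restricted, so only the
        # execution agent status matters.
        return RUNS if reachable else FAILS_AFTER_QUEUING
    if restriction == "Effective":
        return RUNS if reachable else FAILS_AFTER_QUEUING
    if restriction in ("Ineffective", "Blockade"):
        return ("Failed to start",)  # immediately, regardless of agent status
    return ("Now queuing",)          # Hold: the job stays queuing
```

For example, `expected_transition("Hold", "Unavailable", event_job=True)` yields the queuing-then-failed sequence, which matches the last column of the Hold rows.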


(7) Note on the job distribution delay reduction function

One hour after the status of an execution agent is determined to be Connectable, the status of the execution agent changes to Not checked. Suppose that a communication failure occurs on three or more execution agents determined to be Connectable before their status changes from Connectable to Not checked. In this case, an attempt to simultaneously distribute jobs to those execution agents might cause a delay in job distribution.

If you enable the job distribution delay reduction function, do not stop the agent monitoring process (ajsagtmond). If you stop this process, the following occurs:

- Jobs are distributed to all execution agents regardless of the agents' status, excluding execution agents whose status is Unavailable.
- If jobs are distributed to three or more execution agents for which communication is disabled, the distribution of jobs to normally operating execution agents will be delayed. (This is the same as when the job distribution delay reduction function is disabled.)
- While the agent monitoring process is stopped, the detection of the recovery of execution agents for which an error was previously detected will be delayed. (This is the same as when the job distribution delay reduction function is disabled.)
