Hitachi

JP1 Version 12 JP1/Automatic Job Management System 3 Configuration Guide


6.2.8 Changing the timeout period, interval of retries, and number of retries for TCP/IP connections

In job execution control, TCP/IP is used to pass information between processes. However, if the host to be connected to is not running or if a network error has occurred, the TCP/IP connection fails.

If the other host does not respond to a TCP/IP connection request, job execution control first waits for a maximum of 90 seconds for a response, and then makes two retry attempts at 20-second intervals (under the default settings). If both retries also fail, four or five minutes might pass before the connection finally results in an error.
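The "four or five minutes" figure can be checked with a little arithmetic. The following is a minimal sketch, not product code: the function name is hypothetical, and it assumes that every attempt (the first one and each retry) waits for the full timeout and that the retry interval elapses before each retry.

```python
def worst_case_detection_seconds(connect_timeout_s, retry_count, retry_interval_s):
    """Upper bound on the time before a connection is reported as an error.

    Assumption: the first attempt and every retry each wait for the full
    timeout, and the retry interval is waited before each retry.
    """
    attempts = 1 + retry_count
    return attempts * connect_timeout_s + retry_count * retry_interval_s


# Defaults described above: 90-second timeout, 2 retries, 20-second interval.
default_total = worst_case_detection_seconds(90, 2, 20)
print(default_total, "seconds")

# Hypothetical smaller values, for comparison only (not a recommendation).
tuned_total = worst_case_detection_seconds(30, 1, 10)
print(tuned_total, "seconds")
```

Under these assumptions the defaults give 310 seconds, a little over five minutes, which is consistent with the estimate in the text.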

If a communication error occurs during the following operations, detection might take more time:

This delay might result in a further delay in changing the job status.

If TCP/IP connection errors are frequent, you can set smaller values for the connection timeout value, the number of retry attempts, and the retry interval to speed up the detection of an error.

The following figure shows an example of processing (from executing a job to forcibly terminating it) with communication over a TCP/IP connection between the manager and agent hosts.

Figure 6‒1: Example of processing with communication over a TCP/IP connection between the manager and agent hosts

[Figure]

For connections (1) and (3) in the above figure, the timeout period, number of retries, and interval of retries are controlled by using the environment setting parameters for TCP/IP communication from the manager host to the agent host. For connections (2) and (4) in the above figure, the timeout period, number of retries, and interval of retries are controlled by using the environment setting parameters for TCP/IP communication from the agent host to the manager host.

The following describes these two types of environment setting parameters.

Environment setting parameters for TCP/IP communication from the manager host to the agent host:

TCP/IP communication from the manager host to the agent host is used when the following operations are performed:

  • Delivering jobs

  • Killing jobs

  • Checking the job status

  • Checking the agent host status

For details about checking the status of a job or agent, see 5.4.8 Monitoring the status of registered jobs in the manual JP1/Automatic Job Management System 3 Overview.

The following table lists the environment setting parameters that are used to set the timeout period, number of retries, and interval of retries for TCP/IP communication from the manager host to the agent host.

Table 6‒22: Environment setting parameters that are used for TCP/IP communication from the manager host to the agent host

Definition keys# (common to Nos. 1 to 3):

  • [{JP1_DEFAULT|logical-host}\JP1AJS2\HOST\NETWORK]

  • [{JP1_DEFAULT|logical-host}\JP1AJS2\HOST\NETWORK\QUEUEMANAGER]

  • [{JP1_DEFAULT|logical-host}\JP1AJSMANAGER\scheduler-service-name\NETWORK\QUEUEMANAGER]

| No. | Environment setting parameter | Definition |
|-----|-------------------------------|------------|
| 1 | "ClientConnectTimeout"= | Connection timeout |
| 2 | "ClientRetryInterval"= | Connection retry interval |
| 3 | "ClientRetryCount"= | Number of connection retries |

#:

The specification of the {JP1_DEFAULT|logical-host} part depends on whether the host is a physical host or a logical host. For a physical host, specify JP1_DEFAULT. For a logical host, specify the logical host name.

For details about the definition of these environment setting parameters, see 20.8 Setting up the communication control environment.

Environment setting parameters for TCP/IP communication from the agent host to the manager host:

TCP/IP communication from the agent host to the manager host is used when the following operations are performed:

  • Reporting the start of a job

  • Reporting the end of a job

  • Transferring a result file

The following table lists the environment setting parameters that are used to set the timeout period, number of retries, and interval of retries for TCP/IP communication from the agent host to the manager host.

Table 6‒23: Environment setting parameters that are used for TCP/IP communication from the agent host to the manager host

Definition key (common to Nos. 1 to 3):

  • [{JP1_DEFAULT|logical-host}\JP1NBQAGENT\Network]#

| No. | Environment setting parameter | Definition |
|-----|-------------------------------|------------|
| 1 | "ConnectTimeout"= | Defines the timeout value for a TCP/IP connection attempted by the job execution control agent. |
| 2 | "CommunicateRetryCount"= | Defines the maximum number of retry attempts for a TCP/IP connection attempted by the job execution control agent. |
| 3 | "CommunicateRetryInterval"= | Defines the retry interval for a TCP/IP connection attempted by the job execution control agent. |

#:

The specification of the {JP1_DEFAULT|logical-host} part depends on whether the host is a physical host or a logical host. For a physical host, specify JP1_DEFAULT. For a logical host, specify the logical host name.

For details about the definition of these environment setting parameters, see 20.5 Setting up the job execution environment.


(1) Definition procedure

  1. In Windows Control Panel, open the Services administrative tool, and stop the following service:

    • JP1/AJS3 service

  2. Execute the following command to set the environment setting parameters described in (2) below:

    jajs_config -k definition-key "parameter-name-1"=value-1 
    ["parameter-name-2"=value-2] 
    ["parameter-name-3"=value-3]

    You can specify only one definition key. If you want to set environment setting parameters for different definition keys, you must execute the jajs_config command for each definition key.

  3. Restart JP1/AJS3.

    The new settings are applied.
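Because step 2 accepts only one definition key per invocation, changing parameters under several keys means running the command several times. The sketch below generates one command line per key. The helper name is hypothetical. The quoting of the definition key, the dword: hexadecimal value format, and the example values are assumptions that should be checked against 20.5 and 20.8 before use.

```python
def build_jajs_config_commands(settings):
    """Return one jajs_config command line per definition key.

    settings maps a definition key to a dict of {parameter name: value}.
    Values are passed through unchanged as strings.
    """
    commands = []
    for definition_key, params in settings.items():
        assignments = " ".join(f'"{name}"={value}' for name, value in params.items())
        commands.append(f'jajs_config -k "{definition_key}" {assignments}')
    return commands


# Example values only; the dword:hex format is an assumption.
settings = {
    r"[JP1_DEFAULT\JP1NBQAGENT\Network]": {
        "ConnectTimeout": "dword:00007530",            # 0x7530 = 30,000 ms
        "CommunicateRetryCount": "dword:00000001",     # 1 retry
        "CommunicateRetryInterval": "dword:0000000A",  # 0xA = 10 seconds
    },
    r"[JP1_DEFAULT\JP1NBQCLIENT\Network]": {
        "ConnectTimeout": "dword:00007530",
    },
}

commands = build_jajs_config_commands(settings)
for command in commands:
    print(command)
```

The two keys in this example produce two separate command lines, each of which would be executed while the JP1/AJS3 service is stopped.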

(2) Environment setting parameters

The following table lists the definition keys for which values are to be changed, and their purpose.

Table 6‒24: Definition keys for which values are to be changed

| Definition key | Purpose |
|----------------|---------|
| JP1AJS2\HOST\NETWORK; JP1AJS2\HOST\NETWORK\QUEUEMANAGER; JP1AJSMANAGER\scheduler-service-name\NETWORK\QUEUEMANAGER | Delivering jobs; killing jobs; checking the job status; checking the agent host status |
| For all scheduler services: JP1AJS2\SCHEDULER\QUEUE\MANAGER\Network. For a specific scheduler service: JP1AJSMANAGER\scheduler-service\QUEUE\MANAGER\Network. For submit jobs: JP1NBQMANAGER\Network | Reporting the job status |
| JP1NBQAGENT\Network | Reporting the start of a job; reporting the end of a job; transferring a result file |
| JP1NBQCLIENT\Network | Registering a job from the scheduler and executing a job from a command |
| For all scheduler services: JP1AJS2\SCHEDULER\QUEUE\NOTIFY\Network. For a specific scheduler service: JP1AJSMANAGER\scheduler-service\QUEUE\NOTIFY\Network. For submit jobs: JP1NBQNOTIFY\Network | Checking the job status on another system (such as JP1/NQSEXEC or JP1/OJE) and reporting the status |

The following table describes the definition keys and corresponding environment setting parameters. Note that you do not need to set these environment setting parameters for the queueless job execution facility.

Table 6‒25: Environment setting parameters for communication control

Definition keys# (common to Nos. 1 to 3):

  • [{JP1_DEFAULT|logical-host}\JP1AJS2\HOST\NETWORK]

  • [{JP1_DEFAULT|logical-host}\JP1AJS2\HOST\NETWORK\QUEUEMANAGER]

  • [{JP1_DEFAULT|logical-host}\JP1AJSMANAGER\scheduler-service-name\NETWORK\QUEUEMANAGER]

| No. | Environment setting parameter | Definition | Reference |
|-----|-------------------------------|------------|-----------|
| 1 | "ClientConnectTimeout"= | Connection timeout | 20.8.2(1) ClientConnectTimeout (communication control) |
| 2 | "ClientRetryInterval"= | Connection retry interval | 20.8.2(2) ClientRetryInterval |
| 3 | "ClientRetryCount"= | Number of connection retries | 20.8.2(3) ClientRetryCount |

#:

The specification of the {JP1_DEFAULT|logical-host} part depends on whether the host is a physical host or a logical host. For a physical host, specify JP1_DEFAULT. For a logical host, specify the logical host name.

Table 6‒26: Environment setting parameters for job execution control

Definition keys:

  • Nos. 1 to 3

    For all scheduler services: [{JP1_DEFAULT|logical-host}\JP1AJS2\SCHEDULER\QUEUE\MANAGER\Network]#

    For a specific scheduler service: [{JP1_DEFAULT|logical-host}\JP1AJSMANAGER\scheduler-service\QUEUE\MANAGER\Network]#

    For submit jobs: [{JP1_DEFAULT|logical-host}\JP1NBQMANAGER\Network]#

  • Nos. 4 to 6

    [{JP1_DEFAULT|logical-host}\JP1NBQAGENT\Network]#

  • Nos. 7 to 9

    [{JP1_DEFAULT|logical-host}\JP1NBQCLIENT\Network]#

  • Nos. 10 to 12

    For all scheduler services: [{JP1_DEFAULT|logical-host}\JP1AJS2\SCHEDULER\QUEUE\NOTIFY\Network]#

    For a specific scheduler service: [{JP1_DEFAULT|logical-host}\JP1AJSMANAGER\scheduler-service\QUEUE\NOTIFY\Network]#

    For submit jobs: [{JP1_DEFAULT|logical-host}\JP1NBQNOTIFY\Network]#

| No. | Environment setting parameter | Definition | Reference |
|-----|-------------------------------|------------|-----------|
| 1 | "ConnectTimeout"= | Defines the timeout value (in milliseconds) for a TCP/IP connection from the job execution control manager process to the status reporting process for job execution control. | 20.5.2(25) ConnectTimeout (for job execution control manager) |
| 2 | "CommunicateRetryCount"= | Defines the maximum number of retry attempts for a TCP/IP connection from the job execution control manager process to the status reporting process for job execution control. | 20.5.2(26) CommunicateRetryCount (for job execution control manager) |
| 3 | "CommunicateRetryInterval"= | Defines the retry interval (in seconds) for a TCP/IP connection from the job execution control manager process to the status reporting process for job execution control. | 20.5.2(27) CommunicateRetryInterval (for job execution control manager) |
| 4 | "ConnectTimeout"= | Defines the timeout value (in milliseconds) for a TCP/IP connection attempted by the job execution control agent. | 20.5.2(67) ConnectTimeout (for job execution control agent) |
| 5 | "CommunicateRetryCount"= | Defines the maximum number of retry attempts for a TCP/IP connection attempted by the job execution control agent. | 20.5.2(68) CommunicateRetryCount (for job execution control agent) |
| 6 | "CommunicateRetryInterval"= | Defines the retry interval (in seconds) for a TCP/IP connection attempted by the job execution control agent. | 20.5.2(69) CommunicateRetryInterval (for job execution control agent) |
| 7 | "ConnectTimeout"= | Defines the timeout value (in milliseconds) for a TCP/IP connection attempted by job execution commands and the scheduler. | 20.5.2(75) ConnectTimeout (for the command and scheduler to be used for job execution) |
| 8 | "CommunicateRetryCount"= | Defines the maximum number of retry attempts for a TCP/IP connection attempted by job execution commands and the scheduler. | 20.5.2(76) CommunicateRetryCount (for the command and scheduler to be used for job execution) |
| 9 | "CommunicateRetryInterval"= | Defines the retry interval (in seconds) for a TCP/IP connection attempted by job execution commands and the scheduler. | 20.5.2(77) CommunicateRetryInterval (for the command and scheduler to be used for job execution) |
| 10 | "ConnectTimeout"= | Defines the timeout value (in milliseconds) for a TCP/IP connection attempted by the process that reports the job execution control status. | 20.5.2(82) ConnectTimeout (for the status reporting process for job execution control) |
| 11 | "CommunicateRetryCount"= | Defines the maximum number of retry attempts for a TCP/IP connection attempted by the process that reports the job execution control status. | 20.5.2(83) CommunicateRetryCount (for the status reporting process for job execution control) |
| 12 | "CommunicateRetryInterval"= | Defines the retry interval (in seconds) for a TCP/IP connection attempted by the process that reports the job execution control status. | 20.5.2(84) CommunicateRetryInterval (for the status reporting process for job execution control) |

#:

The specification of the {JP1_DEFAULT|logical-host} part depends on whether the host is a physical host or a logical host. For a physical host, specify JP1_DEFAULT. For a logical host, specify the logical host name.
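The note about the {JP1_DEFAULT|logical-host} part amounts to a simple substitution. The following sketch illustrates it with a hypothetical helper and a hypothetical logical host name (LHOST1); neither comes from the product.

```python
def expand_definition_key(template, logical_host=None):
    """Fill in the host part of a definition key.

    A physical host uses JP1_DEFAULT; a logical host uses its own name.
    """
    host_part = "JP1_DEFAULT" if logical_host is None else logical_host
    return template.replace("{JP1_DEFAULT|logical-host}", host_part)


template = r"[{JP1_DEFAULT|logical-host}\JP1NBQAGENT\Network]"

physical_key = expand_definition_key(template)
logical_key = expand_definition_key(template, logical_host="LHOST1")  # hypothetical name

print(physical_key)
print(logical_key)
```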

(3) Notes on communication errors caused by insufficient socket ports

When a system has a large number of jobs to execute per unit of time, the number of socket ports used for TCP/IP communication increases. This can result in insufficient socket ports being available. For communication errors that result from insufficient socket ports, the system retries communication at regular intervals. However, failure to resolve the situation by the time communication is retried may cause delays in job execution, or result in the abnormal termination of jobs, scheduler services, and commands.

If you encounter an error related to insufficient socket ports, refer to 3.1.1(5) OS tuning in the JP1/Automatic Job Management System 3 System Design (Configuration) Guide and adjust the operating system parameters as needed.

The retry behaviour of JP1/AJS3 in the event of a communication error related to insufficient socket ports depends on your operating system.

The environment setting parameters (for communication control) listed in Table 6‒25 and in 20.8 Setting up the communication control environment apply when a communication error related to insufficient socket ports occurs.