15.3.14 Environment setting parameters related to communication for event/action control
When an event job, a custom event job or a jobnet with start conditions is executed, the event/action control manager and the event/action control agent communicate with each other. To initiate communication, the event/action control manager and agent establish a connection over which an execution or kill request for the event job, custom event job or jobnet with start conditions and an event occurrence report can be exchanged.
The following figure shows the communication that occurs when an event job, a custom event job or a jobnet with start conditions is executed.
If an error occurs during communication, the information that could not be sent is saved in a file to prepare for a retry. This information is called unreported information.
If a communication error occurs, communication is retried as defined in the environment setting parameters.
The following table describes the environment setting parameters related to communication retries for event/action control.
| Definition key | Environment setting parameter | Explanation |
|---|---|---|
| | "ClientConnectTimeout"= | Connection timeout period |
| | "NotificationConstantRetry"= | Option for resending unreported information at regular intervals |
| | "NotificationRetryInterval"= | Interval for retrying to send unreported information |
| | "NotificationRetryCount"= | Maximum number of retries for sending unreported information |
| [{JP1_DEFAULT\|logical-host}\JP1AOMAGENT]# | "NotificationAlarmCount"= | The threshold for outputting a message if resending to the manager occurs frequently |
#: The specification of the {JP1_DEFAULT|logical-host} part depends on whether the host is a physical host or a logical host. For a physical host, specify JP1_DEFAULT. For a logical host, specify the logical host name.
The following describes the relationship between the environment setting parameters, and provides examples of setting these parameters.
(1) About ClientConnectTimeout
When the event/action control manager sends a connection request to the event/action control agent, or when the event/action control agent sends a connection request to the event/action control manager, the sender waits for a response. If no response is returned within a predefined time, the wait times out so that other processing can be performed. The time during which the manager or agent waits for a response to a connection request is called the connection timeout period.
Use the ClientConnectTimeout environment setting parameter to set the connection timeout period.
The following figure shows the connection timeout period set by using the ClientConnectTimeout environment setting parameter.
Increasing the value of the ClientConnectTimeout environment setting parameter lengthens the connection timeout period. As a result, connection timeouts become less frequent even when a response takes a long time to arrive because of communication load.
However, if no response to a connection request arrives for a long time because of a network device failure or a similar problem, the time that elapses before the timeout also increases. During that time, execution registration requests and kill requests for event jobs, custom event jobs, and jobnets with start conditions are not processed, and neither are event occurrence reports. While the manager or agent is waiting for a timeout, a request to register for execution or kill an event job, custom event job, or jobnet with start conditions on another agent that is available for communication cannot be processed immediately, so changing the job status takes a long time. For this reason, when a connection timeout occurs, the manager or agent with default settings does not retry at a regular interval but gradually lengthens the interval between retries to reduce the retry frequency. For details, see (2) About NotificationConstantRetry.
(2) About NotificationConstantRetry
(a) Communication from the event/action control manager to the event/action control agent
Depending on the value of the ClientConnectTimeout environment setting parameter, a long time might pass before a response to a connection request is returned if a network device failure or other problem occurs. In such cases, there is a long delay before an event job, custom event job, or jobnet with start conditions is registered for execution or killed. To reduce the frequency of such processing delays, by default the retry interval used after a connection timeout is not fixed but gradually increases. On the manager host, retries are performed at intervals of 300, 600, 900, 1,800, and then 3,600 seconds (the maximum), up to 27 times (24 hours in total).
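As a rough check of these figures, the following Python sketch rebuilds the manager-side retry schedule and totals it. It assumes that the interval stays at 3,600 seconds once the maximum is reached and that only the waits between retries count toward the total (the connection timeout itself is ignored). The function and constant names are illustrative and are not part of JP1/AJS3.

```python
# Interval steps (seconds) quoted in the text for manager-to-agent retries
# after a connection timeout. The last value is the maximum interval.
MANAGER_STEPS = [300, 600, 900, 1800, 3600]
MANAGER_MAX_RETRIES = 27


def manager_retry_intervals(steps=MANAGER_STEPS, max_retries=MANAGER_MAX_RETRIES):
    """Return the wait before each retry, holding the last step once reached."""
    last = len(steps) - 1
    return [steps[min(i, last)] for i in range(max_retries)]


intervals = manager_retry_intervals()
total_seconds = sum(intervals)

print(intervals[:6])         # [300, 600, 900, 1800, 3600, 3600]
print(total_seconds / 3600)  # 24.0
```

Under these assumptions, the first four waits add up to one hour and the remaining 23 waits of 3,600 seconds account for the other 23 hours, which matches the stated 24-hour retry period.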
The following figure shows how communication is performed from the event/action control manager to the event/action control agent if a connection error occurs.
However, if a connection timeout is due to a temporary cause such as a high communication load, the retry process described above takes more time, delaying the execution of an event job, custom event job or jobnet with start conditions on the execution agent. For such cases, you can also use a regular interval for retries.
Set Y for the NotificationConstantRetry environment setting parameter to use a regular interval for retries, irrespective of whether retries are due to connection timeouts or other types of errors. For details about the retry interval, see (3) About NotificationRetryInterval and NotificationRetryCount.
(b) Communication from the event/action control agent to the event/action control manager
If a connection error occurs because of a high network load, a problem with communication lines, or a similar reason, the event/action control agent by default retries 8,640 times (for 24 hours) at a fixed 10-second interval. If you specify N for the NotificationConstantRetry environment setting parameter, retries are performed at intervals that gradually become longer: 10, 20, 30, and then 60 seconds (the maximum), up to 1,442 times (24 hours). When N is specified, the maximum retry count is value-of-NotificationRetryCount / 6 + 2, and the retry interval lengthens in four steps: (1) value-of-NotificationRetryInterval x 1, (2) value-of-NotificationRetryInterval x 2, (3) value-of-NotificationRetryInterval x 3, and (4) value-of-NotificationRetryInterval x 6 (max).
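The agent-side arithmetic can be sketched in the same way. The Python example below models both modes with the default values quoted above (interval 10 seconds, retry count 8,640). It assumes that the division in value-of-NotificationRetryCount / 6 + 2 is an integer division and that the fourth step (x 6) is held for all remaining retries; the function name is hypothetical.

```python
def agent_retry_intervals(interval=10, retry_count=8640, constant_retry=True):
    """Return the wait (seconds) before each agent-to-manager retry.

    constant_retry=True models a fixed interval (NotificationConstantRetry = Y).
    constant_retry=False models the stepped interval (N): x1, x2, x3, then x6.
    """
    if constant_retry:
        return [interval] * retry_count
    # Maximum retry count when N is specified (integer division is an assumption).
    max_retries = retry_count // 6 + 2
    multipliers = [1, 2, 3, 6]
    return [interval * multipliers[min(i, 3)] for i in range(max_retries)]


fixed = agent_retry_intervals(constant_retry=True)
stepped = agent_retry_intervals(constant_retry=False)

print(len(fixed), sum(fixed))      # 8640 86400
print(len(stepped), sum(stepped))  # 1442 86400
```

Both schedules cover 86,400 seconds, so the two modes differ in how often retries occur, not in how long retrying continues.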
The following figure shows how communication is performed from the event/action control agent to the event/action control manager if a connection error occurs.
(3) About NotificationRetryInterval and NotificationRetryCount
In addition to a connection timeout, a communication error might also be caused by the following problems:
- The execution agent host name cannot be resolved.
- The event/action control agent is busy and cannot accept an execution or kill request.
For retries performed for an error other than a connection timeout that occurs during communication between the event/action control manager and event/action control agent, you can set the retry interval by using the NotificationRetryInterval environment setting parameter (the default is 30 seconds). Similarly, you can set the maximum number of retries by using the NotificationRetryCount environment setting parameter (the default is 2,880).
The following figure shows an example of an error that is not a timeout error.
Note that if you change only the retry interval or only the number of retries, the retry period (the period during which retries can be performed) also changes. To keep the same retry period, you need to adjust the values of both environment setting parameters. For example, if you change the retry interval to 15 seconds, which is half the default value, the number of retries that preserves the retry period is 5,760 (twice the default value).
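The relationship can be checked with a few lines of Python. This is a minimal sketch that treats the retry period as simply interval x count, which is an assumption that ignores the time each attempt itself takes. The helper names are illustrative only.

```python
def retry_period_seconds(interval, count):
    """Retry period modeled as retry interval multiplied by retry count."""
    return interval * count


def count_preserving_period(new_interval, old_interval=30, old_count=2880):
    """Return the retry count that keeps the retry period for a new interval.

    The result is rounded up so that the period is never shortened.
    """
    period = retry_period_seconds(old_interval, old_count)
    return -(-period // new_interval)  # ceiling division on integers


print(retry_period_seconds(30, 2880))  # 86400
print(count_preserving_period(15))     # 5760
```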
(4) About NotificationAlarmCount
If a notification cannot be delivered to the manager even after it has been resent repeatedly, you can set a threshold at which the message KAVT0669-E is output, so that you notice when notifications have not reached the manager for an extended period. In JP1/AJS3 13-10 or later, the initial setting value for a new installation and new setup is 24 times# for Windows and 29 times# for UNIX. If the NotificationConstantRetry environment setting parameter is set to N, the retry interval gradually becomes longer. In this case, perform the calculation by using the following expression, round up the result to the nearest integer, and use the resulting value as a guideline for the retry count to be specified.

#: If Y is specified for the NotificationConstantRetry environment setting parameter so that retries are performed at a regular interval, 10 seconds is specified for the NotificationRetryInterval environment setting parameter, and notification continues to fail because of connection timeouts, the message is output approximately 10 minutes after notification to the manager first fails. The actual time might vary depending on the cause of the communication error, the load on the system and network, and other factors.
Guideline-value-for-NotificationAlarmCount-environment-setting-parameter = ((n - 6 x b) / (a + 6 x b)) + 2
Legend:
- n: The time before a communication failure is detected (units: seconds)
- a: The value set in the ClientConnectTimeout environment setting parameter or the OS default time before a wait for connection times out, whichever is shorter# (units: seconds)
- b: The value of the NotificationRetryInterval environment setting parameter (units: seconds)

#: This is a value to be specified if resending fails due to a connection timeout.
Note that the KAVT0669-E message is not output if this environment setting parameter is set to 0 or omitted.
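The guideline expression can be evaluated with a short Python sketch such as the following. The function name and the sample values are illustrative assumptions, not recommended settings. Note that a must be given in seconds, whereas ClientConnectTimeout itself is set in milliseconds.

```python
import math


def alarm_count_guideline(n, a, b):
    """Guideline value for NotificationAlarmCount, rounded up to an integer.

    n: seconds before a communication failure should be detected
    a: connection wait in seconds (the shorter of ClientConnectTimeout
       and the OS default connection timeout)
    b: value of NotificationRetryInterval in seconds
    """
    return math.ceil((n - 6 * b) / (a + 6 * b)) + 2


# Illustrative values only: detect failures within one hour (n = 3600),
# with a 30-second connection wait and a 10-second retry interval.
print(alarm_count_guideline(n=3600, a=30, b=10))  # 42
```

Because 2 is an integer, rounding up the fraction before adding 2 gives the same result as rounding up the whole expression.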
(5) Guideline for environment setting parameter settings
The following table provides the guidelines for environment setting parameter settings based on what is most important for communication.
Environment setting parameters requiring adjustment: ClientConnectTimeout, NotificationConstantRetry, NotificationRetryInterval, and NotificationRetryCount.

| Most important consideration | "ClientConnectTimeout" (in milliseconds) | "NotificationConstantRetry" | "NotificationRetryInterval" (in seconds) | "NotificationRetryCount" | Cautionary note |
|---|---|---|---|---|---|
| Default value | Windows: 30,000 UNIX: 1,000#1 | N | 30 | 2,880 or 27#2 | N/A |
| Suppress processing delays for other agents when timeouts occur for an agent during communication | 3,000 to 10,000 | N | N/A | N/A | Because the retry interval gradually increases by 300, 600, 900, 1,800, and 3,600 seconds even when communication with the agent no longer times out, a long time is still required before sending is retried. |
| Prevent timeouts for agents during communication | 10,000 to 60,000 | N | N/A | N/A | If a timeout occurs for an agent during communication, processing for other agents might be delayed and event detection might be disabled. |
| Suppress processing delays with a quick recovery response for the communication environment if a temporary communication error occurs in an otherwise stable communication environment | 3,000 to 10,000 | Y | 3 to 10 | 2,880#2 | If timeouts occur for an agent in rapid succession or continue over a long time during communication, processing for other agents might be delayed and event detection might be disabled. |
| Ensure communication in an unstable communication environment even if communication is delayed | 10,000 to 60,000 | Y | 3 to 10 | 8,640 to 28,800 | If timeouts occur for an agent in rapid succession or continue over a long time during communication, processing for other agents might be delayed and event detection might be disabled. |
| Detect errors at an early stage | 3,000 to 10,000 | Y | 3 to 10 | 30 to 100 | An event job and a custom event job might end abnormally, in which case the KAVT0103-E message is output to the integrated trace log. Monitoring for this message allows environment errors to be detected. |
Legend:
N/A: Not applicable.

#1: The default values are very different for Windows and UNIX because the default values in UNIX have backward compatibility with the settings of JP1/AJS2 version 8. In version 8, the ClientConnectTimeout environment setting parameter does not exist, but the operation is the same as when the environment setting parameter is set to 1,000. The UNIX default value is based on this value.

#2: Use 2,880 for errors that are not timeout errors. Use 27 for timeout errors that continue to occur.
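One way to use the guideline table above is to encode its ranges and check a planned configuration against them. The Python sketch below does this under stated assumptions: the goal labels (such as `detect_errors_early`) are hypothetical shorthand for the table rows, the N/A cells are simply omitted, and the function is an illustration rather than a JP1/AJS3 tool.

```python
# Ranges copied from the guideline table above. Keys are hypothetical labels.
GUIDELINES = {
    "suppress_delays_for_other_agents": {
        "ClientConnectTimeout": (3000, 10000),
        "NotificationConstantRetry": "N",
    },
    "prevent_agent_timeouts": {
        "ClientConnectTimeout": (10000, 60000),
        "NotificationConstantRetry": "N",
    },
    "quick_recovery_in_stable_network": {
        "ClientConnectTimeout": (3000, 10000),
        "NotificationConstantRetry": "Y",
        "NotificationRetryInterval": (3, 10),
        "NotificationRetryCount": (2880, 2880),
    },
    "ensure_delivery_in_unstable_network": {
        "ClientConnectTimeout": (10000, 60000),
        "NotificationConstantRetry": "Y",
        "NotificationRetryInterval": (3, 10),
        "NotificationRetryCount": (8640, 28800),
    },
    "detect_errors_early": {
        "ClientConnectTimeout": (3000, 10000),
        "NotificationConstantRetry": "Y",
        "NotificationRetryInterval": (3, 10),
        "NotificationRetryCount": (30, 100),
    },
}


def check_settings(goal, settings):
    """Return the names of parameters that are missing or outside the guideline."""
    problems = []
    for name, expected in GUIDELINES[goal].items():
        value = settings.get(name)
        if value is None:
            problems.append(name)
        elif isinstance(expected, tuple):
            low, high = expected
            if not low <= value <= high:
                problems.append(name)
        elif value != expected:
            problems.append(name)
    return problems


planned = {
    "ClientConnectTimeout": 5000,      # milliseconds
    "NotificationConstantRetry": "Y",
    "NotificationRetryInterval": 5,    # seconds
    "NotificationRetryCount": 2880,
}
print(check_settings("detect_errors_early", planned))  # ['NotificationRetryCount']
```

In this example the retry count is left at 2,880, which falls outside the 30 to 100 range listed for detecting errors at an early stage, so that parameter is reported.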
Environment setting parameters requiring adjustment: ClientConnectTimeout, NotificationConstantRetry, NotificationRetryInterval, and NotificationRetryCount.

| Most important consideration | "ClientConnectTimeout" (in milliseconds) | "NotificationConstantRetry" | "NotificationRetryInterval" (in seconds) | "NotificationRetryCount" | Cautionary note |
|---|---|---|---|---|---|
| Default value | Windows: 30,000 UNIX: 1,000 | Y | 10 | 8,640 | N/A |
| Prevent timeouts for managers during communication | 10,000 to 60,000 | N | N/A | N/A | If communications to some manager hosts time out, processing assigned to those hosts does not start, causing delay in detection of job start or event occurrence. |
Legend:
N/A: Not applicable.
For details about the definition of each environment setting parameter, see the following documentation:
- 20.6.2(8) NotificationConstantRetry (for communication from a manager host to an agent host)
- 20.6.2(26) NotificationConstantRetry (for communication from an agent host to a manager host)
For details about the definition of the environment setting parameters related to communication between the event/action control manager and the event/action control agent, see the following documentation:
- 20.6.2(10) NotificationRetryCount (when sending information from the manager host to the agent host)
For details about the definition of the environment setting parameter related to communication from the event/action control agent to the event/action control manager, see the following documentation: