2.4.13 Automatic retry for abnormally ending jobs
If an executable file defined for a job ends abnormally because of a temporary error, retrying the job might resolve the problem. For jobs that can safely be retried, you can define automatic retry so that they are re-executed automatically when they end abnormally. This lets tasks continue even if a temporary error occurs in an executable file.
Automatic retry means automatically retrying a job if an executable file specified for a job ends abnormally. The execution of a job by an automatic retry is called a retry execution.
The following figure shows the behavior of a job when an automatic retry is performed.
When an automatic retry is performed, the job does not enter the Ended abnormally status even when an executable file defined for the job ends abnormally. Instead, after a preset interval, the job is automatically retried.
(1) Overview of automatic retries
The following provides an overview of automatic retries.
(a) Conditions triggering automatic retries
If an error occurs in an executable file of a job that satisfies the following conditions, the job is automatically retried without entering the Ended abnormally status.
- In the End judgment section, in the Rule box, Judgment by threshold is selected.
- In the Retry on abnormal end section, Yes is chosen.
Retry on abnormal end is available for the following jobs:
- Unix jobs
- PC jobs
- QUEUE jobs
- Flexible jobs
- HTTP connection jobs
- Standard custom jobs
- Custom PC jobs
- Custom Unix jobs
- Cautionary note
Automatic retries can be configured only when both JP1/AJS3 - View and JP1/AJS3 - Manager are version 10-00 or later. However, even if JP1/AJS3 - Manager is version 10-00 or later, automatic retries are not available if the database uses a compatible ISAM configuration.
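The triggering conditions, eligible job types, and version requirements above can be combined into a single check. The following Python sketch is illustrative only: `JobDefinition`, `parse_version`, `automatic_retry_applies`, and the string values used for job types and end judgment rules are hypothetical names chosen for this example, not part of any JP1/AJS3 interface.

```python
from dataclasses import dataclass

# Job types for which "Retry on abnormal end" is available (from the list above).
RETRYABLE_JOB_TYPES = frozenset({
    "Unix job", "PC job", "QUEUE job", "Flexible job", "HTTP connection job",
    "Standard custom job", "Custom PC job", "Custom Unix job",
})


@dataclass
class JobDefinition:
    """Hypothetical model of the relevant parts of a job definition."""
    job_type: str
    end_judgment_rule: str        # e.g. "Judgment by threshold"
    retry_on_abnormal_end: bool   # the Yes/No choice in Retry on abnormal end


def parse_version(version: str) -> tuple[int, int]:
    """Convert a version string such as "10-00" into (10, 0) for comparison."""
    major, minor = version.split("-")
    return int(major), int(minor)


def automatic_retry_applies(job: JobDefinition, view_version: str,
                            manager_version: str, compatible_isam: bool) -> bool:
    """True if the job would be retried instead of entering Ended abnormally."""
    if compatible_isam:
        return False  # not available with a compatible ISAM configuration
    if min(parse_version(view_version), parse_version(manager_version)) < (10, 0):
        return False  # both View and Manager must be 10-00 or later
    return (job.job_type in RETRYABLE_JOB_TYPES
            and job.end_judgment_rule == "Judgment by threshold"
            and job.retry_on_abnormal_end)
```

For example, a PC job judged by threshold with Yes chosen qualifies on a 10-00 View and Manager, but the same job does not qualify if either product is 09-50.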
(b) Settings related to how an automatic retry is executed
The settings related to executing an automatic retry are called retry settings.
The following table describes the retry settings.
| No. | Item | Description |
|---|---|---|
| 1 | Retry on abnormal end | Indicates whether to perform an automatic retry if executable files specified for jobs cause an error. |
| 2 | Return code | Indicates a range of return codes for which retry execution is to be performed. |
| 3 | Maximum retry times | Indicates the maximum number of times a retry is to be executed. |
| 4 | Retry interval | Indicates an interval between the time an executable file for a job causes an error and the time retry execution begins. |
- Cautionary note
For Return code, specify only the range that is necessary. If you set an unnecessarily wide range of return codes, retries are executed even for errors that retry execution cannot correct. As a result, the number of job executions increases and job execution performance is likely to suffer.
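As a rough illustration of how the Return code range limits which abnormal ends are retried, the Python sketch below models the four retry settings as a data class. The class and field names are hypothetical, and the sketch assumes that both ends of the Return code range are inclusive and that the job has already been judged to have ended abnormally by threshold judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrySettings:
    """Hypothetical model of the four retry settings in the table above."""
    retry_on_abnormal_end: bool
    return_code_min: int          # lower bound of Return code (assumed inclusive)
    return_code_max: int          # upper bound of Return code (assumed inclusive)
    max_retry_times: int
    retry_interval_minutes: int

    def triggers_retry(self, return_code: int) -> bool:
        """True if an abnormal end with this return code leads to a retry."""
        return (self.retry_on_abnormal_end
                and self.return_code_min <= return_code <= self.return_code_max)


narrow = RetrySettings(True, 50, 60, max_retry_times=3, retry_interval_minutes=5)
print(narrow.triggers_retry(55), narrow.triggers_retry(61))  # True False
```

With a narrow range such as 50 to 60, a return code of 61 is not retried, so it does not consume retry executions.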
The following figure shows the behavior of a job that has ended abnormally when retry settings are specified.
If an executable file specified for a job with retry settings ends abnormally, a retry is executed after the length of time specified in Retry interval elapses. Retry execution is repeated until the job ends normally or with a warning, or the retry execution count has reached the maximum specified in Maximum retry times.
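The retry cycle described above can be sketched as a loop. In this hypothetical Python example, `run_once` stands in for one execution of the job's executable file and returns `"normal"`, `"warning"`, or `"abnormal"`, and `wait` stands in for the Retry interval. For brevity, the sketch assumes that every abnormal end falls within the Return code range.

```python
from typing import Callable

FINAL_STATUS = {
    "normal": "Ended normally",
    "warning": "Ended with warning",
    "abnormal": "Ended abnormally",
}


def run_with_retries(run_once: Callable[[], str], max_retry_times: int,
                     wait: Callable[[], None]) -> tuple[str, int, list[str]]:
    """Return (final job status, number of retry executions, result of each run)."""
    results = [run_once()]              # the original execution
    retries = 0
    # Retry only after an abnormal end, and only while the limit is not reached.
    while results[-1] == "abnormal" and retries < max_retry_times:
        wait()                          # Retry waiting: job shows Wait for prev. to end
        retries += 1
        results.append(run_once())      # one retry execution
    return FINAL_STATUS[results[-1]], retries, results


# Example: fails twice, then succeeds on the second retry execution.
outcomes = iter(["abnormal", "abnormal", "normal"])
print(run_with_retries(lambda: next(outcomes), max_retry_times=4, wait=lambda: None))
# ('Ended normally', 2, ['abnormal', 'abnormal', 'normal'])
```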
You can check the retry settings in JP1/AJS3 - View windows or by using a command. The following table describes the JP1/AJS3 - View windows and command you can use and the retry settings you are able to check.
| No. | JP1/AJS3 - View window and command for checking retry settings | Retry settings that can be checked |
|---|---|---|
| 1 | Jobnet Editor window | Retry on abnormal end |
| 2 | Search window | |
| 3 | ajsprint command | All retry settings |
(c) Information related to the execution status of an automatic retry
The following information related to the execution status of an automatic retry is called retry information:
- Retry status
- Retry execution times
- Retry registration time
- Retry start time
Retry information details are described below.
- Retry status
The retry status indicates the progress of automatic retry processing while it is being executed.
The following table describes the types of retry statuses.
Table 2‒38: Types of retry statuses

| No. | Type of retry status | Description | Corresponding job status |
|---|---|---|---|
| 1 | Retry waiting | Indicates that an executable file for a job resulted in an error and the job is waiting for the length of time specified in Retry interval to elapse. | Wait for prev. to end, Being held |
| 2 | Retry executing | The job is in the Waiting to execute, Now queuing, or Now running status due to automatic retry processing. | Waiting to execute, Now queuing, Now running |
| 3 | Retry end | Automatic retry processing has ended. | Ended normally, Ended with warning, Ended abnormally, Failed to start, Unknown end status, Bypassed, and any other end status |
The Retry waiting and Retry executing statuses are collectively referred to as during a retry.
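Table 2‒38 amounts to a lookup from job status to retry status. The following Python sketch encodes it for a job that is under automatic retry. The function and constant names are hypothetical. Besides the end statuses listed in the table, the sketch includes only Killed and Not executed + Ended, which appear elsewhere in this section.

```python
RETRY_STATUS_BY_JOB_STATUS = {
    "Wait for prev. to end": "Retry waiting",
    "Being held": "Retry waiting",
    "Waiting to execute": "Retry executing",
    "Now queuing": "Retry executing",
    "Now running": "Retry executing",
}

# End statuses from Table 2-38, plus two other end statuses used in this section.
END_STATUSES = frozenset({
    "Ended normally", "Ended with warning", "Ended abnormally",
    "Failed to start", "Unknown end status", "Bypassed",
    "Killed", "Not executed + Ended",
})


def retry_status(job_status: str) -> str:
    """Retry status of a job that is under automatic retry."""
    if job_status in RETRY_STATUS_BY_JOB_STATUS:
        return RETRY_STATUS_BY_JOB_STATUS[job_status]
    if job_status in END_STATUSES:
        return "Retry end"
    raise ValueError(f"status not covered by this sketch: {job_status!r}")


def is_during_retry(job_status: str) -> bool:
    """'During a retry' covers the Retry waiting and Retry executing statuses."""
    return retry_status(job_status) in ("Retry waiting", "Retry executing")
```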
The following figure shows an example of the status transitions when an automatic retry is performed.
Figure 2‒112: Status transitions when an automatic retry is performed
If an executable file specified for a job with retry settings ends abnormally, the retry status is Retry waiting for the length of time specified in Retry interval. The job at this time is in the Wait for prev. to end status, not the Ended abnormally status. When the length of time specified in Retry interval elapses, the retry status transitions to Retry executing.
If the executable file specified for the job ends normally or with a warning within the number of retry executions specified in Maximum retry times, the job enters the Ended normally or Ended with warning status.
If the executable file specified for the job still ends abnormally in the last retry execution allowed by Maximum retry times, the job enters the Ended abnormally status.
The retry status transitions to Retry end when the job enters the Ended normally, Ended with warning, or Ended abnormally status.
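To make these transitions concrete, the hypothetical Python function below produces the sequence of (job status, retry status) pairs described above for a given series of execution results. It simplifies the flow by omitting the Waiting to execute and Now queuing statuses, and it uses `"-"` as a placeholder retry status for the original execution, which is not a retry execution.

```python
END_STATUS = {
    "normal": "Ended normally",
    "warning": "Ended with warning",
    "abnormal": "Ended abnormally",
}


def transition_trace(outcomes: list[str], max_retry_times: int) -> list[tuple[str, str]]:
    """(job status, retry status) pairs for successive execution results.

    outcomes holds "normal", "warning", or "abnormal" for each execution in order.
    Waiting to execute and Now queuing are omitted for brevity.
    """
    trace = [("Now running", "-")]          # original execution, not a retry
    retries = 0
    for outcome in outcomes:
        if outcome == "abnormal" and retries < max_retry_times:
            # Not Ended abnormally: the job waits for the Retry interval instead.
            trace.append(("Wait for prev. to end", "Retry waiting"))
            retries += 1
            trace.append(("Now running", "Retry executing"))
        else:
            # Ends normally, with a warning, or abnormally with no retries left.
            trace.append((END_STATUS[outcome], "Retry end" if retries else "-"))
            break
    return trace
```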
- Number of retry executions
The number of retry executions is the number of times retry execution has been performed for the job. The original execution of the job is not counted.
The following figure shows how the number of retry executions is counted.
Figure 2‒113: Counting the number of retry executions
- Retry registration time
In a retry execution, the retry registration time is the time the job enters the Waiting to execute status.
The retry registration time is cleared when an executable file specified for a job ends abnormally and the job enters the Wait for prev. to end status. When the job then enters the Waiting to execute status, the retry registration time is updated to the time that the job entered that status. If retry execution is performed multiple times, the retry registration time is the time at which the job most recently entered the Waiting to execute status.
The following figure shows updating of the retry registration time.
Figure 2‒114: Updating the retry registration time
- Retry start time
In a retry execution, the retry start time is the time that the job enters the Now running status.
The retry start time is cleared when an executable file specified for a job ends abnormally and the job enters the Wait for prev. to end status. When the job then enters the Now running status, the retry start time is updated to the time that the job entered that status. If retry execution is performed multiple times, the retry start time is the time at which the job most recently entered the Now running status.
The following figure shows the updating of the retry start time.
Figure 2‒115: Updating the retry start time
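The clear-and-update rules for the two timestamps can be modeled as a small tracker. This Python class is a hypothetical illustration only, and it assumes that its methods are called for retry executions in the order in which the statuses occur.

```python
from datetime import datetime
from typing import Optional


class RetryTimes:
    """Hypothetical tracker for Retry registration time and Retry start time."""

    def __init__(self) -> None:
        self.registration_time: Optional[datetime] = None
        self.start_time: Optional[datetime] = None

    def executable_ended_abnormally(self) -> None:
        # The job enters Wait for prev. to end, so both values are cleared.
        self.registration_time = None
        self.start_time = None

    def entered_waiting_to_execute(self, at: datetime) -> None:
        # Overwritten each time, so it always holds the most recent entry.
        self.registration_time = at

    def entered_now_running(self, at: datetime) -> None:
        self.start_time = at


times = RetryTimes()
for minute in (5, 15):                     # two retry executions
    times.executable_ended_abnormally()
    times.entered_waiting_to_execute(datetime(2024, 4, 1, 9, minute))
    times.entered_now_running(datetime(2024, 4, 1, 9, minute + 1))
print(times.registration_time, times.start_time)
# 2024-04-01 09:15:00 2024-04-01 09:16:00
```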
You can check the retry information in JP1/AJS3 - View windows or by using a command. The following table describes the JP1/AJS3 - View windows and command you can use and the retry information you are able to check.
| No. | JP1/AJS3 - View window and command for checking retry information | Retry information you can check |
|---|---|---|
| 1 | | All retry information |
| 2 | | |
| 3 | Search window | Retry execution times |
If retry execution is performed multiple times, JP1/AJS3 - View windows and the ajsshow command show only the result of the last retry execution. To check the result of each retry execution, use JP1 events, scheduler logs, or the Execution Result Details dialog box.
(2) Monitoring jobs with retry settings
The following describes the timeout period and end delay monitoring for jobs with retry settings.
(a) Timeout period for jobs with retry settings
When you specify both a timeout period and retry settings for a job, the elapsed time that is monitored to determine when the timeout period expires is reset at the beginning of each retry execution.
If you want to monitor the total elapsed time for a job across multiple retry executions, monitor the jobnet containing the job for an end delay based on the time required for execution. To do so, in the Define Details dialog box for the jobnet containing the job, select the Time-required-for-execution check box for Monitor jobnet, and then specify the monitoring time.
The following figure shows how the elapsed time for a job is monitored to determine the expiration of the timeout period when retry settings are specified.
In this example, 20 minutes have passed since the start of the job during the first retry execution. However, the elapsed time for the job (monitored to determine when the timeout period expires) is reset at the beginning of each retry execution. For this reason, the job does not enter the Killed status. When 20 minutes have passed since the start of the second retry execution, the job enters the Killed status.
Note that if the timeout period expires and the job enters the Killed status, no more retry executions are performed, even if the number specified in Maximum retry times has not been reached.
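The effect of resetting the elapsed time can be sketched as follows. In this hypothetical Python example, each execution's own duration is compared with the timeout period, and the cumulative time across executions is ignored. The durations are made-up values chosen to match a 20-minute timeout like the one in the example above.

```python
from typing import Optional


def killed_execution(durations_min: list[float], timeout_min: float) -> Optional[int]:
    """Index of the execution ended by the timeout (0 = original), or None.

    durations_min[i] is how long execution i would run if it were not killed.
    Because the elapsed time is reset at the start of every retry execution,
    each execution is compared with the timeout on its own; the running total
    is not. Once an execution is killed, no further retry executions occur.
    """
    for index, duration in enumerate(durations_min):
        if duration > timeout_min:
            return index
    return None


# 12 + 15 = 27 minutes have passed by the end of the first retry execution,
# but only the second retry execution (index 2) exceeds 20 minutes on its own.
print(killed_execution([12, 15, 25], timeout_min=20))  # 2
```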
(b) End delay monitoring for jobs with retry settings
When you specify both end delay monitoring and retry settings for a job, the elapsed time that is monitored to determine whether an end delay has occurred is reset at the beginning of each retry execution.
If you want to monitor the total elapsed time for a job across multiple retry executions, use end delay monitoring for the jobnet containing the job, based on the time required for execution of the jobnet.
The following figure shows how the elapsed time for a job is monitored to determine whether an end delay has occurred when retry settings are specified.
In this example, 20 minutes have passed since the start of the job during the first retry execution. However, the elapsed time for the job (monitored to determine whether an end delay has occurred) is reset at the beginning of each retry execution. For this reason, no end delay is detected. When 20 minutes have passed since the start of the second retry execution, an end delay is detected. When retry execution is performed after an end delay has been detected, end delays are no longer detected.
If an end delay is detected in the last retry execution, the delay information continues to be displayed even after the job has ended.
- Cautionary note
If an end delay is detected for a job and no end delay is detected in the next retry execution, the upper-level jobnet enters the Nested jobnet delayed end status.
(3) Execution simulation of jobs with retry settings
When you simulate the execution of a job with retry settings, the simulation includes the time required for waiting for an automatic retry and the length of time for retry execution.
The following figure shows an example of execution simulation of a job with retry settings.
In this example, the execution time for the job is 20 minutes. When the job was defined, 4 was specified for Maximum retry times and 5 minutes was specified for Retry interval. As a result, the simulation takes a total of 120 minutes (the time for the original job execution, the time for four retry executions, and the retry intervals between the original and retry executions).
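The 120-minute total follows from a simple formula. The Python sketch below assumes, consistent with the example, that the simulation accounts for the maximum number of retry executions, each preceded by one retry interval. The function name is hypothetical.

```python
def simulated_minutes(exec_minutes: int, max_retry_times: int,
                      retry_interval_minutes: int) -> int:
    """Simulated span of a job with retry settings, in minutes."""
    executions = 1 + max_retry_times          # original execution plus every retry
    intervals = max_retry_times               # one Retry interval before each retry
    return executions * exec_minutes + intervals * retry_interval_minutes


print(simulated_minutes(20, 4, 5))  # 5 * 20 + 4 * 5 = 120
```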
(4) Behavior of units with retry settings
The following describes the behavior of jobs with retry settings, the behavior of preceding units of jobs with retry settings, and the behavior of upper-level units of jobs with retry settings.
(a) Re-executing jobs with retry settings and preceding units
The following describes the behavior of units when you re-execute jobs after automatic retries have ended, re-execute processing from preceding units during a retry, and re-execute only preceding units during a retry.
■ Re-executing jobs after automatic retries have ended
When you re-execute a job after automatic retries have ended, the number of retry executions is also reset.
The following figure shows how a job behaves when you re-execute a job after automatic retries have ended.
When you re-execute a job, the number of retry executions is set to 0.
■ Re-executing processing from preceding units during a retry
When you re-execute processing from a preceding unit during a retry, the number of retry executions is reset. The job being retried transitions to the Wait for prev. to end status and waits for the preceding unit to end.
The time at which the job being retried transitions to the Wait for prev. to end status to wait for the preceding unit to end depends on the status of the job being retried at the time the preceding unit is re-executed.
The following describes how the job behaves when you re-execute processing from a preceding unit during a retry.
- When the job being retried is in the Wait for prev. to end or Being held status
The job being retried enters the Wait for prev. to end status when the preceding unit is re-executed. The job waits for the preceding unit to end. When the preceding unit ends, the number of retry executions is reset and the job is re-executed.
The following figure shows how the job being retried behaves when you re-execute processing from the preceding unit while it is in the Wait for prev. to end status.
Figure 2‒120: Behavior of a job in the Wait for prev. to end status during a retry when you re-execute processing from the preceding unit
When the preceding unit is re-executed, the number of retry executions is set to 0.
- When the job being retried is in the Waiting to execute, Now queuing, or Now running status
The current retry execution will finish. Regardless of whether the job has ended normally, the job enters the Wait for prev. to end status when the retry execution ends. When the preceding unit ends, the number of retry executions is reset and the job is re-executed.
The following figure shows how the job being retried behaves when you re-execute processing from the preceding unit while it is in the Now running status.
Figure 2‒121: Behavior of the job in the Now running status during a retry when you re-execute from the preceding unit
When the preceding unit is re-executed, the number of retry executions is set to 0.
■ Re-executing only preceding units during a retry
When you re-execute only a preceding unit during a retry, the job transitions to the Wait for prev. to end status when the current retry execution ends, and waits for the retry interval to expire. If the preceding unit has not ended when the retry interval expires, the job waits for the preceding unit to end. Retries continue until the number specified in Maximum retry times is reached.
The following describes how the job being retried behaves when you re-execute only the preceding unit.
- When the job being retried is in the Wait for prev. to end or Being held status
The job being retried waits for the preceding unit to end after the retry interval has expired. When the preceding unit ends, the job is retried again.
The following figure shows how the job being retried behaves when you re-execute only the preceding unit while the job being retried is in the Wait for prev. to end status.
Figure 2‒122: Behavior of a job in the Wait for prev. to end status during a retry when you re-execute only a preceding unit
- When the job being retried is in the Waiting to execute, Now queuing, or Now running status
The current retry execution will finish. If the executable file ends abnormally during the current retry execution, the job waits for the preceding unit to end after the retry interval has expired. When the preceding unit ends, the job is retried again.
The following figure shows how a job during a retry behaves when you re-execute only the preceding unit while the job being retried is in the Now running status.
Figure 2‒123: Behavior of a job in the Now running status during a retry when you re-execute only the preceding unit
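When only the preceding unit is re-executed, the next retry execution therefore waits for whichever comes later: the expiration of the retry interval or the end of the preceding unit. A minimal Python sketch of that timing follows; the function name and the example times are hypothetical.

```python
from datetime import datetime, timedelta


def next_retry_time(abnormal_end: datetime, retry_interval: timedelta,
                    preceding_unit_end: datetime) -> datetime:
    """Earliest time the next retry execution can begin in this scenario."""
    interval_expires = abnormal_end + retry_interval
    # The job needs both the interval to expire and the preceding unit to end.
    return max(interval_expires, preceding_unit_end)


end = datetime(2024, 4, 1, 10, 0)
print(next_retry_time(end, timedelta(minutes=5), datetime(2024, 4, 1, 10, 3)))   # 2024-04-01 10:05:00
print(next_retry_time(end, timedelta(minutes=5), datetime(2024, 4, 1, 10, 12)))  # 2024-04-01 10:12:00
```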
(b) Suspending root jobnets during job retries
You can suspend a root jobnet containing a job being retried or cancel the suspension of a root jobnet.
While the root jobnet is suspended, no retry execution is performed even if a job ends abnormally. When you cancel the suspension, retry execution is performed after the retry interval expires.
For details about how to suspend root jobnets, see 4.5.17 Changing job and jobnet definitions without unregistering the jobnet in the manual JP1/Automatic Job Management System 3 Overview.
(c) Interrupting root jobnets containing jobs being retried
When you interrupt a root jobnet containing a job being retried, the root jobnet enters the Interrupted status and the job with retry settings enters the Not executed + Ended status.
The time that the job being retried enters the Not executed + Ended status depends on the status of the job being retried.
The following describes how a job being retried behaves when the root jobnet is interrupted.
- When the job being retried is in the Wait for prev. to end or Being held status
The scheduled retry execution is not performed. The job enters the Not executed + Ended status and the retry status transitions to Retry end.
The following figure shows how a job during a retry behaves when you interrupt the root jobnet while the job being retried is in the Wait for prev. to end status.
Figure 2‒124: Behavior of a job in the Wait for prev. to end status during a retry when you interrupt the root jobnet
- When the job being retried is in the Waiting to execute, Now queuing, or Now running status
The current retry execution will finish. Regardless of whether the job ends normally, the ended job enters the Not executed + Ended status and the retry status transitions to Retry end.
The following figure shows how the job being retried behaves when you interrupt the root jobnet while the job being retried is in the Now running status.
Figure 2‒125: Behavior of a job in the Now running status during retry when you interrupt the root jobnet
When you interrupt the root jobnet containing a job being retried, the error in the executable file defined for the job will not be corrected. The job enters the Not executed + Ended status and the retry status transitions to Retry end.
(d) Changing the status of jobs and killing jobs during retries
When a job is in the Now running status, the Now queuing status, or, for jobs for which Queueless Agent is specified as the execution target service, the Waiting to execute status due to an automatic retry, you can change the status of the job to an end status. You can also change the status of a job that is in an end status due to an automatic retry to another end status, and you can kill a job that is being automatically retried. When you change the status of a job or kill a job, no automatic retry is performed for the job after it enters an end status, regardless of the new end status or the return code.
(e) Changing the definitions of jobs during a retry
When yes is specified for the UNITDEFINERELOAD environment setting parameter and you change the definition of a job during a retry, the definition of the job is read again each time retry execution is performed. Accordingly, when you change the definition of a job during a retry, the new definition takes effect from the next retry execution after the change.
For details about the behavior of jobs when you change the definitions, see 7.4 Changing the unit definition information during registration for execution in the JP1/Automatic Job Management System 3 Administration Guide.
If you delete retry settings when the retry status is Retry waiting, only one retry execution is performed. Thereafter, no more retry executions will be performed.
- Cautionary note
When you change the definition of a job during a retry, the subsequent behavior of the job changes depending on the status of the job at the point when the change is made. The job might therefore behave unexpectedly. We recommend that you do not change any definitions during a retry.
(f) Waiting during a job retry
When a wait condition is assigned to a job with retry settings, once the wait condition has been satisfied and the wait status is Wait complete, the job does not wait for the unit being waited for again during retry executions. If you enable the wait condition again and change the wait status to Wait incomplete (manual) before the retry status transitions to Retry waiting, the job only waits for the retry interval to expire and does not wait for the wait condition to be satisfied.
The following figure shows how a job with retry settings behaves when you assign a wait condition to it.
(5) Information to be updated by retry execution
When retry execution is performed, information other than the retry information is also updated. If you reference that information while a job is being executed, the results might differ depending on when you reference it. Take this into account when defining jobs.
The following table describes the information that is updated by retry execution.
| No. | Updated information | Description |
|---|---|---|
| 1 | JP1JobID environment variable | The job ID is updated to the ID used at retry execution. |
| 2 | Standard output file | |
| 3 | Standard error output file | |
| 4 | Transfer file | The file is transferred each time retry execution is performed. |
| 5 | Execution result details | Information is added each time retry execution is performed. |
- Cautionary note
When you select the Append check box, the sizes of the standard output file and the standard error output file increase, creating a high system load. We recommend that you do not select the Append check box. If you need to select the Append check box, limit the amount of output information or periodically clear the files.
(6) Restarting the scheduler service during a retry
When you stop the scheduler service and restart it, the jobs being retried enter the same status as regular jobs depending on the start mode of the scheduler service. The differences are as follows.
- Starting the scheduler service in warm start mode or disaster recovery start mode
Automatic retries do not continue. Jobs enter an end status specified for each start mode based on their job status at that time.
- Starting the scheduler service in hot start mode
Automatic retries continue. The retry interval and the number of retry executions are handled as follows.
- Retry interval
The period during which the scheduler service is inactive is included in the elapsed time that is monitored to determine when the retry interval expires. For example, suppose a job is retried at five-minute intervals. If the scheduler service stops for two minutes when two minutes have passed since the retry status changed to Retry waiting, and the scheduler service is then restarted in hot start mode, retry execution starts one minute after the restart.
Figure 2‒127: Behavior of the job when the scheduler service starts in hot start mode
Note that if you restart the JP1/AJS3 service or the scheduler service in hot start mode after it has stopped for more than the retry interval, jobs start executing as soon as the scheduler service starts. If that occurs, the number of jobs temporarily increases, possibly degrading job execution performance for a while.
- Number of retry executions
The number of retry executions is not initialized. The number continues to increase as retry executions are performed. Suppose you have a job for which 5 is specified for Maximum retry times. If you restart the scheduler service when two retry executions have finished, a maximum of three more retry executions will be performed after the scheduler service restarts.
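Both hot-start rules reduce to simple arithmetic. The Python sketch below reproduces the two examples above. The function names are hypothetical, and the sketch ignores any delay in starting the job itself.

```python
def minutes_until_retry(retry_interval: float, waited_before_stop: float,
                        downtime: float) -> float:
    """Minutes after a hot start until the pending retry execution starts."""
    # Downtime counts toward the Retry interval; if the interval has already
    # expired, the retry starts as soon as the scheduler service starts.
    return max(0, retry_interval - (waited_before_stop + downtime))


def retries_left(max_retry_times: int, completed_retries: int) -> int:
    """The retry count survives a hot start, so only the remainder is available."""
    return max(0, max_retry_times - completed_retries)


print(minutes_until_retry(5, 2, 2))  # 1
print(retries_left(5, 2))            # 3
```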
For details about the status of jobs for each start mode, see 6.2.1 Temporarily changing the start mode of JP1/AJS3 in the JP1/Automatic Job Management System 3 Administration Guide.
(7) Cautionary notes on automatic retry
Note the following when you use automatic retries:
- When retry settings are specified for a unit under a remote jobnet, an error occurs if either of the following conditions is satisfied for the manager executing the jobnet:
  - The database uses a compatible ISAM configuration.
  - The version of JP1/AJS3 - Manager is 09-50 or earlier.
- Automatic retry is not performed while the scheduler service is being stopped with job execution restricted or with running jobs being forcibly terminated. Jobs enter an end status without retry execution, regardless of their return codes and the number of retry executions. When all the running jobs have ended, the scheduler service stops.
  For details about restricting how the scheduler service stops, see 7.5.2 Stopping the scheduler service in the JP1/Automatic Job Management System 3 Administration Guide.
- Automatic retry is also performed when a job is ended by the OS with a return code, rather than by the user application.
- For queueless jobs, automatic retry is also performed if a script file does not exist or cannot be accessed. Use the definition pre-check function to make sure that script files can be accessed.
- When you execute queueless jobs, use version 10-00 or later of JP1/AJS3 - Manager or JP1/AJS3 - Agent. If the version is 09-50 or earlier, automatic retry is also performed when the process for starting job processes ends abnormally for some reason.