2.4.13 Automatic retry for abnormally ending jobs

If an executable file defined for a job ends abnormally, retrying the job might correct temporary errors. You can define automatic retry for those jobs capable of being retried in the event they end abnormally. In this way, you can continue tasks even if a temporary error has occurred in an executable file.

Automatic retry means automatically retrying a job if an executable file specified for a job ends abnormally. The execution of a job by an automatic retry is called a retry execution.

The following figure shows the behavior of a job when an automatic retry is performed.

Figure 2‒110: Behavior of a job when an automatic retry is performed

When an automatic retry is performed, the job does not enter the Ended abnormally status even when an executable file defined for the job ends abnormally. Instead, after a preset interval, the job is automatically retried.

Organization of this subsection

(1) Overview of automatic retries
(2) Monitoring jobs with retry settings
(3) Execution simulation of jobs with retry settings
(4) Behavior of units with retry settings
(5) Information to be updated by retry execution
(6) Restarting the scheduler service during a retry
(7) Cautionary notes on automatic retry

(1) Overview of automatic retries

The following provides an overview of automatic retries.

(a) Conditions triggering automatic retries

If an error occurs in an executable file of a job that satisfies the following conditions, the job is automatically retried without entering the Ended abnormally status.

- In the End judgment section, in the Rule box, Judgment by threshold is selected.
- In the Retry on abnormal end section, Yes is chosen.

Retry on abnormal end is available for the following jobs:
- Unix jobs
- PC jobs
- QUEUE jobs
- Flexible jobs
- HTTP connection jobs
- Standard custom jobs
- Custom PC jobs
- Custom Unix jobs

Cautionary note: Automatic retries can be configured when the version of JP1/AJS3 - View and JP1/AJS3 - Manager is 10-00 or later. However, even when the version of JP1/AJS3 - Manager is 10-00 or later, automatic retries are not available if the database uses a compatible ISAM configuration.

(b) Settings related to how an automatic retry is executed

The settings related to executing an automatic retry are called retry settings.

The following table describes the retry settings.

Table 2‒36: Settings related to executing an automatic retry
No.	Item	Description
1	Retry on abnormal end	Indicates whether to perform an automatic retry if executable files specified for jobs cause an error.
2	Return code	Indicates a range of return codes for which retry execution is to be performed.
3	Maximum retry times	Indicates the maximum number of times a retry is to be executed.
4	Retry interval	Indicates an interval between the time an executable file for a job causes an error and the time retry execution begins.

Cautionary note: For Return code, specify the minimum range that is necessary. If you set an unnecessarily wide range of return codes for automatic retries, retries are executed for return codes that are impossible to correct by performing retry execution. As a result, the number of job executions increases and job execution performance is likely to be affected.

The following figure shows the behavior of a job that has ended abnormally when retry settings are specified.

Figure 2‒111: Behavior of a job ending abnormally when retry settings are specified

If an executable file specified for a job with retry settings ends abnormally, a retry is executed after the length of time specified in Retry interval elapses. Retry execution is repeated until the job ends normally or with a warning, or the retry execution count has reached the maximum specified in Maximum retry times.
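The retry loop described above can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: `RetrySettings`, `run_with_retries`, `run_once`, and `wait` are hypothetical names, and the rule that a return code greater than the abnormal threshold means an abnormal end is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RetrySettings:
    # Hypothetical field names mirroring the items in Table 2-36.
    retry_on_abnormal_end: bool
    return_code_min: int          # lower bound of the retry return-code range
    return_code_max: int          # upper bound of the retry return-code range
    max_retry_times: int
    retry_interval_minutes: int


def run_with_retries(run_once: Callable[[], int],
                     settings: RetrySettings,
                     abnormal_threshold: int,
                     wait: Callable[[int], None]) -> List[int]:
    """Return the return codes of the original execution and of every retry."""
    codes = [run_once()]          # original execution (not counted as a retry)
    retries = 0
    while True:
        rc = codes[-1]
        # Assumption: a return code above the threshold is an abnormal end.
        ended_abnormally = rc > abnormal_threshold
        in_range = settings.return_code_min <= rc <= settings.return_code_max
        if not (settings.retry_on_abnormal_end and ended_abnormally and in_range):
            break                 # normal end, warning end, or non-retryable code
        if retries >= settings.max_retry_times:
            break                 # Maximum retry times reached
        wait(settings.retry_interval_minutes)   # corresponds to Retry waiting
        codes.append(run_once())                # corresponds to Retry executing
        retries += 1
    return codes
```

Keeping the return-code range narrow, as the cautionary note recommends, corresponds to the `in_range` check: a code outside the range ends the loop immediately instead of consuming retry executions.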

You can check the retry settings in JP1/AJS3 - View windows or by using a command. The following table describes the JP1/AJS3 - View windows and command you can use and the retry settings you are able to check.

Table 2‒37: JP1/AJS3 - View windows and command you can use to check the retry settings
No.	JP1/AJS3 - View window and command for checking retry settings	Retry settings that can be checked
1	Jobnet Editor window	Retry on abnormal end
2	Search window	Maximum retry times, Retry interval
3	`ajsprint` command	All retry settings

(c) Information related to the execution status of an automatic retry

The following information related to the execution status of an automatic retry is called retry information:

- Retry status
- Retry execution times
- Retry registration time
- Retry start time

Retry information details are described below.

Retry status

The retry status indicates the progress of automatic retry processing.

The following table describes the types of retry statuses.

Table 2‒38: Types of retry statuses
No.	Type of retry status	Description	Corresponding job status
1	Retry waiting	An executable file for the job resulted in an error, and the job is waiting for the length of time specified in Retry interval to elapse.	Wait for prev. to end; Being held
2	Retry executing	The job is in the Waiting to execute, Now queuing, or Now running status because of automatic retry processing.	Waiting to execute; Now queuing; Now running
3	Retry end	Automatic retry processing has ended.	Ended normally; Ended with warning; Ended abnormally; Failed to start; Unknown end status; Bypassed; End status including the above

The Retry waiting and Retry executing statuses are generically referred to as during a retry.

The following figure shows an example of the status transitions when an automatic retry is performed.

Figure 2‒112: Status transitions when an automatic retry is performed

If an executable file specified for a job with retry settings ends abnormally, the retry status is Retry waiting for the length of time specified in Retry interval. The job at this time is in the Wait for prev. to end status, not the Ended abnormally status. When the length of time specified in Retry interval elapses, the retry status transitions to Retry executing.

If the executable file specified for the job ends normally or ends with a warning before the number of retry executions can reach the number specified in Maximum retry times, the job enters the Ended normally or Ended with warning status.

If the executable file specified for the job has not ended normally or with a warning by the time the number specified in Maximum retry times has been reached, the job enters the Ended abnormally status.

The retry status transitions to Retry end when the job enters the Ended normally, Ended with warning, or Ended abnormally status.
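The correspondence in Table 2-38 can be written as a small lookup. The following Python sketch is illustrative only; `retry_status` and `is_during_retry` are hypothetical helper names, and the mapping applies only to a job for which automatic retry processing has already begun (a job in the Wait for prev. to end status that has never run is not in the Retry waiting status).

```python
# Job statuses grouped by retry status, transcribed from Table 2-38.
RETRY_WAITING = {"Wait for prev. to end", "Being held"}
RETRY_EXECUTING = {"Waiting to execute", "Now queuing", "Now running"}
RETRY_END = {
    "Ended normally", "Ended with warning", "Ended abnormally",
    "Failed to start", "Unknown end status", "Bypassed",
}


def retry_status(job_status: str) -> str:
    """Return the retry status for the status of a job undergoing retry processing."""
    if job_status in RETRY_WAITING:
        return "Retry waiting"
    if job_status in RETRY_EXECUTING:
        return "Retry executing"
    if job_status in RETRY_END:
        return "Retry end"
    raise ValueError(f"status not covered by this sketch: {job_status}")


def is_during_retry(job_status: str) -> bool:
    """True for the two statuses generically referred to as 'during a retry'."""
    return retry_status(job_status) in ("Retry waiting", "Retry executing")
```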

Number of retry executions

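The number of retry executions (Retry execution times) indicates how many times retry execution has been performed for the job. The original execution of the job is not counted.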

The following figure shows how the number of retry executions is counted.

Figure 2‒113: Counting the number of retry executions

Retry registration time

In a retry execution, the retry registration time is the time the job enters the Waiting to execute status.

The specified retry registration time is cleared when an executable file specified for a job ends abnormally and the job enters the Wait for prev. to end status. When the job enters the Waiting to execute status, the retry registration time is updated to the time that the job enters that status. If retry is executed multiple times, the retry registration time is updated to the time at which the job enters the Waiting to execute status for the last time.

The following figure shows updating of the retry registration time.

Figure 2‒114: Updating the retry registration time

Retry start time

In a retry execution, the retry start time is the time that the job enters the Now running status.

The specified retry start time is cleared when an executable file specified for a job ends abnormally and the job enters the Wait for prev. to end status. When the job enters the Now running status, the retry start time is updated to the time that the job enters that status. If retry execution is performed multiple times, the retry start time is updated to the time at which the job enters the Now running status for the last time.

The following figure shows the updating of the retry start time.

Figure 2‒115: Updating the retry start time
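The clearing and updating rules for the retry registration time and the retry start time can be summarized in a short Python sketch. The class `RetryTimes` and its method `on_status` are hypothetical names introduced for illustration, and the sketch assumes the caller passes only the status changes that occur after the first abnormal end, that is, during retry processing.

```python
from datetime import datetime
from typing import Optional


class RetryTimes:
    """Tracks the retry registration time and retry start time of one job."""

    def __init__(self) -> None:
        self.registration_time: Optional[datetime] = None
        self.start_time: Optional[datetime] = None

    def on_status(self, status: str, at: datetime) -> None:
        if status == "Wait for prev. to end":
            # The executable file ended abnormally: both times are cleared.
            self.registration_time = None
            self.start_time = None
        elif status == "Waiting to execute":
            # The retry registration time is the time this status is entered.
            self.registration_time = at
        elif status == "Now running":
            # The retry start time is the time this status is entered.
            self.start_time = at
```

Because each abnormal end clears both values and each retry execution sets them again, the values left after several retries are those of the last retry execution.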

You can check the retry information in JP1/AJS3 - View windows or by using a command. The following table describes the JP1/AJS3 - View windows and command you can use and the retry information you are able to check.

Table 2‒39: JP1/AJS3 - View windows and command that you can use to check the retry information
No.	JP1/AJS3 - View window and command for checking retry information	Retry information you can check
1	Monitor Details dialog box, Detailed Schedule dialog box, `ajsshow` command	All retry information
2	Daily Schedule window, Monthly Schedule window, Jobnet Monitor window	Retry status, Retry execution times
3	Search window	Retry execution times

If retry execution is performed multiple times, you can use JP1/AJS3 - View windows and the ajsshow command to check the result of the last retry execution. If you want to check the result of each retry execution, use JP1 events, scheduler logs, or the Execution Result Details dialog box.

(2) Monitoring jobs with retry settings

The following describes the timeout period and end delay monitoring for jobs that use retry settings.

(a) Timeout period for jobs with retry settings

When you specify both a timeout period and retry settings for a job, the elapsed time for the job is reset at the beginning of each retry execution. The elapsed time for a job is monitored to determine when the timeout period expires.

If you want to monitor the entire elapsed time for a job that includes multiple retry executions, monitor an end delay for the jobnet containing the job for the length of time required for executing the jobnet. To do so, in the Define Details dialog box for the jobnet containing the job, select the Time-required-for-execution check box for Monitor jobnet, and then specify the monitoring time.

The following figure shows how the elapsed time for a job is monitored to determine the expiration of the timeout period when retry settings are specified.

Figure 2‒116: Monitoring the elapsed time for a job to determine the expiration of the timeout period when retry settings are specified

In this example, 20 minutes have passed since the start of the job during the first retry execution. However, the elapsed time for the job (monitored to determine when the timeout period expires) is reset at the beginning of each retry execution. For this reason, the job does not enter the Killed status. When 20 minutes have passed since the start of the second retry execution, the job enters the Killed status.

Note that if the timeout period has expired and the job enters the Killed status when the number specified in Maximum retry times has not been reached, no more retry executions will be performed.
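The per-execution reset of the timeout monitoring can be illustrated with a minimal Python sketch. The function `killed_execution` and its parameters are hypothetical; `run_minutes` lists how long each execution (the original execution first, then each retry execution) would run if it were not killed.

```python
from typing import List, Optional


def killed_execution(run_minutes: List[int], timeout_minutes: int) -> Optional[int]:
    """Return the index of the execution that is killed (0 = original), or None.

    The elapsed time is compared with the timeout separately for each
    execution, so time spent in earlier executions is never added up.
    """
    for index, minutes in enumerate(run_minutes):
        if minutes >= timeout_minutes:
            return index      # the job enters the Killed status; no more retries
    return None


# With a 20-minute timeout, executions of 12, 12, and 30 minutes pass the
# 20-minute mark in total during the second execution, but only the third
# execution runs for 20 minutes by itself.
example = killed_execution([12, 12, 30], 20)
```

In this sketch `example` is 2, that is, the second retry execution is the one that is killed.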

(b) End delay monitoring for jobs with retry settings

When you specify both end delay monitoring and retry settings for a job, the elapsed time for the job that is monitored to determine whether an end delay has occurred is reset at the beginning of each retry execution.

If you want to monitor the entire elapsed time for a job that includes multiple retry executions, use end delay monitoring, which monitors the jobnet containing the job for the length of time required for the jobnet.

The following figure shows how the elapsed time for a job is monitored to determine if an end delay has occurred when retry settings are specified.

Figure 2‒117: Monitoring the elapsed time for a job to determine if an end delay has occurred when retry settings are specified

In this example, 20 minutes have passed since the start of the job during the first retry execution. However, the elapsed time for the job (monitored to determine if an end delay has occurred) is reset at the beginning of each retry execution. For this reason, an end delay is not detected. When 20 minutes have passed since the start of the second retry execution, an end delay is detected. When retry execution is performed after an end delay is detected, end delays are no longer detected.

If an end delay is detected as a result of the last retry execution, the delay information continues to be displayed even after the job has ended.

Cautionary note: If an end delay is detected for a job and no end delay is then detected as a result of the next retry execution, the upper-level jobnet enters the Nested jobnet delayed end status.

(3) Execution simulation of jobs with retry settings

When you simulate the execution of a job with retry settings, the simulation includes the time required for waiting for an automatic retry and the length of time for retry execution.

The following figure shows an example of execution simulation of a job with retry settings.

Figure 2‒118: Execution simulation of a job with retry settings

In this example, the execution time for the job is 20 minutes. When the job was defined, 4 was specified for Maximum retry times and 5 minutes was specified for Retry interval. As a result, the simulation takes a total of 120 minutes (the time for the original job execution, the time for four retry executions, and the retry intervals between the original and retry executions).
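The 120-minute total can be reproduced with a short calculation. The following Python sketch uses a hypothetical function name, `simulated_minutes`, and assumes, as in the example, that every execution takes the same time and that all retries are performed.

```python
def simulated_minutes(execution_minutes: int,
                      max_retry_times: int,
                      retry_interval_minutes: int) -> int:
    """Total simulated time: original execution + retries + retry intervals."""
    original = execution_minutes
    retries = max_retry_times * execution_minutes
    # One retry interval precedes each retry execution.
    intervals = max_retry_times * retry_interval_minutes
    return original + retries + intervals


# 20 + (4 x 20) + (4 x 5) = 120 minutes
total = simulated_minutes(20, 4, 5)
```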

(4) Behavior of units with retry settings

The following describes the behavior of jobs with retry settings, the behavior of preceding units of jobs with retry settings, and the behavior of upper-level units of jobs with retry settings.

(a) Re-executing jobs with retry settings and preceding units

The following describes the behavior of units when you re-execute jobs after automatic retry has ended, re-execute processing from preceding jobs during a retry, and re-execute only preceding units during a retry.

■ Re-executing jobs after automatic retries have ended

When you re-execute a job after automatic retries have ended, the number of retry executions is also reset.

The following figure shows how a job behaves when you re-execute a job after automatic retries have ended.

Figure 2‒119: Re-executing a job after automatic retries have ended

When you re-execute a job, the number of retry executions is set to 0.

■ Re-executing processing from preceding units during a retry

When you re-execute processing from a preceding unit during a retry, the number of retry executions is reset. The job being retried transitions to the Wait for prev. to end status and waits for the preceding unit to end.

The time at which the job being retried transitions to the Wait for prev. to end status to wait for the preceding unit to end depends on the status of the job being retried at the time the preceding unit is re-executed.

The following describes how the job behaves when you re-execute processing from a preceding unit during a retry.

When the job being retried is in the Wait for prev. to end or Being held status

The job being retried enters the Wait for prev. to end status when the preceding unit is re-executed. The job waits for the preceding unit to end. When the preceding unit ends, the number of retry executions is reset and the job is re-executed.

The following figure shows how the job being retried behaves when you re-execute processing from the preceding unit while it is in the Wait for prev. to end status.

Figure 2‒120: Behavior of a job in the Wait for prev. to end status during a retry when you re-execute processing from the preceding unit

When the preceding unit is re-executed, the number of retry executions is set to 0.

When the job being retried is in the Waiting to execute, Now queuing, or Now running status

The current retry execution will finish. Regardless of whether the job has ended normally, the job enters the Wait for prev. to end status when the retry execution ends. When the preceding unit ends, the number of retry executions is reset and the job is re-executed.

The following figure shows how the job being retried behaves when you re-execute processing from the preceding unit while it is in the Now running status.

Figure 2‒121: Behavior of the job in the Now running status during a retry when you re-execute from the preceding unit

When the preceding unit is re-executed, the number of retry executions is set to 0.

■ Re-executing only preceding units during a retry

When you re-execute only a preceding unit during a retry, the job transitions to the Wait for prev. to end status when the retry execution ends, and the job waits for the retry interval to expire. If the preceding unit has not ended when the retry interval expires, the job waits for the preceding unit to end. Retry execution continues up to the number specified in Maximum retry times.

The following describes how the job being retried behaves when you re-execute only the preceding unit.

When the job being retried is in the Wait for prev. to end or Being held status

After the retry interval has expired, the job being retried waits for the preceding unit to end. When the preceding unit ends, the job is retried again.

The following figure shows how the job being retried behaves when you re-execute only the preceding unit while the job being retried is in the Wait for prev. to end status.

Figure 2‒122: Behavior of a job in the Wait for prev. to end status during a retry when you re-execute only a preceding unit

When the job being retried is in the Waiting to execute, Now queuing, or Now running status

The current retry execution will finish. If the executable file ends abnormally during the current retry execution, the job waits for the preceding unit to end after the retry interval has expired. When the preceding unit ends, the job is retried again.

The following figure shows how the job being retried behaves when you re-execute only the preceding unit while the job being retried is in the Now running status.

Figure 2‒123: Behavior of a job in the Now running status during a retry when you re-execute only the preceding unit

(b) Suspending root jobnets during job retries

You can suspend a root jobnet containing a job being retried or cancel the suspension of a root jobnet.

While the root jobnet is suspended, retry execution is not performed for jobs even when a job has ended abnormally. When you cancel suspension, retry execution is performed after the retry interval expires.

For details about how to suspend root jobnets, see 4.5.17 Changing job and jobnet definitions without unregistering the jobnet in the manual JP1/Automatic Job Management System 3 Overview.

(c) Interrupting root jobnets containing jobs being retried

When you interrupt a root jobnet containing a job being retried, the root jobnet enters the Interrupted status and the job with retry settings enters the Not executed + Ended status.

The time that the job being retried enters the Not executed + Ended status depends on the status of the job being retried.

The following describes how a job being retried behaves when the root jobnet is interrupted.

When the job being retried is in the Wait for prev. to end or Being held status

The scheduled retry execution is not performed. The job enters the Not executed + Ended status and the retry status transitions to Retry end.

The following figure shows how the job being retried behaves when you interrupt the root jobnet while the job being retried is in the Wait for prev. to end status.

Figure 2‒124: Behavior of a job in the Wait for prev. to end status during a retry when you interrupt the root jobnet

When the job being retried is in the Waiting to execute, Now queuing, or Now running status

The current retry execution will finish. Regardless of whether the job ends normally, the ended job enters the Not executed + Ended status and the retry status transitions to Retry end.

The following figure shows how the job being retried behaves when you interrupt the root jobnet while the job being retried is in the Now running status.

Figure 2‒125: Behavior of a job in the Now running status during retry when you interrupt the root jobnet

When you interrupt the root jobnet containing a job being retried, the error in the executable file defined for the job will not be corrected. The job enters the Not executed + Ended status and the retry status transitions to Retry end.

(d) Changing the status of jobs and killing jobs during retries

When a job enters the Now running, Now queuing, or Waiting to execute status due to an automatic retry (for jobs for which Queueless Agent is specified as the execution target service), you can change the status of the job to an end status. You can also change the status of a job in an end status due to an automatic retry to another end status. You can kill a job being automatically retried as well. When you change the status of a job or kill a job, no automatic retry will be performed for the job after it enters an end status, regardless of the new end status or the return code.

(e) Changing the definitions of jobs during a retry

When yes is specified for the UNITDEFINERELOAD environment setting parameter and you change the definition of a job during a retry, the definition of the job is read again each time retry execution is performed. Accordingly, when you change the definition of a job during a retry, the new definition takes effect from the next retry execution after the change.

For details about the behavior of jobs when you change the definitions, see 7.4 Changing the unit definition information during registration for execution in the JP1/Automatic Job Management System 3 Administration Guide.

If you delete retry settings when the retry status is Retry waiting, only one retry execution is performed. Thereafter, no more retry executions will be performed.

Cautionary note: When you change the definition of a job during a retry, the subsequent behavior of the job changes depending on the status of the job at the point when the change is made. The job might therefore behave unexpectedly. We recommend that you do not change any definitions during a retry.

(f) Waiting during a job retry

When a wait condition is assigned to a job with retry settings, after the wait condition is satisfied and the wait status is set to Wait complete, the job does not wait for the job being waited for during retry executions. If you enable the wait condition again and change the wait status to Wait incomplete (manual) before the retry status transitions to Retry waiting, the job only waits for the retry interval to expire and does not wait for the wait condition to be satisfied.

The following figure shows how a job with retry settings behaves when you assign a wait condition to it.

Figure 2‒126: Behavior of a job with retry settings when you assign a wait condition to it

(5) Information to be updated by retry execution

When retry execution is performed, information in addition to the retry information is also updated. If you reference that information while a job is being executed, the results might differ depending on when you reference it. Define jobs taking this point into consideration.

The following table describes the information that is updated by retry execution.

Table 2‒40: Information updated by retry execution
No.	Updated information	Description
1	`JP1JobID` environment variable	The job ID is updated to the ID used at retry execution.
2	Standard output file	When the Append check box is not selected in the Define Details dialog box for jobs: the contents of the file are overwritten each time retry execution is performed. When the Append check box is selected in the Define Details dialog box for jobs: information is added to the file each time retry execution is performed.
3	Standard error output file	When the Append check box is not selected in the Define Details dialog box for jobs: the contents of the file are overwritten each time retry execution is performed. When the Append check box is selected in the Define Details dialog box for jobs: information is added to the file each time retry execution is performed.
4	Transfer file	The file is transferred each time retry execution is performed.
5	Execution result details	Information is added each time retry execution is performed.

Cautionary note: When you select the Append check box, the sizes of the standard output file and the standard error output file increase, creating a high system load. We recommend that you do not select the Append check box. If you need to select the Append check box, limit the amount of output information or periodically clear the files.
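The difference between the two Append settings is the same as the difference between opening a file in write mode and in append mode. The following Python sketch only illustrates that effect on file size; `write_retry_output` and `demo` are hypothetical names, and the sample output lines are invented for the example.

```python
import os
import tempfile
from typing import List, Tuple


def write_retry_output(path: str, output_per_execution: List[List[str]],
                       append: bool) -> None:
    """Write the output of each execution to the same file, one after another."""
    mode = "a" if append else "w"      # "w" discards what earlier executions wrote
    for lines in output_per_execution:
        with open(path, mode, encoding="utf-8") as f:
            f.writelines(line + "\n" for line in lines)


def demo() -> Tuple[str, str]:
    """Return the final file contents without and with appending."""
    outputs = [["original: error"], ["retry 1: error"], ["retry 2: ok"]]
    with tempfile.TemporaryDirectory() as d:
        overwrite_path = os.path.join(d, "stdout_overwrite.txt")
        append_path = os.path.join(d, "stdout_append.txt")
        write_retry_output(overwrite_path, outputs, append=False)
        write_retry_output(append_path, outputs, append=True)
        with open(overwrite_path, encoding="utf-8") as f:
            overwritten = f.read()
        with open(append_path, encoding="utf-8") as f:
            appended = f.read()
    return overwritten, appended
```

Without appending, only the output of the last retry execution remains. With appending, the file keeps growing by the output of every execution, which is why the note above recommends limiting the output or clearing the files periodically.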

(6) Restarting the scheduler service during a retry

When you stop and restart the scheduler service, jobs being retried enter the same statuses as regular jobs, depending on the start mode of the scheduler service. The behavior specific to automatic retries in each start mode is as follows.

Starting the scheduler service in warm start mode or disaster recovery start mode

Automatic retries do not continue. Jobs enter an end status specified for each start mode based on their job status at that time.

Starting the scheduler service in hot start mode

Automatic retries continue. The retry interval and the number of retry executions are handled as follows.

Retry interval

The period during which the scheduler service is inactive is included in the elapsed time that is monitored to determine when the retry interval expires. Suppose you have a job for which the retry interval is five minutes. If the scheduler service stops two minutes after the retry status transitions to Retry waiting, remains stopped for two minutes, and is then restarted in hot start mode, retry execution starts one minute after the scheduler service restarts.

Figure 2‒127: Behavior of the job when the scheduler service starts in hot start mode

Note that if you restart the JP1/AJS3 service or the scheduler service in hot start mode after it has stopped for more than the retry interval, jobs start executing as soon as the scheduler service starts. If that occurs, the number of jobs temporarily increases, possibly degrading job execution performance for a while.

Number of retry executions

The number of retry executions is not initialized. The number continues to increase as retry executions are performed. Suppose you have a job for which 5 is specified for Maximum retry times. If you restart the scheduler service when two retry executions have finished, a maximum of three more retry executions will be performed after the scheduler service restarts.
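The two hot start examples above reduce to simple arithmetic. The following Python sketch uses hypothetical function names and takes all times in minutes.

```python
def wait_after_hot_start(retry_interval: int,
                         waited_before_stop: int,
                         downtime: int) -> int:
    """Minutes left until retry execution starts once the service is back.

    The downtime counts toward the retry interval, so the result is 0 when
    the service was stopped for longer than the remaining interval.
    """
    return max(0, retry_interval - waited_before_stop - downtime)


def retries_left(max_retry_times: int, retries_done: int) -> int:
    """Retry executions still available; the count is not reset by a restart."""
    return max(0, max_retry_times - retries_done)


remaining_wait = wait_after_hot_start(5, 2, 2)   # 5 - 2 - 2 = 1 minute
remaining_retries = retries_left(5, 2)           # 5 - 2 = 3 retry executions
```

A result of 0 from `wait_after_hot_start` corresponds to the case in which jobs start executing as soon as the scheduler service starts.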

For details about the status of jobs for each start mode, see 6.2.1 Temporarily changing the start mode of JP1/AJS3 in the JP1/Automatic Job Management System 3 Administration Guide.

(7) Cautionary notes on automatic retry

Note the following when you use automatic retries:

- When retry settings are specified for a unit under a remote jobnet and either of the following conditions is satisfied for the manager executing the jobnet, an error occurs:
  - The database uses a compatible ISAM configuration.
  - The version of JP1/AJS3 - Manager is 09-50 or earlier.
- An automatic retry is not performed while job restrictions or forced termination of jobs are in effect to stop the scheduler service. Jobs enter an end status without performing retry executions regardless of the return codes and the number of retry executions. When all the running jobs have ended, the scheduler service stops.
  For details about restricting how the scheduler service stops, see 7.5.2 Stopping the scheduler service in the JP1/Automatic Job Management System 3 Administration Guide.
- Sometimes jobs are ended by the OS with return codes instead of by user applications. In such cases, an automatic retry is performed.
- For queueless jobs, an automatic retry is also performed if there are no script files or script files cannot be accessed. Use the definition pre-check function to make sure that script files can be accessed.
- When you execute queueless jobs, use version 10-00 or later of JP1/AJS3 - Manager or JP1/AJS3 - Agent. If the version is 09-50 or earlier, an automatic retry is also performed when the process for starting job processes ends abnormally for some reason.
