OpenTP1 Version 7 Description


5.3.2 Recovering from UAP failures

When a failure occurs in a UAP, OpenTP1 performs a partial recovery of the affected UAP so that the problem does not affect the whole system. This section describes how OpenTP1 processes a UAP failure.

Organization of this subsection
(1) Recovering from an inability to start a UAP
(2) Recovering from infinite loops in which a UAP cannot terminate
(3) Abnormal termination with UAP linkage errors
(4) Recovering from UAP abnormal terminations
(5) Recovering from a deadlock

(1) Recovering from an inability to start a UAP

Two common reasons that a UAP cannot start are an incorrect system definition and insufficient memory.

To recover from an incorrect system definition, the OpenTP1 administrator must correct the definitions and then restart the UAP.

To recover from insufficient memory, the OpenTP1 administrator must terminate unnecessary processes or delete unnecessary files, and then restart the UAP.

(2) Recovering from infinite loops in which a UAP cannot terminate

The OpenTP1 recovery procedure for an infinite loop partly depends on how OpenTP1 detected the loop. To detect a program loop, OpenTP1 monitors the following three time periods:

OpenTP1 monitors the elapsed transaction time: from the start to the end of a transaction. In the user service definition, user service default definition, or transaction service definition, you can specify a limit for this time period. If this specified period is exceeded, OpenTP1 forcibly terminates the program and rolls back the transaction. This is possible for transaction processing only. If 0 is specified as the monitoring period, OpenTP1 does not monitor the elapsed time.

OpenTP1 monitors the RPC response time: the time elapsed from the call of the server UAP to the return of control. In the system common definition or individual service definitions, you can specify a limit for this time period. If this specified period is exceeded, OpenTP1 returns an error to the source of the call. A transaction is rolled back at the synchronization point. If 0 is specified as the RPC response time, the source of the call continues to wait until it receives a response.

OpenTP1 monitors the CPU time in which a transaction branch can complete a transaction. In the user service definition, user service default definition, or transaction service definition, you can specify a limit for this time period. If this specified period is exceeded (i.e., the transaction branch cannot complete the whole operation within the specified period), the OpenTP1 system terminates the branch's transaction processes and executes a rollback operation.
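The three checks above can be summarized in a small model. The following is a minimal sketch, not OpenTP1 code: the class, field, and function names are hypothetical, and treating 0 as "not monitored" for the CPU time limit is an assumption made here for symmetry (the text above states the meaning of 0 only for the elapsed transaction time and the RPC response time).

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "no action"
    TERMINATE_AND_ROLLBACK = "forcibly terminate the process and roll back"
    RETURN_ERROR_TO_CALLER = "return an error to the source of the call"


@dataclass(frozen=True)
class Limits:
    # Hypothetical field names; all values are in seconds.
    transaction_elapsed: float  # 0 disables elapsed-time monitoring
    rpc_response: float         # 0 means the caller waits until a response arrives
    transaction_cpu: float      # 0 = not monitored (assumption, see lead-in)


def exceeded(measured: float, limit: float) -> bool:
    """Return True only if the limit is monitored (non-zero) and has been passed."""
    return limit > 0 and measured > limit


def check_transaction(elapsed: float, cpu: float, limits: Limits) -> Action:
    """Elapsed-time and CPU-time monitoring both end in termination plus rollback."""
    if exceeded(elapsed, limits.transaction_elapsed) or exceeded(cpu, limits.transaction_cpu):
        return Action.TERMINATE_AND_ROLLBACK
    return Action.NONE


def check_rpc(waited: float, limits: Limits) -> Action:
    """RPC response monitoring only reports an error to the caller."""
    if exceeded(waited, limits.rpc_response):
        return Action.RETURN_ERROR_TO_CALLER
    return Action.NONE
```

In this model, a looping server UAP called with an RPC response limit of 0 is never reported to the caller, so a non-zero transaction or CPU limit is the only check that would stop it.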

(3) Abnormal termination with UAP linkage errors

If the operating system is HP-UX, always specify immediate as the bind mode during linkage. If the UAP is created with a bind mode other than immediate, the UAP might terminate abnormally. Use the operating system's chatr command to check whether the bind mode of the created UAP is immediate.

(4) Recovering from UAP abnormal terminations

If a UAP terminates abnormally, OpenTP1 detects the abnormal termination and begins partial recovery. In this partial recovery, OpenTP1 does the following:

(a) Restarts the UAP processing, and closes the service groups or services

What happens while restarting UAP processing and closing service groups or services depends on whether the UAP is an SPP (service-providing program), MHP (message-handling program), or SUP (service-using program).

(b) Recovers the relevant transactions

If the UAP that terminated abnormally was executing a transaction, OpenTP1 performs a partial recovery of that transaction. When OpenTP1 detects the abnormal termination of the UAP that was processing the transaction, it requests the transaction recovery service to perform a partial recovery of the transaction. The transaction recovery service performs a transaction determination and recovers the transaction.

During transaction partial recovery, the transaction recovery requests are queued using the scheduling facility and the recovery is performed concurrently by multiple transaction-recovery service processes, which improves the efficiency of transaction recovery. In the transaction service definition, you can specify the number of recovery processes.

When recovery requests are queued, a transaction recovery might fail because of an OpenTP1 system area error or because of insufficient memory in the schedule queue. To successfully recover the transaction in such a case, OpenTP1 regularly checks whether an unrecovered transaction exists, and re-issues a recovery request if OpenTP1 detects an unrecovered transaction.

The preceding method is also used to perform a transaction determination that has been temporarily suppressed because a file was held by a resource manager or by a transaction service at recovery.
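The queue-and-retry behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the OpenTP1 implementation: `RecoveryScheduler` and its methods are hypothetical names, the schedule queue is modeled as a simple bounded queue, and the concurrent recovery processes are modeled as a single sequential drain.

```python
from collections import deque


class RecoveryScheduler:
    """Toy model: a bounded queue of recovery requests plus a periodic re-check."""

    def __init__(self, queue_capacity):
        self.queue_capacity = queue_capacity
        self.queue = deque()       # pending recovery requests
        self.unrecovered = set()   # transactions that still need recovery

    def request_recovery(self, txn_id):
        """Queue a request; return False if the queue is full and the request fails."""
        self.unrecovered.add(txn_id)
        if txn_id in self.queue:
            return True
        if len(self.queue) >= self.queue_capacity:
            return False
        self.queue.append(txn_id)
        return True

    def run_workers(self, held=frozenset()):
        """Drain the queue; transactions listed in `held` are skipped for now."""
        recovered = []
        while self.queue:
            txn_id = self.queue.popleft()
            if txn_id in held:
                continue  # determination suppressed; stays unrecovered
            self.unrecovered.discard(txn_id)
            recovered.append(txn_id)
        return recovered

    def periodic_check(self):
        """Re-issue a request for every unrecovered transaction not already queued."""
        reissued = 0
        for txn_id in sorted(self.unrecovered):
            if txn_id not in self.queue and self.request_recovery(txn_id):
                reissued += 1
        return reissued
```

For example, with a queue capacity of 2, a third request fails, and a transaction whose file is held is skipped; a later call to `periodic_check()` re-issues both so that the next drain recovers them.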

(5) Recovering from a deadlock

When two or more UAPs share a resource, a deadlock might occur. If OpenTP1 detects a deadlock, it compares the deadlock priorities of the UAPs involved and returns a lock error to the function called by the UAP with the lowest priority.
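The idea of resolving a deadlock by failing the lowest-priority participant can be sketched as follows. This is a generic illustration, not a description of how OpenTP1 detects deadlocks internally: the wait-for mapping, the function names, and the numeric encoding of priority (a smaller number means a lower priority) are all assumptions made for the example.

```python
def find_cycle(waits_for):
    """Return the UAPs that form a wait-for cycle, or [] if there is none.

    waits_for maps each blocked UAP to the single UAP holding the lock it wants.
    """
    for start in waits_for:
        path = []
        node = start
        # Follow the chain of waits until it ends or revisits a UAP.
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            # Drop any lead-in UAPs that wait on the cycle but are not part of it.
            return path[path.index(node):]
    return []


def choose_victim(cycle, priority):
    """Pick the UAP with the lowest priority (assumed: smaller number = lower)."""
    return min(cycle, key=lambda uap: priority[uap])
```

In this model, the victim's lock request fails with an error, which releases the cycle and lets the remaining UAPs proceed; a UAP that merely waits on a member of the cycle is never chosen.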

For the OpenTP1 processing when a deadlock occurs, see 3.9.1(5) Deadlocks in TAM and DAM files.