Recovering from OpenTP1 system failures

In the event of an OpenTP1 system failure, a complete recovery restarts OpenTP1 and returns the whole system to the state immediately before the failure. A complete recovery uses the history information stored by OpenTP1, which records the online state before the failure occurred. As described in 4. File System, an OpenTP1 system preserves its history information in journal files in preparation for a complete-recovery restart. During such a restart, OpenTP1 recovers components of the system in the following order:

System recovery
Service recovery
Transaction recovery

Online processing can be restarted after OpenTP1 completes system recovery and service recovery.

System recovery
During this initial stage of a complete recovery, OpenTP1 uses the information in status files. Status files contain system control information, such as information about the configuration of system services and UAPs, and system file information. Using this system control information, OpenTP1 first performs a system recovery to recover those system statuses that do not depend on the synchronization point processing of a transaction. OpenTP1 also uses the system control information to determine which checkpoint dump and which system journal should be used for recovery.
Service recovery
After using the system control information to recover the status of the system, OpenTP1 starts recovering system services. System services are recovered using the data stored in checkpoint dump files and system journal files. To recover system services, OpenTP1 attempts to use the checkpoint dump of the most recent generation and all the recovery journals obtained after that checkpoint dump. If the checkpoint dump of the most recent generation is unavailable, OpenTP1 uses the next most recent generation for the recovery and all the recovery journals obtained after that checkpoint dump. Only the most recent and the next most recent generations are guaranteed.
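The generation fallback described above can be sketched as follows. This is only an illustrative model, not OpenTP1 code: the names CheckpointDump, choose_recovery_start, and journals_to_replay, the integer generation numbers, and the integer journal positions are all assumptions made for the example. Returning None when neither of the two newest generations is usable is also a simplification of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckpointDump:
    """Hypothetical stand-in for one checkpoint dump generation."""
    generation: int        # larger number = more recent (illustrative numbering)
    usable: bool           # False models a dump that is unavailable
    journal_position: int  # assumed journal position at which the dump was obtained


def choose_recovery_start(dumps: List[CheckpointDump]) -> Optional[CheckpointDump]:
    """Try the most recent generation, then the next most recent one."""
    newest_first = sorted(dumps, key=lambda d: d.generation, reverse=True)
    for dump in newest_first[:2]:  # only these two generations are considered
        if dump.usable:
            return dump
    return None  # sketch-only outcome: neither guaranteed generation is usable


def journals_to_replay(journal_positions: List[int], dump: CheckpointDump) -> List[int]:
    """Select the recovery journals obtained after the chosen checkpoint dump."""
    return [pos for pos in journal_positions if pos > dump.journal_position]


if __name__ == "__main__":
    dumps = [
        CheckpointDump(generation=1, usable=True, journal_position=100),
        CheckpointDump(generation=2, usable=True, journal_position=200),
        CheckpointDump(generation=3, usable=False, journal_position=300),
    ]
    chosen = choose_recovery_start(dumps)
    # Generation 3 is unavailable, so the sketch falls back to generation 2.
    print(chosen.generation, journals_to_replay([150, 250, 350], chosen))
```

In this model, falling back to an older generation means more journal records must be replayed, because everything obtained after the older checkpoint dump is needed.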
Transaction recovery
Along with the new online processing, OpenTP1 also recovers the transactions of the UAPs that were executing when OpenTP1 terminated. In this recovery, OpenTP1 performs a commit operation or rollback operation on every unfinished transaction. In effect, during this transaction recovery stage of a complete recovery, OpenTP1 recovers each UAP (that is, performs a partial recovery for each UAP) by recovering each unfinished transaction affected by the system shutdown.
Whether a commit or a rollback operation is performed on a transaction depends on how far the transaction processing had proceeded. If the transaction had not yet completed the first phase of synchronization point processing (that is, it was before or in the first phase), OpenTP1 rolls back the global transaction. If the transaction had already reached the second phase or later, whether it is committed or rolled back depends on the decision made by its root transaction branch.
OpenTP1 uses the synchronization point journal in the system journal files to determine how far transaction processing had proceeded when the OpenTP1 system shut down.
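The commit-or-rollback rule for an unfinished transaction can be modeled with a small decision function. This is a sketch under stated assumptions: SyncPointProgress, recovery_action, and the string values "commit" and "rollback" are hypothetical names chosen for illustration and do not correspond to any OpenTP1 API or journal format.

```python
from enum import Enum


class SyncPointProgress(Enum):
    """Assumed progress markers, as they might be derived from the synchronization point journal."""
    BEFORE_FIRST_PHASE = 1
    FIRST_PHASE = 2
    SECOND_PHASE_OR_LATER = 3


def recovery_action(progress: SyncPointProgress, root_decision: str) -> str:
    """Return "commit" or "rollback" for one unfinished transaction.

    root_decision is the outcome determined by the root transaction branch;
    it is consulted only when the second phase or later had been reached.
    """
    if progress in (SyncPointProgress.BEFORE_FIRST_PHASE, SyncPointProgress.FIRST_PHASE):
        # Before or at the first phase: the global transaction is rolled back.
        return "rollback"
    if root_decision not in ("commit", "rollback"):
        raise ValueError("root_decision must be 'commit' or 'rollback'")
    # Second phase or later: follow the root transaction branch.
    return root_decision


if __name__ == "__main__":
    print(recovery_action(SyncPointProgress.FIRST_PHASE, "commit"))
    print(recovery_action(SyncPointProgress.SECOND_PHASE_OR_LATER, "commit"))
```

The first call yields a rollback even though the root decision passed in is "commit", because in this model the root decision only matters once the second phase has been reached.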

Figure 5-4 Transactions recovered in a complete recovery

If the OpenTP1 system stops because of an error while it is terminating in response to the dcstop command, the user service definition and the user service default definition let you select which user server status is restored at complete recovery. You can restore either the status at the time the dcstop command was entered, without applying the status changes made during termination processing, or the status at the time the error occurred, with those status changes applied. A user server for which Y is specified in the status_change_when_terming operand of the user service definition applies the final status change and is recovered to its status at the time the system stopped. A user server for which N is specified in this operand does not apply the final status change and is recovered to its status at the time the dcstop command was entered.
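The effect of the status_change_when_terming operand can be illustrated with a minimal model. Only the operand name and its Y and N values come from the text above. The function status_to_recover and the status strings "running" and "terminated" are hypothetical and are used purely to show which of the two recorded statuses is chosen.

```python
def status_to_recover(status_at_dcstop: str,
                      status_at_failure: str,
                      status_change_when_terming: str) -> str:
    """Pick the user server status restored at complete recovery.

    status_at_dcstop:  status when the dcstop command was entered
    status_at_failure: status when the system stopped during termination
    """
    if status_change_when_terming == "Y":
        # Apply the final status change made during termination processing.
        return status_at_failure
    if status_change_when_terming == "N":
        # Ignore status changes made after the dcstop command was entered.
        return status_at_dcstop
    raise ValueError("status_change_when_terming must be 'Y' or 'N'")


if __name__ == "__main__":
    # Hypothetical case: the server was running at dcstop and had already
    # been terminated when the failure occurred.
    print(status_to_recover("running", "terminated", "Y"))
    print(status_to_recover("running", "terminated", "N"))
```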

The following figure shows the status of the user servers during a complete recovery when the OpenTP1 system has stopped while in the process of terminating.

Figure 5-5 User server status at complete recovery after the OpenTP1 system stops during termination processing

[Figure]
