Recovering the online system after a stoppage from the recovery journals alone would require every journal written since online processing began, and processing all of them could take a long time. This recovery time can be reduced by requesting OpenTP1 to periodically save the status of the system tables needed in recovery. OpenTP1 then saves this information at various checkpoints.
The table information obtained at a checkpoint is called a checkpoint dump. In a complete recovery of an online system, OpenTP1 does not need to use all the recovery journals from the beginning of processing. Instead, it uses the checkpoint dump together with only those recovery journals obtained between the last checkpoint dump and the time the online system stopped.
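The effect of a checkpoint dump on the amount of journal data to process can be illustrated with a minimal sketch. The record layout and the names `JournalRecord`, `seq`, and `journals_needed` are hypothetical and exist only for this illustration; they are not OpenTP1 interfaces.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JournalRecord:
    seq: int       # hypothetical, monotonically increasing sequence number
    payload: str   # hypothetical journal content


def journals_needed(journal: List[JournalRecord],
                    last_checkpoint_seq: Optional[int]) -> List[JournalRecord]:
    """Return the records recovery has to process.

    Without a checkpoint dump (None), every record since the start of
    online processing is needed. With one, only records written after
    the dump was obtained are needed.
    """
    if last_checkpoint_seq is None:
        return list(journal)
    return [r for r in journal if r.seq > last_checkpoint_seq]


journal = [JournalRecord(i, f"update-{i}") for i in range(1, 11)]
without_dump = journals_needed(journal, None)   # all 10 records
with_dump = journals_needed(journal, 7)         # records 8, 9, 10 only
```

In this model the checkpoint dump stands in for everything up to and including record 7, so only the three later records remain to be processed.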
Since OpenTP1 performs recovery for each system service, it obtains a checkpoint dump for each system service for which tables in memory need to be recovered: the transaction service and MCF service. A file is allocated to each system service to store the checkpoint dumps. This file is called a checkpoint dump file.
If you have specified that checkpoint dumps are to be taken, the dumps are obtained at these checkpoints:
A checkpoint dump file is a logical filegroup, and the actual file entity that obtains a checkpoint dump is called a physical file. A filegroup consists of one or two physical files. A filegroup that consists of two physical files is called a duplicated checkpoint dump file. When a checkpoint dump file is duplicated, one physical file is called system A and the other physical file is called system B.
For each of the system services that require checkpoint dump files, you must prepare a checkpoint dump service definition, specifying the system service name, filegroup, and physical file relationships.
The filegroup defined in the checkpoint dump service definition can be given any name, which will be used when working with the checkpoint dump file.
The filegroup in a checkpoint dump file has one of the following statuses:
In the checkpoint dump service definition, define the filegroup that is to be open when OpenTP1 starts. Filegroups not defined to be open are placed in reserved status when OpenTP1 starts.
A checkpoint dump taken at one point in time is called a generation. Because one generation is stored in one checkpoint dump filegroup, a different filegroup needs to be swapped in for each generation. When checkpoint dumps have been output to all the available filegroups, the data in the first filegroup is overwritten. This method of storing data in multiple filegroups, overwriting each in turn, is called the round-robin method. Normally, the filegroup with the most recent generation has overwrite-prohibited status, while the other filegroups are placed in overwrite-permitted status. However, when the Multigeneration Guarantee facility is used, the filegroups for the two most recent generations are both placed in overwrite-prohibited status.
Figure 4-10 shows how checkpoint dumps are assigned to filegroups by round-robin scheduling.
Figure 4-10 Most recent checkpoint dump generation overwrites earlier checkpoint-dump generation
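The round-robin method and the resulting overwrite status can be modeled with a short sketch. The class `CheckpointDumpRing` and its methods are hypothetical names chosen for this illustration; the sketch models only the ordering rules described above, not OpenTP1's actual implementation.

```python
from typing import List


class CheckpointDumpRing:
    """Toy model of round-robin checkpoint dump filegroups."""

    def __init__(self, group_names: List[str], guaranteed_generations: int = 1):
        # At least one filegroup beyond the protected ones must exist,
        # otherwise there would be nothing left to overwrite.
        if len(group_names) < guaranteed_generations + 1:
            raise ValueError("need guaranteed generations + 1 filegroups")
        self.groups = list(group_names)
        self.guaranteed = guaranteed_generations
        self.history: List[str] = []   # filegroups holding dumps, oldest first
        self.next_index = 0

    def take_checkpoint(self) -> str:
        """Write the next generation and return the filegroup used."""
        target = self.groups[self.next_index]
        if target in self.history:
            self.history.remove(target)   # its old generation is overwritten
        self.history.append(target)
        self.next_index = (self.next_index + 1) % len(self.groups)
        return target

    def overwrite_prohibited(self) -> List[str]:
        """Filegroups holding the most recent protected generations."""
        return self.history[-self.guaranteed:]


ring = CheckpointDumpRing(["g1", "g2", "g3"])
targets = [ring.take_checkpoint() for _ in range(4)]   # wraps back to g1

ring2 = CheckpointDumpRing(["g1", "g2", "g3"], guaranteed_generations=2)
for _ in range(4):
    ring2.take_checkpoint()
protected = ring2.overwrite_prohibited()   # two most recent generations
```

With three filegroups, the fourth checkpoint overwrites `g1`. With two protected generations, the filegroup chosen next is always the oldest one, which is never among the protected filegroups.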
If an error occurs in a checkpoint dump file during online processing or if there are insufficient files for operations, physical files can be added dynamically. The method to dynamically add such physical files differs according to whether a reserved file has been defined.
Even with physical files that have not been defined in the checkpoint dump service definition, you can use commands to add, during online processing, the physical files to the files used for checkpoint processing.
When a standby file has been previously defined in the checkpoint dump service definition, files can be assigned by opening the file with the jnlopnfg command or opening the file automatically whenever OpenTP1 starts.
Using the automatic open facility enables automatic allocation of the standby file for the current file when the number of online physical files decreases to the number of guaranteed-valid generations (described below). You can specify automatic opening in the checkpoint dump service definition.
Physical files that are added dynamically to a reserved filegroup can be deleted using the jnldelpf command.
The Multigeneration Guarantee facility improves OpenTP1 reliability by enabling OpenTP1 to recover in situations where the filegroup storing the most recent checkpoint-dump generation cannot be read for some reason. In such a case, OpenTP1 can recover by reading the filegroup that contains the generation preceding the most recent one. The Multigeneration Guarantee facility thus prevents the filegroups containing the last two checkpoint-dump generations from being overwritten. The overwrite-prohibited generations are called guaranteed-valid generations (or, in some manual versions, valid guarantee generations). The number of guaranteed-valid generations is 2 when the Multigeneration Guarantee facility is used and 1 when not used.
OpenTP1 suppresses the overwriting of the system journal filegroups used to store the guaranteed-valid generation of checkpoint dumps that are required for recovery. In an overwrite check, OpenTP1 checks whether the system journal file for the guaranteed-valid generation can be overwritten.
When the Multigeneration Guarantee facility is enabled, the filegroup containing the most recent checkpoint-dump generation is read first. Recovery processing is then performed, based on all journals collected since the most recent generation. If an error occurs for any reason and the filegroup containing the most recent generation cannot be read, the checkpoint dump file for the preceding generation is read. Recovery processing in this case is based on all journals collected since the preceding generation. If neither of the filegroups storing guaranteed-valid generations can be read, the next most recent generation is used. In that case, however, journals older than the guaranteed-valid generations may already have been overwritten, and the data they contained cannot be recovered. The required number of checkpoint dump filegroups that are online and in a status other than reserved is the number of guaranteed-valid generations + 1. The following figure shows the relationships between guaranteed-valid generations and system journal files.
Figure 4-11 Guaranteed-valid generations and system journal files
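The order in which generations are tried during recovery can be sketched as follows. The function name, its parameters, and the `readable` callback are hypothetical; the sketch only mirrors the selection order described above.

```python
from typing import Callable, List, Tuple


def choose_recovery_generation(
    generations: List[str],
    readable: Callable[[str], bool],
    guaranteed_generations: int,
) -> Tuple[str, int, bool]:
    """Pick the filegroup to recover from.

    generations lists filegroup names, most recent generation first.
    Returns (filegroup, age, journals_guaranteed), where age 0 is the
    most recent generation and journals_guaranteed is False when the
    chosen generation is older than the guaranteed-valid ones, so the
    journals it needs may already have been overwritten.
    """
    for age, group in enumerate(generations):
        if readable(group):
            return group, age, age < guaranteed_generations
    raise RuntimeError("no readable checkpoint dump generation")


unreadable = {"g3"}   # simulate a read error on the newest generation
result = choose_recovery_generation(
    ["g3", "g2", "g1"],
    readable=lambda g: g not in unreadable,
    guaranteed_generations=2,
)
```

Here the most recent filegroup `g3` cannot be read, so the preceding generation `g2` is chosen. Because it is still a guaranteed-valid generation, the journals written since it was obtained are still available.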
Even when the number of filegroups required for online operations or for restarting processing falls below one plus the number of guaranteed-valid generations, processing can still continue if a minimum of 2 filegroups are usable. This feature is called the fallback facility for checkpoint dumps. In the checkpoint dump service definition you can specify whether to use the fallback facility.
Note that there is a drawback to using this fallback facility. When an error occurs in an OpenTP1 system during fallback operations, information for restart is guaranteed for only one filegroup. If that filegroup should fail, recovery is not possible.
When fallback operation occurs, a message informs the user that OpenTP1 has changed from ordinary operation to fallback operation. When such a message is output, the OpenTP1 administrator should quickly prepare a filegroup usable for a guaranteed-valid generation. After the filegroup is prepared, another message informs the administrator that fallback operation has switched to ordinary operation.
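The condition under which fallback operation applies can be expressed as a small decision function. This is a sketch under stated assumptions: the function name and the returned labels are hypothetical, and the label `"cannot continue"` is only a placeholder for the case the text does not describe.

```python
def operating_mode(usable_filegroups: int,
                   guaranteed_generations: int,
                   fallback_enabled: bool) -> str:
    """Classify checkpoint dump operation from the number of usable filegroups."""
    required = guaranteed_generations + 1
    if usable_filegroups >= required:
        return "ordinary"
    # Below the required count: processing continues only if the fallback
    # facility is specified and at least 2 filegroups remain usable.
    if fallback_enabled and usable_filegroups >= 2:
        return "fallback"
    return "cannot continue"   # placeholder label, not an OpenTP1 status


mode_full = operating_mode(3, guaranteed_generations=2, fallback_enabled=True)
mode_degraded = operating_mode(2, guaranteed_generations=2, fallback_enabled=True)
mode_no_fallback = operating_mode(2, guaranteed_generations=2, fallback_enabled=False)
```

With two guaranteed-valid generations, three filegroups are required. Dropping to two filegroups leads to fallback operation only when the facility has been specified in the definition.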
The required size of a system journal file increases in proportion to the number of guaranteed-valid generations for checkpoint dump files. You can estimate the required size (that is, the required number of blocks) of the system journal file as follows:
Number of journal blocks used to store journal information between checkpoints × (1 + number of guaranteed-valid generations)
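As a worked example, the estimate can be written as a one-line helper, reading the formula as a product (which the "in proportion" wording implies). The function name and the figure of 1,000 blocks are illustrative assumptions.

```python
def required_journal_blocks(blocks_between_checkpoints: int,
                            guaranteed_generations: int) -> int:
    """Estimate the number of system journal blocks required."""
    return blocks_between_checkpoints * (1 + guaranteed_generations)


# Assume 1,000 journal blocks are written between two checkpoints.
without_facility = required_journal_blocks(1000, guaranteed_generations=1)
with_facility = required_journal_blocks(1000, guaranteed_generations=2)
```

Under this assumption the estimate is 2,000 blocks without the Multigeneration Guarantee facility and 3,000 blocks with it.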
When a filegroup is operated with a duplicated checkpoint dump file, OpenTP1 outputs the same checkpoint dump to both systems A and B. If a failure occurs in one system while reading the checkpoint dump, the same data can be read from the other system. In this way, duplicating a checkpoint dump file increases reliability.
To duplicate a checkpoint dump file, specify Y in the jnl_dual option of the checkpoint dump service definition. At this time, specify two physical files (system A and system B) in a filegroup.
Store the physical file of system A and the physical file of system B on separate disks so that a single disk failure cannot affect both physical files at the same time. The physical files of system A and system B need not be the same size. If the sizes differ, however, OpenTP1 treats the size of the smaller file as the size available for the checkpoint dump, and the extra space in the larger file goes unused. To use resources efficiently, match the sizes whenever possible.
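The behavior of a duplicated checkpoint dump file can be modeled in a few lines: the same dump goes to both systems, a read falls back to the other system on failure, and the usable size is that of the smaller physical file. The class `DuplicatedFilegroup` and its members are hypothetical names for this sketch only.

```python
class DuplicatedFilegroup:
    """Toy model of a filegroup with physical files for systems A and B."""

    def __init__(self, size_a: int, size_b: int):
        self.sizes = {"A": size_a, "B": size_b}
        self.data = {"A": None, "B": None}
        self.failed = set()   # systems whose physical file cannot be read

    @property
    def usable_size(self) -> int:
        # The smaller physical file limits the size of the dump.
        return min(self.sizes.values())

    def write(self, dump: bytes) -> None:
        if len(dump) > self.usable_size:
            raise ValueError("dump does not fit in the smaller physical file")
        for system in ("A", "B"):
            self.data[system] = dump   # same dump is output to both systems

    def read(self) -> bytes:
        for system in ("A", "B"):
            if system not in self.failed and self.data[system] is not None:
                return self.data[system]
        raise IOError("neither system A nor system B can be read")


fg = DuplicatedFilegroup(size_a=4096, size_b=2048)
fg.write(b"x" * 100)
fg.failed.add("A")          # simulate a read failure on system A
recovered = fg.read()       # served from system B
```

The mismatched sizes in the example leave 2,048 bytes of system A unused, which is why matching the sizes is recommended.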
OpenTP1 manages the generations of each filegroup. When you use OpenTP1 commands to open and close files, the operation is executed in units of filegroups; therefore, it is impossible to open or close only system A or B during online processing.
When a checkpoint dump file is duplicated, one filegroup requires two physical files. If a failure occurs either in system A or B, the filegroup becomes reserved.
You can select whether to enable or disable one-system operation if an error occurs on either physical file of a duplicated checkpoint dump file.
When one-system operation is unavailable, you cannot open or close only one of the two systems.
When one-system operation is available, you can open or close only one of the two systems. However, an overwrite-prohibited filegroup cannot be closed, whether one-system operation is available or not.
Table 4-8 shows the differences between when one-system operation is available and when unavailable.
Table 4-8 Differences between when one-system operation is available and when unavailable
Operation | One-system operation available | One-system operation unavailable
---|---|---
Allocating only one system while online | Possible | Possible
Disconnecting only one system while online | Possible | Possible
Opening only one system | Possible when both systems are allocated | Impossible
Closing only one system (when overwrite-prohibited) | Impossible | Impossible
Closing only one system (when overwrite-permitted) | Possible | Impossible
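The open and close rows of Table 4-8 reduce to two simple conditions, sketched below. The function names and parameters are hypothetical and only restate the table.

```python
def can_open_one_system(one_system_operation: bool, both_allocated: bool) -> bool:
    """Opening only one system requires one-system operation and both systems allocated."""
    return one_system_operation and both_allocated


def can_close_one_system(one_system_operation: bool, overwrite_prohibited: bool) -> bool:
    """Closing only one system is never possible for an overwrite-prohibited filegroup."""
    return one_system_operation and not overwrite_prohibited


close_permitted = can_close_one_system(True, overwrite_prohibited=False)
close_prohibited = can_close_one_system(True, overwrite_prohibited=True)
open_unavailable = can_open_one_system(False, both_allocated=True)
```

Allocating or disconnecting only one system while online is possible in both modes, so those rows need no condition.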