Recovering from file errors

This section describes the recovery procedures to follow when an error occurs in a system file.

(a) Recovering from system-file errors: status files

OpenTP1 can detect a status file error when status information is written or read, or when the status service is started. When an error is detected during a read from or a write to a status file, the recovery procedure depends on:

whether a reserved status file exists
whether only one or both of the physical files in the current filegroup are unreadable

If an error is detected while the status service is being started, the system and user actions depend on which of the following is specified in the status service definition:

OpenTP1 termination
start of status service

(b) Recovering from system-file errors: system journal files

System journal file errors can be broadly classified into write errors and read errors. In a write error, the journal being acquired cannot be written during online processing. In a read error, the journal cannot be read during a complete recovery or a UAP transaction partial recovery.

The procedure for recovering from such an error depends on:

whether the error occurred during a read or write
whether a swappable standby filegroup exists

Note that when a system journal file is duplicated, the journal is input from physical file A first. If an error occurs while reading physical file A, the journal can be input from physical file B instead, which increases reliability at system recovery.
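
The input order for a duplicated journal can be pictured with a minimal Python sketch. It is an illustration of the fallback idea only, not OpenTP1 code: the `JournalReadError` exception, the `read_duplicated` function, and the two sample readers are hypothetical names introduced here.

```python
from typing import Callable


class JournalReadError(Exception):
    """Hypothetical error raised when a physical file cannot be input."""


def read_duplicated(read_a: Callable[[], bytes],
                    read_b: Callable[[], bytes]) -> bytes:
    """Input from physical file A first; fall back to physical file B."""
    try:
        return read_a()
    except JournalReadError:
        # Physical file A could not be input, so take the journal from B.
        return read_b()


def failing_reader() -> bytes:
    # Simulates a physical file that cannot be input.
    raise JournalReadError("physical file is unreadable")


def good_reader() -> bytes:
    # Simulates a physical file that can be input normally.
    return b"journal-record"
```

Calling `read_duplicated(failing_reader, good_reader)` returns the record from the second reader. If both readers fail, the error from file B propagates to the caller, which corresponds to a journal that cannot be input at all.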

When a system journal file is duplicated, you can specify in the system journal service definition whether a filegroup is to be used as the swap destination even if only one of its physical files is usable.

(c) Recovering from system-file errors: checkpoint dump files

As with system journal file errors, checkpoint dump file errors can be broadly classified into write errors and read errors. In a write error, the checkpoint dump cannot be written during online processing. In a read error, the checkpoint dump cannot be read during a complete recovery.

If checkpoint dump files are duplicated, the system can be recovered from either system A or system B at system recovery.

The procedure for recovering from such an error depends on:

whether the error occurred during a read or write
whether overwritable filegroups exist

(d) Recovering from system-file errors: archive journal files

Archive journal file errors can be classified into write errors and read errors. In a write error, the journal information being acquired cannot be written during online processing. In a read error, the journals cannot be read during a complete recovery.

If an archive journal file is duplicated, data is input from system A first. If an error occurs while reading system A, input can be switched from system A to system B, which increases reliability at system recovery.

When an archive journal file is duplicated, you can specify in the archive journal service definition whether the file is to be allocated as the swap destination even if only one of its standby physical files is usable.
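
The swap-destination choice for a duplicated file can be sketched as a small decision function. This is an illustrative model only: the function name and the `allow_single_file` parameter are hypothetical stand-ins for the setting in the service definition, whose actual operand name is not given here.

```python
def usable_as_swap_destination(file_a_ok: bool,
                               file_b_ok: bool,
                               allow_single_file: bool) -> bool:
    """Decide whether a duplicated standby file may become the swap destination.

    allow_single_file is a hypothetical flag modeling the definition setting
    that permits swapping when only one physical file is usable.
    """
    if file_a_ok and file_b_ok:
        # Both physical files are usable: always a valid destination.
        return True
    if file_a_ok or file_b_ok:
        # Exactly one physical file is usable: follow the definition setting.
        return allow_single_file
    # Neither physical file is usable.
    return False
```

Under this model, allowing a single usable file favors continued operation, while disallowing it keeps every swap destination fully duplicated.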

(e) Recovering from system-file errors: transaction recovery journal files and server recovery journal files

If a failure occurs in a transaction recovery journal file or in a server recovery journal file, a message about the failure is output. In accordance with this message:

Execute the jnlmkrf command to restore the target recovery journal file.

If the recovery journal files are not recovered despite executing the jnlmkrf command:

Restore the resource manager by using the damfrc or tamfrc command.
Forcibly restart the OpenTP1 system.
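
The order of these actions can be summarized in a short Python sketch that only builds a list of steps and does not run any command. The function name and its parameters are hypothetical, and the pairing of damfrc with DAM files and tamfrc with TAM files is an assumption made for the illustration.

```python
from typing import List


def recovery_steps(jnlmkrf_restored_file: bool, resource_manager: str) -> List[str]:
    """Return the ordered operator actions for a recovery journal file failure.

    resource_manager is "DAM" or "TAM" (assumed mapping to damfrc / tamfrc).
    """
    steps = ["Execute jnlmkrf to restore the target recovery journal file"]
    if jnlmkrf_restored_file:
        # The recovery journal file was restored; nothing more to do.
        return steps
    commands = {"DAM": "damfrc", "TAM": "tamfrc"}
    if resource_manager not in commands:
        raise ValueError(f"unknown resource manager: {resource_manager}")
    steps.append(f"Restore the resource manager with {commands[resource_manager]}")
    steps.append("Forcibly restart the OpenTP1 system")
    return steps
```

For example, `recovery_steps(False, "TAM")` yields three steps ending with the forced restart, whereas `recovery_steps(True, "TAM")` yields only the jnlmkrf step.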

(f) Recovering from system-file errors: message log files

For the message log, two files are used in round-robin fashion. If an error occurs in one of the two files, OpenTP1 isolates the erroneous file and uses only the normal file. If errors occur in both files, OpenTP1 continues processing without outputting a message log.

The procedure for recovering from such an error depends on whether OpenTP1 can detect the error:

If OpenTP1 can detect the message log file error, it outputs error messages to the standard error output, and the OpenTP1 administrator must take the required recovery measures.
If OpenTP1 cannot detect a message log file error, no recovery measures can be taken.
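
The two-file behavior described in this subsection can be modeled with a simplified in-memory sketch. It is not OpenTP1's implementation: the `LogFile` and `MessageLog` classes, the per-file `capacity`, and the rule of switching files when the current one is full are all assumptions made for illustration.

```python
from typing import List, Optional


class LogFile:
    """In-memory stand-in for one message log file (hypothetical)."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.records: List[str] = []
        self.broken = False  # set to True to simulate an I/O error

    def write(self, record: str) -> None:
        if self.broken:
            raise OSError(f"cannot write to {self.name}")
        self.records.append(record)


class MessageLog:
    """Two log files used in turn; an erroneous file is isolated."""

    def __init__(self, capacity: int = 2) -> None:
        self.files = [LogFile("log1", capacity), LogFile("log2", capacity)]
        self.isolated = [False, False]
        self.current = 0

    def log(self, record: str) -> Optional[str]:
        """Write one record; return the file name used, or None if dropped."""
        while not all(self.isolated):
            if self.isolated[self.current]:
                self.current = 1 - self.current
            target = self.files[self.current]
            if len(target.records) >= target.capacity:
                # Current file is full: switch to the other file if it is
                # usable, then reuse (overwrite) the chosen file.
                other = 1 - self.current
                if not self.isolated[other]:
                    self.current = other
                target = self.files[self.current]
                target.records.clear()
            try:
                target.write(record)
                return target.name
            except OSError:
                # Isolate the erroneous file and retry on the remaining one.
                self.isolated[self.current] = True
        return None  # both files failed: continue without a message log
```

In this model, processing never stops because of a log failure: once both files are isolated, `log` simply returns `None` and the caller carries on, mirroring the behavior in which OpenTP1 continues without outputting a message log.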
