3.2.2 Files and processes used during extraction processing

(1) Files used during extraction processing

The following files are used during extraction processing.

(a) HiRDB system log file

This file stores information about database updates performed in HiRDB. The source Datareplicator extracts update information from this file.

(b) Extraction information queue files

Datareplicator uses these files to store the update information extracted from the HiRDB system log file during extraction processing.

The source Datareplicator stores the update information extracted during extraction processing sequentially into one of the extraction information queue files. When this extraction information queue file becomes full, Datareplicator uses another extraction information queue file. This is called swapping, and it enables the source Datareplicator to store a large amount of update information. Swapping takes place in the order of the qufile001 to qufile016 operands specified in the extraction environment definition.

When all extraction information queue files become full, Datareplicator re-uses the first file. However, if transmission of the update information has not been completed for the file that is to be used next, swapping cannot take place. In such a case, Datareplicator outputs a message indicating that the queue file is full and stops extracting update information from the system log file until transmission from the file has been completed.

The following figure shows the procedure for storing data in the extraction information queue files.

Figure 3-13 Procedure for storing data in the extraction information queue files

[Figure]
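The swapping behavior described above can be illustrated with a small model. This is only a sketch under stated assumptions, not Datareplicator's implementation: the QueueFile, QueueRing, and QueueFullError names are hypothetical, capacity is counted in records rather than bytes, and the "queue file is full" message is modeled as an exception.

```python
from dataclasses import dataclass, field


@dataclass
class QueueFile:
    """Hypothetical stand-in for one extraction information queue file."""
    name: str
    capacity: int                      # assumed unit: number of records
    records: list = field(default_factory=list)
    transmitted: bool = True           # True when no data awaits transmission


class QueueFullError(Exception):
    """Models the 'queue file is full' condition: swapping cannot take place."""


class QueueRing:
    """Files are used in definition order and re-used from the first one."""

    def __init__(self, names, capacity):
        self.files = [QueueFile(n, capacity) for n in names]
        self.current = 0

    def store(self, record):
        f = self.files[self.current]
        if len(f.records) >= f.capacity:
            f = self._swap()           # current file is full: try the next one
        f.records.append(record)
        f.transmitted = False

    def _swap(self):
        nxt = (self.current + 1) % len(self.files)
        candidate = self.files[nxt]
        if not candidate.transmitted:
            # Transmission from the next file has not completed,
            # so extraction has to wait.
            raise QueueFullError(candidate.name)
        candidate.records.clear()      # safe to overwrite transmitted data
        self.current = nxt
        return candidate

    def mark_transmitted(self, name):
        """Record that transmission from the named file has completed."""
        for f in self.files:
            if f.name == name:
                f.transmitted = True
```

In this model, a store that fails with QueueFullError succeeds once mark_transmitted has been called for the blocking file, which mirrors extraction resuming after transmission from that file completes.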

(c) Data linkage file

Datareplicator uses the data linkage file to store and read HiRDB communication messages that are required for extraction processing, such as the storage status of update information in the system log at HiRDB and the status of reading update information from the system log file at the source Datareplicator.

(d) Extract-time status files

These files store the extraction/transmission status required for recovery in the event of an error. The extract-time status files include the extraction master status file and the extraction server status file.

(e) Extract-time error information files

If extraction or transmission processing results in an error, Datareplicator outputs error details to an error information file. The extract-time error information files include the extraction master error information files and the extraction node master error information files.

The information that is output to the error information files can also be output to the syslog file. Use the syslogout operand in the extraction system definition to specify whether to do so.

(f) Extract-time activity trace file

Datareplicator uses the activity trace files to collect Datareplicator's activity status. These files contain information about Datareplicator's operation and performance. The activity trace files include the extraction master trace files and the extraction node master trace files.

To use the activity trace files, specify the int_trc_lvl and int_trc_filesz operands in the extraction system definition. You can edit the collected activity trace files with the hdstrcedit command. For details about the hdstrcedit command, see Chapter 7. Command Syntax.

(g) Extraction system definition file

The extraction system definition file defines the overall operating environment for the source Datareplicator, such as the source Datareplicator's identifier and the target identifier.

(h) Extraction environment definition file

This file defines the operating environment for extraction processing, such as the names and sizes of the extraction information queue files.

(i) Transmission environment definition files

These files define operating environments for transmission processing, such as the service names and host names for communications.

(j) Extraction definition file

This file defines detailed information about extraction and transmission processing, such as the correspondence between source tables and columns and the update information, and the destination of the update information.

(k) Extraction definition preprocessing file

This file is obtained by using the hdeprep command to convert the extraction definition file to the internal format. You must execute this conversion to the extraction definition preprocessing file before you start the source Datareplicator.

Checking the validity of the extraction definition preprocessing file

When the source Datareplicator is started, the validity of the extraction definition preprocessing file is checked automatically. Startup processing of the source Datareplicator is cancelled in the following cases:

The creation date of the extraction definition preprocessing file is earlier than the creation date of the extraction master status file (the KFRB00713-E message is output).
The table definition for a table to be extracted was changed after the hdeprep command was executed (the KFRB00866-E message is output).

Note that if Datareplicator cannot connect to HiRDB, for example because the PDUSER environment variable or a password was omitted, the KFRB00868-W message is output and the source Datareplicator starts without checking the validity of the extraction definition preprocessing file.
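The startup check can be summarized as a small decision function. This is a minimal sketch: the function name and parameters are hypothetical, and it assumes that a failed HiRDB connection skips both checks, which is one reading of the note above. Only the message IDs come from this manual.

```python
from datetime import datetime
from typing import Optional, Tuple


def check_preprocessing_file(
    prep_created: datetime,           # creation date of the preprocessing file
    status_created: datetime,         # creation date of the master status file
    hirdb_connected: bool,            # whether the HiRDB connection succeeded
    table_changed_after_prep: bool,   # table definition changed after hdeprep
) -> Tuple[bool, Optional[str]]:
    """Return (startup_allowed, message_id) for the cases in this section."""
    if not hirdb_connected:
        # Assumption: the whole validity check is skipped and startup continues.
        return True, "KFRB00868-W"
    if prep_created < status_created:
        return False, "KFRB00713-E"
    if table_changed_after_prep:
        return False, "KFRB00866-E"
    return True, None
```

For example, a preprocessing file created before the extraction master status file yields (False, "KFRB00713-E") in this model, so startup is cancelled.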

(l) Command log files

These files store a record of the dates and times Datareplicator's commands are executed.

(2) Organization of processes during extraction processing

Figure 3-14 shows the organization of processes during extraction processing when the source HiRDB is a single server, and Figure 3-15 shows the organization of processes when the source HiRDB is a parallel server.

Figure 3-14 Organization of processes during extraction processing: Source HiRDB is a single server

[Figure]

Figure 3-15 Organization of processes during extraction processing: Source HiRDB is a parallel server

[Figure]

(a) Extraction command process

The extraction command process handles the source Datareplicator's commands and issues instructions to the extraction master process. If the source HiRDB is a parallel server, there is one extraction command process under the system manager.

(b) Extraction master process

The extraction master process controls the extraction node master process. If the source HiRDB is a parallel server, there is one extraction master process under the system manager.

(c) Extraction node master process

The extraction node master process manages the extraction process and the transmission process. If the source HiRDB is a parallel server, there is one extraction node master process at each server machine that contains a back-end server. The source Datareplicator calls such a server machine a node.

(d) Extraction process

The extraction process extracts update information from the system log file and stores it in the extraction information queue file. If the source HiRDB is a parallel server, one extraction process exists at each back-end server that is subject to extraction.

(e) Transmission process

The transmission process reads update information from the extraction information queue file and sends it to the target system. There are as many transmission processes as there are destinations. If the source HiRDB is a parallel server, there are as many transmission processes as there are destinations at each back-end server that is subject to extraction. However, if you specify sendmst in the sendcontrol operand of the extraction system definition, the number of transmission processes is limited to the value specified in the sendprocnum operand of the extraction system definition.

Hereafter, the method used when sendmst is specified in the sendcontrol extraction system definition operand is referred to as the sendmst method, while the method used when nodemst is specified is referred to as the nodemst method.
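The difference between the two methods in terms of process count can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes that sendprocnum acts as a simple upper bound on the number of transmission processes; it does not reflect how Datareplicator schedules them.

```python
from typing import Optional


def max_transmission_processes(
    sendcontrol: str,                  # "nodemst" or "sendmst"
    destinations: int,                 # number of destinations
    backend_servers: int = 1,          # back-end servers subject to extraction
    sendprocnum: Optional[int] = None, # value of the sendprocnum operand
) -> int:
    """Upper bound on concurrent transmission processes (illustrative only)."""
    if sendcontrol == "nodemst":
        # One transmission process per destination at each back-end server.
        return destinations * backend_servers
    if sendcontrol == "sendmst":
        if sendprocnum is None:
            raise ValueError("sendprocnum is required for the sendmst method")
        # Assumption: sendprocnum caps the number of transmission processes.
        return sendprocnum
    raise ValueError("unknown sendcontrol value: " + sendcontrol)
```

With 10 destinations and 4 back-end servers, the nodemst method gives 40 processes in this model, whereas the sendmst method with sendprocnum set to 8 gives at most 8.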

(f) Transmission master process

The transmission master process starts, stops, and schedules the transmission processes in order to limit the number of transmission processes that are started when there are many destinations. The transmission master process is generated when the definition specifies that the number of transmission processes is to be controlled. For details about how to control the number of transmission processes, see 3.2.5 Controlling the number of transmission processes.

(g) Activity trace collection process

The activity trace collection process collects activity trace information.
