Nonstop Database, HiRDB Version 9 Installation and Design Guide
The parallel loading facility executes, in parallel, data loads from a single input data file to multiple RDAREAs that constitute a row-partitioned table. Executing a pdparaload command allows data to be loaded to multiple RDAREAs at once.
The following figure shows an overview of the parallel loading facility.
Figure 6-3 Overview of the parallel loading facility
Using the parallel loading facility offers the following two advantages.
The parallel loading facility thus offers the dual advantage of fast processing from data loads by RDAREA and easy data loading by table.
Do the following prior to executing parallel loading.
With parallel loading, multiple pdload commands execute in parallel. For example, when a table is divided into three RDAREAs for storage, three pdload commands are executed simultaneously, so three portions of resources are needed. Factor this in when you calculate the amount of resources that will be necessary.
Prepare the data to be input to the table as an input data file. The pdparaload command uses a single input data file.
The figure below shows the operating procedures for parallel loading.
Figure 6-4 Parallel loading operating procedure
The following limitations apply when data is loaded using the parallel loading facility:
pdparaload command options and control statements can be specified in the same manner as pdload command options and control statements, with some exceptions. For details about what can be specified, see the section on pdparaload in the manual HiRDB Version 9 Command Reference.
The performance of the parallel loading facility, such as its allocation of the input data, might be affected by the environment in which data loading is executed. For this reason, in some execution environments data loading might not be any faster even when you use the parallel loading facility. Therefore, try using the parallel loading facility and measure how much time data loading takes before you start actual operations. If this trial shows that using the parallel loading facility will shorten data loading time, use the parallel loading facility. If the facility does not deliver the expected processing performance, load data in table units or divide the input data file and load data in RDAREA units. For details about creating a divided-input data file, see 6.1.7 Loading data into a row-partitioned table (Creating divided-input data files).
The conditions that affect the parallel loading facility's performance are described below.
The parallel loading facility is advantageous when you are partitioning rows among servers. Parallel loading can also be executed on tables with row partitioning on a single server. But data loading is faster when it is divided among servers, since processing of a single pdload command to store data can monopolize the servers.
During parallel loading, multiple pdload commands read data from a single input data file. Processing time can be shortened at this point by allocating data so as to increase read processing efficiency. As the following figure shows, allocating input data evenly across partitioning keys results in efficient read processing.
Figure 6-5 Allocation of input data vs. read efficiency
If data from a table that uses row partitioning on one server is unloaded by a pdrorg command to a single file, the unloaded data file will demonstrate lopsided data allocation, as shown in the figure. For this reason, if these tables are used as input data files, performance might decline despite the use of parallel loading.
The parallel loading facility allows data loading in all the data formats that can be loaded using the pdload command. However, whenever you are working with large quantities of data, we recommend using the binary format for input data. The DAT and fixed-size data formats both require that the format be converted when the data is stored in a database. This might increase CPU usage and decrease the performance of parallel processing of data loading.
When LOB columns and abstract data type columns that have LOB parameters are defined in the table you are processing, we recommend specifying f as the LOB creation type (the -k option) for the data load. A file is prepared for each LOB data item as the LOB input file for data loads that specify f. For this reason, there is no duplicate reading of LOB data when data loads are executed in parallel. This reduces input and output.
When d is specified as the LOB creation type, all LOB data is stored in the input data file. This generates processing to skip reading of LOB data that is not a target of processing when data is loaded in RDAREA units. This increase in processing can weaken performance.
This section provides operating examples that use parallel loading.
The conditions for these operating examples are as follows.
Table 6-2 Parallel loading operating example conditions
| Operating example | Table definition | Index definition | Input data format | Index creation method | Log acquisition mode | ||
|---|---|---|---|---|---|---|---|
| Row partitioning | LOB column | Partition-ing key index | Non-partitioning key index | ||||
| Operating example 1 | Y | Y | Y | N | Binary | Batch index creation mode | Pre-update log acquisition mode |
| Operating example 2 | Y | N | Y | Y | Binary | Batch index creation mode | Pre-update log acquisition mode |
Data is loaded to TBL2, which defines LOB columns, using the parallel loading facility. For the LOB creation type (-k option), f is specified and a file is created for each LOB data item.
CREATE TABLE "TBL2"( col001 int not null, col002 varchar(20), col003 blob(1M) in ((LOB01),(LOB02),(LOB03)), col004 decimal(10,3) ) fix hash hashf by col001 in (RDUSER11,RDUSER21,RDUSER31); |
CREATE INDEX "INDX2" ON "TBL2"(col001,col002) in ((RDUSER12),(RDUSER22),(RDUSER32)); |
pdhold -r LOB01,LOB02,LOB03,RDUSER11,RDUSER21,RDUSER31,RDUSER12,RDUSER22,RDUSER32 |
source fes01:/users/data/input_file error=/users/rep/error_file lobdata /users/data/lob idxwork bes01 /users/work idxwork bes02 /users/work idxwork bes03 /users/work sort bes01 /users/work sort bes02 /users/work sort bes03 /users/work report file=/users/rep/result_file |
pdparaload -d -b -k f -i c -l p "TBL2" /users/cntl/control_file |
pdcopy -m /users/rdarea/mast/mast01 -M r -p /users/pdcopy/list/list01 -b /users/pdcopy/backup/backup01 -r LOB01,LOB02,LOB03,RDUSER11,RDUSER21,RDUSER31,RDUSER12,RDUSER22,RDUSER32 |
pdrels -r LOB01,LOB02,LOB03,RDUSER11,RDUSER21,RDUSER31,RDUSER12,RDUSER22,RDUSER32 |
Load data to TBL3, which defines a non-partitioning key index, using the parallel loading facility. Since a non-partitioning key index is being defined, batch creation of the index using the pdrorg command is required after you execute the pdparaload command. For batch creation of the index, you specify the index information file output by the pdparaload command.
CREATE TABLE "TBL3"( col001 int not null, col002 varchar(20), col003 char(32), col004 decimal(10,3) ) partitioned by col001 in ((RDUSER11) 10000000, (RDUSER21) 20000000, (RDUSER31)); |
CREATE INDEX "INDX3" ON "TBL3" (col001,col002) in ((RDUSER12),(RDUSER22),(RDUSER32)); |
CREATE INDEX "INDX4" ON "TBL3" (col003) in ((RDUSER13)); |
pdhold -r RDUSER11,RDUSER21,RDUSER31,RDUSER12,RDUSER22,RDUSER32,RDUSER13 |
source /users/data/input_file error=/users/rep/error_file idxwork /users/work sort /users/work report file=/users/rep/result_file |
pdparaload -d -b -i c -l p "TBL3" /users/cntl/control_file |
index "INDX4" RDUSER13 "/users/work/INDEX-INDX4-RDUSER13-faaG4MnMf" index "INDX4" RDUSER13 "/users/work/INDEX-INDX4-RDUSER13-caa34EnEc" index "INDX4" RDUSER13 "/users/work/INDEX-INDX4-RDUSER13-caa06MnMc" sort /users/work report file=/users/rep/result_file2 |
pdrorg -k ixmk -l p -t "TBL3" /users/cntl/control_rorg |
pdcopy -m /users/rdarea/mast/mast01 -M r -p /users/pdcopy/list/list01 -b /users/pdcopy/backup/backup01 -r RDUSER11,RDUSER21,RDUSER31,RDUSER12,RDUSER22,RDUSER32,RDUSER13 |
pdrels -r RDUSER11,RDUSER21,RDUSER31,RDUSER12,RDUSER22,RDUSER32,RDUSER13 |
There are two basic types of errors that can occur when the pdparaload command is executed, and two corresponding ways to handle them, as follows.
This section describes details about errors generated when the pdparaload command is executed and their handling. The handling procedure when an error occurs when the pdparaload command is executed is shown below.
Figure 6-6 Handling procedure when an error occurs during pdparaload command execution
Identify when the error occurred. Check whether the following two messages have been output.
Extract the KFPL00793-I message from the messages output to the OS console that executed the pdparaload command or to the syslog file, and then identify the RDAREAs that require handling. When extracting the message, use the following as search keywords.
KFPL00793-I Pdload execution abnormal terminated, table=user1."T1", RDAREA=RDAREA1, return code=8, pdload process id=1385412 (1212478) : KFPL00793-I Pdload execution abnormal terminated, table=user1."T1", RDAREA=RDAREA2, return code=8, pdload process id=1385433 (1212478) |
Take one of the following corrective actions for the identified RDAREA, depending on the return code of the KFPL00793-I message.
From the message log file, identify the IDs of the pdload control process and server process that the pdparaload command executed. Extract the process IDs by the following procedure.
1572976 2010/11/04 15:55:48 0mload1 lod KFPL00711-I pdloadm started, table=user1."T1", RDAREA=RDAREA1 1728690 2010/11/04 15:55:48 bes1 lod KFPL00711-I pdbes started, table=user1."T1", RDAREA=RDAREA1 |
Identify the nature of the error from the message log file. Extract the error message that requires handling from the message log file acquired in step 1 of (c) using the process IDs identified in step (c) as search keywords.
1572976 2010/11/04 15:55:48 0mload1 lod KFPL00711-I pdloadm started, table=user1."T1", RDAREA=RDAREA1 1572976 2010/11/04 15:55:48 0mload1 lod KFPL00704-I Pdload terminated, return code=8 |
1728690 2010/11/04 15:55:48 bes1 lod KFPL00711-I pdbes started, table=user1."T1", RDAREA=RDAREA1 1728690 2010/11/04 15:55:48 bes1 lod KFPL00709-I Error information file was created, file=/tmp/ERROR-4cd258f41728690 1728690 2010/11/04 15:55:48 bes1 lod KFPL00709-I Lobmid file was created, file=/tmp/LOBMID-T1-4cd258f41728690 1728690 2010/11/04 15:55:48 bes1 lod KFPL00702-I Pdload started, table=user1."T1", generation=0 1728690 2010/11/04 15:55:48 bes1 lod KFPL00710-I Index information file assigned, index=user1."T1NX", RDAREA="USER01", file=/tmp/INDEX-T1NX-USER01-GN0-daan_ylid 1728690 2010/11/04 15:55:48 bes1 lod KFPL00723-I 0 rows loaded, table=user1."T1", RDAREA="USER01" 1728690 2010/11/04 15:55:48 bes1 lod KFPLxxxxx-E YYYYYY |
After corrective action is taken, re-execute the data load in units of RDAREAs according to the procedure of step (e).
Re-execute the data load in units of RDAREAs using the pdload command. Here, we recommend re-using the pdload command control statement generated by the pdparaload command. The storage directory and file name of the pdload command control statement file generated by the pdparaload command are as follows. Create the pdload control statement file based on these.
The following is an example of the pdload command control statements and command lines when re-executing a data load in units of RDAREAs.
option divermsg=off source "RDUSER02" fes01:/users/data/input_file error=/users/rep/error_file_RDUSER02 lobdata /users/data/lob idxwork bes02 /users/work sort bes02 /users/work report file=/users/rep/result_file_RDUSER02 |
pdload -d -b -k f -i c -l p "TBL2" "/users/tmp/LOD_CTL_USER02_ TBL2_RDUSER02" |
All Rights Reserved. Copyright (C) 2012, 2015, Hitachi, Ltd.