2.89 pdparaload (Perform parallel loading)

Organization of this section
(1) Function
(2) Executor
(3) Format
(4) Options
(5) Rules
(6) Notes

(1) Function

The pdparaload command provides the parallel loading facility: it executes data loading (the pdload command) from a single input data file concurrently on the multiple RDAREAs that constitute a row-partitioned table.

(2) Executor

A user who has the same execution privilege as that required for the pdload command

(3) Format

 pdparaload [pdload-command-options] [-I execution-interval] [authorization-identifier.]table-identifier
             pdparaload-control-statements-file-name

(4) Options

(a) pdload-command-options

Specifies the pdload command's options. The pdparaload command uses the specified options when it executes the pdload command.

Some pdload command options cannot be specified in the pdparaload command. The table below shows whether each pdload command option can be specified in the pdparaload command.

If an unsupported option is specified, the pdparaload command terminates with an error during data loading.

Table 2-23 pdload command options that can and cannot be specified

No. | Option | Specification in pdparaload | Description
1 | -d | Y | Executes the pdload command in the creation mode.
2 | -a|-b|-U | Y (-w all cannot be specified) | Specifies the format of the input data file. Note that -w all (data loading using a table transfer unload file that involves table and index definitions) cannot be specified.
3 | -i | Y | Specifies the index creation method.
4 | -l | Y | Specifies the update log acquisition mode for the database during execution of the pdload command.
5 | -k | Y | Specifies the data input method for storing LOB data in a LOB column, if a LOB parameter is used as an argument of the constructor function that generates the values to be stored in an abstract data type column. With the pdparaload command, we recommend that you use the -k f option to create files. If this option is specified, one file is created for each item of LOB data. Therefore, duplicate LOB data will not be read during parallel processing, thereby reducing the number of I/O operations. However, when this option is specified, a LOB middle file is created during data loading for each RDAREA. The created LOB middle file is retained even after the pdparaload command terminates, so the user must delete the LOB middle file after termination of the pdparaload command.
6 | -c | Y | Specifies the name of the column structure information file.
7 | -v | Y | Specifies the name of the null value/function information file.
8 | -n | Y | Specifies the number of buffer sectors when a local buffer is used.
9 | -u | N | The authorization identifier of the user who executes the pdload command cannot be specified.
10 | -x | Y | Specifies that checking for whether the input data is in ascending or descending order of the cluster key values is not to be performed.
11 | -f | N | EasyMT cannot be specified for the input data file or LOB input file.
12 | -s | Y | Specifies that the separator character between data items is to be changed for an input data file in DAT format.
13 | -e | Y | Specifies that processing is to be cancelled if an error is detected in the input data.
14 | -r | Y | Specifies that data input is to begin at a specified line, not at the beginning of the input data file.
15 | -z | Y | Specifies that variable-length character string data, variable-length national character string data, and variable-length mixed character string data with a length of 0 is to be stored.
16 | -y | Y | Specifies that data is to be stored in unused area in used pages during data loading if all unused pages become completely full.
17 | -o | N | The index information file specified in the index statement cannot be deleted automatically after batch index creation has terminated normally.
18 | -m | Y | Specifies the interval for display of the message indicating the progress of the current process.
19 | -X | Y | Specifies the response monitoring time for dictionary manipulation performed by commands in order to detect failures.
20 | -q | Y | Specifies the generation number of the RDAREAs subject to data loading when the inner replica facility is used.
21 | -K | Y | Specifies the format of input data values for XML-type parameters.
22 | -G | Y | Specifies the type of XML data specified as the input data file (XML document or ESIS-B format).
23 | -F | Y | Specifies that an input data value of the FLOAT or SMALLFLT type is to be corrected if it is outside the value range permitted by the OS.
24 | -E | N | Data loading to a table that is expanded to the memory database cannot be performed forcibly.

Legend:
Y: Can be specified in the pdparaload command
N: Cannot be specified in the pdparaload command
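
Because an unsupported option is detected only when data loading runs, it can help to screen a planned option list beforehand. The following is a minimal sketch of such a pre-check, not part of HiRDB: the function name find_unsupported_options and the assumption that options and their values arrive as separate command-line tokens are illustrative only. The set of rejected options is taken from the rows marked N in Table 2-23.

```python
# Options marked N in Table 2-23. Option letters are case sensitive
# (for example, -U and -e are allowed, but -u and -E are not).
UNSUPPORTED_OPTIONS = {"-u", "-f", "-o", "-E"}


def find_unsupported_options(args):
    """Return the tokens in args that Table 2-23 does not allow.

    args is a list of command-line tokens such as ["-d", "-i", "s"].
    Hypothetical helper: it assumes that each option and its value
    are separate tokens.
    """
    problems = []
    for position, token in enumerate(args):
        if token in UNSUPPORTED_OPTIONS:
            problems.append(token)
        # "-w all" is the one option value that the table rules out.
        elif token == "-w" and args[position + 1:position + 2] == ["all"]:
            problems.append("-w all")
    return problems


print(find_unsupported_options(["-d", "-i", "s", "-u", "USER1", "-w", "all"]))
```

The check looks only at option letters and at the single value "all" for -w; it does not validate any other option values.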
(b) -I execution-interval ~<unsigned integer>((10-600000))<<1000>> (milliseconds)

Specifies, in milliseconds, the interval at which the pdparaload command starts each data loading session. The pdparaload command performs data loading separately for each RDAREA, executing one pdload command per RDAREA.

Because the pdload commands are started one at a time at the specified interval, their accesses to the data dictionary tables are not concentrated at the same moment, which avoids an execution wait status.

Guideline for the value to be specified

Normally, use the default value (1000).

If pdload's preprocessing time exceeds the specified value, increase the value by 1,000 milliseconds each time you re-execute the pdparaload command.
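
The guideline above amounts to a simple stepping rule. The sketch below shows one way to compute the value to try on the next re-execution. The function name next_interval is hypothetical; the range (10 to 600000) and default (1000) come from the option syntax, and capping at the maximum is an assumption about how to stay inside that range.

```python
MIN_INTERVAL_MS = 10        # lower bound of -I
MAX_INTERVAL_MS = 600000    # upper bound of -I
DEFAULT_INTERVAL_MS = 1000  # value used when -I is omitted
STEP_MS = 1000              # increment suggested by the guideline


def next_interval(current_ms=DEFAULT_INTERVAL_MS):
    """Return the -I value to try on the next re-execution.

    Adds 1,000 ms to the current value and caps the result at the
    documented maximum (the cap is an assumption of this sketch).
    """
    if not MIN_INTERVAL_MS <= current_ms <= MAX_INTERVAL_MS:
        raise ValueError("execution interval must be between 10 and 600000")
    return min(current_ms + STEP_MS, MAX_INTERVAL_MS)


print(next_interval())        # first retry after using the default
print(next_interval(599500))  # value near the upper bound
```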

(c) [authorization-identifier.]table-identifier ~<identifier>((1-30 bytes))

Specifies the table on which the pdparaload command is to perform data loading. The specification rules are the same as for the pdload command. For details about [authorization-identifier.]table-identifier in the pdload command, see 5.4.2(25) [authorization-identifier.]table-identifier.

(d) pdparaload-control-statements-file-name ~<path name>((1-1023 bytes))

Specifies the path of the pdparaload control statements file. This file contains the pdload command control statements that are to be executed by the pdparaload command. For details about the pdload command control statements, see 5.4 Command format. Note that some pdload command control statements cannot be specified in the pdparaload control statements file. The table below shows whether each pdload command control statement can be specified in the pdparaload control statements file.

If an unsupported control statement is specified, a control statement error occurs during data loading because the pdparaload command does not check the control statements.
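
Since the pdparaload command does not check the control statements itself, a control statements file can be screened before the command is run. The following is a minimal sketch under stated assumptions: the function name find_unsupported_statements is hypothetical, each control statement is assumed to begin with its statement name as the first whitespace-delimited token on a line, and only the statements that Table 2-24 marks as N at the statement level are checked (operands marked N, such as job or nowait in the option statement, are not).

```python
# Control statements that Table 2-24 marks as N for the whole statement.
UNSUPPORTED_STATEMENTS = {
    "mtguide", "emtdef", "index", "lobcolumn", "lobmid", "src_work",
}


def find_unsupported_statements(control_text):
    """Return (line_number, statement_name) pairs for unsupported statements.

    Assumption: a statement name is the first token on its line.
    """
    found = []
    for line_number, line in enumerate(control_text.splitlines(), start=1):
        tokens = line.split()
        if tokens and tokens[0] in UNSUPPORTED_STATEMENTS:
            found.append((line_number, tokens[0]))
    return found


# OPERANDS is a placeholder, not real control statement syntax.
sample = "source OPERANDS\nlobmid OPERANDS\nidxwork OPERANDS\n"
print(find_unsupported_statements(sample))
```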

Table 2-24 Whether pdload command control statements can be specified

No. | Control statement | Option | Specification in pdparaload | Description
1 | mtguide | -- | N | A tape device cannot be used.
2 | emtdef | -- | N |
3 | source | RDAREA-area | N | The user cannot specify RDAREA-area because it is specified by the pdparaload command.
4 | source | server-name|host-name | For a HiRDB parallel server configuration: M; for a HiRDB single server configuration: O | This option must be specified for a HiRDB parallel server configuration. If you perform data loading without specifying this option, files on the servers that contain each RDAREA will be processed as input data files.
5 | source | input-data-file-name | M | Specifies the absolute path name of the input data file that contains the input data.
6 | source | (uoc) | O | Specifies that a UOC is to be used to input/output the input data file.
7 | source | error | O | Specifies the absolute path name of the file to which error information is to be output. Pay attention to the length of the specified path name because the pdparaload command adds an RDAREA name and enclosing double quotation marks ("") to this file name during data loading. For details, see 4 in (6) Notes.
8 | source | errdata | O | Specifies that the erroneous rows of data are to be output if the input data results in an error. Pay attention to the length of this file name because the pdparaload command adds an RDAREA name and enclosing double quotation marks ("") to this file name during data loading. For details, see 4 in (6) Notes.
9 | source | errwork | O | Specifies the buffer size (in kilobytes) for creating an error data file when the errdata option is specified.
10 | source | maxreclen | O | Specifies the input data length when the input data file is in DAT format, extended DAT format, binary format, or pdrorg-output binary format.
11 | source | EasyMT-information | N | EasyMT cannot be specified because it is not supported.
12 | source | validate-sign | N |
13 | index | -- | N | Information about an index information file cannot be specified.
14 | idxwork | server-name | O | Specifies the name of the server at which the index information file is to be created.
15 | idxwork | directory-name | Y | Specifies the absolute path name of the directory in which the index information file is to be created.
16 | sort | server-name | O | Specifies the name of the server in which sort work files are to be created.
17 | sort | directory-name | Y | Specifies the absolute path name of the directory under which sort work files are to be created.
18 | lobdata | LOB-input-file-name | N# | Specifies the LOB information when loading data to a table containing LOB columns or entering LOB data as an input parameter for a constructor function. If a BLOB column or a column of abstract data type with the BLOB parameter is defined for the target table and f or v is specified in the -k option, the lobdata statement must be specified. If the lobdata statement is omitted, data loading is performed only for the base table; BLOB data will not be loaded.
19 | lobdata | LOB-input-file-directory-name | O# |
20 | lobdata | EasyMT-information | N# |
21 | lobcolumn | -- | N | LOB input files by the column cannot be used.
22 | lobmid | -- | N | LOB middle files cannot be specified. LOB middle files are created in the work directory under the following name: /work-directory/LOBMID-xxxxxxxxx, where xxxxxxxxx is a name containing each process's execution time and process ID. Such a file is created for each RDAREA.
23 | srcuoc | -- | O | Specifies UOC information in order to use a UOC to edit data and then store the data in a database.
24 | array | -- | O | Specifies the handling of the array data format and null values specified in the input data file for a table containing repetition columns.
25 | extdat | -- | O | Uses extended functions with input data files in DAT format.
26 | src_work | -- | N | Divided-input data files cannot be created.
27 | constraint | -- | O | Specifies settings for check pending status.
28 | option | spacelvl | O | Specifies whether space conversion is to be executed on the input data.
29 | option | tblfree | O | Changes the percentage of free space specified with CREATE TABLE (value of PCTFREE) for storing data during data loading.
30 | option | idxfree | O | Changes the percentage of unused area specified with CREATE INDEX (value of PCTFREE) when creating indexes in the batch index creation mode.
31 | option | job | N | Data loading with the synchronization point specification is not supported.
32 | option | cutdtmsg | O | Specifies whether a warning message is to be output to the error information file if data truncation occurs during data loading on an input data file in DAT format.
33 | option | nowait | N | NOWAIT search is not supported.
34 | option | bloblimit | O | Specifies the size of the area for retaining data when executing data conversion using a pdrorg-output binary-format input data file before loading data.
35 | option | exectime | O | Specifies an interval in minutes for monitoring the pdload execution time.
36 | option | null_string | O | Specifies whether the default value set in the DEFAULT clause or the null value is to be stored when input data is the null value ("*" or omitted) during data loading on a table with the DEFAULT clause specified.
37 | option | dataerr | O | Specifies that data storage processing is to be cancelled (rolled back) when an input data error (logical error) is detected. If a non-partitioning key index is used and an option other than -i s (index update mode) is specified, a control statement error occurs during each data loading session because data loading is performed for each RDAREA.
38 | option | lengover | O | Specifies that an input data error is to be detected when the input data in a DAT-format (including extended DAT format) input data file that is to be stored in a CHAR, VARCHAR, NCHAR, NVARCHAR, MCHAR, or MVARCHAR data type column is longer than the defined column length.
39 | option | divermsg | N | The user cannot specify divermsg because it is specified by the pdparaload command.
40 | option | srcendian | O | Specifies that pdrorg-output binary-format files are to be used to transfer data between platforms that use different endians.
41 | option | allspace | O | Specifies that the character data to be stored in a numeric data type column (INTEGER, SMALLINT, DECIMAL, FLOAT, or SMALLFLT) is to be converted to 0 and then stored when input data files in the fixed-size data format are used for data loading.
42 | option | whitespace | O | Specifies how to handle spaces contained in XML documents when data loading is performed from XML documents to XML-type columns.
43 | option | seq_range | O | Specifies how to acquire the sequence number when the automatic numbering facility is used for data loading.
44 | option | file_buff_size | O | Specifies the input buffer memory size when data is loaded from the input data file to an input buffer.
45 | option | charset | O | Specifies the endian for input data files when data is loaded from input data files for UTF-16 to a table defined in a UTF-8 environment.

Legend:
M: Specification is mandatory.
Y: Must be specified if the control statement is specified.
O: Specification is optional.
N: Cannot be specified.
--: Not applicable
#: If a BLOB column or a column of abstract data type with the BLOB parameter is defined for the target table and f or v is specified in the -k option, the lobdata statement must be specified. The table below provides the details:

No. | -k option | BLOB column | Column of abstract data type with BLOB-type input parameter
1 | d | N | N
2 | f | Y | Y
3 | v | Y | N

Legend:
Y: The lobdata statement is required.
N: The lobdata statement is not required.
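
The footnote's decision table can be expressed as a small lookup. The sketch below is illustrative only: the names LOBDATA_REQUIRED and lobdata_required, and the labels "blob_column" and "adt_blob_parameter" for the two column kinds, are hypothetical. The Y/N values are copied from the table above.

```python
# (value of -k, kind of column) -> is the lobdata statement required?
LOBDATA_REQUIRED = {
    ("d", "blob_column"): False,
    ("d", "adt_blob_parameter"): False,
    ("f", "blob_column"): True,
    ("f", "adt_blob_parameter"): True,
    ("v", "blob_column"): True,
    ("v", "adt_blob_parameter"): False,
}


def lobdata_required(k_option, column_kinds):
    """Return True if any column kind in the table requires lobdata."""
    return any(LOBDATA_REQUIRED[(k_option, kind)] for kind in column_kinds)


print(lobdata_required("v", ["adt_blob_parameter"]))
print(lobdata_required("v", ["blob_column", "adt_blob_parameter"]))
```

With -k v, the first call reports that the statement is not required and the second reports that it is, because the table contains a BLOB column.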

(5) Rules

  1. You can execute the pdparaload command only while HiRDB is active.
  2. The pdparaload command can perform data loading only on row-partitioned tables.
    The pdparaload command does not support data loading on a table for which flexible hash partitioning is defined (including flexible hash partitioning defined for matrix partitioning). If a table for which flexible hash partitioning is defined is specified, the pdparaload command terminates with an error.
  3. The RDAREAs constituting the row-partitioned table must be in a status in which the database load utility (pdload) can be executed. For details about RDAREA status, see C.2 Availability of utility or UAP execution depending on RDAREA status.
  4. In order to start the pdparaload command, you need as many locked resources as the total number of locked resources required for data loading for all the RDAREAs. For details about the number of locked resources required for data loading for an RDAREA, see 5.4.2(4)(a) Notes.
  5. The pdparaload command cannot be executed more than once concurrently on the same table. You can execute the pdparaload command concurrently on different tables, but those tables must not share any RDAREAs. The pdparaload command locks each RDAREA internally in order to perform data loading for the RDAREA. If the pdparaload command is executed on more than one table stored in the same RDAREA, the command terminates with a lock error.
    For details about the lock mode for data loading for an RDAREA, see B.2 Lock mode for utilities.
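
Rule 5 means that before running the pdparaload command concurrently on several tables, it is worth confirming that no two tables share an RDAREA. The following is a minimal sketch of that check. The function name shared_rdareas and the table and RDAREA names in the example are hypothetical, and the mapping from tables to RDAREAs is assumed to be supplied by the caller (for example, gathered from the table definitions).

```python
from itertools import combinations


def shared_rdareas(table_rdareas):
    """Return {(table_a, table_b): [shared RDAREA names]} for each conflict.

    table_rdareas maps a table name to the RDAREA names that store it.
    An empty result means the tables can be loaded concurrently as far
    as rule 5 is concerned.
    """
    conflicts = {}
    for table_a, table_b in combinations(sorted(table_rdareas), 2):
        common = set(table_rdareas[table_a]) & set(table_rdareas[table_b])
        if common:
            conflicts[(table_a, table_b)] = sorted(common)
    return conflicts


layout = {
    "T1": ["RDAREA1", "RDAREA2"],
    "T2": ["RDAREA2", "RDAREA3"],
    "T3": ["RDAREA4"],
}
print(shared_rdareas(layout))
```

In this example T1 and T2 both use RDAREA2, so running the pdparaload command on them at the same time would end in a lock error, while T3 can be loaded alongside either of them.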

(6) Notes

  1. The following are the pdparaload command's return codes:
    0: Normal termination
    4: Normal termination (database storage processing was skipped because a logical error occurred in some of the input data)
    8: Abnormal termination
  2. The pdparaload command generates a name for the pdload control statements file under the following naming convention:
    LOD_CTL_authorization-identifier_table-identifier_RDAREA-name
    If a file with the same name already exists, the command terminates with an error.
  3. If a file with the file name generated by the pdparaload command from the following control statements already exists, the command overwrites the existing file:
    • Error information file name specified in the error operand of the source statement
    • Error data file name specified in the errdata operand of the source statement
  4. The pdparaload command adds an RDAREA name and enclosing double quotation marks ("") to the names of the files specified in the source statement. You must specify path names and file names so that their lengths satisfy the rules described below.
    • For the source statement
      The lengths of path names and file names specified in the error and errdata operands of the source statement must be no greater than the values obtained from the formulas shown below:
      In the error operand:
        Length of a path name (bytes):
          Maximum length of a path name for the OS - (length of RDAREA name + 1)
        Length of a file name (bytes):
          Maximum length of a file name for the OS - (length of RDAREA name + 1)
      In the errdata operand:
        Length of a path name (bytes):
          (Maximum length of a path name for the OS - 8) - (length of RDAREA name + 1)
        Length of a file name (bytes):
          (Maximum length of a file name for the OS - 8) - (length of RDAREA name + 1)
      Note also that the length of a line of the source statement must not exceed 1,023 bytes. If a line exceeds 1,023 bytes, the command terminates with an error.
      The following shows the rules for the RDAREA name and double quotation marks ("") that the pdparaload command adds to the source statement.

      [Figure]
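
The length limits for the error and errdata operands in note 4 can be worked out with a few lines of arithmetic. The sketch below is an illustration, not part of HiRDB: the function name max_operand_lengths is hypothetical, the OS limits are passed in by the caller because they differ by platform, and the RDAREA name length is assumed to be its length in bytes when encoded as UTF-8 (identical to the character count for ASCII names).

```python
def max_operand_lengths(os_path_max, os_name_max, rdarea_name, operand):
    """Return (max path name length, max file name length) in bytes.

    operand is "error" or "errdata"; errdata reserves 8 more bytes,
    as in the formulas of note 4.
    """
    if operand not in ("error", "errdata"):
        raise ValueError("operand must be 'error' or 'errdata'")
    added = len(rdarea_name.encode("utf-8")) + 1  # length of RDAREA name + 1
    reserve = 8 if operand == "errdata" else 0
    return (os_path_max - reserve - added, os_name_max - reserve - added)


# Example limits only (1023-byte paths, 255-byte file names); check the
# actual values for your OS. "RDAREA01" is a made-up 8-byte RDAREA name.
print(max_operand_lengths(1023, 255, "RDAREA01", "error"))
print(max_operand_lengths(1023, 255, "RDAREA01", "errdata"))
```

When a table is stored in several RDAREAs, the longest RDAREA name gives the tightest limit, so that is the name to use in the calculation.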