5.4.14 src_work statement (specification of the output destination of divided-input data files)

When you perform data loading in units of RDAREAs, you can use the src_work statement to split the table's input data files into a separate input data file for each RDAREA (divided-input data files). You specify the src_work statement in a pdload execution that precedes data loading, solely to create the divided-input data files; you do not specify it during the data loading itself.

The figure below provides an overview of creating divided-input data files.

Figure 5-14 Overview of creating divided-input data files

[Figure]

Criteria
Specify the src_work statement when you intend to perform data loading in parallel for the RDAREAs of a row-partitioned table, but you have not prepared an input data file for each RDAREA.
The src_work statement eliminates the need for the user to create an input data file for each RDAREA.
Prerequisites
  1. For the input data files, you can use files in DAT format (including extended DAT format) or fixed-size data format. The divided-input data files are output in the same format.
  2. To create divided-input data files, you must have the INSERT privilege for the table.
  3. The target table must be a row-partitioned table (there are no restrictions with respect to the partitioning method or column structure).
  4. A divided-input data file must be a regular file on a single volume. If the file already exists, it is overwritten; a new file is not created.
  5. The divided-input data files are created on the host that contains the input data files.
  6. The utility performs some data checking during creation of divided-input data files; however, checking for the following data errors is not performed:
    • Invalid cluster key order
    • Key duplication error
    • Invalid LOB column storage data#1
    • Invalid abstract-data type storage data#2
    #1: One of the following:
    • The LOB input file is not accessible.
    • The LOB column's data length is greater than the length defined for the LOB column.
    #2: One of the following:
    • There is invalid data that results in an error during plug-in function data checking.
    • BLOB-type argument error (same as the errors for #1).
  7. Divided-input data files are created under the following names:
    For a HiRDB single server configuration:
    directory-name-specified-in-src_work-statement + input-data-file-name + RDAREA-name
    For a HiRDB parallel server configuration:
    directory-name-specified-in-src_work-statement + input-data-file-name + server-name + RDAREA-name
    You must ensure that the length of the absolute path name, including the file name, does not exceed the maximum length supported by the OS.
    When multiple input data files are specified, input-data-file-name represents the first file name.
    RDAREA-name represents the name of the table storage RDAREA (if the inner replica facility is used, the original RDAREA name).
  8. pdload with the src_work statement specified does not access the target table (the target table is not locked).
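
The naming rule in prerequisite 7 can be sketched in Python as follows. This is only an illustration, not part of pdload. It assumes that the name components are joined with underscores, as the sample names in (4) Examples suggest (for example, /divwork/inputfile_BES1_RDAREA1). The function names, the /usr/inputfile path, and the max_len default are hypothetical; check the actual path length limit of your OS.

```python
import posixpath


def divided_file_path(out_dir, input_files, rdarea, server=None):
    """Build the expected path of the divided-input data file for one RDAREA.

    out_dir     : directory specified in the src_work statement
    input_files : list of input data file paths (only the first name is used)
    rdarea      : table storage RDAREA name
    server      : server name (HiRDB parallel server configuration only)

    Assumption: components are joined with "_" as in the sample names.
    """
    base = posixpath.basename(input_files[0])
    parts = [base] if server is None else [base, server]
    parts.append(rdarea)
    return posixpath.join(out_dir, "_".join(parts))


def fits_path_limit(path, max_len=1023):
    """Return True if the path is within max_len bytes (hypothetical limit)."""
    return len(path.encode("utf-8")) <= max_len


# Single server configuration
print(divided_file_path("/divwork", ["/usr/inputfile"], "RDAREA1"))
# Parallel server configuration
print(divided_file_path("/divwork", ["/usr/inputfile"], "RDAREA1", server="BES1"))
```

Computing the names in advance makes it possible to check for overly long paths, and for names that collide with files created for another table, before pdload is executed.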
Organization of this subsection
(1) Format
(2) Explanation
(3) Notes
(4) Examples

(1) Format

 src_work divided-input-data-file-output-target-directory

(2) Explanation

(a) divided-input-data-file-output-target-directory

Specifies the absolute path name of the directory to which the divided-input data files are to be output.

The specified directory must be located at the server or host that contains the input data files. Access privileges for the specified directory must have been granted to the HiRDB administrator.

(3) Notes

  1. Names of the divided-input data files
    If pdload is executed using an input data file name that is duplicated for multiple tables defined in the same table storage RDAREA, the names of the divided-input data files will also be duplicated and the file contents will not be reliable. For this reason, you must ensure that directory-name-specified-in-src_work-statement or input-data-file-name is unique.
  2. Table containing LOB columns
    Creation of divided-input data files is simply a matter of subdividing the input data files; it does not involve accessing any LOB input files. In the case of a table with a LOB column, you must place the LOB input file at a location that can be referenced by a method such as NFS from the host where pdload is executed.
  3. Handling of DECIMAL-type data in fixed-size data format input data files
    When the facility for conversion to a decimal signed normalized number is used, the DECIMAL-type values that are output to the divided-input data files are normalized.
  4. If there is no data to be stored in a particular RDAREA, the utility creates a divided-input data file whose length is 0 bytes.
  5. The automatic numbering facility cannot be used when divided-input data files are used.
  6. Because pdload with the src_work statement specified does not compress data, column data in compressed columns is output to the divided-input data file without being compressed.
  7. If pdload is executed on a table consisting of more than 1,024 table storage RDAREAs in HiRDB for Solaris (excluding 64-bit mode), pdload performs a partial execution that creates files for 1,024 RDAREAs multiple times to create the divided-input data files for all RDAREAs. Therefore, the messages shown below are issued n times#. You can reduce the number of times these messages are output by specifying the -m option during execution of pdload. For details, see 5.4.2(18) [-m[ progress-message-output-interval][,information-message-output-suppression-level]].
    • KFPL00700-I
    • KFPL00701-I
    • KFPL00702-I
    • KFPL00705-I
    • KFPL00709-I
    If the error operand is not specified in the source control statement, n# error information files are created. You must specify the error operand in the source control statement if you do not want to create multiple error information files. For details, see 5.4.4(2)(e) error=error-information-file-name.
    #
    n = number of table storage RDAREAs ÷ 1,024
    Round any fractional result up to the next integer.
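
As a worked illustration of the value n in note 7, the following Python sketch computes the number of partial executions (and therefore the number of times each message is issued) for a given number of table storage RDAREAs. The function name is hypothetical, and the sketch assumes that the formula divides by 1,024 and rounds up, as described above.

```python
RDAREAS_PER_PARTIAL_EXECUTION = 1024  # from note 7


def partial_execution_count(rdarea_count):
    """Return n: rdarea_count / 1,024, rounded up to the next integer."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-rdarea_count // RDAREAS_PER_PARTIAL_EXECUTION)


for count in (1025, 2048, 3000):
    print(count, partial_execution_count(count))
```

For example, a table with 3,000 table storage RDAREAs gives n = 3, so each of the messages listed above is issued three times, and three error information files are created if the error operand is omitted.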

(4) Examples

(a) Example of creating divided-input data files and performing data loading in units of RDAREAs

This example performs parallel data loading into a row-partitioned table (T1) in units of RDAREAs. Because only an input data file for the table as a whole (inputfile) is available, the example first creates divided-input data files and then uses them to execute parallel data loading in units of RDAREAs.

1. Defining the table

CREATE FIX TABLE T1 (C1 DEC,
                     C2 CHAR(10))
                    IN ((RDAREA1) C1 > 1000,(RDAREA2) C1 < -1000,(RDAREA3));

2. Creating divided-input data files
Explanation
  1. Specifies the names of the input data file and error information file.
  2. Specifies the name of the directory where the divided-input data files are to be created.
3. Executing parallel data loading in units of RDAREAs
As a result of 2. Creating divided-input data files, the following three divided-input data files are created:
  • /divwork/inputfile_BES1_RDAREA1
  • /divwork/inputfile_BES2_RDAREA2
  • /divwork/inputfile_BES3_RDAREA3
These files are used as the input data files in executing parallel data loading in units of RDAREAs.
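
The way rows are distributed among the three files follows from the storage conditions in the definition of T1. The Python sketch below is a conceptual illustration only; it is not how pdload is implemented, and the function names and sample rows are hypothetical. It also reflects note 4 in (3) Notes: an RDAREA that receives no rows still gets an (empty) divided-input data file.

```python
def rdarea_for_t1(c1):
    """Return the storage RDAREA for a T1 row, based on the value of C1."""
    if c1 > 1000:
        return "RDAREA1"
    if c1 < -1000:
        return "RDAREA2"
    return "RDAREA3"  # all remaining values


def divide_rows(rows):
    """Group (C1, C2) rows by RDAREA; every RDAREA gets an entry, even if empty."""
    divided = {"RDAREA1": [], "RDAREA2": [], "RDAREA3": []}
    for c1, c2 in rows:
        divided[rdarea_for_t1(c1)].append((c1, c2))
    return divided


# Hypothetical sample rows
sample = [(1500, "a"), (0, "b"), (1000, "c"), (2000, "d")]
for rdarea, rows in divide_rows(sample).items():
    print(rdarea, rows)
```

In this sample no row satisfies C1 < -1000, so the entry for RDAREA2 is empty, which corresponds to a divided-input data file whose length is 0 bytes.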
(b) Example of using an input data file in the all-column fixed-length binary format

Binary-format input data files cannot be used to create divided-input data files. However, if all columns in the table have fixed-length data types,# the input data file can be handled as if it were in the fixed-size data format.

#: Applicable data types are as follows:
INTEGER, SMALLINT, DECIMAL, FLOAT, SMALLFLT, DATE, TIME, INTERVAL YEAR TO DAY, INTERVAL HOUR TO SECOND, CHAR, NCHAR, MCHAR, TIMESTAMP
1. Defining the table

CREATE FIX TABLE T2 (C1 DEC(7),
                     C2 CHAR(10)) FIX HASH HASH6 BY C1
                    IN (RDAREA1,RDAREA2);

2. Creating divided-input data files
Explanation
Specifies the table component columns C1 and C2 and their data types.
Explanation
  1. Specifies the names of the input data file and error information file.
  2. Specifies the name of the directory where the divided-input data files are to be created.
3. Executing parallel data loading in units of RDAREAs
As a result of 2. Creating divided-input data files, the following two divided-input data files are created:
  • /divwork/inputfile_BES1_RDAREA1
  • /divwork/inputfile_BES2_RDAREA2
These files are used as the input data files in executing parallel data loading in units of RDAREAs.