5.4.3 source statement (specification of input data file information)

The source statement specifies information about the input data file.

Criterion
Be sure to specify this statement unless you are loading data only to the LOB columns of a table.
Rule
Specify the source statement on a single line (maximum length: 1,023 bytes). You can specify the source statement only once in a control information file.
Organization of this subsection
(1) Format
(2) Explanation

(1) Format

source [RDAREA-name] [{server-name|host-name}:]
     {input-data-filename[,input-data-filename]... |(uoc)}
     [error=error-information-file-name]
     [errdata=error-data-filename[, output-rows-count]]
     [errwork=work-buffer-size-for-error-data-file-creation]
     [maxreclen=input-data-length]
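
The following Python sketch shows how the operands above combine into one source statement and checks the single-line, 1,023-byte limit from the Rule. The helper name build_source_statement, the use of UTF-8 to count bytes, and the sample RDAREA, server, and file names are assumptions for illustration only; they are not part of pdload.

```python
# Hypothetical helper: assembles a source statement from its operands and
# enforces the "one line, at most 1,023 bytes" rule. Byte length is counted
# in UTF-8 here, which is an assumption about the control file's encoding.
from typing import Optional, Sequence

MAX_SOURCE_BYTES = 1023


def build_source_statement(
    input_files: Optional[Sequence[str]] = None,
    use_uoc: bool = False,
    rdarea: Optional[str] = None,
    location: Optional[str] = None,  # server name (Parallel) or host name (Single)
    error: Optional[str] = None,
    errdata: Optional[str] = None,
    errdata_rows: Optional[int] = None,
    errwork: Optional[int] = None,
    maxreclen: Optional[int] = None,
) -> str:
    # Exactly one of: a list of input data files, or (uoc).
    if use_uoc == bool(input_files):
        raise ValueError("specify either input data files or (uoc)")
    parts = ["source"]
    if rdarea:
        parts.append(rdarea)
    target = "(uoc)" if use_uoc else ",".join(input_files)
    parts.append(f"{location}:{target}" if location else target)
    if error:
        parts.append(f"error={error}")
    if errdata:
        rows = f",{errdata_rows}" if errdata_rows is not None else ""
        parts.append(f"errdata={errdata}{rows}")
    if errwork is not None:
        parts.append(f"errwork={errwork}")
    if maxreclen is not None:
        parts.append(f"maxreclen={maxreclen}")
    stmt = " ".join(parts)
    if "\n" in stmt or len(stmt.encode("utf-8")) > MAX_SOURCE_BYTES:
        raise ValueError("source statement must be one line of at most 1,023 bytes")
    return stmt
```

For example, build_source_statement(["/data/in1.dat", "/data/in2.dat"], rdarea="RDAREA01", location="bes1", errdata="/data/err.dat", errdata_rows=200) returns source RDAREA01 bes1:/data/in1.dat,/data/in2.dat errdata=/data/err.dat,200.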

(2) Explanation

(a) RDAREA-name

~<identifier> ((1-30))

When data loading is performed by RDAREA on a row-partitioned table, this option specifies the name of the RDAREA into which data is to be loaded.

Rules
  1. For a table partitioned by key ranges or FIX hash values, HiRDB checks whether each row of data can be stored in the specified RDAREA. If the data falls outside the storage range of that RDAREA, an error results. For a table partitioned by flexible hash values, however, HiRDB stores the data as is without checking.
  2. The system treats an RDAREA name that is enclosed in double quotation marks (") as case sensitive; otherwise, the system treats it as all uppercase letters. If an RDAREA name contains a space, you must enclose it in double quotation marks.
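Rule 2 can be illustrated with the following Python sketch of how an RDAREA name is interpreted. The function name is hypothetical, and the sketch does not cover how embedded quotation marks are escaped or the 30-character length limit.

```python
def normalize_rdarea_name(name: str) -> str:
    """Return the RDAREA name as HiRDB would interpret it (sketch of rule 2)."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Quoted: the name is case sensitive, so keep it exactly as written.
        return name[1:-1]
    if " " in name:
        raise ValueError(
            "an RDAREA name containing a space must be enclosed in double quotation marks"
        )
    # Unquoted: the name is treated as all uppercase letters.
    return name.upper()
```

For example, rdarea01 is treated as RDAREA01, while "Rd 01" is used as Rd 01.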
(b) {server-name|host-name}

Specifies the name of the server or host containing the input data file(s).

server-name ~<identifier> ((1-8))
HiRDB/Single Server
Do not specify this option for a HiRDB/Single Server.
HiRDB/Parallel Server
Specify the name of the front-end server or back-end server containing the input data file. When you load data into an audit trail table, you must specify the name of the server on the unit that contains the audit trail files. If you use the standby-less system switchover (effects distributed) facility and specify a guest BES (a server subject to switchover), the utility assumes that the input data files are on the regular unit (the unit on which that server runs as the running server).
If you specify an RDAREA name, you can omit this option; the system then assumes the name of the server where the specified RDAREA is stored. If you omit the RDAREA name, you must specify this option.
host-name ~<identifier> ((1-32))
HiRDB/Single Server
Specify the name of the host containing the input data file. This is the name of the host where the single server is located.
You can omit this option regardless of whether or not an RDAREA name is specified. When it is omitted, the system assumes the name of the host where the database load utility (pdload command) was executed.
If you are using the system switchover facility, specify the primary system's host name.
HiRDB/Parallel Server
Do not specify this option for a HiRDB/Parallel Server.
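The defaulting rules for server-name and host-name can be summarized with the following Python sketch. The function name, the rdarea_servers mapping (RDAREA name to the server that stores it), and the local_host parameter are hypothetical; the sketch ignores the standby-less system switchover and audit trail cases described above.

```python
from typing import Dict, Optional


def resolve_input_location(
    parallel: bool,
    name: Optional[str] = None,
    rdarea: Optional[str] = None,
    rdarea_servers: Optional[Dict[str, str]] = None,
    local_host: str = "",
) -> str:
    """Return the server or host assumed to hold the input data files (sketch)."""
    if name is not None:
        # Explicit value: a server name on a HiRDB/Parallel Server,
        # a host name on a HiRDB/Single Server.
        return name
    if parallel:
        servers = rdarea_servers or {}
        if rdarea is not None and rdarea in servers:
            # Omitted server name: use the server where the RDAREA is stored.
            return servers[rdarea]
        raise ValueError("server-name is required: no RDAREA name, or its server is unknown")
    # HiRDB/Single Server: default to the host where pdload was executed.
    return local_host
```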
(c) input-data-file-name

~<pathname>

Specifies the absolute pathname of an input data file containing the data to be loaded.

If there are multiple input data files, separate the file names with commas (,). For details about the input data file, see section 5.5 Input data file.

The tape device access facility is supported for input data files. For details about the tape device access facility, see section 1.4.3 Tape device access facility.

HiRDB/Single Server
Create the input data file(s) on the server machine where the single server is located.
HiRDB/Parallel Server
Create the input data file(s) on the front-end server or a back-end server.
Rules
  1. The system checks the specified input data file for its accessibility before starting data loading. If access is denied, the system does not execute data loading.
  2. A file containing a byte order mark (BOM) cannot be used as an input data file. For pdload, use files that do not have a BOM.
  3. You can use the pd_utl_file_buff_size operand in the system definition to change the buffer size used for input/output processing of input data files.
(d) (uoc)

Specifies that a UOC is to be used to input/output the input data file. Note that when you perform data loading on an audit trail table, you must specify this operand because a system-provided UOC has to be used.

For details about UOCs that are created by the user, see 5.10 Using a UOC to load a table.

(e) error=error-information-file-name

~<pathname>

Specifies the absolute pathname of the file to which error information is to be output.

The file must be on the server machine where the input data file is located. If the specified file does not exist, the utility creates it, so you do not need to create the file in advance. If you omit the error operand, the utility creates a file with a unique name in the following format:

\directory-name\ERROR-xxxxxxxxx

directory-name: Directory shown in Table 5-57 Directories to which pdload outputs files
ERROR-: Prefix for an error information file
xxxxxxxxx: Value obtained by converting the file creation time and process ID to a character string

The file name created by the utility is displayed in the KFPL00709-I message.

If the -e option is specified, the utility creates the error information file only when there is an error in the input data.

For a HiRDB/Parallel Server, information about errors that a back-end server detects in the input data is first written to a temporary error information file. Such a file is created on each back-end server with a name in the following format:

\directory-name\ERRTMP-xxxxxxxxx

directory-name: Directory shown in Table 5-57 Directories to which pdload outputs files
ERRTMP-: Prefix for a temporary error information file
xxxxxxxxx: Value obtained by converting a unique file name, file creation time, and process ID to a character string

(f) errdata=error-data-filename[,output-rows-count]

Specifies that rows of data in which errors are detected are to be output to an error data file.

You can correct the rows of data that are output to the error data file and load them again as an input data file.

If the -e option is specified, the system ignores the specification of an error data file.

error-data-filename ~<pathname>
Specifies the absolute pathname of the file to which erroneous rows of data are to be output. This file must be on the same server as the input data file. If this option is omitted, the system does not output erroneous rows of data.
output-rows-count ~<unsigned integer> ((1-4294967295)) <<100>>
Specifies the maximum number of erroneous rows of data that can be output. If the number of erroneous rows exceeds the specified value, the system continues processing but outputs only the specified number of erroneous rows of data.
Rules
  1. If you specify an error data file, make sure that the lengths of the path name and the file name satisfy the following conditions:
    Length of path name (bytes) ≤ maximum path name length for the OS - 8
    Length of file name (bytes) ≤ maximum file name length for the OS - 8
    When an error data file is used, temporary files for creating the error data file might be created to store error data. For the conditions under which such temporary files are created and the file names used, see section 5.6.3 Notes about referencing error information.
  2. The following limitations apply to the output results of the error data file:
    Input data file in the DAT format
    If a row of actual data is longer than the value specified in the maxreclen operand, the system does not output that data. If the maxreclen operand is omitted, data of 32 KB or more is not output.
    Input data file in the binary format
    If the system is unable to edit one line of data due to erroneous length information in the variable-length character string data, the system outputs only the part of the data that was edited successfully. The system does not output fixed-length column data that is less than the defined length.
    LOB column input file
    Only the erroneous rows of the input data file are output to the error data file. To reload the data after correcting the input data file, you must also correct the LOB column input file so that it corresponds to the order in which rows are output from the input data file.
  3. An index key value duplication error is not output to the error data file in the following cases:
    • The index is created in batch index creation mode.
    • For a HiRDB/Parallel Server, processing is performed in index update mode, the input data file is located on a server other than the server containing the table storage RDAREAs, and a buffer shortage occurs while the error data file is being created.
  4. An error data file is not output when data loading is executed on LOB columns or when the input data file is a binary-format file created by pdrorg.
  5. If an error occurs in a variable-length character string in a binary-format input data file, only the columns up to the one immediately preceding the erroneous column are output to the error data file. Keep this in mind when checking the error data file.
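The length limits in rule 1 can be checked before running pdload with a Python sketch such as the following. The function name is hypothetical, the OS maximum lengths must be supplied by the caller for the target platform, and counting bytes in UTF-8 is an assumption.

```python
import os


def errdata_name_lengths_ok(path: str, os_max_path: int, os_max_name: int) -> bool:
    """Check rule 1: each length, in bytes, must not exceed the OS maximum minus 8."""
    path_bytes = len(path.encode("utf-8"))
    name_bytes = len(os.path.basename(path).encode("utf-8"))
    return path_bytes <= os_max_path - 8 and name_bytes <= os_max_name - 8
```

For example, for /tmp/err.dat the file name err.dat is 7 bytes, so it satisfies the rule only if the OS maximum file name length is at least 15 bytes.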
(g) errwork=work-buffer-size-for-error-data-file-creation

~((0-2097152))

Specifies, in kilobytes, the size of the work buffer used to create the error data file when the errdata operand is specified.

If the KFPL25222-W message is issued during data loading with the errdata operand specified and some of the error data is not output to the error data file, specify this operand and re-execute data loading.

If you specify 0 in the errwork operand, neither key duplication errors nor invalid value errors for abstract data type columns are output to the error data file, but data loading performance improves.

A KFPL25222-W message is issued when all of the following conditions are met:

  1. The system is a HiRDB/Parallel Server, and either the server name specified in the source statement differs from the name of the back-end server in which a table storage RDAREA is defined, or the table is row-partitioned and stored on multiple back-end servers.
  2. A unique key index or primary key index is defined for the table subject to data loading, or the table contains columns of abstract data type.
  3. One of the following is true:
    • The input data file contains data that is not to be stored in the database (data in a column structure information file for which a skipdata statement is specified; data that is longer than a defined column length; or data that matches a null comparison value and is treated as a null value).
    • Data is to be stored in a column of VARCHAR, NVARCHAR, MVARCHAR, BINARY, or BLOB data type.
    • Data is to be stored in an abstract data type for which the input parameter type of the constructor function is VARCHAR, NVARCHAR, MVARCHAR, BINARY, or BLOB.
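Because the three conditions combine AND and OR in several places, the following Python sketch restates them as a single predicate. The LoadConditions class and its field names are hypothetical summaries of facts you would determine about your own table and load; the sketch only checks the listed conditions and does not model whether the buffer is actually exhausted.

```python
from dataclasses import dataclass


@dataclass
class LoadConditions:
    """Hypothetical yes/no summary of a load; field names are illustrative."""
    parallel_server: bool
    source_server_differs_from_rdarea_bes: bool
    table_split_across_bes: bool
    unique_or_primary_key_index: bool
    abstract_data_type_column: bool
    data_not_stored: bool            # skipdata, over-length, or null-comparison data
    variable_length_column: bool     # VARCHAR, NVARCHAR, MVARCHAR, BINARY, or BLOB
    adt_variable_length_constructor: bool


def kfpl25222w_conditions_met(c: LoadConditions) -> bool:
    # Condition 1: Parallel Server, and the source server is not the RDAREA's
    # back-end server or the table is spread over several back-end servers.
    cond1 = c.parallel_server and (
        c.source_server_differs_from_rdarea_bes or c.table_split_across_bes
    )
    # Condition 2: unique/primary key index, or abstract data type columns.
    cond2 = c.unique_or_primary_key_index or c.abstract_data_type_column
    # Condition 3: at least one of the three listed data characteristics.
    cond3 = (
        c.data_not_stored
        or c.variable_length_column
        or c.adt_variable_length_constructor
    )
    return cond1 and cond2 and cond3
```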
Estimation formula
The following formula estimates the size of the work buffer for the error data file. The formula yields a value in bytes; convert it to kilobytes, rounding up, when you specify the errwork operand.

$$\text{Buffer size} = \left\lceil \frac{X}{\text{average length of database storage row} + Y} \right\rceil \times \text{average length of input data row} \times 2 \times \text{number of servers for which table storage RDAREAs are defined}$$

X: Value of pd_utl_buff_size in the system common definitions × 1,024
Y: FIX table: 24
Non-FIX table: (number of columns + 1) × 4 + 24
For details about the average length of database storage row (how to calculate the number of table storage pages), see the HiRDB Version 9 Installation and Design Guide.
Because this estimation formula is based on average row lengths, a buffer shortage might still occur depending on the actual distribution of row data. If sufficient memory is available, you can ensure that all error data is output by revising the formula as follows:
  • For non-FIX tables, set the average length of a database storage row to 0.
  • Set the average length of an input data row to the maximum length of input data.
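The estimation can be scripted as in the following Python sketch, which reads the formula as ⌈X ÷ (average length of database storage row + Y)⌉ × average length of input data row × 2 × number of servers, then rounds the byte total up to whole kilobytes. The function and parameter names are hypothetical, and pd_utl_buff_size is assumed to be in kilobytes, as the × 1,024 factor in X suggests.

```python
import math


def estimate_errwork_kb(
    pd_utl_buff_size_kb: int,
    avg_db_row_len: float,
    avg_input_row_len: float,
    server_count: int,
    fix_table: bool,
    column_count: int = 0,
) -> int:
    """Estimate the errwork value (KB) from the formula above (sketch)."""
    x = pd_utl_buff_size_kb * 1024                        # X, in bytes
    y = 24 if fix_table else (column_count + 1) * 4 + 24  # Y
    rows_per_buffer = math.ceil(x / (avg_db_row_len + y))
    size_bytes = rows_per_buffer * avg_input_row_len * 2 * server_count
    return math.ceil(size_bytes / 1024)  # round up to whole kilobytes
```

For a more conservative value on a non-FIX table, pass 0 as avg_db_row_len and the maximum input data length as avg_input_row_len, as the revisions above describe. The result must still fall within the permitted range of 0 to 2,097,152.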
(h) maxreclen=input-data-length
This option is applicable to an input data file created in DAT format, extended DAT format, binary format, or pdrorg-output binary format. If the input data file is in fixed-length data format or is on a streaming tape device, this operand is ignored if specified.
When the input data file is in DAT or extended DAT format ~<unsigned integer> ((0, 32-524288)) <<32>>
If a row of data in a DAT-format input data file can exceed 32 KB, use this operand to specify, in kilobytes, the maximum length of a row in the input data file.
If this operand is omitted and the input data file contains a row of 32 KB or more, or if a row exceeds the specified value, the system cancels processing.
Rules
  1. If this operand is omitted or a streaming tape device is used, each row of data in a DAT-format input data file must not exceed 32 KB.
  2. If you are using an unload data file in DAT format that was output with the -W dat option specified in pdrorg, the maximum length is displayed in the KFPL22222-I message.
  3. If you specify 0 in this operand, pdload calculates the row length on the basis of the definition of the table subject to processing. Because the utility uses the table definition, if the input data file contains data that is not to be stored in the table, the value obtained by the utility does not match the input data length, resulting in an error. In this case, specify a non-zero value as the input data length.
    Data type or parameter type                    Calculation value (bytes)  Formula
    INTEGER                                        12                         Sign + number of digits + separator character
    SMALLINT                                       7                          Sign + number of digits + separator character
    DECIMAL(m, n)                                  m + 3                      Sign + number of digits + decimal point + separator character
    FLOAT                                          24                         Sign + mantissa part + decimal point + e + sign + exponent part + separator character
    SMALLFLT                                       17                         Sign + mantissa part + decimal point + e + sign + exponent part + separator character
    DATE                                           11                         Specification format + separator character
    TIME                                           9                          Specification format + separator character
    INTERVAL YEAR TO DAY                           11                         Sign + number of digits + comma + separator character
    INTERVAL HOUR TO SECOND                        9                          Sign + number of digits + comma + separator character
    TIMESTAMP(p)                                   19 + (p + 1) + 1           Specification format + separator character
    CHAR(n), MCHAR(n), VARCHAR(n), or MVARCHAR(n)  n + 3                      Number of characters + double quotation marks + separator character
    NCHAR(n) or NVARCHAR(n)                        (n × 2) + 3                Number of characters + double quotation marks + separator character
    BINARY(n)                                      n + 3                      Number of characters + double quotation marks + separator character
    BLOB                                           1,025                      Maximum length of path name + separator character
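The following Python sketch shows how the calculation values in the table add up to a row length for a given column list. The function names and the (data type, size) representation are hypothetical: size stands for n in the character and BINARY types, m in DECIMAL, and p in TIMESTAMP. pdload performs this calculation internally, and how it converts the result into its buffer size is not described here.

```python
from typing import Iterable, Tuple

# Fixed calculation values (bytes) from the table above.
FIXED_LENGTHS = {
    "INTEGER": 12,
    "SMALLINT": 7,
    "FLOAT": 24,
    "SMALLFLT": 17,
    "DATE": 11,
    "TIME": 9,
    "INTERVAL YEAR TO DAY": 11,
    "INTERVAL HOUR TO SECOND": 9,
    "BLOB": 1025,
}


def dat_column_length(data_type: str, size: int = 0) -> int:
    """Calculation value for one column (size = n, m, or p as listed)."""
    if data_type in FIXED_LENGTHS:
        return FIXED_LENGTHS[data_type]
    if data_type in ("DECIMAL", "CHAR", "MCHAR", "VARCHAR", "MVARCHAR", "BINARY"):
        return size + 3
    if data_type in ("NCHAR", "NVARCHAR"):
        return size * 2 + 3
    if data_type == "TIMESTAMP":
        return 19 + (size + 1) + 1
    raise ValueError(f"no calculation value listed for {data_type}")


def dat_row_length(columns: Iterable[Tuple]) -> int:
    """Sum the per-column calculation values for one DAT-format row."""
    return sum(dat_column_length(*column) for column in columns)
```

For example, a table with INTEGER, CHAR(10), DECIMAL(7, 2), NCHAR(5), and TIMESTAMP(3) columns gives 12 + 13 + 10 + 13 + 24 = 72 bytes.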
When the input data file is in binary format or pdrorg-output binary format ~<unsigned integer> ((0, 32-2097152)) <<0>>
If you are executing data loading on a table with a BINARY column, specify the maximum row length of the input data in kilobytes. If the table has no BINARY columns, this operand is ignored if specified.
Rules
  1. If this operand is omitted or 0 is specified, pdload obtains the maximum row length from the table definition and uses that value for processing. If you specify an input data length but the value that pdload obtains from the table definition is smaller, pdload uses the value obtained from the table definition.
  2. If the actual maximum row length of the data is less than the maximum row length that pdload obtains from the table definition, specify this operand to reduce the size of the area to be allocated.
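Taken together, rules 1 and 2 mean that the smaller of the specified value and the table-definition value is used. The following Python sketch states this for a table with a BINARY column. The function name is hypothetical, and both lengths are assumed to be expressed in kilobytes.

```python
from typing import Optional


def effective_binary_maxreclen_kb(specified_kb: Optional[int], table_def_kb: int) -> int:
    """Row length (KB) used for binary-format input to a table with a BINARY column."""
    if not specified_kb:
        # Omitted or 0: use the maximum row length from the table definition.
        return table_def_kb
    if not 32 <= specified_kb <= 2097152:
        raise ValueError("maxreclen must be 0 or in the range 32-2097152")
    # Rule 1: the table-definition value is used when it is smaller.
    return min(specified_kb, table_def_kb)
```

For example, with a table-definition value of 100 KB, specifying maxreclen=64 reduces the length used to 64 KB, while specifying maxreclen=200 has no effect.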