8.3.3 Format of database load utility input files

If you specify the -W option at the time of unloading, you can use the unload data file as the input data file to the database load utility.

Organization of this subsection
(1) DAT and extended DAT formats
(2) Binary format
(3) Fixed-size data format

(1) DAT and extended DAT formats

In the DAT and extended DAT formats, data is output as character string data. For details about the differences between the DAT and extended DAT formats, see (c) Differences between DAT and extended DAT formats. Except for the cited differences, the DAT and extended DAT formats are identical.

To output data, use the following rules:

  1. The system outputs one table row as one line in the file (a linefeed character is output at the end of each line).
  2. The system inserts a separator character between column data. For details about column data output formats (in DAT format) for various data types, see 5.5.1 DAT format.
    Character string, national character, mixed character string, and BINARY data are enclosed in double quotation marks (") in the output format.
    The output format of an abstract data type consists of as many data items as there are specified by the constructor parameter reverse creation function, each of which is delimited by a separator character. If the constructor parameter reverse creation function's return value is BLOB, the system outputs consecutive separator characters as the null value.
  3. The system outputs LOB column data as the null value, regardless of the specification of the -j option.
  4. If the constructor parameter reverse creation function's return value is BLOB, the system outputs the column data of abstract data type as the null value.
  5. An error results if the line length exceeds 512 MB after conversion. If you assign a streaming tape device to the unload data file and the line length exceeds 32 KB, an error results.
  6. If you specify the spacelvl=3 and sup options in the option statement, the system converts two consecutive single-byte spaces of NCHAR type and suppresses the output of a pair of single-byte spaces.
  7. If the data type of a column is character string, national character data, mixed character string, or binary data, and the column value contains a combination of the enclosing character (") + separator character or the separator character + enclosing character ("), the utility may treat the value as the end or start of column data during data loading. To avoid this, use the -W option to change the separator character during unloading or output the data in the binary format.
  8. For a table with the FIX attribute for which reserved columns are defined, the reserved columns cannot be output.
(a) Format for repetition columns

For repetition columns, the system outputs data from each element by separating it with separator characters. The row data format depends on what is specified in the array statement in the control information file. Figure 8-13 Row data output format with ff specified in the array statement shows the row data output format when ff is specified in the array statement: Figure 8-14 Row data output format with vv specified in the array statement shows the row data output format when vv is specified in the array statement.

Figure 8-13 Row data output format with ff specified in the array statement

[Figure]

Explanation
  1. Data is stored in all the elements.
  2. Elements 1 to 6 contain data, and element 5 contains a null value. Elements 7 to 10 contain either no data or a null value.
  3. Data is stored in all the elements. Elements 3, 8, and 9 contain a null value.
  4. Either the column value is null or all elements contain a null value.

    Figure 8-14 Row data output format with vv specified in the array statement

    [Figure]

Explanation
  1. Data is stored in all the elements. A value of 10 is output as the current number of elements.
  2. Elements 1 to 6 contain data, and element 5 contains a null value. A value of 6 is output as the current number of elements.
  3. Data is stored in all the elements. Elements 3 and 8 to 10 contain null values. A value of 10 is output as the current number of elements.
  4. Because a null value is stored in the column value, a value of 0 is output as the current number of elements.
  5. All elements contain a null value. A value of 10 is output as the current number of elements.
(b) Format used when the sup option is specified

If you specify the sup option during unload operation, the system compresses any trailing spaces in any column that is shorter than the column length specified in the table definitions and then outputs them to the unload data file. The following shows the format depending on the specification of the sup option:

Data type: CHAR or MCHAR
Example 1
  • Table definition:

CREATE TABLE T1(C1 INTEGER,C2 CHAR(10),...);

Input data: 1,A,A,...
sup option: Specified

1,"AA",...

Explanation: The system compresses the spaces (equal to eight characters) in a column that is shorter than the defined length.
sup option: Not specified

1,"AA........",...

Note: A period (.) indicates a single-byte space.
Explanation: The system outputs single-byte spaces (.) (equal to eight characters) to pad the column that is shorter than the defined length.
Example 2
  • Table definition:

CREATE TABLE T1(C1 INTEGER,C2 MCHAR(10),...1,);

Input data: 1,..........,... (a period (.) indicates a single-byte space)
sup option: Specified

1,".",...

Explanation: The system outputs one single-byte space.
sup option: Not specified

1,"..........",...

Explanation: The system outputs as many single-byte spaces as there are defined for the table.
Data type: NCHAR
Example 1
  • Table definition:

CREATE TABLE T1(C1 INTEGER,C2 NCHAR(10),...);

Input data: 1,A,A,...
sup option: Specified

1,"AA",...

Explanation: The system compresses the spaces (equal to eight characters) in a column that is shorter than the defined length.
sup option: Not specified

1,"AA[Figure]",...

Note: A triangle ([Figure]) indicates a double-byte space.
Explanation: The system outputs double-byte spaces ([Figure]) (equal to eight characters) to pad the column that is shorter than the defined length.
Example 2
  • Table definition:

CREATE TABLE T1(C1 INTEGER,C2 NCHAR(10),...1,);

Input data: 1, [Figure],... (a triangle ([Figure]) indicates a single-byte space)
sup option: Specified

1,"[Figure]",...

Explanation: The system outputs one double-byte space.
sup option: Not specified

1,"[Figure]",...

Explanation: The system outputs as many double-byte spaces as there are defined for the table.
Data type: NCHAR (spacelvl=3 specified in the option statement)
Example 1
  • Table definition:

CREATE TABLE T1(C1 INTEGER,C2 NCHAR(10),...);

Input data: 1,A,A,...
sup option: Specified

1,"AA",...

Explanation: The system compresses the spaces (equal to eight characters) in a column that is shorter than the defined length.
sup option: Not specified

1,"AA................",...

Explanation: The system outputs single-byte spaces (equal to eight characters) to pad the column that is shorter than the defined length.
Example 2
  • Table definition:

CREATE TABLE T1(C1 INTEGER,C2 NCHAR(10),...1,);

Input data: 1,[Figure],... (a triangle ([Figure]) indicates a single-byte space)
sup option: Specified

1,"..",...

Note: A period (.) indicates a single-byte space.
Explanation: The system outputs two single-byte spaces.
sup option: Not specified

1,"....................",...

Note: A period (.) indicates a single-byte space.
Explanation: The system outputs as many single-byte spaces as there are defined for the table.
(c) Differences between DAT and extended DAT formats

The following table describes the differences between the DAT and extended DAT formats:

ItemDAT formatExtended DAT format
When the data contains the null character (\0) or linefeed character (\n)The row containing the corresponding column data value is not output.The row containing the corresponding column data value is output (\0 and \n are output as is).
When the enclosing character (") is used as part of the dataThe enclosing character used as part of the corresponding column data value is output as is.Two consecutive enclosing characters are output for an enclosing character that is used as part of the column data value.#
Example: "[Figure] ""
#: Because multi-byte characters are also subject to conversion, you must pay special attention if the file contents are to be referenced, such as by an application program. Data loading can be performed because pdload converts "" to " whether or not the data contains multi-byte characters.
(d) Creating pdload unload files in extended DAT format with UTF-16 as the character encoding

If you are creating a pdload unload file in extended DAT format with UTF-16 as the character encoding and the following data is stored in the table, you must take an action when you unload the data:

Data type of columnData contents
BINARY typeData consisting of an odd number of bytes
Abstract data type that is unloaded by using a constructor parameter reverse creation function that returns BINARY type dataData whose value returned by the constructor parameter reverse creation function consists of an odd number of bytes

You can determine whether a table stores the above data by executing the SQL statement shown below. Execute the SQL statement for all BINARY type and abstract data type columns that are unloaded by using a constructor parameter reverse creation function that returns BINARY type data.

SELECT COUNT(*) FROM [authorization-identifier.]table-identifier
                           WHERE constraint-expression [OR constraint-expression ...]

constraint-expression ::= {  MOD(LENGTH(BINARY-type-column-name),2)=1
           | MOD(LENGTH(function-name(column-name[,column-name ...])),2)=1 }
function-name ::= name-of-constructor-parameter-reverse-creation-function-that-returns-BINARY-type-data

(2) Binary format

In the binary format, data is output in an internal format based on the database column definition. If this type of file is used as an input to the database load utility, the unloaded table and the data loading target table must have the same column definitions. For a non-FIX table, data cannot be loaded into a FIX table, even if the column definitions are the same.

The output format is as follows:

[Figure]

CMT header

If data is output to CMT, a 512-byte CMT header is output first.

[Figure]Row data

The output format is different for FIX tables than it is for non-FIX tables. For details about the output format (binary format) of column data for each data type, see 5.5.2 Binary format.

[Figure]For FIX tables
The system outputs data values for columns continuously without boundary adjustment. The figure below shows the output format for FIX tables.

Figure 8-15 Output format for FIX tables

[Figure]
[Figure]For non-FIX tables
The system outputs data values sequentially without adjusting boundaries. The figure below shows the output format for non-FIX tables.

Figure 8-16 Output format for non-FIX tables

[Figure]
The figure below shows the output format for the control section.

Figure 8-17 Output format for the control section of a non-FIX table

[Figure]
  • Data length section
    Contains the length from the beginning of the data length section to the end of the data section as a binary value (4 bytes). This corresponds to L in Figure 8-17 Output format for the control section of a non-FIX table.
  • Data offset section
    The data offset section is output continuously in the order of column definitions without boundary adjustment. If the target column has an abstract data type, as many offset values are output continuously without boundary adjustment as there are constructor reverse creation functions (value specified in the unld_func statement).
  • Offset value
    Contains the length from the beginning of the data length section to the beginning of the data value for each column (data value generated by the constructor parameter reverse creation function) as a binary value (4 bytes). If the data value is the null value, 0 is output.
The data section consists of the base data section, BINARY attribute data section, BINARY column data section, BLOB attribute data section, and BLOB column data section, as shown in Figure 8-16 Output format for non-FIX tables.
Base data section
Contains a data length, column's data value, and data value returned by the constructor parameter reverse creation function up to the data length (L in Figure 8-17 Output format for the control section of a non-FIX table) starting from the end of the control section (offset value for the data). If the corresponding column's offset value is 0 (null value), the base data section is not output. Note that the order of data values and the data value returned by the constructor parameter reverse creation function for each column is not guaranteed (you can reference the start location of each data value by the offset value). The table below shows the value for each column and the data value returned by the constructor parameter reverse creation function that are output to the base data section.

Table 8-3 Value output to the base data section

Column attributeColumn's data typeData type returned by constructor parameter reverse creation functionValue output to the base data section
Repetition column----#
Other columnAbstract data typeBINARY typeLength returned by the constructor parameter reverse creation function
BLOB typeLength returned by the constructor parameter reverse creation function
OtherData value returned by the constructor parameter reverse creation function
BINARY type--Length of data value for the column
BLOB type--Length of data value for the column
Other--Data value for the column
Legend: --: Not applicable
#
For a repetition column, the value consists of the number of elements and the element data and is output to the base data section continuously without boundary adjustment. The figure below shows the output format for the base data section for repetition columns.

Figure 8-18 Output format for the base data section for repetition columns

[Figure]
  • Number of elements
    Contains the number of data elements in the repetition column as a binary value (2 bytes) in the range from 1 to 30,000. If the number of elements is 0, pdload treats it as erroneous data. Therefore, specify 0 for the offset value (null value), not for the number of elements.
  • Element data
    Contains element values (null flag and data value) for individual elements continuously in the order stored in the database without boundary adjustment. If the number of elements is 0, no element data is output.
  • Element value
    Contains a binary value (1 byte) as the null flag that indicates whether the element corresponding to the first byte is the null value. If the first byte is the null value, 0x01 is output; otherwise, 0x00 is output.
    Following the null flag, as many data values as there are data value lengths are output without boundary adjustment
BINARY attribute data section and BLOB attribute data section
Contains a BINARY or BLOB data value (without the length) returned by the constructor parameter reverse creation function specified in the unld_func statement for the abstract data type when the table contains a column of an abstract data type and the data type returned by the constructor parameter reverse creation function is BINARY or BLOB. If no constructor parameter reverse creation function is specified in the unld_func statement, the BINARY or BLOB attribute data section is not output.
For the BINARY or BLOB attribute data section, as many data values as there are lengths output to the base data section are output in the order of column definitions and the order of constructor parameter reverse creation functions specified without boundary adjustment. If the value returned by the constructor parameter reverse creation function is the null value, the corresponding data value is not output.
Figure 8-19 Output format for the BINARY attribute data section (example) shows the output format for the BINARY attribute data section in the example shown below.
Example
Table definition:

CREATE TABLE T1(c1 ADT1,C2 INTEGER,C3 ADT2)

ADT1: Abstract data type (ADT1)
ADT2: Abstract data type (ADT2)
Specification of the unld_func statement:

unld_func type=ADT1,func=adt1_func1(ADT1),adt1_func2(ADT1),adt1_func3(ADT1)
unld_func type=ADT2,func=adt2_func1(ADT2),adt2_func2(ADT2),adt2_func3(ADT2)

adt1_func1/adt1_func3: Constructor parameter reverse creation functions that return data of the BINARY type
adt1_func2: Constructor parameter reverse creation function that returns data of the INTEGER type
adt2_func1/adt2_func3: Constructor parameter reverse creation functions that return data of the BINARY type
adt2_func2: Constructor parameter reverse creation function that returns data of the INTEGER type

Figure 8-19 Output format for the BINARY attribute data section (example)

[Figure]
BINARY column data section and BLOB column data section
Contains data values for a column of the BINARY or BLOB type (without the length) for a table that contains a BINARY or BLOB type column. If the table contains no column of the BINARY type, the BINARY column data section is not output. If the table contains no column of the BLOB type or the -j option is omitted from the command line, the BLOB column data section is not output.
For a column of the BINARY or BLOB type, as many data values as there are lengths output to the base data section are output in the order of column definitions without boundary adjustment. If a value for the corresponding column is the null value, that data value is not output.

(3) Fixed-size data format

In the fixed-size data format, data is output as fixed-length character string data.

(a) Output format
Legend:
--: Not applicable
#1
If an unload data file is output using a character encoding that is not HiRDB's default character encoding, its output length in bytes is doubled.
#2
The following table shows the output lengths for the character string data types:
No.Character encoding of the unload data fileColumn's character setenclose operandOutput length (bytes)
1HiRDB's default character encodingUTF-16Omitteddefinition length[Figure] 2
2Specifieddefinition length[Figure] 2 + 2
3Other (including when the specification is omitted)Omitteddefinition length
4Specifieddefinition length + 2
5OtherUTF-16Omitteddefinition length
6Specifieddefinition length + 2 [Figure] 2
7Other (including when the specification is omitted)Omitteddefinition length[Figure] 2
8Specifieddefinition length[Figure] 2 + 2 [Figure] 2
#3
If you have specified the enclose operand in the fixtext_option statement, add 2 bytes to the output length.
(b) Rules
  1. A maximum of 512 megabytes (536,870,912 bytes) can be output per row (including linefeed characters).
  2. If the table contains a column of the abstract data type, you must specify the unld_func statement.
  3. If you convert data of the FLOAT or SMALLFLT type to character string format, the data value might be rounded up. As a result, a data conversion error might occur in the database load utility (pdload). In such a case, you can load the data by specifying the -F option in pdload.
  4. In the case of a FIX table for which reserved columns are defined, the reserved columns cannot be output.