5.1.1 What is data loading?

The database load utility (pdload) reads user-provided data and stores it in a table. This operation is called data loading. The following figure provides an overview of data loading.

Figure 5-1 Overview of data loading

[Figure]

Organization of this subsection
(1) Input data file
(2) Control information file
(3) Table
(4) Data loading method by table attribute

(1) Input data file

An input data file contains user-provided data. The database load utility supports four input data file formats. In most cases, one of the following two formats is used:

DAT format
In the DAT format, column data is written as character strings. This format is generally referred to as the CSV format.
  • pdload converts column data to the internal HiRDB format before storing it; therefore, the DAT format is suitable for creating a table from data imported from a non-HiRDB system.
Example of data in the DAT format:

Jones,36,1958-10-15,Chicago
           .
           .

There is also an extended DAT format. It is basically the same as the DAT format, but it supports extended functions, such as changing the enclosing characters.
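
To illustrate what "column data written as character strings" means in practice, the following is a minimal Python sketch that splits the example DAT line into typed values. It is not how pdload works internally. The function name parse_dat_line and the assumed column types (name, age, date, city) are hypothetical and chosen only to match the example line.

```python
import csv
from datetime import date
from io import StringIO


def parse_dat_line(line):
    """Split one comma-separated line and convert each field to a typed value.

    The four-column layout (text, integer, ISO date, text) is an assumption
    made for this illustration.
    """
    name, age, birth, city = next(csv.reader(StringIO(line)))
    return name, int(age), date.fromisoformat(birth), city


row = parse_dat_line("Jones,36,1958-10-15,Chicago")
print(row)
```

The conversion step (character strings to typed values) is the kind of work that a character-based format requires before the data can be stored.
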
Binary format
In the binary format, column data is stored in the internal HiRDB format.
  • The binary format is excellent in terms of performance, because it matches the internal HiRDB format, thereby requiring no format conversion. This format is suitable when high performance is required, such as for storing a large amount of data.
Example of data in the binary format:

928691ba814081408140000000243f8000000008796f6b6f68616d61
<------------------><------><------><------------------>
      Jones           36       1         Chicago
        :

Note
The upper row shows the input data in hexadecimal, and the lower row shows the contents of each data item.
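
As a rough illustration of why binary data needs no character-to-number conversion, the following Python sketch reads the numeric fields directly from the hexadecimal string in the example. The field layout used here is an assumption inferred from the digit groupings (10 bytes of character data, a 4-byte big-endian integer, a 4-byte big-endian IEEE 754 float, and a 2-byte length followed by character data). The actual internal format depends on the column data types and the platform. The character fields are left undecoded because their encoding is not stated.

```python
import struct

# Hexadecimal string copied from the example of data in the binary format.
record = bytes.fromhex(
    "928691ba814081408140000000243f8000000008796f6b6f68616d61"
)

# Assumed layout (not stated in the text):
#   bytes 0-9    character data
#   bytes 10-13  4-byte big-endian integer
#   bytes 14-17  4-byte big-endian IEEE 754 float
#   bytes 18-19  2-byte length of the character data that follows
first_field = record[:10]
age, third_field = struct.unpack_from(">if", record, 10)
(text_len,) = struct.unpack_from(">H", record, 18)
last_field = record[20:20 + text_len]

print(age, third_field, text_len)
```

Under these assumptions, the two numeric values come out as 36 and 1.0, which agrees with the contents shown in the lower row of the example.
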

You can also use the following two formats: a format in which each line contains fixed-size data, and a format that is output by pdrorg.

Fixed-size data format
In the fixed-size data format, all lines have the same length and all the data items in a column begin at the same location (the same offset from the beginning of the line). Input data can be specified either in the DAT format or in the binary format.
  • The fixed-size data format is suitable for a table created from text data that is not delimited by separator characters, or created from binary-format data with a data storage sequence that needs to be changed.
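
The idea of "same length, same offset" can be sketched in Python as follows. This is only an illustration of the concept, not pdload syntax. The column names, offsets, lengths, and the second sample row are hypothetical.

```python
# Hypothetical layout: (column name, start offset, length in characters).
LAYOUT = [("name", 0, 10), ("age", 10, 3), ("city", 13, 10)]


def split_fixed(line, layout=LAYOUT):
    """Cut one fixed-size line into columns by offset, with no separators."""
    return {
        name: line[start:start + length].strip()
        for name, start, length in layout
    }


# Build sample lines so that every line has the same length (23 characters).
lines = [
    f"{'Jones':<10}{36:>3}{'Chicago':<10}",
    f"{'Smith':<10}{41:>3}{'Boston':<10}",  # made-up row for illustration
]

records = [split_fixed(line) for line in lines]
print(records)
```

Because every column is located by its offset rather than by a separator character, the order in which columns are read can be changed simply by changing the layout description.
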
pdrorg-output binary format
The pdrorg-output binary format is the format of unload data files that pdrorg outputs in binary format.
  • The pdrorg-output binary format is used to migrate data from another HiRDB system.

(2) Control information file

A control information file contains the pdload control statements.

These control statements specify the input data file, index information, LOB column information, the file output destination directory, and other information. If you do not specify a file output destination directory in the control statements, files are output to the directory described in 5.12(5) Directory to be used when no file output destination directory is specified in the control statements.

(3) Table

You need to define a table before you can perform data loading.

(4) Data loading method by table attribute

(a) Index defined for the table

You can create an index at the same time as data loading. Alternatively, you can output only the index information during data loading and use pdrorg to create the index later.

(b) Row-partitioned table

You can execute data loading either for the entire table or for individual RDAREAs.

(c) Audit trail table

If you use the security audit facility, you can load an audit trail from the audit trail files into an audit trail table.

Use the srcuoc statement in the control information file to specify whether data loading is to be performed on a regular table (a table defined by the user) or on an audit trail table. For details about the srcuoc statement, see 5.4.11 srcuoc statement (specification of UOC storage library information).