5.1.1 What is data loading?
(1) Input data file
An input data file contains the user-provided data to be loaded. The database load utility (pdload) supports four input file formats. In most cases, one of the following two formats is used:
- DAT format
- In the DAT format, column data is written as character strings. This format is commonly known as the CSV format.
- pdload converts column data to the internal HiRDB format before storing it; therefore, the DAT format is suitable for creating a table from data imported from a non-HiRDB system.
- Example of data in the DAT format:
Jones,36,1958-10-15,Chicago
:
- There is also an extended DAT format. It is basically the same as the DAT format but supports extended functions, such as changing the enclosing characters.
- Binary format
- In the binary format, column data is stored in the internal HiRDB format.
- The binary format offers high load performance because the data already matches the internal HiRDB format and requires no format conversion. This format is suitable when high performance is required, such as when storing a large amount of data.
- Example of data in the binary format:
928691ba814081408140000000243f8000000008796f6b6f68616d61
<------------------><------><------><------------------>
Jones               36      1       Chicago
:
- Note
- The upper row shows the input data in hexadecimal, and the lower row shows the contents of each field.
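The difference between the two formats can be illustrated with a short Python sketch. This is not part of pdload and does not reproduce its conversion logic. The column types of the DAT line (name, integer, date, city) and the binary field layout (a 10-byte fixed-length character field, a 4-byte big-endian integer, a 4-byte big-endian float, and a 2-byte length prefix followed by character data) are assumptions inferred from the sample data above, and the function names are hypothetical. The trailing bytes of the binary sample are ASCII for `yokohama`, so the decoded text need not match the illustrative labels in the lower row.

```python
import csv
import io
import struct
from datetime import date


def parse_dat_line(line):
    """Split one DAT (CSV-style) line and convert each column from text.

    Assumed columns: name, integer, ISO date, city.
    """
    name, number, day, city = next(csv.reader(io.StringIO(line)))
    return name, int(number), date.fromisoformat(day), city


# Assumed layout of the fixed part of the binary record (no padding):
# 10-byte character field, 4-byte int, 4-byte float, 2-byte length prefix.
FIXED_PART = ">10sifH"


def decode_binary_record(raw):
    """Unpack one binary record; values are read as-is, with no text parsing."""
    name_bytes, number, ratio, length = struct.unpack_from(FIXED_PART, raw, 0)
    start = struct.calcsize(FIXED_PART)
    text = raw[start:start + length].decode("ascii")
    return name_bytes, number, ratio, text


dat_row = parse_dat_line("Jones,36,1958-10-15,Chicago")

sample = bytes.fromhex(
    "928691ba814081408140000000243f8000000008796f6b6f68616d61"
)
bin_row = decode_binary_record(sample)
```

In the DAT case every column has to be converted from text, whereas in the binary case the numeric fields are already in their stored representation. This is the reason the binary format avoids conversion cost during loading.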
In addition, you can use the following two formats: a format in which each line contains fixed-size data, and a format that is output by pdrorg:
- Fixed-size data format
- In the fixed-size data format, all lines have the same length and all the data items in a column begin at the same location (the same offset from the beginning of the line). Input data can be specified either in the DAT format or in the binary format.
- The fixed-size data format is suitable for a table created from text data that is not delimited by separator characters, or created from binary-format data with a data storage sequence that needs to be changed.
- pdrorg-generated binary format
- This is an unload data file that is output by pdrorg with the -W option specified.
- The pdrorg-generated binary format is used to migrate data from another HiRDB system.
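As a rough illustration of the fixed-size data format, the following Python sketch extracts columns from a text line purely by offset and length, with no separator characters. The column names, offsets, and lengths are hypothetical and chosen only for this example. They are not taken from pdload or from any HiRDB table definition.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str    # hypothetical column name
    start: int   # offset from the beginning of the line
    length: int  # fixed width of the column


# Hypothetical layout: every line is 23 characters long.
LAYOUT = [
    Field("name", 0, 10),
    Field("age", 10, 3),
    Field("city", 13, 10),
]


def parse_fixed_line(line, layout=LAYOUT):
    """Slice each column out of the line by its fixed position."""
    return {
        f.name: line[f.start:f.start + f.length].strip()
        for f in layout
    }


# Build a sample line with explicit padding so the widths are exact.
line = "Jones".ljust(10) + "36".rjust(3) + "Chicago".ljust(10)
row = parse_fixed_line(line)
```

Because each column is located by position, the same layout table can also be used to read the columns in a different order from the one in which they are stored.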
(2) Control information file
A control information file contains the pdload control statements. These control statements specify the input data file, index information, LOB column information, and other information.
(3) Table
You need to define a table before you can perform data loading.
(4) Data loading method by table attribute
(a) Index defined for the table
You can create an index at the same time as data loading. Alternatively, you can output only the index information during data loading and use pdrorg to create the index later.
(b) Row-partitioned table
You can load data either for the entire table or for individual RDAREAs.