5.4.16 option statement (specification of data processing information)

The option statement specifies the optional data processing functions for data loading.

Criteria
Specify the option statement to use any of the following optional functions:
  • Specification of the space conversion level (spacelvl operand)
  • Change to the percentage of free space during data loading (tblfree operand)
  • Change to the percentage of free space during index creation (idxfree operand)
  • Data loading with the synchronization point specification (job operand)
  • Output of warning messages in the event of data truncation (cutdtmsg operand)
  • Execution of NOWAIT search on a table subject to data loading (nowait operand)
  • Specification of LOB columns or LOB parameters in the binary-format input data file generated by pdrorg (bloblimit operand)
  • Monitoring of pdload execution time (exectime operand)
  • Method for storing data when data is loaded into a table with the DEFAULT clause specified and the input data is the null value (null_string operand)
  • Specification for ignoring the storage processing up to the point where an input data error is detected (dataerr operand)
  • Specification for detecting an input data error when the input data length in the DAT-format input data file is greater than the defined column length (lengover operand)
  • Suppression of error data information (divermsg operand)
Organization of this subsection
(1) Format
(2) Explanation

(1) Format

option [spacelvl = {0|1|3}]
       [tblfree = {percentage-of-unused-space
       |([percentage-of-unused-space]
          ,percentage-of-free-pages-in-segment)}]
       [idxfree = percentage-of-unused-area]
       [job = [job-name], [rows-count-between-synchronization-points]
              [,CLR]]
       [cutdtmsg = {on|off}]
       [nowait = {yes|no}]
       [bloblimit = area-allocation-size]
       [exectime = pdload-execution-monitoring-interval]
       [null_string = {null | default}]
       [dataerr=rollback]
       [lengover=err]
       [divermsg=off]

(2) Explanation

(a) spacelvl = {0|1|3}

Specifies whether or not to execute space conversion on the input data.

Available space conversion levels are 0, 1, and 3.

When the spacelvl operand is omitted, the system executes space conversion according to the value of the pd_space_level operand in the system common definitions.

0
The system does not execute space conversion on the input data.
1 or 3
The system executes space conversion on the input data. Values of 1 and 3 have the same effect.
If the table columns subject to data loading are of the national or mixed character string type, the system converts the input data and stores the data in the database as follows:
  • National character string type
    The system converts two consecutive single-byte spaces in the input data to one double-byte space, in units of two bytes from the top.
  • Mixed character string type
    The system converts each double-byte space in the input data to two consecutive single-byte spaces.
    When the character codes are utf-8, the system converts one double-byte space (3 bytes) to two single-byte spaces. For MCHAR, the system adds trailing single-byte spaces up to the definition length. For MVARCHAR, the data length remains shortened.
If the data is to be stored via a constructor function for the columns of abstract data type and the data for the corresponding constructor function's argument has the national or mixed character string type, the system executes space conversion before passing the input data as an argument to the constructor function.
Criteria
Specify this operand to make all spaces in the table data either single- or double-byte characters.
For details about how to make all spaces in table data either single- or double-byte characters, see the HiRDB Version 8 System Operation Guide.
Note
When executing space conversion using the spacelvl operand or the system common definitions, note the following:
  1. The system checks the order of cluster key values after the data conversion; therefore, the cluster key sequence checking may result in an error even if the data was sorted by the cluster key values before conversion. In this case, specify the -x option to skip the cluster key sequence checking. To store data in the order of cluster key values, reorganize the data after data loading is completed.
  2. If the column used as the unique key index or primary key index is subject to space conversion, duplicate key values may result in the index. In this case, take one of the following actions:
    [Figure]Unique key index: Either remove the unique index specification beforehand or modify the corresponding data in the input data.
    [Figure]Primary key index: Modify the corresponding data in the input data.
  3. A comparison value for the null value specified in the column structure information file or null value/function information file is not subject to space conversion. In this case, the comparison value for the null value is compared to the input data before space conversion.
  4. The length of the input data before space conversion is used for checking the input data length.
(b) tblfree = {percentage-of-unused-space|([percentage-of-unused-space], percentage-of-free-pages-in-segment)}

Specify this operand to change the percentage of free space specified with CREATE TABLE (value of PCTFREE) for storing data during data loading.

percentage-of-unused-space
You can specify a value in the range from 0 to 99.
percentage-of-free-pages-in-segment
You can specify a value in the range of 0 to 50.
Criteria
If you have specified a non-zero value as the percentage of free space during table definition, a shortage of RDAREA may occur while data is stored because the defined percentage of free space takes effect during data loading. If you specify the tblfree operand in this case, you can complete the data loading without having to extend the RDAREA temporarily.
Note
It is more advantageous to specify this operand than the -y option to store data in free space for the following reasons:
  • Batch input/output operations take effect even when data is stored in free space.
  • You can specify p in the -l option.
(c) idxfree = percentage-of-unused-area

If you are creating indexes in batch index creation mode, specify this operand to change the percentage of unused area specified with CREATE INDEX (value of PCTFREE).

The permitted value range is from 0 to 99.

Criteria
If you have specified a non-zero value as the percentage of unused space during index definition, a shortage of RDAREA may occur while indexes are stored because the defined percentage of free area takes effect during batch index creation processing. If you specify the idxfree operand in this case, you can complete the batch index creation processing without having to extend the RDAREA temporarily.
(d) job = [job-name], [rows-count-between-synchronization-points] [, CLR]

Specify this operand to execute data loading with the synchronization point specification.

Data loading with the synchronization point specification is a method of data loading in which a transaction is settled each time the specified number of data items are stored. If an error occurs while data is stored, this method enables you to restore the database in a short period of time without having to store data from the beginning.

job-name[Figure] <alphanumerics> ((1-3))
Specify the name of the job that executes data loading with the synchronization point specification.
This job name is used during re-execution in the event the utility terminates abnormally during data loading. If this job name is the same as another pdrorg job name, re-execution results in a malfunction. Therefore, be sure to specify a unique job name.
rows-count-between-synchronization-points[Figure] ((1-1000)) <<100>>
Specify the number of data items as an interval at which a transaction is to be settled. For example, a value of 100 sets a synchronization point at every one million data items.
A small value settles a transaction after a small number of data items; therefore, rollback processing would take a short time in the event of abnormal termination. However, transaction generation and settlement and process restart occur frequently, resulting in a large overhead. On the other hand, a large value has a little effect on the performance, but the error recovery time would be long.
An error results if you specify only the number of rows between synchronization points without specifying a job name.
CLR
Specify this operand if there is no need to re-execute pdload or if pdload is to be re-executed after clearing the current synchronization point information in the data base.
If the utility terminates abnormally, the synchronization point information remains in the database. If this information is retained during the next execution, the utility may not function correctly.
Note that if you specify CLR, the system clears all synchronization point information, whether it was specified by another user or utility.
To determine if the synchronization point information is present in the database, use the database condition analysis utility (pddbst) to execute condition analysis in units of tables.
Criterion
This operand is useful for loading a large amount of data that will take a considerable amount of time to store the data. However, this has adverse effects on performance due to synchronization point processing. Additionally, more data pages are required than in normal data loading because data storage begins on a new page at each synchronization point.
Example of job operand specification
  • Loading data with a synchronization point specified for every one million data items
    job=JOB,100 or job=JOB
  • Clearing the current synchronization point information and specifying new synchronization points at every one million data items to execute data loading
    job=JOB,100,CLR or job=JOB,,CLR
  • Clearing the current synchronization point information and executing normal data loading
    job=,,CLR
Notes
  1. Synchronization point information is managed by the job name. Suppose that pdload(A) that was executing data loading with the synchronization point specification terminates abnormally. If you execute pdrorg(B) specifying the same job name for the same table, pdrorg(B) uses the synchronization point information that was left by pdload(A). Therefore, the job names must be controlled so that they are not duplicated.
  2. If data loading with the synchronization point specification terminates abnormally, you cannot execute the normal data loading (without the synchronization point specification) on the same table until you re-execute the data loading with the synchronization point specification having the same job name.
  3. The synchronization point information is managed in units of RDAREAs. For a row-partitioned table, inconsistency occurs on the synchronization point information for the entire table in the following cases:
    [Figure]Only some of the RDAREAs were re-initialized.
    [Figure]Error occurred during data loading with the synchronization point specification in units of RDAREAs.
    If the job name is the same, but the number of lines does not match, you cannot use the synchronization point information managed by each RDAREA for re-execution. If some of the RDAREAs have synchronization point information while the others do not, you can re-execute data loading using the retained synchronization point information. The following shows how the system handles data loading when the job name for synchronization point information is the same but the number of lines does not match:
    Retained synchronization point information (number of lines between synchronization points)Input begin line for data loading with synchronization point specification in units of tables
    RDAREA 1RDAREA 2RDAREA 3
    000Item 0
    0n0Item n
    n0mCannot be executed.
    nmpCannot be executed.
    0: No synchronization point information retained
    n, m, p: Retained number of lines between synchronization points:
  4. If data loading with the synchronization point specification terminates abnormally, you cannot modify the index creation method during re-execution (-i option).
  5. If EasyMT is remounted before data loading with synchronization point specification terminates abnormally, mount volume 1 again during the re-execution.
  6. If data loading with the synchronization point specification terminates abnormally, and if you re-initialize RDAREAs or execute the PURGE TABLE statement before re-execution, the existing synchronization point information is deleted.
  7. If data loading with the synchronization point specification terminates abnormally during the batch creation of a plug-in index, and the corresponding plug-in index supports the batch plug-in index creation partial recovery facility, execute pdrorg to create the plug-in index that resulted in an error, then re-execute pdload. If you re-execute pdload without executing pdrorg, the entire plug-in index is re-created.
  8. The maximum number of data items that can be controlled during data loading with the synchronization point specification is 4,294,960,000. To load more data items than this, use multiple input data files and execute pdload more than once.
(e) cutdtmsg = {on|off}

If you are storing character string data from the DAT-format input data file into the columns with the following data types during data loading, and the input data is longer than the defined column length, the excess part of the data is discarded:

This operand specifies whether or not to output a warning message (KFPL31090-W) to the error information file when this data truncation occurs.

on: Output warning messages.

off: Do not output warning messages.

Rules
  1. If the -e option is specified, the system assumes cutdtmsg=off.
  2. For an input data file other than in the DAT format, this operand, if specified, is ignored.
  3. If cutdtmsg=on is specified and data truncation occurs, pdload returns code 4, even when all data has been stored.
  4. A warning message is output for each data item resulting in truncation (each column and element), not for each line. Therefore, if you are specifying cutdtmsg=on, check the disk space, because many warning messages may be output.
  5. The system outputs warning messages to the error information file, but it does not output the corresponding input data to the error data file.
(f) nowait = {yes|no}

Specifies whether or not a NOWAIT search is to be conducted on the table subject to data loading. The NOWAIT search is an SQL function for executing a search process with the NOWAIT lock option specified.

yes: Execute NOWAIT search.

no: Do not execute NOWAIT search.

Rules for nowait=yes
  1. You cannot specify the batch index creation mode (-i c). Specify the index update mode (-i s).
  2. You cannot use the local buffer (-n option).
  3. The system does not support data loading in units of RDAREAs.
  4. In the creation mode (-d option specified), you cannot execute a NOWAIT search on the table while data is being deleted (if executed, processing is placed in lock-release wait status). You can execute a NOWAIT search while data is being stored.
  5. For a table with columns of abstract data type provided by a plug-in, you can execute a NOWAIT search if the plug-in provides the no-lock search facility. To see if the plug-in provides the no-lock search facility, see the applicable plug-in manual.
  6. When loading data only to the LOB column of a table, you can reference the LOB column structure base table but not the LOB columns, regardless of this operand specification. To reference the LOB columns, execute data loading on both the LOB column structure base table and LOB columns at the same time.
  7. In the event of a pdload rollback due to abnormal termination, segments will not be released once they are allocated during data loading. These segments are reused by pdload during re-execution, but not by a UAP's INSERT or UPDATE statement. Therefore, if you execute a UAP's INSERT or UPDATE statement prior to re-execution of pdload, the segments that are not reusable by pdload or UAP will be retained until the corresponding RDAREA runs out of all unused segments. To release such segments, you need to reorganize the table with pdrorg. To avoid this, place the RDAREA in reference-possible shutdown status, then execute pdload. (No UAP will be able to access the table even if pdload terminates abnormally.)
  8. If specified, yes is ignored for a shared table (no is assumed).
(g) bloblimit = area-allocation-size

[Figure]<unsigned integer> ((1-2,097,152)) <<1024>>

If you are executing the following type of data conversion using a pdrorg-generated binary-format input data file, you may need to retain data in memory:

In this case, specify the size of the memory area to retain data in KB. The system compares this operand value with the BLOB parameter or the defined length for BLOB type and allocates the area based on the smaller size.

Criteria
Specify this operand if either one of the following conditions is true:
  • When converting data for a BLOB parameter with abstract data type to a BLOB-type column, the data for the BLOB parameter is followed by the data to be stored in the abstract data type column with BLOB parameter.
  • When converting BLOB data to an abstract data type column with BLOB parameter, the BLOB data is preceded by the data to be stored in the BLOB-type column.
Rules
  1. This operand takes effect only when the -W option is specified.
  2. If there is a data item that exceeds the specified size, an error results.
(h) exectime=pdload-execution-monitoring-interval

[Figure]<unsigned integer> ((1-35791394)) <<0>>

Specifies an interval in minutes for monitoring the pdload execution time. If this operand is omitted, pdload does not monitor the execution time. If pdload does not terminate within the specified interval, the system terminates the pdload process forcibly and acquires error information in order to determine the cause of the no-response.

This operand's value takes precedence over the value of the pd_utl_exec_time operand in the system definition.

Criterion
If you specify a monitoring interval for processing such as nighttime batch processing, the processing will terminate abnormally if an error such as a communication error (including a transient error) or a disk error occurs during execution of pdload and pdload is placed in no-response status. This enables you to detect the abnormality and start recovery processing at an early stage.
Guidelines for the specification value
The purpose of this operand is to detect a no-response error, not to monitor the execution time of a long-running transaction. Therefore, the operand value must be large enough for the applicable table processing. For example, to monitor the execution time of a pdload that should terminate within 7- 8 minutes, specify exectime=20, not exectime=10. When you are executing data loading in the addition mode, specify a larger value because the execution time will vary depending on the number of base data items even when there is a predefined number of data items in the input data file (in the case of a table whose data increases monotonously, you should check and, if necessary, revise the specified value).
(i) null_string={null|default}

Specifies whether the default value set in the DEFAULT clause or the null value is to be stored when the input data is the null value ("*" or omitted) during data loading on a table with the DEFAULT clause specified.

This operand is applicable to a DAT-format input data file. An error results if this operand is specified for an input data file in the binary, fixed-length data, or pdrorg-generated binary format.

null: Store the null value.
default: Store the default value set in the DEFAULT clause.
(j) dataerr=rollback

Specifies that data storage processing (rollback) is to be ignored when an input data error (logical error) is detected.

Table 5-7 shows the relationship between the -e and dataerr operands. Table 5-8 provides notes about specifying the dataerr operand together with other options and control statements.

Table 5-7 Relationship between the -e and dataerr operands

-e optiondataerr operand in the option statementpdload operation when input data error is detectedTransaction settlement methodpdload's return code
OmittedOmittedSkips database storage processing for the erroneous data and resumes processing.Commit4
SpecifiedCancels processing when the erroneous data is detected.Rollback8
SpecifiedOmittedCancels processing when the erroneous data is detected.Commit4
SpecifiedCancels processing when the erroneous data is detected.Rollback8

Table 5-8 Notes about specifying the dataerr operand together with other operands and control statements

Option and control statementNote about specifying together with the dataerr operand
-dA control statement error occurs.
-lSpecify either a or p.
-iIf all the following conditions are satisfied, index unfinished status results after rollback; therefore, specifying this option together with the dataerr operand results in a control statement error:
  • Loading of data is in units of RDAREAs
  • There is a non-partitioning key index
  • The value of the -i option is not s
Consider changing the pdload options, control statements, or index definition so that not all the above conditions are satisfied.
If a key duplication error may occur during data loading, specify s. If any other value is specified, data stored in the table will not be rolled back even if a key duplication error is detected.
-kIf f, v, or c is specified, the unneeded LOB middle file remains after rollback. Before re-executing pdload, delete this LOB middle file.
If f, v, or c is specified and an error is detected during data loading on a LOB column, the LOB column structure base table is committed and the LOB column is rolled back, regardless of the dataerr operand specification. In this case, correct the LOB data and re-execute data loading.
-eRegardless of the option specification, processing is cancelled and then rolled back when erroneous data is detected.
Other optionNone
job operand in the option statementA control statement error occurs.
RDAREA name in the source statementIf all the following conditions are satisfied, the index unfinished status results after rollback; therefore, specifying this option together with the dataerr operand results in a control statement error:
  • Loading of data is in units of RDAREAs
  • There is a non-partitioning key index
  • The value of the -i option is not s
Consider changing the pdload options, control statements, or index definition so that not all the above conditions are satisfied.
error operand in the source statementInformation about the detected error is output and then processing is rolled back.
errdata operand in the source statementA control statement error occurs.
Other control statementsNone
(k) lengover=err

Specifies that an input data error is to be detected when the input data in a DAT-format (including extended DAT format) input data file that is to be stored in any of the following data-type columns is longer than the defined column length:

Table 5-9 shows the relationships among the input data length, defined column length, and the lengover operand. Table 5-10 provides notes about specifying the lengover operand together with other options and control statements.

Table 5-9 Relationships among input data length, defined column length, and lengover operand

Relationship between input data length and defined column lengthlengover omittedlengover specified
m=nStores the input data as is.
m<nDiscards the trailing end of the input data beginning at location m + 1 and then stores the remaining part of the input data.Sets an input data error.
m>n
CHAR, NCHAR, and MCHAR:
Pads the trailing end of the input data with spaces beginning at location n + 1 and then stores the input data.
VARCHAR, NVARCHAR, MVARCHAR, and BINARY:
Stores the input data as is.

Table 5-10 Notes about specifying the lengover operand together with other operands and control statements

Option and control statementNote about specifying together with the lengover operand
-a, -bA control statement error occurs because the lengover operand is applicable to the DAT format.
Other optionNone
cutdtmsg operand in the option statementThe KFPL31090-E message is displayed even when the cutdtmsg operand is not specified.
dataerr operand in the option statementAn input data error subject to rollback is detected.
extdat statementThe extended DAT-format input data files are also supported.
error operand in the source statementThe number of times the KFPL31090-W/E message is displayed changes as follows:
When the lengover operand is specified:
The message is displayed only for the first column in which an invalid data length is detected.
When the lengover operand is omitted:
The message is displayed for all columns in which an invalid data length is detected.
errdata operand in the source statementThe applicable data is output to the error data file.
Other control statementsNone
(l) divermsg=off

Specifies suppression of a type of error data information. When data loading is performed in units of RDAREAs on a table whose partition storage condition have been changed, and the input data contains row data that does not satisfy the RDAREA storage conditions, the utility displays error data information. This operand is used to suppress output of this error data information.

This operand is applicable if the table subject to data loading is a row-partitioned table and data loading is performed in units of RDAREAs. If this operand is specified under any other conditions, it is ignored.

Table 5-11 shows the relationship between the divermsg operand and each item when the input data contains row data that does not satisfy the partition storage conditions.

Table 5-11 Relationship between the divermsg operand and each item when input data contains row data that does not satisfy the partition storage conditions

Itemdivermsg operand
SpecifiedOmitted
pdload's return code0*4
Error information fileKFPL31009-E message is not output.KFPL31009-E message is output.
Error data fileError data is not output.Error data is output.
-e optionDisabledEnabled
* In the event of a data format error, the return code is 4.