10.2.2 Flow of data processing during file input

The following figure shows the flow and details of data processing during file input by the input adaptor.

Figure 10-2 Flow and details of data processing during file input

[Figure]

For details about the data formats handled by an input adaptor, see (1) Data formats handled by an input adaptor. For details about the processing, see the subsections beginning with (2) File input by a file input connector.

Organization of this subsection
(1) Data formats handled by an input adaptor
(2) File input by a file input connector
(3) Format conversion
(4) Mapping
(5) Tuple transmission

(1) Data formats handled by an input adaptor

The data formats handled by an input adaptor are the input record and the common record. These data formats are discussed below.

Input record
An input record is a row of data acquired from an input file. The input adaptor handles one row of data as one input record.
Common record
A common record is a set of data items consisting of multiple fields (field names and field values). A field name is a name assigned to a data segment (field) in the input record, and a field value is the value of the data segment.
A common record is in a standard data format that is handled internally by the input and output adaptors.
The standard adaptors manage multiple sets of a field name and a field value in the common record as a record structure and use a record name to identify each record structure.

Figure 10-3 Structure of a common record

[Figure]
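
As a rough illustration only, a common record can be pictured as a record name plus a set of field name and field value pairs. The class name `CommonRecord`, its attributes, and the sample values below are hypothetical and are not part of the product; this is a sketch of the data shape, not the adaptor's internal representation.

```python
from dataclasses import dataclass, field


@dataclass
class CommonRecord:
    """Hypothetical model of a common record (not the product's API)."""
    record_name: str                             # identifies the record structure
    fields: dict = field(default_factory=dict)   # field name -> field value

    def field_names(self):
        """Return the field names in the order they were stored."""
        return list(self.fields)


# Invented sample values, for illustration only.
record = CommonRecord(
    "R1",
    {"time": "09:00:00", "method": "GET", "url": "/index.html"},
)
```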

(2) File input by a file input connector

You use the file input connector definition in the adaptor configuration definition file to define information about file input.

A file input connector reads data in rows from one or more files stored in the input file storage directory and converts the data into input records. One row of data read by the file input connector becomes one input record.

An input adaptor can read multiple input records at a time. The number of input records that the file input connector reads at one time equals the number of records that the format conversion, mapping, and tuple transmission operations can process at one time.

This subsection discusses the types and structures of input files that are read by a file input connector.

File types
An input file to be read by a file input connector must be a text file consisting of character data only. The records can be of variable length.
File structure
The input files that are read by a file input connector have one of the structures described below.
| File structure | Description | Prerequisite |
|---|---|---|
| Wraparound | Data is written to the files sequentially in the order the files were defined; there is a fixed number of input files. When all files are filled, data is written to the first file again. | The order of the files to which data is to be written is predetermined, and data is always written to the files in that order. To write data to a file that has become full, the file is first cleared of its data and then new data is written to it. |
| Non-wraparound | Data is written to the files in the order the files were defined; there is no fixed number of input files. | The order of the files to which data is to be written is predetermined, and data is always written to the files in that order. The order of record and file creation must be chronological. |
File name
A file to be read by a file input connector is specified by its file name or sequence number in the name attribute in the file tag in the file input connector definition. For details about the name attribute, see 9.10.1 File input connector definition.
Order in which files are read
A file input connector reads files either in the order in which the file names were specified in the name attribute in the file tag in the file input connector definition, or in the order of the input files' update times. You use the readOrder attribute in the input tag to specify which of these orders is used.
Note that if multiple input files have the same update time, the order in which those files are read cannot be predicted.
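
The two read orders can be sketched as follows. This is a minimal illustration, not the connector's implementation: the function name `order_files` and the mode names `"defined"` and `"update_time"` are hypothetical stand-ins and are not the actual values of the readOrder attribute (see 9.10.1 File input connector definition for those).

```python
def order_files(files, read_order):
    """Return file names in the order they would be read.

    files: list of (file_name, update_time) pairs, listed in definition order.
    read_order: hypothetical mode name, "defined" or "update_time".
    """
    if read_order == "defined":
        # Keep the order in which the file names were specified.
        return [name for name, _ in files]
    if read_order == "update_time":
        # Oldest update time first. Python's sort is stable, so ties keep
        # definition order in this sketch; the real connector makes no such
        # guarantee for files with the same update time.
        return [name for name, _ in sorted(files, key=lambda f: f[1])]
    raise ValueError(f"unknown read order: {read_order}")


# Invented file names and update times, for illustration only.
sample_files = [("b.log", 30), ("a.log", 10), ("c.log", 20)]
```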
File read processing modes
The two modes of reading files by a file input connector are the batch processing mode and the real-time processing mode. The table below describes these processing modes. In both modes, the input adaptor stops automatically when read processing is completed.
| Processing mode | Details of read processing: reading method | Details of read processing: read processing completion condition |
|---|---|---|
| Batch processing mode | When the input adaptor starts, the input file storage directory is checked for any input files. If there are any input files, they are read. | Read processing is completed when either of the following conditions is satisfied: (a) When the input adaptor starts, there are no unread files in the input file storage directory. (b) All files specified in the definition have been read. |
| Real-time processing mode | The input file storage directory is monitored periodically at a specified interval and input files are read whenever they are detected. The input file storage directory monitoring interval and a monitoring count are specified in the input tag in the file input connector definition. When the specified monitoring count is exceeded, a warning message is issued and processing resumes. | Read processing is completed when either of the following conditions is satisfied: (a) When input files are specified by sequence number: The file with the last sequence number has been read. (b) When input files are specified by file name: All files specified in the definition have been read. |
If an input file is copied or moved to the input file storage directory while the file input connector is running, the file input connector goes into standby for one second and then resumes file read processing. If the file input connector cannot resume file read processing after five consecutive standby operations, it issues the KFSP46200-E message.
If a new input file is created or data is added to an existing input file in the input file storage directory while the file input connector is running, the newly created file or the added records are not read.
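
The standby behavior described above (a one-second standby, repeated up to five consecutive times) can be sketched as a simple retry loop. This is an assumption-laden sketch rather than the connector's actual logic: `read_with_standby` and the `try_read` callable are hypothetical, and issuing the KFSP46200-E message is represented only by the `False` return value.

```python
import time

STANDBY_SECONDS = 1   # from the text: each standby lasts one second
MAX_STANDBYS = 5      # from the text: five consecutive standby operations


def read_with_standby(try_read, sleep=time.sleep):
    """Stand by, then retry the read, up to MAX_STANDBYS times.

    try_read: hypothetical callable that returns True once file read
              processing can resume.
    sleep:    injectable so the sketch can be exercised without waiting.
    Returns True if reading resumed, or False at the point where the
    connector would issue the KFSP46200-E message.
    """
    for _ in range(MAX_STANDBYS):
        sleep(STANDBY_SECONDS)
        if try_read():
            return True
    return False
```

Passing a fake `sleep` function, as in a unit test, lets you check the retry count without real delays.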

(3) Format conversion

You define information about format conversion in the format conversion definition in the adaptor configuration definition file.

Format conversion involves segmenting an input record into fields and then converting the fields into a common record.

The figure below shows an example of format conversion from input record to common record. In this example, the input record consists of three fields delimited by single-byte spaces; field 1 is converted to the TIME type, and fields 2 and 3 are converted to the character string (STRING) type.

Figure 10-4 Example of format conversion from input record to common record

[Figure]

The table below shows the structure of the common record in the above example and the tags that are specified in the format conversion definition.

| Record structure | Tag |
|---|---|
| Record name: R1; Record structure: ($_time)␣($_method)␣($_url) | record tag (record definition) |
| Field name: time, Type: TIME | field tag (field definition) |
| Field name: method, Type: STRING | field tag (field definition) |
| Field name: url, Type: STRING | field tag (field definition) |

Legend:
␣: Single-byte space

For details about the data types that can be converted and the settings for the structure of common records, see 9.11.1 Format conversion definition.
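
To make the example concrete, the following sketch segments a space-delimited input record into the three fields time, method, and url and builds a dictionary standing in for common record R1. The function name `convert`, the `HH:MM:SS` time format, and the sample input are assumptions made for illustration; the actual TIME formats and conversion rules are those given in 9.11.1 Format conversion definition.

```python
from datetime import datetime


def convert(input_record):
    """Segment one input record and convert its fields (illustrative only)."""
    # Three fields delimited by single-byte spaces; any other field count
    # raises ValueError when unpacking.
    time_text, method, url = input_record.split(" ")
    return {
        "record_name": "R1",
        # Assumed time format HH:MM:SS; stands in for the TIME type.
        "time": datetime.strptime(time_text, "%H:%M:%S").time(),
        "method": method,   # STRING
        "url": url,         # STRING
    }


# Invented sample input record.
common_record = convert("09:00:00 GET /index.html")
```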

Format conversion enables you to define multiple record structures. When multiple record structures are defined, the input adaptor automatically selects the record structure that matches each input record and performs format conversion.

When multiple record structures are defined, the input adaptor uses the following procedure to select the appropriate record structure:

  1. The input adaptor compares the structure of an input record read by the file input connector against the record structures specified in the record tag in the format conversion definition, in the order in which the record structures were defined. The input adaptor selects the first record structure that matches the input record.
  2. If no matching record structure is found, the input adaptor discards the input record. In such a case, the input adaptor does one of the following, as specified in the unmatchedFormat tag in the format conversion definition:
    • Resumes processing.
    • Issues a warning message and resumes processing.
    • Issues an error message and terminates the input adaptor.

Reference note
When format conversion is completed, you can perform record filtering, record extraction, and mapping between records, as necessary (in any order).
For details about record filtering, see 10.4 Record filtering. For details about record extraction, see 10.5 Record extraction. For details about mapping between records, see (4) Mapping.
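
The record structure selection procedure (first match in definition order, with a configurable reaction when nothing matches) can be sketched as follows. Everything named here is hypothetical: the regular expressions stand in for record structures, record structure R2 is invented, and the policy names `"ignore"`, `"warn"`, and `"error"` are placeholders, not the actual values of the unmatchedFormat tag.

```python
import re
import warnings

# Hypothetical record structures, listed in definition order.
STRUCTURES = [
    ("R1", re.compile(r"(?P<time>\S+) (?P<method>\S+) (?P<url>\S+)")),
    ("R2", re.compile(r"(?P<time>\S+) (?P<message>\S+)")),
]


def select_structure(input_record, on_unmatched="ignore"):
    """Return (record_name, fields) for the first matching structure.

    Returns None when the input record is discarded because no
    record structure matches it.
    """
    for record_name, pattern in STRUCTURES:
        match = pattern.fullmatch(input_record)
        if match:
            return record_name, match.groupdict()
    # No matching structure: the input record is discarded.
    if on_unmatched == "warn":
        warnings.warn(f"unmatched input record discarded: {input_record!r}")
    elif on_unmatched == "error":
        raise RuntimeError(f"unmatched input record: {input_record!r}")
    return None
```

Because the structures are tried in definition order, a more specific structure should be defined before a more general one that would also match the same input record.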

(4) Mapping

You specify information about mapping in the mapping definition in the adaptor configuration definition file.

The two types of mapping are mapping between record and stream and mapping between records. The table below provides an overview of these types of mapping.

Table 10-2 Overview of mapping

| No. | Type of mapping | Description |
|---|---|---|
| 1 | Mapping between record and stream | A common record output by the callback before mapping (mapping source) is associated with a common record based on the input stream format (mapping target). Mapping between record and stream is always performed before tuple transmission. |
| 2 | Mapping between records | A common record output by the callback before mapping (mapping source) is edited and converted to a target common record. If necessary, mapping between records is performed after format conversion, but before mapping between record and stream. You can use this type of mapping to change field names in the source common record or to delete fields that are not needed for the next callback processing. You can also use built-in functions# to obtain character strings and time values from source common records and apply them to target common records. You can specify multiple definitions for mapping between records. |

#
You use the function attribute in the map tag in the adaptor configuration definition file to specify the built-in functions that can be used for mapping between records. For details about the built-in functions that can be specified in the function attribute, see 9.11.2 Mapping definition.

The figure below shows an example of mapping between record and stream by an input adaptor. This example maps the fields time and url, which are required for input stream s1, to the schema of the input stream and then converts them to a common record (mapping target).

Figure 10-5 Example of mapping between record and stream by an input adaptor

[Figure]
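
Conceptually, mapping between record and stream keeps only the fields that the input stream's schema requires, in schema order. The sketch below assumes a hypothetical `STREAM_SCHEMAS` table and `map_to_stream` function; in the product, this association is declared in the mapping definition rather than written as code.

```python
# Hypothetical schema table: input stream name -> required field names.
STREAM_SCHEMAS = {"s1": ["time", "url"]}


def map_to_stream(source_record, stream_name):
    """Build the mapping-target record from the mapping-source record."""
    schema = STREAM_SCHEMAS[stream_name]
    # Copy only the fields named in the stream schema, in schema order.
    return {name: source_record[name] for name in schema}


# Invented mapping-source values, for illustration only.
source = {"time": "09:00:00", "method": "GET", "url": "/index.html"}
target = map_to_stream(source, "s1")
```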

(5) Tuple transmission

You define information about tuple transmission in the input stream definition in the adaptor configuration definition file.

The common records are converted to tuples based on the mapping results, and the tuples are then sent to the input stream according to the input stream definition.

The figure below shows an example of a tuple transmission from the input adaptor to the input stream. This example sends tuples to input stream s1.

Figure 10-6 Example of tuple transmission from the input adaptor

[Figure]
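
As a final illustration, tuple transmission can be pictured as turning each mapped common record into a tuple ordered by the stream schema and appending it to the input stream. The `deque` below is only a stand-in for input stream s1, and `send_tuples` is a hypothetical helper, not a product API.

```python
from collections import deque

# Stand-in for input stream s1 (assumption for illustration).
input_stream_s1 = deque()


def send_tuples(mapped_records, schema, stream):
    """Convert mapped common records to tuples and send them to the stream."""
    for record in mapped_records:
        # Field values are placed in the order given by the stream schema.
        stream.append(tuple(record[name] for name in schema))


# Invented mapped records, for illustration only.
send_tuples(
    [
        {"time": "09:00:00", "url": "/index.html"},
        {"time": "09:00:05", "url": "/login"},
    ],
    ["time", "url"],
    input_stream_s1,
)
```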