6.4.2 Specifying time information in the data source mode

To manage the time information for the tuples in the stream data when using the data source mode, you must specify time information in the data transmission application. Furthermore, the time information of the tuples being sent must be set so that the tuples are always in ascending order.

This subsection explains the items that must be implemented in the data transmission application in order to specify time information in tuples. It also explains the processing by the stream data processing engine following data transmission.

Note that the processes explained here are not necessary when using the server mode.

Organization of this subsection
(1) Specifying time information in tuples
(2) Controlling the time information when sending tuples
(3) Processing after all tuples are input

(1) Specifying time information in tuples

When using the data source mode, you must specify time information as one of the data columns in the tuples to be sent.

To specify time information, you must set it in the data transmission application so it can be referenced in the query definition.

Query definition
Define the time information as a TIMESTAMP data type in the schema specification character string of the stream definition that describes the query definition file. A definition example follows:

REGISTER STREAM s1(t1 TIMESTAMP, id INT, name VARCHAR(10));

Packaging in the data transmission application
In the data transmission application, create a Java object of the java.sql.Timestamp class for the time information to be stored as column data in the tuple. An implementation example follows:

   :
   :
// Create a tuple object.
Object[] data = new Object[]{
 new Timestamp(System.currentTimeMillis()),
 new Integer(100),
 new String("data1")
};

// Set the object in the tuple.
StreamTuple tuple1 = new StreamTuple(data);
   :
   :

Note: When creating an actual data transmission application, extract the data source's time information that is contained in a log file or the like and specify it for new Timestamp.


(2) Controlling the time information when sending tuples

When using the data source mode, the time information in the tuples being sent must be controlled by the data transmission application to be in ascending order.

For each input stream, the stream data processing engine processes the tuples it has received in ascending time order. Note that tuples sent in the wrong order are discarded by the SDP server.

The figure below shows the processing sequence when tuples are sent from multiple data transmission applications to multiple input streams. In this figure, input stream 1 and input stream 2 are part of the same query group. The numeric value (such as 09:00:00) indicated below the tuple in the figure is the time information specified for that tuple.

Figure 6-7 Example of processing sequence when tuples are sent from multiple data transmission applications to multiple input streams

[Figure]

When a single query group contains multiple input streams, the stream data processing engine processes streams beginning with the one containing the tuple with the oldest time information. In the example in the figure, the tuples in input stream 1 are processed first, followed by tuples in input stream 2. In order for a query to be executed, all streams, except those for which a processing termination notification was sent using the putEnd method, must contain one or more tuples, If even one stream is empty, the stream containing the oldest tuple cannot be determined, and so no processing is performed by the stream data processing engine.

If tuples arrive out of time sequence, the stream data processing engine discards them. In the example in the figure, the time sequence of the tuples in input stream 3 is incorrect, and so the tuples that arrived later are discarded.

Reference note
If data is sent from multiple data transmission applications when using the data source mode, an error may cause the data arrival at the stream data processing engine to be out of order, so the processing order may not be in ascending order of the time stamp. In this case, if you want to process the out-of-order files instead of discarding them, you need to use the timestamp adjustment function to adjust the time. For details about how to use the timestamp adjustment function, see the explanation on adjusting the time stamp of tuples in the manual uCosminexus Stream Data Platform - Application Framework Setup and Operation Guide.

(3) Processing after all tuples are input

When all tuples have been input from the data transmission application, send a data transmission completion notification to the stream data processing engine.

When using the data source mode, the stream data processing engine processes data in time sequence based on the tuple's time information. Once all tuples have been input from the data transmission application and the input stream runs out of tuples, the operation time ceases to advance in the stream data processing engine. Since some query processing, such as a window operation, is only executed when the operation time advances, stream data may be stranded in the stream data processing engine when there are no more tuples in the input stream.

To prevent such a situation, the data transmission application must explicitly indicate when all data has been transmitted. When a data transmission completion notification is received, the stream data processing engine processes all remaining tuples and outputs the results to the output stream.

For details about how to send a data transmission completion notification, see 6.4.1 Detecting the trigger for terminating a data reception application (data processing termination notification).