To manage the time information for the tuples in the stream data when using the data source mode, you must specify time information in the data transmission application. Furthermore, the time information of the tuples being sent must be set so that the tuples are always in ascending order.
This subsection explains the items that must be implemented in the data transmission application in order to specify time information in tuples. It also explains the processing by the stream data processing engine following data transmission.
Note that the processes explained here are not necessary when using the server mode.
When using the data source mode, you must specify time information as one of the data columns in the tuples to be sent.
To specify time information, you must set it in the data transmission application so it can be referenced in the query definition.
REGISTER STREAM s1(t1 TIMESTAMP, id INT, name VARCHAR(10)); |
: |
Note: When creating an actual data transmission application, extract the data source's time information that is contained in a log file or the like and specify it for new Timestamp.
When using the data source mode, the time information in the tuples being sent must be controlled by the data transmission application to be in ascending order.
For each input stream, the stream data processing engine processes the tuples it has received in ascending time order. Note that tuples sent in the wrong order are discarded by the SDP server.
The figure below shows the processing sequence when tuples are sent from multiple data transmission applications to multiple input streams. In this figure, input stream 1 and input stream 2 are part of the same query group. The numeric value (such as 09:00:00) indicated below the tuple in the figure is the time information specified for that tuple.
Figure 6-7 Example of processing sequence when tuples are sent from multiple data transmission applications to multiple input streams
When a single query group contains multiple input streams, the stream data processing engine processes streams beginning with the one containing the tuple with the oldest time information. In the example in the figure, the tuples in input stream 1 are processed first, followed by tuples in input stream 2. In order for a query to be executed, all streams, except those for which a processing termination notification was sent using the putEnd method, must contain one or more tuples, If even one stream is empty, the stream containing the oldest tuple cannot be determined, and so no processing is performed by the stream data processing engine.
If tuples arrive out of time sequence, the stream data processing engine discards them. In the example in the figure, the time sequence of the tuples in input stream 3 is incorrect, and so the tuples that arrived later are discarded.
When all tuples have been input from the data transmission application, send a data transmission completion notification to the stream data processing engine.
When using the data source mode, the stream data processing engine processes data in time sequence based on the tuple's time information. Once all tuples have been input from the data transmission application and the input stream runs out of tuples, the operation time ceases to advance in the stream data processing engine. Since some query processing, such as a window operation, is only executed when the operation time advances, stream data may be stranded in the stream data processing engine when there are no more tuples in the input stream.
To prevent such a situation, the data transmission application must explicitly indicate when all data has been transmitted. When a data transmission completion notification is received, the stream data processing engine processes all remaining tuples and outputs the results to the output stream.
For details about how to send a data transmission completion notification, see 6.4.1 Detecting the trigger for terminating a data reception application (data processing termination notification).