2.3.6 Joining data streams

Joining data streams means selecting tuples from multiple data streams and performing computational processing to consolidate those tuples into a single tuple.

This subsection explains a query that joins data streams, using an example that takes two data streams (temperature and humidity) as input and joins their tuples from the same observation site. The following figure shows the input and output data when this query is executed.

Figure 2-20 Input and output data present when a query that joins data streams is executed

[Figure]

Code
To accept input from multiple data streams, you list the data streams in the FROM clause, delimited by commas (,). In this case, a window operation must be specified for each data stream. To join the data, you then specify a join condition in the WHERE clause.
In the following code, the name of the humidity stream is humidity_stream, and the name of the humidity data item is humidity.

REGISTER STREAM temperature_stream
    (observation_time TIME, id INTEGER, temperature INTEGER);
REGISTER STREAM humidity_stream
    (observation_time TIME, id INTEGER, humidity INTEGER);
REGISTER QUERY join_operation
    ISTREAM (
        SELECT temperature_stream.observation_time AS temperature_stream_time,
               temperature_stream.id AS temperature_stream_id,
               temperature_stream.temperature,
               humidity_stream.observation_time AS humidity_stream_time,
               humidity_stream.id AS humidity_stream_id,
               humidity_stream.humidity
        FROM temperature_stream[PARTITION BY id ROWS 1],
             humidity_stream[PARTITION BY id ROWS 1]
        WHERE temperature_stream.id = humidity_stream.id);

Explanation
The processing target of this query is the single most recent tuple from each observation site. To retrieve that tuple, a PARTITION BY window ([PARTITION BY id ROWS 1]) is specified for each data stream in the FROM clause of the CQL ISTREAM statement above.
The output data is produced by joining the two data streams, which contain temperature and humidity data, by observation site. In the WHERE clause, temperature_stream.id = humidity_stream.id is specified as the join condition, so that only tuples with the same ID are joined.
In the SELECT clause, output data names are specified for the data items in the tuples that are joined. When multiple data streams are input, different data streams might have data items with the same name (here, observation_time and id). To distinguish which data stream such an item belongs to, a period (.) is added as a delimiter between the data stream name and the data item name. Then, to avoid identically named data items in the output, AS is used to assign a unique name to each such data item.
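
The behavior described in this explanation can be approximated outside the product with a short Python sketch. This is only an illustrative model, not the product's implementation: the class JoinSketch, the Reading type, and all sample values are hypothetical. It also assumes a simplified reading of ISTREAM, in which a joined tuple is output each time an arriving tuple finds a partner with the same ID in the other stream's window.

```python
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    """One input tuple: (observation_time, id, measured value). Hypothetical type."""
    observation_time: str
    id: int
    value: int


class JoinSketch:
    """Toy model of the join_operation query; not the product's actual engine."""

    def __init__(self):
        # One slot per id models the window [PARTITION BY id ROWS 1]:
        # only the most recent tuple of each observation site is retained.
        self.temperature_window = {}
        self.humidity_window = {}

    def _joined(self, site_id) -> Optional[dict]:
        # Models WHERE temperature_stream.id = humidity_stream.id.
        t = self.temperature_window.get(site_id)
        h = self.humidity_window.get(site_id)
        if t is None or h is None:
            return None  # no partner yet, so nothing is output
        # The keys mirror the output names assigned with AS in the SELECT clause.
        return {
            "temperature_stream_time": t.observation_time,
            "temperature_stream_id": t.id,
            "temperature": t.value,
            "humidity_stream_time": h.observation_time,
            "humidity_stream_id": h.id,
            "humidity": h.value,
        }

    def on_temperature(self, reading: Reading) -> Optional[dict]:
        # The arriving tuple replaces the previous one for the same id.
        self.temperature_window[reading.id] = reading
        return self._joined(reading.id)

    def on_humidity(self, reading: Reading) -> Optional[dict]:
        self.humidity_window[reading.id] = reading
        return self._joined(reading.id)


# Hypothetical sample input, in arrival order.
join = JoinSketch()
arrivals = [
    ("temperature", Reading("09:00:00", 1, 20)),
    ("humidity", Reading("09:00:01", 1, 55)),
    ("temperature", Reading("09:00:02", 2, 18)),
    ("temperature", Reading("09:00:03", 1, 21)),
]

outputs = []
for stream_name, reading in arrivals:
    if stream_name == "temperature":
        row = join.on_temperature(reading)
    else:
        row = join.on_humidity(reading)
    if row is not None:
        outputs.append(row)

for row in outputs:
    print(row)
```

In this sketch, the temperature tuple for ID 2 produces no output because no humidity tuple with that ID has arrived, whereas the second temperature tuple for ID 1 is joined with the humidity tuple already held in the window for ID 1.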