2.2.2 Using data manipulation CQL to specify operations on stream data

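There are three types of data manipulation CQL operations: window operations, which retrieve data for analysis; relation operations, which process the retrieved data; and stream operations, which output the data processing results.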

The descriptions in the following subsections are based on the example of the temperature analysis system described in 2.2.1(1) Defining a stream (REGISTER STREAM clause).

Organization of this subsection
(1) Window operations (retrieving data for analysis)
(2) Relation operations (processing the retrieved data)
(3) Stream operations (outputting the data processing results)

(1) Window operations (retrieving data for analysis)

A window operation is used to retrieve data for analysis from stream data. In a query, a window is specified in the FROM clause, following the stream name. There are four types of window operations: ROWS, RANGE, NOW, and PARTITION BY.

The following subsections explain these window operations.

ROWS window
A ROWS window uses a tuple count to specify which tuples to retrieve from the stream data. The input relation generated by a ROWS window is a tuple group that begins with the most recent tuple and goes back the specified number of tuples. Each time a new tuple is received, it is added to the beginning of the input relation, and any tuples beyond the specified count are removed from the end.
In a ROWS window, you specify the number of tuples to retrieve. For example, if you specify [ROWS 3], the input relation consists of the three most recently received tuples, ordered from newest to oldest. The following figure shows a ROWS window being used to retrieve data for analysis.

Figure 2-7 Using a ROWS window to retrieve data for analysis

[Figure]
In the window operation shown in this figure, three tuples are to be retrieved. Once three tuples have arrived, the input relation therefore always holds exactly three tuples. To maintain this, whenever a new tuple is added to the input relation, the oldest tuple in the input relation is removed.
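The count-based eviction described above can be modeled with a bounded double-ended queue. The following Python sketch is a conceptual model only and does not show how Stream Data Platform - AF implements ROWS windows. The class name RowsWindow and the method on_tuple are hypothetical.

```python
from collections import deque


class RowsWindow:
    """Conceptual model of a [ROWS n] window (hypothetical class)."""

    def __init__(self, n: int) -> None:
        # A deque with maxlen=n discards an element from the opposite end when a
        # new one is added while it is full, like the oldest tuple being dropped.
        self._tuples = deque(maxlen=n)

    def on_tuple(self, tup):
        self._tuples.appendleft(tup)  # the newest tuple goes to the beginning
        return list(self._tuples)     # current input relation, newest first


window = RowsWindow(3)
for tup in ["t1", "t2", "t3", "t4"]:
    relation = window.on_tuple(tup)
print(relation)  # ['t4', 't3', 't2']
```

When t4 arrives, t1 falls off the end. Until three tuples have arrived, the relation simply holds fewer than three.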
RANGE window
A RANGE window uses a time period to specify which tuples to retrieve from the stream data. The input relation generated by a RANGE window is a tuple group that begins with the most recent tuple and goes back the specified period of time.
In a RANGE window, you specify the time period in which to retrieve tuples. For example, if you specify [RANGE 3 SECOND], the input relation consists of all received tuples whose timestamps are within three seconds of the timestamp of the most recent tuple.
The following table lists the units that can be used for specifying a time period.

Table 2-2 Time units that can be specified in a CQL statement

Specification in CQL statement    Unit
MILLISECOND                       Millisecond
SECOND                            Second
MINUTE                            Minute
HOUR                              Hour
DAY                               Day
The following figure shows a RANGE window being used to retrieve data for analysis.

Figure 2-8 Using a RANGE window to retrieve data for analysis

[Figure]
In the window operation shown in this figure, all tuples whose timestamps are within three seconds of the most recent tuple (from 10:00:01 back to 09:59:58) are retrieved. When a new tuple is added to the input relation, any tuples that no longer satisfy this condition are removed from the input relation.
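The time-based eviction above can be sketched as follows, using the units from Table 2-2. This Python model is a conceptual aid only and is built on three assumptions:

- Timestamps are integer milliseconds.
- Tuples arrive in timestamp order.
- The boundary is inclusive, as the 10:00:01 to 09:59:58 example suggests.

The names UNIT_MS, hms, and RangeWindow are hypothetical.

```python
from collections import deque

# Milliseconds per time unit keyword listed in Table 2-2.
UNIT_MS = {
    "MILLISECOND": 1,
    "SECOND": 1_000,
    "MINUTE": 60_000,
    "HOUR": 3_600_000,
    "DAY": 86_400_000,
}


def hms(h: int, m: int, s: int) -> int:
    """Convert a time of day to milliseconds since midnight."""
    return ((h * 60 + m) * 60 + s) * 1_000


class RangeWindow:
    """Conceptual model of a [RANGE <amount> <unit>] window (hypothetical).

    Assumes integer-millisecond timestamps in non-decreasing order.
    """

    def __init__(self, amount: int, unit: str) -> None:
        self._span_ms = amount * UNIT_MS[unit]
        self._tuples = deque()  # (timestamp_ms, value) pairs, oldest at the left

    def on_tuple(self, ts_ms: int, value):
        self._tuples.append((ts_ms, value))
        # Drop tuples older than the span measured back from the newest
        # timestamp; a tuple exactly on the boundary is kept (assumption).
        oldest_allowed = ts_ms - self._span_ms
        while self._tuples and self._tuples[0][0] < oldest_allowed:
            self._tuples.popleft()
        # Return the input relation, newest tuple first.
        return [v for _, v in reversed(self._tuples)]


window = RangeWindow(3, "SECOND")
for ts, v in [(hms(9, 59, 57), "a"), (hms(9, 59, 58), "b"),
              (hms(10, 0, 0), "c"), (hms(10, 0, 1), "d")]:
    relation = window.on_tuple(ts, v)
print(relation)  # ['d', 'c', 'b']
```

The number of tuples held depends on how many arrive within the period, not on a fixed count. This is why memory usage can grow with the input rate.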
When you use a RANGE window, the input relation must hold every tuple received within the specified time period. Depending on the input data, the number of tuples handled by Stream Data Platform - AF can therefore become quite large, and the amount of memory required increases proportionately. In this case, you can use the time division function to keep memory usage from growing too large. For details about the time division function, see the uCosminexus Stream Data Platform - Application Framework Application Development Guide.
NOW window
A NOW window specifies that tuples are processed only at the time they arrive. If multiple tuples with the same timestamp arrive simultaneously, all of them are processed together. Because each tuple is removed from the NOW window as soon as it is processed, the input relation generated by a NOW window contains only the tuple, or tuples, with the most recent timestamp.
You use the [NOW] specification to specify a NOW window. The following figure shows a NOW window being used to retrieve data for analysis.

Figure 2-9 Using a NOW window to retrieve data for analysis

[Figure]
In the window operation shown in this figure, only the tuple that arrives at a given point in time is selected for processing. Because each tuple in the input relation is removed after it is processed, the input relation is already empty when the next tuple is added.
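The NOW window behavior can be modeled by grouping arrivals that share a timestamp. The following Python sketch is a conceptual model and not the product's implementation. It assumes that arrivals are already ordered by timestamp, and the function name now_window is hypothetical.

```python
from itertools import groupby


def now_window(arrivals):
    """Conceptual model of a [NOW] window (hypothetical function).

    arrivals: (timestamp, tuple) pairs in non-decreasing timestamp order.
    Yields (timestamp, input_relation) for each time point. Tuples sharing a
    timestamp are processed together, and nothing carries over to the next point.
    """
    for ts, group in groupby(arrivals, key=lambda pair: pair[0]):
        yield ts, [tup for _, tup in group]


arrivals = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
print(list(now_window(arrivals)))
# [(1, ['a']), (2, ['b', 'c']), (3, ['d'])]
```

itertools.groupby only groups consecutive items, which is why the sketch requires ordered input.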
PARTITION BY window
A PARTITION BY window uses data values to specify which tuples to retrieve from the stream data, and is used together with a ROWS window. The input relation generated by a PARTITION BY window consists of one tuple group for each distinct value of the specified data items. Each group begins with the most recent tuple that has that value and goes back the specified number of tuples.
In a PARTITION BY window, you enter the names of the data items used to group the tuples, followed by ROWS, in which you specify the number of tuples to retrieve for each group. For example, if you specify [PARTITION BY id ROWS 2], the input relation contains, for each ID, the two most recently received tuples with that ID, in the order in which they were received. The following figure shows a PARTITION BY window being used to retrieve data for analysis.

Figure 2-10 Using a PARTITION BY window to retrieve data for analysis

[Figure]
In the window operation shown in this figure, two tuples are retrieved for each ID value that appears in the tuples. When a new tuple is added to the input relation and its ID group already holds two tuples, the oldest tuple with the same ID as the new tuple is removed from the input relation.
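A PARTITION BY window combined with ROWS can be pictured as one bounded ROWS window per key value. The following Python sketch models [PARTITION BY id ROWS 2] under that view. It is a conceptual aid, not the product's implementation:

- The class name PartitionByRowsWindow is hypothetical.
- The field name temp is a hypothetical stand-in for a temperature data item.
- Tuples are ordered newest first within each group in this sketch.

```python
from collections import defaultdict, deque


class PartitionByRowsWindow:
    """Conceptual model of a [PARTITION BY <key> ROWS n] window (hypothetical)."""

    def __init__(self, key: str, n: int) -> None:
        self._key = key
        # One bounded deque per distinct key value; each keeps the n newest tuples.
        self._groups = defaultdict(lambda: deque(maxlen=n))

    def on_tuple(self, tup: dict) -> dict:
        # Only the group with the same key value can lose its oldest tuple.
        self._groups[tup[self._key]].appendleft(tup)
        # Input relation, shown per key value (newest tuple first in each group).
        return {k: list(g) for k, g in self._groups.items()}


window = PartitionByRowsWindow("id", 2)
for tup in [{"id": "A", "temp": 20}, {"id": "B", "temp": 25},
            {"id": "A", "temp": 21}, {"id": "A", "temp": 22}]:
    relation = window.on_tuple(tup)
print({k: [t["temp"] for t in g] for k, g in relation.items()})
# {'A': [22, 21], 'B': [25]}
```

The third tuple for ID A evicts only the oldest A tuple. The single B tuple is unaffected.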

(2) Relation operations (processing the retrieved data)

A relation operation is used to process the data retrieved by the window operation. The following operations are available:

These operations can be specified in the SELECT and WHERE clauses using arithmetic operators, comparison operators, logical operators, and aggregate functions.

The following tables list the comparison operators and aggregate functions that can be specified.

Table 2-3 Comparison operators that can be used in CQL statements

Comparison operator    Usage example    Meaning of usage example
<=                     A <= B           A is less than or equal to B
>=                     A >= B           A is greater than or equal to B
<                      A < B            A is less than B
>                      A > B            A is greater than B
=                      A = B            A is equal to B
!=                     A != B           A is not equal to B

Table 2-4 Aggregate functions that can be used in CQL statements

Function    Description
AVG         Computes the average of all values.
COUNT       Counts the number of items.
MAX         Determines the maximum value.
MIN         Determines the minimum value.
SUM         Computes the sum of all values.
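To show what a comparison in a WHERE clause followed by aggregate functions computes over an input relation, the following Python sketch filters tuples and then applies the five functions from Table 2-4. It is a conceptual model rather than CQL, and several of its details are assumptions:

- The function name filter_and_aggregate is hypothetical.
- The field name temperature is assumed.
- Returning None for AVG, MAX, MIN, and SUM over an empty set follows the usual SQL convention; whether CQL does the same is also an assumption.

```python
def filter_and_aggregate(relation, threshold):
    """Keep tuples whose temperature is >= threshold (as a WHERE clause with
    the >= comparison operator would), then apply AVG, COUNT, MAX, MIN, and SUM
    to the remaining temperature values. Hypothetical helper."""
    temps = [t["temperature"] for t in relation if t["temperature"] >= threshold]
    if not temps:
        # SQL-style empty-set results (assumed, not confirmed for CQL).
        return {"AVG": None, "COUNT": 0, "MAX": None, "MIN": None, "SUM": None}
    return {
        "AVG": sum(temps) / len(temps),
        "COUNT": len(temps),
        "MAX": max(temps),
        "MIN": min(temps),
        "SUM": sum(temps),
    }


relation = [{"id": 1, "temperature": 28},
            {"id": 2, "temperature": 31},
            {"id": 3, "temperature": 35}]
print(filter_and_aggregate(relation, 30))
# {'AVG': 33.0, 'COUNT': 2, 'MAX': 35, 'MIN': 31, 'SUM': 66}
```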

The logical operators that can be specified in CQL statements vary depending on the clause. For details about the logical operators that can be specified in CQL statements, see the uCosminexus Stream Data Platform - Application Framework Application Development Guide.

For examples of how to implement relation operations, see 2.3 Implementation examples of using CQL to process stream data.

(3) Stream operations (outputting the data processing results)

A stream operation takes the results of a relation operation, converts them to stream data, and outputs them. Stream operations are specified in the stream clause directly following the REGISTER QUERY clause. There are three types of stream operations: ISTREAM, DSTREAM, and RSTREAM.

The following explanations of each stream operation assume that the window operation [ROWS 3] is specified.

ISTREAM
An ISTREAM stream operation outputs the tuples that were added to the output relation. Each time the output relation changes, ISTREAM compares the output relation before and after the change and outputs only the newly added tuples. The following figure shows the processing results that are output by ISTREAM.

Figure 2-11 Processing results that are output by ISTREAM

[Figure]
When a tuple processed by a relation operation is added to the output relation, the oldest tuple in the output relation is removed. Because ISTREAM is specified as the stream operation, when this occurs, only the tuple that was added to the output relation is output.
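The before-and-after comparison that ISTREAM performs amounts to a difference between two snapshots of the output relation. The following Python sketch is a conceptual model, not the product's implementation. It treats the output relation as a multiset, so duplicate tuples are counted correctly, and it assumes tuples are hashable. The function name istream is hypothetical.

```python
from collections import Counter


def istream(before, after):
    """Tuples ISTREAM would output for one change of the output relation:
    those present in `after` but not in `before` (multiset difference).
    Hypothetical function; tuples must be hashable."""
    return list((Counter(after) - Counter(before)).elements())


# Output relation before and after a new tuple arrives, with [ROWS 3].
print(istream(["t1", "t2", "t3"], ["t2", "t3", "t4"]))  # ['t4']
```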
DSTREAM
A DSTREAM stream operation outputs the tuples that were removed from the output relation. Each time the output relation changes, DSTREAM compares the output relation before and after the change, and outputs the tuples that were removed. The following figure shows the processing results that are output by DSTREAM.

Figure 2-12 Processing results that are output by DSTREAM

[Figure]
When a tuple processed by a relation operation is added to the output relation, the oldest tuple in the output relation is removed. Because DSTREAM is specified as the stream operation, when this occurs, only the tuple that was removed from the output relation is output.
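The following Python sketch traces what DSTREAM outputs as tuples flow into an output relation bounded like [ROWS 3]. It is a conceptual model and carries two assumptions:

- The relation operation passes tuples through unchanged.
- Tuples are hashable.

The function name dstream_over_rows is hypothetical.

```python
from collections import Counter, deque


def dstream_over_rows(tuples, n):
    """Feed tuples into an output relation bounded like [ROWS n] and collect
    what DSTREAM would output after each arrival, namely the removed tuples.
    Hypothetical function; assumes tuples pass through the relation
    operation unchanged."""
    relation = deque(maxlen=n)
    outputs = []
    for tup in tuples:
        before = Counter(relation)
        relation.appendleft(tup)  # when full, the oldest tuple is discarded
        removed = before - Counter(relation)
        outputs.append(list(removed.elements()))
    return outputs


print(dstream_over_rows(["t1", "t2", "t3", "t4", "t5"], 3))
# [[], [], [], ['t1'], ['t2']]
```

Nothing is output until the relation is full, because no tuple is removed before then.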
RSTREAM
An RSTREAM stream operation outputs all tuples in the output relation at fixed time intervals. When you specify RSTREAM, you enclose the output interval in square brackets ([ ]) following the RSTREAM keyword. For example, you could specify RSTREAM[1 MINUTE] or RSTREAM[3 SECOND]. You can use the units listed in Table 2-2 Time units that can be specified in a CQL statement to specify the time interval. The following figure shows the processing results that are output by RSTREAM.

Figure 2-13 Processing results that are output by RSTREAM

[Figure]
Because a three-second interval is specified for RSTREAM, all tuples in the output relation are output every three seconds, as measured by the system time.
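Unlike ISTREAM and DSTREAM, RSTREAM is driven by elapsed time rather than by changes to the output relation. The following Python sketch models this. It is a conceptual aid under these assumptions:

- The output relation is bounded like [ROWS 3].
- Arrival times and output ticks share one millisecond clock, which stands in for system time.
- An arrival exactly at a tick is included in that tick's output.

The function name rstream_over_rows is hypothetical.

```python
from collections import deque


def rstream_over_rows(arrivals, n, interval_ms, end_ms):
    """Model RSTREAM output for an output relation bounded like [ROWS n].

    arrivals: (arrival_ms, tuple) pairs sorted by arrival time, on the same
    clock as the output ticks (an assumption standing in for system time).
    Returns (tick_ms, output) pairs, one per interval up to end_ms.
    Hypothetical function.
    """
    relation = deque(maxlen=n)
    outputs = []
    i = 0
    for tick in range(interval_ms, end_ms + 1, interval_ms):
        # Apply every arrival up to and including this tick.
        while i < len(arrivals) and arrivals[i][0] <= tick:
            relation.appendleft(arrivals[i][1])
            i += 1
        # RSTREAM outputs the whole relation, even if nothing changed.
        outputs.append((tick, list(relation)))
    return outputs


arrivals = [(500, "t1"), (1500, "t2"), (2500, "t3"), (4000, "t4")]
for tick, out in rstream_over_rows(arrivals, 3, 3000, 9000):
    print(tick, out)
# 3000 ['t3', 't2', 't1']
# 6000 ['t4', 't3', 't2']
# 9000 ['t4', 't3', 't2']
```

At the 9000 ms tick no new tuple has arrived, yet the full relation is output again. This repetition is the key difference from ISTREAM and DSTREAM.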