7.22.14 Common rules and considerations for set functions

Organization of this subsection

(1) Explanation of terms
(2) Common rules
(3) Common considerations

(1) Explanation of terms

A general set function in which DISTINCT is specified is called a DISTINCT set function. A general set function in which ALL is specified is called an ALL set function.
The following value expressions are called the aggregated arguments of a set function.
- In the case of the inverse distribution functions PERCENTILE_CONT and PERCENTILE_DISC, the value expression specified as the sort key in the WITHIN group specification
  
  Example:
```
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY "C1") FROM "T1"
```
  The underlined portion indicates the aggregated argument.
- In the case of set functions other than the inverse distribution functions PERCENTILE_CONT and PERCENTILE_DISC, the value expression specified as the argument to the set function
  
  Example:
```
SELECT "C1",SUM("C2") FROM "T1" GROUP BY "C1"
```
  The underlined portion indicates the aggregated argument.
A column specification within an aggregated argument is called an aggregated column specification.

Example:
```
SELECT "C1",SUM("C2"+1) FROM "T1" GROUP BY "C1"
```
The underlined portion indicates the aggregated column specification.
A query specification that directly contains a FROM clause that contains a table reference that is referenced by an aggregated column specification is called a qualified query of that aggregated column specification.

Example:
```
SELECT "C1",SUM("T1"."C2") FROM "T1" GROUP BY "C1"
```
The underlined portion (the entire query) indicates the qualified query.

The following example illustrates a qualified query with an external reference.

Example:

SELECT "C1" FROM "T1" GROUP BY "C1" HAVING EXISTS   ...[1]
    (SELECT * FROM "T2" WHERE MAX("T1"."C2")>"T2"."C1")

Explanation

The aggregated column specification in the set function MAX("T1"."C2") is "T1"."C2".
The table referenced by "T1"."C2" is "T1".
The query specification whose FROM clause directly contains "T1" is the part indicated by [1].
The qualified query is therefore the query specification in [1].

To Page Top

(2) Common rules

A set function can be specified in a selection expression, HAVING clause, or ORDER BY clause that is directly contained in a qualified query of that set function. However, restrictions apply when specifying a set function in an ORDER BY clause. For details about the restrictions, see (2) Rules for specifying value expressions as sort keys in 7.24.2 Rules for specifying a sort specification list in an ORDER BY clause.
If the value expression specified as an aggregated argument is not an independent column specification, you cannot specify multiple inverse distribution functions in the same query specification.

Example of an SQL statement that generates an error:
```
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C1"+"C2"),
       PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY "C1"+"C2")
    FROM "T1"
```
Because the value expression specified as an aggregated argument is not an independent column specification, multiple inverse distribution functions cannot be specified.
If you specify multiple inverse distribution functions in the same query specification, the column specifications provided as aggregated arguments must reference the same column.

Example of an SQL statement that generates an error:
```
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C1"),
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C2")
    FROM "T1"
```
The column specifications in inverse distribution functions must reference the same column.
Note
The following is an example of where multiple inverse distribution functions can be specified.

Example:
```
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C1"),
       PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY "GC1")
    FROM "T1"
    GROUP BY "C1" AS "GC1"
```
Because the column specifications in inverse distribution functions reference the same grouping column, this statement does not result in an error.
If you specify multiple inverse distribution functions in the same query specification, make sure that the same order specification is provided for the sort specification in the WITHIN group specification in all of the functions.

Example of an SQL statement that generates an error:
```
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C1" ASC),
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C1" DESC)
    FROM "T1" 
```
You cannot specify a dynamic parameter as the value expression specified for the aggregated argument in a set function.

Example of an SQL statement that generates an error:
```
SELECT MAX(CASE WHEN "C1">? THEN "C1" ELSE "C1"*100 END) FROM "T1"
```
You cannot specify a dynamic parameter in the underlined portion.
You cannot specify a set function, a subquery, a window function, or the RANDOMROW scalar function inside of a set function.

Example of an SQL statement that generates an error:
```
SELECT SUM(CASE WHEN MAX("C1")>10000 THEN "C1" END) FROM "T1"
```
You cannot specify a set function in the underlined portion.
A single query specification can include a maximum of 64 DISTINCT set functions with different aggregated arguments (excluding the DISTINCT set functions specified in window functions).
If you specify something other than a single column specification as the value expression specified for the aggregated argument in a set function, you cannot specify an external reference column in that value expression.

Example of an SQL statement that generates an error:
```
SELECT SUM("C1") FROM "T1"
   HAVING EXISTS(SELECT * FROM "T2" WHERE AVG("T1"."C2"*1.05)>"C2")
```
You cannot specify an external reference column in the underlined portion.
The input to the set function will be the results from the last-specified clause among the FROM, WHERE, and GROUP BY clauses. If a GROUP BY clause is specified, the results for each group will be input to the set function.
When a set function is used in a window function, the input to the set function will be the set of rows included in the window frame of the current row.
You cannot specify an external reference column as the aggregated argument to an inverse distribution function.
No more than one sort specification is permitted in the sort specification list specified in an inverse distribution function.

To Page Top

(3) Common considerations

If a DISTINCT set function or inverse distribution function is specified, a work table might be created. If the size of the work table DB area where the work table is created has not been estimated correctly, it might result in performance degradation. For details about estimating the size of the work table DB area, see the HADB Setup and Operation Guide. For details about work tables, see Considerations when executing an SQL statement that creates work tables in the HADB Application Development Guide.
If global hash grouping is applied as the method of grouping performed during deduplication of DISTINCT set functions, a derived table might be created. HADB automatically assigns a correlation name in the following format to the derived table:
```
##DRVTBL_xxxxxxxxxx
```
In the preceding format, xxxxxxxxxx is a 10-digit integer.

For details about global hash grouping, see Global hash grouping in Hash grouping in the HADB Application Development Guide.
If a GROUP BY clause or HAVING clause is specified, no execution results are output for groups where the number of input rows is 0.

To Page Top