7.23.16 Common rules and considerations for set functions
- Organization of this subsection
(1) Explanation of terms
-
A general set function in which DISTINCT is specified is called a DISTINCT set function. A general set function in which ALL is specified is called an ALL set function.
-
The following value expressions are called the aggregated arguments of a set function.
-
In the case of the inverse distribution functions PERCENTILE_CONT and PERCENTILE_DISC, the value expression specified as the sort key in the WITHIN group specification
Example:
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY "C1") FROM "T1"
The underlined portion indicates the aggregated argument.
-
For a ARRAY_AGG set function, the value expression specified as the argument and the value expression specified as the sort key in the sort specification list
(Example)
SELECT "C1",ARRAY_AGG("C2" ORDER BY "C3") AS "C2_LIST" FROM "T1" GROUP BY "C1"The underlined portion indicates the aggregated argument.
-
For a LISTAGG set function, the value expression specified for the argument, and the value expression specified for the sort key in the WITHIN group specification
(Example)
SELECT LISTAGG("C1",'|') WITHIN GROUP(ORDER BY "C2") AS "C1_LIST" FROM "T1"The underlined portion indicates the aggregated argument.
-
For set functions other than the above, the value expression specified as the argument of the set function
(Example)
SELECT "C1",SUM("C2") FROM "T1" GROUP BY "C1"The underlined portion indicates the aggregated argument.
-
-
A column specification within an aggregated argument is called an aggregated column specification.
Example:
SELECT "C1",SUM("C2"+1) FROM "T1" GROUP BY "C1"The underlined portion indicates the aggregated column specification.
-
A query specification that directly contains a FROM clause that contains a table reference that is referenced by an aggregated column specification is called a qualified query of that aggregated column specification.
Example:
SELECT "C1",SUM("T1"."C2") FROM "T1" GROUP BY "C1"The underlined portion (the entire query) indicates the qualified query.
The following example illustrates a qualified query with an external reference.
- Example:
-
SELECT "C1" FROM "T1" GROUP BY "C1" HAVING EXISTS ...[1] (SELECT * FROM "T2" WHERE MAX("T1"."C2")>"T2"."C1")- Explanation
-
-
The aggregated column specification in the set function MAX("T1"."C2") is "T1"."C2".
-
The table referenced by "T1"."C2" is "T1".
-
The query specification whose FROM clause directly contains "T1" is the part indicated by [1].
-
The qualified query is therefore the query specification in [1].
-
(2) Common rules
-
A set function can be specified in a selection expression, HAVING clause, or ORDER BY clause that is directly contained in a qualified query of that set function. However, restrictions apply when specifying a set function in an ORDER BY clause. For details about the restrictions, see (2) Rules for specifying value expressions as sort keys in 7.25.2 Rules for specifying a sort specification list in an ORDER BY clause.
-
If the value expression specified as an aggregated argument is not an independent column specification, you cannot specify multiple inverse distribution functions in the same query specification.
Example of an SQL statement that generates an error:
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C1"+"C2"), PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY "C1"+"C2") FROM "T1"Because the value expression specified as an aggregated argument is not an independent column specification, multiple inverse distribution functions cannot be specified.
-
If you specify multiple inverse distribution functions in the same query specification, the column specifications provided as aggregated arguments must reference the same column.
Example of an SQL statement that generates an error:
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C1"), PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C2") FROM "T1"The column specifications in inverse distribution functions must reference the same column.
- Note
-
The following is an example of where multiple inverse distribution functions can be specified.
Example:
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C1"), PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY "GC1") FROM "T1" GROUP BY "C1" AS "GC1"Because the column specifications in inverse distribution functions reference the same grouping column, this statement does not result in an error.
-
If you specify multiple inverse distribution functions in the same query specification, make sure that the same order specification is provided for the sort specification in the WITHIN group specification in all of the functions.
Example of an SQL statement that generates an error:
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C1" ASC), PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY "C1" DESC) FROM "T1" -
You cannot specify a dynamic parameter as the value expression specified for the aggregated argument in a set function.
Example of an SQL statement that generates an error:
SELECT MAX(CASE WHEN "C1">? THEN "C1" ELSE "C1"*100 END) FROM "T1"
You cannot specify a dynamic parameter in the underlined portion.
-
You cannot specify a set function, a subquery, a window function, or the RANDOMROW scalar function inside of a set function.
Example of an SQL statement that generates an error:
SELECT SUM(CASE WHEN MAX("C1")>10000 THEN "C1" END) FROM "T1"You cannot specify a set function in the underlined portion.
-
A single query specification can include a maximum of 64 DISTINCT set functions with different aggregated arguments (excluding the DISTINCT set functions specified in window functions).
-
If you specify something other than a single column specification as the value expression specified for the aggregated argument in a set function, you cannot specify an external reference column in that value expression.
Example of an SQL statement that generates an error:
SELECT SUM("C1") FROM "T1" HAVING EXISTS(SELECT * FROM "T2" WHERE AVG("T1"."C2"*1.05)>"C2")You cannot specify an external reference column in the underlined portion.
-
The input to the set function will be the results from the last-specified clause among the FROM, WHERE, and GROUP BY clauses. If a GROUP BY clause is specified, the results for each group will be input to the set function.
-
When a set function is used in a window function, the input to the set function will be the set of rows included in the window frame of the current row.
-
You cannot specify an external reference column for the following aggregated argument.
-
Aggregated argument of the inverse distribution function
-
Aggregated argument for a ARRAY_AGG set function
-
Aggregated argument for a LISTAGG set function
-
-
No more than one sort specification is permitted in the following sort specification list.
-
Sort specification list permitted in an inverse distribution function
-
Sort specification list for a ARRAY_AGG set function
-
Sort specification list for a LISTAGG set function
-
(3) Common considerations
-
If a DISTINCT set function, inverse distribution function, ARRAY_AGG set function, or LISTAGG set function is specified, a work table might be created. If the size of the work table DB area where the work table is created has not been estimated correctly, it might result in performance degradation. For details about estimating the size of the work table DB area, see the HADB Setup and Operation Guide. For details about work tables, see Considerations when executing an SQL statement that creates work tables in the HADB Application Development Guide.
-
If a DISTINCT set function, ARRAY_AGG set function, or LISTAGG set function is specified, a derived table might be created. HADB automatically assigns a correlation name in the following format to the derived table:
##DRVTBL_xxxxxxxxxx
In the preceding format, xxxxxxxxxx is a 10-digit integer.
-
If a GROUP BY clause or HAVING clause is specified, no execution results are output for groups where the number of input rows is 0.