Hitachi

Hitachi Advanced Database SQL Reference


7.7.1 Specification format and rules for GROUP BY clauses

Specify the GROUP BY clause when you want to aggregate the retrieval data into groups.

Organization of this subsection

(1) Specification format

GROUP-BY-clause ::= GROUP BY [grouping-method-specification] grouping-specification[, grouping-specification]...
 
  grouping-method-specification ::= /*>> WITHOUT GLOBAL HASH GROUPING <<*/
  grouping-specification ::= value-expression [[AS] column-name]

(2) Explanation of specification format

grouping-method-specification
grouping-method-specification ::= /*>> WITHOUT GLOBAL HASH GROUPING <<*/

When grouping-method-specification is specified, global hash grouping is not used as the processing method for the grouping.

Normally there is no need to specify this.

For details about grouping methods, see Grouping Methods in the HADB Application Development Guide.

Note that the character string enclosed in /*>> and <<*/ is not a comment. An error results if you specify something other than a grouping method specification.

grouping-specification
grouping-specification ::= value-expression [[AS] column-name]

Specifies a group by which the retrieval data is to be aggregated, in the form of a value expression. For details about value expressions, see 7.20 Value expression.

GROUP BY clauses are illustrated in the following examples. Example 1:

This example aggregates the retrieval data by product code (PUR-CODE)
GROUP BY "PUR-CODE"
Example 2: This example aggregates the retrieval data by month
GROUP BY EXTRACT(MONTH FROM "SALE-DAY") AS "GMONTH"

The SALE-DAY column stores the sale date of the product in DATE type format. The scalar function EXTRACT is used to extract the month part of the SALE-DAY column.

[AS] column-name:

The column name specified here becomes the grouping column name.

Example:

GROUP BY SUBSTR("C1",5,2) AS "GC1"

In the preceding example, GC1 becomes the grouping column name.

Note

A column derived from the result of a GROUP BY clause is called a grouping column. The column name assigned to the grouping column is called a grouping column name.

Example 1:

GROUP BY "C1"

In the preceding example, the underlined item becomes a grouping column with a grouping column name of C1.

Example 2:

GROUP BY "T1"."C1"

In the preceding example, the underlined item becomes a grouping column with a grouping column name of C1.

Example 3:

GROUP BY "C1" AS "GC1"

In the preceding example, the underlined item becomes a grouping column with a grouping column name of GC1.

Example 4:

GROUP BY SUBSTR("C1",5,2) AS "GC1"

In the preceding example, the underlined item becomes a grouping column with a grouping column name of GC1.

Example 5:

GROUP BY SUBSTR("C1",5,2)

In the preceding example, the underlined item becomes a grouping column. No name is assigned to the grouping column.

Example 6:

GROUP BY "C1","C2"

In the preceding example, the underlined items become grouping columns. Two grouping columns are created. These grouping columns are named C1 and C2.

Example 7:

GROUP BY 1 AS "GC1"

In the preceding example, the underlined item becomes a grouping column with a grouping column name of GC1.

Example 8:

GROUP BY 1

In the preceding example, the underlined item becomes a grouping column. No name is assigned to the grouping column.

(3) Rules

  1. The maximum number of grouping columns is 64.

  2. Set functions are not permitted in value-expression.

  3. Subqueries are not permitted in value-expression.

  4. Dynamic parameters are not permitted in value-expression.

  5. The column name of a column specification included alone in another grouping column cannot be specified in AS column-name.

    Example that generates an error:

    GROUP BY "C1","C3" AS "C1"
    GROUP BY "C1" AS "C2","C3" AS "C1"
  6. Do not specify a character string in the EXPnnnn_NO_NAME format as the column name in AS column-name in a grouping specification. Such a column name might duplicate a derived column name that is automatically set by HADB. In this format, nnnn is an unsigned integer in the range from 0000 to 9999.

  7. Each column name specified in AS column-name must be unique.

    Example that generates an error:

    GROUP BY "C1"+1 AS "GC1","C2"+1 AS "GC1"
  8. The column name specified in AS column-name cannot be referenced from a subquery in the selection expression or from a subquery in a HAVING clause.

    Example that generates an error:

    SELECT "GC1",MAX("C2") FROM "T1"
        GROUP BY SUBSTR("C1",5,2) AS "GC1"
        HAVING EXISTS(SELECT * FROM "T2" WHERE "T2"."C1"="GC1")
  9. When a GROUP BY clause is specified, only the following items can be specified in the selection expression:

    1. Grouping column name

    2. Set function

    3. Value specification

    4. Scalar subquery

    5. Value expression specifying at least one of the preceding items

    6. Same value expression that is included in a grouping specification (value expression including a column specification)

    Example of a correct specification:

    SELECT "C1","C2",COUNT(*)Selection expression contains grouping column names and set functions
        FROM "T1"
            GROUP BY "C1","C2"

    Example that generates an error:

    SELECT "C1","C2",COUNT(*)Selection expression includes column C2, which is not a grouping column name
        FROM "T1"
            GROUP BY "C1"
  10. The column specifications in the GROUP BY clause must meet the following conditions:

    • They must specify columns from tables specified in the FROM clause of the table expression in which the GROUP BY clause is specified

    • The column names must be uniquely identified

    For example, consider what happens when the SELECT statement shown below is executed.

    Example:

    SELECT "SALESLIST"."USERID",SUM("PUR-NUM")
        FROM "SALESLIST","USERSLIST"
            WHERE "PUR-CODE"='P002'
            AND "SALESLIST"."USERID"="USERSLIST"."USERID"
            GROUP BY "SALESLIST"."USERID"

    The sales history table (SALESLIST) and the customer table (USERSLIST) both have USERID columns with the same column name. In this case, if you want to specify the USERID column in the GROUP BY clause, you must do so in a way that uniquely specifies which USERID column is intended. Therefore, you cannot specify GROUP BY "USERID". Instead, specify the column qualified with a table name, as in GROUP BY "SALESLIST"."USERID".

  11. The grouping column referenced by the column specified in the value expression in a selection expression or in the value expression in the HAVING clause is determined by the following priority. A smaller number indicates a higher priority level. (1 is the highest.)

    1. If the column name is the same as a grouping column name

    Example of a correct specification:

    SELECT "C1"+"C2" FROM "T1" GROUP BY "C1"+"C2" AS "C1","C2"

    In the preceding example, C1 in the selection expression is the same as the name of grouping column C1. Therefore, grouping column "C1"+"C2" AS "C1" is referenced.

    C2 in the selection expression is the same as the name of grouping column C2. Therefore, grouping column C2 is referenced.

    2. If there is a grouping column that has a single column specification, or if there is a grouping column whose value expressions are specified in the same format

    Example of a correct specification (1):

    SELECT "C1"+"C2" FROM "T1" GROUP BY "C1"+"C2" AS "C3"

    In the preceding example, C1 and C2 in the selection expression reference grouping column "C1"+"C2" AS "C3" because the value expressions of grouping column "C1"+"C2" AS "C3" have the same format.

    Example of a correct specification (2):

    SELECT "GC1","C1" FROM "T1" GROUP BY "C1" AS "GC1"

    In the preceding example, C1 in the selection expression references grouping column "C1" AS "GC1", which has the same value expression format as C1.

    Also, because GC1 in the selection expression is the same as a grouping column name, the SQL statement in the preceding example meets condition 1 (the column name is the same as a grouping column name) shown earlier. Therefore, GC1 also references the grouping column "C1" AS "GC1".

    3. Specification order of grouping columns (former item has higher priority)

    Example of a correct specification:

    SELECT "C1"+"C2" FROM "T1" GROUP BY "C1"+"C2" AS "C3","C1"+"C2"

    In the preceding example, the value expressions of grouping columns "C1"+"C2" AS "C3" and "C1"+"C2" have the same format. In this case, because the former item has the higher priority, C1 and C2 in the selection expression reference grouping column "C1"+"C2" AS "C3".

    ■ Specification example that generates an error

    SELECT "C1"+"C2" FROM "T1" GROUP BY "C1"+"C2" AS "C1"

    In the preceding example, C1 in the selection expression references the grouping column that corresponds to the underlined grouping column name (C1). However, C2 in the selection expression does not have a grouping column name with the same name. Therefore, the preceding SQL statement will result in an error.

  12. We recommend that you do not specify the same name that was specified in the value expression of a grouping specification as the grouping column name in the same grouping specification. If the column name specified in the value expression of a grouping specification is the same as a grouping column name, an unintended grouping column might be referenced.

    Example:

    SELECT "C1"+1 FROM "T1" GROUP BY "C1"+1 AS "C1"

    In the preceding example, C1 is specified in the value expression of a grouping specification, and C1 is also specified as the grouping column name. In this case, C1 in the selection expression references grouping column "C1"+1 AS "C1".

  13. A value expression that includes a column specification provided as a grouping column cannot be referenced from a subquery in a selection expression or a subquery in the HAVING clause.

    Example that generates an error:

    SELECT "T1"."C1"+"T1"."C2" FROM "T1"
        GROUP BY "T1"."C1"+"T1"."C2"
        HAVING (SELECT "T2"."C1" FROM "T2"
                   WHERE "T2"."C1" > "T1"."C1"+"T1"."C2") > 0

    "T1"."C1"+"T1"."C2" specified in a subquery in the HAVING clause cannot reference grouping column "T1"."C1"+"T1"."C2". Therefore, the preceding SQL statement will result in an error.

  14. Grouping columns are restricted to the following data types:

    • Character string data

    • Numeric data

    • Datetime data

    • Binary data

  15. The results of the WHERE clause are grouped using the GROUP BY clause. For details about the order in which the results of the table expression are derived, see (3) Rules in 7.4.1 Specification format and rules for table expressions.

  16. Let set T denote the results of the preceding WHERE clause (or the preceding FROM clause if no WHERE clause is specified).

    • When the GROUP BY clause is specified, set T will be divided into multiple groups (where each group is a set with identical values in the grouping column). Because duplicate rows are then eliminated from each group, the number of groups created will be the same as the number of rows in the results of the GROUP BY clause.

      Note that when there are NULL values in the grouping column, all the null values are treated alike and placed in a single group.

    • If the GROUP BY clause is omitted, but a HAVING clause or set function is specified in the query specification, it creates a single group consisting of set T in its entirety.

(4) Examples

Example 1

Using the data in the sales history table (SALESLIST), this example determines the number of purchases for each customer.

SELECT "USERID",COUNT(*) AS "COUNT"
    FROM "SALESLIST"
        GROUP BY "USERID"

The underlined portion indicates the GROUP BY clause.

Example of execution results

[Figure]

Example 2

Using the data in the sales history table (SALESLIST), this example determines the sum and average of the quantities purchased on or after September 3, 2011 by product code (PUR-CODE).

SELECT "PUR-CODE",SUM("PUR-NUM") AS "SUM",AVG("PUR-NUM") AS "AVG"
    FROM "SALESLIST"
        WHERE "PUR-DATE">=DATE'2011-09-03'
        GROUP BY "PUR-CODE"

The underlined portion indicates the GROUP BY clause.

Example of execution results

[Figure]

Example 3

Using the data in the sales history table (SALESLIST) and the customer table (USERSLIST), this example determines, for each customer, the sum of quantities purchased (PUR-NUM) on or after September 4, 2011 for product code P002.

SELECT "NAME",SUM("PUR-NUM") AS "SUM"
    FROM "SALESLIST","USERSLIST"
        WHERE "PUR-DATE">=DATE'2011-09-04'
        AND "PUR-CODE"='P002'
        AND "SALESLIST"."USERID"="USERSLIST"."USERID"
        GROUP BY "NAME"

The underlined portion indicates the GROUP BY clause.

Example of execution results

[Figure]

Example 4

Using the data in the employee table (EMPLIST), this example organizes the employees' ages into 10-year groups and determines the number of employees in each group. Employees age 60 and over are grouped with the 60-year-old group.

SELECT "GAGE",COUNT(*) AS "COUNT"
    FROM "EMPLIST"
        GROUP BY CASE WHEN "AGE">=60 THEN 60
                      ELSE TRUNC("AGE",-1)
                 END AS "GAGE"

The underlined portion indicates the GROUP BY clause.

Example of execution results

[Figure]

Example 5

Using the data in the sales history table (SALESLIST), this example determines the sales amounts from 2013 on a monthly basis.

  • The SALE-DAY column stores the sale date of the product in DATE type format.

  • The AMOUNT column stores the price at which the customer purchased the product.

SELECT "GMONTH",SUM("AMOUNT") AS "SUM"
    FROM "SALESLIST"
        WHERE EXTRACT(YEAR FROM "SALE-DAY")=2013
        GROUP BY EXTRACT(MONTH FROM "SALE-DAY") AS "GMONTH"

The underlined portion indicates the GROUP BY clause.

Example of execution results

[Figure]

(5) Notes

  1. If the GROUP BY clause is specified, a work table might be created. If the size of the work table DB area where the work table is to be created has not been estimated correctly, performance might be degraded. For details about estimating the size of the work table DB area, see the HADB Setup and Operation Guide. For details about work tables, see Considerations when executing an SQL statement that creates work tables in the HADB Application Development Guide.

  2. If global hash grouping is used as the grouping method, a hash table area of an appropriate size is required. The size of the hash table area is specified in the operand adb_sql_exe_hashtbl_area_size in the server definition or client definition.

    In addition, if local hash grouping is used as the grouping method, a hash group area of an appropriate size is required. The size of the hash group area is specified in the operand adb_sql_exe_hashgrp_area_size in the server definition or client definition.

    For details about grouping methods, see the HADB Application Development Guide.