Nonstop Database, HiRDB Version 9 Command Reference

[Contents][Index][Back][Next]

17.6.3 Notes about index retrieval

Organization of this subsection
(1) Retrieval specifying OR in the retrieval condition (1)
(2) Retrieval specifying OR in the retrieval condition (2)
(3) Join retrieval specifying OR in the join condition
(4) Retrieval with a range predicate
(5) Retrieval specifying GROUP BY
(6) Retrieval specifying both GROUP BY and ORDER BY

(1) Retrieval specifying OR in the retrieval condition (1)

(a) Checking the display

For a single-table retrieval, check to see if TABLE SCAN is used, resulting in poor performance.

(b) Better method

If the same conditions are specified on both sides of OR, take the condition outside OR.

(c) Reason

Single-table retrieval
If an index is defined for the predicate that was taken outside, the search range can be narrowed in the index, thereby reducing the search time.

Join retrieval
If join conditions are connected with OR, obtain the direct product (CROSS JOIN), then evaluate the join condition for the resulting direct product. If the predicate taken out is a join condition, the utility executes nested loop join or merge join, thereby reducing the number of data match operations during join processing. If the joined column has an index defined, the utility can use the index during join processing, thereby reducing the number of input/output operations.
(d) Examples

OR should be converted to the IN predicate in some cases. For details, see (2) Retrieval specifying OR in the retrieval condition (2).

Single-table retrieval
Index defined for C1
SELECT * FROM T1 WHERE (C1=10 AND C2=20) OR (C1=10 AND C2=30)
             [Figure]
SELECT * FROM T1 WHERE C1=10 AND (C2=20 OR C2=30)
If a multicolumn index is defined for (C1,C2) because C1=10 cannot be narrowed sufficiently, performance may be better if C1=10 is kept inside.

Join retrieval
SELECT * FROM T1, T2
   WHERE (T1.C1=T2.C1 AND T1.C2=10) OR (T1.C1=T2.C1 AND T2.C2=20)
              [Figure]
SELECT * FROM T1, T2 WHERE T1.C1=T2.C1 AND (T1.C2=10 OR T2.C2=20)

(2) Retrieval specifying OR in the retrieval condition (2)

(a) Checking the display

If the condition is specified with OR for the same column, check to see if performance is poor because there is no search condition or valid search condition to narrow the index search condition.

(b) Better method

The better method depends on whether the HiRDB version is 06-02 or later, the HiRDB version is earlier than 06-01 and a single-column index is defined, or the HiRDB version is earlier than 06-01 and a multicolumn index is defined.

(c) Reason
(d) Examples
(e) Notes

If the following two conditions are satisfied, use the IN predicate even for a multicolumn index whose HiRDB version is earlier than 06-01:

(3) Join retrieval specifying OR in the join condition

(a) Checking the display

If Join Type is CROSS JOIN, internal direct product processing is taking place. If an OR operator is used in the join condition, you may be able to achieve efficient retrieval by replacing it with equivalent SQL using the set operations separately, specifying both sides of OR (UNION or UNION ALL).

(b) Better method

If join conditions are connected with OR, use the set operations separately, specifying both sides of OR.

DISTINCT specified in selection expression
Use UNION.

DISTINCT not specified in selection expression
Use UNION ALL.
(c) Reason

If join conditions are connected with OR, the utility obtains the direct product of the tables to be joined, then evaluate the join conditions for the resulting direct product.

If the OR operator is separated, the utility executes nested loop join or merge join, thereby reducing the number of data match operations during join processing.

If nested loop join is executed, the utility can use the index during join processing, thereby reducing the number of input/output operations.

(d) Examples

DISTINCT specified
SELECT DISTINCT * FROM T1, T2
    WHERE (T1.C1=T2.C1 OR T1.C2=T2.C2) AND T1.C3=10
            [Figure]
SELECT * FROM T1, T2 WHERE T1.C1=T2.C1 AND T1.C3=10
  UNION
SELECT * FROM T1, T2 WHERE T1.C2=T2.C2 AND T1.C3=10

DISTINCT not specified
SELECT * FROM T1, T2 WHERE (T1.C1=T2.C1 OR T1.C2=T2.C2) AND T1.C3=10
            [Figure]
SELECT * FROM T1, T2 WHERE T1.C1=T2.C1 AND T1.C3=10
  UNION ALL
SELECT * FROM T1, T2 WHERE T1.C2=T2.C2
    AND (T1.C1<>T2.C1 OR T1.C1 IS NULL OR T2.C1 IS NULL) AND T1.C3=10
(e) Notes
  1. If UNION is specified, the utility internally creates a work table based on the result of each query specification. In this case, the utility executes a sort operation to eliminate the duplicate values.
    If UNION ALL is specified, the utility also internally creates a work table based on the result of each query specification. In this case, however, the utility does not execute a sort operation.
  2. If the range of the outer table is not narrowed by limitations, the utility does not use an index during join processing, even when the index is defined for the column subject to join processing. The table for which an index is defined for a column subject to join processing is the inner table, and the other table is the outer table.
  3. For the example for DISTINCT not specified in (d) previously, the IS NULL predicate is not needed if T1.C1 and T1.C2 are NOT NULL columns or the null value is not inserted. When DISTINCT is specified in the selection expression, but an expected result is not obtained, use UNION for the set operation, not UNION ALL.

(4) Retrieval with a range predicate

(a) Better method

If an AND operation joins the >= and <= predicates for the same column, the utility converts it to the BETWEEN predicate prior to SQL optimization processing. In this case, the utility converts predicates sequentially beginning with the first one specified in the WHERE clause.

Inside HiRDB, the condition joining the >= and <= predicates for the same column in the AND operation is treated as being equal to the BETWEEN predicate; therefore, the result is the same, whichever is specified.

Check to see if the >= and <= predicates form a range.

(b) Examples

Two different SQL statements selecting the same access path
SELECT * FROM T1 WHERE C1 BETWEEN 10 AND 20
SELECT * FROM T1 WHERE C1 >=10 AND C1 <= 20

HiRDB's internal conversion
SELECT * FROM T1 WHERE C1<=50 AND C1>=20 AND C1<=30
              [Figure]
SELECT * FROM T1 WHERE C1 BETWEEN 20 AND 50 AND C1<=30
(c) Notes

A redundant predicate results in unneeded condition checking during database access. You should delete any redundant predicate.

(5) Retrieval specifying GROUP BY

(a) Better method

If you are executing GROUP BY retrieval on multiple columns in a HiRDB parallel server configuration, specify the grouping column with fewer duplicate values in the GROUP BY clause. Note that this is not applicable when 1,024 is specified in the SQL optimization option.

(b) Reason

When executing GROUP BY processing using multiple floating servers in a HiRDB parallel server configuration, the utility executes hashing on the first column in the GROUP BY clause to distribute data to the floating servers. If the first column in the GROUP BY clause has only a few duplicate values, the utility can hash evenly to each floating server, thereby improving the effects of parallel processing.

(c) Example
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C1,C2

If (C1's duplicates count) > (C2's duplicates count), the utility changes this as follows:

SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C2,C1
(d) Note

If 1024 is specified in the SQL optimization option, the utility uses all grouping columns for hashing; therefore, there is no need to change the GROUP BY order. However, if 17.6.2(4) Index definitions for a table used for retrieval specifying ORDER BY or GROUP BY applies, do not change the order of grouping columns.

(6) Retrieval specifying both GROUP BY and ORDER BY

(a) Better method

There is an index that includes all GROUP BY columns as its component columns (for a HiRDB single server configuration)
Specify the GROUP BY columns in the order of the index component columns.

There is no such index that includes all GROUP BY columns as its component columns
If all ORDER BY columns are included in the GROUP BY columns during retrieval specifying both ORDER BY and GROUP BY, specify the GROUP BY columns in the order of the ORDER BY columns.
(b) Reason

There is an index that includes all GROUP BY columns as its component columns (for a HiRDB single server configuration)
The sort operation can be omitted for GROUP BY processing. In this case, the utility still executes a sort operation for ORDER BY processing, but retrieval performance improves because the sort operation is executed on the grouped data (fewer data items).

There is no such index that includes all GROUP BY columns as its component columns
If the order of columns is the same in the GROUP BY clause and in the ORDER BY clause, the sort operation can be omitted for ORDER BY processing.
(c) Examples

There is an index that includes all GROUP BY columns as its component columns (for a HiRDB single server configuration)
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C1,C2 ORDER BY C1,C2
           [Figure]
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C2,C1 ORDER BY C1,C2

There is no such index that includes all GROUP BY columns as its component columns
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C1,C2 ORDER BY C2,C1
           [Figure]
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C2,C1 ORDER BY C2,C1
 
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C1,C2 ORDER BY C2 DESC
           [Figure]
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C2,C1 ORDER BY C2 DESC
(d) Note

For the GROUP BY clause, define an index described in 17.6.2(4) Index definitions for a table used for retrieval specifying ORDER BY or GROUP BY, if possible.