Nonstop Database, HiRDB Version 9 Command Reference
(1) Retrieval specifying OR in the retrieval condition (1)
(a) Checking the display
For a single-table retrieval, check to see if TABLE SCAN is used, resulting in poor performance.
(b) Better method
If the same conditions are specified on both sides of OR, take the condition outside OR.
(c) Reason
- Single-table retrieval
- If an index is defined for the predicate that was taken outside, the search range can be narrowed in the index, thereby reducing the search time.
- Join retrieval
- If join conditions are connected with OR, obtain the direct product (CROSS JOIN), then evaluate the join condition for the resulting direct product. If the predicate taken out is a join condition, the utility executes nested loop join or merge join, thereby reducing the number of data match operations during join processing. If the joined column has an index defined, the utility can use the index during join processing, thereby reducing the number of input/output operations.
(d) Examples
OR should be converted to the IN predicate in some cases. For details, see (2) Retrieval specifying OR in the retrieval condition (2).
- Single-table retrieval
- Index defined for C1
SELECT * FROM T1 WHERE (C1=10 AND C2=20) OR (C1=10 AND C2=30)
SELECT * FROM T1 WHERE C1=10 AND (C2=20 OR C2=30)
- If a multicolumn index is defined for (C1,C2) because C1=10 cannot be narrowed sufficiently, performance may be better if C1=10 is kept inside.
- Join retrieval
SELECT * FROM T1, T2
WHERE (T1.C1=T2.C1 AND T1.C2=10) OR (T1.C1=T2.C1 AND T2.C2=20)
SELECT * FROM T1, T2 WHERE T1.C1=T2.C1 AND (T1.C2=10 OR T2.C2=20)
(2) Retrieval specifying OR in the retrieval condition (2)
(a) Checking the display
If the condition is specified with OR for the same column, check to see if performance is poor because there is no search condition or valid search condition to narrow the index search condition.
(b) Better method
The better method depends on whether the HiRDB version is 06-02 or later, the HiRDB version is earlier than 06-01 and a single-column index is defined, or the HiRDB version is earlier than 06-01 and a multicolumn index is defined.
- When the HiRDB version is 06-02 or later
Use the IN predicate if the = predicate is specified for the same column on both sides of the OR operator and an index is defined for the column.
- When the HiRDB version is earlier than 06-01 and a single-column index is defined
Use the IN predicate if the = predicate is specified for the same column on both sides of the OR operator and a single-column index is defined for the column.
- When the HiRDB version is earlier than 06-01 and a multicolumn index is defined
Use the OR operator if the IN predicate is specified for the column which is the first component column of the multicolumn index.
(c) Reason
- When the HiRDB version is 06-02 or later
For a single-column index or multicolumn index, the utility searches the index for each value specification in the IN predicate (search condition type ATS or RANGES), narrowing the index search range; therefore, the CPU time and the number of input/output operations can be reduced.
- When the HiRDB version is earlier than 06-01 and a single-column index is defined
For a single-column index, the utility searches the index for each value specification in the IN predicate (search condition type ATS), narrowing the index search range; therefore, the CPU time and the number of input/output operations can be reduced.
- When the HiRDB version is earlier than 06-01 and a multicolumn index is defined
For a multicolumn index, the utility searches the minimum value to the maximum value specified in the IN predicate (if the value specification includes a non-literal value, the utility references all leaf pages); therefore, the index search range is wide.
If it is converted to an OR operation, Scan Type may be OR PLURAL INDEXES SCAN. In such a case, the utility searches the index for the left-hand term of the OR operator separately from the right-hand term; therefore, the index search range is narrowed and the input/output time is reduced.
(d) Examples
- When a single-column index is defined for T1(C1) or a multicolumn index is defined for T1(C1,C2) (when the HiRDB version is 06-02 or later)
SELECT * FROM T1 WHERE C1=10 OR C1=20
SELECT * FROM T1 WHERE C1 IN (10,20)
- When a single-column index is defined for T1(C1) (when the HiRDB version is earlier than 06-01)
SELECT * FROM T1 WHERE C1=10 OR C1=20
SELECT * FROM T1 WHERE C1 IN (10,20)
- When a multicolumn index is defined for T1(C1,C2) (C1 is the first component column) (when the HiRDB version is earlier than 06-01)
SELECT * FROM T1 WHERE C1 IN (10,20)
SELECT * FROM T1 WHERE C1=10 OR C1=20
If the following two conditions are satisfied, use the IN predicate even for a multicolumn index whose HiRDB version is earlier than 06-01:
- The item specification in the IN predicate consists of literals only.
- The item specification in the IN predicate contains only a few key values from minimum to maximum.
(3) Join retrieval specifying OR in the join condition
(a) Checking the display
If Join Type is CROSS JOIN, internal direct product processing is taking place. If an OR operator is used in the join condition, you may be able to achieve efficient retrieval by replacing it with equivalent SQL using the set operations separately, specifying both sides of OR (UNION or UNION ALL).
(b) Better method
If join conditions are connected with OR, use the set operations separately, specifying both sides of OR.
- DISTINCT specified in selection expression
- Use UNION.
- DISTINCT not specified in selection expression
- Use UNION ALL.
(c) Reason
If join conditions are connected with OR, the utility obtains the direct product of the tables to be joined, then evaluate the join conditions for the resulting direct product.
If the OR operator is separated, the utility executes nested loop join or merge join, thereby reducing the number of data match operations during join processing.
If nested loop join is executed, the utility can use the index during join processing, thereby reducing the number of input/output operations.
(d) Examples
- DISTINCT specified
SELECT DISTINCT * FROM T1, T2
WHERE (T1.C1=T2.C1 OR T1.C2=T2.C2) AND T1.C3=10
SELECT * FROM T1, T2 WHERE T1.C1=T2.C1 AND T1.C3=10
UNION
SELECT * FROM T1, T2 WHERE T1.C2=T2.C2 AND T1.C3=10
- DISTINCT not specified
SELECT * FROM T1, T2 WHERE (T1.C1=T2.C1 OR T1.C2=T2.C2) AND T1.C3=10
SELECT * FROM T1, T2 WHERE T1.C1=T2.C1 AND T1.C3=10
UNION ALL
SELECT * FROM T1, T2 WHERE T1.C2=T2.C2
AND (T1.C1<>T2.C1 OR T1.C1 IS NULL OR T2.C1 IS NULL) AND T1.C3=10
- If UNION is specified, the utility internally creates a work table based on the result of each query specification. In this case, the utility executes a sort operation to eliminate the duplicate values.
If UNION ALL is specified, the utility also internally creates a work table based on the result of each query specification. In this case, however, the utility does not execute a sort operation.
- If the range of the outer table is not narrowed by limitations, the utility does not use an index during join processing, even when the index is defined for the column subject to join processing. The table for which an index is defined for a column subject to join processing is the inner table, and the other table is the outer table.
- For the example for DISTINCT not specified in (d) previously, the IS NULL predicate is not needed if T1.C1 and T1.C2 are NOT NULL columns or the null value is not inserted. When DISTINCT is specified in the selection expression, but an expected result is not obtained, use UNION for the set operation, not UNION ALL.
(4) Retrieval with a range predicate
(a) Better method
If an AND operation joins the >= and <= predicates for the same column, the utility converts it to the BETWEEN predicate prior to SQL optimization processing. In this case, the utility converts predicates sequentially beginning with the first one specified in the WHERE clause.
Inside HiRDB, the condition joining the >= and <= predicates for the same column in the AND operation is treated as being equal to the BETWEEN predicate; therefore, the result is the same, whichever is specified.
Check to see if the >= and <= predicates form a range.
(b) Examples
- Two different SQL statements selecting the same access path
SELECT * FROM T1 WHERE C1 BETWEEN 10 AND 20
SELECT * FROM T1 WHERE C1 >=10 AND C1 <= 20
- HiRDB's internal conversion
SELECT * FROM T1 WHERE C1<=50 AND C1>=20 AND C1<=30
SELECT * FROM T1 WHERE C1 BETWEEN 20 AND 50 AND C1<=30
A redundant predicate results in unneeded condition checking during database access. You should delete any redundant predicate.
(5) Retrieval specifying GROUP BY
(a) Better method
If you are executing GROUP BY retrieval on multiple columns in a HiRDB parallel server configuration, specify the grouping column with fewer duplicate values in the GROUP BY clause. Note that this is not applicable when 1,024 is specified in the SQL optimization option.
(b) Reason
When executing GROUP BY processing using multiple floating servers in a HiRDB parallel server configuration, the utility executes hashing on the first column in the GROUP BY clause to distribute data to the floating servers. If the first column in the GROUP BY clause has only a few duplicate values, the utility can hash evenly to each floating server, thereby improving the effects of parallel processing.
(c) Example
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C1,C2
If (C1's duplicates count) > (C2's duplicates count), the utility changes this as follows:
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C2,C1
If 1024 is specified in the SQL optimization option, the utility uses all grouping columns for hashing; therefore, there is no need to change the GROUP BY order. However, if 17.6.2(4) Index definitions for a table used for retrieval specifying ORDER BY or GROUP BY applies, do not change the order of grouping columns.
(6) Retrieval specifying both GROUP BY and ORDER BY
(a) Better method
- There is an index that includes all GROUP BY columns as its component columns (for a HiRDB single server configuration)
- Specify the GROUP BY columns in the order of the index component columns.
- There is no such index that includes all GROUP BY columns as its component columns
- If all ORDER BY columns are included in the GROUP BY columns during retrieval specifying both ORDER BY and GROUP BY, specify the GROUP BY columns in the order of the ORDER BY columns.
(b) Reason
- There is an index that includes all GROUP BY columns as its component columns (for a HiRDB single server configuration)
- The sort operation can be omitted for GROUP BY processing. In this case, the utility still executes a sort operation for ORDER BY processing, but retrieval performance improves because the sort operation is executed on the grouped data (fewer data items).
- There is no such index that includes all GROUP BY columns as its component columns
- If the order of columns is the same in the GROUP BY clause and in the ORDER BY clause, the sort operation can be omitted for ORDER BY processing.
(c) Examples
- There is an index that includes all GROUP BY columns as its component columns (for a HiRDB single server configuration)
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C1,C2 ORDER BY C1,C2
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C2,C1 ORDER BY C1,C2
- There is no such index that includes all GROUP BY columns as its component columns
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C1,C2 ORDER BY C2,C1
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C2,C1 ORDER BY C2,C1
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C1,C2 ORDER BY C2 DESC
SELECT C1,C2,SUM(C3) FROM T1 GROUP BY C2,C1 ORDER BY C2 DESC
For the GROUP BY clause, define an index described in 17.6.2(4) Index definitions for a table used for retrieval specifying ORDER BY or GROUP BY, if possible.
All Rights Reserved. Copyright (C) 2011, 2015, Hitachi, Ltd.