Hitachi

Hitachi Advanced Database Setup and Operation Guide


11.1.9 Checking whether a single-chunk table needs to be reorganized

This section describes how to check whether a single-chunk table needs to be reorganized.

Note

For details about how to check whether a multi-chunk table needs to be reorganized, see 11.4.13 Checking whether a multi-chunk table needs to be reorganized.

Organization of this subsection

(1) Reason why table reorganization is necessary

If an SQL statement that inserts, updates, or deletes a row is repeatedly executed for a single-chunk table, the retrieval performance and data storage efficiency of the table are degraded. In such a case, if you reorganize the table, the area that has become unavailable in the data DB area is released, thus improving the retrieval performance and data storage efficiency of the table.

Retrieval performance and data storage efficiency of a single-chunk table might be degraded for the following reasons.

In a case where the single-chunk table is a row store table

When a row is deleted by the DELETE statement, the data on the deleted row becomes invalid. When a row is updated by the UPDATE statement, the resulting row data is added and the previous row data becomes invalid. Invalid row data is not deleted automatically from the disk.

If you repeat deletion and update of rows, invalid row data increases in the table. If invalid row data increases, it might affect as follows:

  • Data storage efficiency drops due to the increase in the amount of invalid row data.

  • Retrieval performance slows down because the number of pages to be referenced increases during retrieval processing.

In a case where the single-chunk table is a column store table
  • When the INSERT statement is used to insert rows or the UPDATE statement is used to update rows, the resulting data is stored in row store format. If you repeat insertion and update of rows, the amount of data that is stored in row store format increases. Increase of data stored in row store format will degrade the performance of retrieval processing that is suitable for a column store table. Examples of such processing include retrieval for a specific column on all rows and retrieval for a specific column on the rows within a specific range, such as a year or month.

  • When rows are deleted by the DELETE statement, the information about the deleted rows is stored on invalid row information pages. Also, when rows are updated by the UPDATE statement, the information about the rows that existed before they were updated is stored on invalid row information pages.

    If you repeat deletion and update of rows, the number of invalid row information pages increases and the retrieval performance is degraded because the number of pages to be referenced during retrieval increases.

If the single-chunk table is a row store table, check whether table reorganization is necessary by referring to (2) If the single-chunk table is a row store table.

If the single-chunk table is a column store table, check whether table reorganization is necessary by referring to (3) If the single-chunk table is a column store table.

(2) If the single-chunk table is a row store table

Determine the data storage efficiency, and if it is poor, reorganize the table. Use the following formula to determine the data storage efficiency:

Formula

[Figure]

Explanation of variables

A: Number of rows in the table

B: Number of invalid rows

The data storage efficiency is better the closer that the value obtained from the formula is to 1. Conversely, the data storage efficiency is worse the closer that the value obtained from the formula is to 0. When the obtained value is close to 0, reorganize the data in the table as explained in 11.1.10 Reorganizing a single-chunk table.

Determine variables A and B from the following procedure.

Procedure (determining variables A and B)

  1. Make sure that the SQL trace information is set to be output.

    Make sure that the access path statistical information of the SQL trace information is output. You can check whether the access path statistical information is output by checking the specification of the server definition for SQL tracing. For details, see (2) Specifying server definitions in 10.11.5 Preparations for outputting SQL trace information.

  2. Execute the following SELECT statement:

    SELECT COUNT(*) FROM "table-name" /*>> WITHOUT INDEX <<*/

    The execution result of the preceding SELECT statement becomes the number of rows in the table (the value of variable A).

  3. Check the access path statistical information.

    Check the access path statistical information for the SELECT statement executed in step 2. Check the value for Data_deleted_rows_cnt in the access path statistical information. This value becomes the number of invalid rows (the value of variable B).

If the number of invalid rows (the value of variable B) is unknown, use the following procedure to determine the data storage efficiency.

Procedure (when the value of variables B is unknown)

  1. Check the number of segments used by the table.

    Use the adbdbstatus command to output summary information of the table. Then, check the output value for Used_segments (number of segments used by the table). For details about the adbdbstatus command, see adbdbstatus (Analyze the Database Status) in the manual HADB Command Reference.

  2. Obtain the number of segments in the data DB area based on the number of rows in the table.

    First obtain the number of rows in the table, and then obtain the number of segments in the data DB area. For details about how to obtain the number of rows in the table, see the explanation of variable A above.

    For details about how to determine the number of segments in the data DB area, see (a) Determining the variable SGROWTBL (for a single-chunk table) in (2) Explanation of variables in 5.8.1 Determining the total number of pages in the data DB area.

    The variable SGROWTBL is a formula to obtain the summation results for all tables that are stored in the data DB area. In this step, obtain the value only for the target table. The number of rows in the table is used in variables BP(i) and VP(i), which are used in the formula for variable SGROWTBL.

  3. Determine the data storage efficiency based on the obtained results.

    Determine the data storage efficiency based on the results of steps 1 and 2. The following shows the formula:

    Formula

    [Figure]

    If the result of the preceding calculation is close to 0, reorganize the table as explained in 11.1.10 Reorganizing a single-chunk table.

(3) If the single-chunk table is a column store table

Execute the adbdbstatus command, check information about the need for reorganization, and determine whether to reorganize the single-chunk table. Use the following procedure for determination.

Procedure

  1. Execute the adbdbstatus command with the -d reorginfo option specified to output information about the need for reorganization.

  2. In the information about the need for reorganization, check the Reorganization_necessity section (whether reorganization is necessary).

    • If Recommended is output

      Reorganize the single-chunk table. For details about the reorganization procedure, see 11.1.10 Reorganizing a single-chunk table.

    • If Not_recommended is output

      You do not need to reorganize the single-chunk table.

For details about the adbdbstatus command, see adbdbstatus (Analyze the Database Status) in the manual HADB Command Reference.