Hitachi

Hitachi Advanced Database Setup and Operation Guide


11.4.13 Checking whether a multi-chunk table needs to be reorganized

This section describes how to check whether a multi-chunk table needs to be reorganized.

Note
Organization of this subsection

(1) Reason why table reorganization is necessary

If the following operations are repeatedly performed for a multi-chunk table, the retrieval performance and data storage efficiency of the table are degraded:

In such a case, if you reorganize the multi-chunk table, the area that has become unavailable in the data DB area is released, thus improving the retrieval performance and data storage efficiency of the table.

Retrieval performance and data storage efficiency of a multi-chunk table are degraded for the following reasons:

In a case where the multi-chunk table is a row store table

The reasons in the case of a single-chunk table also apply. For details about the reasons, see In a case where the single-chunk table is a row store table in (1) Reason why table reorganization is necessary in 11.1.9 Checking whether a single-chunk table needs to be reorganized.

In a case where the multi-chunk table is a column store table

The reasons in the case of a single-chunk table also apply. For details about the reasons, see In a case where the single-chunk table is a column store table in (1) Reason why table reorganization is necessary in 11.1.9 Checking whether a single-chunk table needs to be reorganized.

Reasons common to a row store table and column store table

If the adbimport command is repeatedly executed to perform background import of a small amount of data for a multi-chunk table, part of free space might become unavailable. Such space is free but unavailable (useless) because it is never reused. Such space degrades the data storage efficiency, resulting in lower retrieval performance of the table.

(2) How to check whether reorganization is necessary

This section describes how to check whether a multi-chunk table needs to be reorganized.

If the multi-chunk table is a row store table and SQL statements are used to perform update or deletion of rows repeatedly for the table, check whether the table needs to be reorganized as explained in (a) In a case where the multi-chunk table is a row store table.

If the multi-chunk table is a column store table and SQL statements are used to perform addition, update, or deletion of rows repeatedly for the table, check whether the table needs to be reorganized as explained in (b) In a case where the multi-chunk table is a column store table.

If the adbimport command is used to repeatedly perform background import of a small amount of data for the table, check whether the table needs to be reorganized as explained in (c) If a small amount of data is repeatedly added by background import.

(a) In a case where the multi-chunk table is a row store table

Calculate the data storage efficiency. If data storage is inefficient, reorganize the multi-chunk table. Check the data storage efficiency on a chunk basis. The following shows the procedure.

Procedure

  1. Obtain the number of segments being used by the current chunk.

    Use the adbdbstatus command to output the usage information. Then, check the value for Used_segments (number of segments being used) by using Chunk_ID (chunk ID) as a key. For details about the adbdbstatus command, see adbdbstatus (Analyze the Database Status) in the manual HADB Command Reference.

  2. Obtain the number of segments in the data DB area based on the number of data items in the current chunk.

    First check the number of data items in the current chunk, and then obtain the number of segments in the data DB area.

    You can use the subcommand #GETCOUNT of the adbsql command to check the number of data items in the current chunk. Execute the subcommand #GETCOUNT of the adbsql command by using the chunk ID you checked in step 1. For details about the adbsql command, see adbsql (Execute SQL Statements) in the manual HADB Command Reference.

    For details about how to determine the number of segments in the data DB area, see (f) Determining the variable SGROWTBL (for a multi-chunk table) in (2) Explanation of variables in 5.8.1 Determining the total number of pages in the data DB area.

    Note that the formula to determine the value of the SGROWTBL variable obtains the summation results for all chunks in all multi-chunk tables that are stored in the data DB area. In this step, obtain the value only for the target chunk for the target table.

  3. Determine the data storage efficiency based on the obtained results.

    Determine the data storage efficiency in the chunks based on the results of steps 1 and 2. The following shows the formula:

    Formula

    [Figure]

    If the result of the preceding calculation is close to 0, data storage efficiency is poor. In such a case, reorganize the chunk. There are several reorganization operations. Determine the operation to be used by referring to (3) Selecting the reorganization method.

(b) In a case where the multi-chunk table is a column store table

Execute the adbdbstatus command, check information about the need for reorganization, and determine whether to reorganize the multi-chunk table. Use the following procedure for determination.

Procedure

  1. Execute the adbdbstatus command with the -d reorginfo option specified to output information about the need for reorganization.

  2. In the information about the need for reorganization, check the Reorganization_necessity section (whether reorganization is necessary).

    • If Recommended is output

      Reorganize the chunks for which Recommended was output. There are several reorganization operations. Determine the operation to be used by referring to (3) Selecting the reorganization method.

    • If Not_recommended is output

      You do not need to reorganize the chunks for which Not_recommended was output.

For details about the adbdbstatus command, see adbdbstatus (Analyze the Database Status) in the manual HADB Command Reference.

(c) If a small amount of data is repeatedly added by background import

Calculate the data storage efficiency. If data storage efficiency is poor, reorganize the multi-chunk table. The following shows the formula for calculating the data storage efficiency.

Formula

[Figure]

The closer the value obtained from the preceding formula is to 1, the better the storage efficiency of chunk data. Conversely, the closer the value obtained from the formula is to 0, the worse the storage efficiency of chunk data. If the result is close to 0, reorganize the multi-chunk table.

Explanation of variables

A

The value to be substituted differs depending on the reorganization method.

  • Chunk-based reorganization

    Substitute the number of pages used in the chunk.

    To check the number of used pages in the chunk, execute the adbdbstatus command with the -d used option specified. The information about the usage of DB areas, tables, and indexes can be output. From the information about the usage of DB areas, tables, and indexes, check the content of Used_pages based on Chunk_ID of the chunk for which you want to check the storage efficiency.

  • Reorganization of an entire table

    Substitute the number of pages used in the table.

    To check the number of pages used in the table, execute the adbdbstatus command to output the table summary information. In the table summary information, check the value of Used_pages.

B

Page size of the data DB area that stores a multi-chunk table (kilobytes)

To check the page size of a data DB area, execute the adbdbstatus command with the -c dbarea option specified. The DB area summary information can be output. From the DB area summary information, check the content of Page_size for the data DB area whose page size you want to check.

C

The value to be substituted differs depending on the reorganization method.

  • Chunk-based reorganization

    Substitute the number of segments used in the chunk.

    To check the number of used segments in a chunk, execute the adbdbstatus command with the -d used option specified. The information about the usage of DB areas, tables, and indexes can be output. From the information about the usage of DB areas, tables, and indexes, check the content of Used_segments based on Chunk_ID of the chunk for which you want to check the storage efficiency.

    Important

    The data storage efficiency does not improve even by reorganizing a chunk in which only one segment is used. If the number of segments used in the reorganization-target chunk is 1 or close to 1, use either of the following methods to reorganize the data:

  • Reorganization of an entire table

    Substitute the number of segments used in the table.

    To check the number of segments used in the table, execute the adbdbstatus command to output the table summary information. In the table summary information, check the value of Used_segments.

(3) Selecting the reorganization method

There are several methods for reorganizing a multi-chunk table, as shown in the following table. You can select a method only if the conditions indicated are met.

Important

Note that if you reorganize an entire multi-chunk table (especially when the table has many data items), reorganization might take time. Therefore, we recommend that you periodically perform chunk-based reorganization rather than reorganizing the entire multi-chunk table (that is, the method in item 1 in the following table is recommended).

Table 11‒7: Reorganization methods for a multi-chunk table

No.

Basis of reorganization

Availability of retrieval in the table and chunk configuration

Condition required for reorganization

Reorganization method

1

Chunk-based

(1) Retrieval in the table during reorganization is possible.#1

(2) The configuration, statuses, and comments of the chunks before reorganization can be maintained.

  1. Free space in the data DB area

    Both the reorganization-target data and reorganization-result data must be temporarily stored. Therefore, the data DB area must have as large a free space as the reorganization-target chunk.

    For example, to reorganize a chunk that stores 1 TB of data, free space of 1 TB is required in the data DB area.#2

  2. Space required at the destination of the output data file

    There must be free disk space sufficient for storing the output data file to which the chunk data will be exported.

  3. Space required at the destination of the temporary work file

    A temporary work file is temporarily created when data is imported. Therefore, there must be free disk space sufficient for storing the temporary work file.

Perform reorganization by using the procedure described in (1) Reorganizing a table while continuing retrieval in 11.4.14 Reorganizing a multi-chunk table: Chunk-based reorganization.

2

(1) Retrieval in the table during reorganization is not possible.

(2) The configuration, statuses, and comments of the chunks before reorganization can be maintained.

Conditions 2 and 3 in item 1 must be met.

If only the current chunk is to be reorganized, the data DB area must have as large a free space as the data stored in the current chunk.

Perform reorganization by using the procedure described in (2) Reorganizing a table after stopping retrieval in 11.4.14 Reorganizing a multi-chunk table: Chunk-based reorganization.

3

Entire table

(1) Retrieval in the table during reorganization is possible.#1

(2) The configuration, statuses, and comments of the chunks before reorganization can be maintained.

  1. Free space in the data DB area

    Both the reorganization-target data and reorganization-result data must be temporarily stored. Therefore, the data DB area must have as large a free space as the largest chunk in the reorganization-target table.

    For example, if the table has three chunks whose sizes are 1 TB, 2 TB, and 3 TB respectively, the data DB area must have free space of 3 TB.

  2. Space required at the destination of the output data file

    Because the data is exported in units of chunks, there must be free disk space sufficient for storing the output data file to which the chunk data will be exported.

  3. Space required at the destination of the temporary work file

    The data is imported in units of chunks and a temporary work file is temporarily created when the data is imported. Therefore, there must be free disk space sufficient for storing the temporary work file.

Perform reorganization by using the procedure described in (1) In a case where you want to maintain the current chunk configuration after reorganization in 11.4.15 Reorganizing a multi-chunk table: Reorganization of an entire table.

4

(1) Retrieval in the table during reorganization is not possible.

(2) Chunks can be merged into one.

  1. Space required at the destination of the output data file

    All the data in the table is exported at one time. There must be free disk space sufficient for storing the output data file to which all table data will be exported.

  2. Space required at the destination of the temporary work file

    All the data in the table is imported at one time. A temporary work file is temporarily created when all table data is imported. Therefore, there must be free disk space sufficient for storing the temporary work file.

Perform reorganization by using the procedure described in (2) In a case where the table data can be stored to one chunk after reorganization in 11.4.15 Reorganizing a multi-chunk table: Reorganization of an entire table.

#1

The PURGE CHUNK statement is executed in a step of the reorganization procedure. Retrieval cannot be performed in the table while the PURGE CHUNK statement is being executed.

#2

If there are two chunks to be reorganized and their sizes are 1 TB and 2 TB respectively, the data DB area must have free space of 2 TB.