Hitachi

Hitachi Advanced Database Setup and Operation Guide


2.14.2 Managing data in data-import units (chunks)

To implement background import, HADB manages as a single unit the data that is stored in a base table by each background-import operation. This unit is called a chunk.

Organization of this subsection

(1) Chunk overview

HADB can manage multiple chunks. It assigns a unique number to each chunk within the base table. This number is called a chunk ID.

Each time background import is executed, a new chunk is created. Among the created chunks, the one to which data is to be added is called the current chunk. There is only one current chunk in a base table.

When a new chunk is created by an action such as execution of background import, the current chunk changes (that is, the chunk to which data is to be added changes). This process of changing the current chunk is called current chunk swapping.

The following figure shows the relationship between background import and the current chunk.

Figure 2‒42: Relationship between background import and the current chunk

[Figure]

Explanation

When table T1 is defined as a new base table, a blank chunk (chunk 1) is created as the current chunk.

Executing the first data-import operation on table T1 stores data. The cluster of data that is stored is managed as chunk 1.

Executing the second data-import operation on table T1 as background import creates a new chunk (chunk 2), and data is stored. The cluster of data that is stored is managed as chunk 2, and the current chunk changes from chunk 1 to chunk 2.

Note
■ Timing at which current chunk swapping occurs

The current chunk changes in the following cases:

• When background import is executed

• When multiple chunks, including the current chunk, are merged

Additionally, the current chunk might be changed in the following case:

• When the chunk status is changed

■ Relationship between chunks and row insertion or row updating

When the INSERT statement is used to insert a row into a base table in which data has been stored by means of background import, the inserted row is stored in the current chunk. On the other hand, when the UPDATE statement is used to update a row, the updated row is stored in the chunk in which the update-target row is located.

■ Chunk creation and chunk IDs when indexes are defined

When indexes are defined for the base table into which data is to be imported, the data in the index created during background import is also managed in units of chunks.

For example, when indexes (B-tree, text, or range indexes) are defined for the base table into which data is to be imported, chunks are created separately for the data in the base table and the data of the indexes (B-tree, text, range indexes) created during background import.

HADB manages chunks in the background-import units. Therefore, the same chunk ID is assigned to the above-mentioned chunks.

The following figure provides an overview of chunks and chunk IDs when indexes are defined.

Figure 2‒43: Overview of chunk creation and chunk IDs when indexes are defined

[Figure]

■ Relationship between the number of created chunks and the number of indexes

When indexes are defined for base tables into which data is to be imported, the number of chunks created in a single background-import operation equals the sum total of the base tables into which data is to be imported and the number of indexes defined for those base tables.

For example, if two B-tree indexes, one text index, and one range index are defined for a base table, the number of chunks created in a single background-import operation would be five.

In HADB, there is a limit to the number of chunks that can be managed in a single DB area and a single base table. For details about the maximum number of chunks in a single DB area and the maximum number of chunks that can be created in a single base table, see D.1 Maximum and minimum values related to system configuration.

■ Relationship between chunks and DB areas

Background import can be executed regardless of whether tables and indexes are stored in the same DB area or in separate DB areas. In either case, a separate chunk is created for each base table and index, and a single chunk ID is assigned. The following figure shows the relationship between chunks and DB areas.

Figure 2‒44: Relationship between chunks and DB areas

[Figure]

(2) Merging chunks

Multiple chunks created by executing background import can be merged into a single new chunk. Merging chunks can reduce the total number of chunks being used.

Chunks can be merged even if there is a user who is executing retrieval on a chunk designated for merging.

Important
■ Relationship between the merging of chunks and the performance of retrieval using B-tree indexes and text indexes

If the number of chunks increases as a result of repeated background import, the performance of retrieval using B-tree indexes and text indexes might decline. Reducing the total number of chunks by merging them can prevent such deterioration of retrieval performance.

■ Relationship between the merging of chunks and the number of chunks that can be created

In HADB, there is a limit to the number of chunks that can be managed in a single DB area and a single base table. Reducing the total number of chunks by merging them can prevent this upper limit from being reached. For details about the maximum number of chunks in a single DB area and the maximum number of chunks that can be created in a single base table, see D.1 Maximum and minimum values related to system configuration.

The following figure shows an example of merging multiple chunks.

Figure 2‒45: Example of merging multiple chunks

[Figure]

Explanation

Three chunks which include the current chunk (chunks 1 through 3) are merged. After the merger, the three chunks are managed as a single new chunk (chunk 4). In this case, the current chunk changes from chunk 3 to chunk 4.

Note

If the current chunk is not included in the chunks designated for merging, the current chunk does not change.

(3) Deleting chunks

Data stored in a base table can be deleted in units of chunks. When a chunk is deleted, its data storage area can be reused, unlike when a row is deleted. Note, however, that the current chunk cannot be deleted.

The following figure shows an example of deleting chunks and reusing the storage area.

Figure 2‒46: Example of deleting chunks and reusing the storage area

[Figure]

Explanation

When chunk 1 is deleted, the area that was used by chunk 1 can be reused. When background import is executed on table T1 after chunk 1 has been deleted, a new chunk (chunk 4) is created in the area that was used by chunk 1, and data is stored in it.

(4) Changing the chunk status

A chunk can be in one of several statuses.

When you perform background import, you can specify the status of the chunks to be created. You can also change the chunk status after background import.

The following table shows the chunk statuses that are managed by the HADB server.

Table 2‒19: Chunk statuses

No.

Chunk status

Explanation

1

Normal status

When a chunk is in normal status, the data contained in the chunk can be manipulated by a data manipulation SQL statement. Therefore, data contained in chunks that are in normal status can be retrieved.

A chunk in normal status is created when background import is executed.

You can also change a chunk's status from normal status to wait status.

2

Wait status

When a chunk is in wait status, the data it contains cannot be manipulated by a data manipulation SQL statement. Therefore, such data cannot be retrieved. However, you can delete the chunk by using the PURGE CHUNK or TRUNCATE TABLE statement.

A chunk in wait status is created when you specify this status for the chunk to be created during performance of background import.

You can also change a chunk's status from wait status to normal status.

Note that a chunk in wait status cannot become the current chunk.

Using a chunk in wait status, you can perform the following operations:

  • You can use background import to store data in a chunk that is in wait status, and then change the chunk's status to normal status at a desired time so that the data can be retrieved.

  • You can make data stored in a chunk in normal status non-retrievable by changing the chunk's status from normal status to wait status at a desired time.

3

Delete-pending status

A delete-pending chunk is a chunk whose data cannot be manipulated by a data manipulation SQL statement. Therefore, such data cannot be retrieved. However, you can delete the chunk by using the PURGE CHUNK or TRUNCATE TABLE statement.

A delete-pending chunk is an unneeded (merge-source) chunk that remains without being deleted because merge chunk processing was interrupted.

You cannot change the status of a delete-pending chunk.

Unlike chunks in wait status, delete-pending chunks are not needed and therefore must be manually deleted.

Note:

Depending on the status of a chunk, the SQL statements and commands that can be executed on the data contained in the chunk vary. For details, see (2) Relationship of chunk statuses with SQL statements and commands that can be executed in 11.4.12 Changing the chunk status.

The following figure shows an example of changing the statuses of multiple chunks that were created by executing background import.

Figure 2‒47: Example of changing chunk statuses

[Figure]

Explanation

In this example, the status of two chunks (chunks 1 and 2) is changed from normal status to wait status. Once the chunk status changes to wait status, the data stored in chunks 1 and 2 can no longer be retrieved.

Additionally, a chunk in wait status (chunk 3) is changed to normal status. After the chunk status has become normal status, the data stored in chunk 3 can be retrieved.