11.4.24 Operation taking background import and chunks into consideration (Example 2: Using chunks in wait status)
This subsection provides an operation example of background import that uses chunks in wait status. When you use chunks in wait status, you can retrieve data or prevent it from being retrieved at a desired time, as explained below.
Explanations in this example assume that operations are taking place in the following system:
-
A main office collects data on products handled by shops nationwide from 10 branch offices, and analyzes the collected product data.
-
The product data collected by the main office is managed centrally using a product table.
-
Since the product data changes every month, the coming month's product data is sent from each branch office by the 25th of each month.
-
The next month's product data is imported into the database as it arrives at the main office.
-
The imported product data for the next month needs to be excluded from analysis until the next month begins.
-
The product data for the current month is analyzed even as the next month's product data is being imported into the database.
-
When the next month begins, the product data that was excluded from analysis becomes subject to analysis. Conversely, the product data that had been subject to analysis is now excluded from analysis.
-
The database retains the product data that is now excluded from analysis for 6 months.
-
To prevent the amount of data from becoming unmanageable, product data older than six months is backed up and then deleted.
The following figure shows the flow of operations in this system.
- Organization of this subsection
(2) Adding the next month's product data to chunks in wait status (background import)
(4) Merging chunks in wait status (to reduce the number of chunks)
(5) Changing the chunk status (switching the product data to be analyzed)
(8) Repeatedly adding new data, merging, changing status, and deleting old data
(1) Defining a multi-chunk table and indexes
To manage product data centrally, you need to define a product table (ITEMLIST) and indexes.
(a) Defining the product table (ITEMLIST)
Define the product table (ITEMLIST) to be stored in the DB area TBLDAT_002. Also, define the product table as a multi-chunk table to perform background import.
In this operation example, the following number of chunks is required:
-
The chunk containing the current month's product data that is to be analyzed (one chunk)
-
The chunks containing product data sent from branch offices (11 chunks)
Because there are 10 branch offices, 10 chunks are needed to import the data of those branch offices. In addition, one chunk is needed to merge those 10 chunks. Therefore, 11 chunks are needed in total.
-
The chunks containing product data that are not subject to analysis (six chunks)
Assuming a safety factor of 1.2, the maximum number of chunks that can be created is as follows:
maximum-number-of-chunks-to-be-created = ↑(1 + 11 + 6) × 1.2 (safety factor)↑ = 22
- Note
-
If the number of chunks increases, this negatively impacts retrieval performance. Therefore, we recommend that you design and operate your database system to minimize the number of chunks that are used.
Based on this estimate, specify 22 as the maximum number of chunks. Specify this value for CHUNK of the chunk specification in the CREATE TABLE statement. The following shows an SQL statement example.
- ■ SQL statement example
-
CREATE TABLE "ITEMLIST" ("SHOP_ID" INT,"DELIVERY_TIME" TIMESTAMP, "ITEM_ID" INT, "ITEM_NAME" VARCHAR(100)) IN "TBLDAT002" CHUNK=22
For details about the CREATE TABLE statement, see CREATE TABLE (define a table) in Definition SQL in the manual HADB SQL Reference.
(b) Defining indexes for the product table (ITEMLIST)
Define a B-tree index and a range index for the product table. Define these indexes so that they will be stored in DB areas IDXDAT_002 and RIXDAT_002, respectively.
Define a B-tree index for the product ID (ITEM_ID column) and a range index for the delivery time (DELIVERY_TIME column). The following shows an SQL statement example.
- ■ SQL statement example
-
CREATE INDEX "ITEM_ID_IDX" ON "ITEMLIST"("ITEM_ID") IN "IDXDAT_002" EMPTY CREATE INDEX "DELIVERY_TIME_RIX" ON "ITEMLIST"("DELIVERY_TIME") IN "RIXDAT_002" EMPTY INDEXTYPE RANGE
For details about the CREATE INDEX statement, see CREATE INDEX (define an index) in Definition SQL in the manual HADB SQL Reference.
(c) Table used in this operation example
The following figure shows a portion of the table used in this operation example.
(2) Adding the next month's product data to chunks in wait status (background import)
The next month's product data is sent from each branch office by the 25th of each month. At the main office, the next month's product data is added to the product table (ITEMLIST) as it arrives from the branch offices.
Background import is used to add the next month's product data. During this process, the next month's product data needs to be kept excluded from analysis until the next month begins. Therefore, the data is added as chunks in wait status.
The group of data that is stored in the multi-chunk table in a single background import operation is managed together with its indexes as a group (chunk). For details about a chunk, see 2.14.2 Managing data in data-import units (chunks).
- Note
-
By managing data in units of chunks, you can acquire backups and delete data on a chunk-by-chunk basis. When you perform background import or merge chunks, you can add a comment to a chunk. Adding a comment to a chunk makes it easier to identify a certain chunk you want to manage.
To perform background import to create a chunk in wait status, use the adbimport command with the -b and --status wait options specified. The following shows an example of executing the adbimport command.
- ■ Command execution example
-
adbimport -u ADBUSER01 -p '#HelloHADB_01' -k "'" -s , -g 10 -w /home/adbmanager/tmp -z /home/adbmanager/imp_file/imp_opt_file02.txt -b --status wait -m 'October 2014' ITEMLIST /home/adbmanager/imp_file/imp_data_path02.txt
- Explanation
-
Use background import to import the contents of the input data file, set up in the input data path file (/home/adbmanager/imp_file/imp_data_path02.txt), into the product table (ITEMLIST).
Whenever the next month's product data arrives from the ten branch offices, background import is performed. As a result, ten chunks in wait status are created.
For details about the adbimport command, see adbimport (Import Data) in the manual HADB Command Reference. Also, see 11.4.4 Temporarily excluding data to be imported to a multi-chunk table from retrieval (creating a chunk in wait status).
(3) Retrieving data
Even while the next month's product data is being added to the database, the current month's product data is retrieved and analyzed.
By using background import to add data, you can import data even if another user is retrieving data from the same database (data can be imported during data retrieval).
The following figure shows the relationship between background import and data retrieval.
Although the current month's product data can be analyzed, the next month's product data cannot be analyzed until the new month begins because it is excluded from analysis (chunks in wait status cannot be retrieved).
(4) Merging chunks in wait status (to reduce the number of chunks)
Performing background import whenever the next month's product data arrives from the branch offices creates multiple chunks in wait status. When these chunks in wait status are changed to chunks in normal status to be included in the analysis, the increase in the number of chunks might degrade the retrieval performance.
Therefore, by merging the multiple chunks in wait status into a single chunk, you can prevent retrieval performance degradation that tends to result from an increase in the number of chunks. In this example, all product data that has been sent from the branch offices is first added to the database and is then merged into a single chunk.
(a) Identifying the chunks in wait status to be merged
To identify the chunk IDs of the chunks in wait status to be merged, retrieve the system table STATUS_CHUNKS as explained below.
The following show an SQL statement example.
- ■ SQL statement example
-
SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='ITEMLIST' AND "CHUNK_COMMENT"='October 2014'
- ■ Retrieval result
-
CHUNK_ID -------------------- 2 (omitted) 11
- Explanation
-
In this example, the chunk IDs (2 to 11) of the chunks to be merged are identified by using the chunk comment ('October 2014'), added during background import, as the key during retrieval of the STATUS_CHUNKS table.
Check the chunks to be merged and their chunk IDs based on the explanation in 11.4.8 Checking the chunk status and the number of chunks created.
(b) Merging chunks in wait status
To merge chunks in wait status, use the adbmergechunk command. The following shows an execution example of the adbmergechunk command.
- ■ Command execution example
-
adbmergechunk -u ADBUSER01 -p '#HelloHADB_01' -g 2 -w /home/adbmanager/tmp -z /home/adbmanager/merge_file/merge_opt_file02.txt -m 'October 2014_merge' -c 2-11 ITEMLIST
- Explanation
-
Using the values specified in the merge chunk option file (/home/adbmanager/merge_file/merge_opt_file02.txt) as the input information, perform chunk merging on the product table (ITEMLIST) as specified in the -c option of the adbmergechunk command.
In this example, the next month's product data (the data for October 2014) is grouped together by merging chunks with chunk IDs 2 through 11 into a single chunk in wait status.
The following figure shows an example of merging chunks in wait status.
Chunks with chunk IDs 2 through 11 are automatically deleted when the adbmergechunk command terminates normally.
- Note
-
Merging chunks in wait status can prevent degradation in retrieval performance when these chunks are changed to chunks in normal status. There is also a limit to the number of chunks that can be managed by a table or DB area. If the number of chunks is insufficient, you can reduce the number by merging multiple chunks into a single chunk.
For details about the adbmergechunk command, see adbmergechunk (Merge Chunks) in the manual HADB Command Reference. Also, see 11.4.9 Merging chunks (to reduce the number of chunks).
(5) Changing the chunk status (switching the product data to be analyzed)
When the new month begins, the product data that was excluded from analysis is made subject to analysis. Conversely, the product data that had been subject to analysis is now excluded from analysis.
When the new month begins, the chunk status is changed as follows:
-
Chunks in normal status containing the product data that was subject to analysis are changed to chunks in wait status to exclude them from analysis.
-
Chunks in wait status containing the product data that was excluded from analysis are changed to chunks in normal status to make them subject to analysis.
In this example, the product data for September 2014 is now excluded from analysis, and the product data for October 2014 is made subject to analysis.
(a) Identifying the chunk ID of the chunk whose status is to be changed
To identify the chunk ID of the chunk whose status is to be changed, retrieve the system table STATUS_CHUNKS as explained in the following.
First, identify the chunk ID of the product data for September 2014.
- ■ SQL statement example (changing a chunk in normal status to a chunk in wait status)
-
SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='ITEMLIST' AND "CHUNK_COMMENT"='September 2014_merge'
- ■ Retrieval result
-
CHUNK_ID -------------------- 1
- Explanation
-
In this example, the chunk ID of the chunk containing the product data for September 2014 is identified using the chunk comment ('September 2014_merge') that was added during chunk merging, as the key during retrieval of the STATUS_CHUNKS table.
Next, identify the chunk ID of the product data for October 2014.
- ■ SQL statement example (changing a chunk in wait status to a chunk in normal status)
-
SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='ITEMLIST' AND "CHUNK_COMMENT"='October 2014_merge'
- ■ Retrieval result
-
CHUNK_ID -------------------- 12
- Explanation
-
In this example, the chunk ID of the chunk containing the product data for October 2014 is identified using the chunk comment ('October 2014_merge') that was added during chunk merging, as the key during retrieval of the STATUS_CHUNKS table.
(b) Changing the chunk status
To change the chunk status, use the adbchgchunkstatus command. Chunk statuses are changed as described in this subsection.
-
Changing a chunk in normal status to a chunk in wait status
Below, the chunk containing the product data for September 2014 (chunk ID: 1) is the one whose status is changed.
-
Changing a chunk in wait status to a chunk in normal status
Below, the chunk containing the product data for October 2014 (chunk ID: 12) is the one whose status is changed.
The following is an execution example of the adbchgchunkstatus command.
- ■ Command execution example
-
adbchgchunkstatus -u ADBUSER01 -p '#HelloHADB_01' -w 1 -n 12 ITEMLIST
- Explanation
-
The chunk status in the product table (ITEMLIST) is changed as specified in the -w and -n options.
In this example, the chunk with chunk ID 1 is changed to wait status, and the chunk with chunk ID 12 is changed to normal status.
The following figure shows an example of changing chunk statuses.
- Explanation
-
The chunk containing the product data for September 2014 (chunk ID: 1) is excluded from analysis, and the chunk containing the product data for October 2014 (chunk ID: 12) becomes subject to analysis.
For details about the adbchgchunkstatus command, see adbchgchunkstatus (Change Chunk Status) in the manual HADB Command Reference. Also, see 11.4.12 Changing the chunk status.
(6) Backing up data
To prevent the amount of data from becoming unmanageable, data older than 6 months is backed up and then deleted. The database is backed up every month.
(a) Identifying the chunks with the earliest creation date and time
Chunks that are older than 6 months (chunks with the earliest creation date and time) are identified in the product table (ITEMLIST). To identify these chunks in the product table, retrieve the system table STATUS_CHUNKS as described below.
The following shows an SQL statement example.
- ■ SQL statement example
-
SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='ITEMLIST' AND "CREATE_TIME"=(SELECT MIN("CREATE_TIME") FROM "MASTER"."STATUS_CHUNKS" WHERE TC."TABLE_SCHEMA"='ADBUSER01' AND TC."TABLE_NAME"='ITEMLIST')
- ■ Retrieval result
-
CHUNK_ID -------------------- 1
- Explanation
-
When the STATUS_CHUNKS table is retrieved, the result shows that the chunk with chunk ID 1 is the chunk with the earliest creation date and time (older than 6 months).
- Note
-
When chunks are merged, the earliest creation date and time among the merge-source chunks is stored in the STATUS_CHUNKS table.
For details about how to retrieve data from the STATUS_CHUNKS table, see C.9 Searching system tables.
(b) Making a backup
Now that the chunk ID of the chunk with the earliest creation date and time in the product table (ITEMLIST) has been identified as 1, back up the data in that chunk in a CSV file.
To make a backup, use the adbexport command. The following shows an execution example of the adbexport command.
- ■ Command execution example
-
adbexport -u ADBUSER01 -p '#HelloHADB_01' -z /home/adbmanager/exp_file/exp_opt_file02.txt -n ITEMLIST -c 1 /home/adbmanager/exp_file/exp_data_path02.txt
- ■ Content of the output data path file (/home/adbmanager/exp_file/exp_data_path02.txt)
-
/home/adbmanager/exp_backup/backup0101.csv /home/adbmanager/exp_backup/backup0102.csv /home/adbmanager/exp_backup/backup0103.csv
- Explanation
-
The data contained in the chunk with chunk ID 1 is backed up in CSV files (/home/adbmanager/exp_backup/backup0101.csv, /home/adbmanager/exp_backup/backup0102.csv, and /home/adbmanager/exp_backup/backup0103.csv).
In this example, since chunks are merged every month, one month's worth of data is backed up.
- Note
-
Although you can back up the files into a single file, distributing the files into multiple files can improve performance by distributing the output-related overhead.
For details about the adbexport command, see adbexport (Export Data) in the manual HADB Command Reference. Also, see 11.4.5 Exporting data in units of chunks.
(7) Deleting old data (chunks)
To prevent the amount of data from becoming unmanageable, delete data that is older than 6 months.
Delete the chunk with the earliest creation date and time from the product table (ITEMLIST). To delete a chunk, execute the PURGE CHUNK statement with a chunk ID specified in the search condition.
- ■ SQL statement example
-
PURGE CHUNK "ITEMLIST" WHERE CHUNKID=1
- Explanation
-
The data contained in the chunk with chunk ID 1 is deleted.
In this example, since chunks are merged every month, one month's worth of data is deleted.
For details about the PURGE CHUNK statement, see PURGE CHUNK (delete all rows in a chunk) in Data Manipulation SQL in the manual HADB SQL Reference. Also, see 11.4.6 Deleting data in units of chunks.
(8) Repeatedly adding new data, merging, changing status, and deleting old data
By repeatedly performing background import to create chunks in wait status, by merging chunks in wait status, changing chunk statuses, and by deleting chunks using the PURGE CHUNK statement, you can continue to analyze only the product data of a specific month without increasing the overall volume of data managed by HADB.