11.4.25 Operation taking archive of chunks into consideration
This subsection describes, by using the following example, how to suppress increase of database size by archiving old data that is less frequently retrieved. The prerequisites for this operation are as follows:
-
The sensor data that is generated from nationwide communication control devices (information about communication traffic volume and fault sign detection) is collected in the main office. Then, the sensor data is retrieved and analyzed to determine whether communication control devices need to be replaced or increased.
-
The sensor data collected by the main office is managed centrally using the communication control device log table.
-
The sensor data is imported to the database hourly.
-
A group of sensor data (for a day), each of which is imported hourly, is merged into a single chunk (chunk for a day) when the day is changed. Also, a group of sensor data (for a month) is merged into a single chunk (chunk for a month) when the month is changed. Eventually, chunks are managed in units of months.
-
Sensor data is retrieved even while sensor data is being imported into the database. The sensor data that can be retrieved is the data for the last 24 hours (daily unit), for the last seven days (weekly unit), and for the last one month (monthly unit). Based on the retrieval result, whether to replace or increase communication control devices is determined.
-
As a result of retrieval of sensor data, if the communication traffic volume exceeds the threshold value, data for the last five years is analyzed by using the applicable day and the previous and next days as keys. Communication traffic volume might exceed the threshold value approximately once a month.
-
Because the sensor data imported to the database over a year ago is seldom retrieved, it is archived in unit chunks. To reduce the database size, sensor data is archived when the month is changed.
-
The database retains five years' worth of data. The data that was imported more than 5 years ago is deleted in unit chunks when the month is changed.
The following figure shows the flow of operations in this system.
- Organization of this subsection
(1) Creating an archive directory
In this operation example, to reduce the data size, data older than one year is archived on a chunk basis. To archive chunks, an archive directory is required for storing archive files. Before starting operations, create an archive directory.
Create an archive directory by using the following procedure.
Procedure:
-
Log on to the OS as a superuser.
Log on to the server machine's OS as a superuser.
-
Configure a file system for the archive directory.
Execute an OS command to configure a file system for the archive directory.
The following example configures /dev/vg_hadb/hadb_archivedir, which has been created as an LV for the archive directory, in the file system ext4.
▪ Command execution example
mkfs -t ext4 /dev/vg_hadb/hadb_archivedir
-
Create the mount point for the archive directory.
Execute an OS command to create the mount point for the archive directory. The following shows an execution example.
▪ Command execution example
mkdir -p /HADB/archive
-
Activate the VG that contains the file system for the archive directory.
Execute an OS command to activate the VG that contains the file system for the archive directory. The following shows an execution example.
▪ Command execution example
vgchange -a y /dev/vg_hadb
-
Mount the file system for the archive directory.
Execute an OS command to mount the file system for the archive directory. The following shows an execution example.
▪ Command execution example
mount /dev/vg_hadb/hadb_archivedir /HADB/archive -t ext4 -o defaults,noatime,_netdev4
-
Change the owner of the archive directory.
Execute an OS command to change the owner of the archive directory. Specify the HADB administrator and HADB administrators group that use the archive directory.
The following shows an execution example in which the user name of the HADB administrator is adbmanager and the HADB administrators group is adbgroup.
▪ Command execution example
chown adbmanager.adbgroup /HADB/archive
(2) Defining an archivable multi-chunk table
After creating an archive directory, define an archivable multi-chunk table and indexes.
In this operation example, the communication control device log table (INSTRUMENT_LOG) and indexes are defined to manage sensor data centrally.
(a) Defining the communication control device log table (INSTRUMENT_LOG)
Define the communication control device log table (INSTRUMENT_LOG) to be stored in the DB area TBLDAT_003. Also, define the communication control device log table as an archivable multi-chunk table to perform background import and archive of chunks.
In this operation example, the following number of chunks are required:
-
Chunks that manage the sensor data hourly added by background import (24 chunks)
The number of chunks for a day is 24 because data is added hourly by background import.
-
Chunks that manage the sensor data for individual days (32 chunks)
The total number of chunks is 32 because one chunk is needed as the merge-target chunk, in addition to the chunks for a month (for 31 days).
-
Chunks that manage the sensor data for individual months (61 chunks)
The total number of chunks is 61 because one chunk is needed as the merge-target chunk, in addition to the chunks for five years (60 months).
Assuming a safety factor of 1.2, the maximum number of chunks that can be created is as follows:
- Note
-
If the number of chunks increases, this negatively impacts retrieval performance. Therefore, we recommend that you design and operate your database system to minimize the number of chunks that will be used.
Based on this estimate, specify 141 as the maximum number of chunks. Specify this value for CHUNK of the chunk specification in the CREATE TABLE statement.
For the collection date and time column (C_TIME) of the communication control device log table (INSTRUMENT_LOG), you can specify the range for data analysis. Also, data for each chunk does not overlap for this column. Therefore, use this column as the archive range column. Store the range index automatically defined for the archive range column, in the DB area IDXDAT_003 in which other indexes defined for the communication control device log table (INSTRUMENT_LOG) are also stored.
Specify the following for the chunk-archive specification in the CREATE TABLE statement.
-
Archive range column name (C_TIME)
-
Name of the DB area that stores the range index automatically defined (IDXDAT_003)
-
Created archive directory (/HADB/archive)
The following shows an example of the CREATE TABLE statement.
- ▪ Example of the CREATE TABLE statement
-
CREATE TABLE "INSTRUMENT_LOG" ("SID" INTEGER, "C_TIME" TIMESTAMP, "TRAFFIC" INTEGER, "THROUGH" INTEGER, "RETRY" INTEGER, ...) IN "TBLDAT_003" CHUNK= 141 ARCHIVABLE RANGECOLUMN="C_TIME" IN "IDXDAT_003" ARCHIVEDIR='/HADB/archive'
For details about the CREATE TABLE statement, see CREATE TABLE (define a table) in Definition SQL in the manual HADB SQL Reference.
(b) Defining indexes for the communication control device log table (INSTRUMENT_LOG)
Define B-tree indexes for the communication control device log table. Store the B-tree indexes in the DB area IDXDAT_003.
Here, define a B-tree index for the device ID column (SID). The following shows an example of the CREATE INDEX statement.
- ▪ Example of the CREATE INDEX statement
-
CREATE INDEX "SID_IDX" ON "INSTRUMENT_LOG"("SID") IN "IDXDAT_003" EMPTY
For details about the CREATE INDEX statement, see CREATE INDEX (define an index) in Definition SQL Statements in the manual HADB SQL Reference.
(c) Table used in this operation example
The following figure shows a portion of the table used in this operation example.
(3) Adding new sensor data (background import)
Sensor data is generated in real time from communication control devices. The generated sensor data is added hourly to the communication control device log table (INSTRUMENT_LOG).
Sensor data is added by using background import. The group of data that is stored in the archivable multi-chunk table in a single background import operation is managed together with its indexes as a group (chunk). For details about a chunk, see 2.14.2 Managing data in data-import units (chunks).
- Note
-
By managing data in units of chunks, you can acquire backups and delete data on a chunk-by-chunk basis. When you perform background import or merging chunks, you can add a comment to a chunk. Adding a comment to a chunk makes it easier to identify a certain chunk you want to operate on.
Execute the adbimport command with the -b option specified to perform background import. The following shows an example of executing the adbimport command.
- ▪ Command execution example
-
adbimport -u ADBUSER01 -p '#HelloHADB_01' -k "'" -s , -g 10 -w /home/adbmanager/tmp -z /home/adbmanager/imp_file/imp_opt_file03.txt -b -m '2015/03/31' INSTRUMENT_LOG /home/adbmanager/imp_file/imp_data_path03.txt
- Explanation:
-
Use background import to import the contents of the input data file, set up in the input data path file (/home/adbmanager/imp_file/imp_data_path03.txt), into the communication control device log table (INSTRUMENT_LOG).
Specify the comment to be added to a chunk for the -m option, to add the sensor data generated on March 31, 2015, by using background import.
Perform background import hourly. As a result, 24 chunks are created for one day (for 24 hours) .
For details about the adbimport command, see adbimport (Import Data) in the manual HADB Command Reference. Also, see 11.4.2 Storing data in a multi-chunk table (background import).
(4) Retrieving data
You can retrieve data in an archivable multi-chunk table even while background import is performed.
By using background import to add sensor data, you can import data even if another HADB user is retrieving data from the same database (data can be imported during data retrieval).
The following figure shows the relationship between background import and data retrieval.
(5) Merging chunks (to reduce the number of chunks)
If the number of chunks increases, retrieval performance might decline. Therefore, you need to regularly merge chunks. In this operation example, sensor data is hourly imported by background import, so 24 chunks are created per day.
In this operation example, multiple chunks are merged into a single chunk at the following timing. Eventually, chunks are managed in units of months.
-
Chunks of 24 per day (24-hour period) are merged into a single chunk.
-
Chunks of 31 per month (31-day period) are merged into a single chunk.
The following shows an example of merging data for a day (for March 31, 2015) into a single chunk.
(a) Identifying the chunks to be merged
To identify the chunk IDs of the chunks to be merged, retrieve the system table STATUS_CHUNKS as follows. The following shows an SQL statement example.
- ▪ SQL statement example
-
SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='INSTRUMENT_LOG' AND "CHUNK_COMMENT"='2015/03/31'
- ▪ Retrieval result
-
CHUNK_ID -------------------- 92 (Omission) 115
- Explanation:
-
In this example, the STATUS_CHUNKS table is retrieved by using the comment added to the chunk during background import ('2015/03/31') as the key. Then, the chunk ID of the chunk to be merged (92 to 115) is identified.
Check the chunks to be merged and their chunk IDs based on the explanation in 11.4.8 Checking the chunk status and the number of chunks created.
(b) Merging chunks
Execute the adbmergechunk command to merge chunks. The following shows an example of executing the adbmergechunk command.
- ▪ Command execution example
-
adbmergechunk -u ADBUSER01 -p '#HelloHADB_01' -g 2 -w /home/adbmanager/tmp -z /home/adbmanager/merge_file/merge_opt_file03.txt -m '2015/03/31' -c 92-115 INSTRUMENT_LOG
- Explanation:
-
For the communication control device log table (INSTRUMENT_LOG), chunks specified for the -c option are merged by using the value specified for the merge chunk option file (/home/adbmanager/merge_file/merge_opt_file03.txt) as the input information.
In this example, the chunks whose chunk IDs are from 92 to 115 are merged into a single chunk in wait status as the sensor data for a day (for March 31, 2015).
The following figure shows an example of merging chunks.
The chunks whose chunk IDs are from 92 to 115 are automatically deleted when the adbmergechunk command terminates normally.
- Note
-
By merging chunks, you can prevent retrieval performance from declining. There is a limit on the number of chunks that can be managed by a table or a DB area. If the number of chunks being used reaches the upper limit of the number of chunks that can be managed by a table or a DB area, merge the multiple chunks into a single chunk. By merging chunks, you can reduce the number of chunks being used.
For details about the adbmergechunk command, see adbmergechunk (Merge Chunks) in the manual HADB Command Reference. Also, see 11.4.9 Merging chunks (to reduce the number of chunks).
(6) Archiving chunks (compressing data in unit chunks)
In this operation example, data imported to the database more than a year ago is seldom retrieved, so it is archived (compressed) in unit chunks. By archiving chunks, you can reduce the database size.
- Important
-
Retrieving data in an archived chunk requires longer time than retrieving data in an unarchived chunk due to the following reasons:
-
Expanding data takes time.
-
Data retrieval by using indexes is not available.
As is performed in this operation example, archive only the data that is seldom retrieved but needs to be retained for a long time. For details, see 2.15.2 When the chunk archiving function is to be used.
-
(a) Executing archive of chunks
Execute the adbarchivechunk command to archive chunks. In this example, the chunks that manage the sensor data for March in 2014 are archived. The following shows an example of executing the adbarchivechunk command.
- ▪ Command execution example
-
adbarchivechunk -u ADBUSER01 -p '#HelloHADB_01' -g 2 -z /home/adbmanager/arc_file/archive_chunk03.def -r 2014/03/01-2014/03/31 INSTRUMENT_LOG
- Explanation:
-
Specify for the -r option of the adbarchivechunk command the range of dates for the chunks to be archived (2014/03/01-2014/03/31). The chunks for which the value range in the archive range column is included in the date range specified for the -r option are archived. In this operation example, the collection date and time column (C_TIME) of the communication control device log table (INSTRUMENT_LOG) is specified as the archive range column. Therefore, based on the value in the collection date and time column (C_TIME), the chunks for March, 2014 are archived.
(b) Checking the execution result of archiving
Execute the adbdbstatus command to check the status of archived chunks. By executing the adbdbstatus command to output the archived chunk summary information, you can check the status of archived chunks. The following shows an example of executing the adbdbstatus command.
- ▪ Command execution example
-
adbdbstatus -c archivechunk -n ADBUSER01.INSTRUMENT_LOG -r 2014/03/01-2014/03/31
- Explanation:
-
In this example, the summary information of archived chunks (for March, 2014) is output. From the following output items, you can check the number of rows stored in an archived chunk, the data size of an archived chunk, and the compression ratio of an archived chunk.
-
Rows (Number of rows stored in an archived chunk)
-
Archive_file_size (Total size of all archive files corresponding to an archived chunk)
-
Unarchive_table_size (Size of segments that were used to store a table in the chunk before being archived)
-
Unarchive_index_size (Size of segments that were used to store an index in the chunk before being archived)
-
Compression_ratio (Compression ratio of an archived chunk)
-
(7) Deleting old data (chunks)
To suppress increase of data size, delete data (archived chunks) that was added more than five years ago.
From the communication control device log table (INSTRUMENT_LOG), delete old chunks whose data was added more than five years ago. Check the chunk IDs of the chunks to be deleted, and then execute the PURGE CHUNK statement to delete the chunks.
Here, the chunks for April, 2010 will be deleted.
(a) Checking the chunk ID
Execute the adbdbstatus command to check the chunk IDs of the chunks to be deleted. By executing the adbdbstatus command to output the archived chunk summary information, you can check the chunk IDs of the chunks to be deleted. The following shows an example of executing the adbdbstatus command.
- ▪ Command execution example
-
adbdbstatus -c archivechunk -n ADBUSER01.INSTRUMENT_LOG -r 2010/04/01-2010/04/30
- Explanation:
-
In this example, the summary information of archived chunks (for April, 2010) is output. From the following output items, you can check the chunk IDs of the chunks to be deleted and the free space that is secured in the archive directory after the chunks are deleted.
-
Chunk_ID (Chunk ID)
-
Archive_file_size (Total size of all archive files corresponding to an archived chunk)
-
(b) Deleting chunks
Delete chunks based on the chunk IDs that you checked by using the adbdbstatus command. To delete chunks, specify the chunk IDs for the search condition of the PURGE CHUNK statement. The following shows an example of the PURGE CHUNK statement.
- ▪ SQL statement example
-
PURGE CHUNK "INSTRUMENT_LOG" WHERE CHUNKID=4
- Explanation:
-
This example deletes the data contained in the chunk whose chunk ID is 4.
In this example, because chunks are merged in units of month, data for one month is deleted.
For details about the PURGE CHUNK statement, see PURGE CHUNK (delete all rows in a chunk) in Data Manipulation SQL in the manual HADB Command Reference. Also, see 11.4.6 Deleting data in units of chunks.
(8) Repeating addition, merge, and archive of new data and deletion of old data
By repeating the following operations, you can suppress the size of data to be retained for a long time, and continue analyzing sensor data:
-
Executing background import
-
Merging chunks
-
Archiving chunks
-
Deleting chunks by using the PURGE CHUNK statement