Hitachi

In-Memory Data Grid Hitachi Elastic Application Data Store


4.4.1 Estimating the size and number of cache data files

This subsection explains how to estimate the following values:

  • Size of cache data files

  • Number of cache data files

To estimate the size and number of cache data files:

  1. Determine the write block size for the medium.

  2. Estimate the size of one record of data that is stored in a cache data file.

  3. Estimate the size of one cache data file and the actual size of the data that can be stored.

  4. Based on step 2, estimate the maximum size of valid data that can be managed in the system.

  5. Determine a compaction interval.

  6. Based on steps 2 and 5, estimate the capacity (for update data) required for storing data that is updated during compaction.

  7. From steps 4 and 6, estimate the capacity required for the cache data file.

  8. From step 7, estimate the size of the cache data file per range that is specified in the eads.cache.disk.filesize cache property parameter and the number of cache data files per range that is specified in the eads.cache.disk.filenum cache property parameter.

  9. Before starting system operation, verify that there will be no problems, such as with the time required for compaction.

The following subsections explain each step.

Organization of this subsection

(1) Determining the write block size

(2) Estimating the size of one record

(3) Estimating the size of one cache data file and the amount of data that can be stored in it

(4) Estimating the maximum size of valid data

(5) Determining a compaction interval

(6) Estimating the capacity required for update data

(7) Estimating the capacity required for cache data files

(8) Designing the size and number of cache data files

(9) Checking the compaction schedule

(1) Determining the write block size

The size of the data that is written to a cache data file at one time is called the write block size.

Specify the write block size according to the medium used to store cache data files. Specify this value in the eads.cache.disk.blocksize cache property parameter.

(2) Estimating the size of one record

Data is stored in EADS servers in records that include the key, the value, and control information.

The following shows the formula for estimating the size of one record:

Size of one record (kilobytes) =

(size of key stored in EADS servers (bytes)
+ size of value stored in EADS servers (bytes) + 36)# ÷ 1,024

#

Round up the value in the parentheses to a multiple of the write block size (in kilobytes) determined in 4.4.1(1) Determining the write block size (eads.cache.disk.blocksize parameter value in the cache properties).

Size of key stored in EADS servers:

The following shows the formula for estimating the size of a key stored in EADS servers:

Size of key stored in EADS servers (bytes) =

size of key in characters + 4

Size of value stored in EADS servers:

The formula for estimating the size of a value stored in EADS servers depends on the language used to create application programs, as shown below.

  • When using C or when byte arrays are used in Java

    Size of value stored in EADS servers (bytes) =

    size of value in bytes specified in the API function + 2

  • When using non-byte arrays in Java

    Size of value stored in EADS servers (bytes) =

    size of value in bytes that has been serialized by the java.io.ObjectOutputStream class + 2

Reference note

The formulas shown above must be used to estimate the sizes of keys and values stored in EADS servers, because EADS servers store keys and values in a unique format for purposes of processing efficiency.
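As a rough illustration, the record-size estimate above can be sketched in Python (the function name and the example key and value sizes are hypothetical; the +4, +2, and +36 byte overheads are the values from the formulas above):

```python
import math

def record_size_kb(key_chars: int, value_bytes: int, block_kb: int) -> float:
    """Estimate the size of one record stored in an EADS server."""
    key_size = key_chars + 4                # size of key stored in EADS servers
    value_size = value_bytes + 2            # size of value stored in EADS servers
    raw_bytes = key_size + value_size + 36  # + control information
    block_bytes = block_kb * 1024
    # Round the value up to a multiple of the write block size
    # (eads.cache.disk.blocksize), then convert to kilobytes.
    rounded = math.ceil(raw_bytes / block_bytes) * block_bytes
    return rounded / 1024

# Example: 32-character key, 1,000-byte value, 4 KB write block size
print(record_size_kb(32, 1000, 4))  # -> 4.0
```

For Java values that are not byte arrays, value_bytes would be the serialized size produced by the java.io.ObjectOutputStream class, as noted above.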

(3) Estimating the size of one cache data file and the amount of data that can be stored in it

The size of one cache data file is specified in the eads.cache.disk.filesize cache property parameter. Because each cache data file contains management information equal to twice the write block size, the actual size of the data storage area is estimated as follows:

Data storage area per cache data file (kilobytes) =

(size of one cache data file (megabytes) × 1,024)
- (write block size (kilobytes) × 2)

Size of one cache data file:

eads.cache.disk.filesize parameter value in the cache properties

Write block size:

Value determined in 4.4.1(1) Determining the write block size (eads.cache.disk.blocksize parameter value in the cache properties)

You can determine the amount of data that can be stored in a cache data file from the size of the data storage area estimated by the above formula and the record size estimated in 4.4.1(2) Estimating the size of one record.
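For example, the data storage area and the number of records per file can be computed as follows (a Python sketch; the 64 MB file size, 4 KB block size, and 4 KB record size are hypothetical example values):

```python
def data_area_kb(filesize_mb: int, block_kb: int) -> int:
    """Data storage area per cache data file (KB): the file size minus
    management information equal to twice the write block size."""
    return filesize_mb * 1024 - block_kb * 2

def records_per_file(filesize_mb: int, block_kb: int, record_kb: float) -> int:
    """Number of records that fit in one cache data file."""
    return int(data_area_kb(filesize_mb, block_kb) // record_kb)

# Example: 64 MB cache data file, 4 KB write block size, 4 KB records
print(data_area_kb(64, 4))           # -> 65528
print(records_per_file(64, 4, 4.0))  # -> 16382
```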

(4) Estimating the maximum size of valid data

Estimate the maximum size of valid data that is managed in the system.

Valid data means data that can be acquired by a get operation; it excludes any data that has become invalid due to update or deletion processing.

The following shows the formula for estimating the maximum size of valid data:

Maximum size of valid data (kilobytes) =

number of data items stored per range in a disk cache × size of one record (kilobytes)

Size of one record:

Value estimated in 4.4.1(2) Estimating the size of one record

Reference note

For details about data that becomes invalid during update and deletion processing, see 10.9 Reducing the data usage of cache data files (performing compaction on cache data files).
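The maximum-size formula is a simple product; as a Python sketch (the item count and record size are hypothetical example values):

```python
def max_valid_data_kb(items_per_range: int, record_kb: float) -> float:
    """Maximum size of valid data per range (KB)."""
    return items_per_range * record_kb

# Example: 100,000 data items per range, 4 KB per record
print(max_valid_data_kb(100_000, 4.0))  # -> 400000.0
```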

(5) Determining a compaction interval

If you will be using disk caches or two-way caches, you will need to perform compaction on cache data files.

Determine the compaction interval (hours) according to the amount of data that is stored in EADS servers.

For details about compaction processing, see 10.9 Reducing the data usage of cache data files (performing compaction on cache data files).

(6) Estimating the capacity required for update data

Because data is always appended to cache data files, the amount of data stored in cache data files increases each time data is updated. Therefore, a cache data file must be large enough to accommodate the data that accumulates during one compaction cycle (the compaction interval plus the time required for one round of compaction processing).

Estimate the amount of data (update data) that is updated during one round of compaction processing. The following shows the formula for estimating the capacity for update data:

Capacity for update data (kilobytes) =

number of update operations per hour in EADS servers × size of one record (kilobytes)
× (compaction interval (hours) + time required for one round of compaction processing (hours))

Number of update operations per hour in EADS servers:

Number of times put, create, update, and replace are executed

Size of one record:

Value estimated in 4.4.1(2) Estimating the size of one record

Compaction interval:

Value determined in 4.4.1(5) Determining a compaction interval

Time required for one round of compaction processing:

This value depends on the system. Before you start system operations, check the time required for a round of compaction processing, and specify that value.
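As a Python sketch of the formula above (all input values are hypothetical examples):

```python
def update_data_kb(updates_per_hour: int, record_kb: float,
                   interval_h: float, compaction_h: float) -> float:
    """Capacity for data updated during one compaction cycle (KB)."""
    return updates_per_hour * record_kb * (interval_h + compaction_h)

# Example: 10,000 put/create/update/replace operations per hour,
# 4 KB records, 24-hour compaction interval, 0.5 hours per round
print(update_data_kb(10_000, 4.0, 24, 0.5))  # -> 980000.0
```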

(7) Estimating the capacity required for cache data files

The following shows the formula for estimating the capacity required for cache data files:

Capacity required for cache data files (megabytes) =

(maximum size of valid data (kilobytes) × 2
+ capacity for update data (kilobytes)) ÷ 1,024

Maximum size of valid data:

Value determined in 4.4.1(4) Estimating the maximum size of valid data

Capacity for update data:

Value determined in 4.4.1(6) Estimating the capacity required for update data
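Combining the two estimates, the formula above can be sketched as follows (the example inputs are hypothetical):

```python
def cache_file_capacity_mb(max_valid_kb: float, update_kb: float) -> float:
    """Capacity required for cache data files (MB): twice the maximum
    valid data plus the update-data capacity, converted to megabytes."""
    return (max_valid_kb * 2 + update_kb) / 1024

# Example: 400,000 KB of valid data and 980,000 KB of update data
print(cache_file_capacity_mb(400_000, 980_000))  # -> 1738.28125
```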

(8) Designing the size and number of cache data files

Based on the capacity required for cache data files that was determined in 4.4.1(7) Estimating the capacity required for cache data files, design the size of cache data files per range and the number of cache data files in such a manner that the following condition is satisfied:

Capacity required for cache data files (megabytes) ≤
size of cache data files per range (megabytes) × number of cache data files#

#

If the number of cache data files is expected to increase in the future, also consider the capacity of those cache data files.

Specify the capacity of cache data files per range in the eads.cache.disk.filesize cache property parameter.

The following shows the formula for estimating the number of cache data files per range:

Number of cache data files per range =

number of cache data files + 2#

#

If you will not be using the total data restriction function, add 1 instead of 2.

Specify the obtained value in the eads.cache.disk.filenum parameter in the cache properties.
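The sizing condition and the filenum calculation can be sketched as follows (a Python illustration; the 1,739 MB requirement and the 64 MB file size are hypothetical, and the interpretation of the footnote, adding 1 instead of 2 when the total data restriction function is not used, is an assumption):

```python
import math

def sizing_ok(required_mb: float, filesize_mb: int, num_files: int) -> bool:
    """Condition: required capacity <= file size x number of files."""
    return required_mb <= filesize_mb * num_files

def filenum_per_range(num_files: int, total_data_restriction: bool = True) -> int:
    """Value for eads.cache.disk.filenum (assumption: add 1 instead of 2
    when the total data restriction function is not used)."""
    return num_files + (2 if total_data_restriction else 1)

# Example: 1,739 MB required, 64 MB per cache data file
needed = math.ceil(1739 / 64)       # 28 files
print(sizing_ok(1739, 64, needed))  # -> True
print(filenum_per_range(needed))    # -> 30
```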

(9) Checking the compaction schedule

Before you start system operations, make sure that there is no problem with the time required for compaction and the capacity that will be allocated.

If the condition shown below is not satisfied, an error might occur due to insufficient capacity because compaction processing cannot keep up with data update operations:

Capacity for update data < capacity freed by one round of compaction processing

Important note

If the reason for not satisfying the above condition is insufficient hardware performance, you must tune the number of compaction runs and the compaction interval, and also consider measures such as upgrading hardware or adding hosts.
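The schedule check above amounts to a single comparison; as a Python sketch (both values are hypothetical and must be measured on the actual system):

```python
def compaction_keeps_up(update_data_kb: float, freed_kb: float) -> bool:
    """True when one round of compaction frees more capacity than
    updates consume during one compaction cycle."""
    return update_data_kb < freed_kb

# Example: 980,000 KB updated per cycle, 1,200,000 KB freed per round
print(compaction_keeps_up(980_000, 1_200_000))  # -> True
```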