Hitachi

In-Memory Data Grid Hitachi Elastic Application Data Store


12.2.2 If the cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE)

This subsection explains the restoration procedure when the cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE).

(1) When using only memory caches

The following figure shows the general restoration procedure when memory caches are used and the cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE).

Figure 12‒2: Restoration procedure when the cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE) (using only memory caches)

[Figure]

Each of the system operation administrator's tasks is explained in more detail below.

(a) Verify which EADS servers are isolated or stopped

Verify which EADS servers are isolated or stopped.

For details about this procedure, see 12.2.1(1) Verify which EADS servers are isolated or stopped.

(b) Export data from memory to files

Execute the eztool export -s command to export the data in each EADS server's memory to files. You must execute this command for each EADS server.

eztool export -s

Important note

The EADS servers might not have reached consensus on the request received immediately before the range became unavailable. This means that the exported data might not be consistent.
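Because the export must be repeated for each EADS server, the repetition can be scripted. The following is a minimal sketch, not a documented procedure. It assumes that eztool is run locally on each server host and that those hosts are reachable over ssh; the host names and the export_all function are hypothetical. Only the eztool export -s command itself comes from this manual. The dry-run mode prints the commands without executing them so that the list can be reviewed first.

```shell
#!/bin/sh
# Hypothetical host names; replace with the EADS servers in your cluster.
EADS_HOSTS="eads01 eads02 eads03"

# Print (dry run) or execute the export for every listed server.
# Assumption: eztool runs locally on each host and is reached over ssh.
export_all() {
    mode=$1
    for host in $EADS_HOSTS; do
        if [ "$mode" = "--dry-run" ]; then
            # Show the command that would be run for this server.
            echo "ssh $host eztool export -s"
        else
            # Continue with the remaining servers even if one export fails.
            ssh "$host" eztool export -s || echo "export failed on $host" >&2
        fi
    done
}

export_all --dry-run
```

Reporting a failure and continuing, rather than stopping at the first error, is a design choice for this sketch: some servers might already be stopped in this state, and the data on the remaining servers should still be exported.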

(c) Forcibly terminate the active and isolated EADS servers

Execute the eztool forcestop command on the active and isolated EADS servers to forcibly terminate them.

If all EADS servers have already stopped, there is no need to execute this command.

eztool forcestop

(d) Check error messages

Check the error messages output to the message logs of the EADS servers that you identified in (a) above.

(e) Acquire error information

Acquire error information on all EADS servers.

For details about this procedure, see 12.2.1(4) Acquire error information.

(f) Start all EADS servers in the cluster (import data from files)

After resolving the errors, start all EADS servers in the cluster and re-import into memory the data that you exported to files in (b) above.

For details about this procedure, see 10.3 Starting the EADS servers (and creating caches by importing data from files).

(2) When using disk caches

The following figure shows the general restoration procedure when disk caches are used and the cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE).

Figure 12‒3: Restoration procedure when the cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE) (using disk caches)

[Figure]

Each of the system operation administrator's tasks is explained in more detail below.

(a) Verify which EADS servers are isolated or stopped

Verify which EADS servers are isolated or stopped.

For details about this procedure, see 12.2.1(1) Verify which EADS servers are isolated or stopped.

(b) Perform compaction on cache data files

Execute the eztool compaction command to perform compaction on the cache data files. You must execute this command for each EADS server.

eztool compaction

(c) Forcibly terminate the active and isolated EADS servers

Execute the eztool forcestop command on the active and isolated EADS servers to forcibly terminate them.

If all EADS servers have already stopped, there is no need to execute this command.

eztool forcestop

(d) Check error messages

Check the error messages output to the message logs of the EADS servers that you identified in (a) above.

(e) Acquire error information

Acquire error information for all EADS servers.

For details about this procedure, see 12.2.1(4) Acquire error information.

(f) Start all EADS servers in the cluster (resume disk caches)

After resolving the errors, start all EADS servers in the cluster and resume disk caches.

For details about this procedure, see 10.3.2 How to start the EADS servers (resuming caches on disk).

Reference note

If caches cannot be resumed because of corrupted cache files, determine the number of EADS servers whose cache files have become corrupted.

  • When the number of EADS servers whose cache files have become corrupted is less than the total number of data copies (the redundant copies plus the original)

    Execute the deleteecf -l command on the EADS servers whose files have become corrupted to delete all cache files from the caches that contain the corrupted files. Then re-execute the eztool resume command.

  • When the number of EADS servers whose cache files have become corrupted is equal to or greater than the total number of data copies (the redundant copies plus the original)

    The EADS servers cannot be restored.
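The two cases above reduce to a single comparison, which can be sketched as follows. The restore_action function and its parameter names are hypothetical and are not part of eztool. The sketch only encodes the rule stated in this reference note. It does not detect corruption or look up the redundancy setting, so both values must be determined separately.

```shell
#!/bin/sh
# $1: number of EADS servers whose cache files have become corrupted
# $2: number of redundant copies of data (the original is counted separately)
restore_action() {
    corrupted=$1
    copies=$2
    # Recovery is possible only while at least one intact copy can remain:
    # corrupted servers < redundant copies + the original.
    if [ "$corrupted" -lt $((copies + 1)) ]; then
        echo "recoverable"
    else
        echo "unrecoverable"
    fi
}

restore_action 2 2   # 2 < 2 + 1, prints "recoverable"
restore_action 3 2   # 3 >= 2 + 1, prints "unrecoverable"
```

For example, with two redundant copies, corruption on up to two EADS servers falls into the first case, and corruption on three or more falls into the second.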