Hitachi

In-Memory Data Grid Hitachi Elastic Application Data Store


12.2.1 If one or more EADS servers are isolated

The following figure shows the general procedure for restoring one or more EADS servers that have been isolated due to failures.

Figure 12‒1: General procedure for restoring one or more EADS servers that have been isolated due to failures

[Figure]

Important note

In the following cases, the EADS servers cannot be restored using the procedures explained here:

  • The cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE).

    If the cluster status is AVAILABLE but at least half of the EADS servers in the cluster are isolated, the same measures are needed as when a cluster is unavailable (NOT_AVAILABLE).

  • Online performance has degraded beyond what is allowed.

  • An EADS server to be restored is not defined in the cluster properties.

  • The cluster properties in effect when an EADS server was shut down do not match the cluster properties in effect during restoration.

For details about the restoration procedure when the cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE), see 12.2.2 If the cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE).
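The cluster-status conditions in the note above can be expressed as a small check. The following Python sketch is illustrative only. The function name `can_use_isolated_server_procedure` and its inputs are hypothetical. It covers only the cluster-status and isolated-server-count conditions, so online performance and the cluster-property conditions must still be verified separately.

```python
def can_use_isolated_server_procedure(cluster_status: str,
                                      total_servers: int,
                                      isolated_servers: int) -> bool:
    """Return True if the procedure in 12.2.1 may apply (hypothetical helper).

    Checks only the cluster-status conditions from the note. Performance
    degradation and cluster-property mismatches are not covered here.
    """
    if cluster_status != "AVAILABLE":
        # NOT_AVAILABLE or PARTIALLY_AVAILABLE: use 12.2.2 instead.
        # Any other status is conservatively treated as not qualifying.
        return False
    # AVAILABLE, but at least half of the EADS servers isolated:
    # the same measures as for NOT_AVAILABLE are needed.
    return 2 * isolated_servers < total_servers
```

For example, in a four-server cluster with two isolated servers, the check returns False, because exactly half of the servers are isolated.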

Each of the system operation administrator's tasks is explained in more detail below.


(1) Verify which EADS servers are isolated or stopped

Execute the eztool status command to verify which EADS servers are isolated or stopped.

Command execution example

[Figure]

In this example, the isolated EADS server is indicated as isolated in the State column. You must terminate this EADS server.

In this example, there is no stopped EADS server. If there is any stopped EADS server, it is indicated as ----------- in the State column.
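If the State column values have already been extracted from the eztool status output, you can sort the servers into isolated and stopped groups as follows. This is a minimal sketch under that assumption. The input format, a list of (EADS server ID, State value) pairs, and the function name `classify_servers` are hypothetical. The sketch does not parse the eztool status output itself.

```python
from typing import Iterable, List, Tuple


def classify_servers(rows: Iterable[Tuple[int, str]]) -> Tuple[List[int], List[int]]:
    """Split (server_id, state) pairs into isolated and stopped server IDs.

    Assumes the State column value was already extracted for each EADS
    server (hypothetical input format).
    """
    isolated: List[int] = []
    stopped: List[int] = []
    for server_id, state in rows:
        value = state.strip()
        if value == "isolated":
            isolated.append(server_id)
        elif value and set(value) == {"-"}:
            # A run of hyphens in the State column indicates a stopped server.
            stopped.append(server_id)
    return isolated, stopped
```

Servers in the first list are terminated in step (2). Servers in both lists are restored in step (5).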

(2) Terminate the isolated EADS servers

If an EADS server is isolated, use the eztool isolate --stop command to terminate it. If no EADS servers are isolated, skip this step.

Important note

Execute the eztool isolate --stop command on the isolated EADS server.

(3) Check error messages

Check the error messages output to the message log of each EADS server that you terminated in (2) above.

(4) Acquire error information

To investigate the cause of the error, obtain the following information from all EADS servers:

You can use the eztool snapshot command to collect logs and property files in a single batch operation.

For details about how to acquire error information, see 12.3 Acquiring error information.

Determining the time of EADS server isolation

Check the message log of the isolated EADS server for the KDEA04783-I or KDEA04799-E message. The time at which this message was output is the time at which the EADS server was isolated.

Example message (KDEA04783-I)

[Figure]

In this example, the EADS server whose EADS server ID is 1 was isolated on 2015-04-03 at 11:59:25.

Example message (KDEA04799-E)

[Figure]

In this example, the EADS server whose EADS server ID is 3 was isolated on 2015-04-21 at 11:55:46.
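When there are many log lines to scan, a script can locate the isolation time. The following Python sketch assumes that each message log line contains a timestamp in the form `YYYY/MM/DD hh:mm:ss` or `YYYY-MM-DD hh:mm:ss`. This layout is an assumption, not the documented format, so adjust the regular expression to match the actual message log. The name `find_isolation_time` is hypothetical. Only the two message IDs come from this manual.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Message IDs that indicate EADS server isolation (from this manual).
ISOLATION_MESSAGE_IDS = ("KDEA04783-I", "KDEA04799-E")

# Assumed timestamp layout; verify against the real message log format.
TIMESTAMP = re.compile(r"(\d{4})[/-](\d{2})[/-](\d{2})\s+(\d{2}):(\d{2}):(\d{2})")


def find_isolation_time(log_lines: Iterable[str]) -> Optional[datetime]:
    """Return the time of the last isolation message in the log, or None."""
    found = None
    for line in log_lines:
        if any(msg_id in line for msg_id in ISOLATION_MESSAGE_IDS):
            match = TIMESTAMP.search(line)
            if match:
                # Later matches overwrite earlier ones, so the most recent
                # isolation message in the log is returned.
                found = datetime(*map(int, match.groups()))
    return found
```

The sketch returns the last matching message because the most recent isolation is usually the one under investigation.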

(5) Restore the stopped EADS servers

After handling the errors, restore the stopped EADS servers by using one of the following commands:

  • ezstart -r

  • ezserver -r

During restoration processing, an active EADS server sends data to the EADS servers being restored in order to recover data consistency.

Therefore, note the following:

Tip

If you are using disk caches or two-way caches, restore the EADS server by using one of the following methods:

Using cache files for restoration

  • Restoration procedure: Without deleting the cache files of the stopped EADS servers, execute the ezstart -r or ezserver -r command to restore them.

  • Processing: Imports data from the cache files and corrects it by comparing it with the data on an active EADS server.

  • Criteria: If data in the cache is updated and deleted infrequently and the cache files contain a large amount of valid data, using the cache files might reduce the time required for restoration processing compared with not using them.

Not using cache files for restoration

  • Restoration procedure: Execute the deleteecf -l command, specifying the stopped EADS servers, to delete the cache data files for the corresponding cache. Then execute the ezstart -r or ezserver -r command to restore the EADS servers. If this restoration processing fails, perform this procedure again.

  • Processing: Acquires all data from an active EADS server.

  • Criteria: If data in the cache is updated and deleted frequently and the cache files contain a large amount of invalid data, not using the cache files might reduce the time required for restoration processing compared with using them.
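The criteria for the two restoration methods can be summarized as a rule of thumb. In the following Python sketch, `valid_data_ratio` is a hypothetical, operator-estimated fraction of cache-file contents that is still valid. The 0.5 `threshold` is an arbitrary placeholder, not a value from this manual, and should be tuned based on measured restoration times.

```python
def choose_restoration_method(valid_data_ratio: float, threshold: float = 0.5) -> str:
    """Suggest a restoration method for disk caches or two-way caches.

    valid_data_ratio: estimated fraction (0.0 to 1.0) of cache-file contents
        that is still valid. Frequent updates and deletions lower this ratio.
    threshold: hypothetical cut-off; not a documented value.
    """
    if not 0.0 <= valid_data_ratio <= 1.0:
        raise ValueError("valid_data_ratio must be between 0.0 and 1.0")
    if valid_data_ratio >= threshold:
        # Mostly valid data: importing the cache files and correcting the
        # differences might be faster.
        return "use cache files"
    # Mostly invalid data: deleting the cache files (deleteecf -l) and
    # acquiring all data from an active EADS server might be faster.
    return "delete cache files"
```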

If a space shortage occurs in cache data files during data restoration processing, compaction is performed internally. If the space shortage is resolved by this compaction processing, the restoration processing resumes.

If the space shortage cannot be resolved, increase the value of the eads.cache.disk.filenum parameter in the cache properties according to the procedure described in 11.4.1 How to change the properties.

If restoration processing fails for any of the following reasons, delete the cache files of the affected cache, and then execute the ezstart -r or ezserver -r command to restore the stopped EADS servers:

  • Cache files have become corrupted.

  • A Java heap overflow occurred.

  • Internal compaction processing failed.
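The failure handling described above maps each failure case to one operator action. The following Python sketch is a lookup table that records this mapping. The enum `RestoreFailure`, its member names, and the action wording are hypothetical labels for the cases in the text. A space shortage that internal compaction resolves needs no action, because restoration resumes automatically.

```python
from enum import Enum, auto


class RestoreFailure(Enum):
    """Hypothetical labels for the failure cases described in this subsection."""
    SPACE_SHORTAGE_UNRESOLVED = auto()  # internal compaction did not free enough space
    CACHE_FILES_CORRUPTED = auto()
    JAVA_HEAP_OVERFLOW = auto()
    COMPACTION_FAILED = auto()


_DELETE_AND_RESTORE = (
    "Delete the cache files of the affected cache, "
    "then execute ezstart -r or ezserver -r."
)

# Operator actions taken from the text above; the wording is illustrative.
NEXT_ACTION = {
    RestoreFailure.SPACE_SHORTAGE_UNRESOLVED: (
        "Increase eads.cache.disk.filenum in the cache properties "
        "(see 11.4.1 How to change the properties)."
    ),
    RestoreFailure.CACHE_FILES_CORRUPTED: _DELETE_AND_RESTORE,
    RestoreFailure.JAVA_HEAP_OVERFLOW: _DELETE_AND_RESTORE,
    RestoreFailure.COMPACTION_FAILED: _DELETE_AND_RESTORE,
}


def next_action(failure: RestoreFailure) -> str:
    """Return the documented operator action for a restoration failure."""
    return NEXT_ACTION[failure]
```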

(6) Verify that the restarted EADS servers are participating in the cluster

Execute the eztool status command to verify that the restarted EADS servers are participating in the cluster again.

Command execution example

[Figure]

If an EADS server is participating in the cluster, online is displayed in the Cluster column.

If there is any other EADS server that is isolated or stopped, repeat the procedure starting from 12.2.1(1) Verify which EADS servers are isolated or stopped.
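Before deciding whether to repeat the procedure, you can find the EADS servers that are not yet participating in the cluster. The following Python sketch assumes that the Cluster column value for each EADS server has already been extracted from the eztool status output into a mapping from EADS server ID to column value. This input format and the name `servers_needing_attention` are hypothetical.

```python
from typing import List, Mapping


def servers_needing_attention(cluster_column: Mapping[int, str]) -> List[int]:
    """Return the sorted IDs of EADS servers whose Cluster column is not 'online'.

    A non-empty result means that at least one EADS server is not
    participating in the cluster, so the procedure should be repeated
    starting from step (1).
    """
    return sorted(
        server_id
        for server_id, value in cluster_column.items()
        if value.strip() != "online"
    )
```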