12.2.1 If one or more EADS servers are isolated
The following figure shows the general procedure for restoring one or more EADS servers that have been isolated due to failures.
- Important note
In the following cases, the EADS servers cannot be restored using the procedures explained here:
- The cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE).
  If the cluster status is AVAILABLE but at least half of the EADS servers in the cluster are isolated, the same measures are needed as when a cluster is unavailable (NOT_AVAILABLE).
- Online performance has degraded beyond what is allowed.
- An EADS server to be restored is not defined in the cluster properties.
- The cluster properties in effect when an EADS server was shut down do not match the cluster properties in effect during restoration.
For details about the restoration procedure when the cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE), see 12.2.2 If the cluster is unavailable (NOT_AVAILABLE) or is partially available (PARTIALLY_AVAILABLE).
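The cluster-status condition in the note above can be expressed as a small check. The following Python sketch is illustrative only: the function name and its arguments are hypothetical, and it assumes that you have already read the cluster status and counted the isolated EADS servers yourself. It covers only the cluster-status condition, not the other three cases in the note.

```python
from enum import Enum


class ClusterStatus(Enum):
    """Cluster status values named in this subsection."""
    AVAILABLE = "AVAILABLE"
    PARTIALLY_AVAILABLE = "PARTIALLY_AVAILABLE"
    NOT_AVAILABLE = "NOT_AVAILABLE"


def isolation_procedure_applies(status, total_servers, isolated_servers):
    """Return True if the cluster-status condition allows this procedure.

    Hypothetical helper: the inputs are values you determined manually.
    """
    if total_servers <= 0 or not 0 <= isolated_servers <= total_servers:
        raise ValueError("server counts are inconsistent")
    # NOT_AVAILABLE and PARTIALLY_AVAILABLE clusters are handled in 12.2.2.
    if status is not ClusterStatus.AVAILABLE:
        return False
    # If at least half of the servers are isolated, the cluster is treated
    # in the same way as a NOT_AVAILABLE cluster.
    return isolated_servers * 2 < total_servers
```

For example, under these assumptions a cluster of five EADS servers with two isolated servers passes the check, whereas a cluster of four EADS servers with two isolated servers does not.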
Each of the system operation administrator's tasks is explained in more detail below.
(1) Verify which EADS servers are isolated or stopped
Execute the eztool status command to verify which EADS servers are isolated or stopped.
Command execution example
In this example, the isolated EADS server is indicated as isolated in the State column. You must terminate this EADS server.
In this example, no EADS server is stopped. A stopped EADS server is indicated as ----------- in the State column.
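If you review the State column for many EADS servers, the classification described above can be sketched as follows. This is a minimal Python sketch, not part of the product: the function name is hypothetical, and it assumes that you have already extracted the State column value for each EADS server ID. It does not parse the output of the eztool status command, because the output format is not reproduced here.

```python
ISOLATED_STATE = "isolated"


def classify_servers(states):
    """Split servers into isolated and stopped ones.

    states: mapping of EADS server ID -> text of the State column
            (assumed to be extracted by hand or by your own tooling).
    Returns (isolated_ids, stopped_ids), each sorted.
    """
    isolated = []
    stopped = []
    for server_id, state in states.items():
        text = state.strip()
        if text == ISOLATED_STATE:
            # Isolated servers must be terminated in step (2).
            isolated.append(server_id)
        elif text and set(text) == {"-"}:
            # A State column that consists only of hyphens means stopped.
            stopped.append(server_id)
    return sorted(isolated), sorted(stopped)
```

The servers in the first list are the ones to terminate in (2), and the servers in both lists are candidates for restoration in (5).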
(2) Terminate the isolated EADS servers
If an EADS server is isolated, use the eztool isolate --stop command to terminate it. If no EADS servers are isolated, skip this step.
- Important note
Execute the eztool isolate --stop command on the isolated EADS server.
(3) Check error messages
Check the error message output to the message log of the EADS server that you terminated in (2) above.
(4) Acquire error information
You need information to investigate the cause of the error. Obtain the following information on all EADS servers:
- All files under the directory specified in the eads.logger.dir parameter in the server properties
- All property files under management-directory/conf
- Thread dumps
You can use the eztool snapshot command to collect logs and property files in a single batch operation.
For details about how to acquire error information, see 12.3 Acquiring error information.
- Determining the time of EADS server isolation
See the KDEA04783-I or KDEA04799-E message that has been output to the message log of an EADS server that was isolated. The time of this message is the time the EADS server was isolated.
- Example message (KDEA04783-I)
In this example, the EADS server whose EADS server ID is 1 was isolated on 2015-04-03 at 11:59:25.
- Example message (KDEA04799-E)
In this example, the EADS server whose EADS server ID is 3 was isolated on 2015-04-21 at 11:55:46.
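Searching a message log for these two message IDs can be sketched as follows. This Python sketch is an illustration under stated assumptions: the function name is hypothetical, and the timestamp pattern (a date and a time in year-month-day hour:minute:second order somewhere on the same line) is an assumption about the log line layout, which is not reproduced here. If a matching line contains no timestamp in that form, the sketch reports None for the time rather than guessing.

```python
import re
from datetime import datetime

# Message IDs that mark the isolation of an EADS server.
ISOLATION_MESSAGE_IDS = ("KDEA04783-I", "KDEA04799-E")

# Assumed timestamp layout; adjust to the actual message log format.
TIMESTAMP = re.compile(
    r"(\d{4})[-/](\d{2})[-/](\d{2})[ T](\d{2}):(\d{2}):(\d{2})"
)


def find_isolation_times(lines):
    """Return (message_id, datetime or None) for each isolation message."""
    results = []
    for line in lines:
        message_id = next(
            (mid for mid in ISOLATION_MESSAGE_IDS if mid in line), None
        )
        if message_id is None:
            continue
        match = TIMESTAMP.search(line)
        when = datetime(*map(int, match.groups())) if match else None
        results.append((message_id, when))
    return results
```

You could pass the function the lines of a message log file, for example by opening the file and iterating over it.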
(5) Restore the stopped EADS servers
After handling the errors, restore the stopped EADS servers by using one of the following commands:
- ezstart -r command
- ezserver -r command
During restoration processing, an active EADS server sends data to the EADS servers being restored in order to recover data consistency.
Therefore, note the following:
- Restoring an EADS server takes at least as long as the time required for obtaining the data.
- The EADS server that sends data is affected in proportion to the CPU resources and network bandwidth that are allocated for sending data.
- If the EADS server cannot keep up with the processing because both data operations and restoration processing must be performed, the EADS server might place data operations on hold to prevent a memory shortage.
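The first point above implies a simple lower bound on the restoration time: the amount of data to be obtained divided by the bandwidth available for sending it. The following Python sketch shows that arithmetic. The function name and both inputs are hypothetical values that you would estimate yourself, and the result ignores CPU load, concurrent data operations, and any on-hold periods, so the actual time can be considerably longer.

```python
def min_restore_seconds(data_bytes, bandwidth_bytes_per_sec):
    """Lower bound on restoration time, in seconds.

    data_bytes: estimated amount of data the restored server must obtain.
    bandwidth_bytes_per_sec: bandwidth assumed to be available for sending.
    """
    if data_bytes < 0 or bandwidth_bytes_per_sec <= 0:
        raise ValueError("data size must be >= 0 and bandwidth must be > 0")
    return data_bytes / bandwidth_bytes_per_sec


# Example with assumed figures: 8 GiB of data over 100 MiB/s.
estimate = min_restore_seconds(8 * 1024**3, 100 * 1024**2)
print(f"at least {estimate:.2f} seconds")  # at least 81.92 seconds
```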
- Tip
If you are using disk caches or two-way caches, restore the EADS server by using one of the following methods:
Restoration method: Using cache files for restoration
- Restoration procedure: Without deleting the cache files of the stopped EADS servers, use the ezstart -r or ezserver -r command to restore those EADS servers.
- Processing: Imports data from the cache files and corrects the data by comparing it with the data on an active EADS server.
- Criteria: If the frequency of data update and deletion processing on the cache is low and the cache files contain a large amount of valid data, using cache files might reduce the time required for restoration processing.
Restoration method: Not using cache files for restoration
- Restoration procedure: Delete the cache data files for the corresponding cache by executing the deleteecf -l command, specifying the EADS servers that have stopped. Then execute the ezstart -r or ezserver -r command to restore the EADS servers. If this restoration processing fails, perform this procedure again.
- Processing: Acquires all data from an active EADS server.
- Criteria: If the frequency of data update and deletion processing on the cache is high and the cache files contain a large amount of invalid data, not using cache files might reduce the time required for restoration processing.
If a space shortage occurs in cache data files during data restoration processing, compaction is performed internally. If the space shortage is resolved by this compaction processing, the restoration processing resumes.
If the space shortage cannot be resolved, increase the value of the eads.cache.disk.filenum parameter in the cache properties according to the procedure described in 11.4.1 How to change the properties.
If restoration processing fails for any of the reasons listed below, delete the cache files from the cache that contains the corrupted files, and then execute the ezstart -r or ezserver -r command to restore the EADS servers that have stopped:
- Cache files have become corrupted.
- A Java heap overflow occurred.
- Internal compaction processing failed.
(6) Verify that the restarted EADS servers are participating in the cluster
Execute the eztool status command to verify that the restarted EADS servers have been restored in the cluster.
Command execution example
If an EADS server is participating in the cluster, online is displayed in the Cluster column.
If any other EADS server is isolated or stopped, repeat the procedure starting from 12.2.1(1) Verify which EADS servers are isolated or stopped.