7.7.20 Analyzing failure in entire system
This section describes failure investigation procedures that can be used in cases such as when there is no response to a request to invoke a service component or execute a command from the service requester.
(1) Process hang up
When a service component invocation request from a service requester, or a command request, receives no response, a process in the HCSC server (J2EE server) might have hung. The following describes the flow of analysis when a process hangs:
(a) For service component invocation requests from service requester
The analysis procedure when a process hangs during a service component invocation request from a service requester is as follows:
-
Determine how far the processing of the service component invocation request progressed and exactly where the process hung. The method for tracking the execution log of service component invocation processing differs for each protocol.
For details on troubleshooting during the execution of requests in each protocol, see "7.7.1 Troubleshooting when executing Web service (SOAP communication)", "7.7.2 Troubleshooting when executing SessionBean", "7.7.3 Troubleshooting when executing MDB (WS-R)" and "7.7.4 Troubleshooting when executing MDB (DB queue)".
-
Based on how far the service component invocation request was processed, analyze the CPU usage of the corresponding processes on the service requester machine, HCSC server machine, service component machine, and database machine.
When CPU usage is near 100%, the process might have fallen into an infinite loop or a recursive invocation.
When CPU usage is near 0%, there might be a deadlock, or a back-end process might not be responding.
-
Check the "J2EE server, redirector, and server management command logs" of Service Platform for any unexpected error output.
-
Collect and analyze the performance analysis trace. You can locate the hang by identifying where processing takes an unusually long time.
-
Acquire a thread dump several times and, reading the dumps in chronological order, compare the stack traces of the thread with the same tid across the dumps.
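The thread dump comparison in the last step can be sketched as follows. This is a minimal illustration, not a tool provided by Service Platform. It assumes a HotSpot-style dump in which each thread starts with a header line containing a quoted thread name and `tid=0x...`, followed by `at ...` frame lines and a blank line; the actual dump format of your JavaVM may differ. The function names `parse_dump` and `unchanged_threads` are hypothetical.

```python
import re

# Assumed header shape: "thread-name" ... tid=0x1a2b ... (HotSpot-style; an assumption)
HEADER = re.compile(r'^"(?P<name>[^"]+)".*?\btid=(?P<tid>0x[0-9a-fA-F]+)')

def parse_dump(text):
    """Return {tid: (thread_name, tuple_of_frame_lines)} for one thread dump."""
    threads = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group("tid")
            threads[current] = (match.group("name"), [])
        elif current is not None and line.strip().startswith("at "):
            threads[current][1].append(line.strip())
        elif not line.strip():
            current = None  # a blank line ends the current thread's stack
    return {tid: (name, tuple(frames)) for tid, (name, frames) in threads.items()}

def unchanged_threads(dumps):
    """Return {tid: name} for threads whose non-empty stack is identical in every dump.

    dumps must be given in chronological order.
    """
    parsed = [parse_dump(d) for d in dumps]
    if not parsed:
        return {}
    result = {}
    for tid, (name, frames) in parsed[0].items():
        if frames and all(p.get(tid, (None, ()))[1] == frames for p in parsed[1:]):
            result[tid] = name
    return result
```

A thread whose stack never changes is only a candidate for the hang location: idle pool threads waiting for work also show an unchanged stack, so the result should be read together with the CPU usage and log checks above.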
(b) For command requests
When processing of a command request hangs, analyze as follows:
-
For the command request, investigate the CPU usage of the relevant processes on the machines on which the operating environment, the HCSC server, and the database are running.
When CPU usage is near 100%, the process might have fallen into an infinite loop or a recursive invocation.
When CPU usage is near 0%, there might be a deadlock, or a back-end process might not be responding.
-
Check the message log of Service Platform, and the "logs of J2EE server, redirector, and server management commands" and "logs of Administration Agent, Management Agent or Management Server" of Cosminexus, and confirm that no unexpected errors are output.
-
Acquire a thread dump several times and, reading the dumps in chronological order, compare the stack traces of the thread with the same tid across the dumps.
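The CPU usage check in the first step above amounts to a simple heuristic, which might be sketched as follows. The thresholds (90% and 5%) and the function name `classify_cpu` are assumptions chosen for illustration; the manual only says "about 100%" and "about 0%". The samples are assumed to be per-process CPU percentages collected with whatever OS tool you use.

```python
from statistics import mean

def classify_cpu(samples, high=90.0, low=5.0):
    """Classify CPU usage samples (in percent) collected for one process.

    high and low are illustrative thresholds, not values from the product manual.
    """
    if not samples:
        return "no data"
    average = mean(samples)
    if average >= high:
        return "possible infinite loop or recursive invocation"
    if average <= low:
        return "possible deadlock or unresponsive back-end process"
    return "inconclusive"
```

On a multi-core machine, confirm how your monitoring tool normalizes the percentage, because a single looping thread might appear as 100% of one core or as a much smaller share of the whole machine.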
(2) Process slow down
If a service component invocation request from a service requester slows down, or if response times are long, use the performance analysis trace to identify where execution time is long.
Collect and analyze the performance analysis trace; the place where processing takes the most time is where the slowdown occurs.
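The idea of finding where the most time is spent can be sketched as follows. This is an illustration only: it assumes you have already extracted, for one request, a list of (trace point name, timestamp in milliseconds) pairs from the performance analysis trace. The record shape, the trace point names in the usage example, and the function name `slowest_interval` are all hypothetical; the actual trace file format is product-specific and is not reproduced here.

```python
def slowest_interval(events):
    """Find the longest gap between consecutive trace points of one request.

    events: list of (point_name, timestamp_ms) tuples (hypothetical record shape).
    Returns (from_point, to_point, elapsed_ms), or None if fewer than two events.
    """
    ordered = sorted(events, key=lambda event: event[1])
    gaps = [
        (earlier[0], later[0], later[1] - earlier[1])
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return max(gaps, key=lambda gap: gap[2]) if gaps else None


# Usage with made-up trace points for a single request:
events = [
    ("request_received", 0),
    ("service_invoked", 40),
    ("service_returned", 5040),
    ("response_sent", 5060),
]
worst = slowest_interval(events)
```

In this made-up data, the longest gap lies between invoking the service component and its return, which would point the investigation at the service component side rather than at the HCSC server.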
(3) Timeout
The following describes the investigation flow when a timeout occurs for a service component invocation request from a service requester or for a command request.
(a) In case of a service component invocation request from service requester
The investigation procedure for a timeout of a service component invocation request from a service requester is as follows:
-
Investigate the error messages that are output when the timeout occurs.
-
Check the error information output by J2EE server of Service Platform for any other error messages that are output in addition to the timeout error messages.
-
Check how far the processing of the service component invocation request progressed and exactly where it stopped. The method for tracing the execution logs of service component invocation processing differs for each protocol.
For troubleshooting during request execution in each protocol, see "7.7.1 Troubleshooting when executing Web service (SOAP communication)", "7.7.2 Troubleshooting when executing SessionBean", "7.7.3 Troubleshooting when executing MDB (WS-R)" and "7.7.4 Troubleshooting when executing MDB (DB queue)".
-
Investigate the request trace or the performance analysis trace.
Using the results of the preceding check on how far the service component invocation request progressed, identify where processing takes a long time and thereby determine the cause of the timeout.
(b) For command requests
The investigation procedure when a timeout occurs for a command request is as follows:
-
Investigate the error messages that are output when the timeout occurs.
-
Check for any other error messages that have been output.
Look for error messages in the following Service Platform logs:
-
Logs for the J2EE server, redirectors, and server management commands
-
Logs for Administration Agent, Management Agent, and Management Server
-