Hitachi

uCosminexus Application Server Maintenance and Migration Guide


6.5.2 Troubleshooting when a response is delayed

This subsection describes troubleshooting for a delayed response.

(1) Flow of actions when a response is delayed

The following figure shows the flow of troubleshooting when a response is delayed.

Figure 6‒6: Flow of actions when a response is delayed

[Figure]

The details of the processing shown in the figure are described in subsection (2).

(2) Flow of actions when a response is delayed

The following steps describe the operations to perform at each stage of the flow:

  1. Check the CPU usage

    Check the CPU usage of the applicable process.

    The following figure shows an example of CPU usage displayed in Task Manager.

    Figure 6‒7: CPU usage

    [Figure]

    Tip

    If one core is close to 100%

    A CPU bottleneck is a possible cause, for example processing that has entered an infinite loop or runaway recursive invocation. Go to step 2 and proceed with the check.

    If one core is close to 0%

    The process is probably waiting for a back-end process that is not returning a response, for example because the back-end process is not responding or is deadlocked. Go to step 2 and proceed with the check.

  2. Acquire the PRF trace

    Execute the mngsvrutil command to output the PRF trace.

    Example of execution

    mngsvrutil -m 123.45.67.89 -u admin2 collect allPrfTraces

  3. Open the PRF trace

    Open the PRF trace.

    Output destination

    C:\Program Files\Hitachi\Cosminexus\manager\log\prf

    File name

    The file name depends on which trace information is collected, as shown below.

    In each file name, date-and-time is the date and time at which the PRF trace was collected.

    Performance tracer types                                                    File name
    All the performance tracers running on the hosts in the management domain   management-domain-name-date-and-time.zip
    All the performance tracers running on a specific host                      host-name-date-and-time.zip
    Specific performance tracer                                                 logical-server-name-date-and-time.zip
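When several archives have accumulated in the output destination, it can help to pick out the most recent one before opening it. The following is a minimal sketch, not part of the product: the function name newest_prf_archive is hypothetical, and it relies on the file modification time rather than on the date-and-time part of the file name, whose exact format is not stated here.

```python
from pathlib import Path
from typing import Optional


def newest_prf_archive(prf_dir: Path) -> Optional[Path]:
    """Return the most recently modified .zip file in prf_dir, or None if there is none.

    Assumption: the newest archive is the one with the latest modification
    time. The date-and-time portion of the file name is not parsed.
    """
    archives = [p for p in prf_dir.glob("*.zip") if p.is_file()]
    if not archives:
        return None
    return max(archives, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Output destination taken from the text above.
    prf_dir = Path(r"C:\Program Files\Hitachi\Cosminexus\manager\log\prf")
    if prf_dir.is_dir():
        print(newest_prf_archive(prf_dir))
```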

  4. Check the PRF trace

    Check the Time column in the PRF trace and find the processing that takes a long time.

    The PRF trace records events across processes, together with data that is useful for performance analysis and error analysis.

    Figure 6‒8: Example of output of PRF trace

    [Figure]

    In the example, there is a gap of 11 minutes after the SQL statement is issued. Furthermore, the execution of the SQL statement has not ended. Therefore, a problem might have occurred in the database while the SQL statement was being executed.

    Note that the PRF trace is easier to check if you open it in spreadsheet software.
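The same check can be scripted once the trace has been exported to CSV. The following is a rough sketch under stated assumptions: the input has columns named Time and Event, Time values use the HH:MM:SS format, and all records fall on the same day. These column names, the time format, and the event labels in the sample are illustrative only and might not match the actual PRF trace layout.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical sample data; real PRF trace columns and event names may differ.
SAMPLE = """Time,Event
10:00:00,request-received
10:00:01,sql-issued
10:11:01,next-record
"""


def find_gaps(csv_text, threshold=timedelta(minutes=1), time_format="%H:%M:%S"):
    """Return (event_before, event_after, gap) for each gap of at least threshold.

    Assumption: records are in chronological order and do not cross midnight.
    """
    gaps = []
    previous = None  # (timestamp, event) of the preceding record
    for row in csv.DictReader(io.StringIO(csv_text)):
        current = datetime.strptime(row["Time"], time_format)
        if previous is not None:
            gap = current - previous[0]
            if gap >= threshold:
                gaps.append((previous[1], row["Event"], gap))
        previous = (current, row["Event"])
    return gaps


if __name__ == "__main__":
    for before, after, gap in find_gaps(SAMPLE):
        print(f"{gap} between {before} and {after}")
```

In the sample, the only gap reported is the 11 minutes that follow the record labeled sql-issued, which mirrors the situation described for the figure.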

  5. Output the thread dump

    Execute the mngsvrutil command to output the thread dump.

    Example of execution

    mngsvrutil -m 123.45.67.89 -u admin2 dump server

  6. Check the thread dump

    For an infinite loop

    The following figure shows an example of the thread dump output and the check points in the case of an infinite loop.

    Figure 6‒9: Example of thread dump output (infinite loop)

    [Figure]

    Output the thread dump multiple times, observe the time series, and perform a comparative check of the stack trace of the threads with the same tid in each thread dump.

    Point 1

    If the thread attribute is runnable, the thread is executable and is contributing to the increased CPU usage. (If the attribute is waiting for monitor entry, the thread is not executable and so does not increase the CPU usage.)

    Point 2

    If the thread with the same tid is runnable in every thread dump file, the thread might have been running for a long period of time.

    Point 3

    If the thread is executing the same line of the same method in each thread dump, suspect an infinite loop.

    Tip

    If the checks so far suggest an infinite loop, ask the developer to investigate.

    If the checks do not suggest an infinite loop, go to step 7.
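The comparison described in Points 1 to 3 can be roughed out in a script. The following sketch assumes that each thread's header line has the form "name" ... tid=... nid=... attribute [...], and that stack frames are lines beginning with "at". The actual dump layout may differ, and the function names parse_dump and loop_suspects are hypothetical. A thread is reported only if it is runnable at the same top frame in every dump supplied.

```python
import re

# Assumed header layout: "name" ... tid=<id> nid=<id> <attribute> [<range>]
HEADER = re.compile(r'^"(?P<name>[^"]+)".*?\btid=(?P<tid>\S+)\s+nid=\S+\s+(?P<state>[^\[]*)')


def parse_dump(text):
    """Map tid -> (name, attribute, top stack frame) for one thread dump."""
    threads = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group("tid")
            threads[current] = [match.group("name"), match.group("state").strip(), None]
            continue
        stripped = line.strip()
        if not stripped:
            current = None  # a blank line ends the current thread entry
        elif current and stripped.startswith("at ") and threads[current][2] is None:
            threads[current][2] = stripped[3:]  # first frame is the top of the stack
    return {tid: tuple(info) for tid, info in threads.items()}


def loop_suspects(dumps):
    """Return (name, tid, frame) for threads runnable at the same top frame in all dumps."""
    if len(dumps) < 2:
        return []  # a time series needs at least two dumps
    parsed = [parse_dump(d) for d in dumps]
    suspects = []
    for tid, (name, state, frame) in parsed[0].items():
        if state != "runnable" or frame is None:
            continue
        if all(p.get(tid) == (name, "runnable", frame) for p in parsed[1:]):
            suspects.append((name, tid, frame))
    return suspects
```

A match is only a hint. A thread can legitimately be at the same line in several dumps, so the result still needs to be reviewed by the developer.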

    For a deadlock

    The following figure shows an example of thread dump output and the check points in the case of a deadlock.

    Figure 6‒10: Example of thread dump output (deadlock)

    [Figure]

    In the example output, the thread attributes are output after nid:....

    Find the threads whose attribute is waiting for monitor entry.

    For those threads, check the contents of "-waiting to lock..." and "-locked...". A deadlock exists if the threads are each waiting to acquire a lock that another of them already holds.

    Point 1

    If the thread attribute is runnable, the thread is executable and so is not involved in a deadlock.

    Point 2

    If the thread attribute is waiting for monitor entry, the thread is waiting to acquire a lock.

    This thread might be involved in the deadlock.

    Point 3

    If a thread has acquired a lock and is also waiting for a lock as described in Point 2, it is highly likely that the thread is involved in the deadlock.

    To confirm the deadlock, compare the addresses of the locked objects for the threads to which both Point 2 and Point 3 apply.

    In the example, Thread-3 has acquired the <02A328C8> lock and is waiting to acquire <02A328C0>.

    On the other hand, Thread-1 has acquired the <02A328C0> lock and is waiting to acquire <02A328C8>. This shows that Thread-3 and Thread-1 are in a deadlock.

    Tip

    If the checks so far suggest a deadlock, ask the developer to investigate.

    If the checks do not suggest a deadlock, go to step 7.
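The address comparison can also be sketched in code. The following is a minimal illustration that assumes each thread entry starts with a quoted thread name and contains lines of the form "- locked <address>" and "- waiting to lock <address>". The function name find_deadlocks and the stack frames in the sample are hypothetical; only the thread names and lock addresses come from the example above.

```python
import re

HEADER = re.compile(r'^"([^"]+)"')
WAITING = re.compile(r'-\s*waiting to lock\s*<([^>]+)>')
LOCKED = re.compile(r'-\s*locked\s*<([^>]+)>')

# Thread names and addresses follow the example; the frames are made up.
SAMPLE = '''"Thread-3" prio=5 tid=0x00b3 nid=0x33 waiting for monitor entry [0x0aa0..0x0bb0]
    at com.example.Worker.second(Worker.java:20)
    - waiting to lock <02A328C0>
    at com.example.Worker.first(Worker.java:12)
    - locked <02A328C8>

"Thread-1" prio=5 tid=0x00b1 nid=0x31 waiting for monitor entry [0x0cc0..0x0dd0]
    at com.example.Worker.first(Worker.java:12)
    - waiting to lock <02A328C8>
    at com.example.Worker.second(Worker.java:20)
    - locked <02A328C0>
'''


def find_deadlocks(dump):
    """Return a list of cycles, each a list of thread names that wait on one another."""
    owner = {}    # lock address -> thread that holds it
    waiting = {}  # thread -> lock address it is waiting to acquire
    thread = None
    for line in dump.splitlines():
        header = HEADER.match(line)
        if header:
            thread = header.group(1)
        elif thread is not None:
            wait = WAITING.search(line)
            held = LOCKED.search(line)
            if wait:
                waiting[thread] = wait.group(1)
            elif held:
                owner[held.group(1)] = thread

    cycles, seen = [], set()
    for start in waiting:
        path, current = [], start
        # Follow "waits for the holder of" links until they end or repeat.
        while current in waiting and current not in path:
            path.append(current)
            current = owner.get(waiting[current])
        if current in path:
            cycle = path[path.index(current):]
            if not seen.intersection(cycle):
                cycles.append(cycle)
            seen.update(cycle)
    return cycles


if __name__ == "__main__":
    print(find_deadlocks(SAMPLE))
```

Following the chain from a waiting thread to the holder of the lock it wants, and stopping when a thread repeats, generalizes the two-thread comparison to deadlocks that involve three or more threads.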

  7. Improve the business application. Remove redundant processing

    If the results of checking the PRF trace and the thread dump suggest that the business application is causing the delay, review the application and take corrective action.

    Tip

    If the problem is resolved, the troubleshooting process ends at this point.

    If the problem is not resolved and if the CPU usage is high, go to step 8.

    If the problem is not resolved and if the CPU usage is low, ask the helpdesk to investigate, in accordance with your purchase agreement.

  8. Reduce the parameter values for the number of concurrently executing threads to limit the amount of processing that runs concurrently

    Pending requests might accumulate, and each request might have to wait some time before it is processed.
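The trade-off can be illustrated with a generic worker pool. This sketch does not use any application server parameter. It only shows that when the number of concurrent workers is capped, extra requests queue up and are still completed, while the peak number of requests processed at once never exceeds the cap. The function name run_with_limit and its arguments are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(request_count, max_concurrent, work_seconds=0.01):
    """Process request_count dummy requests with at most max_concurrent at a time.

    Returns (results, peak), where peak is the highest number of requests
    observed in progress simultaneously.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def handle(request_id):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(work_seconds)  # stands in for the real processing
        with lock:
            active -= 1
        return request_id

    # Requests beyond max_concurrent wait in the pool's queue until a worker is free.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(handle, range(request_count)))
    return results, peak


if __name__ == "__main__":
    results, peak = run_with_limit(request_count=6, max_concurrent=2)
    print(len(results), "requests completed, peak concurrency", peak)
```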

  9. Upgrade the machine CPU

    Keep in mind that upgrading the CPU can incur additional middleware license costs.

  10. Add more machines and distribute the load of the transactions

    Keep in mind that adding machines can incur additional hardware and software license costs.

    Tip

    If the problem is resolved, the troubleshooting process is complete.

    If the problem is not resolved, ask the helpdesk to investigate, in accordance with your purchase agreement.