Hitachi

JP1 Version 13 JP1/Integrated Management 3 - Manager Overview and System Design Guide


1.4.3 Error investigation

JP1/IM - Manager simplifies the investigation of problems occurring in the system by integrating the diagnostic processing into a unified flow of operations based on the Central Console or Central Scope.

(1) Investigating issues based on Intelligent Integrated Management Base

When Intelligent Integrated Management Base is used as the starting point, the flow from problem investigation to corrective action is as follows.

(a) Checking the impact on services and operations

When a system failure occurs, the status of the corresponding node changes on the [Dashboard] screen. By checking the relevant system and the status of its related services and business operations, you can immediately determine whether the failure is having an impact.

Figure 1‒24: Checking the impact on services and operations on the [Dashboard] window

[Figure]

If the error affects an operation, check whether related operations are also affected.

Figure 1‒25: Checking the impact on operations in integrated operation viewer

[Figure]

(b) Checking the status of problems

You can check the status of the failed system in the event list of the integrated operation viewer or on the [Dashboard] tabbed page. In the event list, event guide information registered in advance, such as remedies and response procedures, is available as part of the event details, enabling a smoother and faster initial response to a problem.

Figure 1‒26: Checking Status in integrated operation viewer

[Figure]

(c) Investigating the cause and corrective action

After checking the details of the event and the performance status of the system, investigate the cause of each event. Examine the performance status of the systems related to the failed system and use it to infer the cause.

Figure 1‒27: Inferring the cause from the relevant system situation

[Figure]

Once the cause of a problem has been investigated and corrective actions have been established, you can register operation rules corresponding to the system status in advance. The corrective actions are then proposed when that status arises, and you can execute a suggested action simply by selecting it. This allows a reliable response that does not depend on the individual operator.
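As a rough illustration of this idea, the following Python sketch maps a system status to corrective actions registered in advance. It is not the product's rule format or API: `OperationRule`, `suggest_actions`, the status keys, and the sample rules are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class OperationRule:
    """Hypothetical rule: a condition on the system status plus a proposed action."""
    name: str
    condition: Callable[[Dict[str, Any]], bool]
    action: str  # description of the corrective action to propose


def suggest_actions(status: Dict[str, Any], rules: List[OperationRule]) -> List[str]:
    """Return the action of every rule whose condition matches the status."""
    return [rule.action for rule in rules if rule.condition(status)]


# Rules registered in advance (illustrative values only).
RULES = [
    OperationRule(
        name="disk-full",
        condition=lambda s: s.get("disk_used_pct", 0) >= 90,
        action="Delete old log files",
    ),
    OperationRule(
        name="service-down",
        condition=lambda s: s.get("service_state") == "stopped",
        action="Restart the service",
    ),
]

# A status observed on a hypothetical agent host.
status = {"host": "agent01", "disk_used_pct": 95, "service_state": "running"}
suggestions = suggest_actions(status, RULES)
print(suggestions)
```

Because the rules are data rather than operator knowledge, any operator who sees the same status is offered the same list of actions to choose from.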

In addition, you can launch the relevant management application directly from a displayed JP1 event, moving intuitively from the monitoring screen to the investigation screen to start the investigation. You can also run Windows and Linux commands on an agent (JP1/IM-Agent) directly from the integrated operation viewer. Because you can execute commands without connecting to the agent by telnet or other means, simple checks are easy to perform.

Figure 1‒28: Various operations from integrated operation viewer

[Figure]

(2) Error investigation with the Central Console

The following describes the diagnostic and troubleshooting processing when using the Central Console.

(a) Event details

First, check the details of the detected error event. If you register response methods and procedures in advance, the initial response will be smoother and faster.

Figure 1‒29: Troubleshooting advice (event guide information) provided in the Guide area

[Figure]

(b) Event search

For some problems, you might want to investigate not only the error-notification event but also related events leading up to the event in question, to see what was happening generally at the time the error occurred. In such cases, you can perform an event search.
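The idea behind such a search can be sketched in Python as a filter over a time window that ends at the error event. This is only a conceptual sketch: the `Event` record, the `search_related` function, and the 10-minute window are assumptions made for the example and do not reflect the JP1 event format or the search conditions offered by the product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Event:
    """Hypothetical, simplified event record (not the JP1 event format)."""
    time: datetime
    host: str
    severity: str
    message: str


def search_related(events: List[Event], error_event: Event,
                   window: timedelta = timedelta(minutes=10)) -> List[Event]:
    """Return events on the same host that occurred within `window`
    before the error event, oldest first. The error event itself is excluded."""
    start = error_event.time - window
    hits = [
        e for e in events
        if e is not error_event
        and e.host == error_event.host
        and start <= e.time <= error_event.time
    ]
    return sorted(hits, key=lambda e: e.time)


# Illustrative data only.
t0 = datetime(2024, 1, 1, 12, 0)
error = Event(t0, "hostA", "Error", "Write failed")
events = [
    Event(t0 - timedelta(minutes=30), "hostA", "Information", "Backup started"),
    Event(t0 - timedelta(minutes=5), "hostA", "Warning", "Disk usage above 80%"),
    Event(t0 - timedelta(minutes=2), "hostB", "Warning", "High CPU load"),
    error,
]

related = search_related(events, error)
for e in related:
    print(e.time, e.severity, e.message)
```

Widening the window or dropping the host condition trades precision for context, which mirrors the choice an operator makes when setting search conditions.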

(c) Event investigation

After verifying the general circumstances by checking the event details and conducting an event search, investigate each event.

From a displayed JP1 event, you can launch the appropriate management application and move by intuitive operation from the monitoring window to the investigation window to begin your diagnosis. You can also execute Windows and UNIX commands on an agent host directly from the Central Console. This makes it easy to perform simple checks or tests because you can execute commands without having to connect to the agent host by telnet or other means.

Figure 1‒30: Operations performed from JP1/IM - Manager

[Figure]

(3) Error investigation with the Central Scope

When investigating an error using the Central Scope, first identify the error source, and then link to the Central Console to investigate further.

The following describes the diagnostic and troubleshooting processing when using the Central Scope.

(a) Identifying the source and extent of an error

When an error occurs in the system, the icons representing the affected nodes change to error status in the Monitoring Tree window and Visual Monitoring window. From the upper level of the tree, check the monitoring nodes indicating error status, and identify which resources are likely to be affected by the error.

Figure 1‒31: Checking the affected resources

[Figure]
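The top-down check described above can be pictured as a walk through the tree that follows only the nodes in error status. The Python sketch below is a conceptual model, not the Central Scope's data structure: the `Node` class, the `error_paths` function, and the sample tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical monitoring node with a status and child nodes."""
    name: str
    status: str = "Normal"
    children: List["Node"] = field(default_factory=list)


def error_paths(node: Node, path: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Starting from the given node, follow nodes in Error status downward
    and return the path to each deepest node that is in Error status."""
    if node.status != "Error":
        return []
    here = path + (node.name,)
    below = [p for child in node.children for p in error_paths(child, here)]
    # If no child is in Error status, this node is the deepest error node.
    return below or [here]


# Illustrative tree: the error on db01 is reflected in its upper-level nodes.
tree = Node("System", "Error", [
    Node("Web", "Normal", [Node("web01"), Node("web02")]),
    Node("Database", "Error", [Node("db01", "Error"), Node("db02")]),
])

paths = error_paths(tree)
print(paths)
```

Each returned path names the chain of monitoring nodes from the top of the tree to the likely error source, and the nodes along the path indicate which resources may be affected.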

Guide information is a useful means of checking where a problem occurred. The guide function allows you to register operating know-how including troubleshooting procedures for specific problems, and examples of past situations in which certain errors have occurred. Although responding appropriately to whatever problems occur in a diverse range of resources is never easy, the guidance offered by the guide function goes some way toward reducing the system administrator's workload.

Important

Guide information must be registered before it can be viewed. For details about the guide function, see 5.8 Guide function in this manual and 6.6 Editing guide information in the JP1/Integrated Management 3 - Manager Configuration Guide.

Figure 1‒32: Troubleshooting advice provided by the guide function

[Figure]

(b) Identifying events that caused the error

After you have identified the node that is in error status, you can discover what event caused the problem.

Select the node that is in error status, and then click the Search Status-Change Events command. The Event Console window opens with the Search Events page displayed. This page lists the JP1 events that caused the node to change to error status.

Figure 1‒33: Identifying events that caused the error

[Figure]

(c) Investigating the error

After you have identified the node in which an error occurred, you can discover what event caused the problem. To locate the event, use the Central Console. By linking to the Central Console, you can investigate the nature of the error that triggered the JP1 event.