Hitachi

JP1 Version 12 JP1/Integrated Management 2 - Manager Overview and System Design Guide


12.1.4 Considerations for issuing correlation events

The number of JP1 events managed by JP1/IM - Manager can grow very large, depending on the size of the system. JP1 events are intended to capture each and every event occurring in the system; they therefore cover a wide range of event types.

By using the various filters provided by JP1/IM - Manager, you can restrict the types of JP1 events displayed in the event console. However, when an error occurs, the system might issue a large number of JP1 events reporting the problem, filling up the event console. Analyzing and investigating all of these JP1 events to identify the cause and remedy each problem would take the system administrator a great deal of time and effort.

In JP1/IM - Manager, you can associate a number of predictable JP1 events in advance, or optionally change the JP1 event attribute values, and thereby issue a new event (correlation event). A correlation event can be issued when a condition is satisfied, or when a condition fails to be satisfied. By utilizing correlation event generation, you can lessen your workload and reduce the time you spend troubleshooting problems.

Note that the processing by which correlation events are issued differs depending on whether you are using the integrated monitoring database, specifically in terms of the range of events that the correlation processing inherits. For details, see 4.3 Issue of correlation events.

Some points you need to consider when using correlation event generation are discussed below.

(1) Correlation event generation definition

A correlation event generation definition consists of correlation source events (event conditions), a timeout period, event correlation type, and the correlation event to be issued.

Give proper consideration to the following points when setting a correlation event generation definition:

Six examples are presented below to illustrate the points above. Refer to these examples when you consider how to set a correlation event generation definition:

Each of these six examples states the condition that needs to be satisfied, the reason, and the contents that you need to enter in the correlation event generation definition file.

For details about the correlation event generation definition file, see Correlation event generation definition file in Chapter 2. Definition Files in the manual JP1/Integrated Management 2 - Manager Command, Definition File and API Reference.

(a) Adding an attribute to the JP1 event attribute values

This example shows how to add an attribute value to the fixed attributes of a JP1 event, issued by another JP1 product or other program, to issue a correlation event.

Condition to be satisfied:

Report JP1 event (00004107), which indicates abnormal termination of a JP1/AJS job, as an event of Emergency level.

Set the correlation event for this example as follows:
  • Event ID: A01

  • Event level: Emergency

  • Message: Same message as the correlation source event (00004107).

Reason:

The event levels in this system are defined as in the following table, with Error level currently set for JP1 event (00004107) indicating abnormal termination of a JP1/AJS job.

Table 12‒1: Event level definitions in the system

Event level    System requirements
Emergency      Problem requiring immediate response
Error          Problem requiring response within one working day

Contents of the correlation event generation definition file:

The following figure shows the contents of the correlation event generation definition file.

Figure 12‒3: Contents of the correlation event generation definition file

[Figure]

Note: In this example, a line number is inserted at the beginning of each line to indicate the individual lines you need to write in the definition file.

To use the correlation event generation definition shown above, copy the following coding:

[Emergency_event]
CON=CID:1,B.ID==4107
SUCCESS_EVENT=B.ID:A01,E.SEVERITY:Emergency,B.MESSAGE:$EV1_B.MESSAGE

(b) Changing a JP1 event message to a more manageable message

This example shows how to change the message of a JP1 event, to issue a correlation event containing the new message.

Condition to be satisfied:

Change the message of a JP1 event to a message appropriate to the system requirements, keeping part of the original message in the new message.

Set the correlation event for this example as follows:
  • Event ID: A02

  • Event level: Same level as the correlation source event (00004107)

  • Message: Partly the same as the correlation source event, as shown in the table below.

    Table 12‒2: Message contents

    Event type                  Message contents
    Correlation source event    KAVS0265-E Job ended abnormally. (name: job-name: execution-ID, status: status, code: code, host: host-name, JOBID: job-number)
    Correlation event           Job:job-name ended abnormally with RC=code:Contact job supervisor (ext:xxxx)

    Note: The job-name and code values in the correlation event are inherited from the correlation source event.

Reason:

When the information needing to be managed comes at the end of a long JP1 event message, you have to scroll to see everything, which increases your workload.

Contents of the correlation event generation definition file:

The following figure shows the contents of the correlation event generation definition file.

Figure 12‒4: Contents of the correlation event generation definition file

[Figure]

Note: In this example, a line number is inserted at the beginning of each line to indicate the individual lines you need to write in the definition file. Line 3 spans two lines here, but write it as one line in the definition file.

To use the correlation event generation definition shown above, copy the following coding:

[Job_error]
CON=CID:1,B.ID==4107,B.MESSAGE*="KAVS0265-E.*\\((.*):.*\\).*code: (.*), host.*"
SUCCESS_EVENT=B.ID:A02,E.SEVERITY:$EV1_E.SEVERITY,B.MESSAGE:"Job:$EV1_ENV1 ended abnormally with RC=$EV1_ENV2:Contact job supervisor (ext:xxxx)"
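The way the two parenthesized groups in the regular expression feed the $EV1_ENV1 and $EV1_ENV2 variables can be illustrated outside JP1/IM - Manager. The following is a minimal Python sketch, not the product's implementation. It assumes that the doubled backslashes in the definition file are file-level escaping for a single backslash, and that Python's re module behaves like the product's regular expression engine for this pattern. The sample message text and the function name build_message are hypothetical.

```python
import re

# Python form of the pattern in the CON line. Assumption: "\\(" in the
# definition file corresponds to "\(" in the regular expression itself.
PATTERN = re.compile(r"KAVS0265-E.*\((.*):.*\).*code: (.*), host.*")

# Hypothetical message text, shaped so that the pattern matches.
SAMPLE = (
    "KAVS0265-E Job (job1:@A100) ended abnormally. "
    "(name: /group/net/job1, status: a, code: 4, host: hostA, JOBID: 123)"
)


def build_message(source_message):
    """Return the rewritten message, or None if the source does not match."""
    match = PATTERN.search(source_message)
    if match is None:
        return None
    env1 = match.group(1)  # stand-in for $EV1_ENV1 (first group)
    env2 = match.group(2)  # stand-in for $EV1_ENV2 (second group)
    return (
        f"Job:{env1} ended abnormally with RC={env2}:"
        "Contact job supervisor (ext:xxxx)"
    )


print(build_message(SAMPLE))
```

Trying a pattern against representative messages in this way before editing the definition file can help confirm that each group captures the intended part of the message.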

(c) Executing an automated action when hosts A, B, and C have all started

This example shows how to associate multiple JP1 events to issue a correlation event. The procedure for defining an automated action is not covered here. For details about defining automated actions, see 6.3 Defining an automated action.

Condition to be satisfied:

Execute an automated action (for system maintenance purposes) when hosts A, B, and C have all started normally.

Assume that the following JP1 event is issued when host A, B, or C starts normally.
  • Event ID: 100

  • Event level: Information

  • Message: host started.

    The variable value (host) in the message is replaced with the host name (A, B, or C).

  • Extended attribute (E.HOST): Replaced with the name of the host that has started (A, B, or C).

Set the correlation event for this example as follows:
  • Event ID: A03

  • Event level: Information

  • Message: All hosts started normally. Host names: A B C

Set a timeout period of 10 minutes, within which the JP1 event indicating normal startup must be issued from each of the three hosts.

Reason:

The definitions would be complex if you tried to set an automated action for the three JP1 events reporting host startup. Setting an automated action for one correlation event is easier.

Contents of the correlation event generation definition file:

The following figure shows the contents of the correlation event generation definition file.

Figure 12‒5: Contents of the correlation event generation definition file

[Figure]

Note: In this example, a line number is inserted at the beginning of each line to indicate the individual lines you need to write in the definition file. Line 6 spans two lines here, but write it as one line in the definition file.

To use the correlation event generation definition shown above, copy the following coding:

[Start_notification]
CON=CID:10,B.ID==100,B.MESSAGE==A started.
CON=CID:20,B.ID==100,B.MESSAGE==B started.
CON=CID:30,B.ID==100,B.MESSAGE==C started.
TIMEOUT=600
SUCCESS_EVENT=B.ID:A03,E.SEVERITY:Information,B.MESSAGE:"All hosts started normally. Host names:$EV10_E.HOST $EV20_E.HOST $EV30_E.HOST"
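The intent of this definition (three conditions that must all be satisfied within the timeout period) can be pictured with the following Python sketch. It is a simplified model for illustration only and does not reproduce how JP1/IM - Manager processes events. The tuple layout, the function name correlate, and the choice to start the timeout at the first matching event are assumptions.

```python
TIMEOUT = 600  # seconds, as in TIMEOUT=600

# Condition ID -> message that satisfies it (mirrors the three CON lines).
CONDITIONS = {10: "A started.", 20: "B started.", 30: "C started."}


def correlate(events):
    """events: list of (time_sec, event_id, message, host), in time order.

    Returns the correlation event message once all three conditions are
    satisfied within TIMEOUT seconds of the first matching event, or None.
    """
    matched = {}    # condition ID -> E.HOST of the event that satisfied it
    started = None  # time of the first matching event (assumed timer start)
    for time_sec, event_id, message, host in events:
        if started is not None and time_sec - started > TIMEOUT:
            matched, started = {}, None  # timed out: discard the partial match
        for cid, wanted in CONDITIONS.items():
            if event_id == "100" and message == wanted and cid not in matched:
                if started is None:
                    started = time_sec
                matched[cid] = host
                break
        if len(matched) == len(CONDITIONS):
            hosts = " ".join(matched[cid] for cid in sorted(CONDITIONS))
            return f"All hosts started normally. Host names:{hosts}"
    return None


in_time = [
    (0, "100", "A started.", "A"),
    (120, "100", "C started.", "C"),
    (300, "100", "B started.", "B"),
]
print(correlate(in_time))
```

In this model the hosts can start in any order; the message lists them in condition ID order because the SUCCESS_EVENT line references $EV10, $EV20, and $EV30 in that order.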

(d) Issuing correlation events for a JP1 event issued from specific hosts

This example shows how to issue correlation events when a JP1 event is issued from specific hosts in the system configuration shown below.

Figure 12‒6: Issuing correlation events targeting specific hosts

[Figure]

Condition to be satisfied:

Apply the following requirement (same as in example (a) above) to host1, host2, and host3 only:

Report JP1 event (00004107), which indicates abnormal termination of a JP1/AJS job, as an event of Emergency level.

Set the correlation event for this example as follows:
  • Event ID: A01

  • Event level: Emergency

  • Message: Same message as the correlation source event (00004107).

Reason:

Several hosts executing JP1/AJS jobs are being monitored, but you want to change the event level of a JP1 event issued only from specific hosts that are executing mission-critical jobs.

Contents of the correlation event generation definition file:

The following figure shows the contents of the correlation event generation definition file.

Figure 12‒7: Contents of the correlation event generation definition file

[Figure]

Note: In this example, a line number is inserted at the beginning of each line to indicate the individual lines you need to write in the definition file.

To use the correlation event generation definition shown above, copy the following coding:

[Emergency_event]
TARGET=B.SOURCESERVER==host1;host2;host3
CON=CID:1,B.ID==4107
SUCCESS_EVENT=B.ID:A01,E.SEVERITY:Emergency,B.MESSAGE:$EV1_B.MESSAGE
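The effect of the TARGET line can be thought of as a filter that is applied before the event condition is evaluated. The following Python sketch models that idea under stated assumptions: the dictionary keys and the function name to_correlation_event are hypothetical, and the sketch is not a description of the product's internal processing.

```python
# Mirrors TARGET=B.SOURCESERVER==host1;host2;host3
TARGET_HOSTS = {"host1", "host2", "host3"}


def to_correlation_event(event):
    """event: dict with hypothetical keys 'id', 'source_server', 'message'.

    Returns the correlation event as a dict, or None if none is issued.
    """
    if event["source_server"] not in TARGET_HOSTS:
        return None  # outside TARGET: the condition is not evaluated
    if event["id"] != "4107":
        return None  # CON not satisfied
    # The message is inherited from the source event ($EV1_B.MESSAGE).
    return {"id": "A01", "severity": "Emergency", "message": event["message"]}


job_error = {"id": "4107", "source_server": "host2", "message": "Job ended abnormally."}
print(to_correlation_event(job_error))
```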

(e) Managing JP1 events indicating an authentication error by source server

This example shows how to issue a correlation event for each server from which a JP1 event (00003A71) indicating an authentication error was issued multiple times, as shown in the figure below.

Figure 12‒8: Issuing correlation events by grouping JP1 events by source server

[Figure]

00003A71 is the ID of a JP1 event issued by the Windows event log trapping function of JP1/Base. The procedure for setting this function is not covered here. For details, see the description of converting the Windows event log in the chapter on setting the event converters in the JP1/Base User's Guide.

Condition to be satisfied:

Issue a correlation event whenever a JP1 event (00003A71) indicating an authentication error is issued five times from the same server.

Reason:

User authentication is used to restrict connection to specific servers, and a correlation event is issued by associating JP1 events that indicate an authentication error. Authentication is required for a number of hosts, and you want to manage this correlation event for each individual host.

Contents of the correlation event generation definition file:

The following figure shows the contents of the correlation event generation definition file.

Figure 12‒9: Contents of the correlation event generation definition file

[Figure]

Note: In this example, a line number is inserted at the beginning of each line to indicate the individual lines you need to write in the definition file.

To use the correlation event generation definition shown above, copy the following coding:

[Access_error]
CON=CID:1, B.ID==3A71, B.MESSAGE>=User authentication failed.
TYPE=threshold:5
SAME_ATTRIBUTE=B.SOURCESERVER
SUCCESS_EVENT=B.ID:A00,E.SEVERITY:Error,B.MESSAGE:Authentication errors occurred on $EV1_B.SOURCESERVER.
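The combination of TYPE=threshold:5 and SAME_ATTRIBUTE=B.SOURCESERVER amounts to keeping a separate count for each source server. The following Python sketch illustrates that grouping. It is a simplified model, not the product's implementation: it ignores the timeout period, treats the >= operator as a "contains" test (an assumption), and uses a hypothetical tuple layout and function name.

```python
from collections import defaultdict

THRESHOLD = 5  # as in TYPE=threshold:5


def correlate(events):
    """events: iterable of (event_id, source_server, message).

    Returns the messages of the correlation events issued, in order.
    """
    counts = defaultdict(int)  # one counter per B.SOURCESERVER value
    issued = []
    for event_id, server, message in events:
        # Assumption: ">=" in the CON line is modeled as a substring test.
        if event_id != "3A71" or "User authentication failed." not in message:
            continue
        counts[server] += 1
        if counts[server] == THRESHOLD:
            issued.append(f"Authentication errors occurred on {server}.")
            counts[server] = 0  # start counting again for this server
    return issued


failure = "User authentication failed."
mixed = [("3A71", "srvA", failure), ("3A71", "srvB", failure)] * 4
mixed.append(("3A71", "srvA", failure))
print(correlate(mixed))
```

Here srvA reaches five matching events and srvB only four, so a correlation event is issued for srvA alone.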

(f) Monitoring for a situation where an event does not occur within a specified time period

This example shows how to issue a correlation event when a particular event has not occurred within a specified time period, as shown by the figure below.

Figure 12‒10: Monitoring when an event has not occurred within a specified time period

[Figure]

Condition to be satisfied:

Suppose that a warning event A is issued, indicating that a server has stopped, followed some time later by an information event B indicating that the server has started. If A is detected but B does not follow within the specified timeout period, a warning event C is to be issued.

Reason:

You want to monitor for a situation where a particular event has not occurred within a specified period of time, so that you can investigate the cause of the problem.

Contents of the correlation event generation definition file:

The following figure shows the contents of the correlation event generation definition file in this example.

Figure 12‒11: Contents of the correlation event generation definition file

[Figure]

Note: The line number inserted at the beginning of each line indicates the individual lines you need to write in the definition file.

To use the correlation event generation definition shown above, copy the following coding:

[correlation1]
TIMEOUT=180
CON=CID:1,B.ID==A
CON=CID:2,B.ID==B
SAME_ATTRIBUTE=B.SOURCESERVER
FAIL_EVENT=B.ID:C,E.SEVERITY:Warning,B.MESSAGE:Server $EV1_B.SOURCESERVER has not recovered.
TYPE=sequence
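The behavior this definition is meant to achieve (event A must be followed by event B from the same server within 180 seconds, otherwise event C is issued) can be modeled with the following Python sketch. It is an illustration under stated assumptions rather than the product's algorithm: the timeout is assumed to start when event A is detected, and the tuple layout, the now argument, and the function name find_unrecovered are hypothetical.

```python
TIMEOUT = 180  # seconds, as in TIMEOUT=180


def find_unrecovered(events, now):
    """events: list of (time_sec, event_id, source_server), in time order.
    now: time at which the check is made.

    Returns the FAIL_EVENT messages for servers where event A was not
    followed by event B within TIMEOUT seconds.
    """
    pending = {}  # source server -> time at which event A was detected
    failed = []

    def expire(current):
        # Issue a failure for every server whose wait has timed out.
        for server, detected in list(pending.items()):
            if current - detected > TIMEOUT:
                failed.append(f"Server {server} has not recovered.")
                del pending[server]

    for time_sec, event_id, server in events:
        expire(time_sec)
        if event_id == "A" and server not in pending:
            pending[server] = time_sec  # start waiting for B
        elif event_id == "B" and server in pending:
            del pending[server]         # sequence completed in time
    expire(now)
    return failed


sample = [(0, "A", "s1"), (10, "A", "s2"), (100, "B", "s1")]
print(find_unrecovered(sample, now=300))
```

In the sample, s1 recovers after 100 seconds and produces nothing, while s2 never reports event B and is flagged once the timeout has elapsed.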

(2) Operating environment required for correlation event generation

The following describes the operating environment required for issuing correlation events.

Memory and disk space requirements for correlation event issue

To issue correlation events, the following process of JP1/IM - Manager must be active:

  • When not using the integrated monitoring database:

    Event generation service (evgen)

  • When using the integrated monitoring database:

    Event base service (evflow)

Estimate in advance the extra memory requirements for starting the relevant process.

Correlation event generation history files are added periodically and make demands on disk space. Allocate sufficient resources for the estimated disk space requirements. You can change the number and size of the correlation event generation history files by adjusting a parameter in the correlation event generation environment definition file. For details, see Correlation event generation environment definition file in Chapter 2. Definition Files in the manual JP1/Integrated Management 2 - Manager Command, Definition File and API Reference.

For details on estimating memory and disk space requirements, see the Release Notes for JP1/IM - Manager.

Designing the JP1/IM and JP1/Base filters

Bear in mind the following two points when setting the JP1/IM and JP1/Base filters:

  • Filtering of correlation source events

    The JP1 events that you want to use as correlation source events must be distributed to the event generation service. To this end, set the JP1/Base forwarding filter and the JP1/IM event acquisition filter so that the source events will pass through.

    The JP1/IM severe events filter, event receiver filters, and view filter can be optionally set. Set these filters depending on whether you need to monitor correlation source events.

  • Filtering of correlation events

    Correlation events must be monitored from JP1/IM - View. As a general rule, set the event acquisition filter and other JP1/IM filters so that correlation events will pass through.

    Filtering can be used when you want to issue correlation events for a purpose other than monitoring, such as to trigger an automated action or to effect a status change in a monitoring node. In this case also, make sure that you set the event acquisition filter so as to allow the correlation events to pass through.

For considerations related to setting the JP1/IM filters, see 12.1.3 Considerations for filtering JP1 events. For considerations on setting the JP1/Base forwarding filter, see the description of JP1 event forwarding in the chapter on setting the event service in the JP1/Base User's Guide.

(3) Notes on correlation event generation

Note the following points regarding correlation event generation: