14.1.4 Considerations for issuing correlation events
The number of JP1 events managed by JP1/IM - Manager can grow very large, depending on the size of the system. JP1 events are intended to cover each and every event occurring in the system, so they span a wide range of event types.
By using the various filters provided by JP1/IM - Manager, you can restrict the types of JP1 events displayed in the event console. However, when an error occurs, the system might issue a large number of JP1 events reporting the problem and filling up the event console. It would take the system administrator a great deal of time and trouble to analyze and investigate these JP1 events, to identify the cause and remedy every problem.
In JP1/IM - Manager, you can associate a number of predictable JP1 events in advance, or optionally change JP1 event attribute values, and thereby issue a new event (a correlation event). A correlation event can be issued when a condition is satisfied, or when a condition fails to be satisfied. By using correlation event generation, you can lessen your workload and reduce the time you spend troubleshooting problems.
Note that the processing by which correlation events are issued differs depending on whether you are using the integrated monitoring database, specifically in terms of the range of events that the correlation processing inherits. For details, see 4.3 Issue of correlation events.
Some points you need to consider when using correlation event generation are discussed below under the following headings:
- Correlation event generation definition
- Operating environment required for correlation event generation
- Notes on correlation event generation
(1) Correlation event generation definition
A correlation event generation definition consists of correlation source events (event conditions), a timeout period, event correlation type, and the correlation event to be issued.
Give proper consideration to the following points when setting a correlation event generation definition:
- Filtering condition for the correlation target range
  Are the JP1 events that match the event conditions issued from specific hosts only?
- Correlation source events (event conditions)
  - Which JP1 events will be correlation source events?
  - Will you need one correlation source event or more than one?
- Timeout period
- Event correlation type (sequence, combination, or threshold)
- Duplicate attribute value condition
  Will you need to manage correlation events by grouping hosts or users?
- Maximum correlation number
- Correlation event to be issued
Six examples are presented below to illustrate the points above. Refer to these examples when you consider how to set a correlation event generation definition:
- Adding an attribute to the JP1 event attribute values
- Changing a JP1 event message to a more manageable message
- Executing an automated action when hosts A, B, and C have all started
- Issuing correlation events for a JP1 event issued from specific hosts
- Managing JP1 events indicating an authentication error by source server
- Monitoring for a situation where an event does not occur within a specified time period
Each of these six examples states the condition that needs to be satisfied, the reason, and the contents that you need to enter in the correlation event generation definition file.
For details about the correlation event generation definition file, see Correlation event generation definition file in Chapter 2. Definition Files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
(a) Adding an attribute to the JP1 event attribute values
This example shows how to add an attribute value to the fixed attributes of a JP1 event, issued by another JP1 product or other program, to issue a correlation event.
- Condition to be satisfied:
  Report JP1 event (00004107), which indicates abnormal termination of a JP1/AJS job, as an event of Emergency level.
  Set the correlation event for this example as follows:
  - Event ID: A01
  - Event level: Emergency
  - Message: Same message as the correlation source event (00004107).
- Reason:
  The event levels in this system are defined as in the following table, with Error level currently set for JP1 event (00004107) indicating abnormal termination of a JP1/AJS job.
  Table 14‒1: Event level definitions in the system
  Event level | System requirements
  Emergency | Problem requiring immediate response
  Error | Problem requiring response within one working day
- Contents of the correlation event generation definition file:
  The following figure shows the contents of the correlation event generation definition file.
  Figure 14‒3: Contents of the correlation event generation definition file
  Note: In this example, a line number is inserted at the beginning of each line to indicate the individual lines you need to write in the definition file.
To use the correlation event generation definition shown above, copy the following code:
[Emergency_event]
CON=CID:1,B.ID==4107
SUCCESS_EVENT=B.ID:A01,E.SEVERITY:Emergency,B.MESSAGE:$EV1_B.MESSAGE
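The effect of this definition can be pictured with a small Python model. This is only an illustrative sketch of the mapping from source event to correlation event, not how JP1/IM - Manager is implemented; the Event class and the correlate function are hypothetical names introduced for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a JP1 event (only three attributes modeled)."""
    event_id: str   # corresponds to B.ID
    severity: str   # corresponds to E.SEVERITY
    message: str    # corresponds to B.MESSAGE


def correlate(source: Event) -> Optional[Event]:
    """Return the correlation event for a matching source event, else None."""
    # Models CON=CID:1,B.ID==4107
    if source.event_id != "4107":
        return None
    # Models SUCCESS_EVENT=B.ID:A01,E.SEVERITY:Emergency,B.MESSAGE:$EV1_B.MESSAGE
    return Event(event_id="A01", severity="Emergency", message=source.message)


if __name__ == "__main__":
    src = Event("4107", "Error", "KAVS0265-E Job ended abnormally.")
    print(correlate(src))
```

The source event itself is unchanged; the model only produces an additional event that carries the new ID and level together with the original message.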
(b) Changing a JP1 event message to a more manageable message
This example shows how to change the message of a JP1 event, to issue a correlation event containing the new message.
- Condition to be satisfied:
  Change the message of a JP1 event to a message appropriate to the system requirements, keeping part of the original message in the new message.
  Set the correlation event for this example as follows:
  - Event ID: A02
  - Event level: Same level as the correlation source event (00004107)
  - Message: Partly the same as the correlation source event, as shown in the table below.
  Table 14‒2: Message contents
  Event type | Message contents
  Correlation source event | KAVS0265-E Job ended abnormally. (name: job-name: execution-ID, status: status, code: code, host: host-name, JOBID: job-number)
  Correlation event | Job:job-name ended abnormally with RC=code:Contact job supervisor (ext:xxxx)
  Legend:
  job-name and code: Parts whose value is inherited from the correlation source event.
- Reason:
  When the information that needs to be managed comes at the end of a long JP1 event message, you have to scroll to see everything, which increases your workload.
- Contents of the correlation event generation definition file:
  The following figure shows the contents of the correlation event generation definition file.
  Figure 14‒4: Contents of the correlation event generation definition file
  Note: In this example, a line number is inserted at the beginning of each line to indicate the individual lines you need to write in the definition file. Line 3 spans two lines here, but write it as one line in the definition file.
To use the correlation event generation definition shown above, copy the following code:
[Job_error]
CON=CID:1,B.ID==4107,B.MESSAGE*="KAVS0265-E.*\\((.*):.*\\).*code: (.*), host.*"
SUCCESS_EVENT=B.ID:A02,E.SEVERITY:$EV1_E.SEVERITY,B.MESSAGE:"Job:$EV1_ENV1 ended abnormally with RC=$EV1_ENV2:Contact job supervisor (ext:xxxx)"
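The idea of carrying captured parts of the source message into the new message can be sketched in Python. The sketch assumes that $EV1_ENV1 and $EV1_ENV2 correspond to the first and second parenthesized groups of the regular expression in event condition 1. The sample message, the simplified pattern, and the rewrite_message function are assumptions made for the illustration; they are not the pattern from the definition above, and Python's regular expression syntax is not necessarily the syntax that JP1/IM - Manager uses.

```python
import re
from typing import Optional

# Simplified, hypothetical pattern: group 1 captures the job name,
# group 2 captures the return code.
PATTERN = re.compile(r"KAVS0265-E.*\(name: (.*?):.*code: (.*?), host.*")


def rewrite_message(message: str) -> Optional[str]:
    """Build the shortened message, or return None if the pattern does not match."""
    match = PATTERN.fullmatch(message)
    if match is None:
        return None
    job_name, code = match.group(1), match.group(2)
    return (f"Job:{job_name} ended abnormally with RC={code}:"
            "Contact job supervisor (ext:xxxx)")


if __name__ == "__main__":
    # Hypothetical sample message in the shape shown in Table 14-2.
    sample = ("KAVS0265-E Job ended abnormally. (name: job1: @A100, "
              "status: a, code: 8, host: hostA, JOBID: 42)")
    print(rewrite_message(sample))
```

Before relying on a pattern of this kind, it is worth testing it against real messages from your own system, because the message layout determines which parts each group captures.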
(c) Executing an automated action when hosts A, B, and C have all started
This example shows how to associate multiple JP1 events to issue a correlation event. The procedure for defining an automated action is not covered here. For details about defining automated actions, see 6.3 Defining an automated action.
- Condition to be satisfied:
  Execute an automated action (for system maintenance purposes) when hosts A, B, and C have all started normally.
  Assume that the following JP1 event is issued when host A, B, or C starts normally.
  - Event ID: 100
  - Event level: Information
  - Message: host started.
    The variable value (host) in the message is replaced with the host name (A, B, or C).
  - Extended attribute (E.HOST): Replaced with the name of the host that has started (A, B, or C).
  Set the correlation event for this example as follows:
  - Event ID: A03
  - Event level: Information
  - Message: All hosts started normally. Host names: A B C
  A timeout period of 10 minutes is set for a JP1 event indicating normal startup to be issued from each of the three hosts.
- Reason:
  Defining an automated action directly on the three separate JP1 events that report host startup would be complex. Setting an automated action for one correlation event is easier.
- Contents of the correlation event generation definition file:
  The following figure shows the contents of the correlation event generation definition file.
  Figure 14‒5: Contents of the correlation event generation definition file
  Note: In this example, a line number is inserted at the beginning of each line to indicate the individual lines you need to write in the definition file. Line 6 spans two lines here, but write it as one line in the definition file.
To use the correlation event generation definition shown above, copy the following code:
[Start_notification]
CON=CID:10,B.ID==100,B.MESSAGE==A started.
CON=CID:20,B.ID==100,B.MESSAGE==B started.
CON=CID:30,B.ID==100,B.MESSAGE==C started.
TIMEOUT=600
SUCCESS_EVENT=B.ID:A03,E.SEVERITY:Information,B.MESSAGE:"All hosts started normally. Host names:$EV10_E.HOST $EV20_E.HOST $EV30_E.HOST"
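The behavior intended by this definition can be modeled roughly in Python. The sketch makes two assumptions that are not confirmed by the text above: that the three startup events may arrive in any order, and that the timeout period is counted from the arrival of the first matching event. The function name correlate_startups and the (timestamp, host) tuple format are hypothetical.

```python
TIMEOUT = 600                 # seconds, as in TIMEOUT=600
REQUIRED = ("A", "B", "C")    # hosts named in the three event conditions


def correlate_startups(events):
    """Return the correlation message once A, B, and C have all started.

    events: iterable of (timestamp_in_seconds, host) in arrival order.
    Returns None if the hosts do not all report within the timeout period.
    """
    seen = set()
    window_start = None
    for timestamp, host in events:
        if host not in REQUIRED:
            continue                      # not a correlation source event
        if window_start is None:
            window_start = timestamp      # first matching event opens the window
        if timestamp - window_start > TIMEOUT:
            return None                   # timed out before all hosts reported
        seen.add(host)
        if len(seen) == len(REQUIRED):
            return "All hosts started normally. Host names:" + " ".join(REQUIRED)
    return None


if __name__ == "__main__":
    print(correlate_startups([(0, "B"), (100, "A"), (599, "C")]))
    print(correlate_startups([(0, "A"), (10, "B"), (700, "C")]))
```

An automated action then needs to react only to the single resulting event (A03 in the definition) rather than to three separate startup events.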
(d) Issuing correlation events for a JP1 event issued from specific hosts
This example shows how to issue correlation events when a JP1 event is issued from specific hosts in the system configuration shown below.
- Condition to be satisfied:
  Apply the following requirement (same as in example (a) above) to host1, host2, and host3 only:
  Report JP1 event (00004107), which indicates abnormal termination of a JP1/AJS job, as an event of Emergency level.
  Set the correlation event for this example as follows:
  - Event ID: A01
  - Event level: Emergency
  - Message: Same message as the correlation source event (00004107).
- Reason:
  Several hosts executing JP1/AJS jobs are being monitored, but you want to change the event level only for JP1 events issued from the specific hosts that execute mission-critical jobs.
- Contents of the correlation event generation definition file:
  The following figure shows the contents of the correlation event generation definition file.
  Figure 14‒7: Contents of the correlation event generation definition file
  Note: In this example, a line number is inserted at the beginning of each line to indicate the individual lines you need to write in the definition file.
To use the correlation event generation definition shown above, copy the following code:
[Emergency_event]
TARGET=B.SOURCESERVER==host1;host2;host3
CON=CID:1,B.ID==4107
SUCCESS_EVENT=B.ID:A01,E.SEVERITY:Emergency,B.MESSAGE:$EV1_B.MESSAGE
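The role of the filtering condition for the correlation target range can be sketched as a check that runs before the event condition is evaluated. This Python model is illustrative only; the correlate function and the dictionary it returns are hypothetical, and the sketch assumes exact, case-sensitive matching of host names.

```python
# Models TARGET=B.SOURCESERVER==host1;host2;host3 (semicolon-separated values)
TARGET_HOSTS = frozenset("host1;host2;host3".split(";"))


def correlate(event_id, source_server, message):
    """Return the attributes of the correlation event, or None if nothing is issued."""
    # Events from hosts outside the target range are never examined further.
    if source_server not in TARGET_HOSTS:
        return None
    # Models CON=CID:1,B.ID==4107
    if event_id != "4107":
        return None
    return {"B.ID": "A01", "E.SEVERITY": "Emergency", "B.MESSAGE": message}


if __name__ == "__main__":
    print(correlate("4107", "host2", "KAVS0265-E Job ended abnormally."))
    print(correlate("4107", "host9", "KAVS0265-E Job ended abnormally."))
```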
(e) Managing JP1 events indicating an authentication error by source server
This example shows how to issue a correlation event for each server from which a JP1 event (00003A71) indicating an authentication error was issued multiple times, as shown in the figure below.
00003A71 is the ID of a JP1 event issued by the Windows event log trapping function of JP1/Base. The procedure for setting this function is not covered here. For details, see the description of converting the Windows event log in the chapter on setting the event converters in the JP1/Base User's Guide.
- Condition to be satisfied:
  Issue a correlation event whenever a JP1 event (00003A71) indicating an authentication error is issued five times from the same server.
- Reason:
  User authentication is used to restrict connection to specific servers, and a correlation event is issued by associating JP1 events that indicate an authentication error. Authentication is required for a number of hosts, and you want to manage this correlation event for each individual host.
- Contents of the correlation event generation definition file:
  The following figure shows the contents of the correlation event generation definition file.
  Figure 14‒9: Contents of the correlation event generation definition file
  Note: In this example, a line number is inserted at the beginning of each line to indicate the individual lines you need to write in the definition file.
To use the correlation event generation definition shown above, copy the following code:
[Access_error]
CON=CID:1, B.ID==3A71, B.MESSAGE>=User authentication failed.
TYPE=threshold:5
SAME_ATTRIBUTE=B.SOURCESERVER
SUCCESS_EVENT=B.ID:A00,E.SEVERITY:Error,B.MESSAGE:Authentication errors occurred on $EV1_B.SOURCESERVER.
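The combination of a threshold and a duplicate attribute value condition amounts to keeping one counter per source server. The following Python sketch models that idea under simplifying assumptions: timeout handling is omitted, and the counter for a server is assumed to start again from zero after a correlation event is issued. The function name authentication_alerts and the tuple format are hypothetical.

```python
from collections import defaultdict

THRESHOLD = 5  # as in TYPE=threshold:5


def authentication_alerts(events):
    """Return one message per group of THRESHOLD matching events from the same server.

    events: iterable of (event_id, source_server, message) in arrival order.
    """
    counts = defaultdict(int)   # one counter per B.SOURCESERVER value
    alerts = []
    for event_id, server, message in events:
        # Models CON=CID:1, B.ID==3A71, B.MESSAGE>=User authentication failed.
        if event_id != "3A71" or "User authentication failed." not in message:
            continue
        counts[server] += 1
        if counts[server] == THRESHOLD:
            alerts.append(f"Authentication errors occurred on {server}.")
            counts[server] = 0  # assumed: a new set starts for this server
    return alerts


if __name__ == "__main__":
    sample = [("3A71", "srv1", "User authentication failed.")] * 5
    print(authentication_alerts(sample))
```

Without the per-server grouping, five failures spread across five different servers would be counted together, which is not the condition this example is meant to detect.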
(f) Monitoring for a situation where an event does not occur within a specified time period
This example shows how to issue a correlation event when a particular event has not occurred within a specified time period, as shown by the figure below.
- Condition to be satisfied:
  Suppose that a warning event A is issued, indicating that a server has stopped, followed some time later by an information event B indicating that the server has started. If A and B are not both detected within the specified timeout period (that is, A is detected but B does not follow in time), a warning event C is to be issued.
- Reason:
  You want to monitor for a situation where a particular event has not occurred within a specified period of time, so that you can investigate the cause of the problem.
- Contents of the correlation event generation definition file:
  The following figure shows the contents of the correlation event generation definition file in this example.
  Figure 14‒11: Contents of the correlation event generation definition file
  Note: The line number inserted at the beginning of each line indicates the individual lines you need to write in the definition file.
To use the correlation event generation definition shown above, copy the following code:
[correlation1]
TIMEOUT=180
CON=CID:1,B.ID==A
CON=CID:2,B.ID==B
SAME_ATTRIBUTE=B.SOURCESERVER
FAIL_EVENT=B.ID:C,E.SEVERITY:Warning,B.MESSAGE:Server $EV1_B.SOURCESERVER has not recovered.
TYPE=sequence
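The failure case in this example can be modeled in Python as a per-server timer that starts when event A arrives and is cleared if event B follows in time. This is a simplified sketch, assuming the timeout period is measured from the arrival of A; the function name unrecovered_servers, the tuple format, and the now parameter are hypothetical.

```python
TIMEOUT = 180  # seconds, as in TIMEOUT=180


def unrecovered_servers(events, now):
    """Return a failure message for each server whose event B did not follow A in time.

    events: list of (timestamp_in_seconds, event_id, source_server) in arrival order.
    now: the time at which the check is made, in the same units.
    """
    waiting = {}  # source server -> time at which event A arrived
    for timestamp, event_id, server in events:
        if event_id == "A":
            waiting.setdefault(server, timestamp)
        elif event_id == "B" and server in waiting:
            if timestamp - waiting[server] <= TIMEOUT:
                del waiting[server]   # sequence completed in time: no failure
    return [
        f"Server {server} has not recovered."
        for server, started in waiting.items()
        if now - started > TIMEOUT
    ]


if __name__ == "__main__":
    log = [(0, "A", "web1"), (10, "A", "db1"), (60, "B", "web1")]
    print(unrecovered_servers(log, now=300))
```

Grouping by source server matters here for the same reason as in example (e): a start event from one server must not cancel the timer of a different server.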
(2) Operating environment required for correlation event generation
The following describes the operating environment required for issuing correlation events.
- Memory and disk space requirements for correlation event issue
  To issue correlation events, the following process of JP1/IM - Manager must be active:
  - When not using the integrated monitoring database:
    Event generation service (evgen)
  - When using the integrated monitoring database:
    Event base service (evflow)
  Estimate in advance the extra memory requirements for starting the relevant process.
  Correlation event generation history files are added periodically and make demands on disk space. Allocate sufficient resources for the estimated disk space requirements. You can change the number and size of the correlation event generation history files by adjusting a parameter in the correlation event generation environment definition file. For details, see Correlation event generation environment definition file in Chapter 2. Definition Files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
  For details on estimating memory and disk space requirements, see the Release Notes for JP1/IM - Manager.
- Designing the JP1/IM and JP1/Base filters
  Bear in mind the following two points when setting the JP1/IM and JP1/Base filters:
  - Filtering of correlation source events
    The JP1 events that you want to use as correlation source events must be distributed to the event generation service. To this end, set the JP1/Base forwarding filter and the JP1/IM event acquisition filter so that the source events will pass through.
    The JP1/IM severe events filter, event receiver filters, and view filter can be optionally set. Set these filters depending on whether you need to monitor correlation source events.
  - Filtering of correlation events
    Correlation events must be monitored from JP1/IM - View. As a general rule, set the event acquisition filter and other JP1/IM filters so that correlation events will pass through.
    Filtering can be used when you want to issue correlation events for a purpose other than monitoring, such as to trigger an automated action or to effect a status change in a monitoring node. In this case also, make sure that you set the event acquisition filter so as to allow the correlation events to pass through.
  For considerations related to setting the JP1/IM filters, see 14.1.3 Considerations for filtering JP1 events. For considerations on setting the JP1/Base forwarding filter, see the description of JP1 event forwarding in the chapter on setting the event service in the JP1/Base User's Guide.
(3) Notes on correlation event generation
Note the following points regarding correlation event generation:
- You cannot make an issued correlation event subject to any further correlation processing.
  If you register a correlation event as a source event in a correlation event generation definition, the setting will be ignored.
- After editing a correlation event generation definition file, always check its contents by executing the jcoegscheck command. This lets you find and remove invalid or redundant conditions, which are reported as definition errors.
  For details about the jcoegscheck command, see jcoegscheck in Chapter 1. Commands in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
  The event generation service can still operate when a generation definition contains invalid settings, but any invalid parts in the edited file will be ignored.
- If a filtering condition for the correlation target range, or a duplicate attribute value condition, specifies the same attribute as an event condition, you might end up with invalid conditions that can never be satisfied. The jcoegscheck command does not catch such problems.
  When specifying a filtering condition for the correlation target range or a duplicate attribute value condition, take care that it does not contradict the event conditions.
Two examples of invalid conditions are discussed below. The first is an example of specifying a filtering condition for the correlation target range.
Figure 14‒12: Invalid conditions: Example 1 (filtering condition for the correlation target range)
This example is explained below, following the line numbers.
Line 2 declares a filtering condition for the correlation target range, and specifies as correlation targets all JP1 events whose source server name contains host. As a result, a JP1 event whose source server name is HOST_A, specified in the event condition at line 4, will not be correlated.
Because JP1 events that satisfy the event condition at line 4 are not processed, the correlation event generation condition fails and no correlation event is issued.
The next example is a duplicate attribute value condition.
Figure 14‒13: Invalid conditions: Example 2 (duplicate attribute value condition)
This example is explained below, following the line numbers.
The event condition at line 2 correlates messages that begin with ERROR=, and the event condition at line 3 correlates messages that begin with ACCESS ERROR. The duplicate attribute value condition at line 4 groups JP1 events that have identical messages.
Because the event conditions at line 2 and line 3 target JP1 events that have different messages, the same message requirement of the duplicate attribute value condition cannot be satisfied, and correlation events will never be issued.
However, JP1 events that match the event condition at line 2 or line 3 are still processed, which starts correlation processing that can never succeed. Suppose that JP1 events are issued in the following order:
  - JP1 event (event ID: 00000999; message: ERROR=100) is issued.
    This event satisfies the event condition at line 2, so correlation processing begins. ERROR=100 is registered as a potential duplicate attribute value, and the number of sets of JP1 events being correlated is incremented by one.
  - JP1 event (event ID: 00000998; message: ACCESS ERROR) is issued.
    This event satisfies the event condition at line 3, but its message is not the same as ERROR=100, so new correlation processing begins. ACCESS ERROR is registered as a potential duplicate attribute value, and the number of sets of JP1 events being correlated is incremented by one.
- When the correlation event generation function issues correlation events and correlation failure events, the total number of events in the whole system increases, which might increase the overall system load.
  Take these additional events into account, together with the total number of events in the whole system, and verify that the increase does not cause a problem for the overall system load.
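The second invalid-conditions example in the notes above (the duplicate attribute value condition on the message) can be illustrated with a small Python model. It groups events by their message, as the duplicate attribute value condition does, and records which event conditions each group has satisfied. The model is a sketch based only on the description above: the figure's actual definition is not reproduced here, and the function name pending_sets and the condition numbers 1 and 2 are hypothetical.

```python
def pending_sets(messages):
    """Group matching events by message and record the conditions each group satisfies.

    messages: iterable of event messages in arrival order.
    Returns a dict mapping each message to the set of satisfied condition numbers.
    """
    groups = {}
    for message in messages:
        if message.startswith("ERROR="):
            condition = 1            # first event condition: begins with ERROR=
        elif message.startswith("ACCESS ERROR"):
            condition = 2            # second event condition: begins with ACCESS ERROR
        else:
            continue                 # not a correlation source event
        # Events are grouped only with events that have an identical message.
        groups.setdefault(message, set()).add(condition)
    return groups


if __name__ == "__main__":
    result = pending_sets(["ERROR=100", "ACCESS ERROR"])
    print(result)
    # No group can ever contain both conditions, so no correlation event is issued.
    print(any(conditions == {1, 2} for conditions in result.values()))
```

Because no single message can begin with both ERROR= and ACCESS ERROR, every group holds exactly one condition. Each incoming event still opens a new set, which is why definitions of this kind consume resources without ever producing a correlation event.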