6.3.5 Setting the function for measurement value output at alarm recovery
Alarms that monitor multi-instance records return to normal status only when the value of every instance is in the normal range, so a normal alarm cannot identify a specific instance as its cause. For this reason, fixed values are output for the measured value and the message text when conditions return to normal.
If you enable the function for measurement value output at alarm recovery, the instance that most recently caused the alarm to enter abnormal or warning status is assumed to be responsible for its return to normal status, and the measured values and message text are set accordingly.
The following table describes how this function affects the alarm generated when an alarm that monitors a multi-instance record returns to normal status.
Item | Function enabled | Function disabled |
---|---|---|
Number of alarms generated | The number of alarms generated simultaneously when the last abnormal or warning alarm was generated | 1 |
Measured value (the variable %CVS in the alarm definition) | The current value of the instance that caused the last abnormal or warning alarm to be generated# | <OK> |
The contents of the message text (the variable %MTS in the alarm definition) | The value set in the alarm definition | -- |
For details on how to set the function for measurement value output at alarm recovery, see the chapter describing installation and setup in the JP1/Performance Management Planning and Configuration Guide.
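The difference between the two settings can be sketched in a few lines of Python. This is only an illustrative model of the table above, not product code: the names `RecoveryAlarm` and `recovery_alarms` are hypothetical, and the `--` entry in the table is copied as a literal placeholder string.

```python
from dataclasses import dataclass


@dataclass
class RecoveryAlarm:
    cvs: str  # text substituted for %CVS
    mts: str  # text substituted for %MTS


def recovery_alarms(function_enabled, last_causes, current_values, message_text):
    """Build the normal alarms issued when every instance is back in range.

    last_causes:    instances that caused the most recent abnormal or
                    warning alarms (hypothetical bookkeeping).
    current_values: mapping of instance name to its current measured value.
    message_text:   the message text set in the alarm definition.
    """
    if not function_enabled:
        # Function disabled: a single alarm with fixed values.
        return [RecoveryAlarm(cvs="<OK>", mts="--")]
    # Function enabled: one alarm per instance that caused the last
    # abnormal or warning alarm, carrying that instance's current value.
    return [
        RecoveryAlarm(cvs=str(current_values[name]), mts=message_text)
        for name in last_causes
    ]
```

For example, `recovery_alarms(True, ["C"], {"C": 20, "D": 30}, "Disk Busy")` yields one alarm whose `cvs` is `"20"`, whereas passing `False` always yields one alarm whose `cvs` is `"<OK>"`.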
- Tip
-
Even with the function for measurement value output at alarm recovery enabled, the only part of the alarm message text that changes is the measured value. Therefore, if you write alarm message text that refers to the alarm being in abnormal or warning status, the message receiver might assume that the instance to which it refers is in abnormal or warning status, when in fact it indicates a return to normal status. For this reason, we recommend that you do not include references to the alarm status in alarm message text in a system with the function for measurement value output at alarm recovery enabled.
Also, some monitoring template alarms for PFM - Agent and PFM - RM might contain alarm message text that refers to the alarm status. When using such a monitoring template alarm in a system with the function for measurement value output at alarm recovery enabled, we recommend that you copy the alarm table and edit the alarm message text as needed.
(1) Contents of alarm message text
This subsection describes the contents of alarm message text, using the Abnormal Status(A) alarm of the health check agent as an example. The Abnormal Status(A) alarm monitors the health check status of an agent. In the example below, the alarm monitors PFM - Agent for Platform (Windows) on the server host01.
The following table describes how the function for measurement value output at alarm recovery affects the alarm message text.
Event | Alarm message text (with feature enabled) | Alarm message text (with feature disabled) |
---|---|---|
A value exceeds the threshold and an abnormal or warning alarm is reported | Status of TA1host01 changed to Incomplete | Status of TA1host01 changed to Incomplete |
The value enters a normal range and a normal alarm is reported | Status of TA1host01 changed to Running | -- |
(2) Examples of alarm settings and generated alarms
For each of the following alarm types, this subsection describes examples of issued alarms and their message text:
-
Standard alarms
An alarm for which both Notify when the state changed#1 and State changes for the alarm#1 are set and Evaluate all data#1 is not set.
-
Alarms that monitor whether a value exists
An alarm for which Check whether the value exists#2 is set.
-
Alarms that evaluate all data
An alarm for which Evaluate all data#1 is set.
- #1
-
For details, see the chapter describing the New Alarm > Basic Information window or the Edit > Basic Information window in the manual JP1/Performance Management Reference.
- #2
-
For details, see 6.4.3 Setting a value whose existence is to be monitored.
(a) Standard alarms
The following describes an example of a standard alarm. In this example, the alarm monitors PFM - Agent for Platform (Windows). The alarm definition is as follows:
-
Message text: Disk Busy % (%CVS1) = %CVS2
-
Check whether the value exists: Not selected.
-
Activate alarm: Selected.
-
Notify when the state changed: Selected.
-
State changes for the alarm: Selected.
-
Evaluate all data: Not selected.
-
Evaluate regularly: Selected.
-
Alarm when damping conditions are satisfied: Not selected.
-
Record: Logical Disk Overview (PI_LOGD)
-
Abnormal: ID <> "_Total" AND % Disk Time >= "90.000"
-
Warning: ID <> "_Total" AND % Disk Time >= "50.000"
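As a rough illustration, the abnormal and warning conditions in this definition can be modeled as follows. The function names are hypothetical. Two points are assumptions inferred from the examples in this subsection rather than stated rules: the abnormal condition is checked before the warning condition, and the status of the record as a whole is the most severe status among its instances.

```python
SEVERITY = {"normal": 0, "warning": 1, "abnormal": 2}


def classify(instance_id, disk_time):
    """Return the status of one Logical Disk Overview (PI_LOGD) instance."""
    if instance_id == "_Total":
        # Excluded by the condition ID <> "_Total".
        return "normal"
    if disk_time >= 90.0:   # Abnormal: % Disk Time >= "90.000"
        return "abnormal"
    if disk_time >= 50.0:   # Warning: % Disk Time >= "50.000"
        return "warning"
    return "normal"


def record_status(instances):
    """Return the most severe status among all instances (assumed rule).

    instances: mapping of instance ID to its % Disk Time value.
    """
    statuses = [classify(i, v) for i, v in instances.items()]
    return max(statuses, key=SEVERITY.__getitem__, default="normal")
```

With these definitions, `record_status({"C:": 60, "D:": 90})` evaluates to `"abnormal"`, because one instance is in the abnormal range even though the other is only in the warning range.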
■ When an instance enters abnormal or warning status after an alarm of that status has been generated
The following table describes an example in which an abnormal or warning alarm is generated for an instance, after which another instance enters the same status as the alarm.
Timing of alarm evaluation | Status of instance 1 (C drive) | Status of instance 2 (D drive) | Generated alarm |
---|---|---|---|
1st | Measurement value: 10, Status: Normal | Measurement value: 20, Status: Normal | --#1 |
2nd | Measurement value: 90, Status: Abnormal | Measurement value: 30, Status: Normal | Alarm: (abnormal) |
3rd | Measurement value: 60, Status: Warning | Measurement value: 90, Status: Abnormal | --#2 |
4th | Measurement value: 20, Status: Normal | Measurement value: 30, Status: Normal | Alarm: (normal) |
In this example, different instances enter the abnormal range the second and third time the alarm is evaluated. However, because the status of the record has not changed between the second and third evaluation, an abnormal alarm is not issued as a result of the third evaluation.
The variable %CVS in a normal alarm stores the measurement value of the instance that caused the last abnormal or warning alarm to be issued. For this reason, the normal alarm generated at the fourth evaluation uses the measurement value of instance 1, which caused the abnormal alarm to be generated at the second evaluation.
■ When an alarm of another status is generated after an abnormal or warning alarm
The following table describes an example in which an abnormal or warning alarm is generated, after which another alarm of a different status is generated based on a measurement value of another instance.
Timing of alarm evaluation | Instance 1 (C drive) status | Instance 2 (D drive) status | Generated alarm |
---|---|---|---|
1st | Measurement value: 10, Status: Normal | Measurement value: 20, Status: Normal | --# |
2nd | Measurement value: 90, Status: Abnormal | Measurement value: 30, Status: Normal | Alarm: (abnormal) |
3rd | Measurement value: 40, Status: Normal | Measurement value: 60, Status: Warning | Alarm: (warning) |
4th | Measurement value: 20, Status: Normal | Measurement value: 30, Status: Normal | Alarm: (normal) |
In this example, instance 1 enters the normal range the third time the alarm is evaluated. However, because instance 2 has entered the warning range, a warning alarm is generated.
The variable %CVS in a normal alarm stores the measurement value of the instance that caused the last abnormal or warning alarm to be issued. For this reason, the normal alarm generated at the fourth evaluation uses the measurement value of instance 2, which caused the warning alarm to be generated at the third evaluation.
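The behavior in the two standard-alarm examples can be reproduced with a small simulation. This is a minimal sketch under stated assumptions, not the product's implementation: the function names are hypothetical, the record status is taken to be the most severe instance status, and the first instance found in the new status is treated as the cause of the alarm.

```python
SEVERITY = {"normal": 0, "warning": 1, "abnormal": 2}


def classify(value):
    """Status of one instance for the thresholds 90 (abnormal) and 50 (warning)."""
    if value >= 90:
        return "abnormal"
    if value >= 50:
        return "warning"
    return "normal"


def simulate_standard_alarm(evaluations):
    """Return, per evaluation, None or a tuple (status, instance, value).

    evaluations: list of dicts mapping instance name to measured value.
    """
    status = "normal"
    cause = None  # instance that caused the last abnormal or warning alarm
    results = []
    for values in evaluations:
        statuses = {name: classify(v) for name, v in values.items()}
        new_status = max(statuses.values(), key=SEVERITY.__getitem__)
        if new_status == status:
            results.append(None)  # no state change, so no alarm
            continue
        if new_status == "normal":
            # Recovery: report the current value of the remembered instance.
            # (This sketch assumes that instance still exists.)
            results.append(("normal", cause, values[cause]))
        else:
            cause = next(n for n, s in statuses.items() if s == new_status)
            results.append((new_status, cause, values[cause]))
        status = new_status
    return results


# First example: a second instance enters the abnormal range later.
example_1 = [{"C": 10, "D": 20}, {"C": 90, "D": 30},
             {"C": 60, "D": 90}, {"C": 20, "D": 30}]
# Second example: a warning alarm follows the abnormal alarm.
example_2 = [{"C": 10, "D": 20}, {"C": 90, "D": 30},
             {"C": 40, "D": 60}, {"C": 20, "D": 30}]

print(simulate_standard_alarm(example_1)[3])  # ('normal', 'C', 20)
print(simulate_standard_alarm(example_2)[3])  # ('normal', 'D', 30)
```

In the first run the normal alarm carries the value of instance 1 (C drive), and in the second run it carries the value of instance 2 (D drive), matching the explanations above.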
(b) Alarms that monitor whether a value exists
The following describes an example of an alarm that monitors whether a value exists. In this example, the alarm monitors PFM - Agent for Platform (Windows). The alarm definition is as follows:
-
Message text: %CVS
-
Check whether the value exists: Selected.
-
Activate alarm: Selected.
-
Notify when the state changed: Selected.
-
State changes for the alarm: Selected.
-
Evaluate all data: Not selected.
-
Evaluate regularly: Selected.
-
Alarm when damping conditions are satisfied: Not selected.
-
Record: Process Detail Interval (PD_PDI)
-
Field: Program
-
Value: process2
The following table describes an example in which this alarm is generated.
Timing of alarm evaluation | Instance 1 status | Instance 2 status | Generated alarm |
---|---|---|---|
1st | Measurement value: process1 | Measurement value: process2 | --#1 |
2nd | Measurement value: process1 | Measurement value: process3 | Alarm: (abnormal) |
3rd | Measurement value: process3 | Measurement value: process4 | --#2 |
4th | Measurement value: process2 | Measurement value: process3 | Alarm: (normal) |
In this example, the value process2 whose existence is being monitored is not found when the alarm is evaluated for the second time, and an abnormal alarm is generated. However, because the value does not exist when the alarm is generated, (N/A) appears for the %CVS variable in the message text.
In the fourth evaluation, because the value process2 is present again, a normal alarm is generated, and the value process2 appears for the %CVS variable.
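The following sketch models this example in Python. It is an illustration only, and the function name `simulate_existence_alarm` is hypothetical. It assumes that the alarm is abnormal whenever the monitored value is absent from every instance and normal otherwise.

```python
def simulate_existence_alarm(evaluations, monitored):
    """Return, per evaluation, None or a tuple (status, text for %CVS).

    evaluations: list of lists holding the Program field value of each instance.
    monitored:   the value whose existence is monitored.
    """
    status = "normal"
    results = []
    for programs in evaluations:
        new_status = "normal" if monitored in programs else "abnormal"
        if new_status == status:
            results.append(None)  # no state change, so no alarm
        elif new_status == "abnormal":
            # The value does not exist, so there is nothing to report.
            results.append(("abnormal", "(N/A)"))
        else:
            # The value exists again and is reported as the measured value.
            results.append(("normal", monitored))
        status = new_status
    return results


evaluations = [
    ["process1", "process2"],
    ["process1", "process3"],
    ["process3", "process4"],
    ["process2", "process3"],
]
print(simulate_existence_alarm(evaluations, "process2"))
# [None, ('abnormal', '(N/A)'), None, ('normal', 'process2')]
```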
(c) Alarms that evaluate all data
The following describes an example of an alarm that evaluates all data. In this example, the alarm monitors PFM - Agent for Platform (Windows). The alarm definition is as follows:
-
Message text: Disk Busy % (%CVS1) = %CVS2
-
Check whether the value exists: Not selected.
-
Activate alarm: Selected.
-
Notify when the state changed: Selected.
-
State changes for the alarm: Selected.
-
Evaluate all data: Selected.
-
Evaluate regularly: Selected.
-
Alarm when damping conditions are satisfied: Not selected.
-
Record: Logical Disk Overview (PI_LOGD)
-
Abnormal: ID <> "_Total" AND % Disk Time >= "90.000"
-
Warning: ID <> "_Total" AND % Disk Time >= "50.000"
The following table describes an example in which this alarm is generated.
Timing of alarm evaluation | Instance 1 (C drive) status | Instance 2 (D drive) status | Generated alarm for instance 1 (C drive) | Generated alarm for instance 2 (D drive) |
---|---|---|---|---|
1st | Measurement value: 10, Status: Normal | Measurement value: 20, Status: Normal | --#1 | --#1 |
2nd | Measurement value: 90, Status: Abnormal | Measurement value: 90, Status: Abnormal | Alarm: (abnormal) | Alarm: (abnormal) |
3rd | Measurement value: 20, Status: Normal | Measurement value: 90, Status: Abnormal | --#2 | --#2 |
4th | Measurement value: 10, Status: Normal | Measurement value: 20, Status: Normal | Alarm: (normal) | Alarm: (normal) |
When an alarm evaluates all data, an alarm is generated for each instance whose data meets the alarm conditions when the alarm status changes. Therefore, the second time the alarm is evaluated, abnormal alarms are generated for both instance 1 and instance 2, which have both entered the abnormal range.
When the measurement values of all instances return to the normal range, as many normal alarms are issued as abnormal or warning alarms were generated the last time such alarms were issued. In this example, because two abnormal alarms were generated at the second evaluation, two normal alarms are generated at the fourth evaluation. The %CVS variable in the message text of the normal alarms is replaced with the measurement values of instance 1 and instance 2, which caused the abnormal alarms to be generated in the second evaluation.
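The example can be modeled with the following Python sketch. It is not the product's implementation, and the function names are hypothetical. It assumes that, on a change to abnormal or warning status, one alarm is generated for each instance in the new status, and that each normal alarm reports the current value of its instance, as described in the table at the beginning of this section.

```python
SEVERITY = {"normal": 0, "warning": 1, "abnormal": 2}


def classify(value):
    """Status of one instance for the thresholds 90 (abnormal) and 50 (warning)."""
    if value >= 90:
        return "abnormal"
    if value >= 50:
        return "warning"
    return "normal"


def simulate_evaluate_all_data(evaluations):
    """Return, per evaluation, a list of (status, instance, value) alarms.

    evaluations: list of dicts mapping instance name to measured value.
    """
    status = "normal"
    causes = []  # instances that received the last abnormal or warning alarms
    results = []
    for values in evaluations:
        statuses = {name: classify(v) for name, v in values.items()}
        new_status = max(statuses.values(), key=SEVERITY.__getitem__)
        if new_status == status:
            results.append([])  # no state change, so no alarms
            continue
        if new_status == "normal":
            # One normal alarm per instance remembered from the last batch.
            results.append([("normal", n, values[n]) for n in causes])
        else:
            causes = [n for n, s in statuses.items() if s == new_status]
            results.append([(new_status, n, values[n]) for n in causes])
        status = new_status
    return results


evaluations = [{"C": 10, "D": 20}, {"C": 90, "D": 90},
               {"C": 20, "D": 90}, {"C": 10, "D": 20}]
for alarms in simulate_evaluate_all_data(evaluations):
    print(alarms)
```

The second evaluation produces two abnormal alarms and the fourth produces two normal alarms, one for each instance, while the first and third evaluations produce none.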