3.1.3 Using trend monitoring for detection in advance of threshold overages
Trend monitoring is applied to each monitoring item. For details about the monitoring items, see 3.1.1 ITSLM's monitoring methods and types of monitored targets. Note that trend monitoring cannot be applied to error rate.
This subsection explains trend monitoring.
(1) About trend monitoring
Trend monitoring calculates trends in the performance of monitored services and detects possible overages of a service performance threshold in advance.
A trend is an approximated straight line calculated on the basis of the past N hours of service performance. If this approximated straight line exceeds the threshold within N hours from the present time, this event is detected as a warning sign of a potential service performance error. The value of N is specified in the Settings window.
For details about how to specify numeric values in the Settings window, see 3.2.7 Setting up the monitoring items for service performance.
The following shows an example in which an overage of a threshold is detected ahead of time by trend monitoring.
This example monitors average response time. The trend is calculated from the past N hours of service performance. A warning sign is detected if the service performance is predicted to exceed the threshold within the next N hours.
To obtain a trend for predicting an overage of a threshold within N hours, N hours' worth of service performance is required. For example, to predict an overage of a threshold during the next hour, one hour's worth of service performance is required. Matching the amount of past data to the prediction period reduces the error associated with a long period of trend monitoring.
The approximated straight line is updated every 60 seconds. Each time it is updated, a check is performed to see whether an overage of the threshold might occur. If an overage of the threshold is predicted, a warning is displayed in the window.
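The calculation described above can be illustrated with a minimal sketch. It assumes that the approximated straight line is an ordinary least-squares fit, which this manual does not state; the function names `fit_line` and `predict_crossing` and the representation of time in seconds are hypothetical. The sketch covers only the case of a rising line.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in seconds, measured value)


def fit_line(samples: List[Sample]) -> Tuple[float, float]:
    """Least-squares fit of value = slope * time + intercept (assumed method)."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed to fit a line")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def predict_crossing(samples: List[Sample], now: float, n_hours: float,
                     threshold: float) -> Optional[float]:
    """Return the time at which a rising line reaches the threshold
    within the next N hours, or None if no such time exists."""
    horizon = n_hours * 3600
    # Only the past N hours of data contribute to the line.
    recent = [(t, v) for t, v in samples if now - horizon <= t <= now]
    slope, intercept = fit_line(recent)
    if slope <= 0:
        return None  # this sketch handles rising lines only
    crossing = (threshold - intercept) / slope
    if crossing < now:
        return now  # the line is already above the threshold
    return crossing if crossing <= now + horizon else None
```

In a monitor that follows the description above, a function like `predict_crossing` would be called again each time the line is updated, that is, every 60 seconds.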
The following shows an example of a warning displayed in the window.
The information displayed in the window includes a warning icon, the detection date and time, the time at which service performance is predicted to exceed the threshold, the name of the service group subject to the warning, and the service name. If the trend keeps exceeding the threshold, a warning is displayed only the first time the overage is detected. You can view the service performance leading up to and following the point of the warning as a graph.
The following shows an example of a graph that is displayed.
In the graph, a warning icon indicates the time the threshold would be exceeded and a colored belt indicates the time period during which the event resulting in the overage of the threshold is assumed to occur.
To run trend monitoring, you must specify the following items in the Settings window:
- Threshold
- Reference time for calculating trends
- Threshold
  Specifies the reference threshold that is to be used to determine the status of the monitored service.
- Reference time for calculating trends
  Specifies N hours as the reference time for calculating trends. N hours are used as follows:
  - A trend is calculated on the basis of the past N hours of service performance.
  - A warning sign is detected if an overage of a threshold is predicted to occur within N hours from the present time.
- When linking with Performance Management
  If you link ITSLM with Performance Management, you can also run trend monitoring for system performance. In trend monitoring for system performance, there are two types of monitoring items:
  - Monitoring item to be reported when it exceeds the threshold
  - Monitoring item to be reported when it drops below the threshold
You can determine which type applies to a monitoring item by checking the Monitor settings area in the Settings window. If the icon in the Threshold column is [icon: upper-limit threshold], the monitoring item is reported when it exceeds the threshold. If the icon in the Threshold column is [icon: lower-limit threshold], the monitoring item is reported when it drops below the threshold.
(2) Detection criteria
In trend monitoring, a warning is detected if the calculated trend is flat or uptrending and satisfies one of the following conditions:
- The trend is currently already above the threshold
  The time that is displayed in Details for the reported warning event is the current time.
- The trend indicates that the threshold will be reached or exceeded within N hours
  The time that is displayed in Details for the reported warning event is the time the overage of the threshold is predicted to occur.
The value of N is specified in the Settings window.
For details about how to specify numeric values in the Settings window, see 3.2.7 Setting up the monitoring items for service performance.
In the case of a downward trend, no warning is detected even if the current trend exceeds the threshold because such a trend might be indicative of recovery.
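The detection criteria can be expressed as a short decision function. The following is a sketch, not ITSLM's implementation: the function name `detect_warning`, the slope and intercept parameters describing the approximated straight line, and the use of seconds as the time unit are assumptions made for illustration. A trend that is exactly at the threshold is treated here as having already reached it.

```python
from typing import Optional


def detect_warning(slope: float, intercept: float, now: float,
                   n_hours: float, threshold: float) -> Optional[float]:
    """Return the time to show in Details for a warning, or None.

    The trend is the line value = slope * time + intercept (hypothetical
    representation); times are in seconds.
    """
    if slope < 0:
        # Downward trend: no warning, because it might indicate recovery.
        return None
    current = slope * now + intercept
    if current >= threshold:
        # Already at or above the threshold: report the current time.
        return now
    if slope == 0:
        # Flat and below the threshold: the threshold is never reached.
        return None
    crossing = (threshold - intercept) / slope
    if crossing <= now + n_hours * 3600:
        # Threshold reached or exceeded within N hours: report that time.
        return crossing
    return None
```

Returning the time rather than a Boolean value keeps the two conditions distinguishable: the current time for a trend that is already above the threshold, and the predicted time otherwise.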
To maintain accuracy, a trend is calculated only when certain conditions are satisfied. The criteria for calculating a trend when service performance is being monitored differ from those that apply when system performance is being monitored. The following explains both cases.
- When service performance is monitored
  A trend is calculated when the following condition is satisfied:
  - Total amount of time over the past N hours during which service performance was collected (seconds) ≥ N × 3,600 × 30 ÷ 100 (seconds)
  For example, if the value of N is 5, 5 × 3,600 × 30 ÷ 100 = 5,400 (seconds), which is 90 minutes. If at least 90 minutes' worth of service performance has been collected, a trend is calculated and trend monitoring is run.
- When system performance is monitored
  A trend is calculated when the following criteria are both satisfied:
  - Total amount of time over the past N hours during which service performance was collected (seconds) ≥ N × 3,600 × 30 ÷ 100 (seconds)
  - At least two performance data items have been collected during the past N hours.
  For example, if the value of N is 5, 5 × 3,600 × 30 ÷ 100 = 5,400 (seconds), which is 90 minutes. If at least 90 minutes' worth of service performance has been collected and at least two performance data items have been collected, a trend is calculated and trend monitoring is run.
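The two sets of criteria above reduce to a small check. In the following sketch, the function names and the convention of passing `data_items=None` for service performance are hypothetical and are used only to show the arithmetic.

```python
from typing import Optional


def required_collection_seconds(n_hours: float) -> float:
    """Minimum collected time: 30% of N hours, expressed in seconds."""
    return n_hours * 3600 * 30 / 100


def can_calculate_trend(collected_seconds: float, n_hours: float,
                        data_items: Optional[int] = None) -> bool:
    """Check whether enough data exists to calculate a trend.

    Pass data_items only for system performance, where at least two
    performance data items are also required.
    """
    enough_time = collected_seconds >= required_collection_seconds(n_hours)
    if data_items is None:
        return enough_time  # service performance: time criterion only
    return enough_time and data_items >= 2
```

For N = 5, `required_collection_seconds(5)` gives 5,400 seconds (90 minutes), which matches the example above.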
(3) Criteria for determining that performance has returned to normal
This subsection explains the criteria for determining that performance has returned to normal.
- Monitoring items for upper-limit threshold value
  If any of the following conditions is true, the trend monitoring status returns to normal.
  - The trend will no longer exceed the threshold N hours from now.
  - The trend is downtrending.
  - The trend is currently below the threshold and will no longer exceed the threshold in the next N hours.
- Monitoring items for lower-limit threshold value
  If any of the following conditions is true, the trend monitoring status returns to normal.
  - The trend will no longer drop below the threshold N hours from now.
  - The trend is uptrending.
  - The trend is currently above the threshold and will no longer be below the threshold in the next N hours.
The value of N is specified in the Settings window.
For details about how to specify numeric values in the Settings window, see 3.2.7 Setting up the monitoring items for service performance.
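The return-to-normal criteria for both types of monitoring items can be sketched as follows. The function name `has_recovered`, the slope and intercept representation of the trend, and the strict comparisons against the threshold are assumptions. The strict comparisons were chosen so that recovery is the complement of a threshold being reached or exceeded.

```python
def has_recovered(slope: float, intercept: float, now: float,
                  n_hours: float, threshold: float,
                  upper_limit: bool = True) -> bool:
    """Return True if the trend monitoring status returns to normal.

    The trend is the line value = slope * time + intercept (hypothetical
    representation); times are in seconds.
    """
    current = slope * now + intercept
    future = slope * (now + n_hours * 3600) + intercept
    if upper_limit:
        return (
            future < threshold                            # no overage N hours from now
            or slope < 0                                  # downtrending
            or (current < threshold and future < threshold)
        )
    return (
        future > threshold                                # not below N hours from now
        or slope > 0                                      # uptrending
        or (current > threshold and future > threshold)
    )
```

For a straight line, the third condition in each branch is implied by the first. It is kept so that the code mirrors the three listed conditions one for one.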
Note that a monitored service is placed in warning status when a warning is reported for it in ITSLM's Home or Real-time Monitor window, and it remains in warning status until it recovers to normal status. While the service is in warning status, further trend monitoring notifications for the same monitoring item of the same monitored service are suppressed. Therefore, when an overage of a threshold is detected by trend monitoring, the warning status remains displayed for at least 60 seconds after the notification.
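The suppression of repeated notifications can be pictured as a small state tracker keyed by monitored service and monitoring item. The class `WarningTracker` and its `update` method are hypothetical. As a simplification, the sketch treats the absence of a predicted overage as recovery, whereas the actual criteria for returning to normal are those listed in this subsection.

```python
from typing import Set, Tuple


class WarningTracker:
    """Reports a warning only on the transition from normal to warning."""

    def __init__(self) -> None:
        self._in_warning: Set[Tuple[str, str]] = set()

    def update(self, service: str, item: str, overage_predicted: bool) -> bool:
        """Return True only when a new notification should be issued."""
        key = (service, item)
        if overage_predicted:
            if key in self._in_warning:
                return False  # already in warning status: suppress
            self._in_warning.add(key)
            return True
        # Simplification: no predicted overage is treated as recovery.
        self._in_warning.discard(key)
        return False
```

If `update` is called once per 60-second check, a service that enters warning status stays in that status at least until the next check, which is consistent with the 60-second minimum noted above.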
(4) Supplementary information
- If an overage of a threshold has already been detected by threshold value monitoring, it will not be detected by trend monitoring.
- The following service performance is not used for calculation of a trend:
  - If monitoring was stopped once and restarted within N hours, the service performance existing before monitoring was restarted
  - Service performance existing when average response time and throughput were both 0
  The value of N is specified in the Monitor settings area of the Settings window.
  For details about how to specify numeric values in the Settings window, see 3.2.7 Setting up the monitoring items for service performance.
- If service performance data that is not used for trend monitoring continues to exist after a warning is detected, the monitored service that caused the warning might take time to return to normal status because there is no additional data to change the trend. An example is when throughput and average response time both remain 0 after trend monitoring for average response time has detected an overage of a threshold.
- If the value of N specified in the Monitor settings area of the Settings window is less than the collection interval for that monitoring item, the required number of performance data items (minimum of two) cannot be acquired within the trend monitoring time even if trend monitoring is set to be run. In such a case, trend monitoring is not run because an approximated straight line cannot be created.
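The data exclusions and the collection interval constraint described in this list can be sketched as follows. The `Sample` type, its field names, and both function names are hypothetical. The sketch assumes times in seconds and a single most recent restart time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    time: float               # collection time in seconds
    avg_response_time: float
    throughput: float


def usable_samples(samples: List[Sample], now: float, n_hours: float,
                   restarted_at: Optional[float] = None) -> List[Sample]:
    """Keep only the samples that may contribute to the trend."""
    start = now - n_hours * 3600
    if restarted_at is not None and restarted_at > start:
        # Monitoring restarted within N hours: drop data from before the restart.
        start = restarted_at
    return [
        s for s in samples
        if start <= s.time <= now
        # Drop samples where average response time and throughput are both 0.
        and not (s.avg_response_time == 0 and s.throughput == 0)
    ]


def interval_allows_trend(n_hours: float, collection_interval_seconds: float) -> bool:
    """False when N is shorter than the collection interval, in which case
    two data items cannot be acquired within the trend monitoring time."""
    return n_hours * 3600 >= collection_interval_seconds
```

When the filter leaves too few samples, no new line can be fitted, which corresponds to the delayed return to normal status described above.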