4.6.2 Example of execution for predictive error detection in the performance of processes in monitored services and the corrective action support methodology
This subsection explains by way of example how to use ITSLM to execute predictive error detection in the performance of processes in monitored services and the corrective actions to take, based on given conditions.
(1) Prerequisites
The conditions for this execution example are as follows:
- Registration of monitored services and Web transactions and the setup required for predictive error detection have been completed, and monitoring has already started.
- The following figure shows the relationship among personnel involved in this task.
Figure 4‒8: Relationship among personnel involved in predictive error detection in the performance of processes in monitored services and the corrective action support methodology (execution example)
- Person who monitors all services
Instructs the monitor to perform monitoring. If notified of a warning sign of a service performance error, this person investigates the cause. Upon determining that further investigation is needed, this person asks the maintenance service provider for the monitored service to investigate.
- Monitor
Uses the Home window to monitor the status of the monitored services of all service groups and the status of the processes of each monitored service.
- Maintenance service provider for the monitored service
If requested by the person who monitors all services, this person investigates the monitored service and takes corrective action as necessary.
(2) Predictive error detection in the performance of a process in a monitored service
- Tasks in ITSLM
- While the monitor was monitoring the status of the monitored services and the status of the processes of the monitored services in the Home window, a warning sign of a service performance error was displayed for a Web transaction corresponding to a process.
The following figure shows a display example of the Home window when a warning is displayed for a Web transaction of a monitored service.
Figure 4‒9: Display example of the Home window that contains a warning for a Web transaction of a monitored service
Details of the warning displayed in this figure are as follows:
- When detected: 2014-08-01 15:49:50
- Type: OUTLIER
- Details: UPPER LIMIT
- Service group: Group02
- Service: Service02
- Monitored target: All Web Access
- Monitor item: Avg. response
This warning indicates that the average response time of All Web Access for Service02 in Group02, measured at 15:49:50 on August 1, 2014, was out of range (it exceeded the upper limit) and differed significantly from the usual value for the monitored service.
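The idea behind an OUTLIER / UPPER LIMIT warning can be illustrated with a minimal sketch. This subsection does not describe how ITSLM derives the usual range, so the rule used here (baseline mean plus `k` sample standard deviations) is an assumption made purely for illustration, and the function names and response-time values are hypothetical.

```python
from statistics import mean, stdev

def upper_limit(baseline, k=3.0):
    """Assumed rule (not ITSLM's documented method):
    baseline mean plus k sample standard deviations."""
    return mean(baseline) + k * stdev(baseline)

def classify(value, baseline, k=3.0):
    """Return ("OUTLIER", "UPPER LIMIT") when value exceeds the assumed
    upper limit; otherwise return None (no warning)."""
    if value > upper_limit(baseline, k):
        return ("OUTLIER", "UPPER LIMIT")
    return None

# Made-up average response times in milliseconds for the usual behavior.
baseline_ms = [120, 125, 118, 130, 122, 127, 119, 124]

spike = classify(480, baseline_ms)   # far above the usual values
usual = classify(126, baseline_ms)   # within the usual range
print(spike, usual)
```

With these sample values, a measurement far above the baseline is flagged, while a measurement close to the baseline is not.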
- Results of the task
- The monitor reported the warning to the person who monitors all services.
Because the warning might lead to an error if left unattended, the person who monitors all services decided to take corrective action immediately.
(3) Corrective action taken after a warning sign was detected in the service performance for a process of a monitored service
- Tasks in ITSLM
- After being notified of the warning displayed in the Home window, the person who monitors all services decided to use the Troubleshoot window to investigate the timing of the event detected as a warning, and then take corrective action.
The following figure shows a display example of the Troubleshoot window in which a warning is displayed for a Web transaction of a monitored service.
Figure 4‒10: Display example of the Troubleshoot window in which a warning is displayed for a Web transaction of a monitored service
This performance chart of average response time indicates that the event causing the warning occurred between 15:48:18 and 15:51:18.
The access log for the time period during which the warning appeared includes the Web transactions of the monitored service. This information can be used to investigate any problems in Web system processing.
Figure 4‒11: Display example of the access log in which a warning for a monitored service is displayed
- Results of tasks
- Because the details of the warning and the timing of the event causing the warning became clear from the data provided in the Troubleshoot window, the person who monitors all services notified the maintenance service provider for the monitored service and requested a root cause investigation and corrective action.
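The access-log investigation in this step amounts to narrowing the log to the period in which the event occurred (15:48:18 to 15:51:18 in this example) and examining the transactions in that period. The following is a minimal sketch of that filtering done outside ITSLM. The record layout (`time`, `uri`, `response_ms`) and the log entries are hypothetical; the actual access log format is not described in this subsection.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def entries_in_window(entries, start, end):
    """Return the entries whose timestamp lies in [start, end], inclusive."""
    lo = datetime.strptime(start, FMT)
    hi = datetime.strptime(end, FMT)
    return [e for e in entries
            if lo <= datetime.strptime(e["time"], FMT) <= hi]

# Hypothetical access log records (illustrative values only).
access_log = [
    {"time": "2014-08-01 15:47:02", "uri": "/app/login", "response_ms": 118},
    {"time": "2014-08-01 15:49:10", "uri": "/app/order", "response_ms": 2450},
    {"time": "2014-08-01 15:50:41", "uri": "/app/order", "response_ms": 2610},
    {"time": "2014-08-01 15:52:30", "uri": "/app/login", "response_ms": 121},
]

# Keep only the transactions in the period shown by the performance chart.
suspects = entries_in_window(
    access_log, "2014-08-01 15:48:18", "2014-08-01 15:51:18")

# List the slowest transactions first to help prioritize the investigation.
slowest_first = sorted(suspects, key=lambda e: e["response_ms"], reverse=True)
for entry in slowest_first:
    print(entry["time"], entry["uri"], entry["response_ms"])
```

Sorting by response time is only one possible way to prioritize; grouping by URI or by status would serve equally well, depending on what the maintenance service provider needs.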
(4) Verifying the service performance after taking corrective action
- Tasks in ITSLM
- After corrective action was taken by the maintenance service provider for the monitored service based on the results of a root cause investigation, the person who monitors all services decided to use the Real-time Monitor window to verify that the service performance of the Web transaction had returned to normal.
The following figure shows a display example of the Real-time Monitor window showing that the service performance of the Web transaction has returned to normal after corrective action was taken.
Figure 4‒12: Display example of the Real-time Monitor window showing that service performance of the Web transaction has returned to normal
As shown in this figure, when service performance of the Web transaction has returned to normal, the (normal) icon is displayed in the Service performance information area.
- Results of tasks
- The person who monitors all services has verified that service performance of the Web transaction has returned to normal. This concludes the handling of the warning sign of a service performance error for a process of a monitored service.