1.1.2 Monitoring service status

If a service that has many users, or that is critical to some users' business, is interrupted, those users are greatly affected. ITSLM monitors services against threshold values that serve as evaluation metrics (SLOs). ITSLM can also predict an abnormality in service performance by monitoring for an unusual service status.

Monitoring based on threshold values

You can evaluate service status based on specific metrics of the SLOs. You can also detect a service that might exceed an SLO in the future by analyzing trends in the service's status in real time.

Monitoring for an unusual service status

You can detect, at an early stage, a warning sign of a possible abnormality (a service status that service users would perceive as unusual) before it develops into an actual service performance error. By handling an abnormality at the early warning sign stage, you can provide stable services and increase service users' satisfaction.
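
The two monitoring approaches above can be illustrated with a minimal sketch. This section does not describe the algorithms ITSLM actually uses, so everything here is an assumption made for illustration: the function names, the least-squares trend line used to forecast an SLO breach, and the "k standard deviations from a baseline" rule used to flag an unusual status.

```python
from statistics import mean, stdev
from typing import Sequence


def exceeds_slo(response_ms: float, slo_ms: float) -> bool:
    """Threshold monitoring: is the current value over the SLO threshold?"""
    return response_ms > slo_ms


def predicts_breach(samples: Sequence[float], slo_ms: float, horizon: int) -> bool:
    """Trend analysis (assumed method): fit a least-squares line to equally
    spaced samples and test whether the value extrapolated `horizon`
    intervals past the last sample is over the SLO threshold."""
    n = len(samples)
    if n < 2:
        return False  # not enough data to estimate a trend
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples)) / sum(
        (x - x_mean) ** 2 for x in range(n)
    )
    forecast = y_mean + slope * ((n - 1 + horizon) - x_mean)
    return forecast > slo_ms


def is_unusual(baseline: Sequence[float], current: float, k: float = 3.0) -> bool:
    """Unusual-status check (assumed method): flag a value that lies more
    than k standard deviations from the mean of past observations."""
    if len(baseline) < 2:
        return False  # a baseline needs at least two observations
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return current != mu
    return abs(current - mu) > k * sigma
```

Note that the last check can flag a value that is still well under the SLO threshold, which is the point of monitoring for an unusual status: the warning sign appears before the threshold is exceeded.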

The following figure shows how ITSLM performs monitoring.

Figure 1‒4: Mechanism of monitoring by ITSLM

ITSLM collects, aggregates, and analyzes in real time the HTTP packets that constitute the requests and responses sent between the service users and the service providing server. ITSLM monitors the current service status in this manner.

In services provided by business systems, a single process consists of one or more sets of requests and responses. For example, in a mail service, each process, such as a login process or display of a list of emails, consists of multiple requests and responses. To monitor the status of each service process, ITSLM identifies the requests and responses that make up the process to be monitored among all requests and responses of the monitored service and monitors those requests and responses as a set.

When each service process is monitored, a set of requests and responses is identified based on the queries and cookie information contained in the URIs of the requests and responses.
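
As a rough illustration of this kind of identification, the sketch below selects the requests that belong to one process by their URI query parameters and groups them by a session cookie. The rule format, the `action=login` parameter, the `SESSIONID` cookie name, and the function names are all hypothetical; ITSLM's own definition format is not shown in this section.

```python
from collections import defaultdict
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Hypothetical process definition: a request belongs to the login process
# if its query string contains every key/value pair listed here.
LOGIN_PROCESS = {"action": "login"}


def matches_process(uri, required_query):
    """Return True if the URI's query string contains all required pairs."""
    query = parse_qs(urlsplit(uri).query)
    return all(value in query.get(key, []) for key, value in required_query.items())


def group_by_session(requests, required_query, cookie_name="SESSIONID"):
    """Collect response times of matching requests, keyed by session cookie.

    `requests` is an iterable of (uri, cookie_header, response_ms) tuples.
    Requests without the cookie are grouped under the key None.
    """
    groups = defaultdict(list)
    for uri, cookie_header, response_ms in requests:
        if not matches_process(uri, required_query):
            continue  # not part of the monitored process
        cookie = SimpleCookie()
        cookie.load(cookie_header)
        session = cookie[cookie_name].value if cookie_name in cookie else None
        groups[session].append(response_ms)
    return dict(groups)
```

Each resulting group is one set of requests and responses that can then be evaluated together, for example by summing or averaging the response times per session.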

Consider monitoring a service by process when the service includes any of the following types of processes:

Newly added processes
Important processes in terms of system requirements
Processes that are expected to generate a high workload
Other processes that require special attention

Example of predictive error detection in the performance of a monitored service and the corrective action support methodology

This example detects an unusual service status that is a warning sign of an abnormality in the performance of a monitored service and takes an appropriate corrective action before an error materializes.

The following figure shows the general procedure for detecting a warning sign of an abnormality in the performance of a monitored service and taking corrective action.

Figure 1‒5: General procedure for detecting a warning sign of an abnormality in the performance of a monitored service and taking corrective action

First, while monitoring a service's status, ITSLM detects an increase in response time, which is a warning sign of an abnormality in service performance. Next, you check ITSLM's past monitoring records for the timing of an event that might be the cause of the warning sign. You can use the results of this check to handle the detected event.

After you identify the cause and take corrective action, ITSLM verifies that the service level has recovered. At that point, your handling of the abnormality in service performance at the early warning sign stage is complete.
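
The step of checking past records for the timing of a possible cause can be pictured as a time-window search. The sketch below is only an illustration under assumed inputs: the event list, its `(timestamp, label)` format, the 30-minute window, and the function name are hypothetical and do not reflect how ITSLM stores or presents its monitoring records.

```python
from datetime import datetime, timedelta


def candidate_causes(warning_time, events, window=timedelta(minutes=30)):
    """Return the events that occurred within `window` before the warning
    sign was detected, most recent first.

    `events` is an iterable of (timestamp, label) tuples.
    """
    start = warning_time - window
    hits = [(t, label) for t, label in events if start <= t <= warning_time]
    return sorted(hits, key=lambda event: event[0], reverse=True)
```

Events closest to the time of the warning sign are listed first because they are the most likely candidates to investigate.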

ITSLM performs predictive error detection in the performance of a monitored service. It can also help you take corrective action. Because ITSLM enables you to take corrective action before a problem actually occurs in the service, you can improve the service users' sense of satisfaction.

For this example, setup of the monitored items is explained in 3.3.1 Example of setup for predictive error detection in the performance of monitored services and the corrective action support methodology, and execution of monitoring is explained in 4.6.1 Example of execution for predictive error detection in the performance of monitored services and the corrective action support methodology.

Example of predictive error detection in the performance of processes in a monitored service and the corrective action support methodology

This subsection explains an example of monitoring a new process added to a monitored service.

In this example, new functions have been added to a monitored service by an upgrade. Because newly added processes are prone to errors, the new process is registered in ITSLM and monitored individually, in addition to the monitoring of the entire service.

The following figure shows the general procedure for detecting a warning sign of an abnormality in the performance of a registered process of a monitored service and taking corrective action.

Figure 1‒6: General procedure for detecting a warning sign of an abnormality in the performance of a process in a monitored service registered into ITSLM and taking corrective action

This example monitors the status of newly registered processes. First, ITSLM detects an increase in response time in a registered process, which is a warning sign of an abnormality in service performance for the process. Next, you check ITSLM's past monitoring records for the timing of an event that might be the cause of the warning sign. You can use the results of this check to handle the detected event.

After you identify the process that produced the warning sign and the timing of the event, and take appropriate corrective action, ITSLM verifies that the service level has recovered. At that point, your handling of the abnormality in service performance of the process at the early warning sign stage is complete.

ITSLM performs predictive error detection in the performance of each process of a monitored service. It can also assist you in taking an appropriate corrective action.

For this example, setup of the monitored items is explained in 3.3.2 Example of setup for predictive error detection in the performance of processes in monitored services and the corrective action support methodology, and execution of monitoring is explained in 4.6.2 Example of execution for predictive error detection in the performance of processes in monitored services and the corrective action support methodology.
