3.3.3 Example of setup for predictive error detection in the performance of systems running monitored services and the corrective action support methodology (working with Performance Management)

This subsection explains an example of predictive error detection in the performance of systems running monitored services, as discussed in 1.2 Linking with Performance Management to monitor service status (working with Performance Management).

Using a set of given conditions, it walks through how to evaluate and configure the setup that supports predictive error detection in the system performance of the hosts and middleware that provide monitored services, together with the corrective actions to take.

Organization of this subsection

(1) Prerequisites
(2) Collecting key field information for monitoring items
(3) Setting up monitoring items

(1) Prerequisites

The following are the conditions for this setup example:

There is a service level agreement (SLA) regarding the service quality (service level) between the service's outsourcing company (service provider) and an outsourced contractor (data center). The data center is required to maintain the service level based on the SLA. SLOs defined on the basis of the SLA are specified in the same manner as in 3.3.1 Example of setup for predictive error detection in the performance of monitored services and the corrective action support methodology.
The service group and monitored services have been registered in the same manner as in 3.3.1 Example of setup for predictive error detection in the performance of monitored services and the corrective action support methodology. Monitoring of the monitored services has stopped.

The following figure shows the relationship among the personnel involved in this task.

Figure 3‒40: Relationship among personnel involved in predictive error detection in the performance of systems running monitored services and the corrective action support methodology (setup example)

Person who monitors all services

Adds the monitoring items for system performance for the services for which SLOs are defined.

To monitor the monitoring items for system performance in ITSLM, this person verifies the settings in Performance Management with the system administrator.

System administrator

Defines the monitoring items for system performance in Performance Management, and provides the person who monitors all services with the information needed to monitor system performance in ITSLM.


(2) Collecting key field information for monitoring items

This subsection explains an example of multi-instance monitoring items. For single-instance monitoring items, there is no need to define key field information.

Tasks required for setting up monitoring items in ITSLM: The person who monitors all services asks the system administrator to provide the information needed to monitor system performance in ITSLM. The system administrator checks the key field information (multi-instance records) collected by Performance Management and provides the information to the person who monitors all services. For an example of multi-instance records collected by Performance Management, see 3.1.1(7) Monitoring items for system performance.
Results of the tasks: Because the key field information has been verified, the person who monitors all services decides to set up monitoring items for the system providing each monitored service.


(3) Setting up monitoring items

Tasks in ITSLM

The two types of monitoring item setup tasks are configuration information setup and monitoring setup. These types are explained below.

Configuration information setup

The person who monitors all services logs in to ITSLM - Manager, displays the Settings window, and then sets up the configuration information.

To monitor system performance, you first set up configuration information for the monitored service. Setting up configuration information involves associating the business group with the monitored service and then setting up the monitored target. Monitoring items (such as CPU, HDD, and HEAP) are also set up for the monitored target.

The following shows an example of the setup.

Figure 3‒41: Setup example of configuration information (business group setup)

In this figure, the business group to be associated with service Service01 of service group Group01 is selected.

Business group BGroup2 is associated with host Host03. Because Agent02 and Agent03 are running on host Host03, data collected by Agent02 and Agent03 will be monitored by ITSLM.

After selecting the business group, click the To Monitor item settings button to set up monitoring items for the monitored target.

The following shows an example of the setup.

Figure 3‒42: Setup example of configuration information (monitoring item setup)

Monitoring items can be set up for monitored target Agent03. Specify in monitoring item setup whether system information measured by Performance Management is to be associated with the monitored service for which the business group has been set.

In this figure, monitoring item CPU is set up for Agent03. Key field 1 is set to C, which is the value specified in Performance Management.
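The associations described for Figures 3‒41 and 3‒42 can be pictured as a small data model. The following is a minimal sketch only: the class and attribute names are hypothetical and are not part of ITSLM or Performance Management. It uses the names from this example (Group01, Service01, BGroup2, Host03, Agent02, Agent03) to show how selecting a business group determines which agents become monitored targets.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# All class and attribute names below are hypothetical illustrations,
# not ITSLM or Performance Management identifiers.
@dataclass
class MonitoringItem:
    name: str                # for example "CPU", "HDD", or "HEAP"
    key_field_1: str = ""    # left empty for single-instance monitoring items


@dataclass
class Agent:
    name: str
    items: List[MonitoringItem] = field(default_factory=list)


@dataclass
class Host:
    name: str
    agents: List[Agent] = field(default_factory=list)


@dataclass
class BusinessGroup:
    name: str
    hosts: List[Host] = field(default_factory=list)


@dataclass
class MonitoredService:
    service_group: str
    name: str
    business_group: Optional[BusinessGroup] = None

    def monitored_targets(self) -> List[str]:
        """Return the agents reachable through the associated business group."""
        if self.business_group is None:
            return []
        return [a.name for h in self.business_group.hosts for a in h.agents]


# Values taken from the setup example in this subsection.
agent02 = Agent("Agent02")
agent03 = Agent("Agent03", [MonitoringItem("CPU", key_field_1="C")])
host03 = Host("Host03", [agent02, agent03])
bgroup2 = BusinessGroup("BGroup2", [host03])
service01 = MonitoredService("Group01", "Service01", bgroup2)

print(service01.monitored_targets())
```

In this sketch, both agents on Host03 become monitored targets once BGroup2 is associated with Service01, and only Agent03 carries a monitoring item with a key field value.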

Monitoring setup

Once the configuration information has been set up, the person who monitors all services specifies the details of monitoring.

Based on the SLOs, monitoring items for the system that provides the monitored service are set up.

The following shows an example of the setup.

Figure 3‒43: Setup example of monitoring items for the system that provides the monitored service based on SLOs

This example sets up a monitoring item for Agent01, which is associated with service Service05 of service group Group03. The following shows the monitoring item settings.

SLO monitor settings

Table 3‒15: Example settings under SLO monitor settings
Monitoring item	Monitoring	Threshold	Occurrence frequency (Times exceeded/measured)	Trend monitoring
CPU	Select	`30`%	`1`/`2`	`5` hours

Under SLO monitor settings, the SLO definition items are specified as thresholds, and then trend monitoring is set up to promptly detect any error in the performance of the system running the monitored service.

A warning is set to be issued if the threshold is exceeded at a frequency of 1/2 or higher (at least one of every two measurements) during the measurement period.
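The occurrence frequency setting (Times exceeded/measured) can be illustrated with a short sketch. This is an assumption about how such a check could work, not a description of the ITSLM implementation: the function name is hypothetical, and whether ITSLM treats a value equal to the threshold as exceeding it is not stated here, so the sketch uses a strict comparison.

```python
from typing import Sequence


def frequency_warning(values: Sequence[float], threshold: float,
                      times_exceeded: int, times_measured: int) -> bool:
    """Hypothetical check: True if at least `times_exceeded` of the most
    recent `times_measured` values are above `threshold`."""
    window = list(values)[-times_measured:]
    if len(window) < times_measured:
        # Not enough measurements yet to evaluate the frequency.
        return False
    exceeded = sum(1 for v in window if v > threshold)  # strict ">" is an assumption
    return exceeded >= times_exceeded


# Settings from Table 3-15: threshold 30%, occurrence frequency 1/2.
print(frequency_warning([12.0, 35.0], threshold=30.0,
                        times_exceeded=1, times_measured=2))
print(frequency_warning([12.0, 28.0], threshold=30.0,
                        times_exceeded=1, times_measured=2))
```

With the 1/2 setting, a single CPU measurement above 30% among the last two is enough to trigger the warning in this sketch.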

Any potential system performance error must be detected at least five hours in advance because other personnel must be contacted to take corrective action in the event of a system performance error. For this reason, trend monitoring is set to 5 hours.
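The idea behind a 5-hour trend monitoring setting can be sketched as follows. The prediction method that ITSLM uses is not described in this subsection, so this example assumes a simple least-squares straight line through recent measurements. The function name and the sample data are hypothetical.

```python
from typing import List, Optional, Tuple


def hours_until_threshold(samples: List[Tuple[float, float]],
                          threshold: float) -> Optional[float]:
    """Fit a straight line to (hour, value) samples and return how many hours
    after the last sample the line reaches `threshold`.
    Returns None if the trend is flat or falling, or if it cannot be fitted."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    crossing_hour = (threshold - intercept) / slope
    return max(0.0, crossing_hour - samples[-1][0])


TREND_MONITORING_HOURS = 5  # value from Table 3-15

# Hypothetical CPU usage (%) measured once per hour.
cpu_samples = [(0.0, 10.0), (1.0, 14.0), (2.0, 18.0), (3.0, 22.0)]
remaining = hours_until_threshold(cpu_samples, threshold=30.0)
warn = remaining is not None and remaining <= TREND_MONITORING_HOURS
print(remaining, warn)
```

In this sketch, a warning would be raised whenever the projected time to reach the 30% threshold is five hours or less, which leaves time to contact other personnel before the threshold is actually exceeded.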

Error Predict. settings

Table 3‒16: Example settings under Error Predict. settings
Monitoring item	Monitoring	Days in baseline calculation	Days till start	Sensitivity	Occurrence frequency (Times exceeded/measured)
CPU	Select	`20` days	`5` days	High	`1`/`5`

Under Error Predict. settings, 20 days' worth of service performance is to be used to calculate the baseline for performing monitoring based on typical system performance. Days till start is set to 5 because it was requested that monitoring be started five days later.

A warning is set to be issued if the threshold is exceeded at a frequency of 1/5 or higher (at least one of every five measurements) during the measurement period.

Out-of-range value detection is to be performed for all monitoring items. The sensitivity is set to high so that any service performance that veers from the baseline will be detected quickly.
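The relationship among the baseline, the sensitivity, and the occurrence frequency can be sketched as follows. How ITSLM calculates the baseline is not described in this subsection, so this example assumes a band of the mean plus or minus a multiple of the standard deviation. The function names, the multipliers assigned to each sensitivity level, the levels other than High, and the sample data are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple

# Hypothetical mapping: a higher sensitivity gives a narrower permitted band.
SENSITIVITY_WIDTH = {"High": 1.0, "Medium": 2.0, "Low": 3.0}


def baseline_band(history: Sequence[float], sensitivity: str) -> Tuple[float, float]:
    """Return (low, high) bounds derived from past measurements."""
    center = mean(history)
    spread = pstdev(history)
    k = SENSITIVITY_WIDTH[sensitivity]
    return center - k * spread, center + k * spread


def out_of_range_warning(history: Sequence[float], recent: Sequence[float],
                         sensitivity: str = "High",
                         times_exceeded: int = 1,
                         times_measured: int = 5) -> bool:
    """True if at least `times_exceeded` of the last `times_measured`
    recent values fall outside the baseline band."""
    low, high = baseline_band(history, sensitivity)
    window = list(recent)[-times_measured:]
    if len(window) < times_measured:
        return False
    outliers = sum(1 for v in window if v < low or v > high)
    return outliers >= times_exceeded


# Hypothetical CPU usage (%) standing in for the 20-day baseline period.
history = [20.0, 22.0, 18.0, 20.0, 20.0]
print(out_of_range_warning(history, [20.0, 20.0, 21.0, 20.0, 25.0]))
print(out_of_range_warning(history, [20.0, 20.0, 21.0, 20.0, 20.0]))
```

Under these assumptions, a High sensitivity combined with an occurrence frequency of 1/5 flags a single value outside the band among the last five measurements, which matches the intent of detecting deviations from typical performance quickly.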

Results of the tasks

Once setup has been completed for service Service05 of service group Group03, the person who monitors all services proceeds to set up monitoring items for the remaining monitored services in the same manner.

After setup has been completed for all monitored services, the person who monitors all services starts monitoring. For an example of execution of monitoring, see 4.6.3 Example of execution for predictive error detection in the performance of systems running monitored services and the corrective action support methodology (working with Performance Management).
