4.6.2 Setting up the monitoring items for service performance
Set up the monitoring items for each monitored service. The monitoring staff performs this operation.
(1) Before you start
- Verify that you have service group administrator permissions.
- Verify that the monitored service has been registered.
  For details about how to register monitored services, see 4.6.1 Registering monitored services.
- Verify that monitoring has stopped for the monitored service whose monitoring items you want to set up.
  For details about how to stop monitoring, see (3) Procedure for stopping monitoring in 5.3.1 Starting monitoring.
(2) Procedure
The following shows the Settings window used in this task:
To set up monitoring items for service performance:
1. Click the Settings button.
2. In the Setting menu area, select Monitor settings.
   The Monitor settings area is displayed.
3. From the Services area, select a monitored target of a monitored service.
   The service group name, monitored service name, and monitored target are displayed in the Monitor settings area, and the current values are displayed under SLO monitor settings and Error Predict. settings. Immediately after a monitored service has been registered, the default values are set.
4. If you will be running threshold monitoring or trend monitoring, select the Item name check boxes under SLO monitor settings for the items that you want to monitor, and then enter values in Threshold.
   An error message is displayed if an Item name check box is selected but no value is entered for that item, or if an invalid value is entered in the text box.
5. If you will be running trend monitoring, select the Trend monitor check boxes under SLO monitor settings for the items that you want to monitor, and then enter the reference time for trend calculation.
   The Trend monitor check boxes are enabled only when the corresponding Item name check boxes are selected. In the Trend monitor text box, enter the time to be subject to trend monitoring.
   An error message is displayed if a Trend monitor check box is selected but no value is entered for that item, or if an invalid value is entered in the text box. Note that Error rate has no Trend monitor check box, because trend monitoring does not apply to the error rate.
6. Under Error Predict. settings, enter appropriate values in Days in baseline calculation and Days till start.
   An error message is displayed if a text box is empty or contains an invalid value. If you will not be performing out-of-range value detection, leave the default values in Days in baseline calculation and Days till start.
7. If you will be performing out-of-range value detection, select the Item name check boxes under Error Predict. settings for the items that you want to monitor, and then select their Sensitivity settings.
   For each item that you want to monitor, select High, Middle, or Low as its sensitivity. The higher the sensitivity, the more readily out-of-range values are detected for that item; the lower the sensitivity, the less readily they are detected. Start with Middle, and then adjust the sensitivity as needed after checking how many out-of-range values are detected.
8. If you will be performing out-of-range value detection with multiple monitoring items combined, select Throughput from the Correlated items pull-down menu in the Avg. response row under Error Predict. settings.
9. Click the Apply button.
   If the monitoring items have been set up successfully, a dialog box to that effect is displayed.
   When you click the OK button in the dialog box, the settings are applied.
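The input rules described in the procedure above (Threshold and Trend monitor entries, Days in baseline calculation, Days till start, Sensitivity, and Correlated items) can be summarized as a validation routine. The following is a minimal sketch, not an SLM API: the names SloRow, validate_slo_rows, and validate_error_predict are hypothetical, and what counts as a "valid" value (a positive number for thresholds and trend times, a whole number for day counts) is an assumption, because the actual permitted ranges are not given here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical model of one row under SLO monitor settings (not an SLM API).
@dataclass
class SloRow:
    item_name: str
    selected: bool                      # Item name check box
    threshold: Optional[str] = None     # Threshold text box, as typed
    trend_selected: bool = False        # Trend monitor check box
    trend_hours: Optional[str] = None   # Trend monitor text box, as typed


def _is_positive_number(text: Optional[str]) -> bool:
    # Assumption: "valid" means a positive number; SLM's real ranges may differ.
    try:
        return float(text) > 0
    except (TypeError, ValueError):
        return False


def validate_slo_rows(rows: List[SloRow]) -> List[str]:
    """Return error messages for the SLO monitor settings rows."""
    errors = []
    for row in rows:
        # A selected item needs a threshold value.
        if row.selected and not _is_positive_number(row.threshold):
            errors.append(f"{row.item_name}: threshold is missing or invalid")
        if not row.trend_selected:
            continue
        # Trend monitoring does not apply to the error rate.
        if row.item_name == "Error rate":
            errors.append("Error rate: trend monitoring is not applicable")
        # Trend monitor is enabled only when the Item name check box is selected.
        elif not row.selected:
            errors.append(f"{row.item_name}: select the Item name check box first")
        elif not _is_positive_number(row.trend_hours):
            errors.append(f"{row.item_name}: trend monitoring time is missing or invalid")
    return errors


SENSITIVITIES = {"High", "Middle", "Low"}


def validate_error_predict(days_in_baseline: str, days_till_start: str,
                           sensitivities: Dict[str, str],
                           correlated_item: Optional[str] = None) -> List[str]:
    """Return error messages for the Error Predict. settings."""
    errors = []
    # Assumption: both day counts must be non-negative whole numbers.
    for label, text in (("Days in baseline calculation", days_in_baseline),
                        ("Days till start", days_till_start)):
        if not (text or "").strip().isdecimal():
            errors.append(f"{label}: value is missing or invalid")
    for item, level in sensitivities.items():
        if level not in SENSITIVITIES:
            errors.append(f"{item}: sensitivity must be High, Middle, or Low")
    # Assumption: only Throughput is accepted, as it is the only choice named here.
    if correlated_item not in (None, "Throughput"):
        errors.append("Avg. response: correlated item must be Throughput")
    return errors
```

Passing the values from the setting example in (4) (thresholds 3000, 800, and 1.0 with 5-hour trend monitoring; 20 and 5 days; High sensitivity; Throughput as the correlated item) yields no error messages.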
(3) Next task
(4) Setting example
This subsection uses an example to explain how to evaluate given conditions and set up monitoring items to support predictive detection of errors in the performance of monitored services, and the corrective actions to take.
(a) Defining SLOs from the SLA
- Tasks required for setting up monitoring items in SLM
The monitoring staff checks the SLA and evaluates the SLOs to be used as thresholds.
Because the SLA requires, among other things, that the achievement rate of response performance be 95% or higher and that service availability be 99.8% or higher, the monitoring staff defines the SLOs as follows:
Average response time: 3,000 milliseconds
Throughput: 800 count/second
Error rate: 1.0%
Because warning signs of service performance errors must be detected and handled, the monitoring staff also decides to perform out-of-range value detection in addition to threshold monitoring based on the SLOs.
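The 95% response-performance requirement in the SLA and the 3,000-millisecond average response time SLO can be related with a short calculation. The following sketch assumes that "achievement of response performance" means the percentage of measured response times at or under the SLO; this definition, and the names achievement_rate and meets_sla, are illustrative assumptions rather than SLM or SLA terms.

```python
RESPONSE_SLO_MS = 3000      # Average response time SLO from the example
ACHIEVEMENT_TARGET = 95.0   # SLA: response performance achievement of 95% or higher


def achievement_rate(response_times_ms, slo_ms=RESPONSE_SLO_MS):
    """Percentage of samples at or under the SLO (assumed definition)."""
    if not response_times_ms:
        raise ValueError("no response time samples")
    met = sum(1 for t in response_times_ms if t <= slo_ms)
    return 100.0 * met / len(response_times_ms)


def meets_sla(response_times_ms):
    """True if the assumed achievement rate satisfies the SLA target."""
    return achievement_rate(response_times_ms) >= ACHIEVEMENT_TARGET
```

For example, 19 of 20 samples within 3,000 milliseconds gives exactly 95%, which still satisfies the SLA; 18 of 20 (90%) does not.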
- Results of the tasks
Because SLOs have been defined, the monitoring staff decides to set up monitoring items for each monitored service.
(b) Setting up monitoring items
- Tasks in SLM
The monitoring staff decides to log in to SLM - Manager to display the Settings window and set up monitoring items for the monitored services based on the defined SLOs.
The following shows a setup example of monitoring items for the monitored services based on the SLOs.
Figure 4‒15: Setup example of monitoring items for the monitored services based on the SLOs

This example sets up monitoring items for service Service01 of service group Group01. The following shows the settings for the monitoring items.
- SLO monitor settings
Table 4‒2: Example settings under SLO monitor settings

Check box   Item name       Threshold   Trend monitor check box   Trend monitoring (hours)
Selected    Avg. response   3000        Selected                  5
Selected    Throughput      800         Selected                  5
Selected    Error rate      1.0         --                        --

Legend:
--: Cannot be set
Under SLO monitor settings, the defined SLO values are specified as thresholds, and trend monitoring is set up for average response time and throughput so that any error in the performance of a monitored service is detected promptly.
Because other personnel must be contacted to take corrective action in the event of a service performance error, a potential error must be detected at least five hours in advance. For this reason, the trend monitoring time is set to 5 hours.
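This section does not describe how SLM calculates trends. Purely as an illustration, the following sketch assumes a least-squares straight line fitted to recent hourly average response times, projected 5 hours past the latest sample (the trend monitoring value in Table 4‒2), and flags the item if the projection reaches the 3,000-millisecond threshold. The linear model and the names linear_fit and will_exceed_within are assumptions, not SLM's implementation; for items where a falling value is the concern, the comparison would need to be reversed.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for points (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def will_exceed_within(hours, values, threshold, horizon_hours=5.0):
    """Project the fitted line horizon_hours past the latest sample.

    Returns (exceeds, projected_value); a rising value is treated as the risk.
    """
    slope, intercept = linear_fit(hours, values)
    projected = intercept + slope * (hours[-1] + horizon_hours)
    return projected >= threshold, projected
```

With hourly averages of 1000, 1200, 1400, and 1600 milliseconds, the projection 5 hours ahead is 2600 milliseconds, so no warning is raised; with 1000, 1300, 1600, and 1900 milliseconds, it is 3400 milliseconds, which reaches the threshold.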
- Error Predict. settings
Table 4‒3: Example settings under Error Predict. settings

Days in baseline calculation: 20
Days till start: 5

Check box   Item name       Sensitivity   Correlated item
Selected    Avg. response   High          Throughput
Selected    Throughput      High          --
Selected    Error rate      High          --

Legend:
--: Cannot be set
Under Error Predict. settings, 20 days' worth of service performance data is used to calculate the baseline so that monitoring is based on typical service performance. Days till start is set to 5 because monitoring was requested to start five days later.
Out-of-range value detection is performed for all monitoring items. The sensitivity is set to High so that any deviation of service performance from the baseline is detected quickly. Out-of-range value detection with multiple monitoring items combined (Avg. response correlated with Throughput) is also performed to improve the precision of out-of-range value detection.
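This section does not describe how SLM computes the baseline or how High, Middle, and Low map to detection limits. The following is a simple stand-in under stated assumptions: the baseline is the mean and population standard deviation of the last 20 daily values (Days in baseline calculation), detection starts only after 5 days have elapsed (Days till start), and each sensitivity level is a hypothetical multiplier of the standard deviation. The names SENSITIVITY_K, baseline, and is_out_of_range are illustrative, and correlated items are not modeled.

```python
import statistics

# Hypothetical multipliers; SLM's actual sensitivity levels are not documented here.
SENSITIVITY_K = {"High": 2.0, "Middle": 3.0, "Low": 4.0}


def baseline(history, days_in_baseline=20):
    """Mean and population standard deviation of the last N daily values."""
    window = history[-days_in_baseline:]
    if len(window) < 2:
        raise ValueError("not enough data to calculate a baseline")
    return statistics.mean(window), statistics.pstdev(window)


def is_out_of_range(value, history, sensitivity="Middle",
                    days_in_baseline=20, days_elapsed=0, days_till_start=5):
    """Flag value if it falls outside mean +/- k * stdev of the baseline."""
    if days_elapsed < days_till_start:
        return False  # detection has not started yet
    mean, sd = baseline(history, days_in_baseline)
    k = SENSITIVITY_K[sensitivity]
    return abs(value - mean) > k * sd
```

With a baseline of 1000 and 1100 milliseconds alternating over 20 days (mean 1050, standard deviation 50), a value of 1170 milliseconds is flagged at High (limit 100 from the mean) but not at Middle (limit 150). This mirrors the procedure's guidance: a higher sensitivity narrows the band, so more out-of-range values are detected.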
- Results of the tasks
Once setup has been completed for service Service01 of service group Group01, the monitoring staff proceeds to set up monitoring items for the remaining monitored services in the same manner.