Hitachi

Job Management Partner 1 Version 10 Job Management Partner 1/IT Service Level Management Description, User's Guide, Reference and Operator's Guide


3.1.5 Using availability monitoring for checking the availability of services (working with Performance Management)

Availability monitoring is supported when ITSLM is linked with Performance Management.

This subsection explains availability monitoring.


(1) About availability monitoring

Availability monitoring is a method of checking whether monitored services are running normally.

PFM - Agent for Service Response is used to monitor the availability of monitored services. Because PFM - Agent for Service Response periodically sends its own requests to the monitored services, you can monitor their availability even when no users are accessing them.

The following figure shows how availability monitoring works.

Figure 3‒21: How availability monitoring works

[Figure]

You can check the current availability of services in the Home window or the Real-time Monitor window. If a monitored service has stopped, an error is displayed in these windows. The following figure shows an example of such an error.

Figure 3‒22: Example in which an error is displayed in a window (availability monitoring)

[Figure]

(2) Availability items that can be output to reports

For monitored services whose availability is being monitored, you can output availability items to reports. Availability items are metrics used to evaluate availability. Availability monitoring enables you to output three availability items to reports: service availability, MTTR (mean time to recovery), and MTBF (mean time between failures).

The following table provides details about these availability items.

Table 3‒8: Definition of availability items and formulas

No. 1
  Evaluation metric (SLO): Service availability
  Definition: Percentage of the time during the report interval that the service was running
  Formula: Service availability (%) = A / (A + B) × 100
    A = Total operational period during the report interval (minutes)
    B = Total error period during the report interval (minutes)

No. 2
  Evaluation metric (SLO): MTTR (mean time to recovery)
  Definition: Average time required from the occurrence of an error to recovery from the error during the report interval
  Formula: Mean time to recovery (minutes) = B / C
    B = Total error period during the report interval (minutes)
    C = Number of times errors occurred during the report interval

No. 3
  Evaluation metric (SLO): MTBF (mean time between failures)
  Definition: Average time from one error recovery to the occurrence of the next error during the report interval
  Formula: Mean time between failures (minutes) = A / C
    A = Total operational period during the report interval (minutes)
    C = Number of times errors occurred during the report interval
Legend:

Report interval: The total length of time covered by the report, determined from the start time and period that the user enters in the Report area of the Report window.

Operational period: The period from the time normal operation of the monitored service is verified until a stoppage of the monitored service is detected or monitoring is stopped.

Error period: The period from the time a stoppage of the monitored service is detected until normal operation of the monitored service is verified or monitoring is stopped.
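As a minimal sketch of the formulas in Table 3‒8, the following Python code computes the three availability items from a list of periods within one report interval. The Period class, its kind values, and the availability_items function are hypothetical names used only for illustration; they are not part of ITSLM or Performance Management. The sketch assumes that each error period corresponds to one error occurrence (C), treats time during which monitoring was stopped as neither operational nor error time (following the legend above), and returns None for items the table leaves undefined, such as MTTR and MTBF when no errors occurred.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Period:
    """A contiguous period within the report interval (hypothetical model)."""
    kind: str       # "operational", "error", or "unmonitored" (monitoring stopped)
    minutes: float  # length of the period in minutes


def availability_items(
    periods: List[Period],
) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (service availability %, MTTR in minutes, MTBF in minutes).

    A = total operational period, B = total error period,
    C = number of errors (assumption: one error per error period).
    Unmonitored periods count toward neither A nor B.
    """
    a = sum(p.minutes for p in periods if p.kind == "operational")
    b = sum(p.minutes for p in periods if p.kind == "error")
    c = sum(1 for p in periods if p.kind == "error")

    # Service availability (%) = A / (A + B) x 100; undefined if nothing was measured.
    availability = a / (a + b) * 100 if a + b > 0 else None
    # MTTR = B / C and MTBF = A / C; undefined when no errors occurred (assumption).
    mttr = b / c if c > 0 else None
    mtbf = a / c if c > 0 else None
    return availability, mttr, mtbf


# Example: two 30-minute outages and one hour during which monitoring was stopped.
periods = [
    Period("operational", 600),
    Period("error", 30),
    Period("operational", 720),
    Period("unmonitored", 60),
    Period("operational", 60),
    Period("error", 30),
]
print(availability_items(periods))
```

In this example, A = 1,380 minutes, B = 60 minutes, and C = 2, so service availability is about 95.8%, MTTR is 30 minutes, and MTBF is 690 minutes. The unmonitored hour affects none of the three items.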

The following explains, using three example cases, how availability monitoring calculates the availability items.

(3) Reporting criteria

When a monitored service that is subject to availability monitoring is stopped, an error is reported. If either of the following criteria is satisfied, the monitored service is treated as being stopped:

If monitoring is stopped, the measurement results that have been obtained so far are reset. Therefore, if monitoring is stopped while the monitored service is stopped, and an error occurs in a measurement result obtained after monitoring is restarted, the stoppage of the monitored service is reported again.

(4) Criteria for determining that performance has returned to normal

If both of the following criteria are satisfied, the monitored service is determined to have recovered from the stoppage and returned to normal:

If monitoring is stopped, the measurement results that have been obtained so far are reset. Therefore, if monitoring stops while the monitored service is stopped, recovery is not reported even if the measurement result obtained after monitoring is restarted is normal.
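The reset behavior described in (3) and (4) can be modeled as a small state machine. The following Python code is a hypothetical sketch: the AvailabilityNotifier class and its methods are illustrative names, not an ITSLM API. It does not implement the stoppage and recovery criteria themselves; instead, each measurement result is reduced to two booleans stating whether the stoppage criteria and the recovery criteria are satisfied.

```python
from typing import List, Optional


class AvailabilityNotifier:
    """Hypothetical model of stoppage and recovery reporting for one monitored service."""

    def __init__(self) -> None:
        # None means there are no measurement results yet
        # (monitoring has just started or the results were reset).
        self.state: Optional[str] = None  # None, "normal", or "stopped"
        self.notifications: List[str] = []

    def on_result(self, stop_criteria_met: bool, recovery_criteria_met: bool) -> None:
        """Process one measurement result already evaluated against the criteria."""
        if stop_criteria_met:
            # Report a stoppage unless the service is already known to be stopped.
            # After a reset the state is None, so the stoppage is reported again.
            if self.state != "stopped":
                self.notifications.append("stopped")
            self.state = "stopped"
        elif recovery_criteria_met:
            # Report recovery only if a stoppage is known since the last reset.
            if self.state == "stopped":
                self.notifications.append("recovered")
            self.state = "normal"
        # If neither set of criteria is met, the current state is kept.

    def on_monitoring_stopped(self) -> None:
        """Stopping monitoring resets the measurement results obtained so far."""
        self.state = None


n = AvailabilityNotifier()
n.on_result(True, False)   # stoppage detected: reported
n.on_monitoring_stopped()  # results reset while the service is stopped
n.on_result(False, True)   # normal after restart: no recovery reported
print(n.notifications)     # ['stopped']
```

If the first result after the restart had instead satisfied the stoppage criteria, a second "stopped" notification would be appended, which corresponds to the behavior described in (3).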

(5) Supplementary information
