Hitachi

JP1 Version 13 JP1/Integrated Management 3 - Manager Overview and System Design Guide


3.15.1 Performance monitoring function by JP1/IM - Agent

The performance monitoring function consists of the add-on programs Prometheus, Alertmanager, and Exporters, and provides the following two functions:

Performance data and alerts sent to the integrated manager host can be viewed in the integrated operation viewer.

Organization of this subsection

(1) Performance data collection function

The performance data collection function is provided by the Prometheus server, which collects performance data from monitored targets. It has the following two functions:

(a) Scrape function

The scrape function acquires the performance data of monitoring targets via the Exporter.

When the Prometheus server accesses a specific URL of the Exporter, the Exporter retrieves the performance data of the monitoring target and returns it to the Prometheus server. This process is called a scrape.

Scrapes are executed in units of scrape jobs, each of which groups multiple scrapes performed for the same purpose.

When a discovery configuration file is used for UAP monitoring, the corresponding scrape jobs must be defined. Additional settings are also required for the scrape definitions of the log metrics function.

For details on the scrape definitions of the log metrics function, see 1.21.2(10) Setting up scraping definitions in the JP1/Integrated Management 3 - Manager Configuration Guide.

Scrapes are defined in units of scrape jobs. In JP1/IM - Agent, the following scrape definitions, identified by scrape job name, are set by default according to the type of Exporter.

  • jpc_node: Scrape definition for Node exporter

  • jpc_windows: Scrape definition for Windows exporter

  • jpc_blackbox_http: Scrape definition for HTTP/HTTPS monitoring in Blackbox exporter

  • jpc_blackbox_icmp: Scrape definition for ICMP monitoring in Blackbox exporter

  • jpc_cloudwatch: Scrape definition for Yet another cloudwatch exporter

  • jpc_process: Scrape definition for Process exporter

  • jpc_promitor: Scrape definition for Promitor

  • jpc_script: Scrape definition for Script exporter

  • jpc_oracledb: Scrape definition for OracleDB exporter

  • jpc_node_aix: Scrape definition for Node exporter for AIX

  • jpc_web_probe: Scrape definition for Web exporter

  • jpc_vmware: Scrape definition for VMware exporter

  • jpc_hyperv: Scrape definition for Windows exporter (Hyper-V monitoring)

  • jpc_sql: Scrape definition for SQL exporter

To scrape a user-defined Exporter, you must add a scrape definition for each target Exporter.

The metrics that the Prometheus server obtains from an Exporter by scraping depend on the type of Exporter. For details, see the description of the metric definition file for each Exporter in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

When a scrape is performed, the Prometheus server also generates the following metrics in addition to the metrics obtained from the Exporter.

  • up: Indicates "1" when the scrape succeeded and "0" when it failed. It can be used to monitor the operation of the Exporter. A scrape failure can be caused by a stopped host, a stopped Exporter, an Exporter returning a status other than 200, or a communication error.

  • scrape_duration_seconds: Indicates how long the scrape took. Not used in normal operation; used for investigation when a scrape does not finish within the expected time.

  • scrape_samples_post_metric_relabeling: Indicates the number of samples remaining after metric relabeling. Not used in normal operation; used to check the amount of data when building the environment.

  • scrape_samples_scraped: Indicates the number of samples returned by the scraped Exporter. Not used in normal operation; used to check the amount of data when building the environment.

  • scrape_series_added: Indicates the approximate number of newly created series. Not used in normal operation.
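The up metric can be used, for example, in a Prometheus alert rule that detects a stopped Exporter. The following is a minimal sketch in the standard Prometheus alerting-rule format; the group name, alert name, job label value, and duration are illustrative examples, not definitions shipped with JP1/IM - Agent:

```yaml
groups:
  - name: exporter-availability          # illustrative group name
    rules:
      - alert: ExporterDown              # illustrative alert name
        # up is "1" when the scrape succeeded and "0" when it failed
        expr: up{job="jpc_node"} == 0
        for: 5m                          # require 5 minutes of failed scrapes before firing
```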

For details about how scrapes are performed, see 5.24 API for scrape of Exporter used by JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. An Exporter to be scraped must be able to operate as described there.

The scrape definition method is as follows:

  • Scrape definitions are written in units of scrape jobs.

  • Scrape definitions are described in the Prometheus configuration file (jpc_prometheus_server.yml).

  • To edit a scrape definition, download the Prometheus configuration file from the integrated operation viewer, edit it, and then upload it.

The following are the settings related to scrape definitions supported by JP1/IM - Agent.

Table 3‒35: Settings for scrape definitions supported by JP1/IM - Agent

  • Scrape job name (required): Sets the name of the scrape job that Prometheus scrapes. Multiple scrape job names can be specified. The specified scrape job name is set in the metric label as job="scrape-job-name".

  • Scrape destination (required): Sets the specific URL of the Exporter to be scraped. Only Exporters on hosts where JP1/IM - Agent resides can be specified as scrape destinations. The server to be scraped is specified in the URL by host name; "localhost" cannot be used. The total number of scrape destinations specified in all scrape jobs is limited to 100.

  • Scrape parameters (optional): Sets parameters to pass to the Exporter when scraping. The settable contents differ depending on the type of Exporter.

  • Scrape interval (optional): Sets the scrape interval. You can set a scrape interval common to all scrape jobs and a scrape interval for each scrape job; if both are set, the per-job scrape interval takes precedence. The following units can be specified: years, weeks, days, hours, minutes, seconds, or milliseconds.

  • Scrape timeout (optional): Sets a timeout period for scrapes that take a long time. You can set a timeout period common to all scrape jobs and a timeout period for each scrape job; if both are set, the per-job timeout period takes precedence.

  • Relabeling (optional): Deletes unnecessary metrics and customizes labels. By using this feature to drop metrics that are not needed, you can reduce the amount of data sent to JP1/IM - Manager.
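Putting these settings together, a scrape job in the Prometheus configuration file (jpc_prometheus_server.yml) might look like the following sketch, which uses the standard Prometheus scrape_configs syntax. The host name, port, parameter values, and dropped metric pattern are illustrative assumptions, not defaults shipped with JP1/IM - Agent:

```yaml
scrape_configs:
  - job_name: jpc_node              # scrape job name; set in the job label
    scrape_interval: 60s            # per-job interval (overrides the common interval)
    scrape_timeout: 10s             # per-job timeout (overrides the common timeout)
    params:
      collect[]: [cpu, meminfo]     # scrape parameters; settable values depend on the Exporter
    static_configs:
      - targets:
          - agenthost1:20716        # scrape destination: host name ("localhost" cannot be used) and port
    metric_relabel_configs:         # relabeling: drop unneeded metrics to reduce data volume
      - source_labels: [__name__]
        regex: node_example_.*
        action: drop
```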

The result of a scrape is returned by the Exporter to the Prometheus server in the Prometheus text-based format. The Prometheus text-based format is described below:

Text-based format basics

  • Inception: April 2014

  • Supported versions: Prometheus version 0.4.0 or later

  • Transmission format: HTTP

  • Character code: UTF-8 (the line feed code is \n)

  • Content-Type: text/plain; version=0.0.4 (if there is no version value, the format is treated as the latest text format version)

  • Content-Encoding: gzip

  • Advantages: human readable; easy to assemble, especially for minimal cases (no nesting required); parsed line by line (with the exception of type hints and docstrings)

  • Limitations: verbose; because types and docstrings are not part of the syntax, there is little validation of the metric contract; parsing cost

  • Supported metric types: Counter, Gauge, Histogram, Summary, Untyped

More information about Text-based format

The Prometheus text-based format is line oriented. Lines are separated by a line feed character (\n); \r\n is considered invalid. The last line must end with a line feed character. Blank lines are ignored.

Row Format

Within a line, tokens can be separated by any number of spaces or tabs; tokens that would otherwise merge with the previous token must be separated by at least one space or tab. Leading and trailing whitespace is ignored.

Comments, help text, and information

Lines with # as the first non-whitespace character are comments. Such lines are ignored unless the first token after # is HELP or TYPE, in which case they are treated as follows:

If the token is HELP, at least one more token (the metric name) is expected. All remaining tokens are considered the docstring for that metric name. A HELP line can contain any UTF-8 string after the metric name, but the backslash must be escaped as \\ and the line feed character as \n. For any given metric name, there can be only one HELP line.

If the token is TYPE, two more tokens are expected. The first is the metric name, and the second (counter, gauge, histogram, summary, or untyped) defines the type of the metric. There can be only one TYPE line for a given metric name, and it must appear before the first sample for that metric.

If there is no TYPE line for a metric name, the type is set to untyped.

Write a sample (one per line) using the following EBNF:

 metric_name [
    "{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}"
 ] value [ timestamp ]
Sample Syntax
  • metric_name and label_name follow the usual restrictions of the Prometheus expression language.

  • label_value can be any UTF-8 string, but the backslash (\), double quote ("), and line feed must be escaped as \\, \", and \n, respectively.

  • value is a floating-point number as parsed by Go's ParseFloat() function. In addition to standard numbers, NaN, +Inf, and -Inf are valid values: NaN means "not a number", +Inf is positive infinity, and -Inf is negative infinity.

  • timestamp is an int64 (milliseconds since the epoch, 1970-01-01 00:00:00 UTC, excluding leap seconds) as parsed by Go's ParseInt() function, and is optional.
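The sample syntax above can be illustrated with a short Python sketch that parses one sample line. This is a simplified illustration, not part of JP1/IM or Prometheus: the function parse_sample and its regular expressions are our own, and real Prometheus parsers handle escape sequences and edge cases more strictly.

```python
import re

# Simplified patterns derived from the EBNF above.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric_name
    r'(?:\{(?P<labels>[^}]*)\})?'            # optional {label_name="label_value",...}
    r'\s+(?P<value>\S+)'                     # value (float, NaN, +Inf, or -Inf)
    r'(?:\s+(?P<ts>-?\d+))?\s*$'             # optional int64 timestamp in milliseconds
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

def parse_sample(line: str):
    """Parse one text-format sample line into (name, labels, value, timestamp)."""
    m = SAMPLE_RE.match(line)
    if m is None:
        raise ValueError(f"not a valid sample line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group('labels') or ''))
    value = float(m.group('value'))          # like Go ParseFloat(); accepts NaN/+Inf/-Inf
    ts = int(m.group('ts')) if m.group('ts') else None
    return m.group('name'), labels, value, ts

name, labels, value, ts = parse_sample(
    'http_requests_total{method="post",code="200"} 1027 1395066363000')
# name is 'http_requests_total', labels are {'method': 'post', 'code': '200'},
# value is 1027.0, and ts is 1395066363000
```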

Grouping and Sorting

All lines for a given metric must be provided as a single group, with the optional HELP and TYPE lines first (in any order).

Reproducible sorting across repeated expositions is recommended but not required.

Each line must have a unique combination of metric name and labels. If the combination is not unique, the ingestion behavior is undefined.

Histograms and Summaries

Because histograms and summary types are difficult to express in text format, the following rules apply:

  • The sample sum for a summary or histogram named x is given as a separate sample named x_sum.

  • The sample count for a summary or histogram named x is given as a separate sample named x_count.

  • Each quantile of a summary named x is given as a separate sample line with the same name x and a label {quantile="y"}.

  • Each bucket count of a histogram named x is given as a separate sample line named x_bucket with a label {le="y"} (where y is the upper bound of the bucket).

  • A histogram must have a bucket with {le="+Inf"}, whose value must equal the value of x_count.

  • The buckets of a histogram and the quantiles of a summary must appear in increasing numerical order of their le or quantile label values.
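On the querying side, the Prometheus server can aggregate such bucket samples back into an approximate quantile with the histogram_quantile() function. The following PromQL expression is an illustrative example (the metric name matches the sample exposition below; the quantile and time window are arbitrary):

```
# Approximate 95th-percentile request duration over the last 5 minutes,
# computed from the cumulative histogram buckets
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```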

Sample Text-based format

Here is a sample Prometheus metric exposition that contains comments, HELP and TYPE representations, histograms, summaries, and character escaping.

# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027 1395066363000
http_requests_total{method="post",code="400"}    3 1395066363000
 
# Escaping in label values:
msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9
 
# Minimalistic line:
metric_without_timestamp_and_labels 12.47
 
# A weird metric from before the epoch:
something_weird{problem="division by zero"} +Inf -3982045
 
# A histogram, which has a pretty complex representation in the text format:
# HELP http_request_duration_seconds A histogram of the request duration.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.05"} 24054
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.2"} 100392
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320
 
# Finally a summary, which has a complex representation, too:
# HELP rpc_duration_seconds A summary of the RPC duration in seconds.
# TYPE rpc_duration_seconds summary
rpc_duration_seconds{quantile="0.01"} 3102
rpc_duration_seconds{quantile="0.05"} 3272
rpc_duration_seconds{quantile="0.5"} 4773
rpc_duration_seconds{quantile="0.9"} 9001
rpc_duration_seconds{quantile="0.99"} 76656
rpc_duration_seconds_sum 1.7560473e+07
rpc_duration_seconds_count 2693

(b) Ability to obtain monitored operational information

This function acquires operational information (performance data) from monitoring targets. The collection of operational information is performed by programs called "Exporters".

In response to scrape requests sent from the Prometheus server, the Exporter collects operational information from the monitored target and returns the results to the Prometheus server.

Exporters shipped with JP1/IM - Agent are to be scraped only by the Prometheus server of the JP1/IM - Agent on the same host. Do not scrape them from Prometheus servers on other hosts or from user-provided Prometheus servers.

This section describes the functions of each exporter included with JP1/IM - Agent.

(c) Windows exporter (Windows performance data collection capability)

Windows exporter is an exporter that is installed on a monitored Windows host to obtain the operating information of that host.

Windows exporter is installed on the same host as the Prometheus server. Upon a scrape request from the Prometheus server, it collects operational information from the Windows OS of the host and returns it to the Prometheus server.

It can collect operational information related to memory and disks from inside the host, which cannot be collected by monitoring from outside the host (external monitoring by URL or CloudWatch).

In addition, with JP1/IM - Manager and JP1/IM - Agent version 13-01 or later, you can monitor the operational status of services (programs registered as Windows services) on an integrated agent host (Windows) (service monitoring function#).

Note that the service monitoring function cannot be used when JP1/IM - Agent runs inside a container.

#

If you use the service monitoring function in an environment upgraded from version 13-00 to 13-01 or later, you need to configure the settings for service monitoring. The following are the JP1/IM - Manager and JP1/IM - Agent setup instructions:

Where to find instructions for setting up JP1/IM - Manager

See Editing category name definition file for IM management nodes (imdd_category_name.conf) (optional) in 1.19.3(1)(d) Settings of product plugin (for Windows) in the JP1/Integrated Management 3 - Manager Configuration Guide.

Where to find instructions for setting up JP1/IM - Agent

See the instructions for configuring service monitoring in 1.21.2(3)(f) Configuring service monitoring (for Windows) (optional) and 1.21.2(5)(b) Modify metric to Collect (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.

This feature creates an IM management node for each service to be monitored. For details on displaying the tree, see 3.15.6(1)(i) Tree Format. If you configure an alert, a JP1 event is issued when a service stops, and the event is registered with the IM management node corresponding to the stopped service. You can check the past operational status of a service from the service trend display.

■ Main items to be acquired

The main retrieval items of Windows exporter are defined in Windows exporter metric definition file (default) and Windows exporter (service monitoring) metric definition file (default). For details, see Windows exporter metric definition file (metrics_windows_exporter.conf) in Chapter 2. Definition Files and Windows exporter (service monitoring) metric definition file (metrics_windows_exporter_service.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add retrieval items in the metric definition file. The following table lists the metrics that can be specified in the PromQL statements described in the definition file. For details of "Collector" in the table, see the description of collectors after the table.

Metric Name

Collector

What to Get

Label

windows_cache_copy_read_hits_total

cache

Number of copy read requests that hit the cache (cumulative)

instance: instance-identification-string

job: job-name

windows_cache_copy_reads_total

cache

Number of reads from the file system cache page (cumulative)

instance: instance-identification-string

job: job-name

windows_cpu_time_total

cpu

Number of seconds of processor time spent per mode (cumulative)

instance: instance-identification-string

job: job-name

core: coreid

mode: mode#

#

Contains one of the following:

  • "dpc"

  • "idle"

  • "interrupt"

  • "privileged"

  • "user"

windows_cs_physical_memory_bytes

cs

Number of bytes of the physical memory capacity

instance: instance-identification-string

job: job-name

windows_logical_disk_idle_seconds_total

logical_disk

Number of seconds that the disk was idle (cumulative)

instance: instance-identification-string

job: job-name

volume: volume-name

windows_logical_disk_free_bytes

logical_disk

Number of bytes of unused disk space

instance: instance-identification-string

job: job-name

volume: volume-name

windows_logical_disk_read_bytes_total

logical_disk

Number of bytes transferred from disk during the read operation (cumulative)

instance: instance-identification-string

job: job-name

volume: volume-name

windows_logical_disk_read_seconds_total

logical_disk

Number of seconds that the disk was busy for read operations (cumulative)

instance: instance-identification-string

job: job-name

volume: volume-name

windows_logical_disk_reads_total

logical_disk

Number of read operations to disk (cumulative)

instance: instance-identification-string

job: job-name

volume: volume-name

windows_logical_disk_requests_queued

logical_disk

Number of requests queued on disk

instance: instance-identification-string

job: job-name

volume: volume-name

windows_logical_disk_size_bytes

logical_disk

Disk space bytes

instance: instance-identification-string

job: job-name

volume: volume-name

windows_logical_disk_write_bytes_total

logical_disk

Number of bytes transferred to disk during the write operation (cumulative)

instance: instance-identification-string

job: job-name

volume: volume-name

windows_logical_disk_write_seconds_total

logical_disk

Number of seconds that the disk was busy for write operations (cumulative)

instance: instance-identification-string

job: job-name

volume: volume-name

windows_logical_disk_writes_total

logical_disk

Number of disk write operations (cumulative)

instance: instance-identification-string

job: job-name

volume: volume-name

windows_memory_available_bytes

memory

Number of bytes of unused space in physical memory

Note:

The total of zero, free, and standby (cached) areas allocated to a process or immediately available to the system.

instance: instance-identification-string

job: job-name

windows_memory_cache_bytes

memory

Number of bytes of physical memory used for file system caching

instance: instance-identification-string

job: job-name

windows_memory_cache_faults_total

memory

Number of page faults in the file system cache (cumulative)

instance: instance-identification-string

job: job-name

windows_memory_page_faults_total

memory

Number of times a page fault occurred (cumulative)

instance: instance-identification-string

job: job-name

windows_memory_pool_nonpaged_allocs_total

memory

Number of times a nonpageable physical memory region was allocated

instance: instance-identification-string

job: job-name

windows_memory_pool_paged_allocs_total

memory

Number of times you allocated a pageable physical memory region

instance: instance-identification-string

job: job-name

windows_memory_swap_page_operations_total

memory

Number of pages read from or written to disk to resolve hard page faults (cumulative)

instance: instance-identification-string

job: job-name

windows_memory_swap_pages_read_total

memory

Number of pages read from disk to resolve hard page faults (cumulative)

instance: instance-identification-string

job: job-name

windows_memory_swap_pages_written_total

memory

Number of pages written to disk to resolve hard page faults (cumulative)

instance: instance-identification-string

job: job-name

windows_memory_system_cache_resident_bytes

memory

Number of active system file cache bytes in physical memory

instance: instance-identification-string

job: job-name

windows_memory_transition_faults_total

memory

The number of page faults resolved by recovering pages that were in use by other processes sharing the page, pages that were on the modified pages list or standby list, or pages that were written to disk (cumulative)

instance: instance-identification-string

job: job-name

windows_net_bytes_received_total

net

Number of bytes received by the interface (cumulative)

Note:

If the NIC name contains characters other than half-width alphanumeric characters, these characters are converted to underscores and set in the NIC label.

instance: instance-identification-string

job: job-name

device: network-device-name

windows_net_bytes_sent_total

net

Number of bytes sent from the interface (cumulative)

Note:

If the NIC name contains characters other than half-width alphanumeric characters, these characters are converted to underscores and set in the NIC label.

instance: instance-identification-string

job: job-name

device: network-device-name

windows_net_bytes_total

net

Number of bytes received and transmitted by the interface (cumulative)

Note:

If the NIC name contains characters other than half-width alphanumeric characters, these characters are converted to underscores and set in the NIC label.

instance: instance-identification-string

job: job-name

device: network-device-name

windows_net_packets_sent_total

net

Number of packets sent by the interface (cumulative)

Note:

If the NIC name contains characters other than half-width alphanumeric characters, these characters are converted to underscores and set in the NIC label.

instance: instance-identification-string

job: job-name

device: network-device-name

windows_net_packets_received_total

net

Number of packets received by the interface (cumulative)

Note:

If the NIC name contains characters other than half-width alphanumeric characters, these characters are converted to underscores and set in the NIC label.

instance: instance-identification-string

job: job-name

device: network-device-name

windows_system_context_switches_total

system

Number of context switches (cumulative)

instance: instance-identification-string

job: job-name


windows_system_processor_queue_length

system

Number of threads in the processor queue

instance: instance-identification-string

job: job-name


windows_system_system_calls_total

system

Number of times the process called the OS service routine (cumulative)

instance: instance-identification-string

job: job-name

windows_process_start_time

process

Time of process start

instance: instance-identifier-string

job: job-name

process: process-name#

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_cpu_time_total

process

Returns elapsed time that all of the threads of this process used the processor to execute instructions by mode (privileged, user). An instruction is the basic unit of execution in a computer, a thread is the object that executes instructions, and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions is included in this count.

instance: instance-identifier-string

job: job-name

process: process-name#

process_id: process-ID

creating_process_id: creator-process-ID

mode: mode (privileged or user)

windows_process_io_bytes_total

process

Bytes issued to I/O operations in different modes (read, write, other). This property counts all I/O activity generated by the process to include file, network, and device I/Os. Read and write mode includes data operations; other mode includes those that do not involve data, such as control operations.

instance: instance-identifier-string

job: job-name

process: process-name#

process_id: process-ID

creating_process_id: creator-process-ID

mode: mode (read, write, or other)

windows_process_io_operations_total

process

I/O operations issued in different modes (read, write, other). This property counts all I/O activity generated by the process to include file, network, and device I/Os. Read and write mode includes data operations; other mode includes those that do not involve data, such as control operations.

instance: instance-identifier-string

job: job-name

process: process-name#

process_id: process-ID

creating_process_id: creator-process-ID

mode: mode (read, write, or other)

windows_process_page_faults_total

process

Page faults caused by the threads executing in this process. A page fault occurs when a thread refers to a virtual memory page that is not in its working set in main memory. The page might not be fetched from disk if it is on the standby list (and hence already in main memory), or if it is in use by another process with which the page is shared.

instance: instance-identifier-string

job: job-name

process: process-name#

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_page_file_bytes

process

Current number of bytes this process has used in the paging file(s). Paging files are used to store pages of memory used by the process that are not contained in other files. Paging files are shared by all processes, and lack of space in paging files can prevent other processes from allocating memory.

instance: instance-identifier-string

job: job-name

process: process-name#

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_pool_bytes

process

Pool Bytes is the last observed number of bytes in the paged or nonpaged pool. The nonpaged pool is an area of system memory (physical memory used by the operating system) for objects that cannot be written to disk, but must remain in physical memory as long as they are allocated. The paged pool is an area of system memory (physical memory used by the operating system) for objects that can be written to disk when they are not being used. Nonpaged pool bytes is calculated differently than paged pool bytes, so it might not equal the total of paged pool bytes.

instance: instance-identifier-string

job: job-name

process: process-name#

process_id: process-ID

creating_process_id: creator-process-ID

pool: paged or nonpaged

windows_process_priority_base

process

Current base priority of this process. Threads within a process can raise and lower their own base priority relative to the process base priority of the process.

instance: instance-identifier-string

job: job-name

process: process-name#

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_private_bytes

process

Current number of bytes this process has allocated that cannot be shared with other processes.

instance: instance-identifier-string

job: job-name

process: process-name#

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_virtual_bytes

process

Current size, in bytes, of the virtual address space that the process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages. Virtual space is finite and, by using too much, the process can limit its ability to load libraries.

instance: instance-identifier-string

job: job-name

process: process-name#

process_id: process-ID

creating_process_id: creator-process-ID

windows_service_state

service

The state of the service (State)

instance: instance-identifier-string

job: job-name

name: service-name#1

state: service-status#2

#1

Uppercase letters are converted to lowercase.

#2

Contains one of the following:

  • continue pending (pending continuation)

  • pause pending (pending pause)

  • paused (paused)

  • running (running)

  • start pending (pending startup)

  • stop pending (pending stop)

  • stopped (stopped)

  • unknown (unknown)

#

The process name is set without the ".exe" extension.
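As an example of how the cumulative metrics in this table are typically used, CPU utilization can be derived from windows_cpu_time_total with a PromQL expression such as the following. This is an illustrative query, not a definition shipped in the JP1/IM metric definition files:

```
# CPU utilization (%) per instance: per-core rate of non-idle CPU seconds,
# summed over modes and averaged over cores
100 * avg by (instance) (
  sum by (instance, core) (rate(windows_cpu_time_total{mode!="idle"}[5m]))
)
```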

■ Collector

Windows exporter has a built-in collection process called a "collector" for each monitored resource such as CPU and memory.

If you want to add the metrics listed in the table above as acquisition fields, you must enable the collector corresponding to the metric you want to use. You can also disable collectors of metrics that you do not want to collect to suppress unnecessary collection.

Whether each collector is enabled or disabled can be specified with the "--collectors.enabled" option on the Windows exporter command line or with the "collectors.enabled" item in the Windows exporter configuration file (jpc_windows_exporter.yml).

For details about Windows exporter command-line options, see the description of windows_exporter command options in Service definition file (jpc_program-name.service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

For details about Windows exporter configuration file entry "collectors.enabled", see the description of item collectors in Windows exporter configuration file (jpc_windows_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
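As a sketch, enabling a specific set of collectors in the Windows exporter configuration file might look like the following. This assumes the file follows the standard windows_exporter configuration layout; the collector list shown is an illustrative example, not the JP1/IM - Agent default:

```yaml
collectors:
  # comma-separated list of collectors to enable; metrics of disabled
  # collectors are not collected
  enabled: cache,cpu,cs,logical_disk,memory,net,system,process,service
```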

■ Specifying Monitored Services

When using the service monitoring function of Windows exporter, the service to be monitored is specified in the "services-where" field of Windows exporter configuration file (jpc_windows_exporter.yml).

For details about Windows exporter configuration file entry "services-where", see the entry "services-where" in Windows exporter configuration file (jpc_windows_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

The value of the name label of the metrics output by the service collector of Windows exporter is set to the service name. Half-width uppercase characters in the monitored service name are converted to half-width lowercase characters; full-width uppercase characters are converted to full-width lowercase characters.
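As a sketch, specifying monitored services in the "services-where" field might look like the following. This assumes jpc_windows_exporter.yml follows the standard windows_exporter layout, in which "services-where" takes a WMI WHERE clause; the service names shown are illustrative examples:

```yaml
collector:
  service:
    # WHERE clause selecting the Windows services to monitor
    services-where: "Name='W3SVC' OR Name='Spooler'"
```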

- About Monitoring JP1/IM - Agent Services

For the service name of JP1/IM - Agent service, see 10.1 Service of JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Administration Guide. For details about the service name in a logical host environment, see 7.3.6 Newly installing JP1/IM - Agent with integrated agent host (for Windows) in the JP1/Integrated Management 3 - Manager Configuration Guide.

Note that you cannot use the service monitoring function to monitor Prometheus server and Windows exporter services.

(d) Node exporter (Linux performance data collection capability)

Node exporter is an exporter that is installed on a monitored Linux host to obtain the operating information of that host.

The Node exporter is installed on the same host as the Prometheus server, and upon a scrape request from the Prometheus server, it collects operational information from the Linux OS of the host and returns it to the Prometheus server.

It is possible to collect operational information related to memory and disk, which cannot be collected by monitoring from outside the host (external monitoring by URL or CloudWatch), from inside the host.

In addition, with JP1/IM - Manager and JP1/IM - Agent version 13-01 or later, you can monitor the operational status of services (programs registered in systemd) on the integrated agent host (Linux) (service monitoring function#).

Note that you cannot use the service monitoring function when JP1/IM - Agent is run inside a container.

#

If you use the service monitoring function in an environment where the version is upgraded from 13-00 to 13-01 or later, you need to configure the settings to perform service monitoring.

The following are the JP1/IM - Manager and JP1/IM - Agent setup instructions:

Where to find instructions for setting up JP1/IM - Manager

See Editing category name definition file for IM management nodes (imdd_category_name.conf) (optional) in 1.19.3(1)(d) Settings of product plugin (for Windows) in the JP1/Integrated Management 3 - Manager Configuration Guide.

Where to find instructions for setting up JP1/IM - Agent

See the instructions for configuring service monitoring in 2.19.2(3)(f) Configuring service monitor settings (for Linux) (optional) and 2.19.2(5)(b) Change metric to collect (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.

This feature creates an IM management node for each service that you want to monitor. For details on displaying the tree, see 3.15.6(1)(i) Tree Format. If you configure an alert, a JP1 event is issued when the service stops and is registered with the IM management node corresponding to the stopped service. You can check the past operational status of a service from the service trend display.

■ Main items to be acquired

The main retrieval items of Node exporter are defined in the Node exporter metric definition file (default) and the Node exporter (service monitoring) metric definition file (default). For details, see Node exporter metric definition file (metrics_node_exporter.conf) and Node exporter (service monitoring) metric definition file (metrics_node_exporter_service.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add retrieval items to the metric definition file. The following are the metrics that can be specified in PromQL statements described in the definition file. For details of "Collector" in the table, see the description under "Collector" below the table.

Metric Name

Collector

What to Get

Label

node_boot_time_seconds

stat

Last boot time

Note:

Shown in UNIX time, including microseconds.

instance: instance-identification-string

job: job-name

node_context_switches_total

stat

Number of times a context switch has been made (cumulative)

instance: instance-identification-string

job: job-name

node_cpu_seconds_total

cpu

CPU seconds spent in each mode (cumulative)

instance: instance-identification-string

job: job-name

cpu: cpuid

mode: mode#

#

Contains one of the following:

  • user

  • nice

  • system

  • idle

  • iowait

  • irq

  • softirq

  • steal

node_disk_io_now

diskstats

Number of disk I/Os currently in progress

instance: instance-identification-string

job: job-name

device: device-name

node_disk_io_time_seconds_total

diskstats

Seconds spent on disk I/O (cumulative)

instance: instance-identification-string

job: job-name

device: device-name

node_disk_read_bytes_total

diskstats

Number of bytes successfully read from disk (cumulative)

instance: instance-identification-string

job: job-name

device: device-name

node_disk_read_time_seconds_total

diskstats

Seconds taken to read from disk (cumulative value)

instance: instance-identification-string

job: job-name

device: device-name

node_disk_reads_completed_total

diskstats

Number of successfully completed reads from disk (cumulative)

instance: instance-identification-string

job: job-name

device: device-name

node_disk_write_time_seconds_total

diskstats

Seconds taken to write to disk (cumulative value)

instance: instance-identification-string

job: job-name

device: device-name

node_disk_writes_completed_total

diskstats

Number of successfully completed disk writes (cumulative)

instance: instance-identification-string

job: job-name

device: device-name

node_disk_written_bytes_total

diskstats

Number of bytes successfully written to disk (cumulative)

instance: instance-identification-string

job: job-name

device: device-name

node_filesystem_avail_bytes

filesystem

Number of file system bytes available to non-root users

instance: instance-identification-string

job: job-name

fstype: file-system-type

mountpoint: mount-point

node_filesystem_files

filesystem

Number of file nodes in the file system

instance: instance-identification-string

job: job-name

fstype: file-system-type

mountpoint: mount-point

node_filesystem_files_free

filesystem

Number of free file nodes in the file system

instance: instance-identification-string

job: job-name

fstype: file-system-type

mountpoint: mount-point

node_filesystem_free_bytes

filesystem

Number of bytes of free file system space

instance: instance-identification-string

job: job-name

fstype: file-system-type

mountpoint: mount-point

node_filesystem_size_bytes

filesystem

Number of bytes in file system capacity

instance: instance-identification-string

job: job-name

fstype: file-system-type

mountpoint: mount-point

node_intr_total

stat

Number of interrupts handled (cumulative)

instance: instance-identification-string

job: job-name

node_load1

loadavg

One-minute average of the number of jobs in the run queue

instance: instance-identification-string

job: job-name

node_load15

loadavg

15-minute average of the number of jobs in the run queue

instance: instance-identification-string

job: job-name

node_load5

loadavg

5-minute average of the number of jobs in the run queue

instance: instance-identification-string

job: job-name

node_memory_Active_file_bytes

meminfo

Bytes of recently used file cache memory

Note:

The value obtained by converting the Active(file) of /proc/meminfo to bytes.

instance: instance-identification-string

job: job-name

node_memory_Buffers_bytes

meminfo

Number of bytes in the file buffer

Note:

The value of Buffers converted to bytes in /proc/meminfo.

instance: instance-identification-string

job: job-name

node_memory_Cached_bytes

meminfo

Number of bytes in file read cache memory

Note:

This is the value of Cached converted to bytes in /proc/meminfo.

instance: instance-identification-string

job: job-name

node_memory_Inactive_file_bytes

meminfo

Number of bytes of file cache memory that have not been used recently

Note:

The value of the Inactive(file) of /proc/meminfo converted to bytes.

instance: instance-identification-string

job: job-name

node_memory_MemAvailable_bytes

meminfo

The number of bytes of memory available to start a new application without swapping

Note:

The value of MemAvailable in /proc/meminfo converted to bytes.

instance: instance-identification-string

job: job-name

node_memory_MemFree_bytes

meminfo

Number of bytes of free memory

Note:

The value of MemFree in /proc/meminfo converted to bytes.

instance: instance-identification-string

job: job-name

node_memory_MemTotal_bytes

meminfo

Total amount of bytes of memory

Note:

The value of MemTotal converted to bytes in /proc/meminfo.

instance: instance-identification-string

job: job-name

node_memory_SReclaimable_bytes

meminfo

Number of bytes in the Slab cache that can be reclaimed

Note:

SReclaimable in /proc/meminfo converted to bytes.

instance: instance-identification-string

job: job-name

node_memory_SwapFree_bytes

meminfo

Number of bytes of free swap memory space

Note:

The value of SwapFree in /proc/meminfo converted to bytes.

instance: instance-identification-string

job: job-name

node_memory_SwapTotal_bytes

meminfo

Bytes of total swap memory

Note:

This is the value of SwapTotal converted to bytes in /proc/meminfo.

instance: instance-identification-string

job: job-name

node_netstat_Icmp6_InMsgs

netstat

Number of ICMPv6 messages received (cumulative)

instance: instance-identification-string

job: job-name

node_netstat_Icmp_InMsgs

netstat

Number of ICMPv4 messages received (cumulative)

instance: instance-identification-string

job: job-name

node_netstat_Icmp6_OutMsgs

netstat

Number of ICMPv6 messages sent (cumulative)

instance: instance-identification-string

job: job-name

node_netstat_Icmp_OutMsgs

netstat

Number of ICMPv4 messages sent (cumulative)

instance: instance-identification-string

job: job-name

node_netstat_Tcp_InSegs

netstat

Number of TCP packets received (cumulative)

instance: instance-identification-string

job: job-name

node_netstat_Tcp_OutSegs

netstat

Number of TCP packets sent (cumulative)

instance: instance-identification-string

job: job-name

node_netstat_Udp_InDatagrams

netstat

Number of UDP packets received (cumulative)

instance: instance-identification-string

job: job-name

node_netstat_Udp_OutDatagrams

netstat

Number of UDP packets sent (cumulative)

instance: instance-identification-string

job: job-name

node_network_flags

netclass

A numeric value indicating the state of the interface

Note:

The value of /sys/class/net/[iface]/flags, expressed as a decimal number.

instance: instance-identification-string

job: job-name

device: network-device-name

node_network_iface_link

netclass

Interface serial number

Note:

The value of /sys/class/net/[iface]/iflink.

instance: instance-identification-string

job: job-name

device: network-device-name

node_network_mtu_bytes

netclass

Interface MTU value

Note:

The value of /sys/class/net/[iface]/mtu.

instance: instance-identification-string

job: job-name

device: network-device-name

node_network_receive_bytes_total

netdev

Number of bytes received by the network device (cumulative value)

instance: instance-identification-string

job: job-name

device: network-device-name

node_network_receive_errs_total

netdev

Number of network device receive errors (cumulative)

instance: instance-identification-string

job: job-name

device: network-device-name

node_network_receive_packets_total

netdev

Number of packets received by network devices (cumulative)

instance: instance-identification-string

job: job-name

device: network-device-name

node_network_transmit_bytes_total

netdev

Number of bytes sent by the network device (cumulative value)

instance: instance-identification-string

job: job-name

device: network-device-name

node_network_transmit_colls_total

netdev

Number of transmit collisions for network devices (cumulative)

instance: instance-identification-string

job: job-name

device: network-device-name

node_network_transmit_errs_total

netdev

Number of transmission errors for network devices (cumulative)

instance: instance-identification-string

job: job-name

device: network-device-name

node_network_transmit_packets_total

netdev

Number of packets sent by network devices (cumulative)

instance: instance-identification-string

job: job-name

device: network-device-name

node_time_seconds

time

Seconds of system time since the epoch (1970)

instance: instance-identification-string

job: job-name

node_uname_info

uname

System information obtained by the uname system call

instance: instance-identification-string

job: job-name

domainname: NIS-and-YP-domain-names

machine: hardware-identifiers

nodename: machine-name-on-some-implementation-defined-network

release: operating-system-release-number (e.g. "2.6.28")

sysname: the-name-of-the-OS (e.g. "Linux")

version: operating-system-version

node_vmstat_pswpin

vmstat

Number of page swap-ins (cumulative)

Note:

The value of the pswpin in /proc/vmstat.

instance: instance-identification-string

job: job-name

node_vmstat_pswpout

vmstat

Number of page swap-outs (cumulative)

Note:

The value of pswpout in /proc/vmstat.

instance: instance-identification-string

job: job-name

node_systemd_unit_state

systemd

The state of the systemd unit.

instance: instance-identification-string

job: job-name

name: unit-file-name

state: service-status#1

type: how-to-launch-a-process#2

#1

Contains one of the following:

  • activating (during startup)

  • active (running)

  • deactivating (stopping)

  • failed (failed to execute)

  • inactive (stopped)

#2

Contains the Type value of the unit file.
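Many of the metrics above are cumulative counters (for example, node_cpu_seconds_total). Monitoring turns such counters into rates by differencing two scrapes. The following is a minimal sketch of that calculation, not the product's PromQL; the sample values are made up:

```python
# Sketch (not the product's PromQL): deriving CPU utilization between two
# scrapes from a cumulative counter such as node_cpu_seconds_total.
# Sample values below are made-up data for one CPU.
def cpu_busy_percent(prev, curr):
    """prev/curr map mode -> cumulative seconds at two successive scrapes."""
    total = sum(curr[m] - prev[m] for m in curr)
    idle = curr["idle"] - prev["idle"]
    return 100.0 * (total - idle) / total

prev = {"user": 100.0, "system": 50.0, "idle": 850.0}
curr = {"user": 106.0, "system": 52.0, "idle": 902.0}
print(round(cpu_busy_percent(prev, curr), 1))  # 13.3
```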

■ Collector

The Node exporter has a built-in collection process called a "collector" for each monitored resource such as CPU and memory.

If you want to add the metrics listed in the table above as acquisition fields, you must enable the collector corresponding to the metric you want to use. You can also disable collectors of metrics that you do not want to collect to suppress unnecessary collection.

Per-collector enable/disable can be specified in the Node exporter command line options. Specify the collector to enable with the "--collector.collector-name" option and the collector to disable with the "--no-collector.collector-name" option.
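The option pattern above can be sketched as follows (the collector names systemd and wifi are illustrative examples only; see the unit definition file reference for the options actually supported):

```python
# Sketch of composing Node exporter command-line options per the rule above:
# "--collector.<name>" enables a collector, "--no-collector.<name>" disables
# one. Collector names here (systemd, wifi) are illustrative.
def collector_flags(enable, disable):
    return (["--collector." + c for c in enable]
            + ["--no-collector." + c for c in disable])

flags = collector_flags(["systemd"], ["wifi"])
print(flags)  # ['--collector.systemd', '--no-collector.wifi']
```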

For details about Node exporter command-line options, see the description of node_exporter command options in Unit definition file (jpc_program-name.service) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Specifying monitored services

When using the service monitoring function of Node exporter, the services to be monitored are specified in the "--collector.systemd.unit-include" option in the Node exporter unit definition file (jpc_node_exporter.service). Performance data is collected for a service specified in this file that meets one of the following conditions:

  • Automatic start of monitored services is enabled (running systemctl enable)

  • Automatic startup of monitored services is disabled, but the status is active

Performance data for a service whose automatic start is disabled is not collected while the service is stopped. Therefore, if you want to monitor a service that has automatic start disabled and is currently stopped, start the service so that its performance data is collected before creating the IM management node tree.
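The two collection conditions above can be expressed as a simple predicate (a sketch of the stated rule, not product code):

```python
# Sketch of the collection condition above: a service's performance data is
# collected if its automatic start is enabled, or if automatic start is
# disabled but the unit is currently active.
def is_collected(auto_start_enabled, state):
    return auto_start_enabled or state == "active"

print(is_collected(False, "active"))    # True
print(is_collected(False, "inactive"))  # False
```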

For unit definition file, see the description in item "--collector.systemd.unit-include" in "node_exporter command options" in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

- About monitoring JP1/IM - Agent services

For unit definition file name of JP1/IM - Agent services, see 10.1 Service of JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Administration Guide. For unit definition file name in a logical host environment, see 8.3.6 Newly installing JP1/IM - Agent with integrated agent host (for UNIX) in the JP1/Integrated Management 3 - Manager Configuration Guide.

Note that you cannot use the service monitoring function to monitor Prometheus server and Node exporter services.

(e) Process exporter (Linux process data collection capability)

Process exporter, built into a monitored Linux host, collects operating information of processes running on that host.

Installed in the same host as Prometheus server, Process exporter collects operating information of the processes from the Linux OS on the host when triggered by scraping requests from Prometheus server, and returns it to the server.

Process exporter allows you to collect process-related operating information, which cannot be obtained through monitoring from outside the host (such as synthetic monitoring with URLs or CloudWatch), from within the host.

■ Key metric items

The key Process exporter metric items are defined in the Process exporter metric definition file (initial status). For details, see Process exporter metric definition file (metrics_process_exporter.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add more metric items to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.

Metric name

Data to be obtained

Label

namedprocess_namegroup_num_procs

Number of processes in this group.

instance: instance-identifier-string

job: job-name

groupname: group-name#

namedprocess_namegroup_cpu_seconds_total

CPU usage based on /proc/[pid]/stat fields utime(14) and stime(15) i.e. user and system time.

instance: instance-identifier-string

job: job-name

groupname: group-name#

mode: user or system

namedprocess_namegroup_read_bytes_total

Bytes read based on /proc/[pid]/io field read_bytes. Because /proc/[pid]/io is readable only by the process's user, run process-exporter either as that user or as root to obtain these values. Otherwise, these values cannot be read and the metric is constantly 0.

instance: instance-identifier-string

job: job-name

groupname: group-name#

namedprocess_namegroup_write_bytes_total

Bytes written based on /proc/[pid]/io field write_bytes.

instance: instance-identifier-string

job: job-name

groupname: group-name#

namedprocess_namegroup_major_page_faults_total

Number of major page faults based on /proc/[pid]/stat field majflt(12).

instance: instance-identifier-string

job: job-name

groupname: group-name#

namedprocess_namegroup_minor_page_faults_total

Number of minor page faults based on /proc/[pid]/stat field minflt(10).

instance: instance-identifier-string

job: job-name

groupname: group-name#

namedprocess_namegroup_context_switches_total

Number of context switches based on /proc/[pid]/status fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. The extra label ctxswitchtype can have two values: voluntary and nonvoluntary.

instance: instance-identifier-string

job: job-name

groupname: group-name#

ctxswitchtype: voluntary or nonvoluntary

namedprocess_namegroup_memory_bytes

Number of bytes of memory used. The extra label memtype can have three values:

  • resident: Field rss(24) from /proc/[pid]/stat. This is just the pages which count toward text, data, or stack space. This does not include pages which have not been demand-loaded in, or which are swapped out.

  • virtual: Field vsize(23) from /proc/[pid]/stat, virtual memory size.

  • swapped: Field VmSwap from /proc/[pid]/status, translated from KB to bytes.

If gathering smaps file is enabled, two additional values for memtype are added:

  • proportionalResident: Sum of Pss fields from /proc/[pid]/smaps

  • proportionalSwapped: Sum of SwapPss fields from /proc/[pid]/smaps

instance: instance-identifier-string

job: job-name

groupname: group-name#

memtype: resident, virtual, swapped, proportionalResident, or proportionalSwapped

namedprocess_namegroup_open_filedesc

Number of file descriptors, based on counting how many entries are in the directory /proc/[pid]/fd.

instance: instance-identifier-string

job: job-name

groupname: group-name#

namedprocess_namegroup_worst_fd_ratio

Worst ratio of open filedescs to filedesc limit, amongst all the procs in the group. The limit is the fd soft limit based on /proc/[pid]/limits.

instance: instance-identifier-string

job: job-name

groupname: group-name#

namedprocess_namegroup_oldest_start_time_seconds

Epoch time (seconds since 1970/1/1) at which the oldest process in the group started. This is derived from field starttime(22) from /proc/[pid]/stat, added to boot time to make it relative to epoch.

instance: instance-identifier-string

job: job-name

groupname: group-name#

namedprocess_namegroup_num_threads

Sum of number of threads of all process in the group. Based on field num_threads(20) from /proc/[pid]/stat.

instance: instance-identifier-string

job: job-name

groupname: group-name#

namedprocess_namegroup_states

Number of threads in the group in each of various states, based on the field state(3) from /proc/[pid]/stat.

The extra label state can have these values: Running, Sleeping, Waiting, Zombie, Other.

instance: instance-identifier-string

job: job-name

groupname: group-name#

state: Running, Sleeping, Waiting, Zombie, or Other

namedprocess_namegroup_thread_count

Number of threads in this thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name#

threadname: thread-name

namedprocess_namegroup_thread_cpu_seconds_total

Same as cpu_user_seconds_total and cpu_system_seconds_total, but broken down per-thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name#

threadname: thread-name

mode: user or system

namedprocess_namegroup_thread_io_bytes_total

Same as read_bytes_total and write_bytes_total, but broken down per-thread subgroup. Unlike read_bytes_total/write_bytes_total, the label iomode is used to distinguish between read and write bytes.

instance: instance-identifier-string

job: job-name

groupname: group-name#

threadname: thread-name

iomode: read or write

namedprocess_namegroup_thread_major_page_faults_total

Same as major_page_faults_total, but broken down per-thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name#

namedprocess_namegroup_thread_minor_page_faults_total

Same as minor_page_faults_total, but broken down per-thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name#

namedprocess_namegroup_thread_context_switches_total

Same as context_switches_total, but broken down per-thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name#

#

The group-name is a name that uniquely identifies the collected performance value. Its value is stored according to what the user sets in the item "name" of the Process exporter configuration file (jpc_process_exporter.yml).
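The way a groupname aggregates per-process values can be sketched as follows: processes sharing a group key (here, the command name, as one possible "name" setting) are summed into a single time series, as in namedprocess_namegroup_cpu_seconds_total. The sample data is made up:

```python
# Sketch: aggregating per-process values into one series per groupname.
# Grouping by command name is one possible "name" configuration; the
# process names and CPU values below are made-up sample data.
from collections import defaultdict

def group_cpu_seconds(procs):
    """procs: list of (command_name, cpu_seconds). Returns groupname -> total."""
    totals = defaultdict(float)
    for name, cpu in procs:
        totals[name] += cpu
    return dict(totals)

sample = [("httpd", 1.5), ("httpd", 2.5), ("sshd", 0.5)]
print(group_cpu_seconds(sample))  # {'httpd': 4.0, 'sshd': 0.5}
```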

Important
  • Processes whose name contains multi-byte characters cannot be monitored.

  • Process exporter continues to output information for processes that it has collected at least once, even after those processes stop running. Therefore, if Process exporter is configured to collect information based on PIDs, new time-series data is added every time a process is restarted and its PID changes, resulting in large amounts of unnecessary data.

    Furthermore, use of PIDs is not recommended in the underlying open source software (OSS), so version 13-00 of this product is configured by default not to include PID information in groupname. If you want to manage processes that share the same command line separately, we recommend operational measures such as changing the order of arguments or using PIDs (in the latter case, periodic restarts are needed to prevent collected information from accumulating continuously).

    Note that information collected by Windows exporter is different from what Process exporter collects, because Windows exporter collects the PID information.

  • When Process exporter monitors a monitored process, by default it monitors the child processes of the monitored process and acquires the operational data including the child processes.

    To avoid including child processes, unit definition file of Process exporter must be edited.

    For details, see 2.19.2(6)(d) Setting that excludes child processes from monitoring in the JP1/Integrated Management 3 - Manager Configuration Guide.
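The default behavior described above, where a monitored process's value includes all of its descendants, can be sketched as a sum over a process subtree (the PIDs and values below are made-up sample data):

```python
# Sketch of the default behavior above: the value reported for a monitored
# process includes every descendant (child) process. The process table
# (child pid -> parent pid) and the per-process values are sample data.
def subtree_total(pid, parent_of, value_of):
    """Sum value_of over pid and all of its descendants."""
    total = value_of[pid]
    for child, parent in parent_of.items():
        if parent == pid:
            total += subtree_total(child, parent_of, value_of)
    return total

parent_of = {101: 100, 102: 100, 103: 101}  # child pid -> parent pid
value_of = {100: 1.0, 101: 2.0, 102: 3.0, 103: 4.0}
print(subtree_total(100, parent_of, value_of))  # 10.0
```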

(f) Node exporter for AIX (AIX performance data collection capability)

Node exporter for AIX is an exporter that is embedded in a monitored AIX host to obtain operating information of that host.

Node exporter for AIX is installed on a host other than the Prometheus server host. When it receives a scrape request from the Prometheus server, it collects operating information from the AIX OS of its own host and returns it to the Prometheus server.

It can collect operating information on memory and disks from inside the host, which cannot be collected by monitoring from outside the host (external monitoring by URL or CloudWatch).

■ Prerequisites

It is a prerequisite that the ports used by Node exporter for AIX are protected by firewalls, networking configurations, and so on, so that they are not accessed by anything other than Prometheus server of JP1/IM - Agent.

For the ports used by Node exporter for AIX, see the explanation of node_exporter_aix command options in 10.4.2(1) Enabling registering services in the JP1/Integrated Management 3 - Manager Administration Guide.

■ Conditions to be monitored

See the Release Notes for the OSs supported on the host where Node exporter for AIX is installed.

WPAR is not supported.

Running multiple instances of Node exporter for AIX on the same host is not supported, even when they are started on both a physical host and a logical host.

The logical host configuration of the monitored AIX hosts is supported only if the following conditions are met:

  • The host name of the monitored AIX host can be uniquely resolved from the Prometheus server.

Note: If more than one IP address is assigned to the monitored AIX host, Node exporter for AIX can be accessed via any of those IP addresses.

For the upper limit of Node exporter for AIX that can be monitored by one Prometheus server, refer to the limit value list in JP1/IM - Agent of Appendix D.1 Limits when using the Intelligent Integrated Management Base.

■ Main items to be acquired

The main retrieval items for Node exporter for AIX shipped with JP1/IM - Agent are defined in the Node exporter for AIX metric definition file (default). For details, see Node exporter for AIX metric definition file (metrics_node_exporter_aix.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add retrieval items to the metric definition file. The following table lists the metrics that can be specified in PromQL expressions in the definition file:

Metric Name

Command-line option for retrieval

Contents to be acquired

Label

Data Source

node_context_switches

-C

Total number of context switches.

(cumulative value)

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_cpu_total func

pswitch of perfstat_cpu_t structure

node_cpu

-C

Seconds the cpus spent in each mode.

(cumulative value)

instance: instance-identity-string

job: job-name

cpu: cpuid

mode: mode (idle, sys, user, or wait)

Get by perfstat_cpu func

perfstat_cpu_t structure

aix_diskpath_wblks

-D

Blocks written via the path

cpupool_id=physical-processor-shared-pooling-ID

diskpath=disk-path-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_diskpath func

wblks of perfstat_diskpath_t structure

aix_diskpath_rblks

-D

Blocks read via the path

cpupool_id=physical-processor-shared-pooling-ID

diskpath=disk-path-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_diskpath func

rblks of perfstat_diskpath_t structure

aix_disk_rserv

-d

Read or receive service time

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

rserv of perfstat_disk_t structure

aix_disk_rblks

-d

Number of blocks read from disk

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

rblks of perfstat_disk_t structures

aix_disk_wserv

-d

Write or send service time

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

wserv of perfstat_disk_t structure

aix_disk_wblks

-d

Number of blocks written to disk

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

wblks of perfstat_disk_t structure

aix_disk_time

-d

Amount of time disk is active

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

time of perfstat_disk_t structure

aix_disk_xrate

-d

Number of transfers from disk

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

xrate of perfstat_disk_t structure

aix_disk_xfers

-d

Number of transfers to/from disk

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

xfers of perfstat_disk_t structure

node_filesystem_avail_bytes

-f

Filesystem space available to non-root users in bytes.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

device=device-name

fstype=file-system-type

mountpoint=mount-point

Get by stat_filesystems func

avail_bytes of filesystem structure

node_filesystem_files

-f

Filesystem total file nodes.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

device=device-name

fstype=file-system-type

mountpoint=mount-point

Get by stat_filesystems func

files of filesystem structure

node_filesystem_files_free

-f

Filesystem total free file nodes.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

device=device-name

fstype=file-system-type

mountpoint=mount-point

Get by stat_filesystems func

files_free of filesystem structure

node_filesystem_free_bytes

-f

Filesystem free space in bytes.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

device=device-name

fstype=file-system-type

mountpoint=mount-point

Get by stat_filesystems func

free_bytes of filesystem structure

node_filesystem_size_bytes

-f

Filesystem size in bytes.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

device=device-name

fstype=file-system-type

mountpoint=mount-point

Get by stat_filesystems func

size_bytes of filesystem structure

node_intr

-C

Total number of interrupts serviced.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_cpu_total func

decrintrs of perfstat_cpu_total_t structure

mpcsintrs of perfstat_cpu_total_t structure

devintrs of perfstat_cpu_total_t structure

softintrs of perfstat_cpu_total_t structure

node_load1

-C

1m load average.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_cpu_total func

loadavg[0] of perfstat_cpu_total_t structure

node_load5

-C

5m load average.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_cpu_total func

loadavg[1] of perfstat_cpu_total_t structure

node_load15

-C

15m load average.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_cpu_total func

loadavg[2] of perfstat_cpu_total_t structure

aix_memory_real_avail

-m

Number of pages (in 4KB pages) of memory available without paging out working segments

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

real_avail of perfstat_memory_total_t structure

aix_memory_real_free

-m

Free real memory (in 4 KB pages).

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

real_free of perfstat_memory_total_t structure

aix_memory_real_inuse

-m

Real memory which is in use (in 4KB pages)

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

real_inuse of perfstat_memory_total_t structure

aix_memory_real_total

-m

Total real memory (in 4 KB pages).

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

real_total of perfstat_memory_total_t structure

aix_netinterface_mtu

-i

Network frame size

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=net-interface-name

Get by perfstat_netinterface func

mtu of perfstat_netinterface_t structure

aix_netinterface_ibytes

-i

Number of bytes received on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=net-interface-name

Get by perfstat_netinterface func

ibytes of perfstat_netinterface_t structure

aix_netinterface_ierrors

-i

Number of input errors on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=net-interface-name

Get by perfstat_netinterface func

ierrors of perfstat_netinterface_t structure

aix_netinterface_ipackets

-i

Number of packets received on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=net-interface-name

Get by perfstat_netinterface func

ipackets of perfstat_netinterface_t structure

aix_netinterface_obytes

-i

Number of bytes sent on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=net-interface-name

Get by perfstat_netinterface func

obytes of perfstat_netinterface_t structure

aix_netinterface_collisions

-i

Number of collisions on csma interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=net-interface-name

Get by perfstat_netinterface func

collisions of perfstat_netinterface_t structure

aix_netinterface_oerrors

-i

Number of output errors on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=net-interface-name

Get by perfstat_netinterface func

oerrors of perfstat_netinterface_t structure

aix_netinterface_opackets

-i

Number of packets sent on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=net-interface-name

Get by perfstat_netinterface func

opackets of perfstat_netinterface_t structure

aix_memory_pgspins

-m

Number of page ins from paging space

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

pgspins of perfstat_memory_total_t structure

aix_memory_pgspouts

-m

Number of pages paged out from paging space

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

pgspouts of perfstat_memory_total_t structure

Node exporter for AIX collects performance data for each monitored resource, such as CPU and memory. You can enable or disable collection for each resource that you want to monitor by using the Node exporter for AIX command-line options.

For Node exporter for AIX command-line options, see the description of node_exporter_aix command options in 10.4.2(1) Enabling registering services in the JP1/Integrated Management 3 - Manager Administration Guide.

Use Script exporter to collect information about processes. For details on how to configure the settings, see 1.23.2(4)(e) Monitoring processes on monitoring hosts (AIX) (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.

Use the JP1/Base log file trap feature to monitor the log files of the monitored AIX hosts.

■ Notes on logging Node exporter for AIX

The Node exporter for AIX log is output to the OS system log. Therefore, the output destination depends on the OS system log settings. For details on changing the output destination of the OS system log for Node exporter for AIX logging, see 1.23.2(4)(f) Changing the log destination of Node exporter for AIX (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.

■ Precautions When Using SMT or Micro-Partitioning

In an SMT (simultaneous multithreading) or Micro-Partitioning deployment, the calculation of the CPU utilization (cpu_used_rate) metric of Node exporter for AIX does not include physical CPU quotas, whereas the CPU utilization displayed by the sar command does.

Therefore, the CPU utilization (cpu_used_rate) of Node exporter for AIX might show a lower value than the sar command output.

(g) Yet another cloudwatch exporter (Amazon CloudWatch performance data collection capability)

Yet another cloudwatch exporter is an exporter, included in the integrated agent, that collects operating information of AWS services in the cloud through Amazon CloudWatch.

Yet another cloudwatch exporter is installed on the same host as the Prometheus server. Upon a scrape request from the Prometheus server, it collects CloudWatch metrics obtained via the SDK provided by AWS (AWS SDK)# and returns them to the Prometheus server.

#

The SDK provided by Amazon Web Services (AWS). Yet another cloudwatch exporter uses the Go language version, AWS SDK for Go (V1). CloudWatch monitoring requires that Amazon CloudWatch support the AWS SDK for Go (V1).

You can use it to monitor services on which Node exporter or Windows exporter cannot be installed.

Restrictions

To monitor with Yet another cloudwatch exporter (Amazon CloudWatch performance data collection capability), you must be able to connect to the AWS Security Token Service (STS) global endpoint. You cannot use a regional endpoint with the Yet another cloudwatch exporter shipped with JP1/IM - Agent.

■ Main items to be acquired

The main retrieval items of Yet another cloudwatch exporter are defined in Yet another cloudwatch exporter metric definition file (default). For details, see Yet another cloudwatch exporter metric definition file (metrics_ya_cloudwatch_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ CloudWatch metrics you can collect

You can collect metrics of the AWS namespaces that are supported for monitoring by the Yet another cloudwatch exporter of JP1/IM - Agent, as listed in 3.15.6(1)(k) Creating an IM Management Node for Yet another cloudwatch exporter.

Specify the metrics to collect by describing the AWS service name and CloudWatch metric name in the Yet another Cloudwatch Exporter configuration file (jpc_ya_cloudwatch_exporter.yml).

The following is an example of the Yet another cloudwatch exporter configuration file for collecting the CloudWatch metrics CPUUtilization and DiskReadBytes of the AWS/EC2 service.

discovery:
  exportedTagsOnMetrics:
    ec2:
      - jp1_pc_nodelabel
  jobs:
  - type: ec2
    regions:
      - ap-northeast-1
    period: 60
    length: 300
    delay: 60
    nilToZero: true
    searchTags:
      - key: jp1_pc_nodelabel
        value: .*
    metrics:
      - name: CPUUtilization
        statistics:
        - Maximum
      - name: DiskReadBytes
        statistics:
        - Maximum

For details about the contents of the Yet another cloudwatch exporter configuration file, see Yet another cloudwatch exporter configuration file (jpc_ya_cloudwatch_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can also add new metrics to the Yet another cloudwatch exporter metrics definition file using the metrics you set in the Yet another cloudwatch exporter configuration file.

The metrics and labels specified in the PromQL statement described in the definition file conform to the following naming conventions:

- Naming conventions for Exporter metrics

Yet another cloudwatch exporter automatically converts a CloudWatch metric name into an exporter metric name according to the following rules, and treats the converted name as the exporter's metric name. Metrics specified in a PromQL statement are described using the exporter's metric name.

"aws_"#1+name-space#2+"_"+CloudWatch-metric#2+"_"+statistic-type#2

#1

Appended if the namespace does not begin with "aws_".

#2

Indicates the name you set in the Yet another cloudwatch exporter configuration file (jpc_ya_cloudwatch_exporter.yml). It is converted by the following rules:

  • It is converted from camel case notation to snake case notation.

    CamelCase is a notation that capitalizes word breaks, such as "CamelCase" or "camelCase."

    Snakecase is a notation that separates words with "_", such as "snake_case".

  • The following symbols are converted to "_".

    whitespace, comma, tab, /, \, half-width period, -, :, =, full-width left double quote, @, <, >

  • "%" is converted to "_percent".
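As a minimal Python sketch (illustrative only, not the exporter's actual implementation), the conversion rules above can be expressed as follows:

```python
import re

def to_prom_metric(namespace: str, metric: str, statistic: str) -> str:
    """Sketch of the documented name-conversion rules."""
    def convert(name: str) -> str:
        # Camel case -> snake case ("DiskReadBytes" -> "Disk_Read_Bytes")
        name = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name)
        # Listed symbols (whitespace, comma, tab, /, \, ., -, :, =, etc.) -> "_"
        name = re.sub(r'[ ,\t/\\.\-:=@<>"\u201c]', '_', name)
        # "%" -> "_percent"
        name = name.replace('%', '_percent')
        return name.lower()

    name = '_'.join(convert(part) for part in (namespace, metric, statistic))
    # "aws_" is prepended when the converted name does not already begin with it
    return name if name.startswith('aws_') else 'aws_' + name

# For example, namespace "AWS/EC2", metric "CPUUtilization", statistic "Maximum"
# yields "aws_ec2_cpuutilization_maximum".
```

A metric such as DiskReadBytes, which contains camel-case word breaks, converts to aws_ec2_disk_read_bytes_maximum under the same rules.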

- Exporter label naming conventions

Yet another cloudwatch exporter automatically converts a CloudWatch dimension or tag name into an exporter label name according to the following rules, and treats the converted name as the exporter's label name. Labels specified in a PromQL statement are described using the exporter's label name.

  • For dimensions

    "dimension"+"_"+dimensions_name#

  • For tags

    "tag"+"_"+tag_name#

  • For custom tags

    "custom_tag"+"_"+custom_tag_name#

#

Indicates the name you set in the Yet another cloudwatch exporter configuration file (jpc_ya_cloudwatch_exporter.yml).

■ About policies for IAM users in your AWS account

To connect to AWS CloudWatch, you must create a policy with the following permissions and assign it to an IAM user.

"tag:GetResources",
"cloudwatch:GetMetricData",
"cloudwatch:GetMetricStatistics",
"cloudwatch:ListMetrics"
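For reference, an IAM policy document granting these permissions might look like the following (a minimal sketch in standard IAM JSON; the Resource scope is an assumption and should be adjusted to your environment):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "tag:GetResources",
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
```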

For details on how to set the JSON format information, see 1.21.2(7)(b) Modify Setup to connect to CloudWatch (for Windows) (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide. (The reference for Linux is the same.)

■ Environment-variable HTTPS_PROXY

This environment variable is specified when connecting to CloudWatch from Yet another cloudwatch exporter through a proxy. Only an http URL can be set in the environment variable HTTPS_PROXY. Note that the only supported authentication method is Basic authentication.

You can set the environment variable HTTPS_PROXY to connect to AWS CloudWatch through a proxy. The following shows an example configuration.

HTTPS_PROXY=http://username:password@proxy.example.com:5678

■ How to handle monitoring targets JP1/IM - Agent does not support

If you have a product or metric that JP1/IM - Agent cannot monitor, you must retrieve it by other means, for example, by using a user-defined Exporter.

(h) Promitor (Azure Monitor performance data collection capability)

Promitor, included in the integrated agent, collects operating information of Azure services on the cloud environment through Azure Monitor and Azure Resource Graph.

Promitor consists of Promitor Scraper and Promitor Resource Discovery. Promitor Scraper collects metrics on resources from Azure Monitor according to schedule settings and returns them.

Metrics can be collected from target resources in two ways: one method is to specify the target resources separately in a configuration file and the other is to detect the resources automatically. If you choose to detect them automatically, Promitor Resource Discovery detects resources in a tenant through Azure Resource Graph, and based on the results, Promitor Scraper collects metric information.

In addition, both Promitor Scraper and Promitor Resource Discovery require two configuration files for each of them. One configuration file is to define runtime settings, such as authentication information, and the other is to define metric information to be collected.

■ Key metric items

The key Promitor metric items are defined in the Promitor metric definition file (initial status). For details, see the description under Promitor metric definition file (metrics_promitor.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Metrics you can collect

Promitor can collect metrics of the services listed in the table below as monitoring targets.

You specify metrics you want to collect in the Promitor Scraper configuration file (metrics-declaration.yaml).

If you want to change the metrics specified in the Promitor Scraper configuration file, see Change monitoring metrics (optional) under 1.21.2(8)(d) Configuring scraping targets (required) in Set up of Promitor in the JP1/Integrated Management 3 - Manager Configuration Guide.

You can also add new metrics to the Promitor metric definition file, based on the metrics specified in the Promitor Scraper configuration file. Metrics defined in the Promitor Scraper configuration file can be specified in the PromQL statements written in the definition file.

Table 3‒36: Services supported as monitoring targets by Promitor

Promitor resourceType name

Azure Monitor namespace

Automatic discovery support

VirtualMachine

Microsoft.Compute/virtualMachines

Y

FunctionApp

Microsoft.Web/sites

Y

ContainerInstance

Microsoft.ContainerInstance/containerGroups

--

KubernetesService

Microsoft.ContainerService/managedClusters

Y

FileStorage

Microsoft.Storage/storageAccounts/fileServices

--

BlobStorage

Microsoft.Storage/storageAccounts/blobServices

--

ServiceBusNamespace

Microsoft.ServiceBus/namespaces

Y

CosmosDb

Microsoft.DocumentDB/databaseAccounts

Y

SqlDatabase

Microsoft.Sql/servers/databases

Y

SqlServer

Microsoft.Sql/servers/databases

Microsoft.Sql/servers/elasticPools

--

SqlManagedInstance

Microsoft.Sql/managedInstances

Y

SqlElasticPool

Microsoft.Sql/servers/elasticPools

Y

LogicApp

Microsoft.Logic/workflows

Y

Legend:

Y: Automatic discovery is supported.

--: Automatic discovery is not supported.

■ Checking how Azure SDKs used by Promitor are supported

Promitor employs the Azure SDK for .NET. The end of support for an Azure SDK is announced 12 months in advance. For details on the lifecycle of the Azure SDK, see the Lifecycle FAQ at the following website:

https://learn.microsoft.com/ja-jp/lifecycle/faq/azure#azure-sdk-----------

You can find the lifecycles of individual versions of the Azure SDK libraries on the following website:

https://azure.github.io/azure-sdk/releases/latest/all/dotnet.html

■ Credentials required for account information

Promitor can connect to Azure through the service principal method or the managed ID method. For details on the credentials assigned to the service principal and managed ID, see (a) Configuring the settings for establishing a connection to Azure (required) in the JP1/Integrated Management 3 - Manager Configuration Guide 1.21.2(8) Set up of Promitor.

(i) Blackbox exporter (Synthetic metric collector)

Blackbox exporter is an exporter that sends simulated requests to monitored Internet services on the network and obtains operating information from the responses. The supported communication protocols are HTTP, HTTPS, and ICMP.

When the Blackbox exporter receives a scrape request from the Prometheus server, it issues a service request, such as an HTTP request, to the monitored target and obtains the response time and the response. The execution results are then summarized as metrics and returned to the Prometheus server.

■ Main items to be acquired

The main retrieval items of Blackbox exporter are defined in Blackbox exporter metric definition file (default). For details, see Blackbox exporter metric definition file (metrics_blackbox_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add retrieved items to the metric definition file. The following are the metrics that can be specified in the PromQL statement described in the definition file.

Metric Name

Prober

What to get

Label

probe_http_duration_seconds

http

The number of seconds taken per phase of the HTTP request

Note:

The durations of all redirects are added up.

instance: instance-identification-string

job: job-name

phase: phase#

#

Contains one of the following:

  • "resolve"

  • "connect"

  • "tls"

  • "processing"

  • "transfer"

probe_http_content_length

http

HTTP content response length

instance: instance-identification-string

job: job-name

probe_http_uncompressed_body_length

http

Uncompressed response body length

instance: instance-identification-string

job: job-name

probe_http_redirects

http

Number of redirects

instance: instance-identification-string

job: job-name

probe_http_ssl

http

Whether SSL was used for the final redirect

  • 0: TLS/SSL was not used

  • 1: TLS/SSL was used

instance: instance-identification-string

job: job-name

probe_http_status_code

http

HTTP response status code value

Note:

If redirection is performed, the final status code is the value of the metric.

If no redirection is performed, the first status code received is the value of the metric.

instance: instance-identification-string

job: job-name

probe_ssl_earliest_cert_expiry

http

Earliest expiring SSL certificate UNIX time

instance: instance-identification-string

job: job-name

probe_ssl_last_chain_expiry_timestamp_seconds

http

Expiration timestamp of the last certificate in the SSL chain

Note:

If you want to monitor this metric, you must specify false for the insecure_skip_verify parameter in the tls_config settings of the Blackbox exporter configuration file (jpc_blackbox_exporter.yml), place the certificate, and specify the path of the certificate file in the appropriate parameter.

instance: instance-identification-string

job: job-name

probe_ssl_last_chain_info

http

SSL leaf certificate information

Note:

This is the SHA256 hash value of the server certificate to be monitored. The hash value is set to the label "fingerprint_sha256".

instance: instance-identification-string

job: job-name

fingerprint_sha256: SHA256-fingerprint-on-certificate

probe_tls_version_info

http

TLS version used

Note:

The TLS version, such as "TLS 1.2", is set to the label "version".

instance: instance-identification-string

job: job-name

version:TLS-version

probe_http_version

http

HTTP version of the probe response

instance: instance-identification-string

job: job-name

probe_failed_due_to_regex

http

Whether the probe failed due to a regular expression check on the response body or response headers

  • 0: Success

  • 1: Failed

instance: instance-identification-string

job: job-name

probe_http_last_modified_timestamp_seconds

http

UNIX time of the Last-Modified HTTP response header

instance: instance-identification-string

job: job-name

probe_icmp_duration_seconds

icmp

Seconds taken per phase of an ICMP request

instance: instance-identification-string

job: job-name

phase: phase#

#

Contains one of the following:

  • resolve

    Name Resolution Time

  • setup

    Time from resolve completion to ICMP packet transmission

  • rtt

    Time to get a response after setup

probe_icmp_reply_hop_limit

icmp

Hop limit (TTL for IPv4) value

Note:

Hop limit (TTL for IPv4) value

instance: instance-identification-string

job: job-name

probe_success

--

Whether the probe was successful

  • 0: Failed

  • 1: Success

instance: instance-identification-string

job: job-name

probe_duration_seconds

--

The number of seconds it took for the probe to complete

instance: instance-identification-string

job: job-name

■ IP communication with monitored objects

Only IPv4 communication is supported.

■ Encrypted communication with monitored objects

HTTP monitoring enables encrypted communication using TLS. In this case, the Blackbox exporter acts as a TLS client to the monitored object (TLS server).

To use encrypted communication with TLS, specify it in the "tls_config" item of the Blackbox exporter configuration file (jpc_blackbox_exporter.yml). In addition, the following certificate and key files must be prepared.

File

Format

CA certificate file

A file encoding an X509 public key certificate in pkcs7 format in PEM format

Client certificate file

Client certificate key file

A file in which the private key in pkcs1 or pkcs8 format is encoded in PEM format#

#

You cannot use password-protected files.
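As a sketch, a "tls_config" entry in the Blackbox exporter configuration file might look like the following (the file paths are illustrative assumptions):

```yaml
modules:
  https:
    prober: http
    http:
      tls_config:
        ca_file: /path/to/ca.pem         # CA certificate (PEM)
        cert_file: /path/to/client.pem   # client certificate (PEM)
        key_file: /path/to/client.key    # private key (PEM, not password-protected)
```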

The supported TLS versions and cipher suites are as follows.

Item

Scope of support

TLS Version

1.2 to 1.3

Cipher suites

  • "TLS_AES_128_GCM_SHA256" (TLS 1.3 only)

  • "TLS_AES_256_GCM_SHA384" (TLS 1.3 only)

  • "TLS_CHACHA20_POLY1305_SHA256" (TLS 1.3 only)

  • "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA" (up to TLS 1.2)

  • "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA" (up to TLS 1.2)

  • "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA" (up to TLS 1.2)

  • "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA" (up to TLS 1.2)

  • "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256" (TLS 1.2 only)

  • "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384" (TLS 1.2 only)

  • "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256" (TLS 1.2 only)

  • "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384" (TLS 1.2 only)

  • "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256" (TLS 1.2 only)

  • "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256" (TLS 1.2 only)

■ Timeout for collecting health information

In a network environment where response is slow (under normal conditions), operating information can be collected by adjusting the timeout period.

On the Prometheus server, you can specify the scrape request timeout period in the entry "scrape_timeout" of the Prometheus configuration file (jpc_prometheus_server.yml). For details, see the description of item scrape_timeout in Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

In addition, the timeout period for connections from the Blackbox exporter to the monitoring target is 0.5 seconds shorter than the value specified in "scrape_timeout" above.
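For example, a scrape job definition in the Prometheus configuration file might lengthen the timeout as follows (the value is illustrative); with this setting, the Blackbox exporter's own connection timeout becomes 29.5 seconds:

```yaml
scrape_configs:
  - job_name: jpc_blackbox_http
    scrape_timeout: 30s
```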

■ Certificate expiration

When collecting operation information by HTTPS monitoring, the exporter receives a certificate list (the server certificate and the certificates that certify it) from the monitoring target.

The Blackbox exporter allows you to collect the expiration time (UNIX time) of the closest expiring certificate as a probe_ssl_earliest_cert_expiry metric.

You can also use the functions described in 3.15.1(3) Performance data monitoring notification function to monitor certificates that are close to expiration, because the number of seconds remaining until expiration can be calculated as the probe_ssl_earliest_cert_expiry metric value minus the value of PromQL's time() function.
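The remaining lifetime can be expressed in a PromQL statement such as the following (the 30-day threshold is illustrative), which is true when fewer than 30 days remain until the certificate expires:

```
probe_ssl_earliest_cert_expiry - time() < 86400 * 30
```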

■ User-Agent value in HTTP request header when monitoring HTTP

The default value of User-Agent included in HTTP request header during HTTP monitoring is as shown below:

  • For version 13-00 or earlier

    "Go-http-client/1.1"

  • For version 13-00-01 or later

    "Blackbox Exporter/0.24.0"

You can change the value of User-Agent in the setting of item "headers" in the Blackbox exporter configuration file (jpc_blackbox_exporter.yml).

The following is an example of changing the value of User-Agent to "My-Http-Client".

modules:
  http:
    prober: http
    http:
      headers:
        User-Agent: "My-Http-Client"

For details, see the description of item headers in Blackbox exporter configuration file (jpc_blackbox_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ About HTTP 1.1 Name-Based Virtual Host Support

The Blackbox exporter supports HTTP 1.1 name-based virtual hosts and TLS Server Name Indication (SNI). You can monitor virtual hosts, in which one HTTP/HTTPS server behaves as multiple HTTP/HTTPS servers.

■ About TLS Server Authentication and Client Authentication

In Blackbox exporter's HTTPS monitoring, server authentication is performed using the CA certificate described in item "ca_file" of the Blackbox exporter configuration file (jpc_blackbox_exporter.yml) and the server certificate sent by the server when HTTPS communication with the server starts (TLS handshake).

If the sent certificate is invalid (for example, the server name is incorrect, the certificate has expired, or a self-signed certificate is used), HTTPS communication cannot be started and monitoring fails.

In addition, when a request is made to send a certificate from the monitored server at the start of HTTPS communication (TLS handshake), the client certificate described in item "cert_file" of the Blackbox exporter configuration file (jpc_blackbox_exporter.yml) is sent to the monitored server.

If the server validates the sent certificate, recognizes it as invalid, and returns an error to the Blackbox exporter via the TLS protocol (or if communication cannot be continued due to a loss of communication, etc.), the monitoring fails.

For details on the verification contents related to the client certificate and the operation in the event of an error on the monitored server, check the specifications of the monitored server (or relay device such as a load balancer).

If you specify "true" in the "insecure_skip_verify" item of the Blackbox exporter configuration file (jpc_blackbox_exporter.yml), HTTPS communication can be started without errors even if an invalid certificate would be detected during server authentication. However, in that case, the verification performed for server authentication is disabled.

For details, see the description of item insecure_skip_verify in Blackbox exporter configuration file (jpc_blackbox_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

Server authentication cannot be performed with certificates in which the host name is not listed in the Subject Alternative Name field.

■ About cookie information

The Blackbox exporter does not use cookie information sent from the monitored target in the next HTTP communication request.

■ About external resources referenced from content included in the response body of HTTP communication

In Blackbox exporter, external resources (subframes, images, etc.) referenced from the content included in the response body of HTTP communication are not included in the monitoring range.

■ About Monitoring of Content Included in HTTP Communication Response Body

Since the Blackbox exporter does not parse the content, the execution result and execution time based on the syntax (HTML, javascript, etc.) in the content included in the response body of HTTP communication are not reflected in the monitoring result.

■ Precautions when the monitoring destination of HTTP monitoring redirects with Basic authentication

If the destination of the Blackbox exporter's HTTP monitoring redirects with Basic authentication, the Blackbox exporter sends the same Basic authentication user name and password to both the redirect source and the redirect destination. Therefore, when Basic authentication is performed at both the redirect source and the redirect destination, the same user name and password must be set at both.

(j) Script exporter (UAP monitoring capability)

Script exporter runs scripts on a host and gets results.

The Script exporter is installed on the same host as the JP1/IM - Agent, and upon a scrape request from the Prometheus server, it executes a script on that host to retrieve the results and returns them to the Prometheus server.

Developing a script that gets UAP information and converts it to a metric and adding the script to Script exporter enables you to monitor applications that are not supported by Exporter as you want.

■ Key metric items

The key Script exporter metric items are defined in the Script exporter metric definition file (initial status). For details, see Script exporter metric definition file (metrics_script_exporter.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add more metric items to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.

Metric name

Data to be obtained

Label

script_success

Script exit status (0 = error, 1 = success)

instance: instance-identifier-string

job: job-name

script: script-name

script_duration_seconds

Script execution time, in seconds.

instance: instance-identifier-string

job: job-name

script: script-name

script_exit_code

The exit code of the script.

instance: instance-identifier-string

job: job-name

script: script-name

(k) OracleDB exporter (Oracle Database monitoring function)

OracleDB exporter is an Exporter for Prometheus that retrieves performance data from Oracle Database.

- About the number of sessions

When OracleDB exporter monitors Oracle Database, it connects at each scrape and disconnects when data collection is complete. Each connection uses one session.

■ Conditions to be monitored

The following are the Oracle Database configurations and database character sets that JP1/IM - Agent supports for monitoring:

  • Configuring Oracle Database

    • For non-clusters

      Non CDB and CDB configurations

    • For Oracle RAC

      CDB configuration

Because a single OracleDB exporter process connects to only one service, launch multiple OracleDB exporter instances when there is more than one monitoring target.

Note
  • Oracle RAC One Node and Oracle Database Cloud Service are not supported.

  • HA clustering configuration on Oracle Database is not supported.

  • Oracle Database database-character set

    • AL32UTF8(Unicode UTF-8)

    • JA16SJIS (Japanese-language SJIS)

    • ZHS16GBK (Simplified Chinese GBK)

■ Acquisition items

The metrics that can be retrieved with the OracleDB exporter shipped with JP1/IM - Agent are the metrics defined by default in OracleDB exporter, plus cache_hit_ratio.

The OracleDB exporter retrieval items are defined in the OracleDB exporter metric definition file (default). For details, see OracleDB exporter metric definition file (metrics_oracledb_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

The following table lists the metrics that can be specified in PromQL expressions in the definition file. The value of each metric is obtained by executing the SQL statement shown in the table against Oracle Database. For details about a metric, contact Oracle based on the SQL statement of its data source.

Metric name

Contents to be acquired

Label

Data source (SQL statement)

oracledb_sessions_value

Count of sessions

status: status

type: session-type

SELECT status, type, COUNT(*) as value FROM v$session GROUP BY status, type

oracledb_resource_current_utilization

Resource usage#1

resource_name: resource-name

SELECT resource_name,current_utilization,CASE WHEN TRIM(limit_value) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(limit_value) END as limit_value FROM v$resource_limit

oracledb_resource_limit_value

Resource usage limit#1 (UNLIMITED: -1)

resource_name: resource-name

oracledb_asm_diskgroup_total

Bytes of total size of ASM disk group

name: disk-group-name

SELECT name,total_mb*1024*1024 as total,free_mb*1024*1024 as free FROM v$asm_diskgroup_stat where exists (select 1 from v$datafile where name like '+%')

oracledb_asm_diskgroup_free

Bytes of free space available on ASM disk group

name: disk-group-name

oracledb_activity_execute_count

Total number of calls (user calls and recursive calls) executing SQL statements (cumulative value)

none

SELECT name, value FROM v$sysstat WHERE name IN ('parse count (total)', 'execute count', 'user commits', 'user rollbacks', 'db block gets from cache', 'consistent gets from cache', 'physical reads cache')

oracledb_activity_parse_count_total

Total number of parse calls (hard, soft and describe) (cumulative value)

none

oracledb_activity_user_commits

Total number of user commit (cumulative value)

none

oracledb_activity_user_rollbacks

The number of times a user manually issued a ROLLBACK statement, or the total number of times an error occurred during a user's transaction (cumulative value)

none

oracledb_activity_physical_reads_cache

Total number of data blocks read from disk to the buffer cache (cumulative value)

none

oracledb_activity_consistent_gets_from_cache

Number of times a consistent read was requested for a block from the buffer cache (cumulative value)

none

oracledb_activity_db_block_gets_from_cache

Number of times a CURRENT block was requested from the buffer cache (cumulative value)

none

oracledb_process_count

Number of active Oracle Database processes

none

SELECT COUNT(*) as count FROM v$process

oracledb_wait_time_administrative

Time spent waiting in the Administrative wait class (in 1/100 seconds)#2

none

SELECT n.wait_class as WAIT_CLASS, round(m.time_waited/m.INTSIZE_CSEC,3) as VALUE FROM v$waitclassmetric m, v$system_wait_class n WHERE m.wait_class_id=n.wait_class_id AND n.wait_class != 'Idle'

oracledb_wait_time_application

Time spent waiting in the Application wait class (in 1/100 seconds)#2

none

oracledb_wait_time_commit

Time spent waiting in the Commit wait class (in 1/100 seconds)#2

none

oracledb_wait_time_concurrency

Time spent waiting in the Concurrency wait class (in 1/100 seconds)#2

none

oracledb_wait_time_configuration

Time spent waiting in the Configuration wait class (in 1/100 seconds)#2

none

oracledb_wait_time_network

Time spent waiting in the Network wait class (in 1/100 seconds)#2

none

oracledb_wait_time_other

Time spent waiting in the Other wait class (in 1/100 seconds)#2

none

oracledb_wait_time_scheduler

Time spent waiting in the Scheduler wait class (in 1/100 seconds)#2

none

oracledb_wait_time_system_io

Time spent waiting in the System I/O wait class (in 1/100 seconds)#2

none

oracledb_wait_time_user_io

Time spent waiting in the User I/O wait class (in 1/100 seconds)#2

none

oracledb_tablespace_bytes

Total bytes consumed by tablespaces

tablespace: name-of-the-tablespace

type: tablespace-contents

SELECT dt.tablespace_name as tablespace, dt.contents as type, dt.block_size * dtum.used_space as bytes, dt.block_size * dtum.tablespace_size as max_bytes, dt.block_size * (dtum.tablespace_size - dtum.used_space) as free, dtum.used_percent FROM dba_tablespace_usage_metrics dtum, dba_tablespaces dt WHERE dtum.tablespace_name = dt.tablespace_name ORDER by tablespace

oracledb_tablespace_max_bytes

Maximum number of bytes in a tablespace

tablespace: name-of-the-tablespace

type: tablespace-contents

oracledb_tablespace_free

Number of free bytes in the tablespace

tablespace: name-of-the-tablespace

type: tablespace-contents

oracledb_tablespace_used_percent

Tablespace utilization

If auto extension is ON, it is calculated with auto extension taken into account.

tablespace: name-of-the-tablespace

type: tablespace-contents

oracledb_exporter_last_scrape_duration_seconds

The number of seconds taken by the last scrape

none

-

oracledb_exporter_last_scrape_error

Whether the last scrape resulted in an error

0: Success

1: Error

none

-

oracledb_exporter_scrapes_total

Total number of times Oracle Database was scraped for metrics

none

-

oracledb_up

Whether the Oracle Database Server is up

0: Not running

1: Running

none

-

#1

In a PDB, the source view v$resource_limit is empty, so this metric cannot be retrieved.

#2

In a PDB, the source view v$waitclassmetric is empty, so this metric cannot be retrieved.

Important
  • Before using OracleDB exporter, make sure that the SQL statements that serve as the data sources can be executed (for example, with the SQL*Plus command) and that the required information is displayed. When checking, connect to Oracle Database as the user that OracleDB exporter uses.

  • The OracleDB exporter provided by JP1/IM - Agent does not support collecting arbitrary metrics (custom metrics).

■ Requirements for monitoring Oracle Database

To monitor Oracle Database with OracleDB exporter, configure Oracle Database as follows.

You do not need to install Oracle Client or similar software on the JP1/IM - Agent host.

  • Oracle listener

    • Configure the Oracle listener and service name so that the monitoring target can be connected to.

    • Configure the Oracle listener to accept unencrypted connection requests.

  • Oracle Database

    Set Oracle Database database-character set to one of the following:

    • AL32UTF8 (Unicode UTF-8)

    • JA16SJIS (Japanese-language SJIS)

    • ZHS16GBK (Simplified Chinese GBK)

  • Users used to access Oracle Database

    • The user used to connect to Oracle Database must have the following permissions:

      - Login permissions

      - SELECT permissions to the following tables

      dba_tablespace_usage_metrics

      dba_tablespaces

      v$system_wait_class

      v$asm_diskgroup_stat

      v$datafile

      v$sysstat

      v$process

      v$waitclassmetric

      v$session

      v$resource_limit

    • User used to connect to Oracle Database

      For details about the character types and maximum lengths that can be specified for user names, see Environment variables.

    • Password of the user used to connect to Oracle Database

      The following character types can be used for passwords:

      - Uppercase letters, lowercase letters, numbers, @, +, ', !, $, :, ., (, ), ~, -, _

      - The password can be from 1 to 30 bytes in length.
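As a non-authoritative illustration, the character and length rules above can be expressed as a short check. The function name is hypothetical and is not part of JP1/IM - Agent:

```python
import re

# Characters permitted in the connection password, per the rules above:
# letters, digits, and @ + ' ! $ : . ( ) ~ - _
_ALLOWED = re.compile(r"^[A-Za-z0-9@+'!$:.()~_-]+$")

def is_valid_exporter_password(password: str) -> bool:
    """Return True if the password uses only allowed characters
    and is 1 to 30 bytes long (measured in bytes, not characters)."""
    raw = password.encode("utf-8")
    return 1 <= len(raw) <= 30 and bool(_ALLOWED.match(password))
```

For example, `is_valid_exporter_password("p#wd")` is False because the pound sign is not in the allowed character set.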

■ Obfuscation of Oracle Database passwords

The OracleDB exporter shipped with JP1/IM - Agent uses the secret obfuscation function to manage the password for accessing Oracle Database from OracleDB exporter. For details, see 3.15.10 Secret obfuscation function.

■ Notes on Oracle Database log files

Monitoring Oracle Database with OracleDB exporter can generate a large number of log files. The Oracle Database administrator should therefore consider deleting log files periodically.

Directory where log files are generated (including subdirectories)

$ORACLE_BASE/diag/rdbms

Extensions of the log files that increase

.trc, .trm

Below are sample command lines that delete ".trc" and ".trm" files whose last-modified dates are older than 14 days. If necessary, consider running such commands periodically to delete unnecessary logs.

OS

Command line example for deleting logs

Windows

forfiles /P "%ORACLE_BASE%\diag\rdbms" /M *.trm /S /C "cmd /C del /Q @path" /D -14

forfiles /P "%ORACLE_BASE%\diag\rdbms" /M *.trc /S /C "cmd /C del /Q @path" /D -14

Linux

find $ORACLE_BASE/diag/rdbms -name '*.tr[cm]' -mtime +14 -delete

Set the $ORACLE_BASE and %ORACLE_BASE% environment variables as needed.

■ Environment variables

The following environment variables are required when using OracleDB exporter.

- Environment variable DATA_SOURCE_NAME (required)

Specify the connection destination for OracleDB exporter in the following format. There is no default value.

  • For Windows

oracle://user-name@host-name:port/service-name?connection timeout=10[&amp;instance name=instance-name]
  • For Linux

oracle://user-name@host-name:port/service-name?connection timeout=10[&instance name=instance-name]
user-name
  • Specifies the username to connect to Oracle listener. Up to 30 characters can be specified.

  • You can use uppercase letters, numbers, underscores, dollar signs, pound signs, periods, and at signs. Note that lowercase letters are not allowed.

  • For Linux, replace the pound sign with "%%23" when you include the user name in the unit definition file. For example, for the CDB common user "C##USER", specify "C%%23%%23USER".

  • For Windows, replace the pound sign with "%23" when you include the user name in the service definition file. For example, for the CDB common user "C##USER", specify "C%23%23USER".

host-name
  • Specifies the host name of Oracle Database host to monitor. Up to 253 characters can be specified.

  • You can use uppercase letters, lowercase letters, numbers, hyphens, and periods.

port
  • Specifies the port number for connecting to Oracle listener.

service-name
  • Specifies the service name of Oracle listener. Up to 64 characters can be specified.

  • You can use uppercase letters, lowercase letters, numbers, underscores, hyphens, and periods.

Option

You can specify the following options. If you specify more than one, join them with &amp; on Windows and with & on Linux.

  • connection timeout=number

    Specifies the connection timeout in seconds. This option must be specified.

    Be sure to specify 10. If you specify a value other than 10 or omit this option, the Prometheus server's scrape may time out and the up metric may become 0 even though OracleDB exporter is running.

  • instance name=instance-name

    Specifies instance to connect to. Specifying this option is optional.

(Example of specification)

  • When no instance name is specified

oracle://orauser@orahost:1521/orasrv?connection timeout=10
  • For Windows

oracle://orauser@orahost:1521/orasrv?connection timeout=10&amp;instance name=orcl1
  • For Linux

oracle://orauser@orahost:1521/orasrv?connection timeout=10&instance name=orcl1
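As a non-authoritative sketch, the DATA_SOURCE_NAME format and pound-sign escaping described above can be illustrated as follows. The helper function is hypothetical and is not part of JP1/IM - Agent:

```python
def build_data_source_name(user, host, port, service,
                           instance=None, os_name="linux"):
    """Build a DATA_SOURCE_NAME value per the format above.

    On Linux (unit definition file), '#' in the user name is written
    as %%23 and options are joined with '&'. On Windows (service
    definition file, XML), '#' is written as %23 and '&' as '&amp;'.
    """
    esc = "%23" if os_name == "windows" else "%%23"
    amp = "&amp;" if os_name == "windows" else "&"
    u = user.replace("#", esc)
    dsn = f"oracle://{u}@{host}:{port}/{service}?connection timeout=10"
    if instance:
        dsn += f"{amp}instance name={instance}"
    return dsn
```

For example, the Linux user "C##USER" becomes "C%%23%%23USER" in the generated string.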
- Environment variable DATA_SOURCE_NAME (required)

Specify the full path of jp1ima directory under JP1/IM - Agent installation directory.

For a logical host, specify the full path of jp1ima directory under JP1/IM - Agent shared directory.

(Example of specification)

  • For Windows

C:\Program Files\Hitachi\jp1ima
  • For Linux

/opt/jp1ima

■ Notes

  • If you stop the monitored Oracle Database instance or container before stopping OracleDB exporter, a NORMAL shutdown of Oracle may not complete. Stop OracleDB exporter in advance, or stop Oracle Database with an IMMEDIATE shutdown.

  • Stop OracleDB exporter before changing the configuration of, or performing maintenance on, the Oracle Database instance or container.

(l) Fluentd (Log metrics)

This capability can generate and measure log metrics from log files created by monitoring targets. For details on the function, see 3.15.2 Log metrics by JP1/IM - Agent.

■ Key metric items

You define what figures you need from the log files created by your monitoring targets in the log metrics definition file (fluentd_any-name_logmetrics.conf). These definitions allow you to get quantified data (log metrics) as metric items.

For details on the log metrics definition file, see Log metrics definition file (fluentd_any-name_logmetrics.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Sample files

The following provides descriptions of sample files for when you use the log metrics feature. If you copy the sample files, be careful of the linefeed codes. For details, see the description of each file of 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. These sample files are based on the assumptions in Assumptions of the sample files. Copy each file and change the settings according to your monitoring targets.

- Assumptions of the sample files

The sample files described here assume that HostA, a monitored host (integrated agent host), exists and JP1/IM - Agent is installed in it, and that WebAppA, an application running on HostA, creates the following log file.

- ControllerLog.log

As shown in target log message 1, WebAppA writes a log message when it starts processing a request to one of its HTTP endpoints. The log message also records the number of records handled by the request.

Target log message 1:

...
2022-10-19 10:00:00 [INFO] c.b.springbootlogging.LoggingController : endpoint "/register" started. Target record: 5.
...

In the sample files, a regular expression to match target log message 1 is used, and the number of the log messages that match the expression is counted. The number is then displayed in the Trends tab of the JP1/IM integrated operation viewer as log metric 1, Requests to the register Endpoint.

The definition for log metric 1 uses counter as its log metric type.

In addition, the regular expression used in the above also extracts the number indicated as Target record from target log message 1, and then the extracted numbers are summed up. The total is then displayed in the Trends tab of the JP1/IM integrated operation viewer as log metric 2, Number of Registered Records.

The definition for log metric 2 uses counter as its log metric type.
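The counting and summing behavior of log metrics 1 and 2 can be illustrated outside of Fluentd with a short sketch. The regular expression here is a simplified stand-in, not the one used in the sample log metrics definition file:

```python
import re

# Simplified stand-in for the Fluentd parse expression used in the
# sample log metrics definition file.
PATTERN = re.compile(r'endpoint "/register" started\. Target record: (\d+)\.')

lines = [
    '2022-10-19 10:00:00 [INFO] c.b.springbootlogging.LoggingController'
    ' : endpoint "/register" started. Target record: 5.',
    '2022-10-19 10:00:01 [INFO] c.b.springbootlogging.LoggingController'
    ' : endpoint "/register" started. Target record: 3.',
    '2022-10-19 10:00:02 [INFO] c.b.springbootlogging.LoggingController'
    ' : unrelated message',
]

request_count = 0   # log metric 1: number of matching messages
record_total = 0    # log metric 2: sum of the extracted Target record values

for line in lines:
    m = PATTERN.search(line)
    if m:
        request_count += 1
        record_total += int(m.group(1))
```

With the three sample lines above, the first two match, so log metric 1 counts 2 requests and log metric 2 sums 5 + 3 = 8 records.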

One Fluentd worker (using the multi-process workers feature) is required for each log file to be monitored. For details on the worker settings related to the log metrics feature, see the log metrics definition file (fluentd_any-name_logmetrics.conf). Here, it is assumed that 11 Fluentd workers are running and that ControllerLog.log is monitored by the worker whose worker ID is 10.

These sample files also assume the tree structure consisting of the following IM management nodes:

All Systems
 + HostA
    + Application Server
       + WebAppA
- Target files in this example

The target files used in this example are as follows:

  • Integrated manager host

    - User-specific metric definition file

  • Integrated agent host

    - Prometheus configuration file

    - User-specific discovery configuration file

    - Log metrics definition file

    - Fluentd log monitoring target definition file

- Sample user-specific metric definition file

- File name: metrics_logmetrics1.conf

- Written code

[
  {
    "name":"logmetrics_request_endpoint_register",
    "default":true,
    "promql":"logmetrics_request_endpoint_register and $jp1im_TrendData_labels",
    "resource_en":{
      "category":"HTTP",
      "label":"request_num_of_endpoint_register",
      "description":"The request number of endpoint register",
      "unit":"request"
    },
    "resource_ja":{
      "category":"HTTP",
      "label":"registerへのリクエスト数",
      "description":"The request number of endpoint register",
      "unit":"リクエスト"
    }
  },
  {
    "name":"logmetrics_num_of_registeredrecord",
    "default":true,
    "promql":"logmetrics_num_of_registeredrecord and $jp1im_TrendData_labels",
    "resource_en":{
      "category":"DB",
      "label":"logmetrics_num_of_registeredrecord",
      "description":"The number of registered record",
      "unit":"record"
    },
    "resource_ja":{
      "category":"DB",
      "label":"登録されたレコード数",
      "description":"The number of registered record",
      "unit":"レコード"
    }
  }
]
Note

The storage directory, written code, and file name follow the format of the user-specific metric definition file (metrics_any-Prometheus-trend-name.conf).

- Sample Prometheus configuration file

- File name: jpc_prometheus_server.yml

- Written code

global:
  ...
(omitted)
  ...
scrape_configs:
  - job_name: 'LogMetrics'
    
    file_sd_configs:
      - files:
        - 'user/user_file_sd_config_logmetrics.yml'
    
    relabel_configs:
      - target_label: jp1_pc_nodelabel
        replacement: Log trapper(Fluentd)
    
    metric_relabel_configs:
      - target_label: jp1_pc_nodelabel
        replacement: ControllerLog
      - source_labels: ['__name__']
        regex: 'logmetrics_request_endpoint_register|logmetrics_num_of_registeredrecord'
        action: 'keep'
      - regex: (jp1_pc_multiple_node|jp1_pc_agent_create_flag)
        action: labeldrop
 
  ...
(omitted)
  ...
Note

The storage directory and written code follow the format of the Prometheus configuration file (jpc_prometheus_server.yml). You do not have to create a new file. Instead, you add the scrape_configs section for the log metrics feature to the Prometheus configuration file (jpc_prometheus_server.yml) created during installation.

- Sample user-specific discovery configuration file

- File name: user_file_sd_config_logmetrics.yml

- Written code

- targets:
  - HostA:24830
  labels:
    jp1_pc_exporter: logmetrics
    jp1_pc_category: WebAppA
    jp1_pc_trendname: logmetrics1
    jp1_pc_multiple_node: "{__name__=~'logmetrics_.*'}"
    jp1_pc_agent_create_flag: false
Note

The storage directory and written code follow the format of the user-specific discovery configuration file (file_sd_config_any-name.yml).

ControllerLog.log is monitored by the worker whose Fluentd worker ID is 10. Thus, when 24820 is set for port in the Sample log metrics definition file, the port number of the worker monitoring ControllerLog.log is 24820 + 10 = 24830.
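The port calculation above can be sketched as follows (illustrative only):

```python
BASE_PORT = 24820   # port value set in the sample log metrics definition file
WORKER_ID = 10      # Fluentd worker ID of the worker monitoring ControllerLog.log

# The worker listens on the base port plus its worker ID.
scrape_port = BASE_PORT + WORKER_ID
```

This yields 24830, which matches the target HostA:24830 in the sample user-specific discovery configuration file.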

- Sample log metrics definition file

- File name: fluentd_WebAppA_logmetrics.conf

- Written code

## Input
<worker 10>
  <source>
    @type prometheus
    bind '0.0.0.0'
    port 24820
    metrics_path /metrics
  </source>
</worker>
## Extract target log message 1
<worker 10>
  <source>
    @type tail
    @id logmetrics_counter
    path /usr/lib/WebAppA/ControllerLog/ControllerLog.log
    tag WebAppA.ControllerLog
    pos_file ../data/fluentd/tail/ControllerLog.pos
    read_from_head true
    <parse>
      @type regexp
      expression /^(?<logtime>[^\[]*) \[(?<loglevel>[^\]]*)\] (?<class>[^\[]*) : endpoint "\/register" started. Target record: (?<record_num>\d[^\[]*).$/
      time_key logtime
      time_format %Y-%m-%d %H:%M:%S
      types record_num:integer
    </parse>
  </source>
 
## Output
## Define log metrics 1 and 2
  <match WebAppA.ControllerLog>
    @type prometheus
    <metric>
      name logmetrics_request_endpoint_register
      type counter
      desc The request number of endpoint register
    </metric>
    <metric>
      name logmetrics_num_of_registeredrecord
      type counter
      desc The number of registered record
      key record_num
      <labels>
      loggroup ${tag_parts[0]}
      log ${tag_parts[1]}
      </labels>
    </metric>
  </match>
</worker>
Note

The storage directory and written code follow the format of the log metrics definition file (fluentd_any-name_logmetrics.conf).

- Sample Fluentd log monitoring target definition file

- File name: jpc_fluentd_common_list.conf

- Written code

## [Target Settings]
  ...
(omitted)
  ...
@include user/fluentd_WebAppA_logmetrics.conf
Note

The storage directory and written code follow the format of the Fluentd log monitoring target definition file (jpc_fluentd_common_list.conf) in JP1/IM - Agent definition files. You do not have to create a new file. Instead, you add the include section for the log metrics feature to the Fluentd log monitoring target definition file (jpc_fluentd_common_list.conf) created during installation.

(m) Web scenario monitoring function

In JP1/IM - Manager and JP1/IM - Agent versions 13-10 and later, Web scenario monitoring function is available. #

The Web scenario monitoring function is one of the synthetic metric collectors. It monitors how long it takes to replay user operations in a Web browser. The monitoring scope covers the HTTP(S) communication for displaying the initial screen and for the series of operations from login to logoff. Based on this HTTP(S) communication, the function monitors the behavior of Web content that issues many requests combining HTML, JSON, XML, and so on. This enables monitoring from the viewpoint of user operations, which is not possible with the synthetic metric collector provided by Blackbox exporter (single HTTP(S) monitoring).

#

If JP1/IM - Manager is upgraded from a version earlier than 13-10 to version 13-10 or later and you want to use the Web scenario monitoring function, you must configure the settings for the function. For instructions on setting up JP1/IM - Manager, see Setting up the environment variables and Setting up Web exporter in 1.21.2(13)(a) Setting up JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Configuration Guide. For authentication settings, see Configuring authentication in 1.21.2(13)(a) Setting up JP1/IM - Agent.

■ Prerequisites

The following prerequisites apply when you use the Web scenario monitoring function:

  • Prerequisite browser

    The following browsers must be installed before you can create and monitor Web scenarios.

    • Google Chrome

    • Microsoft Edge

    In addition, you must be able to access the targets from the above browsers.

    The above browsers are used both to create Web scenarios and to monitor Web content using the created Web scenarios.

  • Agent host

    We recommend that you create and monitor Web scenarios on the same host.

    If you want to migrate Web scenario file to a different host, you must perform the steps in 1.5.1(9)(c) Migrating Web Scenario Files to another host in the JP1/Integrated Management 3 - Manager Administration Guide.

    In addition, the Web scenario monitoring function can be used only on Windows agent hosts where JP1/IM - Agent for Windows is installed.

  • Web exporter

    The listen port used by Web exporter must be protected, for example, by a firewall or networking configuration, so that it is not accessed by anything other than JP1/IM - Agent's Prometheus server. For the port used by Web exporter, see the explanation of web_exporter command options in Service definition file (jpc_program-name_service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Network configuration

To monitor according to users' Web scenarios, we recommend that you install JP1/IM - Agent on a host whose network location is close to the users of the monitored Web content. If the network path from JP1/IM - Agent to the monitoring target differs greatly from the path used by Web content users, failures in relay devices on the users' path are difficult to detect through monitoring.

■ Function List

The Web scenario monitoring function measures the response time experienced by the user by automatically replaying the user's operations on the browser screen and measuring the replay time.

Web scenario monitoring function consists of Web operation information collection function, which collects performance information for Web operation responses based on Web operation scenario, and Web scenario creation support function, which helps create a Web operation scenario (Web scenario).

Table 3‒37: Function List

Function

Description

Web scenario monitoring function

Monitors the Web system based on the collected performance data of Web responses.

Web scenario creation support function

Supports the creation of scenarios for Web operations.

Web scenario creation function

Launches a browser and creates Web scenarios.

Web operation information collection function

Collects performance information on the responses to Web operations, based on a Web operation scenario.

This function is provided by Web exporter.

Web scenario execution function

Performs Web operations as recorded in the Web scenario.

Trace viewer function

Displays trace information to be used for investigation when an error is detected during Web scenario execution.

■ Web Scenario Creation Support Function

The Web scenario creation support function launches the Web scenario creation function, which launches a browser and records the user's interactions in the browser as a Web scenario.

■Scenarios that can be created

You can monitor Web contents using Web scenarios that record the following actions:

  • Operations for displaying the top page

    This is just the operation to display the top page. No other operations are required.

  • Logon operation from the login screen

    Enter the username and password and click the logon button.

  • Operation of the logoff button on the logoff screen

■ Web Scenario Creation Function (playwright codegen)

The Web scenario creation function assists in the creation of Web operation scenarios (Web scenarios). It uses the OSS Playwright.

■Prerequisites

When you run the playwright command manually, the current folder must be the Playwright working folder. For the Playwright working folder, see Appendix A.4(3) Integrated agent host (Windows).

The command must be run as the built-in Administrator.

■Starting Codegen

Web Scenario creation function uses Codegen of Playwright.

The user who runs Codegen must be the same user as the user who runs Web exporter.

Use playwright codegen command to perform Web scenario creation function.

The playwright codegen command opens a Web site and generates code corresponding to your operations on the page. Users run it from a terminal.

Recording starts when Web Scenario creation function is activated.

npx playwright codegen --target playwright-test --channel channels --lang locale URL -o ./tests/Web-scenario-filename

Codegen opens two windows: a browser window for interacting with the monitored Web site and a Playwright Inspector window for recording Web scenario code.

When a user runs a Codegen and performs an action in a browser, Playwright generates the code according to the action.

For details about the parameters that can be specified in playwright codegen command, see the following tables.

  • npx playwright codegen command option

    Item

    Description

    Changeability

    What to set in JP1/IM - Agent

    JP1/IM - Agent default value

    -o filename

    or

    --output filename

    Save the generated script to a file

    REQ

    Specifies the path to the filename of the destination Web scenario file relative to the command-execution directory.

    If not specified, the script is discarded when Codegen terminates; the user must then copy the generated script to a text file, for example.

    File names have the following rules:

    • The filename must be in the format "String.spec.ts".

    • The file name can contain only single-byte alphanumeric characters and underscores (_).

    • The maximum number of bytes that can be specified for a parameter is 256 bytes.

    • You cannot specify folders or files on a network drive. If specified, operation cannot be guaranteed in the event of a network failure or delay (for Windows).

    • The following pathnames cannot be specified:

      - File name with a leading "-" (hyphen)

      - Folder or file name containing environment-dependent characters

    If you specify a file that does not exist, a new file is created.

    If you specify a file name that already exists, the file is overwritten.

    For details about the storage location of the output Web scenario file, see Appendix A.4(3) Integrated agent host (Windows).

    ./tests/Web-scenario-filename

    --target language

    Select the language for generating the script.

    --

    None

    None

    --channel channels

    Specifies the distribution channel for Chromium.

    REQ

    Specify one of the following as the browser for executing Codegen.

    • "chrome"

      Specify if you want to use Google Chrome.

    • "msedge"

      Specify if you want to use Microsoft Edge.

    None

    --lang language

    Specify the language and locale.

    <Example of specification>

    "ja-JP"

    REQ

    One of the following, depending on the language code at the time of the test run:

    • "en-US"

    • "ja-JP"

    • "zh-CN"

    • "th"

    If not specified, the Web scenario may be generated with a language code different from the one used at test execution time, and a Web scenario that would otherwise succeed may fail.

    None

    --proxy-server proxies

    Specify the proxy server.

    <Example of specification>

    "http://myproxy:3128"

    "socks5://myproxy:8080"

    Y

    Specifies the proxy used for the request.

    Specify the entire domain with up to 253 alphanumeric characters.

    None

    --proxy-bypass bypass-proxies

    Specifies a comma-separated list of domains that bypass the proxy.

    <Example of specification>

    ".com,chromium.org,.domain.com"

    Y

    Specifies the domain for proxy bypass, up to 253 alphanumeric characters.

    None

    URL

    Specify URL to be monitored.

    Y

    Specify the entire domain with up to 253 alphanumeric characters.

    Specify URL in the following format:

    Protocol://Hostname:port-number

    None

Legend:

REQ: Required setting, Y: Changeable, --: Not applicable

■Recording screen operations

When recording operations such as mouse clicks, text entry, and HTML operations, perform the operations you want to record in the browser window after recording has started. As you operate, Web scenario code is generated in the Playwright Inspector window.

The following table shows the browser operations that can be recorded and measured as Web scenarios.

Table 3‒38: Browser operations that can be recorded and measured as Web scenarios

Classification

Operation

Record

Measure

Remarks

Sample code recorded by codegen

Mouse operation

--

--

--

The mouse operation itself is not recorded; instead, the button operation or similar action caused by the mouse operation is recorded.

--

Click

Y

Y

--

await page.getByRole('button', { name: 'Login' }).click();

Double click

Y

Y

--

await page.getByRole('button', { name: 'Clear' }).dblclick();

Sub button click

Y

Y

--

await page.locator('body').click({

button: 'right'

});

Keyboard operation (key entry operation)

--

--

--

--

--

Entering Characters

Y

Y

The input value is reflected in real time. The entered value is recorded as an HTML operation or similar.

await page.locator('input[name="username"]').fill('username');

Shortcut key input

--

--

This is the same as browser operation.

This is the same as browser operation.

Accelerate key input

--

--

Other key input

--

--

Browser operations

--

--

--

--

--

Move next item

[Tab]

Y

Y

Only recorded if an element in HTML is selected.

await page.locator('body').press('Tab');

Move previous item

[Shift]+[Tab]

Y

Y

await page.locator('body').press('Shift+Tab');

Go to next page

[Alt]+[→]

Y

Y

Keyboard actions are disabled. Page transitions are recorded.

await page.goto('URL');

Go to previous page

[Alt]+[←]

[BackSpace]

Y

Y

Context menu display

[Right-click]

[Shift]+[F10]

Y

Y

--

await page.locator('body').press('Shift+F10');

Scroll up

[↑]

Y

Y

--

await page.locator('body').press('ArrowUp');

Scroll down

[↓]

Y

Y

--

await page.locator('body').press('ArrowDown');

Page Up Scroll

[PgUp]

Y

Y

--

await page.locator('body').press('PageUp');

Page Down Scroll

[PgDn]

Y

Y

--

await page.locator('body').press('PageDown');

Go to top of page

[Home]

Y

Y

--

await page.locator('body').press('Home');

Go to End of Page

[End]

Y

Y

--

await page.locator('body').press('End');

Stop operation

[Esc]

Y

Y

--

await page.locator('body').press('Escape');

Link-click [Enter]

[Click]

Y

Y

This is the same as the HTML link operation. Page transitions are recorded.

This is the same as the HTML link operation.

Multiple selection operation

[Ctrl] +[click]

Y

Y

--

await page.getByRole('listbox').selectOption(['apple', 'banana', 'orange']);

Cut

[Ctrl]+[X]

Y

Y

--

await page.locator('body').press('Control+x');

Copy

[Ctrl]+[C]

Y

Y

--

await page.locator('body').press('Control+c');

Paste

[Ctrl]+[V]

Y

Y

--

await page.locator('body').press('Control+v');

Select All

[Ctrl]+[A]

Y

Y

--

await page.locator('body').press('Control+a');

Dialog operation

--

--

--

The operation itself may not be recorded, but page transitions are recorded.

await page.goto('URL');

Text input

Y

#1

--

--

Key operation

Y

#1

--

--

Other input items

Y

#1

--

--

HTML operation

--

--

--

Records operations related to input operations and page transitions.

--

Link operation

Y

Y

--

--

INPUT TEXT handling (text-entry)

Y

Y

--

await page.getByLabel('Name (4 to 8 characters):').fill('test');

INPUT PASSWORD (password-entry)

Y

Y

--

await page.getByLabel('password (8 characters or more):').fill('pwdtest1');

INPUT CHECKBOX

Y

Y

--

await page.getByRole('checkbox').check();

INPUT RADIO

Y

Y

--

await page.getByLabel('apple').check();

INPUT SUBMIT

Y

Y

--

await page.getByRole('button', { name: 'Send' }).click();

INPUT RESET

Y

Y

--

await page.getByRole('button', { name: 'Reset Form' }).click();

INPUT BUTTON

Y

Y

--

await page.getByRole('button', { name: 'test' }).click();

Script operation

--

--

--

Scripts that do not cause page transitions, HTML operations, or button operations are not recorded, but page transitions, HTML operations, and button operations that are caused by script operations are recorded. #2

--

Page transition operation

Y

Y

Actions implemented inside the script may not be recorded, but page transitions are recorded.

--

Legend

Operation field Y: Can be operated --: Not applicable

Recording field Y: Recording object --: Not applicable

Other than the above --: Not applicable

#1

Data is recorded based on the values entered in the dialog or on the result of pressing a button. However, some dialogs may not be recorded correctly. After creating a Web scenario, be sure to run it to confirm that it runs correctly.

Dialogs that do not behave correctly in a Web scenario, such as those that cause playback to stop while the dialog is open, cannot be handled.

#2

Depending on the timing of the page transitions, HTML operations, and button operations caused by scripts, recording may not be possible.

Be sure to run it after creating Web scenario to see if it runs correctly.

Operations or behaviors not described in the table above cannot be recorded or measured as Web scenarios.

Note that dialog authentication (other than with a user ID and password) and operations within ActiveX controls are not supported. FTP is also not supported.

■Record of assertion

An assertion is an operation that checks whether an element displayed on a Web site matches the expected content. When you run Codegen to create a Web scenario, you can click an element displayed in the browser window to add an assertion to the Web scenario. When the Web scenario is run, the assertion determines whether the element displayed in the browser window matches the element that was displayed when Codegen was run.

The following types of assertions are available:

  • assert visibility

    Asserts that the element is visible.

  • assert text

    Asserts that the element contains certain text.

  • assert value

    Asserts that an element has a specific value.

If you want to add an assertion to a Web scenario, click one of the assert visibility, assert text, or assert value buttons, and then select the element to be asserted in the browser window. An assertion for the selected element is generated in the Playwright Inspector window.

■Pausing Recording

If you want to pause recording, click the Record button. Clicking the Record button again resumes recording.

■Saving the generated Web scenario code

When you exit the Web scenario creation function, the generated Web scenario is saved to the Web scenario file that was specified in the command-line options when the Web scenario creation function was started.

■Exiting Codegen

To exit the Web scenario creation function, press the Ctrl+C keys in the terminal where the playwright codegen command was executed, or close the browser window that was opened when the Web scenario creation function was started.

■Codegen Window Structure

When you use playwright codegen command to execute Web scenario creation function, the following window is displayed.

  • Browser window

    The Web site where you want to run the scenario is displayed. Clicks and typing actions are recorded as you navigate the Web pages.

    The following tables show the buttons and operations that are used to record Web page operations. For details, see Playwright documentation.

    Item number

    Button

    How to operate

    1

    --

    Drag this button to move the tab.

    2

    Record

    Click this button to stop or resume recording.

    3

    assert visibility

    Click this button, and then select the element that you want to assert that the element is visible.

    Click this button again to return to normal operation recording.

    4

    assert text

    Click this button, then select the element for which you want to assert that the element contains specific text.

    Click this button again to return to normal operation recording.

    5

    assert value

    Click this button, then select the element for which you want to assert that the element has a specific value.

    Click this button again to return to normal operation recording.

  • Playwright Inspector window

    Allows you to record Web scenarios.

    The following buttons and procedures are used to record Web scenarios. For details, see the Playwright documentation.

    Item number

    Button

    How to operate

    1

    Record

    Same as Browser window.

    2

    assert visibility

    Same as Browser window.

    3

    assert text

    4

    assert value

■Notes

  1. A Web scenario created by the Web scenario creation function cannot determine HTTP status codes. Therefore, even if "404 Not Found" or "500 Internal Server Error" is returned, the Web access may be judged to be successful.

  2. When using the Web scenario creation function to verify Web page transitions, you will not be able to detect whether a page transition succeeded or failed if the Web scenario is like the following:

    <Example of operation to check>

    On integrated operation viewer login page (URL:'http://hostname:20703/login'), enter your registered username and password to verify that you can successfully log in.

    <Some coding that Codegen writes to Web scenarios>

    test('test', async ({ page }) => {
      await page.goto('http://hostname:20703/login');
      await page.locator('input[name="username"]').fill('username');
      await page.locator('input[name="password"]').fill('password');
      await page.getByRole('button', { name: 'Login'}).click();
    });

    When the above Web scenario is played back, the operation of clicking the Login button is replayed, and playback ends regardless of whether the post-login screen is displayed. Therefore, the success or failure of the page transition cannot be detected.

    To verify a successful page transition, take the following action:

    Use a Codegen assertion to add an action that asserts that, after the page transition, the page displays an element unique to the destination page.

    For details on how to record assertions, see ■Record of assertion.

    Here is an example of a Web scenario with the above example modified:

    <Some coding that Codegen writes to Web scenarios>
    test('test', async ({ page }) => {
      await page.goto('http://hostname:20703/login');
      await page.locator('input[name="username"]').fill('username');
      await page.locator('input[name="password"]').fill('password');
      await page.getByRole('button', { name: 'Login' }).click();
      await expect(page.getByRole('button', { name: 'Logout' })).toBeVisible();
    });

    In the modified Web scenario above, an action was added to assert that the page displayed after the login process shows the Logout button that should appear on that page. If the assertion on the Logout button fails, the scenario can detect that the page transition after login failed.

  3. When URL transitions are recorded in Codegen, if access to the specified URL is transferred (redirected#) to another URL, the URL transition to the redirection source may not be recorded; only the URL transition to the redirection destination may be recorded.

    If Codegen records actions that the server of the specified URL redirects to a different URL, the monitoring scope cannot include the redirection from the redirection source to the destination.

    #

    Refers to the redirection performed by HTTP protocol using HTTP status code (in the 300 range) and Location header field.

    The following are examples of cases where the URL transition to the redirection source is not recorded and only the URL transition to the redirection destination is recorded:

    • When servers redirect from URL to new URL due to, for example, the transfer of a monitored site

    • When a forward slash (/) is missing at the end of the URL specified in Codegen, and the server automatically adds the forward slash and redirects to the correct URL

    Note that redirects that do not use the HTTP protocol, such as the following, do not fall under this note, and the URL transitions of both the redirection source and the redirection destination are recorded:

    • HTML redirection using the <meta> element of HTML

    • JavaScript redirection caused by setting the window.location property to a URL string in a client-side script such as JavaScript
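As a rough sketch, the distinction above between HTTP-level redirects and client-side redirects can be expressed as follows (the helper function is illustrative only, not part of JP1/IM - Agent):

```typescript
// An HTTP-level redirect (the kind covered by the note above) is a 3xx
// status code combined with a Location header. <meta> refresh and
// window.location redirects happen at the content level instead.
function isHttpRedirect(status: number, headers: Record<string, string>): boolean {
  return status >= 300 && status < 400 && 'location' in headers;
}

console.log(isHttpRedirect(301, { location: 'https://example.com/new/' })); // true
console.log(isHttpRedirect(200, {})); // false
```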

■ Web operation information collection function (Web exporter)

The Web operation information collection function (Web exporter) executes the scenario in a Web scenario file created beforehand, using a scrape request from the Prometheus server as the trigger, and returns the execution result as the scrape result. Detailed behavior during scenario execution is output as a trace and can be viewed by the user with the trace viewer function.

■Acquisition items

The metrics that can be retrieved with the Web exporter (Web operation information collection function) are probe_webscena_success (whether the probe was successful#1) and probe_webscena_duration_seconds (the number of seconds taken by the Web scenario probe#2).

#1

Signifies the success or failure of the entire collection, including preparation for collection (such as process startup).

#2

If the collection fails, the metric may not be retrieved.

Web exporter retrieval items are defined in metric definition file (metrics_web_exporter.conf) of Web exporter. For details, see Web exporter Metric Definition File (metrics_web_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
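For illustration, a scrape result containing these items is returned in the Prometheus text exposition format. The following sketch shows how the two metric names might be read from such a response; the sample payload and the parsing helper are hypothetical (real scrape lines also carry label sets), not part of JP1/IM - Agent:

```typescript
// Hypothetical sample of a Web exporter scrape response (labels omitted).
const samplePayload = [
  'probe_webscena_success 1',
  'probe_webscena_duration_seconds 12.4',
].join('\n');

// Illustrative helper: find a metric by name in a text-format payload.
function readMetric(payload: string, name: string): number | undefined {
  for (const line of payload.split('\n')) {
    const [metric, value] = line.trim().split(/\s+/);
    if (metric === name) return Number(value);
  }
  return undefined; // the metric may be absent when collection fails
}

console.log(readMetric(samplePayload, 'probe_webscena_success'));        // 1
console.log(readMetric(samplePayload, 'probe_webscena_duration_seconds')); // 12.4
```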

■Monitoring when a monitoring target is temporarily stopped

To suppress error detection during a power failure or maintenance, you must stop collecting activity information for the target.

The collection of operational information can be stopped by deleting the applicable monitoring target in targets of Web exporter discovery configuration file (jpc_file_sd_config_web.yml). For details, see Web exporter discovery configuration file (jpc_file_sd_config_web.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Web scenario execution function

The Web scenario execution function is performed by using Playwright.

The Playwright configuration file specifies the parameters for the Web scenario execution function.

For Playwright configuration file, see Playwright configuration file (jpc_playwright.config.ts) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■Trace

The Web scenario execution function outputs a trace of the Web scenario execution to a trace file in the following locations of the Web exporter. The trace shows the results of the actions performed in the Web scenario file and the HTTP communication.

  • For physical hosts

    Agent-path\logs\web_exporter\trace\Web-scenario-filename-test-project-name-number-of-retries_generation-number\trace.zip

  • For logical hosts

    shared-folder\jp1ima\logs\web_exporter\trace\Web-scenario-filename-test-project-name-number-of-retries_generation-number\trace.zip

Web-scenario-filename

If Web scenario filename ends with ".spec.ts", the text without ".spec.ts" is stored.

project-name

The character string specified in name parameter of Playwright configuration file is set.

Spaces, control characters, and the following characters are converted to a hyphen (-).

! " # $ % & ' ( ) * + , . / : ; < = > ? @ [ \ ] ^ _ { | } ~

number-of-retries

Used if the retries parameter of the Playwright configuration file is 1 or more. retry1, retry2, retry3, ... is set according to the number of retries when Web scenario execution fails.

The "-number-of-retries" part is appended only when retrying. Therefore, it is not appended to the trace of the first run of a Web scenario in each scrape.

generation-number

A 4-digit number is set.

For the number of generations of traces to be saved, see tracenum in Web exporter configuration file (jpc_web_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
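The project-name conversion rule described above (spaces, control characters, and the listed symbols become hyphens) can be sketched as follows; the helper is hypothetical, since JP1/IM - Agent performs this conversion internally:

```typescript
// Illustrative sketch of the project-name conversion rule: spaces,
// control characters, and the listed punctuation are replaced with '-'.
function sanitizeProjectName(name: string): string {
  return name.replace(/[\s\x00-\x1f!"#$%&'()*+,./:;<=>?@[\\\]^_{|}~]/g, '-');
}

console.log(sanitizeProjectName('login check #1')); // login-check--1
console.log(sanitizeProjectName('my_project'));     // my-project
```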

The file size of a trace is a few MB (it varies greatly depending on the content being monitored). For a Web scenario that logs in to and logs out of the Intelligent Integrated Management Base, it is approximately 2 MB per scenario. If traces are collected every 6 minutes with 2,000 retained generations and 0 retries (the defaults), approximately 4 GB of disk space is required.
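The disk-space estimate above can be reproduced with simple arithmetic (the figures are taken from the text; actual trace sizes vary with the monitored content):

```typescript
// Rough disk-sizing sketch for trace retention, using the figures above.
const traceSizeMB = 2;    // approx. size per login/logout scenario trace
const generations = 2000; // retained generations (default)
const retries = 0;        // no retry traces

const totalMB = traceSizeMB * generations * (1 + retries);
console.log(`approx. ${(totalMB / 1024).toFixed(1)} GB`); // on the order of 4 GB
```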

■Trace Viewer

Web exporter trace file can be referenced in the trace viewer.

The trace viewer is used to investigate the details when an error is detected.

For details about the trace viewer, see 3.15.1(1)(n) Trace Viewer Function (playwright show-trace).

■ Monitoring with Other Monitoring Function

The Web scenario monitoring function allows you to monitor Web contents from the user's point of view, but it does not allow detailed monitoring of HTTP communication (such as name resolution times or certificate expiration times) or monitoring of the inside of the monitored target. Therefore, if an error occurs, you cannot investigate the cause of the error using only the metric information acquired by the Web scenario monitoring function.

For example, you need to monitor HTTP communication using Blackbox exporter outline monitoring and monitor the inside of the monitored side (HTTP servers and DB servers) using log trapper of Fluentd.

■ Handling of Public Key Infrastructure (PKI) Certificates Used in TLS Communication

If the monitoring target is an HTTPS server, register the following certificates in the OS (for Windows, register them in the certificate store).

  • CA certificate of the certificate authority that issued the server certificate

  • Client certificate (if HTTPS server requires a client certificate during TLS handshake) and private key

For details about how to register with OS, see the documentation for your OS.

■ Web Scenarios for HTTP Authentication with Passwords

If the monitored Web contents require HTTP authentication with a username and password (such as Basic authentication), enter the username and password in the URL field of the Web scenario creation function as follows:

http://username:password@domain-name:port/Web-content-path
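This URL format embeds the credentials in the standard userinfo position defined by the WHATWG URL standard, as the following sketch shows (the host, port, path, and credentials are placeholders):

```typescript
// Parsing a credential-embedding URL of the form shown above with the
// WHATWG URL parser (available as a global in Node.js and browsers).
const target = new URL('http://monitor-user:secret@www.example.com:8080/app/login');

console.log(target.username); // monitor-user
console.log(target.password); // secret
console.log(target.port);     // 8080
console.log(target.pathname); // /app/login
```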

■ Handling passwords

Passwords used for HTTP authentication and for Web contents' own authentication (for example, when a form prompts for a username and password) are stored in the Web exporter configuration file, the Web scenario file, and the trace file. When providing this information to a requester for failure investigation, mask it, for example by replacing the password part with a different character string, to prevent leakage.

■ Configuring HTTP Proxies

To use an HTTP proxy server for communication from the JP1/IM - Agent host to the monitoring target, set the "proxy" item in the Playwright configuration file (jpc_playwright.config.ts).

For details about Playwright configuration file, see Playwright configuration file (jpc_playwright.config.ts) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

For details on configuration file editing procedure, see To edit the configuration files (for Windows) in 1.19.3(1)(a) Common way to setup in the JP1/Integrated Management 3 - Manager Configuration Guide.

■ About Reviewing Web scenarios

The Web scenario monitoring function does not provide the ability to test a Web scenario by itself. Monitor with the Web scenario in actual operation and make sure that the monitoring is successful. Refer to the probe_webscena_success metric to determine whether monitoring is normal.

■ Timeout Settings and User Tasks When Timeout Occurs

The Web scenario monitoring function uses timeouts to interrupt the collection of operation information (Web scenario execution) that takes too long.

The following parameters relate to timeouts:

Setting point

Parameter name

Prometheus configuration file (jpc_prometheus_server.yml)#1

scrape_timeout (scrape required timeout period)

web_exporter command options#2

--timeout-offset (the number of seconds to be subtracted from the Prometheus scrape_timeout value (offset subtracted from the timeout time))

It is fixed at 0.5 seconds and cannot be changed by the user.

#1

For details about Prometheus configuration file parameters, see Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

#2

For details about the options of the web_exporter command, see Service definition file (jpc_program-name_service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

The timeout setting is always applied, and the collection is interrupted when the timeout period is exceeded. As a result, the collection does not continue indefinitely without interruption.

The timeout period is the scrape_timeout value minus --timeout-offset (0.5 seconds).

This timeout period must include the execution time of the processing required to collect the operation information. Collection of operation information includes operating the Web contents (browser operations) according to the Web scenario.

In practice, because there is startup processing for the browser and other processes, we recommend that you set a timeout about 30 seconds longer than the time it takes to run the actual Web scenario.
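The effective timeout described above can be sketched as follows (the helper function is illustrative; the 0.5-second value is the fixed --timeout-offset described in the table):

```typescript
// Effective timeout = Prometheus scrape_timeout minus the fixed
// --timeout-offset of 0.5 seconds (not changeable by the user).
const TIMEOUT_OFFSET_SECONDS = 0.5;

function effectiveTimeout(scrapeTimeoutSeconds: number): number {
  return scrapeTimeoutSeconds - TIMEOUT_OFFSET_SECONDS;
}

// With scrape_timeout of 60 seconds, collection is cut off after 59.5 seconds.
console.log(effectiveTimeout(60)); // 59.5
```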

If processing is aborted due to a timeout, one or more of the following messages are output to the log. At this time, the probe_webscena_success metric may be 0 (failed), or the metric may not be sent. Check the following log files to see whether processing was aborted by a timeout.

Log file

Message

web_exporter log

KNBC20144-E An error occurred while an internal command was executing. (maintenance information = exit status 1)

KNBC20147-E An error occurred while an internal command was executing. (message = Test timeout of milliseconds exceeded., ...)

Prometheus server log

msg="Scrape failed" err="Get URL: context deadline exceeded"

Even if a timeout occurs, the child process that was started is terminated, so the user does not need to terminate it.

■ Notes

  • The following monitoring cannot be performed using Web scenario monitoring function:

    • Monitoring Web contents that are not supported by the browsers that JP1/IM - Agent supports

    • Monitoring Web contents that behave differently from when the Web scenario was created

    • Monitoring HTTP status codes

    • Monitoring Web sites using external authentication providers for authentication

  • If a timeout occurs during the collection of operational information, the browser process may remain unfinished. In this case, the user must stop the applicable process. For details, see ■Timeout Settings and User Tasks When Timeout Occurs.

(n) Trace Viewer Function (playwright show-trace)

The trace viewer function provides a visual overview of the actions recorded in the trace during Web scenario execution.

■Prerequisites

When the user runs playwright command manually, the current folder must be Playwright working folder. For Playwright working folders, see Appendix A.4(3) Integrated agent host (Windows).

Run as a user with Administrator's permissions (run from the Administrator Console if Windows's UAC function is enabled).

You can use the playwright show-trace command to perform the trace viewer function.

The playwright show-trace command displays the trace viewer. Run the command from a terminal.

■Run Web Scenarios to log tracing

To log traces when running Web scenarios, you must specify on for the trace option (mode) in the Playwright configuration file (jpc_playwright.config.ts) so that traces are recorded for every test run.

For the format and options of Playwright configuration file, see Playwright configuration file (jpc_playwright.config.ts) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■Open the trace

You can run the following command to display the trace for the path specified in the command options in the trace viewer.

Run the command as a user with Administrator's permissions (if the Windows UAC function is enabled, run the command from the administrator console).

npx playwright show-trace trace-file-path

For details about the parameters that you specify for playwright show-trace command, see the following tables:

  • npx playwright show-trace command option

    Item

    Description

    Changeability

    What you set up in JP1/IM - Agent

    JP1/IM - Agent default value

    Path to a trace file

    Specifies the trace file to be displayed in the trace viewer.

    Y

    Specifies the path to the output trace file.

    If it is not specified, drag-and-drop the trace file on the displayed HTML to display the trace.

    None

Legend:

Y: Changeable

In the trace viewer, you can see the following information:

  • Action

    On the Action tab, you can see which locator was used for each action and how long the action took to execute.

    To see how the DOM snapshot changes, hover over the corresponding action in the Web scenario.

    If you are investigating or debugging, move the time axis forward or backward and click the action you want to review.

    Use the Before and After tabs to see the differences before and after the actions.

  • Screenshots

    Screenshots are recorded in the trace and displayed as thumbnail images in chronological order at the top of the trace viewer. You can hover the mouse over a thumbnail image to display an enlarged image of each action and state.

    You can double-click an action to view the time that the action was executed. When you select multiple actions using the sliders on the timeline, they appear in the Action tab, and you can filter and view the log for only the selected actions.

  • Snapshot

    By default, tracing is performed with the snapshot option turned off.

    If you want to use this function, you must specify true for the snapshots parameter of the Playwright configuration file. For Playwright configuration file, see Playwright configuration file (jpc_playwright.config.ts) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

    You can switch the tabs in the center of the screen to see the types of snapshots listed in the following table.

    Type

    Description

    Action

    Snapshot of the moment of input that was executed

    Use this type of snapshot to see exactly where Playwright clicked.

    Before

    Snapshot at the time the action was invoked

    After

    Snapshot after action

  • Source

    When you hover over an action in a Web scenario, the code line for that action is highlighted in the Source tabbed page.

  • Call

    Call tabbed page shows the execution time and used locators.

  • Log

    Use this tab to view a log of actions, such as scrolling, waiting for an element to become visible, enabled, and stable, clicking, and filling in values.

  • Error

    If Web scenario execution fails, an error message is displayed on the Error tabbed page. The timeline also displays a red line to indicate where the error occurred.

    To check the source code line, select Source tabbed page.

  • Console

    Browse the console logs for browser and Web scenario runs.

  • Network

    The Network tab shows the network requests that were made during the Web scenario.

    You can sort the request list by selecting Name, Method, Status, Content Type, Duration, or Size.

    Click Request to view information about the request, such as the request header, response header, request body, and response body.

    If you want to use this function, you must specify true for the snapshots parameter of the Playwright configuration file. For Playwright configuration file, see Playwright configuration file (jpc_playwright.config.ts) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

  • Metadata

    Metadata tab next to the Action tab provides detailed information about Web scenario execution, such as browser, viewport size, and runtime.

    start time shows the time when the Web scenario was started. The time displayed in the trace is the date and time of the JP1/IM - Agent host, displayed in "YYYY/MM/DD hh:mm:ss" format. Even if the time zone of the monitored host differs from the time zone of the JP1/IM - Agent host, the date and time of the JP1/IM - Agent host apply.

■Close the trace

To exit the trace viewer, press the Ctrl+C keys in the terminal where the playwright show-trace command was executed, or close the trace viewer window.

(o) VMware exporter (VMware performance data collection capability)

VMware exporter is an Exporter for Prometheus that retrieves performance data from VMware ESXi.

■ Prerequisites

It is a prerequisite that the ports used by VMware exporter are protected by firewalls, networking configurations, and so on, so that they are not accessed by anything other than Prometheus server of JP1/IM - Agent.

For the port used by VMware exporter, see vmware_exporter command options in Service definition file (jpc_program-name_service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Conditions to be monitored

  • VMware vCenter Server is not monitored.

  • VMware exporter target is VMware ESXi. For details about the supported VMware ESXi versions, see the Release Notes.

  • The name of the datastore# managed by VMware ESXi must be the same as the host name. If the datastore name and host name are different, separate nodes are created for the datastore and the hypervisor, and the available metrics are separated.

    When nodes are divided into datastores and hypervisors, the metrics that can be retrieved for each node are as follows.

    • Data store

      vmware_host_size, vmware_host_used, vmware_host_free, vmware_datastore_used_percent

    • Hypervisor

      Metrics for hosts, except: vmware_host_size, vmware_host_used, vmware_host_free, vmware_datastore_used_percent

    For details about each metric and its description, see VMware exporter metric definition file for host (metrics_vmware_exporter_host.conf) and VMware exporter metric definition file for VM (metrics_vmware_exporter_vm.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

    #: If there is more than one data store, use "host-name_any-string".

  • Do not use duplicate VM names that are managed by VMware ESXi. If VM names are duplicated, the same node will be displayed with more than one monitor result. Therefore, be sure to set VM name to a unique name.
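The datastore naming rule described above (the datastore name must be the host name itself, or "host-name_any-string" when there is more than one datastore) can be sketched as follows; the helper is hypothetical, not a JP1/IM - Agent API:

```typescript
// Illustrative check of the datastore naming rule: a datastore maps to
// the hypervisor node when its name is the host name itself or follows
// the "host-name_any-string" pattern.
function datastoreMatchesHost(dsName: string, hostName: string): boolean {
  return dsName === hostName || dsName.startsWith(hostName + '_');
}

console.log(datastoreMatchesHost('esxi01', 'esxi01'));       // true
console.log(datastoreMatchesHost('esxi01_local', 'esxi01')); // true
console.log(datastoreMatchesHost('datastore1', 'esxi01'));   // false
```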

■ Acquisition items

The VMware exporter shipped with JP1/IM - Agent provides the metrics defined by the VMware exporter defaults.

VMware exporter retrieval items are defined in metric definition file for host and metric definition file for VM of VMware exporter. For details, see VMware exporter metric definition file for host (metrics_vmware_exporter_host.conf) and VMware exporter metric definition File for VM (metrics_vmware_exporter_vm.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

Metrics are obtained by using the OSS library pyVmomi and the metrics officially provided by VMware vRealize Operations. The vRealize Operations metrics that are used are listed in the following table.

Metric Name

Category

Description

Label

Data source

vmware_datastore_capacity_size

DATASTORES

VMware Datastore capacity in bytes (Unit:B)

dc_name : data-center-name

ds_name : datastore-name

instance : data-retrieval-address

job : job-name

Get by pyVmomi

vmware_datastore_capacity_size of datastore structure

vmware_datastore_freespace_size

DATASTORES

VMware Datastore freespace in bytes (Unit:B)

dc_name : data-center-name

ds_name : datastore-name

instance : data-retrieval-address

job : job-name

Get by pyVmomi

vmware_datastore_freespace_size of datastore structure

vmware_host_num_cpu

HOSTS

VMware Number of processors in the Host

dc_name : data-center-name

host_name : host-name

instance : data-retrieval-address

job : job-name

Get by pyVmomi

vmware_host_num_cpu of hosts structure

vmware_host_memory_usage

HOSTS

VMware Host Memory usage in Mbytes (Unit:MB)

dc_name : data-center-name

host_name : host-name

instance : data-retrieval-address

job : job-name

Get by pyVmomi

vmware_host_memory_usage of hosts structure

vmware_host_memory_max

HOSTS

VMware Host Memory Max availability in Mbytes (Unit:MB)

dc_name : data-center-name

host_name : host-name

instance : data-retrieval-address

job : job-name

Get by pyVmomi

vmware_host_memory_max of hosts structure

vmware_host_mem_vmmemctl_average

HOSTS

The total amount of memory currently used for virtual machine memory control. (Unit:KB)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

mem.vmmemctl.average of performance counters

vmware_vm_mem_swapped_average

VMS

The amount of unreserved memory in kilobytes. (Unit:KB)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

mem.swapped.average of performance counters

vmware_host_net_bytesRX_average

HOSTS

Average amount of data received per second. (Unit:KBps)

dc_name : data-center-name

host_name : host-name

Get by pyVmomi

vmware_host_net_bytesRX_average of performance counters

vmware_host_net_bytesTX_average

HOSTS

Average amount of data transferred per second. (Unit:KBps)

dc_name : data-center-name

host_name : host-name

Get by pyVmomi

vmware_host_net_bytesTX_average of performance counters

vmware_vm_mem_active_average

VMS

The amount of memory that is being used effectively. (Unit:KB)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

mem.active.average of performance counters

vmware_vm_guest_disk_capacity

VMGUESTS

Disk capacity metric per partition (Unit:B)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

vmware_vm_guest_disk_capacity of vmguests structure

vmware_vm_guest_disk_free

VMGUESTS

Free disk space metric per partition (Unit:B)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

vmware_vm_guest_disk_free of vmguests structure

vmware_vm_mem_vmmemctl_average

VMS

The total amount of memory currently used for virtual machine memory control. (Unit:KB)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

mem.vmmemctl.average of performance counters

vmware_vm_mem_consumed_average

VMS

The amount of host memory consumed by the virtual machine for guest memory. (Unit:KB)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

mem.consumed.average of performance counters

vmware_vm_net_transmitted_average

VMS

The average amount of data transferred per second. (Unit:KBps)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

net.transmitted.average of performance counters

vmware_vm_net_received_average

VMS

The average amount of data received per second. (Unit:KBps)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

net.received.average of performance counters

vmware_vm_power_state

VMS

VMware VM Power state (On / Off)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

vmware_vm_power_state of vms structure

vmware_host_cpu_used_summation

HOSTS

Used CPU (Unit:msec)

dc_name : data-center-name

host_name : host-name

instance : data-retrieval-address

job : job-name

Get by pyVmomi

cpu.used.summation of performance counters

vmware_vm_cpu_ready_summation

VMS

Time the virtual machine spent in the ready state, waiting to be scheduled on a physical CPU. (Unit:msec)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

cpu.ready.summation of performance counters

vmware_vm_num_cpu

VMS

VMware Number of processors in the virtual machine

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

vmware_vm_num_cpu of vms structure

vmware_vm_memory_max

VMS

VMware VM Memory Max availability in Mbytes (Unit:MB)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

vmware_vm_memory_max of vms structure

vmware_vm_max_cpu_usage

VMS

VMware VM CPU Max availability in Hz (Unit:Hz)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

vmware_vm_max_cpu_usage of vms structure

vmware_vm_template

VMS

VMware VM Template (true / false)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

vmware_vm_template of vms structure

vmware_host_cpu_usage_average

--

Average CPU usage (Unit:%)

dc_name : data-center-name

host_name : host-name

instance : data-retrieval-address

job : job-name

Get by pyVmomi

cpu.usage.average of performance counters

vmware_host_disk_write_average

--

The amount of data written to disk during the performance interval. (Unit:KBps)

dc_name : data-center-name

host_name : host-name

instance : data-retrieval-address

job : job-name

Get by pyVmomi

disk.write.average of performance counters

vmware_host_disk_read_average

--

The amount of data read during the performance interval. (Unit:KBps)

dc_name : data-center-name

host_name : host-name

instance : data-retrieval-address

job : job-name

Get by pyVmomi

disk.read.average of performance counters

vmware_vm_cpu_usage_average

--

Average CPU usage (Unit:%)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

cpu.usage.average of performance counters

vmware_vm_disk_write_average

--

The amount of data written to disk during the performance interval. (Unit:KBps)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

disk.write.average of performance counters

vmware_vm_disk_read_average

--

The amount of data read during the performance interval. (Unit:KBps)

dc_name : data-center-name

ds_name : datastore-name

host_name : host-name

instance : data-retrieval-address

job : job-name

vm_name : virtual-machine-name

Get by pyVmomi

disk.read.average of performance counters
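Some of the CPU metrics above, such as vmware_host_cpu_used_summation (cpu.used.summation), report used CPU time in milliseconds per sampling interval, so a utilization percentage can be derived by dividing by the interval length multiplied by the core count (vmware_host_num_cpu). The following is a minimal Python sketch, not part of the product; the 20-second sampling interval is an assumption, so substitute your actual interval:

```python
def cpu_used_percent(used_msec: float, num_cpu: int,
                     interval_sec: float = 20.0) -> float:
    """Convert cpu.used.summation (msec of CPU time per sampling interval)
    to a percentage of total host CPU capacity. interval_sec is an
    assumption; adjust it to your actual sampling interval."""
    capacity_msec = interval_sec * 1000.0 * num_cpu
    return 100.0 * used_msec / capacity_msec

# 8,000 ms of CPU time on a 4-core host over a 20 s interval
print(cpu_used_percent(8000, 4))  # 10.0
```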

■ Obfuscation of VMware exporter passwords

The VMware exporter shipped with JP1/IM - Agent manages the passwords used to access VMware ESXi by using the secret obfuscation function. For details, see 3.15.10 Secret obfuscation function.

(p) Windows exporter (Hyper-V monitoring function)

The Hyper-V monitoring function monitors Hyper-V activity by using the hyperv collector of Windows exporter.

■ Prerequisites

The port used by the Hyper-V monitoring function must be protected by a firewall or network configuration so that it cannot be accessed by anyone other than the Prometheus server of JP1/IM - Agent.

For details about the ports used by Hyper-V monitoring function, see the explanation of windows_exporter command options (Hyper-V monitoring) in Service definition file (jpc_program-name_service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Conditions to be monitored

  • For details about the versions of Hyper-V that Hyper-V monitoring function supports as targets, see the Release Notes.

  • The following rules apply to VM naming:

    • VM name must be the same as the host name of the guest OS.

    • Do not use a VM name that contains "-".

    • The name of the disk managed by Hyper-V must be the same as the VM name.

      If the disk name differs from the VM name, the following metrics cannot be displayed for the VM:

      - hyperv_vm_device_written

      - hyperv_vm_device_read

      For details about individual metrics, see "Acquisition items" below.

      If there are multiple disks, name them in the form "host-name_any-string".

  • If you use live migration, for example, to move a VM from a monitored host, you will not be able to monitor that VM. You can monitor the destination VM by making it a monitoring target.

  • Data is not collected for VMs that have never been started, and no VM nodes are created. Therefore, the tree must be updated when a VM is started for the first time.

  • You can monitor only the VMs of the host on which JP1/IM - Agent resides. VMs in nested configurations are not monitored.

■ Acquisition items

The Hyper-V monitoring function obtains the Hyper-V metrics defined by default in Windows exporter (Hyper-V monitoring).

The retrieval items of Windows exporter (Hyper-V monitoring) are defined in the metric definition file for hosts and the metric definition file for VMs of Windows exporter (Hyper-V monitoring). For details, see Windows exporter (Hyper-V monitoring) metric definition file (metrics_windows_exporter_hyperv_host.conf) and Windows exporter (Hyper-V monitoring) metric definition file for VM (metrics_windows_exporter_hyperv_vm.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

The following table lists the metrics that can be specified in a PromQL expression in the definition files. For details about the "Collector" column in the table, see the description under ■ Collector below.

Metric Name

Collector

Contents to be acquired

Type

Label

windows_hyperv_vm_cpu_total_run_time

hyperv

The time spent by the virtual processor in guest and hypervisor code

gauge

instance: instance-identification-string

job: job-name

core: coreid

vm: virtual-machine-name

windows_hyperv_vm_device_bytes_written

hyperv

The total number of bytes that have been written per second on this virtual device

counter

instance: instance-identification-string

job: job-name

vm_device: virtual-disk-file-path

windows_hyperv_vm_device_bytes_read

hyperv

The total number of bytes that have been read per second on this virtual device

counter

instance: instance-identification-string

job: job-name

vm_device: virtual-disk-file-path

windows_hyperv_host_cpu_total_run_time

hyperv

The time spent by the virtual processor in guest and hypervisor code

gauge

instance: instance-identification-string

job: job-name

core: coreid

windows_hyperv_vswitch_bytes_received_total

hyperv

The total number of bytes received per second by the virtual switch

counter

instance: instance-identification-string

job: job-name

vswitch: virtual-switch-name

windows_hyperv_vswitch_bytes_sent_total

hyperv

The total number of bytes sent per second by the virtual switch

counter

instance: instance-identification-string

job: job-name

vswitch: virtual-switch-name

windows_cs_logical_processors

cs

Number of installed logical processors

gauge

instance: instance-identification-string

job: job-name

windows_hyperv_vm_cpu_hypervisor_run_time

hyperv

The time spent by the virtual processor in hypervisor code

gauge

instance: instance-identification-string

job: job-name

core: coreid

vm: virtual-machine-name
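The metrics with type counter above (for example, windows_hyperv_vswitch_bytes_received_total) increase monotonically, so a per-second rate is derived from two successive samples; this is the computation PromQL's rate() performs over a range vector. A minimal Python sketch, with an illustrative function name and simplified counter-reset handling:

```python
def counter_rate(v1: float, t1: float, v2: float, t2: float) -> float:
    """Per-second rate between two samples (value, unix-time) of a
    monotonically increasing counter. A counter reset (v2 < v1) is
    naively treated as a restart from zero."""
    if t2 <= t1:
        raise ValueError("samples must be in increasing time order")
    delta = v2 - v1 if v2 >= v1 else v2  # simplified reset handling
    return delta / (t2 - t1)

# 1,500,000 bytes counted over 15 seconds -> 100,000 B/s
print(counter_rate(10_000_000, 100.0, 11_500_000, 115.0))  # 100000.0
```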

■ Collector

Windows exporter (Hyper-V monitoring) has a built-in collection process called a "collector" for each monitored resource such as CPU and memory.

You must enable the collectors that correspond to the metrics you want to collect, as listed in the table above. You can also disable collectors for metrics that you do not want to collect, to suppress unwanted collection.

Enable/disable for each collector can be specified with the "--collectors.enabled" option on the Windows exporter (Hyper-V monitoring) command line or in the item "collectors.enabled" in the Windows exporter (Hyper-V monitoring) configuration file (jpc_windows_exporter_hyperv.yml).

For details about Windows exporter (Hyper-V monitoring) command-line options, see the description of windows_exporter command options (Hyper-V monitoring) in Service definition file (jpc_program-name_service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

For details about Windows exporter (Hyper-V monitoring) configuration file entry "collectors.enabled", see the description of item collectors in Windows exporter (Hyper-V monitoring) configuration file (jpc_windows_exporter_hyperv.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Notes

Because the Hyper-V monitoring function monitors the Hyper-V on JP1/IM - Agent's own host, when you use HA host clusters or live migration, you must deploy JP1/IM - Agent on the monitored targets according to the configuration of the Hyper-V that you want to monitor.

When the Hyper-V configuration is changed, the tree must be updated after the first startup of the VMs to be monitored.

(q) SQL exporter (Microsoft SQL Server monitoring function)

SQL exporter is an Exporter for Prometheus that retrieves performance data from Microsoft SQL Server.

- About the number of sessions

When monitoring Microsoft SQL Server from SQL exporter, connections are made according to the number of connections defined in the SQL exporter configuration file (jpc_sql_exporter.yml); if the session retention time is within the time defined in this file, data is acquired in the same session.

For details about SQL exporter configuration file, see SQL exporter configuration file (jpc_sql_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Supported targets and configurations

The monitoring target is an instance of Microsoft SQL Server. Monitoring is performed in units of instances, and a maximum of 10 instances can be monitored.

For details about supported Microsoft SQL Server versions and editions, see the Release Notes for JP1/IM - Agent.

The following shows Microsoft SQL Server configurations that are supported for monitoring.

  • Monitoring a single host (including remote monitoring)

  • Monitoring multiple hosts (including remote monitoring)

In a mirrored configuration, you can monitor both the principal database and the mirror database by setting them as monitoring targets. However, because they are different instances, they are collected as separate nodes in the monitoring tree.

If you use the SQL Server AlwaysOn Availability Groups function, you can monitor both the primary and secondary databases by setting them as monitoring targets. However, because they are different instances, they are collected as separate nodes in the monitoring tree.

■ Acquisition items

The metrics that can be retrieved with the SQL exporter shipped with JP1/IM - Agent are the metrics defined by default in SQL exporter, plus the metrics listed below.

  • mssql_database_detail_process_count

  • mssql_global_server_summary_perc_busy

  • mssql_global_server_summary_packet_errors

  • mssql_server_detail_blocked_processes

  • mssql_server_overview_cache_hit

  • mssql_transaction_log_overview_log_space_used

SQL exporter retrieval items are defined in the metric definition file of SQL exporter. For details, see SQL exporter metric definition file (metrics_sql_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

The following table lists the metrics that can be specified in a PromQL expression in the definition file. The value of each metric is obtained by executing the SQL statement shown in the table against Microsoft SQL Server. For details about a metric, refer to Microsoft documentation for the SQL statement of its data source.

Metric name

Contents to be acquired

Label

Data source (SQL statement)

mssql_local_time_seconds

Local time in seconds since epoch (UNIX time).

none

SELECT DATEDIFF(second, '19700101', GETUTCDATE()) AS unix_time

mssql_connections

Number of active connections.

none

SELECT DB_NAME(sp.dbid) AS db, COUNT(sp.spid) AS count

FROM sys.sysprocesses sp

GROUP BY DB_NAME(sp.dbid)

mssql_deadlocks

Number of lock requests that resulted in a deadlock.

none

SELECT cntr_value

FROM sys.dm_os_performance_counters WITH (NOLOCK)

WHERE counter_name = 'Number of Deadlocks/sec' AND instance_name = '_Total'

mssql_user_errors

Number of user errors.

none

SELECT cntr_value

FROM sys.dm_os_performance_counters WITH (NOLOCK)

WHERE counter_name = 'Errors/sec' AND instance_name = 'User Errors'

mssql_kill_connection_errors

Number of severe errors that caused SQL Server to kill the connection.

none

SELECT cntr_value

FROM sys.dm_os_performance_counters WITH (NOLOCK)

WHERE counter_name = 'Errors/sec' AND instance_name = 'Kill Connection Errors'

mssql_page_life_expectancy_seconds

The minimum number of seconds a page will stay in the buffer pool on this node without references.

none

SELECT top(1) cntr_value

FROM sys.dm_os_performance_counters WITH (NOLOCK)

WHERE counter_name = 'Page life expectancy'

mssql_batch_requests

Number of command batches received.

none

SELECT cntr_value

FROM sys.dm_os_performance_counters WITH (NOLOCK)

WHERE counter_name = 'Batch Requests/sec'

mssql_log_growths

Number of times the transaction log has been expanded, per database.

none

SELECT rtrim(instance_name) AS db, cntr_value

FROM sys.dm_os_performance_counters WITH (NOLOCK)

WHERE counter_name = 'Log Growths' AND instance_name <> '_Total'

mssql_buffer_cache_hit_ratio

Ratio of requests that hit the buffer cache

none

SELECT cntr_value

FROM sys.dm_os_performance_counters

WHERE [counter_name] = 'Buffer cache hit ratio'

mssql_checkpoint_pages_sec

Checkpoint Pages Per Second

none

SELECT cntr_value

FROM sys.dm_os_performance_counters

WHERE [counter_name] = 'Checkpoint pages/sec'

mssql_io_stall_seconds

Stall time in seconds per database and I/O operation.

none

SELECT

cast(DB_Name(a.database_id) as varchar) AS [db],

sum(io_stall_read_ms) / 1000.0 AS [read],

sum(io_stall_write_ms) / 1000.0 AS [write],

sum(io_stall) / 1000.0 AS io_stall

FROM

sys.dm_io_virtual_file_stats(null, null) a INNER JOIN sys.master_files b ON a.database_id = b.database_id AND a.file_id = b.file_id GROUP BY a.database_id

mssql_io_stall_total_seconds

Total stall time in seconds per database.

none

SELECT

cast(DB_Name(a.database_id) as varchar) AS [db],

sum(io_stall_read_ms) / 1000.0 AS [read],

sum(io_stall_write_ms) / 1000.0 AS [write],

sum(io_stall) / 1000.0 AS io_stall

FROM

sys.dm_io_virtual_file_stats(null, null) a INNER JOIN sys.master_files b ON a.database_id = b.database_id AND a.file_id = b.file_id GROUP BY a.database_id

mssql_resident_memory_bytes

SQL Server resident memory size (AKA working set).

none

SELECT

physical_memory_in_use_kb * 1024 AS resident_memory_bytes,

virtual_address_space_committed_kb * 1024 AS virtual_memory_bytes,

memory_utilization_percentage,

page_fault_count

FROM sys.dm_os_process_memory

mssql_virtual_memory_bytes

Microsoft SQL Server committed virtual memory size.

none

SELECT

physical_memory_in_use_kb * 1024 AS resident_memory_bytes,

virtual_address_space_committed_kb * 1024 AS virtual_memory_bytes,

memory_utilization_percentage,

page_fault_count

FROM sys.dm_os_process_memory

mssql_memory_utilization_percentage

The percentage of committed memory that is in the working set.

none

SELECT

physical_memory_in_use_kb * 1024 AS resident_memory_bytes,

virtual_address_space_committed_kb * 1024 AS virtual_memory_bytes,

memory_utilization_percentage,

page_fault_count

FROM sys.dm_os_process_memory

mssql_page_fault_count

The number of page faults that were incurred by the Microsoft SQL Server process.

none

SELECT

physical_memory_in_use_kb * 1024 AS resident_memory_bytes,

virtual_address_space_committed_kb * 1024 AS virtual_memory_bytes,

memory_utilization_percentage,

page_fault_count

FROM sys.dm_os_process_memory

mssql_os_memory

OS physical memory, used and available.

none

SELECT

(total_physical_memory_kb - available_physical_memory_kb) * 1024 AS used, available_physical_memory_kb * 1024 AS available

FROM sys.dm_os_sys_memory

mssql_os_page_file

OS page file, used and available.

none

SELECT

(total_page_file_kb - available_page_file_kb) * 1024 AS used, available_page_file_kb * 1024 AS available

FROM sys.dm_os_sys_memory

mssql_database_detail_process_count

Total number of processes

none

SELECT

DB_NAME(ISNULL(des.database_id,0)) AS db, COUNT(des.session_id) AS count

FROM master.sys.dm_exec_sessions des

WHERE ISNULL(des.database_id,0) <> 0

GROUP BY DB_NAME(ISNULL(des.database_id,0))

mssql_global_server_summary_perc_busy

Percentage of CPU Busy Time

Note: This field cannot acquire the correct value.

none

SELECT 100.0 * @@cpu_busy / (@@cpu_busy+ @@idle+ @@io_busy) AS cpu_busy_percent

mssql_global_server_summary_packet_errors

The number of packet errors

none

SELECT @@packet_errors AS count

mssql_server_detail_blocked_processes

The number of processes waiting due to processes running on Microsoft SQL Server being locked

none

SELECT DB_NAME(ISNULL(S.database_id,0)) AS db, SUM(ISNULL(R.blocking_session_id,0)) AS count

FROM master.sys.dm_exec_sessions S LEFT OUTER JOIN master.sys.dm_exec_requests R ON S.session_id = R.session_id

GROUP BY DB_NAME(ISNULL(S.database_id,0))

mssql_server_overview_cache_hit

The percentage of times data pages were found in the data cache

none

SELECT 100.0 * (

SELECT

cntr_value

FROM master.sys.dm_os_performance_counters

WHERE RTRIM(object_name) LIKE '%:Buffer Manager'

AND RTRIM(LOWER(counter_name)) = 'buffer cache hit ratio'

) / (

SELECT

cntr_value

FROM master.sys.dm_os_performance_counters

WHERE RTRIM(object_name) LIKE '%:Buffer Manager'

AND RTRIM(LOWER(counter_name)) = 'buffer cache hit ratio base'

) AS cache_hity_percent
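Each value returned by the SQL statements above is exposed by the exporter as a Prometheus sample in the text exposition format, such as mssql_connections{db="master"} 3. As an illustration only (the helper function and sample labels are hypothetical, not part of the product), formatting one sample looks like this:

```python
def to_exposition_line(name: str, labels: dict, value: float) -> str:
    """Format one sample in the Prometheus text exposition format.
    Labels are sorted for a stable output order."""
    if labels:
        body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        return f"{name}{{{body}}} {value}"
    return f"{name} {value}"

print(to_exposition_line("mssql_connections", {"db": "master"}, 3))
# mssql_connections{db="master"} 3
```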

■ Requirements for monitoring Microsoft SQL Server

If you monitor Microsoft SQL Server with SQL exporter, you must configure the following settings:

  • Microsoft SQL Server

    Set the Microsoft SQL Server database character set to one of the following:

    • AL32UTF8 (Unicode UTF-8)

    • JA16SJIS (Japanese-language SJIS)

    • ZHS16GBK (Simplified Chinese GBK)

    The supported authentication methods are user ID and password-based SQL Server authentication registered in Microsoft SQL Server. Windows authentication is not supported.

  • Users used to access Microsoft SQL Server

    Grant the permissions below to the users you want to use to connect to Microsoft SQL Server.

    • Login permissions

      CONNECT SQL

    • SELECT permissions to the following tables

      Table name

      Permissions

      sys.sysprocesses

      VIEW SERVER STATE

      sys.dm_os_performance_counters

      VIEW SERVER PERFORMANCE STATE

      sys.dm_io_virtual_file_stats

      VIEW SERVER PERFORMANCE STATE

      sys.master_files

      CREATE DATABASE

      ALTER ANY DATABASE or VIEW ANY DEFINITION

      sys.dm_os_process_memory

      VIEW SERVER PERFORMANCE STATE

      sys.dm_os_sys_memory

      VIEW SERVER PERFORMANCE STATE

■ Obfuscation of Microsoft SQL Server passwords

The SQL exporter shipped with JP1/IM - Agent manages the passwords used to access Microsoft SQL Server by using the secret obfuscation function. For details, see 3.15.10 Secret obfuscation function.

■ Notes

  • If Microsoft SQL Server is not installed or configured, or if Microsoft SQL Server is not running, no performance information is collected.

  • If the monitored Microsoft SQL Server is rebuilding an index while performance information is being collected, the collection may have to wait for a lock to be released to ensure data integrity. In such cases, the wait is cleared for processes that Microsoft SQL Server determines to have little impact, but the performance information collection request is rolled back, and performance information collection may fail.

  • If a table is created during a transaction in Microsoft SQL Server and the operation is not committed, data collection fails because a shared lock is held on the system table. In this case, data collection may not be possible until the transaction is committed.

  • A shared lock is placed on the database when performance information is collected. If you attempt to create a new database of Microsoft SQL Server at this time, the creation may fail.

(r) Script exporter (job monitoring function)

The job monitoring function works in conjunction with JP1/AJS3 - Manager 13-50 and later to monitor JP1/AJS3 job information as metrics, detect performance issues such as anomalies in the execution time of root jobnets, and visualize the transition of root jobnet execution time in the integrated operation viewer.

Trend data that uses JP1/AJS3 root jobnet execution time as a metric can be stored in the trend data management DB of JP1/IM - Manager and displayed and monitored on the Trend and Dashboard tabs of the integrated operation viewer.

The metrics of JP1/AJS3 job information displayed in the integrated operation viewer are defined in the metric definition file (metrics_ajs_rootjobnet.conf) of JP1/AJS3. For details, see Setup for linking JP1/IM3 in the JP1/Automatic Job Management System 3 Linkage Guide.

Script exporter is an Exporter that runs scripts that reside on a host and retrieves the results.

Script exporter is installed on the same host as JP1/IM - Agent. Upon a scrape request from the Prometheus server, it executes a script on that host, retrieves the results, and returns them to the Prometheus server.

When JP1/AJS3 linkage is configured and the job monitoring function is used, the Prometheus server can collect performance data of JP1/AJS3 job information via Script exporter after a JP1/AJS3 root jobnet has executed and completed.

Note that the maximum number of JP1/AJS3 root jobnets for which a single integrated agent can collect performance data is 5,000. To collect performance data for more than 5,000 root jobnets with a single integrated agent, up to 10,000 root jobnets can be handled by evaluating alert rules at intervals of two minutes or longer. For details about how often alert rules are evaluated, see the evaluation_interval entry of Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can also configure the integrated agent host and the JP1/AJS3 - Manager host as separate hosts. You must install JP1/Base on the integrated agent host.

■ Creating a IM management node for use with the Job monitoring function

The IM management node of a JP1/AJS3 root jobnet whose job information is to be monitored is created by using the adapter command included with the JP1/AJS3 product plug-in. You can create an IM management node as follows:

  1. Set up JP1/AJS3 linkage.

    For details about how to set up monitoring of JP1/AJS3 root jobnets, see Setup for linking JP1/IM3 in the JP1/Automatic Job Management System 3 Linkage Guide.

  2. Generate tree information from the integrated operations viewer or run the jddcreatetree command.

  3. Accept tree information from the integrated operations viewer or run the jddupdatetree command.

■ Tree of IM management node created by the job monitoring function

The IM management node tree created by the job monitoring function is shown below.

All Systems
 + JP1/AJS3-manager-host
 |  + Job
 |  |  + JP1/AJS3 - Manager
 |  |     + scheduler-service
 |  |        + job-group#1
 |  |           + root-jobnet#2
 |  + Management Applications
 |     + JP1/AJS3 - Manager
 |     + JP1/AJS3 - Manager Scheduler Service
 |        + scheduler-service
 + JP1/AJS3-agent-host
    + Management Applications
       + JP1/AJS3 - Agent
#1

A job group can have multiple hierarchies.

#2

In JP1/IM - Manager 13-50 and later, a new SID for the configuration information of the IM management node corresponding to the root jobnet is created. However, the tree structure remains the same as in the JP1/AJS3 linkage used in JP1/IM - Manager version 13-11 and earlier. A root jobnet node has two configuration SIDs associated with one tree SID (one beginning with "_JP1PC-IMB_" and one without).

The types and formats of the configuration SIDs corresponding to the IM management nodes created by the job monitoring function are shown below.

Type of configuration SID

SID format

Job category

Root jobnet SID

_JP1PC-IMB_JP1/IM-manager-host-name/_JP1PC-M_Prometheus-host-name/_JP1PC-AHOST_Exporter-host-name/JP1AJS-M_JP1/AJS3-manager-host-name/_HOST_JP1/AJS3-manager-host-name/_JP1SCHE_scheduler-service-name/_JP1JOBG_job-group-name/_JP1ROOTJOBNET_root-job-net-name#

#

If a job group is defined with multiple hierarchies, "_JP1JOBG_job-group-name" is repeated depending on the definition.
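The SID format above is a concatenation of fixed prefixes and host/name components joined by "/". A minimal Python sketch that assembles it per the documented format; the function name and the sample names in the usage line are hypothetical:

```python
def root_jobnet_sid(im_host, prom_host, exporter_host,
                    ajs_host, scheduler, job_groups, root_jobnet):
    """Assemble the root jobnet configuration SID in the documented
    format. job_groups is a list, repeated once per hierarchy level."""
    parts = [
        f"_JP1PC-IMB_{im_host}",
        f"_JP1PC-M_{prom_host}",
        f"_JP1PC-AHOST_{exporter_host}",
        f"JP1AJS-M_{ajs_host}",
        f"_HOST_{ajs_host}",
        f"_JP1SCHE_{scheduler}",
    ]
    parts += [f"_JP1JOBG_{g}" for g in job_groups]
    parts.append(f"_JP1ROOTJOBNET_{root_jobnet}")
    return "/".join(parts)

# Hypothetical host and unit names, for illustration only
print(root_jobnet_sid("im1", "prom1", "exp1", "ajs1",
                      "AJSROOT1", ["grp1"], "jobnetA"))
```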

Because the job monitoring function uses Script exporter, the following IM management node tree is also created.

All Systems
 + JP1/IM-Agent-host
    + Script
    |  + ajseventmetrics#1
    + Management Applications
       + JP1/IM - agent control base
       + Metric forwarder(Prometheus server)
       + Alert forwarder(Alertmanager)
       + JP1/AJS3 metric collector(Script exporter)#2
#1

Indicates agent SID of Script exporter for job monitoring.

#2

Indicates agent service SID of Script exporter for job monitoring.

If you use Script exporter and also configure the UAP monitoring capability in addition to the job monitoring function, Script exporter's IM management node is created as the agent service SID Script metric collector (Script exporter), as shown in the following IM management node tree. If you want to monitor whether integrated agent processes are alive, set the associated alert definition for each IM management node of Script metric collector (Script exporter) and JP1/AJS3 metric collector (Script exporter). In that case, when Script exporter stops, a JP1 event associated with each IM management node is issued. For details about integrated agent process alive monitoring, see 1.21.2 (18) Setup of integrated agent process alive monitoring (for Windows) (optional) and 2.19.2 (17) Setup of integrated agent process alive monitoring (for Linux) (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.

All Systems
 + JP1/IM-Agent-host
    + Script
    |  + ajseventmetrics#1
    + Platform#2
    |  + uap_run#3
    + Management Applications
       + JP1/IM - agent control base
       + Metric forwarder(Prometheus server)
       + Alert forwarder(Alertmanager)
       + JP1/AJS3 metric collector(Script exporter)#4
       + Script metric collector(Script exporter)#5
#1

Indicates agent SID of Script exporter for job monitoring.

#2

Indicates agent SID of Script exporter for user-specified UAP monitoring.

#3

Indicates agent SID of Script exporter for UAP monitoring.

#4

Indicates agent service SID of Script exporter for job monitoring.

#5

Indicates agent service SID of Script exporter for UAP monitoring.

■ Viewing performance data for JP1/AJS3 Job Information

When JP1/AJS3 linkage is set up and a JP1/AJS3 root jobnet has been executed, you can select the IM management node of the root jobnet (with the IM management nodes of JP1/AJS3 reflected in the tree in the integrated operation viewer) and check the metrics of the job information related to that root jobnet on the Dashboard tab or the Trend tab. You can also customize the Dashboard tab or create a new dashboard to check the trend data of job information metrics in the various panels.

When customizing the Dashboard tab or creating a new dashboard, we recommend that you specify no more than 20 root jobnets as target nodes in the various panels. If you specify more than 20, displaying the dashboard panels takes time. The time needed to display the panels also depends on the following conditions, in addition to the number of target nodes:

  • Fixed value of the range vector selector specified in PromQL (promql in the metric definition file) of the target metric#1

  • Number of samples of performance data associated with the target node of target metric

  • Number of performance data label sets associated with the target node of target metric

  • Display range setting duration in the panel#2

  • Number of plots in the panel#2

#1

For details about JP1/AJS3 metric definition file (metrics_ajs_rootjobnet.conf), see Setup for linking JP1/IM3 in the JP1/Automatic Job Management System 3 Linkage Guide. For details on specifying the range vector selector, see Consolidation display of trend data with dynamic range vectors in 3.15.6(4)(c) About Performance Data to Retrieve.

#2

For details on the various panel settings, see Target node of each panel in 2.4.3 Add panel window in the JP1/Integrated Management 3 - Manager GUI Reference.

The following table lists the dashboards that are automatically generated and displayed on the Dashboard tab, and the information displayed on the Trend tab, when an IM management node created by JP1/AJS3 linkage is selected in the integrated operation viewer. Because support for the job monitoring function differs, the displayed content differs between JP1/IM - Manager 13-11 or earlier and 13-50 or later.

Selected node

Panels from the second row of dashboards that are automatically generated and displayed on the Dashboard tab#1

Trend tab

JP1/IM - Manager Version

JP1/IM - Manager Version

13-11 or earlier

13-50 or later

13-11 or earlier

13-50 or later

Host

JP1/AJS3-manager-host

None#2

None#2

None

None

Job category

Job

None#2

None#2

None

None

JP1/AJS3 - Manager

None

None

None

None

scheduler-service

None

None

None

None

job-group

None

None

None

None

root-jobnet

None

Displays the trend panels#3 of metrics associated with the root jobnet node

None

Displays metric associated with the root jobnet node

Management Applications category

Management Applications

None

None

None

None

JP1/AJS3 - Manager

None

None

None

None

JP1/AJS3 - Manager Scheduler Services

None

None

None

None

scheduler-service

None

None

Displays metric associated with the scheduler service

Displays metric associated with the scheduler service

Host

JP1/AJS3-agent

None#2

None#2

None

None

Management Applications category

Management Applications

None

None

None

None

JP1/AJS3 - Agent

None

None

None

None

#1

No matter which node you select, the first row of the dashboard shows the Node Status, Alert Information, Numeric and Trend panels.

#2

If the integrated agent, user-defined Prometheus, or user-defined Fluentd host is the same host as the JP1/AJS3 manager host, Trend panels for the metrics associated with those nodes are displayed. If more than one terminal node under the selected node is related to the same metric, that metric is displayed in one panel. Note that panels for metrics related to the root jobnet are not displayed.

#3

If the Display range setting of the dashboard is the default (1 hour), the trend panels of metrics associated with the root jobnet node display seven days of data in one-day units.

The following table shows the differences in the panel display of metrics associated with terminal nodes under the selected node, for the dashboards that are automatically generated and displayed on the Dashboard tab of the integrated operation viewer when a node of an Exporter or Fluentd, or a node of a JP1/AJS3 root jobnet, is selected.

Selected node

Terminal node under the node of Exporter or Fluentd

Terminal node under the node of JP1/AJS3 root jobnet

Panel view#1

Panel setting#1

Panel view#2

Panel setting#2

Terminal node

Displays 1-hour trend data in one-minute units#3

  • Display range setting per-panel settings: None (same as dashboard display range)

  • Setting the number of plots: 60

Displays 7-day trend data in one-day units#4

  • Display range setting per-panel settings: setting "Time difference from dashboard display range (start date and time or end date and time)" in "Specification method"#3, #4

  • Plot Count setting: 7

Top nodes of the terminal node (except system node)

No panel display

Not applicable

#1

Applies to the panel display of metrics other than JP1/AJS3 job information.

#2

Applies to the panel display of JP1/AJS3 job information metrics.

#3

Assume that the dashboard display range is set to 1 hour.

#4

The dashboard display range settings are configured as follows:

  • "Dashboard display range (start time or end time)" setting: Start time

  • "Past or future range of the time difference" setting: Past range

  • "Time difference" setting: 143h (5 days 23 hours)

(s) Whether Prometheus and Exporter support the same-host and separate-host configurations

The following table shows whether Prometheus and an Exporter can be placed on the same host or on separate hosts.

Table 3‒39: Support for Prometheus and Exporter host configurations

Exporter type

Configuring Prometheus and Exporter hosts

Same host

Another host

Exporter provided by JP1/IM - Agent

Node exporter for AIX

N

Y

Exporter other than the above

Y

N

User-defined Exporter

Y

Y

Legend

Y: Supported

N: Not supported

The following configurations are not supported:

  • Configuring scrape from more than one Prometheus to the same Exporter

  • An Exporter# on a remote agent (the host running the Exporter and the monitored host are separate hosts)

#

A remote agent Exporter is an Exporter whose discovery configuration file contains the entry "jp1_pc_remote_monitor_instance".

Also, if Prometheus and an Exporter are configured on different hosts, it is assumed that the ports used by the Exporter are protected by firewalls, network configuration, and so on, so that they cannot be accessed by anything other than the Prometheus server of JP1/IM - Agent (for example, by placing the integrated agent host and the Exporter hosts on the same network that is not accessible externally).

(2) Centralized management of performance data

This function allows Prometheus server to store performance data collected from monitoring targets in the intelligent integrated management database of JP1/IM - Manager. It has the following features:

(a) Remote write function

This is a function in which the Prometheus server sends performance data collected from monitoring targets to an external database suitable for long-term storage. JP1/IM - Agent uses this function to send performance data to JP1/IM - Manager.

The following shows how to define remote write.

  • Remote write definitions are described in the Prometheus server configuration file (jpc_prometheus_server.yml).

  • Download Prometheus server configuration file from integrated operation viewer, edit it in a text editor, modify Remote Write definition, and then upload it.

The following settings are supported by JP1/IM - Agent for defining Remote Write. For details about the settings, see Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

Table 3‒40: Settings for remote write definitions supported by JP1/IM - Agent

Setting items

Description

Remote write destination

(required)

Set the endpoint URL for JP1/IM agent control base.

Remote write timeout period

(optional)

You can set the timeout period if remote write takes a long time.

Change it if the default value is not suitable.

Relabeling

(optional)

You can remove unwanted metric and customize labeling.
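As a sketch of how these settings map onto the configuration file, a remote write definition in jpc_prometheus_server.yml might look like the following (the endpoint URL, timeout value, and relabel rule are illustrative placeholders, not shipped defaults):

```yaml
# Sketch of a remote write definition (illustrative values only).
remote_write:
  - url: "http://<agent-control-base-host>:<port>/<remote-write-endpoint>"  # placeholder endpoint URL
    remote_timeout: 30s                 # optional: raise this if remote write takes a long time
    write_relabel_configs:              # optional relabeling: drop unwanted metrics
      - source_labels: [__name__]
        regex: "go_.*"                  # example pattern; adjust to your environment
        action: drop
```

The write_relabel_configs entry corresponds to the "Relabeling" item in the table above; omitting it sends all collected metrics unchanged.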

(3) Performance data monitoring notification function

This function allows Prometheus server to monitor performance data collected from monitoring targets at a threshold value and notify JP1/IM - Manager. It has three functions:

If you add a service to be monitored in an environment where an alert definition for service monitoring is set, the added service is also monitored. If you remove from the monitoring targets a service for which an alert has fired, you receive a notification indicating that the fired alert has been resolved.

For an example of defining an alert, see Alert definition example for metrics in Node exporter metric definition file and Alert definition example for metrics in Windows exporter metric definition file in Alert configuration file (jpc_alerting_rules.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. For Linux, alerts are defined differently depending on whether automatic startup of the monitored service is enabled (by running systemctl enable). If you want to monitor a service for which automatic startup is disabled, you must create and configure an alert definition for each target.

- When using the job monitoring function

If you want to monitor performance data for job information, the alert rule evaluation interval must be at least one minute. For details about how often alert rules are evaluated, see the evaluation_interval entry of Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

(a) Alert evaluation function

This function monitors performance data collected from monitoring targets at a threshold value.

To evaluate alerts, define alert rules; performance data is then monitored against the thresholds, and alerts are notified.

Alerts can be evaluated by comparing the time series data directly with the thresholds, or by comparing the thresholds with the results of formulas using PromQL#.

#

For details about PromQL, see 2.7.4(7) About PromQL.

An alert state is managed for each time series, or for each data point produced by evaluating the PromQL expression, according to the evaluation result, and actions related to notification are executed according to the alert state.

There are three alert states: pending, firing, and resolved. When the alert rule condition is first met, the alert enters the "pending" state. If the condition continues to be met (is not resolved) for the duration of the "for" clause defined in the alert rule definition, the alert enters the "firing" state.

When the condition is no longer met (resolved), or when the time series disappears, the alert enters the "resolved" state.

The relationship between alert status and notification behavior is as below.

Alert status

Description

Notification behavior

pending

The pending state. The threshold is exceeded, but the time specified in the "for" clause of the alert rule definition has not yet elapsed.

Alerts are not notified.

firing

The firing state. The threshold is exceeded and the time specified in the "for" clause of the alert rule definition has elapsed, or the threshold is exceeded and no "for" clause is specified for the alert.

Alerts are notified.

resolved

The resolved state. The alert rule condition is no longer met.

  • When the alert recovers from the "firing" state, a resolved notification is sent.

  • When the alert recovers from the "pending" state, no resolved notification is sent.

The following shows how to define an alert rule.

  • Alert rule definitions are described in the alert configuration file (jpc_alerting_rules.yml) (definitions in any YAML format can also be described).

  • Before applying the created definition file to the target environment, check its format and test the alert rules with the promtool command.

  • Download alert configuration file from integrated operation viewer, edit it in a text editor, change the definition of the alert rule, and then upload it.

The following settings apply to the alert rule definitions supported by JP1/IM - Agent. For details about the settings, see Alert configuration file (jpc_alerting_rules.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. There is no default alert rule definition.

Table 3‒41: Settings for alert rule definitions supported by JP1/IM - Agent

Setting Item

Description

Alert Name (required)

Set the alert name.

Conditional expression (required)

Set the alert condition expression (threshold).

It can be configured using PromQL.

Waiting time (required)

Set the amount of time to wait after entering the "pending" state before changing to the "firing" state.

Change it if the default value is not suitable.

Label (required)

Set labels to add to alerts and recovery notifications.

In JP1/IM - Agent, a specific label must be set.

Annotation (required)

Set to store additional information such as alert description and URL link.

In JP1/IM - Agent, certain annotations must be set.
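As a sketch, the setting items in the table above map onto one rule in the alert configuration file (jpc_alerting_rules.yml) as follows. The names and values are illustrative only; the JP1-specific labels and annotations that JP1/IM - Agent requires are shown in the JP1 event conversion table and the definition example later in this subsection.

```yaml
# Skeleton of one alert rule (illustrative names and values).
groups:
  - name: example_group             # alert group definition name
    rules:
      - alert: example_alert        # Alert name (required)
        expr: some_metric > 100     # Conditional expression (required); written in PromQL
        for: 3m                     # Waiting time before "pending" changes to "firing"
        labels:                     # Label (required); JP1-specific labels go here
          severity: "Error"
        annotations:                # Annotation (required); description, URL links, and so on
          description: "some_metric exceeded 100"
```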

Labels and annotations can use the following variables:

Variable#

Description

$labels

A variable that holds the label key-value pairs of the alert instance. The label key can be one of the following:

  • When time series data is specified in the condition expression for alert evaluation

    You can specify a label that the data retains.

  • When a PromQL expression is specified as the condition expression for alert evaluation

    You can specify a label that is set in the result of the PromQL expression.

    The labels that the data retains depend on the metrics.

    For details about the labels, see the description of the metrics that can be specified in PromQL statements in 3.15.1(1) Performance data collection function.

$value

A variable that holds the evaluation value of the alert instance.

When a firing is notified, it is expanded to the value at the time the firing was detected.

When a resolved notification is sent, it is expanded to the value at the time of the last firing before the resolution (note that it is not the value at the time of resolution).

$externalLabels

This variable holds the label and value set in "external_labels" of item "global" in the Prometheus configuration file (jpc_prometheus_server.yml).

#

Variables are expanded by enclosing them in "{{" and "}}". The following is an example of how to use variables:

description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"

■ Alert rule definition for converting to JP1 events

In order to convert the alert to be notified into a JP1 event on the JP1/IM - Manager side, the following information must be set in the alert rule definition.

Setting item

Value to set

Uses

name

Set an alert group definition name that is unique within the integrated agent.

Alert group definition name

alert

Set an alert definition name that is unique within the integrated agent.

Alert Definition Name

expr

Set the PromQL statement.

It is recommended to set the PromQL statement described in the metric definition file. This way, when the JP1 event occurs, you can display trend information in the Integrated Operation Viewer.

Firing conditions#

#

When the conditions are met, the alert is firing; when they are no longer met, the alert is resolved.

labels.jp1_pc_product_name

Set "/HITACHI/JP1/JPCCS" as fixed.

Set to the product name of the JP1 event.

labels.jp1_pc_severity

Set one of the following:

  • Emergency

  • Alert

  • Critical

  • Error

  • Warning

  • Notice

  • Information

  • Debug

Set to JP1 event severity#.

#

This value is set as the severity of the JP1 event for the abnormal state. The severity of the JP1 event for the normal (recovered) state is set to Information.

labels.jp1_pc_eventid

Set any value in the range 0 to 1FFF or 7FFF8000 to 7FFFFFFF.

Set to the event ID of the JP1 event.

labels.jp1_pc_metricname

Set the metric name.

For Yet another cloudwatch exporter, be sure to specify it. The JP1 event is associated with the IM management node of the AWS namespace corresponding to the metric name (or to the first metric name if multiple metric names are specified, separated by commas).

Set to the metric name of the JP1 event.

For yet another cloudwatch exporter, it is also used to correlate JP1 events.

annotations.jp1_pc_firing_description

Specify the value to be set for the message of the JP1 event when the firing condition of the alert is satisfied.

If the value is 1,024 bytes or longer, the string from the beginning through the 1,023rd byte is set.

If the specification is omitted, the message content of the JP1 event is "The alert is firing. (alert = alert name)".

You can also specify variables to embed job names and evaluation values. If a variable is used, the first 1,024 bytes of the expanded message are valid.

It is set to the message of the JP1 event.

annotations.jp1_pc_resolved_description

Specify the value to be set for the message of the JP1 event when the firing condition of the alert is not satisfied.

If the value is 1,024 bytes or longer, the string from the beginning through the 1,023rd byte is set.

If the specification is omitted, the content of the message in the JP1 event is "The alert is resolved. (alert = alert name)".

You can also specify variables to embed job names and evaluation values. If a variable is used, the first 1,024 bytes of the expanded message are valid.

It is set to the message of the JP1 event.

For an example of setting an alert definition, see Definition example in alert configuration file (jpc_alerting_rules.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

For details about the properties of the corresponding JP1 event, see 3.2.3 Lists of JP1 events output by JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ How to operate in combination with trending-related functions

Align the PromQL statement described in the metric definition file with the PromQL statement evaluated by the alert evaluation function, and describe the metric name of the corresponding trend data in annotations.jp1_pc_firing_description and annotations.jp1_pc_resolved_description of the alert definition in the alert configuration file. Then, when the JP1 event for the alert is issued, you can check the past changes and current value of the performance value evaluated by the alert on the Trend tab of the integrated operation viewer.

For details about the PromQL expressions defined for the trend display related functions, see 3.15.6(4) Return of trend data.

For example, if you want the Node exporter to monitor CPU usage and notify you when the CPU usage exceeds 80%, create an alert configuration file (alert definition) and a metric definition file as shown in the following example.

  • Example of description of alert configuration file (alert definition)

    groups:
      - name: node_exporter
        rules:
        - alert: cpu_used_rate(Node exporter)
          expr: 80 < (avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode="system"}[2m])) + avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode="user"}[2m]))) * 100
          for: 3m
          labels:
            jp1_pc_product_name: "/HITACHI/JP1/JPCCS2"
            jp1_pc_component: "/HITACHI/JP1/JPCCS/CONFINFO"
            jp1_pc_severity: "Error"
            jp1_pc_eventid: "0301"
            jp1_pc_metricname: "node_cpu_seconds_total"
          annotations:
            jp1_pc_firing_description: "CPU usage has exceeded the threshold (80%). value={{ $value }}%"
            jp1_pc_resolved_description: "CPU usage has fallen below the threshold (80%)."
  • Example of description of metric definition file

    [
      {
        "name":"cpu_used_rate",
        "default":true,
        "promql":"(avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode=\"system\"}[2m]) and $jp1im_TrendData_labels) + avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode=\"user\"}[2m]) and $jp1im_TrendData_labels)) * 100",
        "resource_en":{
          "category":"platform_unix",
          "label":"CPU used rate",
          "description":"CPU usage.It also indicates the average value per processor. [Units: %]",
          "unit":"%"
        },
        "resource_ja":{
          "category":"platform_unix",
          "label":"CPU使用率",
          "description":"CPU使用率(%)。プロセッサごとの割合の平均値でもある。",
          "unit":"%"
        }
      }
    ]

    When the conditions of the PromQL statement specified in expr of the alert definition are satisfied and the JP1 event for the alert is issued, the message "CPU usage has exceeded the threshold (80%). value=performance-value%" is set as the message of the JP1 event. From this message, users can open the "CPU used rate" trend information and check past changes and the current value of CPU usage.

■ Behavior when the service is stopped

If the Alertmanager service is stopped, JP1 events for alerts are not issued. Also, if the Prometheus server and Alertmanager services are running and an Exporter whose alert is firing stops due to a failure, the alert becomes resolved and a normal JP1 event is issued.

If an alert is firing and the Prometheus server service is stopped while Alertmanager is running, a normal JP1 event notifying the resolution of the alert is issued.

For details, see About behavior when the Prometheus server is restarted or stopped while the Alertmanager is running.

■ About behavior when the service is restarted

Even if an alert is firing or resolved and the Prometheus server, Alertmanager, or Exporter service is restarted, no JP1 event is issued when the current alert state is the same as the alert state before the restart.

When an alert is firing and the Prometheus server service is restarted while Alertmanager is running, a normal JP1 event notifying the resolution of the alert might be issued.

For details, see About behavior when the Prometheus server is restarted or stopped while the Alertmanager is running.

■ About Considering Performance Data Spikes

Performance data can momentarily jump to abnormal values (very large, very small, or negative values). Such sudden changes in performance data are commonly called "spikes." In many cases, even if a spike momentarily produces an abnormal value, the data immediately returns to normal and does not need to be treated as an abnormality. A spike can also occur instantaneously when performance data is reset, such as when the OS is restarted.

When monitoring metrics of such performance data, consider suppressing sudden anomaly detection by specifying "for" (a grace period before an alert is treated as an anomaly) in the alert rule definition.
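For example, a spike-tolerant rule can require the condition to hold continuously for the whole "for" period before firing. The metric name, threshold, and duration below are illustrative placeholders:

```yaml
# Illustrative rule: a momentary spike above 90% does not fire an alert;
# the condition must hold for the entire 5-minute "for" period.
groups:
  - name: spike_tolerant_rules
    rules:
      - alert: sustained_high_cpu       # placeholder alert name
        expr: cpu_used_rate > 90        # placeholder metric and threshold
        for: 5m                         # grace period that absorbs short spikes
```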

■ About behavior when the Prometheus server is restarted or stopped while the Alertmanager is running

When an alert is firing and the Prometheus server service is restarted or stopped while Alertmanager is running, a normal JP1 event notifying the resolution of the alert might be issued.

A normal JP1 event is issued when the following condition is met:

  • The sum of the duration of the "for" clause# defined in the alert definition of the firing alert and the duration for which the Prometheus server service is not running (because it is stopped or reloading) is greater than the value of "evaluation_interval" defined in the Prometheus configuration file.

  • #: When the "for" clause of the alert is not specified, treat the duration as 0.

■ About behavior when the service is reloaded

Even if an alert is firing or resolved, executing the API that reloads the Prometheus server, Alertmanager, or Exporter service does not cause a JP1 event to be issued.

(b) Alert forwarder

This function notifies you when the alert status becomes "firing" or "resolved" after the Prometheus server evaluates the alert.

If the alert state changes while JP1/IM - Manager (Intelligent Integrated Management Base) is stopped, the firing or resolved notification might not be performed.

The Prometheus server sends alerts one by one, and the sent alerts are notified to JP1/IM - Manager (Intelligent Integrated Management Base) via Alertmanager. Retried alerts are also sent one by one.

Alerts sent to JP1/IM - Manager are basically sent in the order in which they occurred, but the order might change when multiple alert rules meet their conditions at the same time, or when a transmission error occurs and alerts are resent. However, because the alert information includes the time of occurrence, you can determine the order in which the alerts occurred.

In addition, if the abnormal condition continues for 7 days, the alert is notified again.

The following shows how to define the notification destination of the alert.

  • Alert destinations are described in both the Prometheus configuration file (jpc_prometheus_server.yml) and the Alertmanager configuration file (jpc_alertmanager.yml).

    In the Prometheus configuration file, specify the Alertmanager that coexists with the Prometheus server as the destination of Prometheus server notifications. In the Alertmanager configuration file, specify the JP1/IM agent control base as the notification destination of Alertmanager.

  • Download the individual configuration file from integrated operation viewer, edit them in a text editor, change the alert notification destination definitions, and then upload them.

The following settings are related to definition of Prometheus server notification destinations supported by JP1/IM - Agent. For details about the settings, see Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

Table 3‒42: Settings for defining notification destinations for Prometheus server supported by JP1/IM - Agent

Setting items

Description

Notification destination (required)

Configure the notification destination Alertmanager.

If a host name or IP address is specified for --web.listen-address in the Alertmanager command line option, change localhost to the host name or IP address specified in --web.listen-address.

  • For physical host environments

    Specify the Alertmanager that coexists with the Prometheus server.

  • For clustered environment

    Specifies the Alertmanager that runs on the logical host.

Label setting (optional)

You can add labels. Configure as needed.
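In jpc_prometheus_server.yml, the notification destination corresponds to an alerting section along the following lines. The port is a placeholder; replace localhost with the host name or IP address when --web.listen-address specifies one:

```yaml
# Sketch of the Prometheus server notification destination definition.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:<Alertmanager-port>"   # placeholder; the coexisting Alertmanager
```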

The following settings are related to the definition of Alertmanager notification destinations supported by JP1/IM - Agent. For details about the settings, see Alertmanager configuration file (jpc_alertmanager.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

Table 3‒43: Settings for defining Alertmanager notification destinations supported by JP1/IM - Agent

Setting items

Description

Webhook settings (required)

Set the endpoint URL for JP1/IM agent control base.
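In jpc_alertmanager.yml, the webhook setting corresponds to a receiver definition along the following lines. The receiver name and URL are illustrative placeholders for the JP1/IM agent control base endpoint:

```yaml
# Sketch of an Alertmanager receiver that forwards alerts to the
# JP1/IM agent control base via webhook (values are placeholders).
route:
  receiver: jp1im
receivers:
  - name: jp1im
    webhook_configs:
      - url: "http://<agent-control-base-host>:<port>/<webhook-endpoint>"
```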

(c) Notification suppression function

This function suppresses the notifications described in 3.15.1(3)(b) Alert forwarder. It includes:

  • Silence function

    Use this if you do not want to be temporarily notified of certain alerts.

■ Silence function

This feature temporarily suppresses specific notifications. You can set it so that alerts that occur during temporary maintenance are not notified. Unlike the common exclusion conditions of JP1/IM - Manager, the notification suppression function does not send the notification to JP1/IM - Manager in the first place.

While silence is enabled, you are not notified when the alert status changes. When silence is disabled, if the alert state differs from its state before silence was enabled, a notification is sent.

The following two figures show examples of whether notification is performed:

Figure 3‒36: Cases where the state is different before and after disabling silence

[Figure]

The above figure shows an example in which the alert status is "abnormal" when silence is enabled, changes to "normal" while silence is enabled, and silence is then disabled.

When the alert changes to "normal", no notification is sent because silence is enabled. When silence is disabled, a "normal" notification is sent because the alert status has changed from the "abnormal" status it had before silence was enabled.

Figure 3‒37: Cases where the state is the same before and after disabling silence

[Figure]

The above figure shows an example in which, while silence is enabled, the alert status changes to "normal" once and then back to "abnormal", and silence is then disabled.

When silence is disabled, notification is not performed because the alert status is the same "abnormal" as before silence was enabled.

If an alert transmission fails and is being retried, and silence is then enabled to suppress that alert, the alert is not retried.

- How to Configure silence

Silence settings (enabling or disabling) and retrieval of the current silence settings are performed via the REST API (the GUI is not supported).

In addition, to configure silence settings, the machine from which you operate must be able to communicate with the Alertmanager port number on the integrated agent host.

For details about silence settings and REST API used to obtain current silence settings, see 5.22.3 Get silence list of Alertmanager, 5.22.4 Silence creation of Alertmanager, and 5.22.5 Silence Revocation of Alertmanager in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
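As a sketch, a request body for silence creation might look like the following JSON. The matcher value, times, and other fields are illustrative placeholders; see the API references above for the exact endpoint and field specifications:

```json
{
  "matchers": [
    { "name": "alertname", "value": "cpu_used_rate(Node exporter)", "isRegex": false }
  ],
  "startsAt": "2024-01-01T00:00:00Z",
  "endsAt": "2024-01-01T02:00:00Z",
  "createdBy": "jp1admin",
  "comment": "Temporary maintenance window (illustrative example)"
}
```

Alerts matching the matchers are suppressed between startsAt and endsAt; revoking the silence before endsAt re-enables notification.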

(4) Communication function

(a) Communication protocols and authentication methods

The following shows the communication protocols and authentication methods used by integrated agent.

Connection source

Connect to

Protocol

Authentication method

Prometheus server

JP1/IM agent control base

HTTP

No authentication

Alertmanager

Prometheus server

Alertmanager

HTTP

No authentication

Exporter

Blackbox exporter

Monitored target

HTTP/HTTPS

Basic Authentication

Basic Authentication

No authentication

HTTPS

Server Authentication

With client authentication

No client authentication

ICMP#1

No authentication

Yet another cloudwatch exporter

Amazon CloudWatch

HTTPS

AWS IAM Authentication

Promitor Scraper

Azure Monitor

HTTPS

No client authentication

Promitor Resource Discovery

Azure Resource Graph

HTTPS

No client authentication

Promitor Scraper

Promitor Resource Discovery

HTTP

No authentication

Prometheus

Fluentd

HTTP

No authentication

OracleDB exporter

Oracle listener

Oracle listener-specific (no encryption)

Authentication by username/password

Web scenario execution function

Browser that invokes Web scenario-execution feature

Chrome DevTools Protocol (CDP)

No authentication

Web Scenario Execute Function/Browser from which Web Scenario Execute Function starts

Monitored server

  • Non-encrypted communication#2

  • Encrypted communication#2

  • No authentication

  • HTTP authentication (Basic authentication only)

  • Authentication using TLS server certificate

  • Authentication with TLS client certificate#3

  • Authentication for entering a username and password on the form

VMware exporter

VMware ESXi

No SSL/TLS connection

Authentication by Username and Password

SSL/TLS connection#4

  • Authentication by Username and Password

  • Authentication with CA Certificates

SQL exporter

Microsoft SQL Server

No TLS connection

Authentication by username and password

TLS connection#5

Authentication by username and password

#1

ICMPv6 is not available.

#2

The specific protocol depends on the target.

#3

See Configuring authentication in 1.21.2(13)(a) Setting up JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Configuration Guide.

#4

Only TLS 1.1 and TLS 1.2 connections can be used.

#5

You must specify the option that enables TLS communication with Microsoft SQL Server in the connection information of the monitored target set in the SQL exporter configuration file (jpc_sql_exporter.yml). For details, see SQL exporter configuration file (jpc_sql_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

(b) Network configuration

The integrated agent can be used in a network configuration with only an IPv4 environment or in a network configuration that mixes IPv4 and IPv6 environments. In a mixed IPv4 and IPv6 network configuration, only IPv4 communication is supported.

You can use integrated agent in the following configurations without a proxy server:

| Connection source | Connect to | Connection type |
| --- | --- | --- |
| Prometheus server | JP1/IM agent control base | No proxy server |
| Alertmanager | JP1/IM agent control base | No proxy server |
| Prometheus server | Alertmanager | No proxy server |
| Prometheus server | Exporter | No proxy server |
| Blackbox exporter | Monitoring targets (ICMP monitoring) | No proxy server |
| Blackbox exporter | Monitoring targets (HTTP monitoring) | No proxy server; through a proxy server without authentication; through a proxy server with authentication |
| Yet another cloudwatch exporter | Amazon CloudWatch | No proxy server; through a proxy server without authentication; through a proxy server with authentication |
| Promitor Scraper | Azure Monitor | No proxy server; through a proxy server without authentication; through a proxy server with authentication |
| Promitor Resource Discovery | Azure Resource Graph | No proxy server; through a proxy server without authentication; through a proxy server with authentication |
| OracleDB exporter | Oracle listener | No proxy server |
| Web scenario execution function / browser started by the Web scenario execution function | Monitored server | No proxy server; through a proxy server without authentication; through a proxy server with authentication (HTTP Basic authentication only) |
| VMware exporter | VMware ESXi | No proxy server |
| SQL exporter | Microsoft SQL Server | No proxy server |
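For HTTP monitoring through a proxy, Blackbox exporter's standard module definition expresses the proxy with proxy_url in the http prober settings; credentials for an authenticating proxy can be embedded in that URL. This is an illustrative sketch, not the JP1-shipped configuration file: the module name and host names are assumptions.

```yaml
# Illustrative Blackbox exporter module definition (host names are assumptions).
modules:
  http_2xx:
    prober: http
    http:
      # Route probes through a proxy server; for an authenticating proxy,
      # embed credentials: http://user:pass@proxyhost:8080
      proxy_url: "http://proxyhost:8080"
```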

The integrated agent transmits the following data:

| Connection source | Connect to | Transmitted data | Authentication method |
| --- | --- | --- | --- |
| Prometheus server | JP1/IM agent control base | Performance data in Protobuf format | -- |
| Alertmanager | JP1/IM agent control base | Alert information in JSON format#1 | -- |
| Prometheus server | Exporter | None | -- |
| Exporter | Prometheus server | Performance data in Prometheus text format#2 | -- |
| Blackbox exporter | Monitoring target | Response for each protocol | -- |
| Yet another cloudwatch exporter | Amazon CloudWatch | CloudWatch data | -- |
| Promitor Scraper | Azure Monitor | Azure Monitor data (metrics information) | Service principal; Managed ID |
| Promitor Resource Discovery | Azure Resource Graph | Azure Resource Graph data (resource search results) | Service principal; Managed ID |
| OracleDB exporter | Oracle listener | Proprietary Oracle listener data | -- |
| Web scenario execution function | Browser started by the Web scenario execution function | Browser operation data | -- |
| Web scenario execution function / browser started by the Web scenario execution function | Monitored server | Data that depends on the target | -- |
| Monitored server | Web scenario execution function / browser started by the Web scenario execution function | Data that depends on the target | -- |
| VMware exporter | VMware ESXi | VMware ESXi information | -- |
| VMware ESXi | VMware exporter | VMware ESXi information | -- |
| SQL exporter | Microsoft SQL Server | None | -- |
| Microsoft SQL Server | SQL exporter | Result of executing an SQL statement | -- |

Legend: -- None

#1

For details, see the description of the message body for the request in 5.6.5 JP1 Event converter in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
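The JP1-specific request body is defined in the section referenced above. For orientation only, Alertmanager's standard webhook payload, on which JSON alert notifications are based, has the following general shape; all field values here are illustrative.

```json
{
  "version": "4",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": { "alertname": "HighCpuUsage", "instance": "agent-host:20716" },
      "annotations": { "summary": "CPU usage exceeded the threshold" },
      "startsAt": "2024-04-01T12:00:00Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
```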

#2

For details, see the description of the Prometheus text format in 5.24 API for scrape of Exporter used by JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
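As a point of reference, the Prometheus text exposition format that an Exporter returns on scrape looks like the following. The metric shown is a standard Node exporter metric; the values are illustrative.

```text
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 234.56
```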