Hitachi

JP1 Version 13 JP1/Integrated Management 3 - Manager Overview and System Design Guide


3.15.1 Performance monitoring function by JP1/IM - Agent

The performance monitoring function consists of the add-on programs Prometheus, Alertmanager, and Exporters, and provides the following two functions:

Performance data and alerts sent to the integrated manager host can be viewed in the integrated operation viewer.

Organization of this subsection

(1) Performance data collection function

The Prometheus server collects performance data from monitored targets. It provides the following two functions:

(a) Scrape function

The scrape function of the Prometheus server acquires the performance data of monitoring targets via the Exporter.

When the Prometheus server accesses a specific URL of the Exporter, the Exporter retrieves the monitored performance data and returns it to the Prometheus server. This process is called scrape.

A scrape is executed in units of scrape jobs, each of which groups multiple scrapes that serve the same purpose.

When a discovery configuration file is used for monitoring through UAP monitoring, the corresponding scrape jobs must be defined. Additional settings are also required for the scrape definitions of the log metrics function.

For details on the scraping description of the log metrics feature, see 1.21.2(10) Setting up scraping definitions in the JP1/Integrated Management 3 - Manager Configuration Guide.

Scrapes are defined in units of scrape jobs. In JP1/IM - Agent, scrape definitions with the following scrape job names are set by default, according to the type of exporter.

prometheus: Scrape definition for the Prometheus server

jpc_node: Scrape definition for Node exporter

jpc_windows: Scrape definition for Windows exporter

jpc_blackbox_http: Scrape definition for HTTP/HTTPS monitoring in Blackbox exporter

jpc_blackbox_icmp: Scrape definition for ICMP monitoring in Blackbox exporter

jpc_cloudwatch: Scrape definition for Yet another cloudwatch exporter

jpc_process: Scrape definition for Process exporter

jpc_promitor: Scrape definition for Promitor

jpc_script: Scrape definition for Script exporter

If you want to scrape your own exporter, you must add a scrape definition for each target exporter.

The metrics obtained from an Exporter by Prometheus server scraping depend on the type of Exporter. For details, see the description of the metric definition file for each Exporter in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

In addition to the metrics obtained from the exporter, the Prometheus server generates the following metrics when a scrape is performed.

up: Indicates "1" if the scrape succeeded and "0" if it failed. It can be used to monitor whether the exporter is operating. A scrape can fail because the host is stopped, the exporter is stopped, the exporter returns a status other than 200, or a communication error occurs.

scrape_duration_seconds: Indicates how long the scrape took. Not used in normal operation; used to investigate scrapes that do not finish within the expected time.

scrape_samples_post_metric_relabeling: Indicates the number of samples remaining after metric relabeling. Not used in normal operation; used to check the amount of data when building the environment.

scrape_samples_scraped: Indicates the number of samples returned by the scraped exporter. Not used in normal operation; used to check the amount of data when building the environment.

scrape_series_added: Indicates the approximate number of newly created series. Not used in normal operation.
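A common use of the up metric is to detect a stopped exporter. The following is a minimal sketch of a Prometheus-style alert rule that fires when a target of the jpc_node scrape job has been unreachable for five minutes; the group name, alert name, duration, and labels are hypothetical examples, not shipped JP1/IM - Agent definitions.

```yaml
# Hypothetical alert rule: fire when any target of the jpc_node scrape job
# has failed its scrapes (up == 0) for 5 consecutive minutes.
groups:
  - name: exporter-monitoring        # rule group name (example)
    rules:
      - alert: ExporterDown          # alert name (example)
        expr: up{job="jpc_node"} == 0
        for: 5m                      # condition must hold for 5 minutes
        labels:
          severity: error            # example label
        annotations:
          summary: "Scrape of {{ $labels.instance }} is failing"
```

The actual alert definitions used by JP1/IM - Agent should be written in the alert rule files described in the Configuration Guide.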

For details about how scrapes are executed, see 5.23 API for scrape of Exporter used by JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. An exporter to be scraped must be able to operate as described there.

Scrape definitions are specified as follows:

  • Scrape definitions are done in units of scrape jobs.

  • The scrape definition is described in the Prometheus configuration file (jpc_prometheus_server.yml).

  • To edit a scrape definition, download the Prometheus configuration file from the integrated operation viewer, edit it, and then upload it.
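For orientation, a scrape job in jpc_prometheus_server.yml follows the standard Prometheus scrape_configs structure. The sketch below shows only the general shape; the job name, host name, port, and metrics path are hypothetical placeholders, and the actual keys set by JP1/IM - Agent should be taken from the shipped configuration file.

```yaml
# Sketch of one scrape job in a Prometheus configuration file.
# "jpc_my_exporter", "hostA", and the port are placeholders.
scrape_configs:
  - job_name: jpc_my_exporter       # appears on metrics as job="jpc_my_exporter"
    scrape_interval: 60s            # per-job scrape interval (optional)
    metrics_path: /metrics          # URL path exposed by the exporter
    static_configs:
      - targets:
          - hostA:20716             # host:port of the exporter ("localhost" cannot be used)
```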

The following are the settings related to scrape definitions supported by JP1/IM - Agent.

Table 3‒15: Settings for scrape definitions supported by JP1/IM - Agent

Scrape job name (required)

Sets the name of the scrape job that Prometheus scrapes. Multiple scrape job names can be specified. The specified scrape job name is set in the metric label as job="scrape-job-name".

Scrape destination (required)

Sets the URL of the exporter to be scraped. Only exporters on hosts where JP1/IM - Agent resides can be specified as scrape destinations. The server in the URL is specified by host name; "localhost" cannot be used. The total number of scrape destinations specified across all scrape jobs is limited to 100.

Scrape parameters (optional)

Sets parameters to pass to the Exporter when scraping. What can be set depends on the type of exporter.

Scrape interval (optional)

Sets the scrape interval. A scrape interval common to all scrape jobs and a scrape interval for each scrape job can both be set; if both are set, the per-job scrape interval takes precedence. The following units can be specified: years, weeks, days, hours, minutes, seconds, or milliseconds.

Scrape timeout (optional)

Sets a timeout period for scrapes that take a long time. A timeout period common to all scrape jobs and a timeout period for each scrape job can both be set; if both are set, the per-job timeout period takes precedence.

Relabeling (optional)

Deletes unnecessary metrics and customizes labels. By using this feature to drop metrics that are not needed, you can reduce the amount of data sent to JP1/IM - Manager.
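Combining these items, a per-job interval, timeout, and a relabeling rule that drops unneeded metrics might look as follows. This is a generic Prometheus-style sketch, not the shipped JP1/IM - Agent defaults; the target, interval, and metric-name pattern are placeholders.

```yaml
scrape_configs:
  - job_name: jpc_node              # example scrape job name
    scrape_interval: 60s            # per-job interval (overrides the common one)
    scrape_timeout: 10s             # per-job timeout (overrides the common one)
    static_configs:
      - targets: ["hostA:20716"]    # placeholder scrape destination
    metric_relabel_configs:
      # Drop metrics that are not needed, reducing the amount of data
      # sent to JP1/IM - Manager.
      - source_labels: [__name__]
        regex: "node_softnet_.*"    # placeholder metric-name pattern
        action: drop
```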

The result of a scrape is returned by the scraped Exporter in the Prometheus text-based format. The Prometheus text-based format is described below:

Text-based format basics

Start time: April 2014

Supported versions: Prometheus version 0.4.0 or later

Transmission format: HTTP

Character code: UTF-8 (the line feed code is \n)

Content-Type: text/plain; version=0.0.4 (if the version value is missing, the latest text format version is assumed)

Content-Encoding: gzip

Advantages:

  • Human-readable

  • Easy to assemble, especially for minimal cases (no nesting required)

  • Readable line by line (except for type hints and docstrings)

Limitations:

  • Verbose

  • Types and docstrings are not part of the syntax, so there is little validation of the metric contract

  • Parsing cost

Supported metric types:

  • Counter

  • Gauge

  • Histogram

  • Summary

  • Untyped

More information about Text-based format

The Prometheus text-based format is line-oriented.

Lines are separated by a line feed character (\n); the sequence \r\n is considered invalid.

The last line must end with a line feed character. Blank lines are ignored.

Line format

Within a line, tokens can be separated by any number of spaces or tabs, and must be separated by at least one of them when they would otherwise merge with the preceding token.

Leading and trailing whitespace is ignored.

Comments, help text, and type information

Lines whose first non-whitespace character is # are comments.

Such a line is ignored unless the first token after # is HELP or TYPE.

These lines are treated as follows:

If the token is HELP, at least one more token (the metric name) is expected. All remaining tokens are considered the docstring for that metric name.

A HELP line may contain any UTF-8 string after the metric name, but backslashes must be escaped as \\ and line feeds as \n. Only one HELP line may exist for any given metric name.

If the token is TYPE, exactly two more tokens are expected. The first is the metric name, and the second is either counter, gauge, histogram, summary, or untyped, defining the type of that metric. Only one TYPE line may exist for a given metric name, and it must appear before the first sample of that metric.

If there is no TYPE line for a metric name, the type is set to untyped.

Write a sample (one per line) using the following EBNF:

 metric_name [
    "{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}"
 ] value [ timestamp ]
Sample Syntax
  • metric_name and label_name carry the usual Prometheus expression language restrictions.

  • label_value can be any UTF-8 string, but the backslash (\), double quote ("), and line feed characters must be escaped as \\, \", and \n, respectively.

  • value is a floating-point number as parsed by the Go ParseFloat() function. In addition to standard numbers, NaN, +Inf, and -Inf are valid values: NaN means not a number, +Inf is positive infinity, and -Inf is negative infinity.

  • The timestamp is an int64 (milliseconds since the epoch, 1970-01-01 00:00:00 UTC, excluding leap seconds), as parsed by the Go ParseInt() function, and is optional.

Grouping and Sorting

All lines for a given metric must be provided as one single group, with the optional HELP and TYPE lines first (in no particular order).

Beyond that, it is recommended, but not required, that lines are sorted reproducibly across repeated expositions.

Each line must have a unique combination of metric name and labels; otherwise, the ingestion behavior is undefined.

Histograms and Summaries

Because histograms and summaries are difficult to represent in the text format, the following conventions apply:

  • The sample sum for a summary or histogram named x is given as a separate sample named x_sum.

  • The sample count for a summary or histogram named x is given as a separate sample named x_count.

  • Each quantile of a summary named x is given as a separate sample line with the same name x and a label {quantile="y"}.

  • Each bucket count of a histogram named x is given as a separate sample line with the name x_bucket and a label {le="y"} (where y is the upper bound of the bucket).

  • A histogram must have a bucket with {le="+Inf"}. Its value must be identical to the value of x_count.

  • The buckets of a histogram and the quantiles of a summary must appear in increasing numerical order of their label values (for the le and quantile labels, respectively).

Sample Text-based format

Here is a sample Prometheus metric exposition that contains comments, HELP and TYPE representations, histograms, summaries, and character escaping.

# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027 1395066363000
http_requests_total{method="post",code="400"}    3 1395066363000
 
# Escaping in label values:
msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9
 
# Minimalistic line:
metric_without_timestamp_and_labels 12.47
 
# A weird metric from before the epoch:
something_weird{problem="division by zero"} +Inf -3982045
 
# A histogram, which has a pretty complex representation in the text format:
# HELP http_request_duration_seconds A histogram of the request duration.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.05"} 24054
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.2"} 100392
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320
 
# Finally a summary, which has a complex representation, too:
# HELP rpc_duration_seconds A summary of the RPC duration in seconds.
# TYPE rpc_duration_seconds summary
rpc_duration_seconds{quantile="0.01"} 3102
rpc_duration_seconds{quantile="0.05"} 3272
rpc_duration_seconds{quantile="0.5"} 4773
rpc_duration_seconds{quantile="0.9"} 9001
rpc_duration_seconds{quantile="0.99"} 76656
rpc_duration_seconds_sum 1.7560473e+07
rpc_duration_seconds_count 2693

(b) Function for acquiring operational information of monitoring targets

This function acquires operational information (performance data) from monitoring targets. The collection of operational information is performed by a program called an "Exporter".

In response to scrape requests sent from the Prometheus server, the Exporter collects operational information from the monitored target and returns the results to the Prometheus server.

Exporters shipped with JP1/IM - Agent must be scraped only by the Prometheus server of the JP1/IM - Agent on the same host. Do not scrape them from a Prometheus server on another host or from one provided by the user.

This section describes the functions of each exporter included with JP1/IM - Agent.

(c) Windows exporter (Windows performance data collection capability)

Windows exporter is an exporter that is installed on a monitored Windows host to obtain operational information from that host.

Windows exporter is installed on the same host as the Prometheus server, and upon a scrape request from the Prometheus server, it collects operational information from the Windows OS of the host and returns it to the Prometheus server.

It can collect operational information about memory, disks, and other resources from inside the host, which cannot be collected by monitoring from outside the host (external monitoring by URL or CloudWatch).

In addition, with JP1/IM - Manager and JP1/IM - Agent version 13-01 or later, you can monitor the operational status of services (programs registered in Windows services) on the integrated agent host (Windows) (service monitoring function#).

Note that the service monitoring function cannot be used when JP1/IM - Agent runs inside a container.

#

If you use the service monitoring function in an environment where the version is upgraded from 13-00 to 13-01 or later, you need to configure the settings to perform service monitoring. The following are the JP1/IM - Manager and JP1/IM - Agent setup instructions:

Where to find instructions for setting up JP1/IM - Manager

See Editing category name definition file for IM management nodes (imdd_category_name.conf) (Optional) in 1.19.3(1)(d) Settings of product plugin (for Windows) in the JP1/Integrated Management 3 - Manager Configuration Guide.

Where to find instructions for setting up JP1/IM - Agent

See the instructions for configuring service monitoring in 1.21.2(3)(f) Configuring service monitoring (for Windows) (optional) and 1.21.2(5)(b) Modify metric to Collect (Optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.

This feature creates an IM management node for each service that you want to monitor. For details on displaying the tree, see 3.15.6(1)(i) Tree Format. If you configure an alert, a JP1 event is issued when a service stops and is registered with the IM management node corresponding to the stopped service. You can check the past operational status of a service from the service trend display.

■ Main items to be acquired

The main retrieval items of Windows exporter are defined in Windows exporter metric definition file (default) and Windows exporter (Service monitoring) metric definition file (default). For details, see Windows exporter metric definition file (metrics_windows_exporter.conf) in Chapter 2. Definition Files and Windows exporter (Service monitoring) metric definition file (metrics_windows_exporter_service.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add retrieved items to the metric definition file. The following are the metrics that can be specified in the PromQL statement described in the definition file. For details of "Collector" in the table, refer to the description of "Collector" at the bottom of the table.

Metric Name

Collector

What to Get

Label

windows_cache_copy_read_hits_total

cache

Number of copy read requests that hit the cache (cumulative)

instance: Instance identification string

job: Job name

windows_cache_copy_reads_total

cache

Number of reads from the file system cache page (cumulative)

instance: Instance identification string

job: Job name

windows_cpu_time_total

cpu

Number of seconds of processor time spent per mode (cumulative)

instance: Instance identification string

job: Job name

core: coreid

mode: Mode#

#

Contains one of the following:

  • "dpc"

  • "idle"

  • "interrupt"

  • "privileged"

  • "user"

windows_cs_physical_memory_bytes

cs

Number of bytes of the physical memory capacity

instance: Instance identification string

job: Job name

windows_logical_disk_idle_seconds_total

logical_disk

Number of seconds that the disk was idle (cumulative)

instance: Instance identification string

job: Job name

volume: Volume name

windows_logical_disk_free_bytes

logical_disk

Number of bytes of unused disk space

instance: Instance identification string

job: Job name

volume: Volume name

windows_logical_disk_read_bytes_total

logical_disk

Number of bytes transferred from disk during the read operation (cumulative)

instance: Instance identification string

job: Job name

volume: Volume name

windows_logical_disk_read_seconds_total

logical_disk

Number of seconds that the disk was busy for read operations (cumulative)

instance: Instance identification string

job: Job name

volume: Volume name

windows_logical_disk_reads_total

logical_disk

Number of read operations to disk (cumulative)

instance: Instance identification string

job: Job name

volume: Volume name

windows_logical_disk_requests_queued

logical_disk

Number of requests queued on disk

instance: Instance identification string

job: Job name

volume: Volume name

windows_logical_disk_size_bytes

logical_disk

Disk space bytes

instance: Instance identification string

job: Job name

volume: Volume name

windows_logical_disk_write_bytes_total

logical_disk

Number of bytes transferred to disk during the write operation (cumulative)

instance: Instance identification string

job: Job name

volume: Volume name

windows_logical_disk_write_seconds_total

logical_disk

Number of seconds that the disk was busy for write operations (cumulative)

instance: Instance identification string

job: Job name

volume: Volume name

windows_logical_disk_writes_total

logical_disk

Number of disk write operations (cumulative)

instance: Instance identification string

job: Job name

volume: Volume name

windows_memory_available_bytes

memory

Number of bytes of unused space in physical memory

Note:

The total of zero, free, and standby (cached) areas allocated to a process or immediately available to the system.

instance: Instance identification string

job: Job name

windows_memory_cache_bytes

memory

Number of bytes of physical memory used for file system caching

instance: Instance identification string

job: Job name

windows_memory_cache_faults_total

memory

Number of page faults in the file system cache (cumulative)

instance: Instance identification string

job: Job name

windows_memory_page_faults_total

memory

Number of times a page fault occurred (cumulative)

instance: Instance identification string

job: Job name

windows_memory_pool_nonpaged_allocs_total

memory

Number of times a nonpageable physical memory region was allocated

instance: Instance identification string

job: Job name

windows_memory_pool_paged_allocs_total

memory

Number of times you allocated a pageable physical memory region

instance: Instance identification string

job: Job name

windows_memory_swap_page_operations_total

memory

Number of pages read from or written to disk to resolve hard page faults (cumulative)

instance: Instance identification string

job: Job name

windows_memory_swap_pages_read_total

memory

Number of pages read from disk to resolve hard page faults (cumulative)

instance: Instance identification string

job: Job name

windows_memory_swap_pages_written_total

memory

Number of pages written to disk to resolve hard page faults (cumulative)

instance: Instance identification string

job: Job name

windows_memory_system_cache_resident_bytes

memory

Number of active system file cache bytes in physical memory

instance: Instance identification string

job: Job name

windows_memory_transition_faults_total

memory

The number of page faults resolved by recovering pages that were in use by other processes sharing the page, pages that were on the modified pages list or standby list, or pages that were written to disk (cumulative)

instance: Instance identification string

job: Job name

windows_net_bytes_received_total

net

Number of bytes received by the interface (cumulative)

Note:

If the NIC name contains characters other than half-width alphanumeric characters, these characters are converted to underscores and set in the NIC label.

instance: Instance identification string

job: Job name

device: Network Device Name

windows_net_bytes_sent_total

net

Number of bytes sent from the interface (cumulative)

Note:

If the NIC name contains characters other than half-width alphanumeric characters, these characters are converted to underscores and set in the NIC label.

instance: Instance identification string

job: Job name

device: Network Device Name

windows_net_bytes_total

net

Number of bytes received and transmitted by the interface (cumulative)

Note:

If the NIC name contains characters other than half-width alphanumeric characters, these characters are converted to underscores and set in the NIC label.

instance: Instance identification string

job: Job name

device: Network Device Name

windows_net_packets_sent_total

net

Number of packets sent by the interface (cumulative)

Note:

If the NIC name contains characters other than half-width alphanumeric characters, these characters are converted to underscores and set in the NIC label.

instance: Instance identification string

job: Job name

device: Network Device Name

windows_net_packets_received_total

net

Number of packets received by the interface (cumulative)

Note:

If the NIC name contains characters other than half-width alphanumeric characters, these characters are converted to underscores and set in the NIC label.

instance: Instance identification string

job: Job name

device: Network Device Name

windows_system_context_switches_total

system

Number of context switches (cumulative)

instance: Instance identification string

job: Job name

windows_system_processor_queue_length

system

Number of threads in the processor queue

instance: Instance identification string

job: Job name

windows_system_system_calls_total

system

Number of times the process called the OS service routine (cumulative)

instance: Instance identification string

job: Job name

windows_process_start_time

process

Time of process start

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_cpu_time_total

process

Returns elapsed time that all of the threads of this process used the processor to execute instructions by mode (privileged, user). An instruction is the basic unit of execution in a computer, a thread is the object that executes instructions, and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions is included in this count.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

mode: mode (privileged or user)

windows_process_io_bytes_total

process

Bytes issued to I/O operations in different modes (read, write, other). This property counts all I/O activity generated by the process to include file, network, and device I/Os. Read and write mode includes data operations; other mode includes those that do not involve data, such as control operations.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

mode: mode (read, write, or other)

windows_process_io_operations_total

process

I/O operations issued in different modes (read, write, other). This property counts all I/O activity generated by the process to include file, network, and device I/Os. Read and write mode includes data operations; other mode includes those that do not involve data, such as control operations.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

mode: mode (read, write, or other)

windows_process_page_faults_total

process

Page faults by the threads executing in this process. A page fault occurs when a thread refers to a virtual memory page that is not in its working set in main memory. This might not cause the page to be fetched from disk if it is on the standby list and hence already in main memory, or if it is in use by another process with which the page is shared.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_page_file_bytes

process

Current number of bytes this process has used in the paging file(s). Paging files are used to store pages of memory used by the process that are not contained in other files. Paging files are shared by all processes, and lack of space in paging files can prevent other processes from allocating memory.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_pool_bytes

process

Pool Bytes is the last observed number of bytes in the paged or nonpaged pool. The nonpaged pool is an area of system memory (physical memory used by the operating system) for objects that cannot be written to disk, but must remain in physical memory as long as they are allocated. The paged pool is an area of system memory (physical memory used by the operating system) for objects that can be written to disk when they are not being used. Nonpaged pool bytes is calculated differently than paged pool bytes, so it might not equal the total of paged pool bytes.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

pool: paged or nonpaged

windows_process_priority_base

process

Current base priority of this process. Threads within a process can raise and lower their own base priority relative to the process's base priority.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_private_bytes

process

Current number of bytes this process has allocated that cannot be shared with other processes.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_virtual_bytes

process

Current size, in bytes, of the virtual address space that the process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages. Virtual space is finite and, by using too much, the process can limit its ability to load libraries.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

windows_service_state

service

The state of the service (State)

instance: instance-identifier-string

job: job-name

name: service-name#1

state: service-status#2

#1

Uppercase letters are converted to lowercase.

#2

Contains one of the following:

  • continue pending (pending continuation)

  • pause pending (pending pause)

  • paused (paused)

  • running (running)

  • start pending (pending startup)

  • stop pending (pending stop)

  • stopped (stopped)

  • unknown (unknown)

■ Collector

Windows exporter has a built-in collection process called a "collector" for each monitored resource such as CPU and memory.

If you want to add the metrics listed in the table above as acquisition fields, you must enable the collector corresponding to the metric you want to use. You can also disable collectors of metrics that you do not want to collect to suppress unnecessary collection.

Each collector can be enabled or disabled with the "--collectors.enabled" option on the Windows exporter command line, or with the "collectors.enabled" item in the Windows exporter configuration file (jpc_windows_exporter.yml).

For details about Windows exporter command-line options, see the description of windows_exporter command options in Service definition file (jpc_program-name.service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

For details about Windows exporter configuration file entry "collectors.enabled", see the description of item collectors in Windows exporter configuration file (jpc_windows_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
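As a sketch only (the exact layout of jpc_windows_exporter.yml must be taken from the Definition Files chapter referenced above), enabling a set of collectors might look like this; the structure and the collector list shown here are assumptions for illustration.

```yaml
# Hypothetical fragment of the Windows exporter configuration file.
# Only the collectors listed here would run; all others are disabled.
collectors:
  enabled: cpu,memory,logical_disk,net,system,service
```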

■ Specifying Monitored Services

When using the service monitoring function of Windows exporter, the service to be monitored is specified in the "services-where" field of Windows exporter configuration file (jpc_windows_exporter.yml).

For information about Windows exporter configuration file entry "services-where", see the entry "services-where" in Windows exporter configuration file (jpc_windows_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

The value of the name label of metrics output by the service collector of Windows exporter is set to the service name. Half-width uppercase characters in the monitored service name are converted to half-width lowercase characters; full-width uppercase characters are converted to full-width lowercase characters.
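In the underlying Windows exporter, the services-where value is a WMI WHERE clause that selects the Win32_Service rows to monitor. A minimal sketch follows; the surrounding key structure and the service names are assumptions for illustration, and the actual format of jpc_windows_exporter.yml must be taken from the Definition Files chapter.

```yaml
# Hypothetical fragment: monitor only the two named services.
collector:
  service:
    services-where: "Name='W32Time' OR Name='Spooler'"
```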

- About Monitoring JP1/IM - Agent Services

For the service name of JP1/IM - Agent service, see 10.1 Service of JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Administration Guide. For information about the service name in a logical host environment, see 7.3.6 Newly installing JP1/IM - Agent with integrated agent host (for Windows) in the JP1/Integrated Management 3 - Manager Configuration Guide.

Note that you cannot use the service monitoring function to monitor Prometheus server and Windows exporter services.

(d) Node exporter (Linux performance data collection capability)

Node exporter is an exporter that is installed on a monitored Linux host to obtain operational information from that host.

The Node exporter is installed on the same host as the Prometheus server, and upon a scrape request from the Prometheus server, it collects operational information from the Linux OS of the host and returns it to the Prometheus server.

It can collect operational information about memory, disks, and other resources from inside the host, which cannot be collected by monitoring from outside the host (external monitoring by URL or CloudWatch).

In addition, with JP1/IM - Manager and JP1/IM - Agent version 13-01 or later, you can monitor the operational status of services (programs registered in systemd) on the integrated agent host (Linux) (service monitoring function#).

Note that the service monitoring function cannot be used when JP1/IM - Agent runs inside a container.

#

If you use the service monitoring function in an environment where the version is upgraded from 13-00 to 13-01 or later, you need to configure the settings to perform service monitoring.

The following are the JP1/IM - Manager and JP1/IM - Agent setup instructions:

Where to find instructions for setting up JP1/IM - Manager

See Editing category name definition file for IM management nodes (imdd_category_name.conf) (Optional) in 1.19.3(1)(d) Settings of product plugin (for Windows) in the JP1/Integrated Management 3 - Manager Configuration Guide.

Where to find instructions for setting up JP1/IM - Agent

Refer to the instructions for configuring service monitoring in 2.19.2(3)(f) Configuring service monitor settings (for Linux) (Optional) and 2.19.2(5)(b) Change metric to collect (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.

This feature creates an IM management node for each service that you want to monitor. For details on displaying the tree, see 3.15.6(1)(i) Tree Format. If you configure an alert, a JP1 event is issued when a service stops and is registered with the IM management node corresponding to the stopped service. You can check the past operational status of a service from the trend display.
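The service state that drives such alerts is exposed by the node_systemd_unit_state metric (described in the table below), which publishes one time series per (unit, state) pair, with the value 1 for the unit's current state. A hedged Python sketch (illustrative only, not JP1/IM code; the sample data shape is assumed) of deriving stopped services from such samples:

```python
# Each sample mimics one node_systemd_unit_state time series:
# value is 1 for the unit's current state, 0 otherwise (assumed shape).
samples = [
    {"name": "sshd.service", "state": "active",   "value": 1},
    {"name": "sshd.service", "state": "inactive", "value": 0},
    {"name": "cron.service", "state": "active",   "value": 0},
    {"name": "cron.service", "state": "inactive", "value": 1},
]

def current_states(samples):
    """Return the current state of each unit (the state whose value is 1)."""
    return {s["name"]: s["state"] for s in samples if s["value"] == 1}

def stopped_units(samples):
    """Units whose current state indicates they are not running."""
    states = current_states(samples)
    return sorted(u for u, st in states.items()
                  if st in ("inactive", "failed", "deactivating"))

print(stopped_units(samples))  # ['cron.service']
```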

■ Main items to be acquired

The main retrieval items of Node exporter are defined in the Node exporter metric definition file (default) and the Node exporter (service monitoring) metric definition file (default). For details, see Node exporter metric definition file (metrics_node_exporter.conf) and Node exporter (service monitoring) metric definition file (metrics_node_exporter_service.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add retrieval items to the metric definition file. The following are the metrics that can be specified in the PromQL statements described in the definition file. For details of "Collector" in the table, refer to the description under ■ Collector following the table.

Metric Name

Collector

What to Get

Label

node_boot_time_seconds

stat

Last boot time

Note:

Shown in UNIX time, including microseconds.

instance: Instance identification string

job: Job name

node_context_switches_total

stat

Number of times a context switch has been made (cumulative)

instance: Instance identification string

job: Job name

node_cpu_seconds_total

cpu

CPU seconds spent in each mode (cumulative)

instance: Instance identification string

job: Job name

cpu: cpuid

mode: Mode#

#

Contains one of the following:

  • user

  • nice

  • system

  • idle

  • iowait

  • irq

  • softirq

  • steal

node_disk_io_now

diskstats

Number of disk I/Os currently in progress

instance: Instance identification string

job: Job name

device: Device name

node_disk_io_time_seconds_total

diskstats

Seconds spent on disk I/O (cumulative)

instance: Instance identification string

job: Job name

device: Device name

node_disk_read_bytes_total

diskstats

Number of bytes successfully read from disk (cumulative)

instance: Instance identification string

job: Job name

device: Device name

node_disk_read_time_seconds_total

diskstats

Seconds taken to read from disk (cumulative)

instance: Instance identification string

job: Job name

device: Device name

node_disk_reads_completed_total

diskstats

Number of successfully completed reads from disk (cumulative)

instance: Instance identification string

job: Job name

device: Device name

node_disk_write_time_seconds_total

diskstats

Seconds taken to write to disk (cumulative)

instance: Instance identification string

job: Job name

device: Device name

node_disk_writes_completed_total

diskstats

Number of successfully completed disk writes (cumulative)

instance: Instance identification string

job: Job name

device: Device name

node_disk_written_bytes_total

diskstats

Number of bytes successfully written to disk (cumulative)

instance: Instance identification string

job: Job name

device: Device name

node_filesystem_avail_bytes

filesystem

Number of file system bytes available to non-root users

instance: Instance identification string

job: Job name

fstype: File System Type

mountpoint: Mount Point

node_filesystem_files

filesystem

Number of file nodes in the file system

instance: Instance identification string

job: Job name

fstype: File System Type

mountpoint: Mount Point

node_filesystem_files_free

filesystem

Number of free file nodes in the file system

instance: Instance identification string

job: Job name

fstype: File System Type

mountpoint: Mount Point

node_filesystem_free_bytes

filesystem

Number of bytes of free file system space

instance: Instance identification string

job: Job name

fstype: File System Type

mountpoint: Mount Point

node_filesystem_size_bytes

filesystem

File system capacity in bytes

instance: Instance identification string

job: Job name

fstype: File System Type

mountpoint: Mount Point

node_intr_total

stat

Number of interrupts handled (cumulative)

instance: Instance identification string

job: Job name

node_load1

loadavg

One-minute average of the number of jobs in the run queue

instance: Instance identification string

job: Job name

node_load15

loadavg

15-minute average of the number of jobs in the run queue

instance: Instance identification string

job: Job name

node_load5

loadavg

5-minute average of the number of jobs in the run queue

instance: Instance identification string

job: Job name

node_memory_Active_file_bytes

meminfo

Bytes of recently used file cache memory

Note:

The value obtained by converting the Active(file) of /proc/meminfo to bytes.

instance: Instance identification string

job: Job name

node_memory_Buffers_bytes

meminfo

Number of bytes in the file buffer

Note:

The value of Buffers converted to bytes in /proc/meminfo.

instance: Instance identification string

job: Job name

node_memory_Cached_bytes

meminfo

Number of bytes in file read cache memory

Note:

This is the value of Cached converted to bytes in /proc/meminfo.

instance: Instance identification string

job: Job name

node_memory_Inactive_file_bytes

meminfo

Number of bytes of file cache memory that have not been used recently

Note:

The value of the Inactive(file) of /proc/meminfo converted to bytes.

instance: Instance identification string

job: Job name

node_memory_MemAvailable_bytes

meminfo

The number of bytes of memory available to start a new application without swapping

Note:

The value of MemAvailable in /proc/meminfo converted to bytes.

instance: Instance identification string

job: Job name

node_memory_MemFree_bytes

meminfo

Number of bytes of free memory

Note:

The value of MemFree in /proc/meminfo converted to bytes.

instance: Instance identification string

job: Job name

node_memory_MemTotal_bytes

meminfo

Total memory in bytes

Note:

The value of MemTotal converted to bytes in /proc/meminfo.

instance: Instance identification string

job: Job name

node_memory_SReclaimable_bytes

meminfo

Number of bytes in the Slab cache that can be reclaimed

Note:

SReclaimable in /proc/meminfo converted to bytes.

instance: Instance identification string

job: Job name

node_memory_SwapFree_bytes

meminfo

Number of bytes of free swap memory space

Note:

The value of SwapFree in /proc/meminfo converted to bytes.

instance: Instance identification string

job: Job name

node_memory_SwapTotal_bytes

meminfo

Total swap space in bytes

Note:

This is the value of SwapTotal converted to bytes in /proc/meminfo.

instance: Instance identification string

job: Job name

node_netstat_Icmp6_InMsgs

netstat

Number of ICMPv6 messages received (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_netstat_Icmp_InMsgs

netstat

Number of ICMPv4 messages received (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_netstat_Icmp6_OutMsgs

netstat

Number of ICMPv6 messages sent (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_netstat_Icmp_OutMsgs

netstat

Number of ICMPv4 messages sent (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_netstat_Tcp_InSegs

netstat

Number of TCP packets received (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_netstat_Tcp_OutSegs

netstat

Number of TCP packets sent (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_netstat_Udp_InDatagrams

netstat

Number of UDP packets received (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_netstat_Udp_OutDatagrams

netstat

Number of UDP packets sent (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_network_flags

netclass

A numeric value indicating the state of the interface

Note:

The value of /sys/class/net/[iface]/flags, expressed in decimal.

instance: Instance identification string

job: Job name

device: Network Device Name

node_network_iface_link

netclass

Interface serial number

Note:

The value of /sys/class/net/[iface]/iflink.

instance: Instance identification string

job: Job name

device: Network Device Name

node_network_mtu_bytes

netclass

Interface MTU value

Note:

The value of /sys/class/net/[iface]/mtu.

instance: Instance identification string

job: Job name

device: Network Device Name

node_network_receive_bytes_total

netdev

Number of bytes received by the network device (cumulative value)

instance: Instance identification string

job: Job name

device: Network Device Name

node_network_receive_errs_total

netdev

Number of network device receive errors (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_network_receive_packets_total

netdev

Number of packets received by network devices (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_network_transmit_bytes_total

netdev

Number of bytes sent by the network device (cumulative value)

instance: Instance identification string

job: Job name

device: Network Device Name

node_network_transmit_colls_total

netdev

Number of transmit collisions for network devices (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_network_transmit_errs_total

netdev

Number of transmission errors for network devices (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_network_transmit_packets_total

netdev

Number of packets sent by network devices (cumulative)

instance: Instance identification string

job: Job name

device: Network Device Name

node_time_seconds

time

System time in seconds since the epoch (1970-01-01 UTC)

instance: Instance identification string

job: Job name

node_uname_info

uname

System information obtained by the uname system call

instance: Instance identification string

job: Job name

domainname: NIS and YP domain names

machine: Hardware Identifiers

nodename: Network node host name

release: Operating system release number (e.g. "2.6.28")

sysname: The name of the OS (e.g. "Linux")

version: Operating system version

node_vmstat_pswpin

vmstat

Number of page swap-ins (cumulative)

Note:

The value of the pswpin in /proc/vmstat.

instance: Instance identification string

job: Job name

node_vmstat_pswpout

vmstat

Number of page swap-outs (cumulative)

Note:

The value of pswpout in /proc/vmstat.

instance: Instance identification string

job: Job name

node_systemd_unit_state

systemd

The state of the systemd unit.

instance: instance-identifier-string

job: job-name

name: unit-file-name

state: service-status#1

type: How to launch a process#2

#1

Contains one of the following:

  • activating (during startup)

  • active (running)

  • deactivating (stopped)

  • failed (failed to execute)

  • inactive (stopped)

#2

Contains the Type value of the unit file.
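Many of the metrics above, such as node_cpu_seconds_total, are cumulative counters, so a usable utilization figure comes from the rate of change between two scrapes. A Python sketch of the arithmetic (illustrative only; it mirrors what a PromQL rate() over the idle mode would compute, with assumed sample values):

```python
def cpu_busy_percent(idle_prev, idle_curr, interval_s, num_cpus):
    """Percent CPU busy over an interval, derived from two scrapes of the
    cumulative idle-mode counter node_cpu_seconds_total (all CPUs summed)."""
    idle_rate = (idle_curr - idle_prev) / interval_s  # idle CPU-seconds per second
    return 100.0 * (1.0 - idle_rate / num_cpus)

# Two scrapes 60 s apart on a 4-CPU host: the idle counter grew by 180 s,
# i.e. 3 of 4 CPU-seconds per second were idle -> 25% busy.
print(cpu_busy_percent(10_000.0, 10_180.0, 60.0, 4))  # 25.0
```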

■ Collector

The Node exporter has a built-in collection process called a "collector" for each monitored resource such as CPU and memory.

If you want to add the metrics listed in the table above as retrieval items, you must enable the collector corresponding to each metric you want to use. You can also disable the collectors of metrics that you do not want to collect to suppress unnecessary collection.

Per-collector enable/disable can be specified in the Node exporter command line options. Specify the collector to enable with the "--collector.collector-name" option and the collector to disable with the "--no-collector.collector-name" option.

For details about Node exporter command-line options, see the description of node_exporter command options in Unit definition file (jpc_program-name.service) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
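What a scrape of an enabled collector returns is plain text in the Prometheus exposition format. A minimal Python parser sketch (a simplified illustration of the format, not its full grammar and not JP1/IM code):

```python
import re

# Matches a simple sample line: metric_name{label="v",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def parse_exposition(text):
    """Parse metric sample lines, skipping # HELP / # TYPE comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        labels = {}
        if m.group("labels"):
            for key, val in re.findall(r'(\w+)="([^"]*)"', m.group("labels")):
                labels[key] = val
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples

text = '''# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 123456.78
'''
print(parse_exposition(text))
```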

■ Specifying monitored services

When using the service monitoring function of Node exporter, the services to be monitored are specified with the "--collector.systemd.unit-include" option in the Node exporter unit definition file (jpc_node_exporter.service). Performance data is collected for the services specified in this file that meet one of the following conditions:

  • Automatic start of monitored services is enabled (running systemctl enable)

  • Automatic startup of monitored services is disabled, but the status is active

Performance data for a service whose automatic start is disabled is not collected while the service is stopped. Therefore, if you want to monitor a stopped service whose automatic start is disabled, start the service and collect performance data before creating the IM management node tree.

For details on the unit definition file, see the description of the "--collector.systemd.unit-include" item in node_exporter command options in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

- About monitoring JP1/IM - Agent services

For the unit definition file names of JP1/IM - Agent services, see 10.1 Service of JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Administration Guide. For the unit definition file name in a logical host environment, see 8.3.6 Newly installing JP1/IM - Agent with integrated agent host (for UNIX) in the JP1/Integrated Management 3 - Manager Configuration Guide.

Note that you cannot use the service monitoring function to monitor Prometheus server and Node exporter services.

(e) Process exporter (Linux process data collection capability)

Process exporter, built into a monitored Linux host, collects operating information of processes running on that host.

Installed in the same host as Prometheus server, Process exporter collects operating information of the processes from the Linux OS on the host when triggered by scraping requests from Prometheus server, and returns it to the server.

Process exporter allows you to collect process-related operating information, which cannot be obtained through monitoring from outside the host (such as synthetic monitoring with URLs or CloudWatch), from within the host.

■ Key metric items

The key Process exporter metric items are defined in the Process exporter metric definition file (initial status). For details, see Process exporter metric definition file (metrics_process_exporter.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add more metric items to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.

Metric name

Data to be obtained

Label

namedprocess_namegroup_num_procs

Number of processes in this group.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_cpu_seconds_total

CPU usage based on /proc/[pid]/stat fields utime(14) and stime(15) i.e. user and system time.

instance: instance-identifier-string

job: job-name

groupname: group-name

mode: user or system

namedprocess_namegroup_read_bytes_total

Bytes read based on /proc/[pid]/io field read_bytes. Because /proc/[pid]/io is readable only by the process's user, run process-exporter either as that user or as root to obtain these values; otherwise they cannot be read and the metric stays constant at 0.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_write_bytes_total

Bytes written based on /proc/[pid]/io field write_bytes.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_major_page_faults_total

Number of major page faults based on /proc/[pid]/stat field majflt(12).

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_minor_page_faults_total

Number of minor page faults based on /proc/[pid]/stat field minflt(10).

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_context_switches_total

Number of context switches based on /proc/[pid]/status fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. The extra label ctxswitchtype can have two values: voluntary and nonvoluntary.

instance: instance-identifier-string

job: job-name

groupname: group-name

ctxswitchtype: voluntary or nonvoluntary

namedprocess_namegroup_memory_bytes

Number of bytes of memory used. The extra label memtype can have three values:

  • resident: Field rss(24) from /proc/[pid]/stat. This is just the pages which count toward text, data, or stack space. This does not include pages which have not been demand-loaded in, or which are swapped out.

  • virtual: Field vsize(23) from /proc/[pid]/stat, virtual memory size.

  • swapped: Field VmSwap from /proc/[pid]/status, translated from KB to bytes.

If gathering smaps file is enabled, two additional values for memtype are added:

  • proportionalResident: Sum of Pss fields from /proc/[pid]/smaps

  • proportionalSwapped: Sum of SwapPss fields from /proc/[pid]/smaps

instance: instance-identifier-string

job: job-name

groupname: group-name

memtype: resident, virtual, swapped, proportionalResident, or proportionalSwapped

namedprocess_namegroup_open_filedesc

Number of file descriptors, based on counting how many entries are in the directory /proc/[pid]/fd.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_worst_fd_ratio

Worst ratio of open filedescs to filedesc limit, amongst all the procs in the group. The limit is the fd soft limit based on /proc/[pid]/limits.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_oldest_start_time_seconds

Epoch time (seconds since 1970/1/1) at which the oldest process in the group started. This is derived from field starttime(22) from /proc/[pid]/stat, added to boot time to make it relative to epoch.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_num_threads

Sum of number of threads of all process in the group. Based on field num_threads(20) from /proc/[pid]/stat.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_states

Number of threads in the group in each of various states, based on the field state(3) from /proc/[pid]/stat.

The extra label state can have these values: Running, Sleeping, Waiting, Zombie, Other.

instance: instance-identifier-string

job: job-name

groupname: group-name

state: Running, Sleeping, Waiting, Zombie, or Other

namedprocess_namegroup_thread_count

Number of threads in this thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name

threadname: thread-name

namedprocess_namegroup_thread_cpu_seconds_total

Same as cpu_user_seconds_total and cpu_system_seconds_total, but broken down per-thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name

threadname: thread-name

mode: user or system

namedprocess_namegroup_thread_io_bytes_total

Same as read_bytes_total and write_bytes_total, but broken down per-thread subgroup. Unlike read_bytes_total/write_bytes_total, the label iomode is used to distinguish between read and write bytes.

instance: instance-identifier-string

job: job-name

groupname: group-name

threadname: thread-name

iomode: read or write

namedprocess_namegroup_thread_major_page_faults_total

Same as major_page_faults_total, but broken down per-thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_thread_minor_page_faults_total

Same as minor_page_faults_total, but broken down per-thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_thread_context_switches_total

Same as context_switches_total, but broken down per-thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name

Important
  • Processes whose name contains multi-byte characters cannot be monitored.

  • Process exporter continues to output information for processes that it has collected once, even after those processes stop running. Therefore, if Process exporter is configured to collect information based on PIDs, new time-series data is added every time a process is restarted and its PID changes, resulting in large amounts of unnecessary data.

    Furthermore, using PIDs is not recommended in the open source software (OSS), and version 13-00 of this product is therefore configured not to collect PID information by default (groupname). If you want to manage processes that share the same command line separately, we recommend operational measures such as changing the order of arguments or using PIDs (in the latter case, periodic restarts are needed to prevent collected information from accumulating indefinitely).

    Note that the information collected by Windows exporter differs from what Process exporter collects, because Windows exporter collects PID information.
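Several Process exporter metrics in the table above come from fixed fields of /proc/[pid]/stat (utime is field 14, stime field 15, num_threads field 20). A small Python sketch of extracting them (illustrative only, using a hardcoded sample line with assumed values):

```python
def parse_proc_stat(stat_line):
    """Extract utime(14), stime(15), and num_threads(20) from a
    /proc/[pid]/stat line. Field numbering is 1-based, as in proc(5)."""
    # comm (field 2) is parenthesized and may contain spaces,
    # so split on the last ')'.
    lpar = stat_line.index("(")
    rpar = stat_line.rindex(")")
    pid = int(stat_line[:lpar].strip())
    comm = stat_line[lpar + 1:rpar]
    rest = stat_line[rpar + 1:].split()  # rest[0] is field 3 (state)
    return {
        "pid": pid,
        "comm": comm,
        "utime": int(rest[11]),        # field 14
        "stime": int(rest[12]),        # field 15
        "num_threads": int(rest[17]),  # field 20
    }

# A shortened sample in the /proc/[pid]/stat layout (assumed values).
sample = "1234 (my proc) S 1 1234 1234 0 -1 4194560 500 0 0 0 250 125 0 0 20 0 4 0 100000"
print(parse_proc_stat(sample))
```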

(f) Node exporter for AIX (AIX performance data collection capability)

Node exporter for AIX is an exporter that is embedded in a monitored AIX host to obtain operating information of that host.

Node exporter for AIX is installed on a host other than the one running the Prometheus server. Upon a scrape request from the Prometheus server, it collects operational data from the AIX OS of its host and returns the data to the Prometheus server.

It is possible to collect operational information related to memory and disk, which cannot be collected by monitoring from outside the host (external monitoring by URL or CloudWatch), from inside the host.

■ Prerequisites

It is a prerequisite that the ports used by Node exporter for AIX are protected by firewalls, network configurations, and so on, so that they cannot be accessed by anything other than the Prometheus server of JP1/IM - Agent.

For the ports used by Node exporter for AIX, see the explanation of node_exporter_aix command options in 10.4.2(1) Enabling registering services in the JP1/Integrated Management 3 - Manager Administration Guide.

■ Conditions to be monitored

For the OS versions supported on the host on which you install Node exporter for AIX, see the Release Notes.

WPAR is not supported.

Running multiple instances of Node exporter for AIX on the same host is not supported, even when they are started on both the physical host and a logical host.

The logical host configuration of the monitored AIX hosts is supported only if the following conditions are met:

  • The host name of the monitored AIX host can be uniquely resolved from the Prometheus server.

Note: If more than one IP address is assigned to the monitored AIX host, Node exporter for AIX can be accessed via all of those IP addresses.

For the upper limit of the number of Node exporter for AIX instances that can be monitored by one Prometheus server, see the list of limits for JP1/IM - Agent in Appendix D.1 Limits when using the Intelligent Integrated Management Base.

■ Main items to be acquired

The main retrieval items for the Node exporter for AIX that JP1/IM - Agent ships with are defined in the Node exporter for AIX metric definition file (default). For details, see Node exporter for AIX metric definition file (metrics_node_exporter_aix.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add retrieval items to the metric definition file. The following table lists the metrics that can be specified in PromQL expressions in the definition file:

Metric Name

Command-line option for retrieval

Contents to be acquired

Label

Data Source

node_context_switches

-C

Total number of context switches.

(cumulative value)

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_cpu_total func

pswitch of perfstat_cpu_t structure

node_cpu

-C

Seconds the cpus spent in each mode.

(cumulative value)

instance: instance-identity-string

job: job-name

cpu: cpuid

mode: mode (idle, sys, user, or wait)

Get by perfstat_cpu func

perfstat_cpu_t structure

aix_diskpath_wblks

-D

Blocks written via the path

cpupool_id=physical-processor-shared-pooling-ID

diskpath=disk-path-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_diskpath func

wblks of perfstat_diskpath_t structure

aix_diskpath_rblks

-D

Blocks read via the path

cpupool_id=physical-processor-shared-pooling-ID

diskpath=disk-path-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_diskpath func

rblks of perfstat_diskpath_t structure

aix_disk_rserv

-d

Read or receive service time

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

rserv of perfstat_disk_t structure

aix_disk_rblks

-d

Number of blocks read from disk

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

rblks of perfstat_disk_t structures

aix_disk_wserv

-d

Write or send service time

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

wserv of perfstat_disk_t structure

aix_disk_wblks

-d

Number of blocks written to disk

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

wblks of perfstat_disk_t structure

aix_disk_time

-d

Amount of time disk is active

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

time of perfstat_disk_t structure

aix_disk_xrate

-d

Number of transfers from disk

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

xrate of perfstat_disk_t structure

aix_disk_xfers

-d

Number of transfers to/from disk

cpupool_id=physical-processor-shared-pooling-ID

disk=disk-name

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

vgname=volume-group-name

Get by perfstat_disk func

xfers of perfstat_disk_t structure

node_filesystem_avail_bytes

-f

Filesystem space available to non-root users in bytes.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

device=device-name

fstype=file-system-type

mountpoint=mount-point

Get by stat_filesystems func

avail_bytes of filesystem structure

node_filesystem_files

-f

Filesystem total file nodes.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

device=device-name

fstype=file-system-type

mountpoint=mount-point

Get by stat_filesystems func

files of filesystem structure

node_filesystem_files_free

-f

Filesystem total free file nodes.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

device=device-name

fstype=file-system-type

mountpoint=mount-point

Get by stat_filesystems func

files_free of filesystem structure

node_filesystem_free_bytes

-f

Filesystem free space in bytes.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

device=device-name

fstype=file-system-type

mountpoint=mount-point

Get by stat_filesystems func

free_bytes of filesystem structure

node_filesystem_size_bytes

-f

Filesystem size in bytes.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

device=device-name

fstype=file-system-type

mountpoint=mount-point

Get by stat_filesystems func

size_bytes of filesystem structure

node_intr

-C

Total number of interrupts serviced.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_cpu_total func

decrintrs of perfstat_cpu_total_t structure

mpcsintrs of perfstat_cpu_total_t structure

devintrs of perfstat_cpu_total_t structure

softintrs of perfstat_cpu_total_t structure

node_load1

-C

1m load average.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_cpu_total func

loadavg[0] of perfstat_cpu_total_t structure

node_load5

-C

5m load average.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_cpu_total func

loadavg[1] of perfstat_cpu_total_t structure

node_load15

-C

15m load average.

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_cpu_total func

loadavg[2] of perfstat_cpu_total_t structure

aix_memory_real_avail

-m

Number of pages (in 4KB pages) of memory available without paging out working segments

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

real_avail of perfstat_memory_total_t structure

aix_memory_real_free

-m

Free real memory (in 4 KB pages).

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

real_free of perfstat_memory_total_t structure

aix_memory_real_inuse

-m

Real memory which is in use (in 4KB pages)

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

real_inuse of perfstat_memory_total_t structure

aix_memory_real_total

-m

Total real memory (in 4 KB pages).

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

real_total of perfstat_memory_total_t structure

aix_netinterface_mtu

-i

Network frame size

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=network-interface-name

Get by perfstat_netinterface func

mtu of perfstat_netinterface_t structure

aix_netinterface_ibytes

-i

Number of bytes received on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=network-interface-name

Get by perfstat_netinterface func

ibytes of perfstat_netinterface_t structure

aix_netinterface_ierrors

-i

Number of input errors on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=network-interface-name

Get by perfstat_netinterface func

ierrors of perfstat_netinterface_t structure

aix_netinterface_ipackets

-i

Number of packets received on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=network-interface-name

Get by perfstat_netinterface func

ipackets of perfstat_netinterface_t structure

aix_netinterface_obytes

-i

Number of bytes sent on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=network-interface-name

Get by perfstat_netinterface func

obytes of perfstat_netinterface_t structure

aix_netinterface_collisions

-i

Number of collisions on csma interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=network-interface-name

Get by perfstat_netinterface func

collisions of perfstat_netinterface_t structure

aix_netinterface_oerrors

-i

Number of output errors on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=network-interface-name

Get by perfstat_netinterface func

oerrors of perfstat_netinterface_t structure

aix_netinterface_opackets

-i

Number of packets sent on interface

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

netinterface=network-interface-name

Get by perfstat_netinterface func

opackets of perfstat_netinterface_t structure

aix_memory_pgspins

-m

Number of page ins from paging space

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

pgspins of perfstat_memory_total_t structure

aix_memory_pgspouts

-m

Number of pages paged out from paging space

cpupool_id=physical-processor-shared-pooling-ID

group_id=group-ID

instance: instance-identity-string

job: job-name

lpar=partition-name

machine_serial=machine-ID

Get by perfstat_memory_total func

pgspouts of perfstat_memory_total_t structure

Node exporter for AIX collects performance data for each monitored resource, such as CPU and memory. You can enable or disable collection for each resource that you want to monitor by using the Node exporter for AIX command-line options.

For Node exporter for AIX command-line options, see the description of node_exporter_aix command options in 10.4.2(1) Enabling registering services in the JP1/Integrated Management 3 - Manager Administration Guide.

Use Script exporter to collect information about processes. For details on how to configure the settings, see 1.23.2(4)(e) Monitoring processes on monitored hosts (AIX) (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.

Use the JP1/Base log file trap feature to monitor the log files of the monitored AIX hosts.

■ Notes on logging Node exporter for AIX

The Node exporter for AIX log is output to the OS system log, so its destination depends on the OS system log settings. For details on changing the output destination of the system log to which Node exporter for AIX logs, see 1.23.2(4)(f) Changing Node exporter for AIX log output destination (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.

■ Notes on using SMT or Micro-Partitioning

In an SMT (simultaneous multithreading) or Micro-Partitioning environment, the CPU utilization (cpu_used_rate) metric of Node exporter for AIX is calculated without taking physical CPU quotas into account, whereas the CPU utilization displayed by the sar command is calculated with physical CPU quotas taken into account.

Therefore, the CPU utilization (cpu_used_rate) metric of Node exporter for AIX might show a lower value than the sar command output.

(g) Yet another cloudwatch exporter (Amazon CloudWatch performance data collection capability)

Yet another cloudwatch exporter is an exporter included in the integrated agent that uses Amazon CloudWatch to collect operating information for AWS services in the cloud.

Yet another cloudwatch exporter is installed on the same host as the Prometheus server. Upon a scrape request from the Prometheus server, it collects CloudWatch metrics via the SDK provided by AWS (AWS SDK)# and returns them to the Prometheus server.

#

The SDK provided by Amazon Web Services (AWS). Yet another cloudwatch exporter uses the AWS SDK for Go (V1). CloudWatch monitoring requires that Amazon CloudWatch support the AWS SDK for Go (V1).

You can use it to monitor services on which Node exporter or Windows exporter cannot be installed.

■ Main items to be acquired

The main retrieval items of Yet another cloudwatch exporter are defined in Yet another cloudwatch exporter metric definition file (default). For details, see Yet another cloudwatch exporter metric definition file (metrics_ya_cloudwatch_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ CloudWatch metrics you can collect

You can collect metrics for the AWS namespaces that are supported for monitoring by Yet another cloudwatch exporter of JP1/IM - Agent, which are listed in 3.15.6(1)(k) Creating an IM Management Node for Yet another cloudwatch exporter.

Specify the metrics to collect by describing the AWS service name and CloudWatch metric name in the Yet another Cloudwatch Exporter configuration file (jpc_ya_cloudwatch_exporter.yml).

The following is an example of the Yet another cloudwatch exporter configuration file for collecting the CloudWatch metrics CPUUtilization and DiskReadBytes of the AWS/EC2 service.

discovery:
  exportedTagsOnMetrics:
    ec2:
      - jp1_pc_nodelabel
  jobs:
    - type: ec2
      regions:
        - ap-northeast-1
      period: 60
      length: 300
      delay: 60
      nilToZero: true
      searchTags:
        - key: jp1_pc_nodelabel
          value: .*
      metrics:
        - name: CPUUtilization
          statistics:
            - Maximum
        - name: DiskReadBytes
          statistics:
            - Maximum

For details about the contents of the Yet another cloudwatch exporter configuration file, see Yet another cloudwatch exporter configuration file (jpc_ya_cloudwatch_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can also add new metrics to the Yet another cloudwatch exporter metrics definition file using the metrics you set in the Yet another cloudwatch exporter configuration file.

The metrics and labels specified in the PromQL statement described in the definition file conform to the following naming conventions:

- Naming conventions for Exporter metrics

Yet another cloudwatch exporter automatically converts CloudWatch metric names into exporter metric names according to the following rules. Metrics specified in PromQL statements must be written using the exporter metric names.

"aws_"#1+Namespace#2+"_"+CloudWatch_Metric#2+"_"+Statistic_Type#2

#1

Appended if the namespace does not begin with "aws_".

#2

Indicates the name you set in the Yet another cloudwatch exporter configuration file (jpc_ya_cloudwatch_exporter.yml). It is converted by the following rules:

  • It is converted from camel case notation to snake case notation.

    CamelCase is a notation that capitalizes word breaks, such as "CamelCase" or "camelCase."

    Snake case is a notation that separates words with "_", such as "snake_case".

  • The following symbols are converted to "_".

    whitespace, comma, tab, /, \, half-width period, -, :, =, full-width left double quote, @, <, >

  • "%" is converted to "_percent".

- Exporter label naming conventions

Yet another cloudwatch exporter automatically converts CloudWatch dimension names and tag names into exporter label names according to the following rules. Labels specified in PromQL statements must be written using the exporter label names.

  • For dimensions

    "dimension"+"_"+dimensions_name#

  • For tags

    "tag"+"_"+tag_name#

  • For custom tags

    "custom_tag"+"_"+custom_tag_name#

#

Indicates the name you set in the Yet another cloudwatch exporter configuration file (jpc_ya_cloudwatch_exporter.yml).
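As an illustration, the metric-name conversion rules above can be sketched in Python. This is a hypothetical helper, not part of the product, and it covers only the ASCII symbols from the list of converted characters:

```python
import re

# Symbols that the rules above convert to "_" (ASCII subset only).
SYMBOLS = re.compile(r'[ ,\t/\\.\-:=@<>]')
# Boundary between a lowercase letter or digit and an uppercase letter.
CAMEL_BOUNDARY = re.compile(r'(?<=[a-z0-9])(?=[A-Z])')

def to_exporter_name(part: str) -> str:
    part = part.replace('%', '_percent')   # "%" -> "_percent"
    part = SYMBOLS.sub('_', part)          # listed symbols -> "_"
    part = CAMEL_BOUNDARY.sub('_', part)   # camel case -> snake case
    return part.lower()

def exporter_metric_name(namespace: str, metric: str, statistic: str) -> str:
    # "aws_" + namespace + "_" + CloudWatch metric + "_" + statistic type,
    # with "aws_" prepended only when the converted namespace lacks it.
    ns = to_exporter_name(namespace)
    prefix = '' if ns.startswith('aws_') else 'aws_'
    return prefix + ns + '_' + to_exporter_name(metric) + '_' + to_exporter_name(statistic)
```

For example, CPUUtilization in the AWS/EC2 namespace with statistic type Maximum becomes aws_ec2_cpuutilization_maximum.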

■ About policies for IAM users in your AWS account

To connect to AWS CloudWatch, you must create a policy with the following permissions and assign it to an IAM user.

"tag:GetResources",
"cloudwatch:GetMetricData",
"cloudwatch:GetMetricStatistics",
"cloudwatch:ListMetrics"

For details on how to set JSON format, see 2.19.2(8)(b) Modify Setup to connect to CloudWatch (for Linux) (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.
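Assembled into a policy document, the permissions above take roughly the following JSON shape. The Sid and the Resource value of "*" are illustrative assumptions; see the referenced section for the exact format:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "JP1IMAgentCloudWatch",
      "Effect": "Allow",
      "Action": [
        "tag:GetResources",
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
```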

■ Environment-variable HTTPS_PROXY

This environment variable is specified when Yet another cloudwatch exporter connects to CloudWatch through a proxy. Only an http URL can be set in the environment variable HTTPS_PROXY. Note that Basic authentication is the only supported authentication method.

You can set the environment-variable HTTPS_PROXY to connect to AWS CloudWatch through proxies. The following shows an example configuration.

HTTPS_PROXY=http://username:password@proxy.example.com:5678

■ How to handle monitoring targets JP1/IM - Agent does not support

If a product or metric cannot be monitored by JP1/IM - Agent, you must collect it by some other means, for example, by using a user-defined Exporter.

(h) Promitor (Azure Monitor performance data collection capability)

Promitor, included in the integrated agent, collects operating information of Azure services on the cloud environment through Azure Monitor and Azure Resource Graph.

Promitor consists of Promitor Scraper and Promitor Resource Discovery. Promitor Scraper collects metrics on resources from Azure Monitor according to schedule settings and returns them.

Metrics can be collected from target resources in two ways: one method is to specify the target resources separately in a configuration file and the other is to detect the resources automatically. If you choose to detect them automatically, Promitor Resource Discovery detects resources in a tenant through Azure Resource Graph, and based on the results, Promitor Scraper collects metric information.

In addition, both Promitor Scraper and Promitor Resource Discovery require two configuration files for each of them. One configuration file is to define runtime settings, such as authentication information, and the other is to define metric information to be collected.

■ Key metric items

The key Promitor metric items are defined in the Promitor metric definition file (initial status). For details, see the description under Promitor metric definition file (metrics_promitor.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Metrics you can collect

Promitor can collect metrics for the following services to monitor:

You specify metrics you want to collect in the Promitor Scraper configuration file (metrics-declaration.yaml).
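As a rough sketch, a metric declaration in metrics-declaration.yaml has the following shape. The field names follow Promitor's public declaration format, and the tenant, subscription, resource names, and metric shown are placeholders; see the referenced sections for the exact format used by JP1/IM - Agent:

```yaml
version: v1
azureMetadata:
  tenantId: <tenant-id>
  subscriptionId: <subscription-id>
  resourceGroupName: <resource-group>
metrics:
  - name: azure_vm_percentage_cpu
    description: "Average CPU utilization of a virtual machine"
    resourceType: VirtualMachine
    azureMetricConfiguration:
      metricName: Percentage CPU
      aggregation:
        type: Average
    resources:
      - virtualMachineName: <vm-name>
```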

If you want to change the metrics specified in the Promitor Scraper configuration file, see Change monitoring metrics (optional) in 1.21.2(8)(d) Configuring scraping targets (required) under Set up of Promitor in the JP1/Integrated Management 3 - Manager Configuration Guide.

You can also add new metrics to the Promitor metric definition file, based on the metrics specified in the Promitor Scraper configuration file. Metrics defined in the Promitor Scraper configuration file can be specified in the PromQL statements written in the definition file.

Table 3‒16: Services supported as monitoring targets by Promitor

Promitor resourceType name

Azure Monitor namespace

Automatic discovery support

VirtualMachine

Microsoft.Compute/virtualMachines

Y

FunctionApp

Microsoft.Web/sites

Y

ContainerInstance

Microsoft.ContainerInstance/containerGroups

--

KubernetesService

Microsoft.ContainerService/managedClusters

Y

FileStorage

Microsoft.Storage/storageAccounts/fileServices

--

BlobStorage

Microsoft.Storage/storageAccounts/blobServices

--

ServiceBusNamespace

Microsoft.ServiceBus/namespaces

Y

CosmosDb

Microsoft.DocumentDB/databaseAccounts

Y

SqlDatabase

Microsoft.Sql/servers/databases

Y

SqlServer

Microsoft.Sql/servers/databases

Microsoft.Sql/servers/elasticPools

--

SqlManagedInstance

Microsoft.Sql/managedInstances

Y

SqlElasticPool

Microsoft.Sql/servers/elasticPools

Y

LogicApp

Microsoft.Logic/workflows

Y

Legend:

Y: Automatic discovery is supported.

--: Automatic discovery is not supported.

■ Checking how Azure SDKs used by Promitor are supported

Promitor employs the Azure SDK for .NET. The end of support for an Azure SDK version is announced 12 months in advance. For details on the Azure SDK lifecycle, see the Lifecycle FAQ at the following website:

https://learn.microsoft.com/ja-jp/lifecycle/faq/azure#azure-sdk-----------

The lifecycles of individual Azure SDK library versions are listed at the following website:

https://azure.github.io/azure-sdk/releases/latest/all/dotnet.html

■ Credentials required for account information

Promitor can connect to Azure through either the service principal method or the managed identity method. For details on the credentials assigned to the service principal or managed identity, see (a) Configuring the settings for establishing a connection to Azure (required) under 1.21.2(8) Set up of Promitor in the JP1/Integrated Management 3 - Manager Configuration Guide.

(i) Blackbox exporter (Synthetic metric collector)

Blackbox exporter is an exporter that sends simulated requests to monitored Internet services on the network and obtains operating information from the responses. The supported communication protocols are HTTP, HTTPS, and ICMP.

When the Blackbox exporter receives a scrape request from the Prometheus server, it sends a service request such as an HTTP request to the monitored target and obtains the response and the response time. It then summarizes the execution results as metrics and returns them to the Prometheus server.

■ Main items to be acquired

The main retrieval items of Blackbox exporter are defined in Blackbox exporter metric definition file (default). For details, see Blackbox exporter metric definition file (metrics_blackbox_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add retrieved items to the metric definition file. The following are the metrics that can be specified in the PromQL statement described in the definition file.

Metric Name

Prober

What to get

Label

probe_http_duration_seconds

http

The number of seconds taken per phase of the HTTP request

Note:

The durations of all redirects are added up.

instance: Instance identification string

job: Job name

phase: Phase#

#

Contains one of the following:

  • "resolve"

  • "connect"

  • "tls"

  • "processing"

  • "transfer"

probe_http_content_length

http

HTTP content response length

instance: Instance identification string

job: Job name

probe_http_uncompressed_body_length

http

Uncompressed response body length

instance: Instance identification string

job: Job name

probe_http_redirects

http

Number of redirects

instance: Instance identification string

job: Job name

probe_http_ssl

http

Whether SSL was used for the final redirect

  • 0: TLS/SSL was not used

  • 1: TLS/SSL was used

instance: Instance identification string

job: Job name

probe_http_status_code

http

HTTP response status code value

Note:

If you are redirecting, the final status code is the value of the metric.

If no redirection is performed, the first status code received is the value of the metric.

instance: Instance identification string

job: Job name

probe_ssl_earliest_cert_expiry

http

Earliest expiring SSL certificate UNIX time

instance: Instance identification string

job: Job name

probe_ssl_last_chain_expiry_timestamp_seconds

http

Expiration timestamp of the last certificate in the SSL chain

Note:

If you want to monitor this metric, you must specify false for the insecure_skip_verify parameter in the tls_config settings of the Blackbox exporter configuration file (jpc_blackbox_exporter.yml), place the certificate, and specify the path of the certificate file in the appropriate parameter.

instance: Instance identification string

job: Job name

probe_ssl_last_chain_info

http

SSL leaf certificate information

Note:

This is the SHA256 hash value of the server certificate to be monitored. The hash value is set to the label "fingerprint_sha256".

instance: Instance identification string

job: Job name

fingerprint_sha256: SHA256 fingerprint on certificate

probe_tls_version_info

http

TLS version used

Note:

The TLS version, such as "TLS 1.2", is set to the label "version".

instance: Instance identification string

job: Job name

version: TLS version

probe_http_version

http

HTTP version of the probe response

instance: Instance identification string

job: Job name

probe_failed_due_to_regex

http

Whether the probe failed due to a regular expression check on the response body or response headers

  • 0: Success

  • 1: Failed

instance: Instance identification string

job: Job name

probe_http_last_modified_timestamp_seconds

http

UNIX time of the Last-Modified HTTP response header

instance: Instance identification string

job: Job name

probe_icmp_duration_seconds

icmp

Seconds taken per phase of an ICMP request

instance: Instance identification string

job: Job name

phase: Phase#

#

Contains one of the following:

  • resolve

    Name resolution time

  • setup

    Time from resolve completion to ICMP packet transmission

  • rtt

    Time to get a response after setup

probe_icmp_reply_hop_limit

icmp

Hop limit (TTL for IPv4) value

instance: Instance identification string

job: Job name

probe_success

--

Whether the probe was successful

  • 0: Failed

  • 1: Success

instance: Instance identification string

job: Job name

probe_duration_seconds

--

The number of seconds it took for the probe to complete

instance: Instance identification string

job: Job name

■ IP communication with monitored objects

Only IPv4 communication is supported.

■ Encrypted communication with monitored objects

HTTP monitoring enables encrypted communication using TLS. In this case, the Blackbox exporter acts as a TLS client to the monitored object (TLS server).

To use encrypted communication with TLS, specify the settings in item "tls_config" in the Blackbox exporter configuration file (jpc_blackbox_exporter.yml). In addition, the following certificate and key files must be prepared.

File

Format

CA certificate file

A file in which an X509 public key certificate in pkcs7 format is encoded in PEM format

Client certificate file

Client certificate key file

A file in which the private key in pkcs1 or pkcs8 format is encoded in PEM format#

#

You cannot use password-protected files.
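A minimal sketch of the "tls_config" settings is shown below. The module name and file paths are placeholders, and the key_file parameter name is assumed from the standard Prometheus-style TLS configuration; confirm the exact parameter names in the definition-file reference:

```yaml
modules:
  http_tls:
    prober: http
    http:
      tls_config:
        ca_file: /path/to/ca.pem          # CA certificate file
        cert_file: /path/to/client.pem    # client certificate file
        key_file: /path/to/client.key     # client certificate key file (not password-protected)
        insecure_skip_verify: false       # keep server certificate verification enabled
```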

The supported TLS versions and cipher suites are as follows.

Item

Scope of support

TLS Version

1.2 to 1.3

Cipher suites

  • "TLS_RSA_WITH_AES_128_CBC_SHA" (up to TLS 1.2)

  • "TLS_RSA_WITH_AES_256_CBC_SHA" (up to TLS 1.2)

  • "TLS_RSA_WITH_AES_128_GCM_SHA256" (TLS 1.2 only)

  • "TLS_RSA_WITH_AES_256_GCM_SHA384" (TLS 1.2 only)

  • "TLS_AES_128_GCM_SHA256" (TLS 1.3 only)

  • "TLS_AES_256_GCM_SHA384" (TLS 1.3 only)

  • "TLS_CHACHA20_POLY1305_SHA256" (TLS 1.3 only)

  • "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA" (up to TLS 1.2)

  • "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA" (up to TLS 1.2)

  • "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA" (up to TLS 1.2)

  • "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA" (up to TLS 1.2)

  • "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256" (TLS 1.2 only)

  • "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384" (TLS 1.2 only)

  • "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256" (TLS 1.2 only)

  • "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384" (TLS 1.2 only)

  • "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256" (TLS 1.2 only)

  • "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256" (TLS 1.2 only)

■ Timeout for collecting health information

In a network environment where responses are slow even under normal conditions, operating information can still be collected by adjusting the timeout period.

On the Prometheus server, you can specify the scrape request timeout period in the entry "scrape_timeout" of the Prometheus configuration file (jpc_prometheus_server.yml). For details, see the description of item scrape_timeout in Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

In addition, the timeout period for connections from the Blackbox exporter to the monitored target is 0.5 seconds shorter than the value specified in "scrape_timeout" above.
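For example, assuming the scrape-job layout shown earlier, the timeout could be extended for HTTP/HTTPS monitoring as follows. The 30s value is illustrative, and the surrounding entries of jpc_prometheus_server.yml are abbreviated:

```yaml
scrape_configs:
  - job_name: jpc_blackbox_http
    scrape_timeout: 30s    # Blackbox exporter then connects with a 29.5-second timeout
```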

■ Certificate expiration

When collecting operation information through HTTPS monitoring, the exporter receives a certificate list (the server certificate and the certificates that certify it) from the monitored target.

The Blackbox exporter collects the expiration time (UNIX time) of the earliest-expiring certificate as the probe_ssl_earliest_cert_expiry metric.

You can also monitor certificates that are approaching expiration by using the function described in 3.15.1(3) Performance data monitoring notification function, because the number of seconds remaining until expiration can be calculated as the probe_ssl_earliest_cert_expiry metric value minus the value of PromQL's time() function.
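For example, the following PromQL expression (the 30-day threshold is illustrative) selects probes whose earliest-expiring certificate has fewer than 30 days of validity remaining:

```
probe_ssl_earliest_cert_expiry - time() < 86400 * 30
```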

■ User-Agent value in HTTP request header when monitoring HTTP

The default value of User-Agent included in HTTP request header during HTTP monitoring is as shown below:

  • For version 13-00 or earlier

    "Go-http-client/1.1"

  • For version 13-00-01 or later

    "Blackbox Exporter/0.24.0"

You can change the value of User-Agent in the setting of item "headers" in the Blackbox exporter configuration file (jpc_blackbox_exporter.yml).

The following is an example of changing the value of User-Agent to "My-Http-Client".

modules:
  http:
    prober: http
    http:
      headers:
        User-Agent: "My-Http-Client"

For details, see the description of item headers in Blackbox exporter configuration file (jpc_blackbox_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ About HTTP 1.1 Name-Based Virtual Host Support

The Blackbox exporter supports HTTP 1.1 name-based virtual hosts and TLS Server Name Indication (SNI). You can monitor virtual hosts that make a single HTTP/HTTPS server behave as multiple HTTP/HTTPS servers.

■ About TLS Server Authentication and Client Authentication

In Blackbox exporter's HTTPS monitoring, server authentication is performed by using the CA certificate specified in item "ca_file" of the Blackbox exporter configuration file (jpc_blackbox_exporter.yml) together with the server certificate sent by the server at the start of HTTPS communication (TLS handshake).

If the sent certificate is invalid (for example, the server name is incorrect, the certificate has expired, or a self-signed certificate is used), HTTPS communication cannot start and monitoring fails.

In addition, when a request is made to send a certificate from the monitored server at the start of HTTPS communication (TLS handshake), the client certificate described in item "cert_file" of the Blackbox exporter configuration file (jpc_blackbox_exporter.yml) is sent to the monitored server.

If the server validates the sent certificate, recognizes it as invalid, and returns an error to the Blackbox exporter via the TLS protocol (or if communication cannot continue, for example, because the connection is lost), monitoring fails.

For details on the verification contents related to the client certificate and the operation in the event of an error on the monitored server, check the specifications of the monitored server (or relay device such as a load balancer).

Even if an invalid certificate is detected during server authentication, specifying "true" for item "insecure_skip_verify" in the Blackbox exporter configuration file (jpc_blackbox_exporter.yml) allows HTTPS communication to start without an error. Note, however, that in that case the certificate verification performed for server authentication is disabled.

For details, see the description of item insecure_skip_verify in Blackbox exporter configuration file (jpc_blackbox_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

Server authentication cannot be performed with certificates in which the host name is not listed in the Subject Alternative Name field.

■ About cookie information

The Blackbox exporter does not use cookie information sent from the monitored target in subsequent HTTP requests.

■ About external resources referenced from content included in the response body of HTTP communication

In Blackbox exporter, external resources (such as subframes and images) referenced from the content in the response body of HTTP communication are not included in the monitoring scope.

■ About Monitoring of Content Included in HTTP Communication Response Body

Because the Blackbox exporter does not parse content, execution results and execution times based on the syntax (such as HTML or JavaScript) of the content in the response body of HTTP communication are not reflected in the monitoring results.

■ Precautions when the monitoring destination of HTTP monitoring redirects with Basic authentication

If the destination monitored by the Blackbox exporter's HTTP monitoring redirects with Basic authentication, the Blackbox exporter sends the same Basic authentication user name and password to both the redirect source and the redirect destination. Therefore, when Basic authentication is performed at both the redirect source and the redirect destination, the same user name and password must be set at both.

(j) Script exporter (UAP monitoring capability)

Script exporter runs scripts on a host and gets results.

Installed on the same host as the Prometheus server, Script exporter runs a script on the host when triggered by a scrape request from the Prometheus server, and returns the result to the server.

By developing a script that obtains UAP information and converts it to metrics, and adding the script to Script exporter, you can monitor applications that are not supported by any bundled Exporter.
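One possible shape for such a script is sketched below in Python, assuming the script prints metrics in the Prometheus text exposition format when Script exporter runs it. The metric name and the idea of counting files in an application work directory are illustrative:

```python
#!/usr/bin/env python3
# Hypothetical Script exporter script: report the number of files in a
# UAP work directory as a gauge in the Prometheus text exposition format.
import os
import sys

def render_metric(work_dir: str) -> str:
    # Count the entries in the work directory and format them as a metric.
    backlog = len(os.listdir(work_dir))
    return ("# TYPE uap_backlog_files gauge\n"
            f"uap_backlog_files {backlog}\n")

if __name__ == "__main__":
    work_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    sys.stdout.write(render_metric(work_dir))
```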

■ Key metric items

The key Script exporter metric items are defined in the Script exporter metric definition file (initial status). For details, see Script exporter metric definition file (metrics_script_exporter.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add more metric items to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.

Metric name

Data to be obtained

Label

script_success

Script exit status (0 = error, 1 = success)

instance: instance-identifier-string

job: job-name

script: script-name

script_duration_seconds

Script execution time, in seconds.

instance: instance-identifier-string

job: job-name

script: script-name

script_exit_code

The exit code of the script.

instance: instance-identifier-string

job: job-name

script: script-name

(k) OracleDB exporter (Oracle Database monitoring function)

OracleDB exporter is an Exporter for Prometheus that retrieves performance data from Oracle Database.

- About the number of sessions

When OracleDB exporter monitors Oracle Database, it connects at each scrape and disconnects when data collection completes. Each connection uses one session.

■ Conditions to be monitored

The following Oracle Database configurations are supported as monitoring targets of JP1/IM - Agent:

  • For non-clusters

    Non-CDB and CDB configurations

  • For Oracle RAC

    CDB configuration

Because a single OracleDB exporter process connects to only one service, multiple OracleDB exporter instances are launched when there is more than one monitoring target.

Note
  • Oracle RAC One Node and Oracle Database Cloud Service are not supported.

  • HA clustering configuration on Oracle Database is not supported.

■ Acquisition items

The metrics that can be retrieved by the OracleDB exporter shipped with JP1/IM - Agent are the metrics defined by default in OracleDB exporter, plus cache_hit_ratio.

OracleDB exporter retrieval items are defined in metric definition-file (default) of OracleDB exporter. For details, see OracleDB exporter metric definition file (metrics_oracledb_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

The following table lists the metrics that can be specified in PromQL expressions in the definition file. The value of each metric is obtained by executing the SQL statement shown in the table against Oracle Database. For details about a metric, contact Oracle and provide the SQL statement of its data source.

Metric name

Contents to be acquired

Label

Data source (SQL statement)

oracledb_sessions_value

Count of sessions

status: status

type: session type

SELECT status, type, COUNT(*) as value FROM v$session GROUP BY status, type

oracledb_resource_current_utilization

Resource usage#1

resource_name: resource_name

SELECT resource_name,current_utilization,CASE WHEN TRIM(limit_value) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(limit_value) END as limit_value FROM v$resource_limit

oracledb_resource_limit_value

Resource usage limit#1 (UNLIMITED: -1)

resource_name: resource_name

oracledb_asm_diskgroup_total

Bytes of total size of ASM disk group

name: disk group name

SELECT name,total_mb*1024*1024 as total,free_mb*1024*1024 as free FROM v$asm_diskgroup_stat where exists (select 1 from v$datafile where name like '+%')

oracledb_asm_diskgroup_free

Bytes of free space available on ASM disk group

name: disk group name

oracledb_activity_execute_count

Total number of calls (user calls and recursive calls) executing SQL statements (cumulative value)

none

SELECT name, value FROM v$sysstat WHERE name IN ('parse count (total)', 'execute count', 'user commits', 'user rollbacks', 'db block gets from cache', 'consistent gets from cache', 'physical reads cache')

oracledb_activity_parse_count_total

Total number of parse calls (hard, soft and describe) (cumulative value)

none

oracledb_activity_user_commits

Total number of user commits (cumulative value)

none

oracledb_activity_user_rollbacks

The number of times a user manually issued a ROLLBACK statement, or the total number of times an error occurred during a user's transaction (cumulative value)

none

oracledb_activity_physical_reads_cache

Total number of data blocks read from disk to the buffer cache (cumulative value)

none

oracledb_activity_consistent_gets_from_cache

Number of times a consistent read was requested for a block in the buffer cache (cumulative value)

none

oracledb_activity_db_block_gets_from_cache

Number of times a CURRENT block was requested from the buffer cache (cumulative value)

none

oracledb_process_count

Count of active Oracle Database processes

none

SELECT COUNT(*) as count FROM v$process

oracledb_wait_time_administrative

Time spent waiting in the Administrative wait class (in hundredths of a second)#2

none

SELECT

n.wait_class as WAIT_CLASS,

round(m.time_waited/m.INTSIZE_CSEC,3) as VALUE

FROM

v$waitclassmetric m, v$system_wait_class n

WHERE

m.wait_class_id=n.wait_class_id AND n.wait_class != 'Idle'

oracledb_wait_time_application

Time spent waiting in the Application wait class (in hundredths of a second)#2

none

oracledb_wait_time_commit

Time spent waiting in the Commit wait class (in hundredths of a second)#2

none

oracledb_wait_time_concurrency

Time spent waiting in the Concurrency wait class (in hundredths of a second)#2

none

oracledb_wait_time_configuration

Time spent waiting in the Configuration wait class (in hundredths of a second)#2

none

oracledb_wait_time_network

Time spent waiting in the Network wait class (in hundredths of a second)#2

none

oracledb_wait_time_other

Time spent waiting in the Other wait class (in hundredths of a second)#2

none

oracledb_wait_time_scheduler

Time spent waiting in the Scheduler wait class (in hundredths of a second)#2

none

oracledb_wait_time_system_io

Time spent waiting in the System I/O wait class (in hundredths of a second)#2

none

oracledb_wait_time_user_io

Time spent waiting in the User I/O wait class (in hundredths of a second)#2

none

oracledb_tablespace_bytes

Total bytes consumed by tablespaces

tablespace: name of the tablespace

type: tablespace contents

SELECT

dt.tablespace_name as tablespace,

dt.contents as type,

dt.block_size * dtum.used_space as bytes,

dt.block_size * dtum.tablespace_size as max_bytes,

dt.block_size * (dtum.tablespace_size - dtum.used_space) as free,

dtum.used_percent

FROM dba_tablespace_usage_metrics dtum, dba_tablespaces dt

WHERE dtum.tablespace_name = dt.tablespace_name

ORDER by tablespace

oracledb_tablespace_max_bytes

Maximum number of bytes in a tablespace

tablespace: name of the tablespace

type: tablespace contents

oracledb_tablespace_free

Number of free bytes in the tablespace

tablespace: name of the tablespace

type: tablespace contents

oracledb_tablespace_used_percent

Tablespace utilization

If auto extension is ON, it is calculated with auto extension taken into account.

tablespace: name of the tablespace

type: tablespace contents

oracledb_exporter_last_scrape_duration_seconds

The number of seconds taken by the last scrape

none

-

oracledb_exporter_last_scrape_error

Whether the last scrape resulted in an error

0: Success

1: Error

none

-

oracledb_exporter_scrapes_total

Total number of times Oracle Database was scraped for metrics

none

-

oracledb_up

Whether the Oracle Database Server is up

0: Not running

1: Running

none

-

#1

In a PDB, the source view v$resource_limit is empty, so the value cannot be retrieved.

#2

In a PDB, the source view v$waitclassmetric is empty, so the value cannot be retrieved.

Important
  • Prior to using OracleDB exporter, make sure that the SQL statements that serve as data sources can be executed, for example, with the SQL*Plus command, and that the required information is displayed. Perform this check with the same user that OracleDB exporter uses to connect to Oracle Database.

  • The OracleDB exporter provided by JP1/IM - Agent does not support collecting user-defined metrics (custom metrics).

■ Requirements for monitoring Oracle Database

When you monitor Oracle Database with OracleDB exporter, you must configure the following settings on Oracle Database.

You do not need to install Oracle Client or similar software on the JP1/IM - Agent host.

  • Oracle listener

    • Configure Oracle listener and servicename so that they can connect to the target.

    • Configure Oracle listener to accept unencrypted connection requests.

  • Oracle Database

    Set the Oracle Database database character set to one of the following:

    • AL32UTF8 (Unicode UTF-8)

    • JA16SJIS (Japanese-language SJIS)

    • ZHS16GBK (Simplified Chinese GBK)

  • Users used to access Oracle Database

    • Grant the permissions below to the users you want to use to connect to Oracle Database

      - Login permissions

      - SELECT permissions to the following tables

      dba_tablespace_usage_metrics

      dba_tablespaces

      v$system_wait_class

      v$asm_diskgroup_stat

      v$datafile

      v$sysstat

      v$process

      v$waitclassmetric

      v$session

      v$resource_limit

    • User used to connect to Oracle Database

      For details about the character types and maximum lengths that can be specified for user names, see Environment variables.

    • Password of the user used to connect to Oracle Database

      The following character types can be used for passwords:

      - Uppercase letters, lowercase letters, numbers, @, +, ', !, $, :, ., (, ), ~, -, _

      - The password can be from 1 to 30 bytes in length.
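The character and length constraints above can be expressed as a small validation sketch. The function name is illustrative and not part of the product:

```python
import re

# Allowed characters per the rules above: upper/lowercase letters, digits,
# and @ + ' ! $ : . ( ) ~ - _ ; the length must be 1 to 30 bytes.
ALLOWED = re.compile(r"^[A-Za-z0-9@+'!$:.()~_-]+$")

def is_valid_exporter_password(password: str) -> bool:
    """Return True if the password satisfies the documented constraints."""
    raw = password.encode("utf-8")
    if not (1 <= len(raw) <= 30):
        return False
    return ALLOWED.match(password) is not None
```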

■ Obfuscation of Oracle Database passwords

The OracleDB exporter shipped with JP1/IM - Agent uses the secret obfuscation function to manage the password used to access Oracle Database. For details, see 3.15.10 Secret obfuscation function.

■ Notes on Oracle Database log files

Monitoring Oracle Database with OracleDB exporter can generate a large number of log files. Therefore, the Oracle Database administrator should consider deleting log files periodically.

Directory where log files are generated

(including subdirectories)

Extensions of the log files that increase

$ORACLE_BASE/diag/rdbms

.trc, .trm

Below is a sample command line for deleting ".trc" or ".trm" files whose modification date is older than 14 days. If necessary, consider running such commands periodically to delete unnecessary logs.

OS

Command line example for deleting logs

Windows

forfiles /P "%ORACLE_BASE%\diag\rdbms" /M *.trm /S /C "cmd /C del /Q @path" /D -14

forfiles /P "%ORACLE_BASE%\diag\rdbms" /M *.trc /S /C "cmd /C del /Q @path" /D -14

Linux

find $ORACLE_BASE/diag/rdbms -name '*.tr[cm]' -mtime +14 -delete

Set the $ORACLE_BASE and %ORACLE_BASE% environment variables as needed.
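As a cross-platform alternative to the forfiles/find examples above, the same cleanup can be sketched in Python. The function name and the 14-day default are illustrative:

```python
import time
from pathlib import Path

def delete_old_trace_files(base_dir: str, days: int = 14) -> list:
    """Delete *.trc / *.trm files older than `days`; return deleted paths."""
    cutoff = time.time() - days * 24 * 60 * 60
    deleted = []
    for pattern in ("*.trc", "*.trm"):
        # rglob also descends into subdirectories, like the forfiles /S
        # and find examples above.
        for f in Path(base_dir).rglob(pattern):
            if f.is_file() and f.stat().st_mtime < cutoff:
                f.unlink()
                deleted.append(str(f))
    return deleted
```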

■ Environment variables

The following environment variables are required when using OracleDB exporter.

- Environment-variable "DATA_SOURCE_NAME" (mandatory)

Specify the connection destination of OracleDB exporter in the following format. There is no default value.

  • For Windows

oracle://user-name@host-name:port/service-name?connection timeout=10[&amp;instance name=instance-name]
  • For Linux

oracle://user-name@host-name:port/service-name?connection timeout=10[&instance name=instance-name]
user-name
  • Specifies the username to connect to Oracle listener. Up to 30 characters can be specified.

  • You can use uppercase letters, numbers, underscores, dollar signs, pound signs, periods, and at signs. Note that lowercase letters are not allowed.

  • For Linux, replace the pound sign (#) with "%%23" when you include the user name in the unit definition file. For example, for a CDB common user, specify "C##USER" as "C%%23%%23USER".

  • For Windows, replace the pound sign (#) with "%23" when you include the user name in the service definition file. For example, for a CDB common user, specify "C##USER" as "C%23%23USER".

host-name
  • Specifies the host name of Oracle Database host to monitor. Up to 253 characters can be specified.

  • You can use uppercase letters, lowercase letters, numbers, hyphens, and periods.

port
  • Specifies the port number for connecting to Oracle listener.

service-name
  • Specifies the service name of Oracle listener. Up to 64 characters can be specified.

  • You can use uppercase letters, lowercase letters, numbers, underscores, hyphens, and periods.

Option

You can specify the following options. If you specify more than one, connect them with &amp; in Windows and & in Linux.

  • connection timeout=number

    Specifies the connection timeout in seconds. This option must be specified.

    Be sure to specify 10. If you specify a value other than 10 or omit this option, the scrape by Prometheus server may time out, and the up metric may be 0 even if OracleDB exporter is running.

  • instance name=instance-name

    Specifies the instance to connect to. This option is optional.

(Example of specification)

  • When no instance name is specified

oracle://orauser@orahost:1521/orasrv?connection timeout=10
  • For Windows (with an instance name)

oracle://orauser@orahost:1521/orasrv?connection timeout=10&amp;instance name=orcl1
  • For Linux (with an instance name)

oracle://orauser@orahost:1521/orasrv?connection timeout=10&instance name=orcl1
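The DSN format and the #-escaping rules above can be sketched as follows. build_dsn and escape_for_unit_file are hypothetical helper names; the doubling of % for Linux unit definition files follows the rule stated above:

```python
from urllib.parse import quote

def build_dsn(user, host, port, service, instance=None):
    """Return a DATA_SOURCE_NAME string; '#' in the user name becomes %23."""
    dsn = (f"oracle://{quote(user, safe='')}@{host}:{port}/{service}"
           f"?connection timeout=10")
    if instance is not None:
        dsn += f"&instance name={instance}"
    return dsn

def escape_for_unit_file(dsn):
    """Double '%' for a Linux unit definition file (%23 -> %%23)."""
    return dsn.replace("%", "%%")
```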
- Environment variable DATA_SOURCE_NAME (mandatory)

Specify the full path of jp1ima directory under JP1/IM - Agent installation directory.

For a logical host, specify the full path of jp1ima directory under JP1/IM - Agent shared directory.

(Example of specification)

  • For Windows

C:\Program files\Hitachi\jp1ima
  • For Linux

/opt/jp1ima

■ Notes

  • If you stop the monitored Oracle Database instance or container before stopping OracleDB exporter, a NORMAL shutdown of Oracle may not complete. Stop OracleDB exporter in advance, or stop Oracle Database with an IMMEDIATE shutdown.

  • Stop OracleDB exporter before changing the configuration of, or performing maintenance on, the Oracle Database instance or container.

(l) Fluentd (Log metrics)

This capability can generate and measure log metrics from log files created by monitoring targets. For details on the function, see 3.15.2 Log metrics.

■ Key metric items

You define which values you need from the log files created by your monitoring targets in the log metrics definition file (fluentd_any-name_logmetrics.conf). These definitions allow you to obtain quantified data (log metrics) as metric items.

For details on the log metrics definition file, see Log metrics definition file (fluentd_any-name_logmetrics.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Sample files

The following provides descriptions of sample files for when you use the log metrics feature. If you copy the sample files, be careful of the linefeed codes. For details, see the description of each file of 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. These sample files are based on the assumptions in Assumptions of the sample files. Copy each file and change the settings according to your monitoring targets.

- Assumptions of the sample files

The sample files described here assume that HostA, a monitored host (integrated agent host), exists and JP1/IM - Agent is installed in it, and that WebAppA, an application running on HostA, creates the following log file.

- ControllerLog.log

As shown in target log message 1, a log message is created, saying that an HTTP endpoint in WebAppA is used, at the start of processing of the request for that endpoint. The log message also indicates the number of records handled upon request processing.

Target log message 1:

...
2022-10-19 10:00:00 [INFO] c.b.springbootlogging.LoggingController : endpoint "/register" started. Target record: 5.
...

In the sample files, a regular expression to match target log message 1 is used, and the number of the log messages that match the expression is counted. The number is then displayed in the Trends tab of the JP1/IM integrated operation viewer as log metric 1, Requests to the register Endpoint.

The definition for log metric 1 uses counter as its log metric type.

In addition, the regular expression used in the above also extracts the number indicated as Target record from target log message 1, and then the extracted numbers are summed up. The total is then displayed in the Trends tab of the JP1/IM integrated operation viewer as log metric 2, Number of Registered Records.

The definition for log metric 2 uses counter as its log metric type.
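What the two log metrics compute can be illustrated with a short Python sketch: count the lines matching target log message 1 (log metric 1) and sum the extracted Target record values (log metric 2). The regular expression mirrors the one in the sample log metrics definition file, slightly simplified; count_and_sum is a hypothetical helper:

```python
import re

# Mirrors the regexp parse expression in the sample definition file,
# with the record number simplified to \d+.
PATTERN = re.compile(
    r'^(?P<logtime>[^\[]*) \[(?P<loglevel>[^\]]*)\] (?P<cls>[^\[]*) : '
    r'endpoint "/register" started\. Target record: (?P<record_num>\d+)\.$'
)

def count_and_sum(lines):
    """Return (request count, total registered records) over the lines."""
    count, total = 0, 0
    for line in lines:
        m = PATTERN.match(line)
        if m:
            count += 1                          # log metric 1 (counter)
            total += int(m.group("record_num")) # log metric 2 (counter, key)
    return count, total
```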

As many Fluentd workers (multi-process workers feature) as there are log files to be monitored are required. For details on the worker settings related to the log metrics feature, see the log metrics definition file (fluentd_any-name_logmetrics.conf). Here, it is assumed that 11 Fluentd workers are running and that ControllerLog.log is monitored by the worker whose worker ID is 10.

These sample files also assume the tree structure consisting of the following IM management nodes:

All Systems
 + Host A
    + Application Server
       + WebAppA
- Target files in this example

The target files used in this example are as follows:

  • Integrated manager host

    - User-specific metric definition file

  • Integrated agent host

    - Prometheus configuration file

    - User-specific discovery configuration file

    - Log metrics definition file

    - Fluentd log monitoring target definition file

- Sample user-specific metric definition file

- File name: metrics_logmetrics1.conf

- Written code

[
  {
    "name":"logmetrics_request_endpoint_register",
    "default":true,
    "promql":"logmetrics_request_endpoint_register and $jp1im_TrendData_labels",
    "resource_en":{
      "category":"HTTP",
      "label":"request_num_of_endpoint_register",
      "description":"The request number of endpoint register",
      "unit":"request"
    },
    "resource_ja":{
      "category":"HTTP",
      "label":"Requests to the register Endpoint",
      "description":"The request number of endpoint register",
      "unit":"request"
    }
  },
  {
    "name":"logmetrics_num_of_registeredrecord",
    "default":true,
    "promql":"logmetrics_num_of_registeredrecord and $jp1im_TrendData_labels",
    "resource_en":{
      "category":"DB",
      "label":"logmetrics_num_of_registeredrecord",
      "description":"The number of registered record",
      "unit":"record"
    },
    "resource_ja":{
      "category":"DB",
      "label":"Number of Registered Records",
      "description":"The number of registered record",
      "unit":"record"
    }
  }
]
Note

The storage directory, written code, and file name follow the format of the user-specific metric definition file (metrics_any-Prometheus-trend-name.conf).

- Sample Prometheus configuration file

- File name: jpc_prometheus_server.yml

- Written code

global:
  ...
(omitted)
  ...
scrape_configs:
  - job_name: 'LogMetrics'
    
    file_sd_configs:
      - files:
        - 'user/user_file_sd_config_logmetrics.yml'
    
    relabel_configs:
      - target_label: jp1_pc_nodelabel
        replacement: Log trapper(Fluentd)
    
    metric_relabel_configs:
      - target_label: jp1_pc_nodelabel
        replacement: ControllerLog
      - source_labels: ['__name__']
        regex: 'logmetrics_request_endpoint_register|logmetrics_num_of_registeredrecord'
        action: 'keep'
      - regex: (jp1_pc_multiple_node|jp1_pc_agent_create_flag)
        action: labeldrop
 
  ...
(omitted)
  ...
Note

The storage directory and written code follow the format of the Prometheus configuration file (jpc_prometheus_server.yml). You do not have to create a new file. Instead, you add the scrape_configs section for the log metrics feature to the Prometheus configuration file (jpc_prometheus_server.yml) created during installation.
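The two metric_relabel_configs actions used in the sample above can be modeled with a simplified sketch: 'keep' drops any series whose source label does not fully match the regex (Prometheus anchors relabeling regexes), and 'labeldrop' removes matching label names. The helper names are illustrative:

```python
import re

def apply_keep(series, source_label, regex):
    """Keep only series whose source label fully matches the regex."""
    pat = re.compile(regex)
    return [s for s in series if pat.fullmatch(s.get(source_label, ""))]

def apply_labeldrop(series, regex):
    """Remove labels whose names fully match the regex."""
    pat = re.compile(regex)
    return [{k: v for k, v in s.items() if not pat.fullmatch(k)}
            for s in series]
```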

- Sample user-specific discovery configuration file

- File name: user_file_sd_config_logmetrics.yml

- Written code

- targets:
  - HostA:24830
  labels:
    jp1_pc_exporter: logmetrics
    jp1_pc_category: WebAppA
    jp1_pc_trendname: logmetrics1
    jp1_pc_multiple_node: "{__name__=~'logmetrics_.*'}"
    jp1_pc_agent_create_flag: false
Note

The storage directory and written code follow the format of the user-specific discovery configuration file (file_sd_config_any-name.yml).

ControllerLog.log is monitored by the worker whose Fluentd worker ID is 10. Thus, when 24820 is set for port in the Sample log metrics definition file, the port number of the worker monitoring ControllerLog.log is 24820 + 10 = 24830.

- Sample log metrics definition file

- File name: fluentd_WebAppA_logmetrics.conf

- Written code

## Input
<worker 10>
  <source>
    @type prometheus
    bind '0.0.0.0'
    port 24820
    metrics_path /metrics
  </source>
</worker>
## Extract target log message 1
<worker 10>
  <source>
    @type tail
    @id logmetrics_counter
    path /usr/lib/WebAppA/ControllerLog/ControllerLog.log
    tag WebAppA.ControllerLog
    pos_file ../data/fluentd/tail/ControllerLog.pos
    read_from_head true
    <parse>
      @type regexp
      expression /^(?<logtime>[^\[]*) \[(?<loglevel>[^\]]*)\] (?<class>[^\[]*) : endpoint "\/register" started. Target record: (?<record_num>\d[^\[]*).$/
      time_key logtime
      time_format %Y-%m-%d %H:%M:%S
      types record_num:integer
    </parse>
  </source>
 
## Output
## Define log metrics 1 and 2
  <match WebAppA.ControllerLog>
    @type prometheus
    <metric>
      name logmetrics_request_endpoint_register
      type counter
      desc The request number of endpoint register
    </metric>
    <metric>
      name logmetrics_num_of_registeredrecord
      type counter
      desc The number of registered record
      key record_num
      <labels>
      loggroup ${tag_parts[0]}
      log ${tag_parts[1]}
      </labels>
    </metric>
  </match>
</worker>
Note

The storage directory and written code follow the format of the log metrics definition file (fluentd_any-name_logmetrics.conf).

- Sample Fluentd log monitoring target definition file

- File name: jpc_fluentd_common_list.conf

- Written code

## [Target Settings]
  ...
(omitted)
  ...
@include user/fluentd_WebAppA_logmetrics.conf
Note

The storage directory and written code follow the format of the Fluentd log monitoring target definition file (jpc_fluentd_common_list.conf) in JP1/IM - Agent definition files. You do not have to create a new file. Instead, you add the include section for the log metrics feature to the Fluentd log monitoring target definition file (jpc_fluentd_common_list.conf) created during installation.

(m) Support for same-host and different-host configurations of Prometheus and Exporter

The following table shows whether Prometheus and Exporter are supported in a same-host configuration and in a different-host configuration.

Table 3‒17:  Support for Prometheus and Exporter host configurations

Exporter type

Configuring Prometheus and Exporter hosts

Same host

Another host

Exporter provided by JP1/IM - Agent

Node exporter for AIX

N

Y

Exporter other than the above

Y

N

User-defined Exporter

Y

Y

Legend

Y: Supported

N: Not supported

The following configurations are not supported:

  • Configuring scrape from more than one Prometheus to the same Exporter

  • Exporter# on a remote agent (where the host running Exporter and the monitored host are separate hosts)

#

An Exporter of a remote agent is an Exporter whose discovery configuration file contains the description "jp1_pc_remote_monitor_instance".

Also, if Prometheus and Exporter are configured on different hosts, it is assumed that the ports used by Exporter are protected by firewalls, network configuration, and so on, so that they are accessed only by the Prometheus server of JP1/IM - Agent (for example, by placing the integrated agent host and the Exporter hosts in the same network so that they cannot be accessed externally).

(2) Centralized management of performance data

This function allows Prometheus server to store performance data collected from monitoring targets in the intelligent integrated management database of JP1/IM - Manager. It has the following features:

(a) Remote write function

This is a function in which the Prometheus server sends performance data collected from monitoring targets to an external database suitable for long-term storage. JP1/IM - Agent uses this function to send performance data to JP1/IM - Manager.

The following shows how to define remote write.

  • Remote write definitions are described in the Prometheus server configuration file (jpc_prometheus_server.yml).

  • Download the Prometheus server configuration file from the integrated operation viewer, edit it in a text editor to modify the remote write definition, and then upload it.

The following settings are supported by JP1/IM - Agent for defining Remote Write. For details about the settings, see Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

Table 3‒18: Settings for remote write definition supported by JP1/IM - Agent

Setting items

Description

Remote write destination

(required)

Set the endpoint URL for JP1/IM agent control base.

Remote write timeout period

(Optional)

You can set the timeout period if remote write takes a long time.

Change it if you are not satisfied with the default value.

Relabeling

(Optional)

You can remove unwanted metrics and customize labels.

(3) Performance data monitoring notification function

This function allows Prometheus server to monitor performance data collected from monitoring targets against threshold values and to notify JP1/IM - Manager. It consists of three functions:

If you add a monitored service in an environment where an alert definition for service monitoring is set, the added service is also monitored. If you exclude from monitoring a service for which an alert has fired, you will receive a notification that the fired alert has been resolved.

For an example of defining an alert, see Metric alert definition example in Node exporter metric definition file and Metric alert definition example in Windows exporter metric definition file in Alert configuration file (jpc_alerting_rules.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. For Linux, alerts are defined differently depending on whether automatic startup of the monitored service is enabled (by running systemctl enable). If you want to monitor a service for which automatic startup is disabled, you must create and configure an alert definition for each target.

(a) Alert evaluation function

This function monitors performance data collected from monitoring targets against threshold values.

Alert rules are defined to evaluate alerts: performance data is monitored against thresholds, and alerts are notified.

Alerts can be evaluated by comparing time series data directly with thresholds, or by comparing the results of PromQL# expressions with thresholds.

#

For details about PromQL, see 2.7.4(4) About PromQL.

For each time series, or for each data point produced by evaluating the PromQL expression, an alert status is managed according to the evaluation, and notification actions are executed according to the alert state.

There are three alert states: pending, firing, and resolved. When the alert rule condition is first met, the alert enters the "pending" state. If the condition continues to be met (is not resolved) for the duration of the "for" clause defined in the alert rule, the alert enters the "firing" state.

When the condition is no longer met (resolved), or when the time series disappears, the alert enters the "resolved" state.

The relationship between alert status and notification behavior is as below.

Alert status

Description

Notification behavior

pending

The pending state. The threshold is exceeded, but the time specified in the "for" clause of the alert rule definition has not yet elapsed.

No alert is notified.

firing

The firing state. The threshold is exceeded and the time specified in the "for" clause of the alert rule definition has elapsed, or the threshold is exceeded and no "for" clause is specified for the alert.

An alert is notified.

resolved

The resolved state. The alert rule condition is no longer met.

  • When the alert recovers from the "firing" state, a resolved notification is given.

  • When the alert recovers from the "pending" state, no resolved notification is given.
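The state transitions in the table above can be sketched as a single evaluation step. The "inactive" name for the no-alert state and the function shape are assumptions of this sketch, not product terms:

```python
def next_state(state, condition_met, pending_elapsed, for_duration):
    """Return the next alert state for one evaluation step."""
    if not condition_met:
        # Recovery from "firing" is notified as resolved; recovery from
        # "pending" produces no notification (see the table above).
        return "resolved" if state == "firing" else "inactive"
    if state == "firing":
        return "firing"
    if pending_elapsed >= for_duration:
        return "firing"
    return "pending"
```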

The following shows how to define an alert rule.

  • Alert rule definitions are described in the alert configuration file (jpc_alerting_rules.yml) (definitions in any YAML format can also be described).

  • Before applying the created definition file to the environment, check its format and test the alert rules with the promtool command.

  • Download alert configuration file from integrated operation viewer, edit it in a text editor, change the definition of the alert rule, and then upload it.

The following settings apply to the alert rule definitions supported by JP1/IM - Agent. For details about the settings, see Alert configuration file (jpc_alerting_rules.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. There is no default alert rule definition.

Table 3‒19: Settings for alert rule definitions supported by JP1/IM - Agent

Setting Item

Description

Alert Name (required)

Set the alert name.

Conditional expression (required)

Set the alert condition expression (threshold).

It can be configured using PromQL.

Waiting time (required)

Set the amount of time to wait after entering the "pending" state before changing to the "firing" state.

Change it if you are not satisfied with the default value.

Label (required)

Set labels to add to alerts and recovery notifications.

In JP1/IM - Agent, a specific label must be set.

Annotation (required)

Set to store additional information such as alert description and URL link.

In JP1/IM - Agent, certain annotations must be set.

Labels and annotations can use the following variables:

Variable#

Description

$labels

A variable that holds the label key-value pairs of the alert instance. The label key can be one of the following:

  • When time series data is specified in the conditional expression for alert evaluation

    You can specify a label that the data retains.

  • When a PromQL expression is specified as the conditional expression for alert evaluation

    You can specify a label that is set in the result of the PromQL expression.

    The labels that the data retains depend on the metric.

    For details about the labels, see the descriptions of the metrics that can be specified in PromQL statements, in 3.15.1(1) Performance data collection function.

$value

A variable that holds the evaluation value of the alert instance.

When a firing is notified, it is expanded to the value at the time the firing was detected.

When a resolved notification is made, it is expanded to the value as of the firing immediately before the resolution (not the value as of the resolution).

$externalLabels

This variable holds the label and value set in "external_labels" of item "global" in the Prometheus configuration file (jpc_prometheus_server.yml).

#

Variables are expanded by enclosing them in "{{" and "}}". The following is an example of how to use variables:

description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
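A minimal sketch of how such templates expand, supporting the $labels.<key> and $value forms used in the example above (the expand helper is illustrative, not the actual Go templating used by Prometheus):

```python
import re

def expand(template, labels, value):
    """Expand {{ $labels.<key> }} and {{ $value }} in a template string."""
    def repl(m):
        expr = m.group(1).strip()
        if expr == "$value":
            return str(value)
        if expr.startswith("$labels."):
            return str(labels.get(expr[len("$labels."):], ""))
        return m.group(0)  # leave unknown expressions untouched
    return re.sub(r"\{\{(.*?)\}\}", repl, template)
```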

■ Alert rule definition for converting to JP1 events

In order to convert the alert to be notified into a JP1 event on the JP1/IM - Manager side, the following information must be set in the alert rule definition.

Setting item

Value to set

Uses

name

Set an alert group definition name that is unique within the integrated agent.

Alert group definition name

alert

Set an alert definition name that is unique within the integrated agent.

Alert Definition Name

expr

Set the PromQL statement.

It is recommended to set the PromQL statement described in the metric definition file. This way, when the JP1 event occurs, you can display trend information in the Integrated Operation Viewer.

Firing conditions#

#

If the condition is met, the alert is firing; if the condition is not met, the alert is resolved.

labels.jp1_pc_product_name

Set "/HITACHI/JP1/JPCCS" as fixed.

Set to the product name of the JP1 event.

labels.jp1_pc_severity

Set one of the following:

  • Emergency

  • Alert

  • Critical

  • Error

  • Warning

  • Notice

  • Information

  • Debug

Set to JP1 event severity#.

#

This value is set as the severity of the JP1 event that reports the abnormal state. The severity of the JP1 event that reports recovery is set to Information.

labels.jp1_pc_eventid

Set any value in the range 0 to 1FFF or 7FFF8000 to 7FFFFFFF.

Set to the event ID of the JP1 event.

labels.jp1_pc_metricname

Set the metric name.

For Yet another cloudwatch exporter, be sure to specify it. The JP1 event is associated with the IM management node in the AWS namespace corresponding to the metric name (or to the first metric name if multiple metric names are specified, separated by commas).

Set to the metric name of the JP1 event.

For Yet another cloudwatch exporter, it is also used to associate JP1 events.

annotations.jp1_pc_firing_description

Specify the value to be set for the message of the JP1 event when the firing condition of the alert is satisfied.

If the length of the value is 1,024 bytes or more, the string from the beginning to the 1,023rd byte is set.

If the specification is omitted, the message content of the JP1 event is "The alert is firing. (alert = alert name)".

You can also specify variables to embed job names and evaluation values. If a variable is used, the first 1,024 bytes of the expanded message are valid.

It is set to the message of the JP1 event.

annotations.jp1_pc_resolved_description

Specify the value to be set for the message of the JP1 event when the firing condition of the alert is not satisfied.

If the length of the value is 1,024 bytes or more, the string from the beginning to the 1,023rd byte is set.

If the specification is omitted, the content of the message in the JP1 event is "The alert is resolved. (alert = alert name)".

You can also specify variables to embed job names and evaluation values. If a variable is used, the first 1,024 bytes of the expanded message are valid.

It is set to the message of the JP1 event.
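The 1,023-byte truncation rule above can be sketched as follows; whether the product avoids splitting a multibyte character at the cut point is not stated, so that behavior is an assumption of this sketch:

```python
def truncate_message(message: str, limit: int = 1023) -> str:
    """Cut a message to at most `limit` bytes of its UTF-8 encoding."""
    raw = message.encode("utf-8")
    if len(raw) <= limit:
        return message
    # Drop any partial multibyte sequence left at the cut point.
    return raw[:limit].decode("utf-8", errors="ignore")
```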

For an example of setting an alert definition, see Definition example in alert configuration file (jpc_alerting_rules.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

For details about the properties of the corresponding JP1 event, see 3.2.3 Lists of JP1 events output by JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ How to operate in combination with trending-related functions

By combining the PromQL statement described in the metric definition file with the PromQL statement evaluated by the alert evaluation function, and by describing the metric name of the corresponding trend data in annotations.jp1_pc_firing_description and annotations.jp1_pc_resolved_description of the alert definition in the alert configuration file, you can check the past changes and the current value of the performance value evaluated by the alert on the Trends tab of the integrated operation viewer when the JP1 event for the alert is issued.

For details about the PromQL statements defined for the trend-display related functions, see 3.15.6(4) Return of trend data.

For example, to have the Node exporter monitor CPU usage and notify you when the CPU usage exceeds 80%, create an alert configuration file (alert definition) and a metric definition file as shown in the following examples.

  • Example of description of alert configuration file (alert definition)

    groups:
      - name: node_exporter
        rules:
        - alert: cpu_used_rate(Node exporter)
          expr: 80 < (avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode="system"}[2m])) + avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode="user"}[2m]))) * 100
          for: 3m
          labels:
            jp1_pc_product_name: "/HITACHI/JP1/JPCCS"
            jp1_pc_severity: "Error"
            jp1_pc_eventid: "0301"
            jp1_pc_metricname: "node_cpu_seconds_total"
          annotations:
            jp1_pc_firing_description: "CPU utilization exceeded threshold (80%).value={{ $value }}%"
            jp1_pc_resolved_description: "CPU usage has dropped below the threshold (80%)."
  • Example of description of metric definition file

    [
      {
        "name":"cpu_used_rate",
        "default":true,
        "promql":"(avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode=\"system\"}[2m]) and $jp1im_TrendData_labels) + avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode=\"user\"}[2m]) and $jp1im_TrendData_labels)) * 100",
        "resource_en":{
          "category":"platform_unix",
          "label":"CPU used rate",
        "description":"CPU usage. It also indicates the average value per processor. [Units: %]",
          "unit":"%"
        },
        "resource_ja":{
          "category":"platform_unix",
          "label":"CPU Usage",
          "description":"CPU Utilization (%). It is also an average percentage of each processor.",
          "unit":"%"
        }
      }
    ]

    When the conditions of the PromQL statement specified in expr of the alert definition are satisfied and the JP1 event of the alert is issued, the message "CPU usage exceeded threshold (80%). value = performance value%" is set in the JP1 event. By referring to this message and displaying the "CPU Usage" trend information, users can see the past changes and the current value of CPU usage.

■ Behavior when the service is stopped

If the Alertmanager service is stopped, the JP1 event for the alert is not issued. In addition, if the Prometheus server and Alertmanager services are running and an exporter whose alert is firing stops because of a failure, the alert becomes resolved and a normal JP1 event is issued.

When an alert is firing and the Prometheus server service is stopped while the Alertmanager is running, a normal JP1 event notifying that the alert is resolved may be issued.

For details, see About behavior when the Prometheus server is restarted or stopped while the Alertmanager is running.

■ About behavior when the service is restarted

Even if an alert is firing or resolved and the Prometheus server, Alertmanager, or Exporter service is restarted, no JP1 event is issued when the current alert status is the same as the status before the restart.

When an alert is firing and the Prometheus server service is restarted while the Alertmanager is running, a normal JP1 event notifying that the alert is resolved may be issued.

For details, see About behavior when the Prometheus server is restarted or stopped while the Alertmanager is running.

■ Considerations for spikes in performance data

Performance data can momentarily jump to unexpected values (abnormally large, small, or negative values). Such sudden changes in performance data are commonly called "spikes." In many cases, even if a spike momentarily produces an abnormal value, the data immediately returns to normal and does not need to be treated as an abnormality. A spike can also occur instantaneously when performance data is reset, for example, when the OS is restarted.

When monitoring metrics whose performance data behaves this way, consider suppressing detection of such momentary anomalies by specifying "for" (the grace period before an alert is treated as an anomaly) in the alert rule definition.
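As a sketch, a "for" clause requires the firing condition to hold continuously for the stated period before the alert actually fires, so a one-scrape spike that recovers immediately raises no JP1 event. The metric name and threshold below are illustrative:

```yaml
groups:
  - name: spike_suppression_example
    rules:
      - alert: disk_busy_rate_example
        expr: 90 < example_disk_busy_percent
        # The condition must stay true for 3 consecutive minutes before the
        # alert fires; a momentary spike (including one caused by a counter
        # reset at OS restart) is ignored.
        for: 3m
```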

■ About behavior when the Prometheus server is restarted or stopped while the Alertmanager is running

When an alert is firing and the Prometheus server service is restarted or stopped while the Alertmanager is running, a normal JP1 event notifying that the alert is resolved may be issued.

A normal JP1 event is issued when the following condition is met:

  • The sum of the duration of the "for" clause# defined in the alert definition of the firing alert and the time during which the Prometheus server service is not running (because it is stopped or reloading) is greater than the value of "evaluation_interval" defined in the Prometheus configuration file.

#: If the "for" clause of the alert is not specified, the duration is treated as 0.

■ About behavior when the service is reloaded

Even if an alert is firing or resolved, executing the API that reloads the Prometheus server, Alertmanager, or Exporter service does not cause a JP1 event to be issued.

(b) Alert forwarder

This function notifies you when the alert status becomes "firing" or "resolved" after the Prometheus server evaluates the alert.

If the state of an alert changes while JP1/IM - Manager (Intelligent Integrated Management Base) is stopped, the firing or resolved notification might not be performed.

The Prometheus server sends alerts one by one, and each sent alert is forwarded to JP1/IM - Manager (Intelligent Integrated Management Base) via the Alertmanager. Retried alerts are also sent one by one.

Alerts are basically sent to JP1/IM - Manager in the order in which they occurred, but the order can change when multiple alert rules meet their conditions at the same time, or when a transmission error causes alerts to be resent. However, because the alert information includes the time of occurrence, you can determine the order in which the alerts occurred.

In addition, if the abnormal condition continues for 7 days, an alert will be re-notified.

The following shows how to define the notification destination of the alert.

  • Alert destinations are described in both the Prometheus configuration file (jpc_prometheus_server.yml) and the Alertmanager configuration file (jpc_alertmanager.yml).

    In the Prometheus configuration file, specify the coexisting Alertmanager as the notification destination of the Prometheus server. In the Alertmanager configuration file, specify the JP1/IM agent control base as the notification destination of the Alertmanager.

  • Download the individual configuration files from the integrated operation viewer, edit them in a text editor to change the alert notification destination definitions, and then upload them.

The following table lists the settings for defining Prometheus server notification destinations supported by JP1/IM - Agent. For details about the settings, see Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

Table 3‒20: Settings for defining notification destinations for Prometheus server supported by JP1/IM - Agent

Setting items

Description

Notification destination (required)

Configure the notification destination Alertmanager.

If a host name or IP address is specified for --web.listen-address in the Alertmanager command line options, change localhost to the host name or IP address specified in --web.listen-address.

  • For physical host environments

    Specify the Alertmanager that coexists on the same host.

  • For clustered environments

    Specify the Alertmanager that runs on the logical host.

Label setting (optional)

You can add labels. Configure as needed.
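The notification-destination setting above corresponds to the alerting section of the Prometheus configuration file (jpc_prometheus_server.yml). The following minimal sketch assumes the Alertmanager listens on localhost; the port number is a placeholder, so use the port your coexisting Alertmanager actually listens on (or the logical host name in a cluster):

```yaml
# Sketch of the notification-destination definition. The port number is a
# placeholder; replace localhost if --web.listen-address specifies a host
# name or IP address.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
```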

The following table lists the settings for defining Alertmanager notification destinations supported by JP1/IM - Agent. For details about the settings, see Alertmanager configuration file (jpc_alertmanager.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

Table 3‒21:  Settings for defining Alertmanager notification destinations supported by JP1/IM - Agent

Setting items

Description

Webhook settings (required)

Set the endpoint URL for JP1/IM agent control base.
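The webhook setting corresponds to a webhook_configs receiver in the Alertmanager configuration file (jpc_alertmanager.yml). The receiver name and endpoint URL below are placeholders for the JP1/IM agent control base URL, which should be taken from the shipped configuration file:

```yaml
# Sketch of the webhook definition. The receiver name and URL are placeholders,
# not the actual JP1/IM agent control base endpoint.
route:
  receiver: jp1_im_agent_control_base
receivers:
  - name: jp1_im_agent_control_base
    webhook_configs:
      - url: http://localhost:20000/placeholder-endpoint
```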

(c) Notification suppression function

This function suppresses the notifications described in 3.15.1(3)(b) Alert forwarder. It includes the following:

  • Silence function

    Use this function when you temporarily do not want to be notified of specific alerts.

■ Silence function

This function temporarily suppresses specific notifications. You can configure it so that alerts that occur during temporary maintenance are not notified. Unlike the common exclusion conditions of JP1/IM - Manager, the notification suppression function does not send the notification to JP1/IM - Manager in the first place.

While silence is enabled, you are not notified when the alert status changes. When silence is disabled, notification is given if the alert status has changed compared with its status before silence was enabled.

The following are two examples that show whether notification is performed:

Figure 3‒34: Cases where the state is different before and after disabling silence

[Figure]

The above figure shows an example in which the alert status is "abnormal" when silence is enabled, the status changes to "normal" while silence is enabled, and then silence is disabled.

When the alert changes to "normal", no notification is given because silence is enabled. When silence is disabled, a "normal" notification is given because the alert status has changed from the "abnormal" status it had before silence was enabled.

Figure 3‒35: Cases where the state is the same before and after disabling silence

[Figure]

The above figure shows an example in which, while silence is enabled, the alert status changes to "normal" once and then back to "abnormal", and then silence is disabled.

When silence is disabled, no notification is performed because the alert status is the same "abnormal" as before silence was enabled.

If an alert transmission has failed and is being retried, and silence is enabled to suppress that alert, the transmission is not retried.

- How to Configure silence

Silence settings (enabling or disabling silence) and retrieval of the current silence settings are performed via the REST API (a GUI is not supported).

In addition, to configure silence settings, the machine from which you operate must be able to communicate with the Alertmanager port number on the integrated agent host.

For details about silence settings and REST API used to obtain current silence settings, see 5.21.3 Get silence list of Alertmanager, 5.21.4 Silence creation of Alertmanager, and 5.21.5 Silence Revocation of Alertmanager in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
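For reference, the Alertmanager silence-creation API accepts a JSON request body along the following lines. The matcher, times, and user are examples; see the sections referenced above for the exact request format supported by JP1/IM - Agent.

```json
{
  "matchers": [
    { "name": "alertname", "value": "cpu_used_rate(Node exporter)", "isRegex": false }
  ],
  "startsAt": "2025-01-01T00:00:00Z",
  "endsAt": "2025-01-01T02:00:00Z",
  "createdBy": "jp1admin",
  "comment": "Suppress notifications during temporary maintenance"
}
```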

(4) Communication function

(a) Communication protocols and authentication methods

The following shows the communication protocols and authentication methods used by integrated agent.

Connection source

Connect to

Protocol

Authentication method

Prometheus server

JP1/IM agent control base

HTTP

No authentication

Alertmanager

Prometheus server

Alertmanager

HTTP

No authentication

Exporter

Blackbox exporter

Monitoring target

HTTP/HTTPS

Basic Authentication

Basic Authentication

No authentication

HTTPS

Server Authentication

With client authentication

No client authentication

ICMP#

No authentication

Yet another cloudwatch exporter

Amazon CloudWatch

HTTPS

AWS IAM Authentication

Promitor Scraper

Azure Monitor

HTTPS

No client authentication

Promitor Resource Discovery

Azure Resource Graph

HTTPS

No client authentication

Promitor Scraper

Promitor Resource Discovery

HTTP

No authentication

Prometheus

Fluentd

HTTP

No authentication

OracleDB exporter

Oracle listener

Oracle listener-specific (no encryption)

Authentication by username/password

#

ICMPv6 is not available.

(b) Network configuration

Integrated agent can be used in an IPv4-only network configuration or in a network configuration in which IPv4 and IPv6 environments are mixed. In a mixed IPv4/IPv6 configuration, only IPv4 communication is supported.

You can use integrated agent in the following configurations, with or without a proxy server:

Connection source

Connect to

Connection type

Prometheus server

JP1/IM agent control base

No proxy server

Alertmanager

Prometheus server

Alertmanager

Exporter

Blackbox exporter

Monitoring targets (ICMP monitoring)

Monitoring targets (HTTP monitoring)

  • No proxy server

  • Through a proxy server without authentication

  • Through a proxy server with authentication

Yet another cloudwatch exporter

Amazon CloudWatch

  • No proxy server

  • Through a proxy server without authentication

  • Through a proxy server with authentication

Promitor Scraper

Azure Monitor

  • No proxy server

  • Through a proxy server without authentication

  • Through a proxy server with authentication

Promitor Resource Discovery

Azure Resource Graph

OracleDB exporter

Oracle listener

No proxy server

Integrated agent transmits the following:

Connection source

Connect to

Transmitted data

Authentication method

Prometheus server

JP1/IM agent control base

Performance data in Protobuf format

Alertmanager

Alert information in JSON format#1

Prometheus server

Exporter

Performance data in Prometheus text format#2

Blackbox exporter

Monitoring target

Response for each protocol

Yet another cloudwatch exporter

Amazon CloudWatch

CloudWatch data

Promitor Scraper

Azure Monitor

Azure Monitor data (metrics information)

  • Service principal

  • Managed ID

Promitor Resource Discovery

Azure Resource Graph

Azure Resource Graph data (resources exploration results)

OracleDB exporter

Oracle listener

Proprietary Oracle listener data

#1

For details, see the description of the message body for the request in 5.6.5 JP1 Event converter in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

#2

For details, see the description of Prometheus text formatting in 5.23 API for scrape of Exporter used by JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.