3.15.1 Performance monitoring function by JP1/IM - Agent
The performance monitoring function consists of the add-on programs Prometheus, Alertmanager, and the Exporters, and provides the following two functions:
- Function to collect performance data through the Exporters and send it to the Integrated Manager host
- Function to monitor thresholds of the collected performance data and, when a condition is met, alert JP1/IM - Manager
Performance data and alerts sent to the Integrated Manager host can be viewed in the integrated operation viewer.
(1) Performance data collection function
The performance data collection function collects performance data from monitoring targets through the Prometheus server. It consists of the following two functions:
- Scrape function (Prometheus server)
- Function to acquire operational information of monitoring targets (Exporter)
(a) Scrape function
The scrape function is the Prometheus server function that acquires the performance data to be monitored via an Exporter.
When the Prometheus server accesses a specific URL of the Exporter, the Exporter retrieves the monitored performance data and returns it to the Prometheus server. This process is called a scrape.
Scrapes are executed in units of scrape jobs, each of which groups multiple scrapes performed for the same purpose.
When a discovery configuration file is used for UAP monitoring, the corresponding scrape jobs must be defined. Additional settings are also required in the scrape definitions for the log metrics feature.
For details on the scrape definitions for the log metrics feature, see 1.21.2(10) Setting up scraping definitions in the JP1/Integrated Management 3 - Manager Configuration Guide.
In JP1/IM - Agent, the following scrape definitions are set by default, one per scrape job name, according to the type of Exporter.
| Scrape Job Name | Scrape Definition |
|---|---|
| jpc_node | Scrape definition for Node exporter |
| jpc_windows | Scrape definition for Windows exporter |
| jpc_blackbox_http | Scrape definition for HTTP/HTTPS monitoring in Blackbox exporter |
| jpc_blackbox_icmp | Scrape definition for ICMP monitoring in Blackbox exporter |
| jpc_cloudwatch | Scrape definition for Yet another cloudwatch exporter |
| jpc_process | Scrape definition for Process exporter |
| jpc_promitor | Scrape definition for Promitor |
| jpc_script | Scrape definition for Script exporter |
| jpc_oracledb | Scrape definition for OracleDB exporter |
| jpc_node_aix | Scrape definition for Node exporter for AIX |
| jpc_web_probe | Scrape definition for Web exporter |
| jpc_vmware | Scrape definition for VMware exporter |
| jpc_hyperv | Scrape definition for Windows exporter (Hyper-V monitoring) |
| jpc_sql | Scrape definition for SQL exporter |
To scrape a user-defined Exporter, you must add a scrape definition for each target Exporter, as in the sketch below.
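The following is a minimal sketch of such an addition to jpc_prometheus_server.yml, using standard Prometheus scrape_configs syntax. The job name user_custom_exporter, the host name agenthost, and the port 9999 are illustrative assumptions, not values defined by the product:

```yaml
scrape_configs:
  # Hypothetical scrape job for a user-defined Exporter.
  - job_name: user_custom_exporter
    static_configs:
      - targets:
          - agenthost:9999   # host:port on which the user-defined Exporter listens
```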
The metrics that the Prometheus server obtains from an Exporter by scraping depend on the type of Exporter. For details, see the description of the metric definition file for each Exporter in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
In addition to the metrics obtained from the Exporter, the Prometheus server generates the following metrics each time a scrape is performed.
| Metric Name | Description |
|---|---|
| up | A metric that indicates "1" for a successful scrape and "0" for a failure. It can be used to monitor the operation of the Exporter. Scrape failures can be caused by the host being stopped, the Exporter being stopped, the Exporter returning a status other than 200, or a communication error. |
| scrape_duration_seconds | A metric that indicates how long the scrape took. It is not used in normal operation. It is used for investigations when a scrape does not finish within the expected time. |
| scrape_samples_post_metric_relabeling | A metric that indicates the number of samples remaining after metric relabeling. It is not used in normal operation. It is used to check the amount of data when building the environment. |
| scrape_samples_scraped | A metric that indicates the number of samples returned by the scraped Exporter. It is not used in normal operation. It is used to check the amount of data when building the environment. |
| scrape_series_added | A metric that shows the approximate number of newly created series. It is not used in normal operation. |
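Because up becomes 0 when a scrape fails, it can drive an alert that detects a stopped Exporter. The following is a minimal sketch in standard Prometheus alerting-rule syntax; the group name, alert name, and duration are illustrative assumptions, and the file in which JP1/IM - Agent stores its alerting rules is described in the reference manuals:

```yaml
groups:
  - name: exporter-availability        # hypothetical rule group
    rules:
      - alert: ExporterDown            # hypothetical alert name
        # up == 0: host stopped, Exporter stopped, non-200 response, or communication error.
        expr: up == 0
        for: 5m                        # require 5 minutes of consecutive failures before alerting
```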
For details about how scrapes are performed, see 5.24 API for scrape of Exporter used by JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. An Exporter that you want to scrape must behave as described there.
The scrape definition method is as follows:
- Scrape definitions are written in units of scrape jobs.
- Scrape definitions are described in the Prometheus configuration file (jpc_prometheus_server.yml).
- To edit a scrape definition, download the Prometheus configuration file from the integrated operation viewer, edit it, and then upload it.
The following are the settings related to scrape definitions supported by JP1/IM - Agent.
| Setting Item | Description |
|---|---|
| Scrape job name (required) | Sets the name of the scrape job that Prometheus scrapes. You can specify multiple scrape job names. The specified scrape job name is set on each metric as the label job="scrape-job-name". |
| Scrape destination (required) | Sets the specific URL of the Exporter to be scraped. Only Exporters on hosts where JP1/IM - Agent resides can be specified as scrape destinations. The server in the URL must be specified by host name; "localhost" cannot be used. The total number of scrape destinations specified across all scrape jobs is limited to 100. |
| Scrape parameters (optional) | Sets parameters to pass to the Exporter when scraping. The contents that can be set differ depending on the type of Exporter. |
| Scrape interval (optional) | Sets the scrape interval. You can set a scrape interval common to all scrape jobs and a scrape interval for each scrape job. If both are set, the per-job scrape interval takes precedence. The following units can be specified: years, weeks, days, hours, minutes, seconds, or milliseconds. |
| Scrape timeout (optional) | Sets a timeout period for scrapes that take a long time. You can set a timeout period common to all scrape jobs and a timeout period for each scrape job. If both are set, the per-job timeout period takes precedence. |
| Relabeling (optional) | Deletes unnecessary metrics and customizes labels. By using this feature to drop metrics that you do not need, you can reduce the amount of data sent to JP1/IM - Manager. |
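Putting these settings together, a scrape job entry in jpc_prometheus_server.yml could look like the following sketch in standard Prometheus syntax. The job name, target, parameter, and regular expression are illustrative assumptions:

```yaml
scrape_configs:
  - job_name: jpc_example            # appears on every metric as job="jpc_example"
    scrape_interval: 60s             # per-job interval; overrides the common interval
    scrape_timeout: 10s              # per-job timeout; overrides the common timeout
    params:
      module: [http_2xx]             # hypothetical scrape parameter passed to the Exporter
    static_configs:
      - targets:
          - agenthost:9101           # scrape destination; a host name, not "localhost"
    metric_relabel_configs:
      - source_labels: [__name__]    # relabeling: drop metrics that are not needed
        regex: unneeded_metric_.*
        action: drop
```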
An Exporter scraped by the Prometheus server returns its results in the Prometheus text-based format. The Prometheus text-based format is described below.
- Text-based format basics

| Item | Description |
|---|---|
| Start time | April 2014 |
| Supported versions | Prometheus version 0.4.0 or later |
| Transmission format | HTTP |
| Character encoding | UTF-8. The line feed code is \n. |
| Content-Type | text/plain; version=0.0.4. If the version value is missing, it is treated as the latest text format version. |
| Content-Encoding | gzip |
| Advantages | Human-readable. Easy to assemble, especially for minimal cases (no nesting required). Readable line by line (except for type hints and docstrings). |
| Limitations | Verbose. Because types and docstrings are not part of the syntax, there is little validation of the metric contract. Parsing cost. |
| Supported metric types | Counter, Gauge, Histogram, Summary, Untyped |
- More information about Text-based format
  The Prometheus text-based format is line-oriented. Lines are separated by a line feed character (\n); \r\n is considered invalid. The last line must end with a line feed character. Blank lines are ignored.
- Row format
  Within a line, tokens can be separated by any number of blanks or tabs; a token that would otherwise merge with the previous token must be separated from it by at least one blank. Leading and trailing whitespace is ignored.
- Comments, help text, and type information
  A line whose first non-whitespace character is # is a comment. Such a line is ignored unless the first token after # is HELP or TYPE, in which case it is treated as follows:
  If the token is HELP, at least one more token is expected, which is the metric name. All remaining tokens are considered the docstring for that metric name. HELP lines can contain any sequence of UTF-8 characters after the metric name, but the backslash and the line feed character must be escaped as \\ and \n, respectively. Only one HELP line can exist for any given metric name.
  If the token is TYPE, exactly two more tokens are expected. The first is the metric name, and the second (counter, gauge, histogram, summary, or untyped) defines the type of the metric. Only one TYPE line can exist for a given metric name, and it must appear before the first sample for that metric name. If no TYPE line exists for a metric name, the type is set to untyped.
  Each sample is written on its own line using the following EBNF syntax:
  metric_name [ "{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}" ] value [ timestamp ]
- Sample syntax
  - metric_name and label_name are subject to the usual restrictions of the Prometheus expression language.
  - label_value can be any sequence of UTF-8 characters, but the backslash (\), double quote ("), and line feed characters must be escaped as \\, \", and \n, respectively.
  - value is a floating-point number as parsed by the Go ParseFloat() function. In addition to standard numeric values, NaN, +Inf, and -Inf are valid values, representing not a number, positive infinity, and negative infinity, respectively.
  - timestamp is an int64 (milliseconds since the epoch, that is, 1970-01-01 00:00:00 UTC, excluding leap seconds), as parsed by the Go ParseInt() function, and is optional.
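  For example, the line `http_requests_total{method="post",code="200"} 1027 1395066363000` (taken from the sample exposition at the end of this subsection) matches this syntax: a metric name, two labels, a value, and an explicit timestamp.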
- Grouping and sorting
  All lines for a given metric must be provided as one uninterrupted group, with the optional HELP and TYPE lines first (in any order). Beyond that, reproducible sorting across repeated expositions is recommended but not required. Each line must have a unique combination of metric name and labels; otherwise, the ingestion behavior is undefined.
- Histograms and summaries
  Because the histogram and summary types are difficult to represent in the text format, the following conventions apply:
  - The sample sum for a summary or histogram named x is given as a separate sample named x_sum.
  - The sample count for a summary or histogram named x is given as a separate sample named x_count.
  - Each quantile of a summary named x is given as a separate sample line with the same name x and a label {quantile="y"}.
  - Each bucket count of a histogram named x is given as a separate sample line named x_bucket with a label {le="y"}, where y is the upper bound of the bucket.
  - A histogram must have a bucket with {le="+Inf"}. Its value must be identical to the value of x_count.
  - The buckets of a histogram and the quantiles of a summary must appear in ascending order of the value of their le or quantile label.
- Sample Text-based format
  The following is a sample Prometheus metric exposition that contains comments, HELP and TYPE lines, a histogram, a summary, and character escaping.
```
# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027 1395066363000
http_requests_total{method="post",code="400"} 3 1395066363000

# Escaping in label values:
msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9

# Minimalistic line:
metric_without_timestamp_and_labels 12.47

# A weird metric from before the epoch:
something_weird{problem="division by zero"} +Inf -3982045

# A histogram, which has a pretty complex representation in the text format:
# HELP http_request_duration_seconds A histogram of the request duration.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.05"} 24054
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.2"} 100392
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320

# Finally a summary, which has a complex representation, too:
# HELP rpc_duration_seconds A summary of the RPC duration in seconds.
# TYPE rpc_duration_seconds summary
rpc_duration_seconds{quantile="0.01"} 3102
rpc_duration_seconds{quantile="0.05"} 3272
rpc_duration_seconds{quantile="0.5"} 4773
rpc_duration_seconds{quantile="0.9"} 9001
rpc_duration_seconds{quantile="0.99"} 76656
rpc_duration_seconds_sum 1.7560473e+07
rpc_duration_seconds_count 2693
```
(b) Ability to obtain monitored operational information
This function acquires operational information (performance data) from monitoring targets. The collection of operational information is performed by a program called an "Exporter".
In response to a scrape request from the Prometheus server, the Exporter collects operational information from the monitoring target and returns the results to Prometheus.
The Exporters shipped with JP1/IM - Agent accept scrapes only from the Prometheus server of the JP1/IM - Agent on the same host. Do not scrape them from a Prometheus server on another host or one provided by the user.
This section describes the functions of each exporter included with JP1/IM - Agent.
(c) Windows exporter (Windows performance data collection capability)
Windows exporter is an exporter that is embedded in a monitored Windows host to obtain operational information of that host.
Windows exporter is installed on the same host as the Prometheus server, and upon a scrape request from the Prometheus server, it collects operational information from the Windows OS of the host and returns it to the Prometheus server.
It is possible to collect operational information related to memory and disk, which cannot be collected by monitoring from outside the host (external monitoring by URL or CloudWatch), from inside the host.
In addition, with JP1/IM - Manager and JP1/IM - Agent version 13-01 or later, you can monitor the operational status of services on the integrated agent host (Windows), that is, programs registered as Windows services (service monitoring function#).
Note that you cannot use the service monitoring function when JP1/IM - Agent runs inside a container.
- #
  If you use the service monitoring function in an environment upgraded from version 13-00 to 13-01 or later, you need to configure the settings for service monitoring. The following are the JP1/IM - Manager and JP1/IM - Agent setup instructions:
- Where to find instructions for setting up JP1/IM - Manager
  See Editing category name definition file for IM management nodes (imdd_category_name.conf) (optional) in 1.19.3(1)(d) Settings of product plugin (for Windows) in the JP1/Integrated Management 3 - Manager Configuration Guide.
- Where to find instructions for setting up JP1/IM - Agent
  See the instructions for configuring service monitoring in 1.21.2(3)(f) Configuring service monitoring (for Windows) (optional) and 1.21.2(5)(b) Modify metric to Collect (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.
This feature creates an IM management node for each service that you want to monitor. For details on displaying the tree, see 3.15.6(1)(i) Tree Format. If you configure an alert, a JP1 event is issued when a service stops and is registered with the IM management node corresponding to the stopped service. You can check the past operational status of a service from the service trend display.
■ Main items to be acquired
The main retrieval items of Windows exporter are defined in Windows exporter metric definition file (default) and Windows exporter (service monitoring) metric definition file (default). For details, see Windows exporter metric definition file (metrics_windows_exporter.conf) in Chapter 2. Definition Files and Windows exporter (service monitoring) metric definition file (metrics_windows_exporter_service.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
You can add retrieved items to the metric definition file. The following are the metrics that can be specified in the PromQL statement described in the definition file. For details of "Collector" in the table, refer to the description of "Collector" at the bottom of the table.
| Metric Name | Collector | What to Get | Label |
|---|---|---|---|
| windows_cache_copy_read_hits_total | cache | Number of copy read requests that hit the cache (cumulative) | instance: instance-identification-string, job: job-name |
| windows_cache_copy_reads_total | cache | Number of reads from the file system cache page (cumulative) | instance: instance-identification-string, job: job-name |
| windows_cpu_time_total | cpu | Number of seconds of processor time spent per mode (cumulative) | instance: instance-identification-string, job: job-name, core: core-ID, mode: mode# |
| windows_cs_physical_memory_bytes | cs | Number of bytes of physical memory capacity | instance: instance-identification-string, job: job-name |
| windows_logical_disk_idle_seconds_total | logical_disk | Number of seconds that the disk was idle (cumulative) | instance: instance-identification-string, job: job-name, volume: volume-name |
| windows_logical_disk_free_bytes | logical_disk | Number of bytes of unused disk space | instance: instance-identification-string, job: job-name, volume: volume-name |
| windows_logical_disk_read_bytes_total | logical_disk | Number of bytes transferred from disk during read operations (cumulative) | instance: instance-identification-string, job: job-name, volume: volume-name |
| windows_logical_disk_read_seconds_total | logical_disk | Number of seconds that the disk was busy with read operations (cumulative) | instance: instance-identification-string, job: job-name, volume: volume-name |
| windows_logical_disk_reads_total | logical_disk | Number of read operations on disk (cumulative) | instance: instance-identification-string, job: job-name, volume: volume-name |
| windows_logical_disk_requests_queued | logical_disk | Number of requests queued on the disk | instance: instance-identification-string, job: job-name, volume: volume-name |
| windows_logical_disk_size_bytes | logical_disk | Number of bytes of disk capacity | instance: instance-identification-string, job: job-name, volume: volume-name |
| windows_logical_disk_write_bytes_total | logical_disk | Number of bytes transferred to disk during write operations (cumulative) | instance: instance-identification-string, job: job-name, volume: volume-name |
| windows_logical_disk_write_seconds_total | logical_disk | Number of seconds that the disk was busy with write operations (cumulative) | instance: instance-identification-string, job: job-name, volume: volume-name |
| windows_logical_disk_writes_total | logical_disk | Number of write operations on disk (cumulative) | instance: instance-identification-string, job: job-name, volume: volume-name |
| windows_memory_available_bytes | memory | Number of bytes of unused physical memory | instance: instance-identification-string, job: job-name |
| windows_memory_cache_bytes | memory | Number of bytes of physical memory used for file system caching | instance: instance-identification-string, job: job-name |
| windows_memory_cache_faults_total | memory | Number of page faults in the file system cache (cumulative) | instance: instance-identification-string, job: job-name |
| windows_memory_page_faults_total | memory | Number of page faults (cumulative) | instance: instance-identification-string, job: job-name |
| windows_memory_pool_nonpaged_allocs_total | memory | Number of times a nonpageable physical memory region was allocated | instance: instance-identification-string, job: job-name |
| windows_memory_pool_paged_allocs_total | memory | Number of times a pageable physical memory region was allocated | instance: instance-identification-string, job: job-name |
| windows_memory_swap_page_operations_total | memory | Number of pages read from or written to disk to resolve hard page faults (cumulative) | instance: instance-identification-string, job: job-name |
| windows_memory_swap_pages_read_total | memory | Number of pages read from disk to resolve hard page faults (cumulative) | instance: instance-identification-string, job: job-name |
| windows_memory_swap_pages_written_total | memory | Number of pages written to disk to resolve hard page faults (cumulative) | instance: instance-identification-string, job: job-name |
| windows_memory_system_cache_resident_bytes | memory | Number of bytes of the system file cache active in physical memory | instance: instance-identification-string, job: job-name |
| windows_memory_transition_faults_total | memory | Number of page faults resolved by recovering pages that were in use by another process sharing the page, on the modified page list or standby list, or written to disk (cumulative) | instance: instance-identification-string, job: job-name |
| windows_net_bytes_received_total | net | Number of bytes received by the interface (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| windows_net_bytes_sent_total | net | Number of bytes sent from the interface (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| windows_net_bytes_total | net | Number of bytes received and sent by the interface (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| windows_net_packets_sent_total | net | Number of packets sent by the interface (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| windows_net_packets_received_total | net | Number of packets received by the interface (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| windows_system_context_switches_total | system | Number of context switches (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| windows_system_processor_queue_length | system | Number of threads in the processor queue | instance: instance-identification-string, job: job-name, device: network-device-name |
| windows_system_system_calls_total | system | Number of calls to OS service routines (cumulative) | instance: instance-identification-string, job: job-name |
| windows_process_start_time | process | Time of process start | instance: instance-identification-string, job: job-name, process: process-name#, process_id: process-ID, creating_process_id: creator-process-ID |
| windows_process_cpu_time_total | process | Elapsed time that all threads of this process used the processor to execute instructions, by mode (privileged, user). An instruction is the basic unit of execution in a computer, a thread is the object that executes instructions, and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions is included in this count. | instance: instance-identification-string, job: job-name, process: process-name#, process_id: process-ID, creating_process_id: creator-process-ID, mode: mode (privileged or user) |
| windows_process_io_bytes_total | process | Number of bytes issued in I/O operations, by mode (read, write, other). This counts all I/O activity generated by the process, including file, network, and device I/O. Read and write mode includes data operations; other mode includes operations that do not involve data, such as control operations. | instance: instance-identification-string, job: job-name, process: process-name#, process_id: process-ID, creating_process_id: creator-process-ID, mode: mode (read, write, or other) |
| windows_process_io_operations_total | process | Number of I/O operations issued, by mode (read, write, other). This counts all I/O activity generated by the process, including file, network, and device I/O. Read and write mode includes data operations; other mode includes operations that do not involve data, such as control operations. | instance: instance-identification-string, job: job-name, process: process-name#, process_id: process-ID, creating_process_id: creator-process-ID, mode: mode (read, write, or other) |
| windows_process_page_faults_total | process | Number of page faults caused by threads executing in this process. A page fault occurs when a thread refers to a virtual memory page that is not in its working set in main memory. A page fault does not cause the page to be fetched from disk if that page is on the standby list (and hence already in main memory) or in use by another process with which the page is shared. | instance: instance-identification-string, job: job-name, process: process-name#, process_id: process-ID, creating_process_id: creator-process-ID |
| windows_process_page_file_bytes | process | Current number of bytes this process has used in the paging file(s). Paging files are used to store pages of memory used by the process that are not contained in other files. Paging files are shared by all processes, and lack of space in paging files can prevent other processes from allocating memory. | instance: instance-identification-string, job: job-name, process: process-name#, process_id: process-ID, creating_process_id: creator-process-ID |
| windows_process_pool_bytes | process | Last observed number of bytes in the paged or nonpaged pool. The nonpaged pool is an area of system memory (physical memory used by the operating system) for objects that cannot be written to disk but must remain in physical memory as long as they are allocated. The paged pool is an area of system memory for objects that can be written to disk when they are not being used. Nonpaged pool bytes are calculated differently than paged pool bytes, so the value might not equal the total of paged pool bytes. | instance: instance-identification-string, job: job-name, process: process-name#, process_id: process-ID, creating_process_id: creator-process-ID, pool: paged or nonpaged |
| windows_process_priority_base | process | Current base priority of this process. Threads within a process can raise and lower their own base priority relative to the base priority of the process. | instance: instance-identification-string, job: job-name, process: process-name#, process_id: process-ID, creating_process_id: creator-process-ID |
| windows_process_private_bytes | process | Current number of bytes this process has allocated that cannot be shared with other processes. | instance: instance-identification-string, job: job-name, process: process-name#, process_id: process-ID, creating_process_id: creator-process-ID |
| windows_process_virtual_bytes | process | Current size, in bytes, of the virtual address space that the process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages. Virtual space is finite, and by using too much, the process can limit its ability to load libraries. | instance: instance-identification-string, job: job-name, process: process-name#, process_id: process-ID, creating_process_id: creator-process-ID |
| windows_service_state | service | State of the service (State) | instance: instance-identification-string, job: job-name, name: service-name#1, state: service-status#2 |
- #
  The process-name is set with ".exe" omitted.
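As an illustration of the kind of PromQL that can be written against these metrics, the following sketch computes CPU usage per instance from windows_cpu_time_total. The expression is shown inside a Prometheus-style recording-rule file; the group and record names are hypothetical, and the product's own expressions are those in the metric definition files:

```yaml
groups:
  - name: example-windows-cpu                      # hypothetical rule group
    rules:
      - record: instance:windows_cpu_busy:percent  # hypothetical recording rule
        # CPU usage (%): 1 minus the average idle fraction across cores.
        expr: 100 * (1 - avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[2m])))
```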
■ Collector
Windows exporter has a built-in collection process called a "collector" for each monitored resource such as CPU and memory.
If you want to add the metrics listed in the table above as acquisition fields, you must enable the collector corresponding to the metric you want to use. You can also disable collectors of metrics that you do not want to collect to suppress unnecessary collection.
Whether each collector is enabled or disabled is specified with the "--collectors.enabled" option on the Windows exporter command line or with the "collectors.enabled" entry in the Windows exporter configuration file (jpc_windows_exporter.yml).
For details about Windows exporter command-line options, see the description of windows_exporter command options in Service definition file (jpc_program-name.service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
For details about Windows exporter configuration file entry "collectors.enabled", see the description of item collectors in Windows exporter configuration file (jpc_windows_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ Specifying Monitored Services
When using the service monitoring function of Windows exporter, specify the services to be monitored in the "services-where" entry of the Windows exporter configuration file (jpc_windows_exporter.yml).
For details about Windows exporter configuration file entry "services-where", see the entry "services-where" in Windows exporter configuration file (jpc_windows_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
The value of the name label of the metrics output by the service collector of Windows exporter is set to the service name. Half-width uppercase characters in the monitored service name are converted to half-width lowercase characters, and full-width uppercase characters are converted to full-width lowercase characters.
- About monitoring JP1/IM - Agent services
  For the service names of the JP1/IM - Agent services, see 10.1 Service of JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Administration Guide. For the service names in a logical host environment, see 7.3.6 Newly installing JP1/IM - Agent with integrated agent host (for Windows) in the JP1/Integrated Management 3 - Manager Configuration Guide.
Note that you cannot use the service monitoring function to monitor Prometheus server and Windows exporter services.
(d) Node exporter (Linux performance data collection capability)
Node exporter is an exporter that is embedded in a monitored Linux host to obtain operational information of that host.
The Node exporter is installed on the same host as the Prometheus server, and upon a scrape request from the Prometheus server, it collects operational information from the Linux OS of the host and returns it to the Prometheus server.
It is possible to collect operational information related to memory and disk, which cannot be collected by monitoring from outside the host (external monitoring by URL or CloudWatch), from inside the host.
In addition, with JP1/IM - Manager and JP1/IM - Agent version 13-01 or later, you can monitor the operational status of services on the integrated agent host (Linux), that is, programs registered in systemd (service monitoring function#).
Note that you cannot use the service monitoring function when JP1/IM - Agent runs inside a container.
- #
  If you use the service monitoring function in an environment upgraded from version 13-00 to 13-01 or later, you need to configure the settings for service monitoring. The following are the JP1/IM - Manager and JP1/IM - Agent setup instructions:
- Where to find instructions for setting up JP1/IM - Manager
  See Editing category name definition file for IM management nodes (imdd_category_name.conf) (optional) in 1.19.3(1)(d) Settings of product plugin (for Windows) in the JP1/Integrated Management 3 - Manager Configuration Guide.
- Where to find instructions for setting up JP1/IM - Agent
  See the instructions for configuring service monitoring in 2.19.2(3)(f) Configuring service monitor settings (for Linux) (optional) and 2.19.2(5)(b) Change metric to collect (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.
This feature creates an IM management node for each service that you want to monitor. For details on displaying the tree, see 3.15.6(1)(i) Tree Format. If you configure an alert, a JP1 event is issued when a service stops and is registered with the IM management node corresponding to the stopped service. You can check the past operational status of a service from the service trend display.
■ Main items to be acquired
The main retrieval items of Node exporter are defined in the Node exporter metric definition file (default) and the Node exporter (service monitoring) metric definition file (default). For details, see Node exporter metric definition file (metrics_node_exporter.conf) and Node exporter (service monitoring) metric definition file (metrics_node_exporter_service.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
You can add retrieved items to the metric definition file. The following are the metrics that can be specified in the PromQL statement described in the definition file. For details of "Collector" in the table, refer to the description of "Collector" at the bottom of the table.
| Metric Name | Collector | What to Get | Label |
|---|---|---|---|
| node_boot_time_seconds | stat | Last boot time | instance: instance-identification-string, job: job-name |
| node_context_switches_total | stat | Number of context switches (cumulative) | instance: instance-identification-string, job: job-name |
| node_cpu_seconds_total | cpu | Number of seconds of CPU time spent in each mode (cumulative) | instance: instance-identification-string, job: job-name, cpu: cpu-ID, mode: mode# |
| node_disk_io_now | diskstats | Number of disk I/Os currently in progress | instance: instance-identification-string, job: job-name, device: device-name |
| node_disk_io_time_seconds_total | diskstats | Number of seconds spent on disk I/O (cumulative) | instance: instance-identification-string, job: job-name, device: device-name |
| node_disk_read_bytes_total | diskstats | Number of bytes successfully read from disk (cumulative) | instance: instance-identification-string, job: job-name, device: device-name |
| node_disk_read_time_seconds_total | diskstats | Number of seconds taken to read from disk (cumulative) | instance: instance-identification-string, job: job-name, device: device-name |
| node_disk_reads_completed_total | diskstats | Number of successfully completed reads from disk (cumulative) | instance: instance-identification-string, job: job-name, device: device-name |
| node_disk_write_time_seconds_total | diskstats | Number of seconds taken to write to disk (cumulative) | instance: instance-identification-string, job: job-name, device: device-name |
| node_disk_writes_completed_total | diskstats | Number of successfully completed writes to disk (cumulative) | instance: instance-identification-string, job: job-name, device: device-name |
| node_disk_written_bytes_total | diskstats | Number of bytes successfully written to disk (cumulative) | instance: instance-identification-string, job: job-name, device: device-name |
| node_filesystem_avail_bytes | filesystem | Number of file system bytes available to non-root users | instance: instance-identification-string, job: job-name, fstype: file-system-type, mountpoint: mount-point |
| node_filesystem_files | filesystem | Number of file nodes in the file system | instance: instance-identification-string, job: job-name, fstype: file-system-type, mountpoint: mount-point |
| node_filesystem_files_free | filesystem | Number of free file nodes in the file system | instance: instance-identification-string, job: job-name, fstype: file-system-type, mountpoint: mount-point |
| node_filesystem_free_bytes | filesystem | Number of bytes of free file system space | instance: instance-identification-string, job: job-name, fstype: file-system-type, mountpoint: mount-point |
| node_filesystem_size_bytes | filesystem | Number of bytes of file system capacity | instance: instance-identification-string, job: job-name, fstype: file-system-type, mountpoint: mount-point |
| node_intr_total | stat | Number of interrupts handled (cumulative) | instance: instance-identification-string, job: job-name |
| node_load1 | loadavg | One-minute average of the number of jobs in the run queue | instance: instance-identification-string, job: job-name |
| node_load15 | loadavg | 15-minute average of the number of jobs in the run queue | instance: instance-identification-string, job: job-name |
| node_load5 | loadavg | 5-minute average of the number of jobs in the run queue | instance: instance-identification-string, job: job-name |
| node_memory_Active_file_bytes | meminfo | Number of bytes of recently used file cache memory | instance: instance-identification-string, job: job-name |
| node_memory_Buffers_bytes | meminfo | Number of bytes in the file buffer | instance: instance-identification-string, job: job-name |
| node_memory_Cached_bytes | meminfo | Number of bytes in the file read cache memory | instance: instance-identification-string, job: job-name |
| node_memory_Inactive_file_bytes | meminfo | Number of bytes of file cache memory not used recently | instance: instance-identification-string, job: job-name |
| node_memory_MemAvailable_bytes | meminfo | Number of bytes of memory available to start a new application without swapping | instance: instance-identification-string, job: job-name |
| node_memory_MemFree_bytes | meminfo | Number of bytes of free memory | instance: instance-identification-string, job: job-name |
| node_memory_MemTotal_bytes | meminfo | Total number of bytes of memory | instance: instance-identification-string, job: job-name |
| node_memory_SReclaimable_bytes | meminfo | Number of bytes in the slab cache that can be reclaimed | instance: instance-identification-string, job: job-name |
| node_memory_SwapFree_bytes | meminfo | Number of bytes of free swap space | instance: instance-identification-string, job: job-name |
| node_memory_SwapTotal_bytes | meminfo | Total number of bytes of swap memory | instance: instance-identification-string, job: job-name |
| node_netstat_Icmp6_InMsgs | netstat | Number of ICMPv6 messages received (cumulative) | instance: instance-identification-string, job: job-name |
| node_netstat_Icmp_InMsgs | netstat | Number of ICMPv4 messages received (cumulative) | instance: instance-identification-string, job: job-name |
| node_netstat_Icmp6_OutMsgs | netstat | Number of ICMPv6 messages sent (cumulative) | instance: instance-identification-string, job: job-name |
| node_netstat_Icmp_OutMsgs | netstat | Number of ICMPv4 messages sent (cumulative) | instance: instance-identification-string, job: job-name |
| node_netstat_Tcp_InSegs | netstat | Number of TCP packets received (cumulative) | instance: instance-identification-string, job: job-name |
| node_netstat_Tcp_OutSegs | netstat | Number of TCP packets sent (cumulative) | instance: instance-identification-string, job: job-name |
| node_netstat_Udp_InDatagrams | netstat | Number of UDP packets received (cumulative) | instance: instance-identification-string, job: job-name |
| node_netstat_Udp_OutDatagrams | netstat | Number of UDP packets sent (cumulative) | instance: instance-identification-string, job: job-name |
| node_network_flags | netclass | Numeric value indicating the state of the interface | instance: instance-identification-string, job: job-name, device: network-device-name |
| node_network_iface_link | netclass | Interface serial number | instance: instance-identification-string, job: job-name, device: network-device-name |
| node_network_mtu_bytes | netclass | MTU value of the interface | instance: instance-identification-string, job: job-name, device: network-device-name |
| node_network_receive_bytes_total | netdev | Number of bytes received by the network device (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| node_network_receive_errs_total | netdev | Number of receive errors on the network device (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| node_network_receive_packets_total | netdev | Number of packets received by the network device (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| node_network_transmit_bytes_total | netdev | Number of bytes sent by the network device (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| node_network_transmit_colls_total | netdev | Number of transmit collisions on the network device (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| node_network_transmit_errs_total | netdev | Number of transmit errors on the network device (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| node_network_transmit_packets_total | netdev | Number of packets sent by the network device (cumulative) | instance: instance-identification-string, job: job-name, device: network-device-name |
| node_time_seconds | time | System time in seconds since the epoch (1970) | instance: instance-identification-string, job: job-name |
| node_uname_info | uname | System information obtained by the uname system call | instance: instance-identification-string, job: job-name, domainname: NIS-and-YP-domain-names, machine: hardware-identifiers, nodename: machine-name-in-some-network-defined-at-implementation-time, release: operating-system-release-number (e.g. "2.6.28"), sysname: the-name-of-the-OS (e.g. "Linux"), version: operating-system-version |
| node_vmstat_pswpin | vmstat | Number of page swap-ins (cumulative) | instance: instance-identification-string, job: job-name |
| node_vmstat_pswpout | vmstat | Number of page swap-outs (cumulative) | instance: instance-identification-string, job: job-name |
| node_systemd_unit_state | systemd | State of the systemd unit | instance: instance-identification-string, job: job-name, name: unit-file-name, state: service-status#1, type: how-to-launch-a-process#2 |
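As an illustration of the kind of PromQL that can be written against these metrics, the following sketch derives physical memory usage from the meminfo metrics above. The expression is shown inside a Prometheus-style recording-rule file; the group and record names are hypothetical, and the product's own expressions are those in the metric definition files:

```yaml
groups:
  - name: example-node-memory                     # hypothetical rule group
    rules:
      - record: instance:node_memory_used:percent # hypothetical recording rule
        # Physical memory usage (%): share of memory not available to new applications.
        expr: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
```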
■ Collector
The Node exporter has a built-in collection process called a "collector" for each monitored resource such as CPU and memory.
If you want to add the metrics listed in the table above as acquisition fields, you must enable the collector corresponding to the metric you want to use. You can also disable collectors of metrics that you do not want to collect to suppress unnecessary collection.
Whether each collector is enabled or disabled is specified in the Node exporter command-line options: specify a collector to enable with the "--collector.collector-name" option and a collector to disable with the "--no-collector.collector-name" option.
For details about Node exporter command-line options, see the description of node_exporter command options in Unit definition file (jpc_program-name.service) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ Specifying monitored services
When using the service monitoring function of Node exporter, specify the services to be monitored in the "--collector.systemd.unit-include" item of the Node exporter unit definition file (jpc_node_exporter.service). Performance data is collected for a service specified in this file that meets one of the following conditions:
- Automatic start of the monitored service is enabled (systemctl enable has been run)
- Automatic start of the monitored service is disabled, but its status is active
Performance data for a service with automatic start disabled is not collected while the service is stopped. Therefore, to monitor a service that has automatic start disabled and is currently stopped, start the service and collect its performance data before creating the IM management node tree.
For details on the unit definition file, see the description of the "--collector.systemd.unit-include" item under node_exporter command options in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
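For example (an illustrative value, not a product default), specifying `--collector.systemd.unit-include="(jpc_.*|sshd)\.service"` in the unit definition file limits collection to unit files whose names match the regular expression, such as the JP1/IM - Agent services and sshd.service.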
- About monitoring JP1/IM - Agent services
  For the unit definition file names of the JP1/IM - Agent services, see 10.1 Service of JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Administration Guide. For the unit definition file names in a logical host environment, see 8.3.6 Newly installing JP1/IM - Agent with integrated agent host (for UNIX) in the JP1/Integrated Management 3 - Manager Configuration Guide.
  Note that you cannot use the service monitoring function to monitor the Prometheus server and Node exporter services.
(e) Process exporter (Linux process data collection capability)
Process exporter, built into a monitored Linux host, collects operating information of processes running on that host.
Installed on the same host as the Prometheus server, Process exporter collects operational information about processes from the Linux OS on the host when triggered by scrape requests from the Prometheus server, and returns it to the server.
Process exporter allows you to collect process-related operational information from inside the host, which cannot be obtained through monitoring from outside the host (external monitoring by URL or CloudWatch).
■ Main items to be acquired
The main retrieval items of Process exporter are defined in the Process exporter metric definition file (default). For details, see Process exporter metric definition file (metrics_process_exporter.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
You can add retrieved items to the metric definition file. The following are the metrics that can be specified in the PromQL statements described in the definition file.
| Metric Name | What to Get | Label |
|---|---|---|
| namedprocess_namegroup_num_procs | Number of processes in this group | instance: instance-identifier-string, job: job-name, groupname: group-name# |
| namedprocess_namegroup_cpu_seconds_total | CPU usage based on the /proc/[pid]/stat fields utime(14) and stime(15), that is, user and system time | instance: instance-identifier-string, job: job-name, groupname: group-name#, mode: user or system |
| namedprocess_namegroup_read_bytes_total | Number of bytes read, based on the /proc/[pid]/io field read_bytes. Because the kernel makes /proc/[pid]/io readable only by the process's user, process-exporter must run as that user or as root to obtain these values; otherwise they cannot be read and the metric is constantly 0. | instance: instance-identifier-string, job: job-name, groupname: group-name# |
| namedprocess_namegroup_write_bytes_total | Number of bytes written, based on the /proc/[pid]/io field write_bytes | instance: instance-identifier-string, job: job-name, groupname: group-name# |
| namedprocess_namegroup_major_page_faults_total | Number of major page faults, based on the /proc/[pid]/stat field majflt(12) | instance: instance-identifier-string, job: job-name, groupname: group-name# |
| namedprocess_namegroup_minor_page_faults_total | Number of minor page faults, based on the /proc/[pid]/stat field minflt(10) | instance: instance-identifier-string, job: job-name, groupname: group-name# |
| namedprocess_namegroup_context_switches_total | Number of context switches, based on the /proc/[pid]/status fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. The extra label ctxswitchtype can have two values: voluntary and nonvoluntary. | instance: instance-identifier-string, job: job-name, groupname: group-name#, ctxswitchtype: voluntary or nonvoluntary |
| namedprocess_namegroup_memory_bytes | Number of bytes of memory used. The extra label memtype can have three values: resident, virtual, and swapped. If gathering of the smaps file is enabled, two additional memtype values are added: proportionalResident (sum of the Pss fields from /proc/[pid]/smaps) and proportionalSwapped (sum of the SwapPss fields from /proc/[pid]/smaps). | instance: instance-identifier-string, job: job-name, groupname: group-name#, memtype: resident, virtual, swapped, proportionalResident, or proportionalSwapped |
| namedprocess_namegroup_open_filedesc | Number of file descriptors, based on counting the entries in the /proc/[pid]/fd directory | instance: instance-identifier-string, job: job-name, groupname: group-name# |
| namedprocess_namegroup_worst_fd_ratio | Worst ratio of open file descriptors to the file descriptor limit among all the processes in the group. The limit is the soft fd limit based on /proc/[pid]/limits. | instance: instance-identifier-string, job: job-name, groupname: group-name# |
| namedprocess_namegroup_oldest_start_time_seconds | Epoch time (seconds since 1970-01-01) at which the oldest process in the group started. Derived from the field starttime(22) of /proc/[pid]/stat, added to the boot time to make it relative to the epoch. | instance: instance-identifier-string, job: job-name, groupname: group-name# |
| namedprocess_namegroup_num_threads | Sum of the number of threads of all processes in the group, based on the field num_threads(20) of /proc/[pid]/stat | instance: instance-identifier-string, job: job-name, groupname: group-name# |
| namedprocess_namegroup_states | Number of threads in the group in each state, based on the field state(3) of /proc/[pid]/stat. The extra label state can have the values Running, Sleeping, Waiting, Zombie, and Other. | instance: instance-identifier-string, job: job-name, groupname: group-name#, state: Running, Sleeping, Waiting, Zombie, or Other |
| namedprocess_namegroup_thread_count | Number of threads in this thread subgroup | instance: instance-identifier-string, job: job-name, groupname: group-name#, threadname: thread-name |
| namedprocess_namegroup_thread_cpu_seconds_total | Same as cpu_user_seconds_total and cpu_system_seconds_total, but broken down per thread subgroup | instance: instance-identifier-string, job: job-name, groupname: group-name#, threadname: thread-name, mode: user or system |
| namedprocess_namegroup_thread_io_bytes_total | Same as read_bytes_total and write_bytes_total, but broken down per thread subgroup. Unlike read_bytes_total/write_bytes_total, the label iomode is used to distinguish between read and write bytes. | instance: instance-identifier-string, job: job-name, groupname: group-name#, threadname: thread-name, iomode: read or write |
| namedprocess_namegroup_thread_major_page_faults_total | Same as major_page_faults_total, but broken down per thread subgroup | instance: instance-identifier-string, job: job-name, groupname: group-name# |
| namedprocess_namegroup_thread_minor_page_faults_total | Same as minor_page_faults_total, but broken down per thread subgroup | instance: instance-identifier-string, job: job-name, groupname: group-name# |
| namedprocess_namegroup_thread_context_switches_total | Same as context_switches_total, but broken down per thread subgroup | instance: instance-identifier-string, job: job-name, groupname: group-name# |
- #
  The group-name is a name that uniquely identifies the collected performance values. Its value is stored according to what the user sets in the "name" item of the Process exporter configuration file (jpc_process_exporter.yml).
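The following sketch shows what such a grouping definition might look like in jpc_process_exporter.yml, using the process-exporter style of name templates and command-line matchers; the matcher value is hypothetical, and the authoritative layout is given in the Definition Files reference:

```yaml
process_names:
  # groupname is taken from the "name" template; {{.Comm}} groups by executable name.
  - name: "{{.Comm}}"
    cmdline:
      - "sshd"   # hypothetical: match processes whose command line matches this regexp
```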
- Important
  - Processes whose names contain multi-byte characters cannot be monitored.
  - Process exporter continues to output information about processes it has collected once, even after those processes stop running. Therefore, if Process exporter is configured to collect information based on PIDs, new time-series data is added every time a process is restarted and its PID changes, resulting in a large amount of unnecessary data. Furthermore, the use of PIDs is not recommended in the open source software, and version 13-00 is therefore configured not to collect PID information by default (groupname). If you want to manage processes that share the same command line separately, we recommend operational measures such as changing the order of arguments or using PIDs (in the latter case, periodic restarts are needed to prevent collected information from accumulating indefinitely). Note that the information collected by Windows exporter differs from what Process exporter collects, because Windows exporter collects PID information.
  - When Process exporter monitors a process, by default it also monitors the child processes of that process and acquires operational data that includes them. To exclude child processes, edit the unit definition file of Process exporter. For details, see 2.19.2(6)(d) Setting that excludes child processes from monitoring in the JP1/Integrated Management 3 - Manager Configuration Guide.
(f) Node exporter for AIX (AIX performance data collection capability)
Node exporter for AIX is an Exporter that is embedded in a monitored AIX host to obtain operational information of that host.
Node exporter for AIX is installed on a different host from the Prometheus server. When it receives a scrape request from the Prometheus server, it collects operational information from the AIX OS of its own host and returns it to the Prometheus server.
It can collect operational information related to memory and disks from inside the host, which cannot be collected by monitoring from outside the host (external monitoring by URL or CloudWatch).
■ Prerequisites
The ports used by Node exporter for AIX must be protected by firewalls, network configuration, and so on, so that they cannot be accessed by anything other than the Prometheus server of JP1/IM - Agent.
For the ports used by Node exporter for AIX, see the explanation of node_exporter_aix command options in 10.4.2(1) Enabling registering services in the JP1/Integrated Management 3 - Manager Administration Guide.
■ Conditions to be monitored
For the supported OS versions of the host on which Node exporter for AIX is installed, see the Release Notes.
WPAR is not supported.
Starting multiple instances of Node exporter for AIX on the same host is not supported, even when they are started on separate physical and logical hosts.
A logical host configuration of the monitored AIX host is supported only if the following condition is met:
- The host name of the monitored AIX host can be uniquely resolved from Prometheus.
  Note: If more than one IP address is assigned to the monitored AIX host, Node exporter for AIX accepts access at all of those IP addresses.
For the upper limit on the number of Node exporter for AIX instances that one Prometheus server can monitor, see the list of limits for JP1/IM - Agent in Appendix D.1 Limits when using the Intelligent Integrated Management Base.
■ Main items to be acquired
The main retrieval items of Node exporter for AIX shipped with JP1/IM - Agent are defined in the Node exporter for AIX metric definition file (default). For details, see Node exporter for AIX metric definition file (metrics_node_exporter_aix.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
You can add retrieved items to the metric definition file. The following table lists the metrics that can be specified in PromQL expressions in the definition file:
| Metric Name | Command-Line Option for Retrieval | What to Get | Label | Data Source |
|---|---|---|---|---|
| node_context_switches | -C | Total number of context switches (cumulative) | cpupool_id: physical-processor-shared-pooling-ID, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID | Obtained by the perfstat_cpu_total function (pswitch of the perfstat_cpu_t structure) |
| node_cpu | -C | Number of seconds the CPUs spent in each mode (cumulative) | instance: instance-identity-string, job: job-name, cpu: cpu-ID, mode: mode (idle, sys, user, or wait) | Obtained by the perfstat_cpu function (perfstat_cpu_t structure) |
| aix_diskpath_wblks | -D | Number of blocks written via the path | cpupool_id: physical-processor-shared-pooling-ID, diskpath: disk-path-name, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID | Obtained by the perfstat_diskpath function (wblks of the perfstat_diskpath_t structure) |
| aix_diskpath_rblks | -D | Number of blocks read via the path | cpupool_id: physical-processor-shared-pooling-ID, diskpath: disk-path-name, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID | Obtained by the perfstat_diskpath function (rblks of the perfstat_diskpath_t structure) |
| aix_disk_rserv | -d | Read or receive service time | cpupool_id: physical-processor-shared-pooling-ID, disk: disk-name, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, vgname: volume-group-name | Obtained by the perfstat_disk function (rserv of the perfstat_disk_t structure) |
| aix_disk_rblks | -d | Number of blocks read from disk | cpupool_id: physical-processor-shared-pooling-ID, disk: disk-name, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, vgname: volume-group-name | Obtained by the perfstat_disk function (rblks of the perfstat_disk_t structure) |
| aix_disk_wserv | -d | Write or send service time | cpupool_id: physical-processor-shared-pooling-ID, disk: disk-name, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, vgname: volume-group-name | Obtained by the perfstat_disk function (wserv of the perfstat_disk_t structure) |
| aix_disk_wblks | -d | Number of blocks written to disk | cpupool_id: physical-processor-shared-pooling-ID, disk: disk-name, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, vgname: volume-group-name | Obtained by the perfstat_disk function (wblks of the perfstat_disk_t structure) |
| aix_disk_time | -d | Amount of time the disk is active | cpupool_id: physical-processor-shared-pooling-ID, disk: disk-name, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, vgname: volume-group-name | Obtained by the perfstat_disk function (time of the perfstat_disk_t structure) |
| aix_disk_xrate | -d | Number of transfers from disk | cpupool_id: physical-processor-shared-pooling-ID, disk: disk-name, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, vgname: volume-group-name | Obtained by the perfstat_disk function (xrate of the perfstat_disk_t structure) |
| aix_disk_xfers | -d | Number of transfers to and from disk | cpupool_id: physical-processor-shared-pooling-ID, disk: disk-name, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, vgname: volume-group-name | Obtained by the perfstat_disk function (xfers of the perfstat_disk_t structure) |
| node_filesystem_avail_bytes | -f | Number of file system bytes available to non-root users | cpupool_id: physical-processor-shared-pooling-ID, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, device: device-name, fstype: file-system-type, mountpoint: mount-point | Obtained by the stat_filesystems function (avail_bytes of the filesystem structure) |
| node_filesystem_files | -f | Total number of file nodes in the file system | cpupool_id: physical-processor-shared-pooling-ID, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, device: device-name, fstype: file-system-type, mountpoint: mount-point | Obtained by the stat_filesystems function (files of the filesystem structure) |
| node_filesystem_files_free | -f | Total number of free file nodes in the file system | cpupool_id: physical-processor-shared-pooling-ID, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, device: device-name, fstype: file-system-type, mountpoint: mount-point | Obtained by the stat_filesystems function (files_free of the filesystem structure) |
| node_filesystem_free_bytes | -f | Number of bytes of free file system space | cpupool_id: physical-processor-shared-pooling-ID, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, device: device-name, fstype: file-system-type, mountpoint: mount-point | Obtained by the stat_filesystems function (free_bytes of the filesystem structure) |
| node_filesystem_size_bytes | -f | Number of bytes of file system capacity | cpupool_id: physical-processor-shared-pooling-ID, group_id: group-ID, instance: instance-identity-string, job: job-name, lpar: partition-name, machine_serial: machine-ID, device: device-name, fstype: file-system-type, mountpoint: mount-point | |
Get by stat_filesystems func size_bytes of filesystem structure |
|
node_intr |
-C |
Total number of interrupts serviced. |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID |
Get by perfstat_cpu_total func decrintrs of perfstat_cpu_total_t structure mpcsintrs of perfstat_cpu_total_t structure devintrs of perfstat_cpu_total_t structure softintrs of perfstat_cpu_total_t structure |
|
node_load1 |
-C |
1m load average. |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID |
Get by perfstat_cpu_total func loadavg[0] of perfstat_cpu_total_t structure |
|
node_load5 |
-C |
5m load average. |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID |
Get by perfstat_cpu_total func loadavg[1] of perfstat_cpu_total_t structure |
|
node_load15 |
-C |
15m load average. |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID |
Get by perfstat_cpu_total func loadavg[2] of perfstat_cpu_total_t structure |
|
aix_memory_real_avail |
-m |
Number of pages (in 4KB pages) of memory available without paging out working segments |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID |
Get by perfstat_memory_total func real_avail of perfstat_memory_total_t structure |
|
aix_memory_real_free |
-m |
Free real memory (in 4 KB pages). |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID |
Get by perfstat_memory_total func real_free of perfstat_memory_total_t structures |
|
aix_memory_real_inuse |
-m |
Real memory which is in use (in 4KB pages) |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID |
Get by perfstat_memory_total func real_inuse of perfstat_memory_total_t structures |
|
aix_memory_real_total |
-m |
Total real memory (in 4 KB pages). |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID |
Get by perfstat_memory_total func real_total of perfstat_memory_total_t structure |
|
aix_netinterface_mtu |
-i |
Network frame size |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID netinterface=net-interface-name |
Get by perfstat_netinterface func mtu of perfstat_netinterface_t structure |
|
aix_netinterface_ibytes |
-i |
Number of bytes received on interface |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID netinterface=net-interface-name |
Get by perfstat_netinterface func ibytes of perfstat_netinterface_t structure |
|
aix_netinterface_ierrors |
-i |
Number of input errors on interface |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID netinterface=net-interface-name |
Get by perfstat_netinterface func ierrors of perfstat_netinterface_t structure |
|
aix_netinterface_ipackets |
-i |
Number of packets received on interface |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID netinterface=net-interface-name |
Get by perfstat_netinterface func ipackets of perfstat_netinterface_t structure |
|
aix_netinterface_obytes |
-i |
Number of bytes sent on interface |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID netinterface=net-interface-name |
Get by perfstat_netinterface func obytes of perfstat_netinterface_t structure |
|
aix_netinterface_collisions |
-i |
Number of collisions on csma interface |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID netinterface=net-interface-name |
Get by perfstat_netinterface func collisions of perfstat_netinterface_t structure |
|
aix_netinterface_oerrors |
-i |
Number of output errors on interface |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID netinterface=net-interface-name |
Get by perfstat_netinterface func oerrors of perfstat_netinterface_t structure |
|
aix_netinterface_opackets |
-i |
Number of packets sent on interface |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID netinterface=net-interface-name |
Get by perfstat_netinterface func opackets of perfstat_netinterface_t structure |
|
aix_memory_pgspins |
-m |
Number of page ins from paging space |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID |
Get by perfstat_memory_total func pgspins of perfstat_memory_total_t structure |
|
aix_memory_pgspouts |
-m |
Number of pages paged out from paging space |
cpupool_id=physical-processor-shared-pooling-ID group_id=group-ID instance: instance-identity-string job: job-name lpar=partition-name machine_serial=machine-ID |
Get by perfstat_memory_total func pgspouts of perfstat_memory_total_t structure |
Node exporter for AIX collects data for each monitored resource, such as CPU and memory. You can enable or disable collection for each resource that you want to monitor by using the Node exporter for AIX command-line options, as sketched below.
For Node exporter for AIX command-line options, see the description of node_exporter_aix command options in 10.4.2(1) Enabling registering services in the JP1/Integrated Management 3 - Manager Administration Guide.
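For example, assuming the per-resource options listed in the table above, a hypothetical invocation that collects only CPU (-C), memory (-m), and disk (-d) data might look as follows (check the node_exporter_aix command options in the Administration Guide for the actual syntax):
node_exporter_aix -C -m -d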
Use Script exporter to collect information about processes. For details on how to configure the settings, see 1.23.2(4)(e) Monitoring processes on monitoring hosts (AIX) (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.
Use the JP1/Base log file trapping function to monitor the log files of the monitored AIX hosts.
■ Notes on logging Node exporter for AIX
The Node exporter for AIX log is output to the OS system log, so the output destination depends on the OS system log settings. For details on changing the output destination of the system log to which Node exporter for AIX logs, see 1.23.2(4)(f) Changing the log destination of Node exporter for AIX (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.
■ Precautions When Using SMT or Micro-Partitioning
In an SMT (simultaneous multithreading) or Micro-Partitioning environment, the calculation of the CPU utilization metric (cpu_used_rate) by Node exporter for AIX does not include physical CPU quotas, whereas the CPU utilization displayed by the sar command does include them.
Therefore, the CPU utilization (cpu_used_rate) of Node exporter for AIX might show a lower value than the sar command output.
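For reference, a host-level CPU utilization figure of this kind can be sketched in PromQL from the node_cpu metric listed in the table above (a sketch only, not including physical CPU quotas; the expressions in the shipped metric definition file are authoritative):
100 * (1 - sum by (instance) (rate(node_cpu{mode="idle"}[2m])) / sum by (instance) (rate(node_cpu[2m])))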
(g) Yet another cloudwatch exporter (Amazon CloudWatch performance data collection capability)
Yet another cloudwatch exporter is an exporter included in the integrated agent that collects operating information of AWS services in the cloud through Amazon CloudWatch.
Yet another cloudwatch exporter is installed on the same host as the Prometheus server. Upon a scrape request from the Prometheus server, it collects CloudWatch metrics through the SDK provided by AWS (AWS SDK)# and returns them to the Prometheus server.
- #
-
The SDK provided by Amazon Web Services (AWS). Yet another cloudwatch exporter uses the Go language version, AWS SDK for Go (V1). CloudWatch monitoring requires that Amazon CloudWatch support the AWS SDK for Go (V1).
You can monitor services on which Node exporter or Windows exporter cannot be installed.
- Restrictions
-
To monitor with Yet another cloudwatch exporter (Amazon CloudWatch performance data collection capability), you must be able to connect to the AWS Security Token Service (STS) global endpoint. You cannot use regional endpoints with the Yet another cloudwatch exporter shipped with JP1/IM - Agent.
■ Main items to be acquired
The main retrieval items of Yet another cloudwatch exporter are defined in Yet another cloudwatch exporter metric definition file (default). For details, see Yet another cloudwatch exporter metric definition file (metrics_ya_cloudwatch_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ CloudWatch metrics you can collect
You can collect metrics in the AWS namespaces that are supported for monitoring by the Yet another cloudwatch exporter of JP1/IM - Agent, as listed in 3.15.6(1)(k) Creating an IM Management Node for Yet another cloudwatch exporter.
Specify the metrics to collect by describing the AWS service name and CloudWatch metric name in the Yet another cloudwatch exporter configuration file (jpc_ya_cloudwatch_exporter.yml).
The following is an example of the configuration file for collecting the CPUUtilization and DiskReadBytes CloudWatch metrics of the AWS/EC2 service.
discovery:
  exportedTagsOnMetrics:
    ec2:
      - jp1_pc_nodelabel
  jobs:
    - type: ec2
      regions:
        - ap-northeast-1
      period: 60
      length: 300
      delay: 60
      nilToZero: true
      searchTags:
        - key: jp1_pc_nodelabel
          value: .*
      metrics:
        - name: CPUUtilization
          statistics:
            - Maximum
        - name: DiskReadBytes
          statistics:
            - Maximum
For details about the contents of the Yet another cloudwatch exporter configuration file, see Yet another cloudwatch exporter configuration file (jpc_ya_cloudwatch_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
You can also add new metrics to the Yet another cloudwatch exporter metric definition file by using the metrics you set in the Yet another cloudwatch exporter configuration file.
The metrics and labels specified in the PromQL statement described in the definition file conform to the following naming conventions:
- - Naming conventions for Exporter metrics
-
Yet another cloudwatch exporter automatically converts CloudWatch metric names into Exporter metric names according to the following rules. Metrics specified in PromQL statements must therefore be written using the Exporter metric name.
"aws_"#1+name-space#2+"_"+CloudWatch-metric#2+"_"+statistic-type#2
- #1
-
Appended if the namespace does not begin with "aws_".
- #2
-
Indicates the names you set in the Yet another cloudwatch exporter configuration file (jpc_ya_cloudwatch_exporter.yml). They are converted by the following rules:
-
It is converted from camel case notation to snake case notation.
Camel case is a notation that capitalizes word breaks, such as "CamelCase" or "camelCase".
Snake case is a notation that separates words with "_", such as "snake_case".
-
The following symbols are converted to "_".
whitespace, comma, tab, /, \, half-width period, -, :, =, full-width left double quote, @, <, >
-
"%" is converted to "_percent".
-
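For example, under these rules, collecting the CPUUtilization metric of the AWS/EC2 namespace with the Maximum statistic (as in the configuration example shown earlier) yields the following Exporter metric name:
aws_ec2_cpuutilization_maximum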
- - Exporter label naming conventions
-
Yet another cloudwatch exporter automatically converts CloudWatch dimension and tag names into Exporter label names according to the following rules. Labels specified in PromQL statements must therefore be written using the Exporter label name.
-
For dimensions
"dimension"+"_"+dimensions_name#
-
For tags
"tag"+"_"+tag_name#
-
For custom tags
"custom_tag_"+"_"+custom tag_name#
- #
-
Indicates the name you set in the Yet another cloudwatch exporter configuration file (jpc_ya_cloudwatch_exporter.yml).
-
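For example, under these rules, the EC2 dimension InstanceId becomes the label dimension_InstanceId, and the jp1_pc_nodelabel search tag used in the configuration example shown earlier becomes tag_jp1_pc_nodelabel. A PromQL selector combining the converted names might therefore look as follows (the instance ID is an illustrative value):
aws_ec2_cpuutilization_maximum{dimension_InstanceId="i-0123456789abcdef0"}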
■ About policies for IAM users in your AWS account
To connect to AWS CloudWatch, you must create a policy with the following permissions and attach it to an IAM user.
"tag:GetResources", "cloudwatch:GetMetricData", "cloudwatch:GetMetricStatistics", "cloudwatch:ListMetrics"
For details on how to set the JSON format information, see 1.21.2(7)(b) Modify Setup to connect to CloudWatch (for Windows) (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide. (The reference for Linux is the same.)
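The following is a minimal sketch of such a policy in JSON format (the Sid value is illustrative):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "JP1IMAgentCloudWatchRead",
      "Effect": "Allow",
      "Action": [
        "tag:GetResources",
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}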
■ Environment-variable HTTPS_PROXY
This environment variable is specified when connecting to CloudWatch from Yet another cloudwatch exporter through a proxy. Only an http URL can be set in the environment variable HTTPS_PROXY, and Basic authentication is the only supported authentication method.
You can set the environment variable HTTPS_PROXY to connect to AWS CloudWatch through a proxy. The following shows an example configuration.
HTTPS_PROXY=http://username:password@proxy.example.com:5678
■ How to handle monitoring targets JP1/IM - Agent does not support
If a product or metric cannot be monitored by JP1/IM - Agent, you must retrieve it by other means, for example, by using a user-defined Exporter.
(h) Promitor (Azure Monitor performance data collection capability)
Promitor, included in the integrated agent, collects operating information of Azure services on the cloud environment through Azure Monitor and Azure Resource Graph.
Promitor consists of Promitor Scraper and Promitor Resource Discovery. Promitor Scraper collects metrics on resources from Azure Monitor according to schedule settings and returns them.
Metrics can be collected from target resources in two ways: by specifying the target resources individually in a configuration file, or by detecting the resources automatically. If you choose automatic detection, Promitor Resource Discovery detects resources in a tenant through Azure Resource Graph, and based on the results, Promitor Scraper collects the metric information.
In addition, Promitor Scraper and Promitor Resource Discovery each require two configuration files: one defines runtime settings, such as authentication information, and the other defines the metric information to be collected.
■ Key metric items
The key Promitor metric items are defined in the Promitor metric definition file (initial status). For details, see the description under Promitor metric definition file (metrics_promitor.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ Metrics you can collect
Promitor can collect metrics for the services listed in the following table.
You specify the metrics you want to collect in the Promitor Scraper configuration file (metrics-declaration.yaml). A configuration sketch follows the table below.
If you want to change the metrics specified in the Promitor Scraper configuration file, see Change monitoring metrics (optional) under (d) Configuring scraping targets (required) in 1.21.2(8) Set up of Promitor in the JP1/Integrated Management 3 - Manager Configuration Guide.
You can also add new metrics to the Promitor metric definition file, based on the metrics specified in the Promitor Scraper configuration file. Metrics defined in the Promitor Scraper configuration file can be specified in the PromQL statements written in the definition file.
|
Promitor resourceType name |
Azure Monitor namespace |
Automatic discovery support |
|---|---|---|
|
VirtualMachine |
Microsoft.Compute/virtualMachines |
Y |
|
FunctionApp |
Microsoft.Web/sites |
Y |
|
ContainerInstance |
Microsoft.ContainerInstance/containerGroups |
-- |
|
KubernetesService |
Microsoft.ContainerService/managedClusters |
Y |
|
FileStorage |
Microsoft.Storage/storageAccounts/fileServices |
-- |
|
BlobStorage |
Microsoft.Storage/storageAccounts/blobServices |
-- |
|
ServiceBusNamespace |
Microsoft.ServiceBus/namespaces |
Y |
|
CosmosDb |
Microsoft.DocumentDB/databaseAccounts |
Y |
|
SqlDatabase |
Microsoft.Sql/servers/databases |
Y |
|
SqlServer |
Microsoft.Sql/servers/databases Microsoft.Sql/servers/elasticPools |
-- |
|
SqlManagedInstance |
Microsoft.Sql/managedInstances |
Y |
|
SqlElasticPool |
Microsoft.Sql/servers/elasticPools |
Y |
|
LogicApp |
Microsoft.Logic/workflows |
Y |
- Legend:
-
Y: Automatic discovery is supported.
--: Automatic discovery is not supported.
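As a reference, the following is a minimal sketch of a metric declaration in the Promitor Scraper configuration file for a VirtualMachine resource (the IDs, resource names, and the metric shown are illustrative; check the shipped metrics-declaration.yaml for the authoritative layout):
version: v1
azureMetadata:
  tenantId: 00000000-0000-0000-0000-000000000000
  subscriptionId: 11111111-1111-1111-1111-111111111111
  resourceGroupName: my-resource-group
metrics:
  - name: azure_virtual_machine_percentage_cpu
    description: "Average CPU utilization of the virtual machine"
    resourceType: VirtualMachine
    azureMetricConfiguration:
      metricName: Percentage CPU
      aggregation:
        type: Average
    resources:
      - virtualMachineName: my-vm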
■ Checking how Azure SDKs used by Promitor are supported
Promitor uses the Azure SDK for .NET. The end of support for an Azure SDK is announced 12 months in advance. For details on the Azure SDK lifecycle, see the Lifecycle FAQ at the following website:
https://learn.microsoft.com/ja-jp/lifecycle/faq/azure#azure-sdk-----------
The lifecycles of individual versions of the Azure SDK libraries can be found at the following website:
https://azure.github.io/azure-sdk/releases/latest/all/dotnet.html
■ Credentials required for account information
Promitor can connect to Azure through the service principal method or the managed ID method. For details on the credentials assigned to the service principal and managed ID, see (a) Configuring the settings for establishing a connection to Azure (required) in the JP1/Integrated Management 3 - Manager Configuration Guide 1.21.2(8) Set up of Promitor.
(i) Blackbox exporter (Synthetic metric collector)
Blackbox exporter is an exporter that sends simulated requests to monitored Internet services on the network and obtains operation information from the responses. The supported communication protocols are HTTP, HTTPS, and ICMP.
When Blackbox exporter receives a scrape request from the Prometheus server, it issues a service request such as HTTP to the monitored target and obtains the response and the response time. The execution results are summarized as metrics and returned to the Prometheus server.
■ Main items to be acquired
The main retrieval items of Blackbox exporter are defined in Blackbox exporter metric definition file (default). For details, see Blackbox exporter metric definition file (metrics_blackbox_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
You can add retrieval items to the metric definition file. The following are the metrics that can be specified in the PromQL statements described in the definition file.
|
Metric Name |
Prober |
What to get |
Label |
|---|---|---|---|
|
probe_http_duration_seconds |
http |
The number of seconds taken per phase of the HTTP request
|
instance: instance-identification-string job: job-name phase: phase#
|
|
probe_http_content_length |
http |
HTTP content response length |
instance: instance-identification-string job: job-name |
|
probe_http_uncompressed_body_length |
http |
Uncompressed response body length |
instance: instance-identification-string job: job-name |
|
probe_http_redirects |
http |
Number of redirects |
instance: instance-identification-string job: job-name |
|
probe_http_ssl |
http |
Whether SSL was used for the final redirect
|
instance: instance-identification-string job: job-name |
|
probe_http_status_code |
http |
HTTP response status code value
|
instance: instance-identification-string job: job-name |
|
probe_ssl_earliest_cert_expiry |
http |
Earliest expiring SSL certificate UNIX time |
instance: instance-identification-string job: job-name |
|
probe_ssl_last_chain_expiry_timestamp_seconds |
http |
Expiration timestamp of the last certificate in the SSL chain
|
instance: instance-identification-string job: job-name |
|
probe_ssl_last_chain_info |
http |
SSL leaf certificate information
|
instance: instance-identification-string job: job-name fingerprint_sha256: SHA256-fingerprint-on-certificate |
|
probe_tls_version_info |
http |
TLS version used
|
instance: instance-identification-string job: job-name version:TLS-version |
|
probe_http_version |
http |
HTTP version of the probe response |
instance: instance-identification-string job: job-name |
|
probe_failed_due_to_regex |
http |
Whether the probe failed due to a regular expression check on the response body or response headers
|
instance: instance-identification-string job: job-name |
|
probe_http_last_modified_timestamp_seconds |
http |
UNIX time showing Last-Modified HTTP response headers |
instance: instance-identification-string job: job-name |
|
probe_icmp_duration_seconds |
icmp |
Seconds taken per phase of an ICMP request |
instance: instance-identification-string job: job-name phase: phase#
|
|
probe_icmp_reply_hop_limit |
icmp |
Hop limit (TTL for IPv4) value
|
instance: instance-identification-string job: job-name |
|
probe_success |
-- |
Whether the probe was successful
|
instance: instance-identification-string job: job-name |
|
probe_duration_seconds |
-- |
The number of seconds it took for the probe to complete |
instance: instance-identification-string job: job-name |
■ IP communication with monitored objects
Only IPv4 communication is supported.
■ Encrypted communication with monitored objects
HTTP monitoring enables encrypted communication using TLS. In this case, the Blackbox exporter acts as a TLS client to the monitored object (TLS server).
When using encrypted communication with TLS, specify it in the "tls_config" item in the Blackbox exporter configuration file (jpc_blackbox_exporter.yml). In addition, the following certificate and key files must be prepared. A configuration sketch follows the table below.
|
File |
Format |
|---|---|
|
CA certificate file |
A file encoding an X509 public key certificate in pkcs7 format in PEM format |
|
Client certificate file |
|
|
Client certificate key file |
A file in which the private key in pkcs1 or pkcs8 format is encoded in PEM format#
|
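The following is a minimal sketch of such a module definition (the module name and file paths are illustrative; ca_file, cert_file, and key_file are the standard tls_config items of Blackbox exporter):
modules:
  https_client_auth:
    prober: http
    http:
      tls_config:
        ca_file: /path/to/ca.pem
        cert_file: /path/to/client.pem
        key_file: /path/to/client.key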
The supported TLS versions and cipher suites are as follows.
|
Item |
Scope of support |
|---|---|
|
TLS Version |
1.2 to 1.3 |
|
Cipher suites |
|
■ Timeout for collecting health information
In a network environment where responses are slow even under normal conditions, operation information can still be collected by adjusting the timeout period.
On the Prometheus server, you can specify the scrape request timeout period in the entry "scrape_timeout" of the Prometheus configuration file (jpc_prometheus_server.yml). For details, see the description of item scrape_timeout in Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
In addition, the timeout period for connections from Blackbox exporter to the monitoring target is 0.5 seconds shorter than the value specified in "scrape_timeout" above.
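For example, the following sketch (the job name is one of the default scrape job names; the value is illustrative) sets the scrape timeout of the HTTP/HTTPS monitoring job to 30 seconds, which gives Blackbox exporter a connection timeout of 29.5 seconds:
scrape_configs:
  - job_name: jpc_blackbox_http
    scrape_timeout: 30s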
■ Certificate expiration
When collecting operation information by HTTPS monitoring, the exporter receives a certificate list (the server certificate and the chain of certificates certifying it) from the monitoring target.
Blackbox exporter lets you collect the expiration time (UNIX time) of the earliest-expiring certificate as the probe_ssl_earliest_cert_expiry metric.
You can also monitor certificates that are approaching expiration by using the functions in 3.15.1(3) Performance data monitoring notification function, because the number of seconds remaining until expiration can be calculated as the probe_ssl_earliest_cert_expiry metric value minus the value of PromQL's time() function.
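For example, the following PromQL expression (the 14-day threshold is illustrative) matches targets whose earliest-expiring certificate expires within 14 days:
(probe_ssl_earliest_cert_expiry - time()) < 14 * 86400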
■ User-Agent value in HTTP request header when monitoring HTTP
The default value of User-Agent included in HTTP request header during HTTP monitoring is as shown below:
-
For version 13-00 or earlier
"Go-http-client/1.1"
-
For version 13-00-01 or later
"Blackbox Exporter/0.24.0"
You can change the value of User-Agent in the setting of item "headers" in the Blackbox exporter configuration file (jpc_blackbox_exporter.yml).
The following is an example of changing the value of User-Agent to "My-Http-Client".
modules:
  http:
    prober: http
    http:
      headers:
        User-Agent: "My-Http-Client"
For details, see the description of item headers in Blackbox exporter configuration file (jpc_blackbox_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ About HTTP 1.1 Name-Based Virtual Host Support
Blackbox exporter supports HTTP 1.1 name-based virtual hosts and TLS Server Name Indication (SNI). You can monitor virtual hosts, where a single HTTP/HTTPS server behaves as multiple HTTP/HTTPS servers.
■ About TLS Server Authentication and Client Authentication
In Blackbox exporter's HTTPS monitoring, server authentication is performed using the CA certificate described in item "ca_file" of the Blackbox exporter configuration file (jpc_blackbox_exporter.yml) and the server certificate sent by the server when HTTPS communication with the server starts (TLS handshake).
If the sent certificate is incorrect (server name is incorrect, expired, self-certificate is used, etc.), HTTPS communication cannot be started and monitoring fails.
In addition, when a request is made to send a certificate from the monitored server at the start of HTTPS communication (TLS handshake), the client certificate described in item "cert_file" of the Blackbox exporter configuration file (jpc_blackbox_exporter.yml) is sent to the monitored server.
If the server validates the sent certificate, recognizes it as invalid, and returns an error to the Blackbox exporter via the TLS protocol (or if communication cannot be continued due to a loss of communication, etc.), the monitoring fails.
For details on the verification contents related to the client certificate and the operation in the event of an error on the monitored server, check the specifications of the monitored server (or relay device such as a load balancer).
Even when an invalid certificate would be detected during server authentication, specifying "true" in the "insecure_skip_verify" item in the Blackbox exporter configuration file (jpc_blackbox_exporter.yml) allows HTTPS communication to start without an error. Note, however, that this disables the certificate verification performed for server authentication.
For details, see the description of item insecure_skip_verify in Blackbox exporter configuration file (jpc_blackbox_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
Server authentication cannot be performed using certificates whose host name is not listed in the Subject Alternative Name field.
■ About cookie information
Blackbox exporter does not reuse cookie information sent from the monitored target in subsequent HTTP requests.
■ About external resources referenced from content included in the response body of HTTP communication
In Blackbox exporter, external resources (subframes, images, and so on) referenced from the content included in the response body of HTTP communication are not included in the monitoring scope.
■ About Monitoring of Content Included in HTTP Communication Response Body
Because Blackbox exporter does not parse content, execution results and execution times based on the syntax (HTML, JavaScript, and so on) of the content included in the response body of HTTP communication are not reflected in the monitoring results.
■ Precautions when the monitoring destination of HTTP monitoring redirects with Basic authentication
If the destination monitored by Blackbox exporter's HTTP monitoring redirects with Basic authentication, Blackbox exporter sends the same Basic authentication user name and password to both the redirect source and the redirect destination. Therefore, when Basic authentication is performed at both the redirect source and the redirect destination, the same user name and password must be set on both.
(j) Script exporter (UAP monitoring capability)
Script exporter runs scripts on a host and obtains the results.
The Script exporter is installed on the same host as the JP1/IM - Agent, and upon a scrape request from the Prometheus server, it executes a script on that host to retrieve the results and returns them to the Prometheus server.
Developing a script that obtains UAP information and converts it to metrics, and adding the script to Script exporter, enables you to monitor applications that are not supported by existing Exporters in the way you want, as sketched below.
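For example, a script entry of the following kind could be added (a minimal sketch; the name, path, and the scripts/name/command layout follow the OSS Script exporter and are assumptions here; the script must print its results in the Prometheus text exposition format, such as a line like "uap_records_processed 42"):
scripts:
  - name: uap_check
    command: /opt/uap/bin/uap_check.sh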
■ Key metric items
The key Script exporter metric items are defined in the Script exporter metric definition file (initial status). For details, see Script exporter metric definition file (metrics_script_exporter.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
You can add more metric items to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.
|
Metric name |
Data to be obtained |
Label |
|---|---|---|
|
script_success |
Script exit status (0 = error, 1 = success) |
instance: instance-identifier-string job: job-name script: script-name |
|
script_duration_seconds |
Script execution time, in seconds. |
instance: instance-identifier-string job: job-name script: script-name |
|
script_exit_code |
The exit code of the script. |
instance: instance-identifier-string job: job-name script: script-name |
(k) OracleDB exporter (Oracle Database monitoring function)
OracleDB exporter is an Exporter for Prometheus that retrieves performance data from Oracle Database.
- - About the number of sessions
-
When OracleDB exporter monitors Oracle Database, it connects at each scrape and disconnects when data collection is complete. One session is used per connection.
■ Conditions to be monitored
The following are the Oracle Database configurations and database character sets that JP1/IM - Agent supports for monitoring:
-
Configuring Oracle Database
-
For non-clusters
Non-CDB and CDB configurations
-
For Oracle RAC
CDB configuration
-
Because one OracleDB exporter process connects to one service, launch one OracleDB exporter per target when there is more than one target.
- Note
-
-
Oracle RAC One Node and Oracle Database Cloud Service are not supported.
-
HA clustering configuration on Oracle Database is not supported.
-
-
Oracle Database database-character set
-
AL32UTF8(Unicode UTF-8)
-
JA16SJIS (Japanese-language SJIS)
-
ZHS16GBK (Simplified Chinese GBK)
-
■ Acquisition items
The metrics that can be retrieved with the OracleDB exporter shipped with JP1/IM - Agent are the metrics defined by default in OracleDB exporter, plus cache_hit_ratio.
OracleDB exporter retrieval items are defined in metric definition-file (default) of OracleDB exporter. For details, see OracleDB exporter metric definition file (metrics_oracledb_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
The following table lists the metrics that can be specified in PromQL expressions in the definition file. The value of each metric is obtained by executing the SQL statement shown in the table against Oracle Database. For details about a metric, contact Oracle based on the SQL statement of its data source.
|
Metric name |
Contents to be acquired |
Label |
Data source (SQL statement) |
|---|---|---|---|
|
oracledb_sessions_value |
Count of sessions |
status: status type: session-type |
SELECT status, type, COUNT(*) as value FROM v$session GROUP BY status, type |
|
oracledb_resource_current_utilization |
Resource usage#1 |
resource_name: resource-name |
SELECT resource_name,current_utilization,CASE WHEN TRIM(limit_value) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(limit_value) END as limit_value FROM v$resource_limit |
|
oracledb_resource_limit_value |
Resource usage limit#1 (UNLIMITED: -1) |
resource_name: resource-name |
|
|
oracledb_asm_diskgroup_total |
Bytes of total size of ASM disk group |
name: disk-group-name |
SELECT name,total_mb*1024*1024 as total,free_mb*1024*1024 as free FROM v$asm_diskgroup_stat where exists (select 1 from v$datafile where name like '+%') |
|
oracledb_asm_diskgroup_free |
Bytes of free space available on ASM disk group |
name: disk-group-name |
|
|
oracledb_activity_execute_count |
Total number of calls (user calls and recursive calls) executing SQL statements (cumulative value) |
none |
SELECT name, value FROM v$sysstat WHERE name IN ('parse count (total)', 'execute count', 'user commits', 'user rollbacks', 'db block gets from cache', 'consistent gets from cache', 'physical reads cache') |
|
oracledb_activity_parse_count_total |
Total number of parse calls (hard, soft and describe) (cumulative value) |
none |
|
|
oracledb_activity_user_commits |
Total number of user commit (cumulative value) |
none |
|
|
oracledb_activity_user_rollbacks |
The number of times a user manually issued a ROLLBACK statement, or the total number of times an error occurred during a user's transaction (cumulative value) |
none |
|
|
oracledb_activity_physical_reads_cache |
Total number of data blocks read from disk to the buffer cache (cumulative value) |
none |
|
|
oracledb_activity_consistent_gets_from_cache |
Number of times block read consistency was requested from the buffer cache (cumulative value) |
none |
|
|
oracledb_activity_db_block_gets_from_cache |
Number of times CURRENT blocking was requested from the buffer cache (cumulative value) |
none |
|
|
oracledb_process_count |
Count of Oracle Database active-processes |
none |
SELECT COUNT(*) as count FROM v$process |
|
oracledb_wait_time_administrative |
Time spent waiting for the Administrative wait class (in 1/100 seconds)#2 |
none |
SELECT n.wait_class as WAIT_CLASS, round(m.time_waited/m.INTSIZE_CSEC,3) as VALUE FROM v$waitclassmetric m, v$system_wait_class n WHERE m.wait_class_id=n.wait_class_id AND n.wait_class != 'Idle' |
|
oracledb_wait_time_application |
Time spent waiting for the Application wait class (in 1/100 seconds)#2 |
none |
|
|
oracledb_wait_time_commit |
Time spent waiting for the Commit wait class (in 1/100 seconds)#2 |
none |
|
|
oracledb_wait_time_concurrency |
Time spent waiting for the Concurrency wait class (in 1/100 seconds)#2 |
none |
|
|
oracledb_wait_time_configuration |
Time spent waiting for the Configuration wait class (in 1/100 seconds)#2 |
none |
|
|
oracledb_wait_time_network |
Time spent waiting for the Network wait class (in 1/100 seconds)#2 |
none |
|
|
oracledb_wait_time_other |
Time spent waiting for the Other wait class (in 1/100 seconds)#2 |
none |
|
|
oracledb_wait_time_scheduler |
Time spent waiting for the Scheduler wait class (in 1/100 seconds)#2 |
none |
|
|
oracledb_wait_time_system_io |
Time spent waiting for the System I/O wait class (in 1/100 seconds)#2 |
none |
|
|
oracledb_wait_time_user_io |
Time spent waiting for the User I/O wait class (in 1/100 seconds)#2 |
none |
|
|
oracledb_tablespace_bytes |
Total bytes consumed by tablespaces |
tablespace: name-of-the-tablespace type: tablespace-contents |
SELECT dt.tablespace_name as tablespace, dt.contents as type, dt.block_size * dtum.used_space as bytes, dt.block_size * dtum.tablespace_size as max_bytes, dt.block_size * (dtum.tablespace_size - dtum.used_space) as free, dtum.used_percent FROM dba_tablespace_usage_metrics dtum, dba_tablespaces dt WHERE dtum.tablespace_name = dt.tablespace_name ORDER by tablespace |
|
oracledb_tablespace_max_bytes |
Maximum number of bytes in a tablespace |
tablespace: name-of-the-tablespace type: tablespace-contents |
|
|
oracledb_tablespace_free |
Number of free bytes in the tablespace |
tablespace: name-of-the-tablespace type: tablespace-contents |
|
|
oracledb_tablespace_used_percent |
Tablespace utilization. If auto extension is ON, it is calculated with auto extension taken into account. |
tablespace: name-of-the-tablespace type: tablespace-contents |
|
|
oracledb_exporter_last_scrape_duration_seconds |
The number of seconds taken by the last scrape |
none |
- |
|
oracledb_exporter_last_scrape_error |
Whether the last scrape resulted in an error (0: Success, 1: Error) |
none |
- |
|
oracledb_exporter_scrapes_total |
Total number of times Oracle Database was scraped for metrics |
none |
- |
|
oracledb_up |
Whether the Oracle Database Server is up 0: Not running 1: Running |
none |
- |
- #1
-
In a PDB, the source table v$resource_limit is empty, so the value cannot be retrieved.
- #2
-
In a PDB, the source table v$waitclassmetric is empty, so the value cannot be retrieved.
- Important
-
-
Before using OracleDB exporter, make sure that the SQL statements that serve as data sources can be executed, for example, with the SQL*Plus command, and that the required information is displayed. When checking, connect to Oracle Database as the user that OracleDB exporter uses for the connection.
-
The OracleDB exporter provided by JP1/IM - Agent does not support collecting arbitrary metrics (custom metrics).
-
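For example, the sessions query from the table above might be verified as follows (a sketch; the connection values match the specification example given later and are illustrative):
sqlplus orauser@//orahost:1521/orasrv
SQL> SELECT status, type, COUNT(*) as value FROM v$session GROUP BY status, type;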
■ Requirements for monitoring Oracle Database
If you want to monitor Oracle Database with OracleDB exporter, Oracle Database must be configured as follows:
You do not need to install Oracle Client or similar software on the JP1/IM - Agent host.
-
Oracle listener
-
Configure Oracle listener and the service name so that the monitoring target can be connected to.
-
Oracle listener is configured to accept unencrypted connection requests.
-
-
Oracle Database
Set Oracle Database database-character set to one of the following:
-
AL32UTF8 (Unicode UTF-8)
-
JA16SJIS (Japanese-language SJIS)
-
ZHS16GBK (Simplified Chinese GBK)
-
-
Users used to access Oracle Database
-
The user used to connect to Oracle Database must have the following permissions:
- Login permissions
- SELECT permissions to the following tables
dba_tablespace_usage_metrics
dba_tablespaces
v$system_wait_class
v$asm_diskgroup_stat
v$datafile
v$sysstat
v$process
v$waitclassmetric
v$session
v$resource_limit
-
User used to connect to Oracle Database
For details about the character types and maximum lengths that can be specified for user names, see Environment variables.
-
Password of the user used to connect to Oracle Database
The following character types can be used for passwords:
- Uppercase letters, lowercase letters, numbers, @, +, ', !, $, :, ., (, ), ~, -, _
- The password can be from 1 to 30 bytes in length.
-
■ Obfuscation of Oracle Database passwords
The OracleDB exporter shipped with JP1/IM - Agent manages the password used to access Oracle Database with the secret obfuscation function. For details, see 3.15.10 Secret obfuscation function.
■ Notes on Oracle Database log files
Monitoring Oracle Database with OracleDB exporter can generate a large number of log files. The Oracle Database administrator should therefore consider deleting log files periodically.
|
Directory where log files are generated (including subdirectories) |
Extensions of log files that increase |
|---|---|
|
$ORACLE_BASE/diag/rdbms |
.trc, .trm |
The following are example command lines for deleting ".trc" and ".trm" files with old modification dates. If necessary, consider running such commands periodically to delete unnecessary logs.
|
OS |
Command line example for deleting logs |
|---|---|
|
Windows |
forfiles /P "%ORACLE_BASE%\diag\rdbms" /M *.trm /S /C "cmd /C del /Q @path" /D -14 forfiles /P "%ORACLE_BASE%\diag\rdbms" /M *.trc /S /C "cmd /C del /Q @path" /D -14 |
|
Linux |
find $ORACLE_BASE/diag/rdbms -name '*.tr[cm]' -mtime +14 -delete |
Set the $ORACLE_BASE and %ORACLE_BASE% environment variables as needed.
■ Environment variables
The following environment variables are required when using OracleDB exporter.
- - Environment-variable "DATA_SOURCE_NAME" (required)
-
Specify the connection destination of OracleDB exporter in the following format. There is no default value.
-
For Windows
oracle://user-name@host-name:port/service-name?connection timeout=10[&instance name=instance-name]
-
For Linux
oracle://user-name@host-name:port/service-name?connection timeout=10[&instance name=instance-name]
- user-name
-
-
Specifies the user name used to connect to Oracle listener. Up to 30 characters can be specified.
-
You can use uppercase letters, numbers, underscores (_), dollar signs ($), pound signs (#), periods (.), and at signs (@). Note that lowercase letters cannot be used.
-
For Linux, replace each pound sign with "%%23" when you include the user name in the unit definition file. For example, for the shared CDB user "C##USER", specify "C%%23%%23USER".
-
For Windows, replace each pound sign with "%23" when you include the user name in the service definition file. For example, for the shared CDB user "C##USER", specify "C%23%23USER".
-
- host-name
-
-
Specifies the host name of Oracle Database host to monitor. Up to 253 characters can be specified.
-
You can use uppercase letters, lowercase letters, numbers, hyphens, and periods.
-
- port
-
-
Specifies the port number for connecting to Oracle listener.
-
- service-name
-
-
Specifies the service name of Oracle listener. Up to 64 characters can be specified.
-
You can use uppercase letters, lowercase letters, numbers, underscores, hyphens, and periods.
-
- Option
-
You can specify the following options. If you specify more than one, join them with "&" (on both Windows and Linux).
-
connection timeout=number
Specifies the connection timeout in seconds. This option must be specified.
Be sure to specify 10. If you specify a value other than 10 or omit this option, the scrape from the Prometheus server times out, and the up metric might be 0 even though OracleDB exporter is running.
-
instance name=instance-name
Specifies instance to connect to. Specifying this option is optional.
-
(Example of specification)
oracle://orauser@orahost:1521/orasrv?connection timeout=10
-
For Windows
oracle://orauser@orahost:1521/orasrv?connection timeout=10&instance name=orcl1
-
For Linux
oracle://orauser@orahost:1521/orasrv?connection timeout=10 &instance name=orcl1
-
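For reference, on Linux the variable might be set in the unit definition file roughly as follows (a sketch only; the value is illustrative and uses the %%23 escaping rule for pound signs described above):
[Service]
Environment="DATA_SOURCE_NAME=oracle://C%%23%%23USER@orahost:1521/orasrv?connection timeout=10"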
- - Environment variable DATA_SOURCE_NAME (required)
-
Specify the full path of jp1ima directory under JP1/IM - Agent installation directory.
For a logical host, specify the full path of jp1ima directory under JP1/IM - Agent shared directory.
(Example of specification)
-
For Windows
C:\Program files\Hitachi\jp1ima
-
For Linux
/opt/jp1ima
-
■ Notes
-
If you stop the monitored Oracle Database instance or containers before stopping OracleDB exporter, a NORMAL shutdown of Oracle might not complete. Stop OracleDB exporter first, or stop Oracle Database with an IMMEDIATE shutdown.
-
Stop OracleDB exporter before changing the configuration of, or performing maintenance on, the Oracle Database instance and containers.
(l) Fluentd (Log metrics)
This capability can generate and measure log metrics from log files created by monitoring targets. For details on the function, see 3.15.2 Log metrics by JP1/IM - Agent.
■ Key metric items
You define what figures you need from the log files created by your monitoring targets in the log metrics definition file (fluentd_any-name_logmetrics.conf). These definitions allow you to get quantified data (log metrics) as metric items.
For details on the log metrics definition file, see Log metrics definition file (fluentd_any-name_logmetrics.conf) in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ Sample files
The following provides descriptions of sample files for when you use the log metrics feature. If you copy the sample files, be careful of the linefeed codes. For details, see the description of each file of 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. These sample files are based on the assumptions in Assumptions of the sample files. Copy each file and change the settings according to your monitoring targets.
- - Assumptions of the sample files
-
The sample files described here assume that HostA, a monitored host (integrated agent host), exists and JP1/IM - Agent is installed in it, and that WebAppA, an application running on HostA, creates the following log file.
- - ControllerLog.log
-
As shown in target log message 1, a log message is created, saying that an HTTP endpoint in WebAppA is used, at the start of processing of the request for that endpoint. The log message also indicates the number of records handled upon request processing.
Target log message 1:
... 2022-10-19 10:00:00 [INFO] c.b.springbootlogging.LoggingController : endpoint "/register" started. Target record: 5. ...
In the sample files, a regular expression to match target log message 1 is used, and the number of the log messages that match the expression is counted. The number is then displayed in the Trends tab of the JP1/IM integrated operation viewer as log metric 1, Requests to the register Endpoint.
The definition for log metric 1 uses counter as its log metric type.
In addition, the regular expression used in the above also extracts the number indicated as Target record from target log message 1, and then the extracted numbers are summed up. The total is then displayed in the Trends tab of the JP1/IM integrated operation viewer as log metric 2, Number of Registered Records.
The definition for log metric 2 uses counter as its log metric type.
As many Fluentd workers (multi-process workers feature) as the number of log files to be monitored are required. For details on the worker settings related to the log metrics feature, see the log metrics definition file (fluentd_any-name_logmetrics.conf). Here, it is assumed that 11 Fluentd workers are running and that ControllerLog.log is monitored by the worker whose worker ID is 10.
These sample files also assume the tree structure consisting of the following IM management nodes:
All Systems
  + HostA
    + Application Server
      + WebAppA
- - Target files in this example
-
The target files used in this example are as follows:
-
Integrated manager host
- User-specific metric definition file
-
Integrated agent host
- Prometheus configuration file
- User-specific discovery configuration file
- Log metrics definition file
- Fluentd log monitoring target definition file
-
- - Sample user-specific metric definition file
-
- File name: metrics_logmetrics1.conf
- Written code
[ { "name":"logmetrics_request_endpoint_register", "default":true, "promql":"logmetrics_request_endpoint_register and $jp1im_TrendData_labels", "resource_en":{ "category":"HTTP", "label":"request_num_of_endpoint_register", "description":"The request number of endpoint register", "unit":"request" }, "resource_ja":{ "category":"HTTP", "label":"registerへのリクエスト数", "description":"The request number of endpoint register", "unit":"リクエスト" } }, { "name":"logmetrics_num_of_registeredrecord", "default":true, "promql":"logmetrics_num_of_registeredrecord and $jp1im_TrendData_labels", "resource_en":{ "category":"DB", "label":"logmetrics_num_of_registeredrecord", "description":"The number of registered record", "unit":"record" }, "resource_ja":{ "category":"DB", "label":"登録されたレコード数", "description":"The number of registered record", "unit":"レコード" } } ]- Note
-
The storage directory, written code, and file name follow the format of the user-specific metric definition file (metrics_any-Prometheus-trend-name.conf).
- - Sample Prometheus configuration file
-
- File name: jpc_prometheus_server.yml
- Written code
global:
  ... (omitted) ...
scrape_configs:
  - job_name: 'LogMetrics'
    file_sd_configs:
      - files:
        - 'user/user_file_sd_config_logmetrics.yml'
    relabel_configs:
      - target_label: jp1_pc_nodelabel
        replacement: Log trapper(Fluentd)
    metric_relabel_configs:
      - target_label: jp1_pc_nodelabel
        replacement: ControllerLog
      - source_labels: ['__name__']
        regex: 'logmetrics_request_endpoint_register|logmetrics_num_of_registeredrecord'
        action: 'keep'
      - regex: (jp1_pc_multiple_node|jp1_pc_agent_create_flag)
        action: labeldrop
... (omitted) ...
- Note
-
The storage directory and written code follow the format of the Prometheus configuration file (jpc_prometheus_server.yml). You do not have to create a new file. Instead, you add the scrape_configs section for the log metrics feature to the Prometheus configuration file (jpc_prometheus_server.yml) created during installation.
- - Sample user-specific discovery configuration file
-
- File name: user_file_sd_config_logmetrics.yml
- Written code
- targets:
  - HostA:24830
  labels:
    jp1_pc_exporter: logmetrics
    jp1_pc_category: WebAppA
    jp1_pc_trendname: logmetrics1
    jp1_pc_multiple_node: "{__name__=~'logmetrics_.*'}"
    jp1_pc_agent_create_flag: false
- Note
-
The storage directory and written code follow the format of the user-specific discovery configuration file (file_sd_config_any-name.yml).
ControllerLog.log is monitored by the worker whose Fluentd worker ID is 10. Thus, when 24820 is set for port in the Sample log metrics definition file, the port number of the worker monitoring ControllerLog.log is 24820 + 10 = 24830.
- - Sample log metrics definition file
-
- File name: fluentd_WebAppA_logmetrics.conf
- Written code
## Input
<worker 10>
  <source>
    @type prometheus
    bind '0.0.0.0'
    port 24820
    metrics_path /metrics
  </source>
</worker>
## Extract target log message 1
<worker 10>
  <source>
    @type tail
    @id logmetrics_counter
    path /usr/lib/WebAppA/ControllerLog/ControllerLog.log
    tag WebAppA.ControllerLog
    pos_file ../data/fluentd/tail/ControllerLog.pos
    read_from_head true
    <parse>
      @type regexp
      expression /^(?<logtime>[^\[]*) \[(?<loglebel>[^\]]*)\] (?<class>[^\[]*) : endpoint "\/register" started. Target record: (?<record_num>\d[^\[]*).$/
      time_key logtime
      time_format %Y-%m-%d %H:%M:%S
      types record_num:integer
    </parse>
  </source>
  ## Output
  ## Define log metrics 1 and 2
  <match WebAppA.ControllerLog>
    @type prometheus
    <metric>
      name logmetrics_request_endpoint_register
      type counter
      desc The request number of endpoint register
    </metric>
    <metric>
      name logmetrics_num_of_registeredrecord
      type counter
      desc The number of registered record
      key record_num
      <labels>
        loggroup ${tag_parts[0]}
        log ${tag_parts[1]}
      </labels>
    </metric>
  </match>
</worker>
- Note
-
The storage directory and written code follow the format of the log metrics definition file (fluentd_any-name_logmetrics.conf).
- - Sample Fluentd log monitoring target definition file
-
- File name: jpc_fluentd_common_list.conf
- Written code
## [Target Settings]
... (omitted) ...
@include user/fluentd_WebAppA_logmetrics.conf
- Note
-
The storage directory and written code follow the format of the Fluentd log monitoring target definition file (jpc_fluentd_common_list.conf) in JP1/IM - Agent definition files. You do not have to create a new file. Instead, you add the include section for the log metrics feature to the Fluentd log monitoring target definition file (jpc_fluentd_common_list.conf) created during installation.
(m) Web scenario monitoring function
In JP1/IM - Manager and JP1/IM - Agent version 13-10 and later, Web scenario monitoring function is available.#
Web scenario monitoring function is one of the Synthetic metric collectors. It monitors how long a series of user actions takes to play back in a Web browser. The monitoring scope is the HTTP(S) communication of the initial screen and of the series of operations from login to logoff. Based on HTTP(S) communication, it monitors the operation of Web content that issues a large number of requests combining HTML, json, xml, and so on. This enables monitoring from the viewpoint of user operations, which cannot be achieved with the Synthetic metric collector (single HTTP(S) monitoring) provided by Blackbox exporter.
- #
-
If JP1/IM - Manager is upgraded from a version earlier than 13-10 to version 13-10 or later and you want to use Web scenario monitoring function, you must configure the settings for it. For instructions on setting up JP1/IM - Manager, see Setting up the environment variables and Setting up Web exporter in 1.21.2(13)(a) Setting up JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Configuration Guide. Also see Configuring authentication in 1.21.2(13)(a) Setting up JP1/IM - Agent.
■ Prerequisites
To use Web scenario monitoring function, the following prerequisites apply:
-
Prerequisite browser
The following browsers must be installed before you can create and monitor Web scenarios:
-
Google Chrome
-
Microsoft Edge
In addition, the monitoring targets must be accessible from the above browsers.
The above browsers are used both to create Web scenarios and to monitor with them.
-
-
Agent host
We recommend that you create and monitor Web scenarios on the same host.
If you want to migrate a Web scenario file to a different host, you must perform the steps in 1.5.1(9)(c) Migrating Web Scenario Files to another host in the JP1/Integrated Management 3 - Manager Administration Guide.
In addition, Web scenario monitoring function can be used only on agent hosts running Windows with JP1/IM - Agent for Windows installed.
-
Web exporter
The listen port used by Web exporter must be protected, for example, by a firewall or network configuration, so that it is not accessed by anything other than the Prometheus server of JP1/IM - Agent. For the port used by Web exporter, see the explanation of web_exporter command options in Service definition file (jpc_program-name_service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ Network configuration
We recommend that you install JP1/IM - Agent on a host whose network is close to the users who use the monitored Web contents, so that monitoring follows the users' Web scenarios. If the network path from JP1/IM - Agent to the monitoring target differs greatly from the path of the users who use the Web contents, it is difficult to detect monitoring errors caused by failures of relay devices.
■ Function List
Web scenario monitoring function monitors the response time of the user's experience by automatically playing back the user's actions on the browser screen and measuring the playing time.
Web scenario monitoring function consists of Web operation information collection function, which collects performance information for Web operation responses based on Web operation scenario, and Web scenario creation support function, which helps create a Web operation scenario (Web scenario).
| Function | Description |
|---|---|
| Web scenario monitoring function | Monitors the Web system based on the collected performance data of Web responses. |
| Web scenario creation support function | Supports you in creating Web operation scenarios. |
| Web scenario creation function | Launches a browser and creates Web scenarios. |
| Web operation information collection function | Collects performance information of Web operation responses based on a Web operation scenario. Provided by Web exporter. |
| Web scenario execution function | Performs Web operations as recorded in the Web scenario. |
| Trace viewer function | Displays trace information used for investigation when an error is detected during Web scenario execution. |
■ Web Scenario Creation Support Function
Web Scenario creation support function launches Web Scenario creation function, which launches a browser and records the user's interactions in the browser as a Web scenario.
- ■Scenarios that can be created
-
You can monitor Web contents using Web scenarios that record the following actions:
-
Operations for displaying the top page
This is just the operation to display the top page. No other operations are required.
-
Logging on from the login screen
Enter the username and password and click the logon button.
-
Clicking the logoff button on the logoff screen
-
■ Web Scenario Creation Function (playwright codegen)
Web Scenario creation function provides the ability to assist in the creation of Web operation scenarios (Web scenarios). Web Scenario creation function uses the OSS Playwright.
- ■Prerequisites
-
When you run the playwright command manually, the current folder must be the Playwright working folder. For Playwright working folders, see Appendix A.4(3) Integrated agent host (Windows).
The command must be run as the built-in Administrator.
- ■Starting Codegen
-
Web Scenario creation function uses Playwright's Codegen.
The user who runs Codegen must be the same user as the user who runs Web exporter.
Use the playwright codegen command to start Web scenario creation function.
The playwright codegen command opens a Web site and generates code corresponding to your operations. Users run it from a terminal.
Recording starts when Web Scenario creation function is activated.
npx playwright codegen --target playwright-test --channel channels --lang locale URL -o ./tests/Web-scenario-filename
Codegen opens two windows: a browser window for interacting with the monitored Web site and a Playwright Inspector window for recording Web scenario code.
When a user runs Codegen and performs operations in the browser, Playwright generates code according to those operations.
For details about the parameters that can be specified in the playwright codegen command, see the following table.
-
npx playwright codegen command options

| Item | Description | Changeability | What you set up in JP1/IM - Agent | JP1/IM - Agent default value |
|---|---|---|---|---|
| -o filename or --output filename | Saves the generated script to a file. | REQ | Specifies the path of the destination Web scenario file relative to the command-execution directory. If omitted, the script is discarded when Codegen terminates and must be copied to a text file by the user. File name rules: the file name must be in the format "string.spec.ts"; it can contain only single-byte alphanumeric characters and underscores (_); the parameter can be at most 256 bytes; folders and files on a network drive cannot be specified (if specified, operation is not guaranteed in the event of a network failure or delay on Windows); file names with a leading hyphen (-) and folder or file names containing environment-dependent characters cannot be specified. If the specified file does not exist, it is created; if it already exists, it is overwritten. For the storage location of the output Web scenario file, see Appendix A.4(3) Integrated agent host (Windows). | ./tests/Web-scenario-filename |
| --target language | Selects the language for generating the script. | -- | None | None |
| --channel channels | Specifies the distribution channel for Chromium. | REQ | Specify the browser for executing Codegen: "chrome" for Google Chrome, or "msedge" for Microsoft Edge. | None |
| --lang locale | Specifies the language and locale. Example: "ja-JP" | REQ | One of "en-US", "ja-JP", "zh-CN", or "th", matching the language code at the time of the test run. If omitted, the Web scenario is generated with a language code that differs from the one at test time, and a Web scenario that would otherwise succeed may fail. | None |
| --proxy-server proxy | Specifies the proxy server. Examples: "http://myproxy:3128", "socks5://myproxy:8080" | Y | Specifies the proxy used for requests. Specify the entire domain with up to 253 alphanumeric characters. | None |
| --proxy-bypass bypass | Specifies a comma-separated list of domains that bypass the proxy. Example: ".com,chromium.org,.domain.com" | Y | Specifies the domains for proxy bypass, up to 253 alphanumeric characters. | None |
| URL | Specifies the URL to be monitored. | Y | Specify the entire domain with up to 253 alphanumeric characters, in the format protocol://hostname:port-number | None |

- Legend:
-
REQ: Required setting, Y: Changeable, --: Not applicable
-
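For example, the following command line (the host name and output file name are hypothetical) starts Codegen with Microsoft Edge and the en-US locale, and saves the recorded Web scenario to a file named login_scenario.spec.ts:

```
npx playwright codegen --target playwright-test --channel msedge --lang "en-US" http://www.example.com:8080/login -o ./tests/login_scenario.spec.ts
```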
- ■Recording screen operations
-
When recording operations such as mouse clicks, text entry, and HTML operations, perform the operations that you want to record in the browser window while recording is in progress. As you operate, Web scenario code is generated in the Playwright Inspector window.
The following table shows the browser operations that can be recorded and measured as Web scenarios.
Table 3‒38: Browser operations that can be recorded and measured as Web scenarios

| Classification | Operation | Operable | Record | Remarks | Sample code recorded by Codegen |
|---|---|---|---|---|---|
| Mouse operation | -- | -- | -- | The mouse operation itself is not recorded, but button operations and other operations caused by the mouse are recorded. | -- |
| Mouse operation | Click | Y | Y | -- | await page.getByRole('button', { name: 'Login' }).click(); |
| Mouse operation | Double click | Y | Y | -- | await page.getByRole('button', { name: 'Clear' }).dblclick(); |
| Mouse operation | Sub button click | Y | Y | -- | await page.locator('body').click({ button: 'right' }); |
| Keyboard operation (key entry) | -- | -- | -- | -- | -- |
| Keyboard operation (key entry) | Entering characters | Y | Y | The value being input is reflected in real time. The entered value is recorded as an HTML operation, etc. | await page.locator('input[name="username"]').fill('username'); |
| Keyboard operation (key entry) | Shortcut key input | -- | -- | This is the same as browser operation. | This is the same as browser operation. |
| Keyboard operation (key entry) | Accelerator key input | -- | -- | This is the same as browser operation. | This is the same as browser operation. |
| Keyboard operation (key entry) | Other key input | -- | -- | This is the same as browser operation. | This is the same as browser operation. |
| Browser operation | -- | -- | -- | -- | -- |
| Browser operation | Move to next item [Tab] | Y | Y | Only recorded if an element in the HTML is selected. | await page.locator('body').press('Tab'); |
| Browser operation | Move to previous item [Shift]+[Tab] | Y | Y | Only recorded if an element in the HTML is selected. | await page.locator('body').press('Shift+Tab'); |
| Browser operation | Go to next page [Alt]+[→] | Y | Y | Keyboard actions are disabled. Page transitions are recorded. | await page.goto('URL'); |
| Browser operation | Go to previous page [Alt]+[←], [BackSpace] | Y | Y | Keyboard actions are disabled. Page transitions are recorded. | await page.goto('URL'); |
| Browser operation | Context menu display [Right-click], [Shift]+[F10] | Y | Y | -- | await page.locator('body').press('Shift+F10'); |
| Browser operation | Scroll up [↑] | Y | Y | -- | await page.locator('body').press('ArrowUp'); |
| Browser operation | Scroll down [↓] | Y | Y | -- | await page.locator('body').press('ArrowDown'); |
| Browser operation | Page up scroll [PgUp] | Y | Y | -- | await page.locator('body').press('PageUp'); |
| Browser operation | Page down scroll [PgDn] | Y | Y | -- | await page.locator('body').press('PageDown'); |
| Browser operation | Go to top of page [Home] | Y | Y | -- | await page.locator('body').press('Home'); |
| Browser operation | Go to end of page [End] | Y | Y | -- | await page.locator('body').press('End'); |
| Browser operation | Stop operation [Esc] | Y | Y | -- | await page.locator('body').press('Escape'); |
| Browser operation | Link click [Enter], [Click] | Y | Y | This is the same as the HTML link operation. Page transitions are recorded. | This is the same as the HTML link operation. |
| Browser operation | Multiple selection operation [Ctrl]+[Click] | Y | Y | -- | await page.getByRole('listbox').selectOption(['apple', 'banana', 'orange']); |
| Browser operation | Cut [Ctrl]+[X] | Y | Y | -- | await page.locator('body').press('Control+x'); |
| Browser operation | Copy [Ctrl]+[C] | Y | Y | -- | await page.locator('body').press('Control+c'); |
| Browser operation | Paste [Ctrl]+[V] | Y | Y | -- | await page.locator('body').press('Control+v'); |
| Browser operation | Select all [Ctrl]+[A] | Y | Y | -- | await page.locator('body').press('Control+a'); |
| Dialog operation | -- | -- | -- | The operation itself may not be recorded, but page transitions are recorded. | await page.goto('URL'); |
| Dialog operation | Text input | Y | #1 | -- | -- |
| Dialog operation | Key operation | Y | #1 | -- | -- |
| Dialog operation | Other input items | Y | #1 | -- | -- |
| HTML operation | -- | -- | -- | Records operations related to input operations and page transitions. | -- |
| HTML operation | Link operation | Y | Y | -- | -- |
| HTML operation | INPUT TEXT (text entry) | Y | Y | -- | await page.getByLabel('Name (4 to 8 characters):').fill('test'); |
| HTML operation | INPUT PASSWORD (password entry) | Y | Y | -- | await page.getByLabel('password (8 characters or more):').fill('pwdtest1'); |
| HTML operation | INPUT CHECKBOX | Y | Y | -- | await page.getByRole('checkbox').check(); |
| HTML operation | INPUT RADIO | Y | Y | -- | await page.getByLabel('apple').check(); |
| HTML operation | INPUT SUBMIT | Y | Y | -- | await page.getByRole('button', { name: 'Send' }).click(); |
| HTML operation | INPUT RESET | Y | Y | -- | await page.getByRole('button', { name: 'Reset Form' }).click(); |
| HTML operation | INPUT BUTTON | Y | Y | -- | await page.getByRole('button', { name: 'test' }).click(); |
| Script operation | -- | -- | -- | Scripts without page transitions, HTML operations, or button operations are not recorded, but page transitions, HTML operations, and button operations that are caused by script operations are recorded. #2 | -- |
| Script operation | Page transition operation | Y | Y | Operations implemented inside the script may not be recorded, but page transitions are recorded. | -- |
- Legend
-
Operable column Y: Can be operated, --: Not applicable
Record column Y: Recorded, --: Not applicable
Other columns --: Not applicable
- #1
-
The data is recorded based on the values entered in the dialog or the results of button operations. However, some dialogs may not be recorded correctly. Be sure to run the Web scenario after creating it to confirm that it runs correctly.
Dialogs that prevent the Web scenario from running correctly (for example, dialogs that remain open and stop the scenario) cannot be handled.
- #2
-
Depending on the timing of page transitions, HTML operations, and button operations caused by scripting, recording may not be possible.
Be sure to run the Web scenario after creating it to confirm that it runs correctly.
Operations or behaviors not described in the above table cannot be recorded or measured as Web scenarios.
Note that dialog authentication (other than user ID and password) and operations inside ActiveX controls are not supported. FTP is also not supported.
- ■Record of assertion
-
Assertion is an operation that checks whether the elements displayed on a Web site match the expected content. When you run Codegen to create a Web scenario, you can click an element displayed in the browser window to add an assertion to the Web scenario. The assertion determines whether the element displayed in the browser window when the Web scenario is run matches the element displayed when Codegen was run.
The following types of assertions are available:
-
assert visibility
Assert that the element exists.
-
assert text
Asserts that the element contains certain text.
-
assert value
Asserts that an element has a specific value.
If you want to add an assertion to a Web scenario, click one of the assert visibility, assert text, or assert value buttons, and then select the element to be asserted in the browser window. An assertion is generated for the selected element in the Playwright Inspector window.
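As an illustration, the following lines show the kind of code each assertion type generates in the Playwright Inspector window. The element names and values are hypothetical and will differ for your Web contents.

```typescript
// assert visibility: the element exists and is visible
await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
// assert text: the element contains the specified text
await expect(page.locator('#status')).toContainText('Ready');
// assert value: the element has the specified value
await expect(page.locator('input[name="username"]')).toHaveValue('user1');
```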
-
- ■Pausing Recording
-
If you want to pause recording, click the Record button. Clicking the Record button again resumes recording.
- ■Saving the generated Web scenario code
-
When you exit Web scenario creation function, the generated Web scenario is saved to the Web scenario file specified in the command-line options when Web scenario creation function was started.
- ■Exiting Codegen
-
To exit Web scenario creation function, press Ctrl+C in the terminal where the playwright codegen command was executed, or close the browser window that was opened when Web scenario creation function started.
- ■Codegen Window Structure
-
When you use the playwright codegen command to execute Web scenario creation function, the following windows are displayed.
-
Browser window
The Web site where you want to run the scenario is displayed. Clicks and typing operations are recorded as you navigate the Web pages.
The following table shows the buttons used to record Web page operations. For details, see the Playwright documentation.

| Item number | Button | How to operate |
|---|---|---|
| 1 | -- | Drag this button to move the tab. |
| 2 | Record | Click this button to stop or resume recording. |
| 3 | assert visibility | Click this button, and then select the element that you want to assert is visible. Click this button again to return to normal operation recording. |
| 4 | assert text | Click this button, and then select the element for which you want to assert that it contains specific text. Click this button again to return to normal operation recording. |
| 5 | assert value | Click this button, and then select the element for which you want to assert that it has a specific value. Click this button again to return to normal operation recording. |
-
Playwright Inspector window
Allows you to record Web scenarios.
The following table shows the buttons used to record Web scenarios. For details, see the Playwright documentation.

| Item number | Button | How to operate |
|---|---|---|
| 1 | Record | Same as the browser window. |
| 2 | assert visibility | Same as the browser window. |
| 3 | assert text | Same as the browser window. |
| 4 | assert value | Same as the browser window. |
-
■Notes
-
A Web scenario created by Web scenario creation function cannot determine the HTTP status code. Therefore, even if "404 Not Found" or "500 Internal Server Error" is returned, the Web operation may be determined to be successful.
-
When you use Web scenario creation function to verify Web page transitions, you cannot detect whether a page transition succeeded or failed with a Web scenario such as the following:
<Example of operation to check>
On the integrated operation viewer login page (URL: 'http://hostname:20703/login'), enter your registered username and password to verify that you can log in successfully.
<Part of the code that Codegen writes to the Web scenario>

```typescript
test('test', async ({ page }) => {
  await page.goto('http://hostname:20703/login');
  await page.locator('input[name="username"]').fill('username');
  await page.locator('input[name="password"]').fill('password');
  await page.getByRole('button', { name: 'Login' }).click();
});
```

When the above Web scenario is played back, the click on the Login button is played, and playback terminates regardless of whether the post-login screen is displayed. Therefore, the success or failure of the page transition cannot be detected.
The following are the actions to be taken to verify a successful page transition:
Use assertion of Codegen to add an action that asserts that the page displays its own elements after the page transition.
For details on how to record assertions, see ■Record of assertion.
Here is the above example modified:
<Part of the code that Codegen writes to the Web scenario>

```typescript
test('test', async ({ page }) => {
  await page.goto('http://hostname:20703/login');
  await page.locator('input[name="username"]').fill('username');
  await page.locator('input[name="password"]').fill('password');
  await page.getByRole('button', { name: 'Login' }).click();
  await expect(page.getByRole('button', { name: 'Logout' })).toBeVisible();
});
```

In the modified Web scenario above, an operation was added that asserts that the page displayed after the login process shows the Logout button that should appear on that page. If the assertion on the Logout button fails, the scenario can detect that the page transition after login failed.
-
When URL transitions are recorded in Codegen, if access to the specified URL is transferred (redirected#) to another URL, the URL transition to the redirection source may not be recorded; only the URL transition to the redirection destination may be recorded.
If Codegen records operations that are redirected to a different URL by the server of the specified URL, the monitoring scope cannot include the redirection from the redirection source to the destination.
- #
-
Refers to redirection performed by the HTTP protocol using an HTTP status code (in the 300 range) and the Location header field.
The following examples show cases where the URL transition to the redirection source is not recorded and only the URL transition to the redirection destination is recorded.
-
When the server redirects from a URL to a new URL due to, for example, the relocation of a monitored site
-
When a forward slash (/) is missing at the end of the URL specified in Codegen and the server automatically adds it and redirects to the correct URL
Note that redirects that are not performed by the HTTP protocol, such as the following, do not fall under this precaution; the URL transitions of both the redirection source and the redirection destination are recorded.
-
HTML redirection using the HTML <meta> element
-
JavaScript redirection executed by setting a URL string in the window.location property from a client script such as JavaScript
■ Web operation information collection function (Web exporter)
Web operation information collection function (Web exporter) executes the scenario in the Web scenario file created beforehand, triggered by a scrape request from the Prometheus server, and returns the execution result as the scrape result. Detailed behavior during scenario execution is output as a trace and can be viewed by the user with the trace viewer function.
- ■Acquisition items
-
The metrics that can be retrieved with Web exporter (Web operation information collection function) are probe_webscena_success (whether the probe was successful#1) and probe_webscena_duration_seconds (the number of seconds taken by the Web scenario probe#2).
- #1
-
Signifies the success or failure of the entire collection, including preparation for collection (such as process startup).
- #2
-
If the collection fails, the metric may not be retrieved.
Web exporter retrieval items are defined in metric definition file (metrics_web_exporter.conf) of Web exporter. For details, see Web exporter Metric Definition File (metrics_web_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
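For reference, the following PromQL expressions are a minimal sketch of how these metrics might be used as alert conditions; the 30-second threshold is a hypothetical value, not a product default.

```
# Web scenario probe failed
probe_webscena_success == 0

# Web scenario took longer than 30 seconds (hypothetical threshold)
probe_webscena_duration_seconds > 30
```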
- ■Monitoring when a monitoring target is temporarily stopped
-
To suppress error detection during a power outage or maintenance of the monitoring target, stop collecting operation information for that target.
You can stop the collection of operation information by deleting the applicable monitoring target from targets in the Web exporter discovery configuration file (jpc_file_sd_config_web.yml). For details, see Web exporter discovery configuration file (jpc_file_sd_config_web.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
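As an illustration only, assuming the discovery configuration file follows the standard Prometheus file_sd format, deleting (or commenting out) an entry under targets stops collection for that target. The host names below are hypothetical:

```yaml
- targets:
    - 'webapp1.example.com'     # collected
  # - 'webapp2.example.com'     # deleted/commented out: collection stopped
```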
■ Web scenario execution function
Playwright is used to perform Web scenario execution function.
The Playwright configuration file specifies the parameters for Web scenario execution function.
For Playwright configuration file, see Playwright configuration file (jpc_playwright.config.ts) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
- ■Trace
-
Web scenario execution function outputs a trace during Web scenario execution to a trace file in the following locations. The trace shows the results of the operations performed in the Web scenario file and the HTTP communication.
-
For physical hosts
Agent-path\logs\web_exporter\trace\Web-scenario-filename-test-project-name-number-of-retries_generation-number\trace.zip
-
For logical hosts
shared-folder\jp1ima\logs\web_exporter\trace\Web-scenario-filename-test-project-name-number-of-retries_generation-number\trace.zip
- Web-scenario-filename
-
If the Web scenario file name ends with ".spec.ts", the name without ".spec.ts" is used.
- project-name
-
The character string specified in name parameter of Playwright configuration file is set.
Spaces, control characters, and the following characters are converted to a hyphen (-).
! " # $ % & ' ( ) * + , . / : ; < = > ? @ [ \ ] ^ _ { | } ~
- number-of-retries
-
Used if the retries parameter of the Playwright configuration file is 1 or more. retry1, retry2, retry3, ... is set according to the number of retries when Web scenario execution fails.
The "-number-of-retries" part is added only when retrying. Therefore, it is not added to the trace of the first run of the Web scenario in each scrape.
- generation-number
-
A 4-digit number is set.
For the number of generations of traces to be saved, see tracenum in Web exporter configuration file (jpc_web_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
The file size of a trace is a few MB (it varies greatly depending on the monitored content). For a Web scenario that logs in to and logs off from the Intelligent Integrated Management Base, it is approximately 2 MB per scenario. If 2,000 generations with 0 retries are retained at 6-minute intervals (the defaults), approximately 4 GB of disk space is required.
-
- ■Trace Viewer
-
Web exporter trace file can be referenced in the trace viewer.
The trace viewer is used to investigate the details when an error is detected.
For details about the trace viewer, see 3.15.1(1)(n) Trace Viewer Function (playwright show-trace).
■ Monitoring with Other Monitoring Function
Web scenario monitoring function allows you to monitor Web contents from the user's point of view, but it does not allow detailed monitoring of HTTP communication (such as name-resolution times or certificate expiration) or monitoring of the inside of the monitoring target. Therefore, if an error occurs, you cannot investigate its cause using only the metric information acquired by Web scenario monitoring function.
For example, also monitor HTTP communication using the Blackbox exporter and monitor the inside of the monitored systems (HTTP servers and DB servers) using the log trapper of Fluentd.
■ Handling of Public Key Infrastructure (PKI) Certificates Used in TLS Communication
If the monitoring target is an HTTPS server, register the following certificates in the OS (for Windows, register them in the certificate store).
-
CA certificate of the certificate authority that issued the server certificate
-
Client certificate and private key (if the HTTPS server requires a client certificate during the TLS handshake)
For details about how to register them in the OS, see the documentation for your OS.
■ Web Scenarios for HTTP Authentication with Passwords
If the monitored Web contents require HTTP authentication with a username and password (such as Basic authentication), enter the username and password in the URL field of Web scenario creation function in the following format:
http://username:password@domain-name:port/Web-content-path
■ Handling passwords
Passwords used for HTTP authentication and for Web contents' own authentication (when a username and password are entered on a form) are stored in the Web exporter configuration file, the Web scenario file, and the trace file. When providing this information to others for failure investigation, mask the password part, for example by replacing it with a different character string, to prevent leakage.
■ Configuring HTTP Proxies
To use an HTTP proxy server for communication from the JP1/IM - Agent host to the monitoring target, set the "proxy" item in the Playwright configuration file (jpc_playwright.config.ts).
For details about Playwright configuration file, see Playwright configuration file (jpc_playwright.config.ts) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
For details on configuration file editing procedure, see To edit the configuration files (for Windows) in 1.19.3(1)(a) Common way to setup in the JP1/Integrated Management 3 - Manager Configuration Guide.
■ About Reviewing Web scenarios
Web scenario monitoring function does not provide the ability to test a Web scenario independently. Actually monitor with the Web scenario and make sure that the monitoring is successful. Refer to the probe_webscena_success metric to determine whether monitoring is normal.
■ Timeout Settings and User Tasks When Timeout Occurs
Web scenario creation support function uses timeouts to interrupt collection of operation information (execution of Web scenarios) that takes too long.
The following parameters relate to timeouts:
| Setting location | Parameter name |
|---|---|
| Prometheus configuration file (jpc_prometheus_server.yml)#1 | scrape_timeout (timeout period required for a scrape) |
| web_exporter command options#2 | --timeout-offset (the number of seconds subtracted from the Prometheus scrape_timeout value, that is, the offset subtracted from the timeout time). It is fixed at 0.5 seconds and cannot be changed by the user. |
- #1
-
For details about Prometheus configuration file parameters, see Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
- #2
-
For details about the options of the web_exporter command, see Service definition file (jpc_program-name_service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
The timeout setting is always applied, and collection is interrupted when the timeout period is exceeded. As a result, collection never continues indefinitely.
The timeout period is the scrape_timeout value minus --timeout-offset (0.5 seconds).
This timeout period must include the execution time of the processing required to collect the operation information. Collection of operation information includes operating the Web contents (browser operations) according to the Web scenario.
In practice, we recommend that you set a timeout about 30 seconds longer than the time the actual Web scenario takes to run, because browsers and other processes need time to start.
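For example, if the Web scenario itself takes about 60 seconds to play back, the following excerpt of the Prometheus configuration file is a minimal sketch of the relevant settings; the values shown are illustrative, not definitive defaults:

```yaml
scrape_configs:
  - job_name: jpc_web_probe
    scrape_interval: 6m    # collect every 6 minutes
    scrape_timeout: 90s    # about 60s of scenario playback plus a 30s margin;
                           # the effective limit is 90s minus the fixed 0.5s --timeout-offset
```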
If processing is aborted due to a timeout, one or more of the following messages are output to the logs. In this case, the probe_webscena_success metric may be 0 (failed), or the metric may not be sent. Check the following log files to see whether processing was aborted by a timeout.
| Log file | Message |
|---|---|
| web_exporter log | KNBC20144-E An error occurred while an internal command was executing. (maintenance information = exit status 1) |
| web_exporter log | KNBC20147-E An error occurred while an internal command was executing. (message = Test timeout of milliseconds exceeded., ...) |
| Prometheus server log | msg="Scrape failed" err="Get URL: context deadline exceeded" |
Even if a timeout occurs, the child processes that were started are terminated, so the user does not need to terminate them.
■ Notes
-
The following monitoring cannot be performed using Web scenario monitoring function:
-
Monitoring of Web contents that cannot be used with the browsers supported by JP1/IM - Agent
-
Monitoring of Web contents that behave differently from when the Web scenario was created
-
Monitoring HTTP status codes
-
Monitoring Web sites using external authentication providers for authentication
-
-
If a timeout occurs during the collection of operation information, the browser process may remain. In this case, the user must stop the applicable process. For details, see ■Timeout Settings and User Tasks When Timeout Occurs.
(n) Trace Viewer Function (playwright show-trace)
The Trace viewer function provides a visual overview of the operations recorded in the trace during Web scenario execution.
- ■Prerequisites
-
When the user runs the playwright command manually, the current folder must be the Playwright working folder. For Playwright working folders, see Appendix A.4(3) Integrated agent host (Windows).
Run the command as a user with Administrator permissions (run it from the administrator console if the Windows UAC function is enabled).
You can use the playwright show-trace command to perform the trace viewer function.
The playwright show-trace command displays the trace viewer. Users run it from a terminal.
- ■Run Web Scenarios to log tracing
-
To log traces when running Web scenarios, you must specify "on" in the trace option of the Playwright configuration file (jpc_playwright.config.ts) so that a trace is recorded for every test run.
For the format and options of Playwright configuration file, see Playwright configuration file (jpc_playwright.config.ts) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
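The following excerpt is a minimal sketch of this setting, assuming the configuration file follows the standard Playwright configuration format; only the trace setting is shown:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    trace: 'on',   // record a trace for every test run
  },
});
```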
- ■Open the trace
-
You can run the following command to display, in the trace viewer, the trace at the path specified in the command options.
Run the command as a user with Administrator's permissions (if the Windows UAC function is enabled, run the command from the administrator console).
npx playwright show-trace trace-file-path
For details about the parameters that you can specify for the playwright show-trace command, see the following table.
-
npx playwright show-trace command option

| Item | Description | Changeability | What you set up in JP1/IM - Agent | JP1/IM - Agent default value |
|---|---|---|---|---|
| Path to a trace file | Specifies the trace file to be displayed in the trace viewer. | Y | Specifies the path to the output trace file. If it is not specified, drag and drop the trace file onto the displayed HTML page to display the trace. | None |
- Legend:
-
Y: Changeable
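For example, the following command line opens a trace in the trace viewer; the trace file path is hypothetical, following the naming rules described under ■Trace:

```
npx playwright show-trace ..\logs\web_exporter\trace\login_scenario-test-chromium_0001\trace.zip
```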
In the trace viewer, you can see the following information:
-
Action
On the Action tab, you can see which locator was used for each action and how long each action took to execute.
To check how the DOM snapshot changes, hover over the corresponding action in the Web scenario.
If you are investigating or debugging, move the time axis forward or backward and click the action you want to review.
Use the Before and After tabs to see the differences before and after the actions.
-
Screenshots
Screenshots are recorded in the trace and displayed as thumbnail images in chronological order at the top of the trace viewer. You can hover over a thumbnail image to display an enlarged image of each action and state.
Double-click an action to view the time at which the action was executed. When you select multiple actions using the sliders on the timeline, they appear on the Action tab, and you can filter the log to view only the selected actions.
-
Snapshot
By default, tracing is performed with the snapshot option turned off.
If you want to use this function, you must specify true for the snapshots parameter of the Playwright configuration file. For Playwright configuration file, see Playwright configuration file (jpc_playwright.config.ts) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
You can switch the tabs in the center of the screen to see the types of snapshots listed in the following table.
| Type | Description |
|---|---|
| Action | Snapshot at the moment of the input that was executed. Use this type of snapshot to see exactly where Playwright clicked. |
| Before | Snapshot at the time the action was invoked. |
| After | Snapshot after the action. |
-
Source
When you hover over an action in a Web scenario, the code line for that action is highlighted in the Source tabbed page.
-
Call
The Call tab shows the execution time and the locators that were used.
-
Log
Use this tab to view a log of actions, such as scrolling, waiting for an element to appear, become enabled, and become stable, clicking, and filling in values.
-
Error
If Web scenario execution fails, an error message is displayed on the Error tabbed page. The timeline also displays a red line to indicate where the error occurred.
To check the source code line, select Source tabbed page.
-
Console
Browse the console logs for browser and Web scenario runs.
-
Network
The Network tab shows the network requests that were made during the Web scenario.
You can sort the requests by Name, Method, Status, Content Type, Duration, or Size.
Click a request to view information about it, such as the request header, response header, request body, and response body.
If you want to use this function, you must specify true for the snapshots parameter of the Playwright configuration file. For the Playwright configuration file, see Playwright configuration file (jpc_playwright.config.ts) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
-
Metadata
The Metadata tab next to the Action tab provides detailed information about the Web scenario execution, such as the browser, viewport size, and runtime.
start time shows the time when the Web scenario was started. The time displayed in the trace is the date and time of the JP1/IM - Agent host, displayed in "YYYY/MM/DD hh:mm:ss" format. Even if the time zone of the monitored host differs from that of the JP1/IM - Agent host, the date and time of the JP1/IM - Agent host is used.
-
- ■Close the trace
-
To exit the trace viewer, press Ctrl+C in the terminal where the playwright show-trace command was executed, or close the trace viewer window.
(o) VMware exporter (VMware performance data collection capability)
VMware exporter is an Exporter for Prometheus that retrieves performance data from VMware ESXi.
■ Prerequisites
The ports used by VMware exporter must be protected by firewalls, networking configurations, and so on, so that they are not accessed by anything other than the Prometheus server of JP1/IM - Agent.
For the port used by VMware exporter, see vmware_exporter command options in Service definition file (jpc_program-name_service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ Conditions to be monitored
-
VMware vCenter Server is not monitored.
-
VMware exporter target is VMware ESXi. For details about the supported VMware ESXi versions, see the Release Notes.
-
The name of the datastore# managed by VMware ESXi must be the same as the host name. If the datastore name and the host name differ, separate nodes are created for the datastore and the hypervisor, and the available metrics are split between them.
When nodes are divided into datastores and hypervisors, the metrics that can be retrieved for each node are as follows.
-
Data store
vmware_host_size, vmware_host_used, vmware_host_free, vmware_datastore_used_percent
-
Hypervisor
Metrics for hosts, except: vmware_host_size, vmware_host_used, vmware_host_free, vmware_datastore_used_percent
For details about each metric and its description, see VMware exporter metric definition file for host (metrics_vmware_exporter_host.conf) and VMware exporter metric definition file for VM (metrics_vmware_exporter_vm.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
#: If there is more than one datastore, use the name "host-name_any-string".
-
-
Do not use duplicate VM names in the VMs managed by VMware ESXi. If VM names are duplicated, the same node is displayed with multiple monitoring results. Therefore, be sure to set each VM name to a unique name.
■ Acquisition items
The VMware exporter shipped with JP1/IM - Agent has the metrics defined by the VMware exporter defaults.
VMware exporter retrieval items are defined in the metric definition file for host and the metric definition file for VM of VMware exporter. For details, see VMware exporter metric definition file for host (metrics_vmware_exporter_host.conf) and VMware exporter metric definition file for VM (metrics_vmware_exporter_vm.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
Metrics are obtained by using the OSS pyVmomi and the metrics officially provided by VMware vRealize Operations. The vRealize Operations metrics that are used are listed in the following table.
|
Metric Name |
Category |
Description |
Label |
Data source |
|---|---|---|---|---|
|
vmware_datastore_capacity_size |
DATASTORES |
VMware Datastore capacity in bytes (Unit:B) |
dc_name : data-center-name ds_name : datastore-name instance : data-retrieval-address job : job-name |
Get by pyVmomi vmware_datastore_capacity_size of datastore structure |
|
vmware_datastore_freespace_size |
DATASTORES |
VMware Datastore freespace in bytes (Unit:B) |
dc_name : data-center-name ds_name : datastore-name instance : data-retrieval-address job : job-name |
Get by pyVmomi vmware_datastore_freespace_size of datastore structure |
|
vmware_host_num_cpu |
HOSTS |
VMware Number of processors in the Host |
dc_name : data-center-name host_name : host-name instance : data-retrieval-address job : job-name |
Get by pyVmomi vmware_host_num_cpu of hosts structure |
|
vmware_host_memory_usage |
HOSTS |
VMware Host Memory usage in Mbytes (Unit:MB) |
dc_name : data-center-name host_name : host-name instance : data-retrieval-address job : job-name |
Get by pyVmomi vmware_host_memory_usage of hosts structure |
|
vmware_host_memory_max |
HOSTS |
VMware Host Memory Max availability in Mbytes (Unit:MB) |
dc_name : data-center-name host_name : host-name instance : data-retrieval-address job : job-name |
Get by pyVmomi vmware_host_memory_max of hosts structure |
|
vmware_host_mem_vmmemctl_average |
HOSTS |
The total amount of memory currently used for virtual machine memory control. (Unit:KB) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi mem.vmmemctl.average of performance counters |
|
vmware_vm_mem_swapped_average |
VMS |
The amount of unreserved memory in kilobytes. (Unit:KB) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi mem.swapped.average of performance counters |
|
vmware_host_net_bytesRX_average |
HOSTS |
Average amount of data received per second. (Unit:KBps) |
dc_name : data-center-name host_name : host-name |
Get by pyVmomi vmware_host_net_bytesRX_average of performance counters |
|
vmware_host_net_bytesTX_average |
HOSTS |
Average amount of data transferred per second. (Unit:KBps) |
dc_name : data-center-name host_name : host-name |
Get by pyVmomi vmware_host_net_bytesTX_average of performance counters |
|
vmware_vm_mem_active_average |
VMS |
The amount of memory that is being used effectively. (Unit:KB) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi mem.active.average of performance counters |
|
vmware_vm_guest_disk_capacity |
VMGUESTS |
Disk capacity metric per partition (Unit:B) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi vmware_vm_guest_disk_capacity of vmguests structure |
|
vmware_vm_guest_disk_free |
VMGUESTS |
Disk metric per partition (Unit:B) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi vmware_vm_guest_disk_free of vmguests structure |
|
vmware_vm_mem_vmmemctl_average |
VMS |
The total amount of memory currently used for virtual machine memory control. (Unit:KB) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi mem.vmmemctl.average of performance counters |
|
vmware_vm_mem_consumed_average |
VMS |
The amount of host memory consumed by the virtual machine for guest memory. (Unit:KB) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi mem.consumed.average of performance counters |
|
vmware_vm_net_transmitted_average |
VMS |
The average amount of data transferred per second. (Unit:KBps) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi net.transmitted.average of performance counters |
|
vmware_vm_net_received_average |
VMS |
The average amount of data received per second. (Unit:KBps) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi net.received.average of performance counters |
|
vmware_vm_power_state |
VMS |
VMWare VM Power state (On / Off) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi vmware_vm_power_state of vms structure |
|
vmware_host_cpu_used_summation |
HOSTS |
Used CPU (Unit:msec) |
dc_name : data-center-name host_name : host-name instance : data-retrieval-address job : job-name |
Get by pyVmomi cpu.used.summation of performance counters |
|
vmware_vm_cpu_ready_summation |
VMS |
Time spent in VMware host ready state. (Unit:msec) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi cpu.ready.summation of performance counters |
|
vmware_vm_num_cpu |
VMS |
VMWare Number of processors in the virtual machine |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi vmware_vm_num_cpu of vms structure |
|
vmware_vm_memory_max |
VMS |
VMWare VM Memory Max availability in Mbytes (Unit:MB) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi vmware_vm_memory_max of vms structure |
|
vmware_vm_max_cpu_usage |
VMS |
VMWare VM Cpu Max availability in hz (Unit:hz) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi vmware_vm_max_cpu_usage of vms structure |
|
vmware_vm_template |
VMS |
VMWare VM Template (true / false) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi vmware_vm_template of vms structure |
|
vmware_host_cpu_usage_average |
-- |
Average CPU usage |
dc_name : data-center-name host_name : host-name instance : data-retrieval-address job : job-name |
Get by pyVmomi cpu.usage.average of performance counters |
|
vmware_host_disk_write_average |
-- |
The amount of data written to disk during the performance interval. (Unit:KBps) |
dc_name : data-center-name host_name : host-name instance : data-retrieval-address job : job-name |
Get by pyVmomi disk.write.average of performance counters |
|
vmware_host_disk_read_average |
-- |
The amount of data read during the performance interval. (Unit:KBps) |
dc_name : data-center-name host_name : host-name instance : data-retrieval-address job : job-name |
Get by pyVmomi disk.read.average of performance counters |
|
vmware_vm_cpu_usage_average |
-- |
Average CPU usage |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi cpu.usage.average of performance counters |
|
vmware_vm_disk_write_average |
-- |
The amount of data written to disk during the performance interval. (Unit:KBps) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi disk.write.average of performance counters |
|
vmware_vm_disk_read_average |
-- |
The amount of data read during the performance interval. (Unit:KBps) |
dc_name : data-center-name ds_name : datastore-name host_name : host-name instance : data-retrieval-address job : job-name vm_name : virtual-machine-name |
Get by pyVmomi disk.read.average of performance counters |
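As a reference only, the following PromQL expressions sketch how the metrics above can be combined; they are illustrative queries, not expressions predefined in the product:

```
# Host memory usage ratio (%)
100 * vmware_host_memory_usage / vmware_host_memory_max

# Datastore free-space ratio (%)
100 * vmware_datastore_freespace_size / vmware_datastore_capacity_size
```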
■ Obfuscation of VMware exporter passwords
The VMware exporter shipped with JP1/IM - Agent manages the passwords for accessing VMware ESXi from VMware exporter by using the secret obfuscation function. For details, see 3.15.10 Secret obfuscation function.
(p) Windows exporter (Hyper-V monitoring function)
Hyper-V monitoring function monitors Hyper-V activity using the hyperv collector of Windows exporter.
■ Prerequisites
The port used by the Hyper-V monitoring function must be protected by a firewall or network configuration so that it cannot be accessed by anyone other than the Prometheus server of JP1/IM - Agent.
For details about the ports used by Hyper-V monitoring function, see the explanation of windows_exporter command options (Hyper-V monitoring) in Service definition file (jpc_program-name_service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ Conditions to be monitored
-
For details about the versions of Hyper-V that Hyper-V monitoring function supports as targets, see the Release Notes.
-
The following rules apply to VM naming:
-
The VM name must be the same as the host name of the guest OS.
-
Do not set a VM name containing a hyphen (-).
-
The name of the disk managed by Hyper-V must be the same as the VM name.
If the disk name and the VM name differ, the following metrics cannot be displayed for the VM:
- hyperv_vm_device_written
- hyperv_vm_device_read
For details about the individual metrics, see "Acquisition items" below.
If there are multiple disks, use the name "host-name_any-string".
-
-
If a VM is moved from a monitored host, for example by live migration, that VM can no longer be monitored. You can monitor the VM at the destination by making it a monitoring target.
-
Data is not collected for VMs that have never been started, and no nodes are created for them. Therefore, the tree must be updated when a VM is started for the first time.
-
Only the VMs of the host on which JP1/IM - Agent resides can be monitored. VMs in nested configurations are not monitored.
■ Acquisition items
Hyper-V monitoring function obtains the Hyper-V metrics defined by the Windows exporter (Hyper-V monitoring) defaults.
Windows exporter (Hyper-V monitoring) retrieval items are defined in the metric definition file for host and the metric definition file for VM of Windows exporter (Hyper-V monitoring). For details, see Windows exporter (Hyper-V monitoring) metric definition file (metrics_windows_exporter_hyperv_host.conf) and Windows exporter (Hyper-V monitoring) metric definition file for VM (metrics_windows_exporter_hyperv_vm.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
The following table lists the metrics that can be specified in a PromQL expression in the definition files. For details about "Collector" in the table, see the description under ■Collector below.
|
Metric Name |
Collector |
Contents to be acquired |
Type |
Label |
|---|---|---|---|---|
|
windows_hyperv_vm_cpu_total_run_time |
hyperv |
The time spent by the virtual processor in guest and hypervisor code |
gauge |
instance: instance-identification-string job: job-name core: coreid vm: virtual-machine-name |
|
windows_hyperv_vm_device_bytes_written |
hyperv |
The total number of bytes that have been written per second on this virtual device |
counter |
instance: instance-identification-string job: job-name vm_device: virtual-disk-file-path |
|
windows_hyperv_vm_device_bytes_read |
hyperv |
The total number of bytes that have been read per second on this virtual device |
counter |
instance: instance-identification-string job: job-name vm_device: virtual-disk-file-path |
|
windows_hyperv_host_cpu_total_run_time |
hyperv |
The time spent by the virtual processor in guest and hypervisor code |
gauge |
instance: instance-identification-string job: job-name core: coreid |
|
windows_hyperv_vswitch_bytes_received_total |
hyperv |
The total number of bytes received per second by the virtual switch |
counter |
instance: instance-identification-string job: job-name vswitch: virtual-switch-name |
|
windows_hyperv_vswitch_bytes_sent_total |
hyperv |
The total number of bytes sent per second by the virtual switch |
counter |
instance: instance-identification-string job: job-name vswitch: virtual-switch-name |
|
windows_cs_logical_processors |
cs |
Number of installed logical processors |
gauge |
instance: instance-identification-string job: job-name |
|
windows_hyperv_vm_cpu_hypervisor_run_time |
hyperv |
The time spent by the virtual processor in hypervisor code |
gauge |
instance: instance-identification-string job: job-name core: coreid vm: virtual-machine-name |
■ Collector
Windows exporter (Hyper-V monitoring) has a built-in collection process called a "collector" for each monitored resource, such as CPU and memory.
You must enable the collectors that correspond to the metrics you want to collect from the table above. You can also disable collectors for metrics that you do not want to collect, to suppress unnecessary collection.
Whether each collector is enabled or disabled can be specified with the "--collectors.enabled" option on the Windows exporter (Hyper-V monitoring) command line or in the "collectors.enabled" item in the Windows exporter (Hyper-V monitoring) configuration file (jpc_windows_exporter_hyperv.yml).
For details about Windows exporter (Hyper-V monitoring) command-line options, see the description of windows_exporter command options (Hyper-V monitoring) in Service definition file (jpc_program-name.service.xml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
For details about Windows exporter (Hyper-V monitoring) configuration file entry "collectors.enabled", see the description of item collectors in Windows exporter (Hyper-V monitoring) configuration file (jpc_windows_exporter_hyperv.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
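As an illustration, the following excerpt is a minimal sketch of the "collectors.enabled" item, assuming the configuration file follows the windows_exporter YAML format; it enables only the hyperv and cs collectors used by the metrics in the table above:

```yaml
collectors:
  enabled: hyperv,cs    # comma-separated list of collectors to enable
```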
■ Notes
Because Hyper-V monitoring monitors the Hyper-V on JP1/IM - Agent's own host, when you use HA host clusters or live migration, you must deploy JP1/IM - Agent on the monitored hosts according to the Hyper-V configuration that you want to monitor.
When the Hyper-V configuration is changed, the tree must be updated after the first startup of the VM to be monitored.
(q) SQL exporter (Microsoft SQL Server monitoring function)
SQL exporter is an Exporter for Prometheus that retrieves performance data from Microsoft SQL Server.
- About the number of sessions
-
When SQL exporter monitors Microsoft SQL Server, connections are made according to the number of connections defined in the SQL exporter configuration file (jpc_sql_exporter.yml). If data is acquired within the session retention time defined in that file, the same session is reused.
For details about SQL exporter configuration file, see SQL exporter configuration file (jpc_sql_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ Supported targets and configurations
The target is an instance of Microsoft SQL Server. Monitoring is performed in units of instances, and the maximum number of monitored instances is 10.
For details about supported Microsoft SQL Server versions and editions, see the Release Notes for JP1/IM - Agent.
The following shows Microsoft SQL Server configurations that are supported for monitoring.
-
Monitoring a single host (including remote monitoring)
-
Monitoring multiple hosts (including remote monitoring)
In a mirrored configuration, you can monitor both the principal database and the secondary database by setting both as monitoring targets. However, because the instances are different, they are collected as separate nodes in the monitoring tree.
If you use the SQL Server AlwaysOn Availability Groups function, you can monitor both the primary and secondary databases by setting them as monitoring targets. However, because the instances are different, they are collected as separate nodes in the monitoring tree.
■ Acquisition items
The metrics that can be retrieved with the SQL exporter shipped with JP1/IM - Agent are the metrics defined by the SQL exporter defaults and the metrics listed below.
-
mssql_database_detail_process_count
-
mssql_global_server_summary_perc_busy
-
mssql_global_server_summary_packet_errors
-
mssql_server_detail_blocked_processes
-
mssql_server_overview_cache_hit
-
mssql_transaction_log_overview_log_space_used
SQL exporter retrieval items are defined in the metric definition file of SQL exporter. For details, see SQL exporter metric definition file (metrics_sql_exporter.conf) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
The following table lists the metrics that can be specified in a PromQL expression in the definition file. The value of each metric is obtained by executing the SQL statement shown in the table on Microsoft SQL Server. For details about a metric, contact Microsoft based on the SQL statement of its data source.
|
Metric name |
Contents to be acquired |
Label |
Data source (SQL statement) |
|---|---|---|---|
|
mssql_local_time_seconds |
Local time in seconds since epoch (UNIX time). |
none |
SELECT DATEDIFF(second, '19700101', GETUTCDATE()) AS unix_time |
|
mssql_connections |
Number of active connections. |
none |
SELECT DB_NAME(sp.dbid) AS db, COUNT(sp.spid) AS count FROM sys.sysprocesses sp GROUP BY DB_NAME(sp.dbid) |
|
mssql_deadlocks |
Number of lock requests that resulted in a deadlock. |
none |
SELECT cntr_value FROM sys.dm_os_performance_counters WITH (NOLOCK) WHERE counter_name = 'Number of Deadlocks/sec' AND instance_name = '_Total' |
|
mssql_user_errors |
Number of user errors. |
none |
SELECT cntr_value FROM sys.dm_os_performance_counters WITH (NOLOCK) WHERE counter_name = 'Errors/sec' AND instance_name = 'User Errors' |
|
mssql_kill_connection_errors |
Number of severe errors that caused SQL Server to kill the connection. |
none |
SELECT cntr_value FROM sys.dm_os_performance_counters WITH (NOLOCK) WHERE counter_name = 'Errors/sec' AND instance_name = 'Kill Connection Errors' |
|
mssql_page_life_expectancy_seconds |
The minimum number of seconds a page will stay in the buffer pool on this node without references. |
none |
SELECT top(1) cntr_value FROM sys.dm_os_performance_counters WITH (NOLOCK) WHERE counter_name = 'Page life expectancy' |
|
mssql_batch_requests |
Number of command batches received. |
none |
SELECT cntr_value FROM sys.dm_os_performance_counters WITH (NOLOCK) WHERE counter_name = 'Batch Requests/sec' |
|
mssql_log_growths |
Number of times the transaction log has been expanded, per database. |
none |
SELECT rtrim(instance_name) AS db, cntr_value FROM sys.dm_os_performance_counters WITH (NOLOCK) WHERE counter_name = 'Log Growths' AND instance_name <> '_Total' |
|
mssql_buffer_cache_hit_ratio |
Ratio of requests that hit the buffer cache |
none |
SELECT cntr_value FROM sys.dm_os_performance_counters WHERE [counter_name] = 'Buffer cache hit ratio' |
|
mssql_checkpoint_pages_sec |
Checkpoint Pages Per Second |
none |
SELECT cntr_value FROM sys.dm_os_performance_counters WHERE [counter_name] = 'Checkpoint pages/sec' |
|
mssql_io_stall_seconds |
Stall time in seconds per database and I/O operation. |
none |
|
|
mssql_io_stall_total_seconds |
Total stall time in seconds per database. |
none |
|
|
mssql_resident_memory_bytes |
SQL Server resident memory size (AKA working set). |
none |
FROM sys.dm_os_process_memory |
|
mssql_virtual_memory_bytes |
Microsoft SQL Server committed virtual memory size. |
none |
FROM sys.dm_os_process_memory |
|
mssql_memory_utilization_percentage |
The percentage of committed memory that is in the working set. |
none |
FROM sys.dm_os_process_memory |
|
mssql_page_fault_count |
The number of page faults that were incurred by the Microsoft SQL Server process. |
none |
FROM sys.dm_os_process_memory |
|
mssql_os_memory |
OS physical memory, used and available. |
none |
FROM sys.dm_os_sys_memory |
|
mssql_os_page_file |
OS page file, used and available. |
none |
FROM sys.dm_os_sys_memory |
|
mssql_database_detail_process_count |
Total number of processes |
none |
FROM master.sys.dm_exec_sessions des WHERE ISNULL(des.database_id,0) <> 0 GROUP BY DB_NAME(ISNULL(des.database_id,0)) |
|
mssql_global_server_summary_perc_busy |
Percentage of CPU busy time. Note: This field cannot acquire the correct value. |
none |
SELECT 100.0 * @@cpu_busy / (@@cpu_busy+ @@idle+ @@io_busy) AS cpu_busy_percent |
|
mssql_global_server_summary_packet_errors |
The number of packet errors |
none |
SELECT @@packet_errors AS count |
|
mssql_server_detail_blocked_processes |
The number of processes waiting due to processes running on Microsoft SQL Server being locked |
none |
SELECT DB_NAME(ISNULL(S.database_id,0)) AS db, SUM(ISNULL(R.blocking_session_id,0)) AS count FROM master.sys.dm_exec_sessions S LEFT OUTER JOIN master.sys.dm_exec_requests R ON S.session_id = R.session_id GROUP BY DB_NAME(ISNULL(S.database_id,0)) |
|
mssql_server_overview_cache_hit |
The percentage of times data pages were found in the data cache |
none |
SELECT 100.0 * ( SELECT cntr_value FROM master.sys.dm_os_performance_counters WHERE RTRIM(object_name) LIKE '%:Buffer Manager' AND RTRIM(LOWER(counter_name)) = 'buffer cache hit ratio' ) / ( SELECT cntr_value FROM master.sys.dm_os_performance_counters WHERE RTRIM(object_name) LIKE '%:Buffer Manager' AND RTRIM(LOWER(counter_name)) = 'buffer cache hit ratio base' ) AS cache_hity_percent |
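For reference, the following is a minimal sketch of an alert rule that could monitor one of these metrics. The group name, alert name, and threshold are assumptions, and the jp1_pc_* labels and annotations required to convert the alert into a JP1 event (see 3.15.1(3)(a)) are omitted for brevity.

    groups:
      - name: sql_exporter                    # assumed group name
        rules:
          - alert: mssql_deadlocks_detected   # assumed alert name
            # mssql_deadlocks is a cumulative counter, so alert when it
            # has increased during the last 5 minutes:
            expr: increase(mssql_deadlocks[5m]) > 0
            for: 1m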
■ Requirements for monitoring Microsoft SQL Server
To monitor Microsoft SQL Server with SQL exporter, you must configure the following settings:
-
Microsoft SQL Server
Set the database character set used by Microsoft SQL Server to one of the following:
-
AL32UTF8 (Unicode UTF-8)
-
JA16SJIS (Japanese-language SJIS)
-
ZHS16GBK (Simplified Chinese GBK)
The only supported authentication method is SQL Server authentication with a user ID and password registered in Microsoft SQL Server. Windows authentication is not supported.
-
Users used to access Microsoft SQL Server
Grant the permissions below to the users you want to use to connect to Microsoft SQL Server.
-
Login permissions
CONNECT SQL
-
SELECT permissions on the following tables
|
Table name |
Permissions |
|---|---|
|
sys.sysprocesses |
VIEW SERVER STATE |
|
sys.dm_os_performance_counters |
VIEW SERVER PERFORMANCE STATE |
|
sys.dm_io_virtual_file_stats |
VIEW SERVER PERFORMANCE STATE |
|
sys.master_files |
CREATE DATABASE, ALTER ANY DATABASE, or VIEW ANY DEFINITION |
|
sys.dm_os_process_memory |
VIEW SERVER PERFORMANCE STATE |
|
sys.dm_os_sys_memory |
VIEW SERVER PERFORMANCE STATE |
■ Obfuscation of Microsoft SQL Server passwords
The SQL exporter shipped with JP1/IM - Agent manages the password used to access Microsoft SQL Server with the secret obfuscation function. For details, see 3.15.10 Secret obfuscation function.
■ Notes
-
If Microsoft SQL Server is not installed or configured, or if Microsoft SQL Server is not running, no performance information is collected.
-
If the monitored Microsoft SQL Server is rebuilding an index while performance information is being collected, Microsoft SQL Server might make the collection request wait for lock release to ensure data integrity. In such cases, the lock-release wait is cleared for processes that Microsoft SQL Server determines to have little impact, but the performance information collection request is rolled back, and collection of performance information might fail.
-
If a table is created during a transaction on Microsoft SQL Server and the operation is not committed, data collection fails because a shared lock is placed on the system table. In this case, data collection might not be possible until the operation is committed.
-
A shared lock is placed on the database when performance information is collected. If you attempt to create a new Microsoft SQL Server database at this time, the creation might fail.
(r) Script exporter (job monitoring function)
The job monitoring function works in conjunction with JP1/AJS3 - Manager 13-50 or later to monitor JP1/AJS3 job information as metrics, detect anomalies in the execution time of root jobnets, and visualize the transition of root jobnet execution times in the integrated operation viewer.
Trend data using JP1/AJS3 root jobnet execution time as a metric can be stored in the trend data management DB of JP1/IM - Manager and displayed and monitored on the Trends and Dashboards tabs of the integrated operations viewer.
Metrics of JP1/AJS3 job information displayed in the integrated operation viewer are defined in the JP1/AJS3 metric definition file (metrics_ajs_rootjobnet.conf). For details, see Setup for linking JP1/IM3 in the JP1/Automatic Job Management System 3 Linkage Guide.
Script exporter is an Exporter that runs scripts residing on a host and retrieves their results.
The Script exporter is installed on the same host as the JP1/IM - Agent, and upon a scrape request from the Prometheus server, it executes a script on that host to retrieve the results and returns them to the Prometheus server.
When JP1/AJS3 linkage is configured and the job monitoring function is used, the Prometheus server can collect performance data of JP1/AJS3 job information via Script exporter after a JP1/AJS3 root jobnet finishes executing.
Note that the maximum number of JP1/AJS3 root jobnets for which a single integrated agent can collect performance data is 5,000. To collect performance data for more than 5,000 and up to 10,000 root jobnets with a single integrated agent, set alert rules to be evaluated at intervals of two minutes or longer (a hedged configuration sketch follows). For details about how often alert rules are evaluated, see the evaluation_interval entry of the Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
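As a hedged illustration (the surrounding file content is omitted; see the reference above for the exact entry), an evaluation interval of two minutes corresponds to the standard Prometheus global setting:

    # jpc_prometheus_server.yml (excerpt)
    global:
      evaluation_interval: 2m   # evaluate alert rules every two minutes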
You can also configure the integrated agent host and the JP1/AJS3 - Manager host as separate hosts. JP1/Base must be installed on the integrated agent host.
■ Creating IM management nodes for use with the job monitoring function
The IM management nodes of the JP1/AJS3 root jobnets whose job information can be monitored are created by using the adapter command included with the JP1/AJS3 product plug-in. You can create IM management nodes as follows:
-
Set up JP1/AJS3 linkage.
For details about how to set up monitoring of JP1/AJS3 root JobNets, see Setup for linking JP1/IM3 in the JP1/Automatic Job Management System 3 Linkage Guide.
-
Generate tree information from the integrated operations viewer or run the jddcreatetree command.
-
Accept tree information from the integrated operations viewer or run the jddupdatetree command.
■ Tree of IM management node created by the job monitoring function
The IM management node tree created by the job monitoring function is shown below.
All Systems
+ JP1/AJS3-manager-host
| + Job
| | + JP1/AJS3 - Manager
| | + scheduler-service
| | + job-group#1
| | + root-jobnet#2
| + Management Applications
| + JP1/AJS3 - Manager
| + JP1/AJS3 - Manager Scheduler Service
| + scheduler-service
+ JP1/AJS3-agent-host
+ Management Applications
+ JP1/AJS3 - Agent
- #1
-
A job group can have multiple hierarchies.
- #2
-
In JP1/IM - Manager 13-50 and later, a new SID for the configuration information of the IM management node corresponding to the root jobnet is created. However, the tree structure remains the same as in the JP1/AJS3 linkage used in JP1/IM - Manager version 13-11 and earlier. A root jobnet node has two configuration SIDs associated with one tree SID (one beginning with "_JP1PC-IMB_" and one without it).
The types and formats of configuration SID corresponding to IM management node created by the job monitoring function are shown below.
|
Type of configuration SID |
SID format |
|
|---|---|---|
|
Job category |
Root jobnet SID |
_JP1PC-IMB_JP1/IM-manager-host-name/_JP1PC-M_Prometheus-host-name/_JP1PC-AHOST_Exporter-host-name/JP1AJS-M_JP1/AJS3-manager-host-name/_HOST_JP1/AJS3-manager-host-name/_JP1SCHE_scheduler-service-name/_JP1JOBG_job-group-name/_JP1ROOTJOBNET_root-job-net-name# |
- #
-
If a job group is defined with multiple hierarchies, "_JP1JOBG_job-group-name" is repeated depending on the definition.
Because the job monitoring function uses Script exporter, the following IM management node tree is also created.
All Systems
+ JP1/IM-Agent-host
+ Script
| + ajseventmetrics#1
+ Management Applications
+ JP1/IM - agent control base
+ Metric forwarder(Prometheus server)
+ Alert forwarder(Alertmanager)
+ JP1/AJS3 metric collector(Script exporter)#2
- #1
-
Indicates agent SID of Script exporter for job monitoring.
- #2
-
Indicates agent service SID of Script exporter for job monitoring.
If you use Script exporter for the UAP monitoring function in addition to the job monitoring function, an IM management node for Script exporter is also created as the agent service SID Script metric collector(Script exporter), as shown in the following IM management node tree. If you want to monitor whether integrated agent processes are alive, set the related alert definition for each of the IM management nodes Script metric collector(Script exporter) and JP1/AJS3 metric collector(Script exporter); a hedged sketch of such an alert definition follows the tree below. In that case, when Script exporter stops, a JP1 event associated with each IM management node is issued. For details about integrated agent process alive monitoring, see 1.21.2(18) Setup of integrated agent process alive monitoring (for Windows) (optional) and 2.19.2(17) Setup of integrated agent process alive monitoring (for Linux) (optional) in the JP1/Integrated Management 3 - Manager Configuration Guide.
All Systems
+ JP1/IM-Agent-host
+ Script
| + ajseventmetrics#1
+ Platform#2
| + uap_run#3
+ Management Applications
+ JP1/IM - agent control base
+ Metric forwarder(Prometheus server)
+ Alert forwarder(Alertmanager)
+ JP1/AJS3 metric collector(Script exporter)#4
+ Script metric collector(Script exporter)#5
- #1
-
Indicates agent SID of Script exporter for job monitoring.
- #2
-
Indicates agent SID of Script exporter for user-specified UAP monitoring.
- #3
-
Indicates agent SID of Script exporter for UAP monitoring.
- #4
-
Indicates agent service SID of Script exporter for job monitoring.
- #5
-
Indicates agent service SID of Script exporter for UAP monitoring.
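The following is a minimal sketch of such an alive-monitoring alert definition. The group name, alert name, and waiting time are assumptions, and the jp1_pc_* labels and annotations required for JP1 event conversion (see 3.15.1(3)(a)) are omitted. It relies on the up metric, which becomes 0 when a scrape of the exporter fails.

    groups:
      - name: script_exporter_alive       # assumed group name
        rules:
          - alert: script_exporter_down   # assumed alert name
            expr: up{job="jpc_script"} == 0
            for: 3m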
■ Viewing performance data for JP1/AJS3 Job Information
When JP1/AJS3 linkage is set up and a JP1/AJS3 root jobnet has been executed, selecting the IM management node of the root jobnet (with the IM management nodes of JP1/AJS3 reflected in the tree of the integrated operation viewer) lets you check the metrics of the job information related to that root jobnet on the Dashboard tab or the Trend tab. You can also customize the Dashboard tab or create a new dashboard to check the trend data of job information metrics in the various panels.
When customizing the Dashboard tab or creating a new dashboard, we recommend specifying no more than 20 root jobnets as target nodes in each panel. If you specify more than 20, displaying the dashboard panels takes time. In addition to the number of target nodes, the time needed to display the dashboard panels also depends on the following conditions.
-
Fixed value of the range vector selector specified in PromQL (promql in the metric definition file) of the target metric#1
-
Number of samples of performance data associated with the target node of target metric
-
Number of performance data label sets associated with the target node of target metric
-
Display range setting duration in the panel#2
-
Number of plots in the panel#2
- #1
-
For details about JP1/AJS3 metric definition file (metrics_ajs_rootjobnet.conf), see Setup for linking JP1/IM3 in the JP1/Automatic Job Management System 3 Linkage Guide. For details on specifying the range vector selector, see Consolidation display of trend data with dynamic range vectors in 3.15.6(4)(c) About Performance Data to Retrieve.
- #2
-
For details on the various panel settings, see Target node of each panel in 2.4.3 Add panel window in the JP1/Integrated Management 3 - Manager GUI Reference.
The following table lists the dashboards that are automatically generated and displayed on the Dashboard tab and the information displayed on the Trend tab when IM management node that is created by JP1/AJS3 linkage is selected in integrated operation viewer. Depending on the support of the job monitoring function, the displayed content differs between JP1/IM - Manager 13-11 or earlier and 13-50 or later.
|
Selected node |
Panels from the second row of dashboards that are automatically generated and displayed on the Dashboard tab#1 |
Trend tab |
|||
|---|---|---|---|---|---|
|
JP1/IM - Manager Version |
JP1/IM - Manager Version |
||||
|
13-11 or earlier |
13-50 or later |
13-11 or earlier |
13-50 or later |
||
|
Host |
JP1/AJS3-manager-host |
None#2 |
None#2 |
None |
None |
|
Job category |
Job |
None#2 |
None#2 |
None |
None |
|
JP1/AJS3 - Manager |
None |
None |
None |
None |
|
|
scheduler-service |
None |
None |
None |
None |
|
|
job-group |
None |
None |
None |
None |
|
|
root-jobnet |
None |
Displays metric trend panel#3 associated with the root jobnet node |
None |
Displays metric associated with the root jobnet node |
|
|
Management Applications category |
Management Applications |
None |
None |
None |
None |
|
JP1/AJS3 - Manager |
None |
None |
None |
None |
|
|
JP1/AJS3 - Manager Scheduler Services |
None |
None |
None |
None |
|
|
scheduler-service |
None |
None |
Displays metric associated with the scheduler service |
Displays metric associated with the scheduler service |
|
|
Host |
JP1/AJS3-agent |
None#2 |
None#2 |
None |
None |
|
Management Applications category |
Management Applications |
None |
None |
None |
None |
|
JP1/AJS3 - Agent |
None |
None |
None |
None |
|
- #1
-
No matter which node you select, the first row of the dashboard shows the Node Status, Alert Information, Numeric and Trend panels.
- #2
-
If the hosts of integrated agent, user-defined Prometheus, or user-defined Fluentd are the same host as the JP1/AJS3 manager host, trend panels for the metrics associated with those nodes are displayed. If there is more than one terminal node under the selected node and the same metric relates to them, they are displayed in one panel. Note that panels for metrics related to the root jobnet are not displayed.
- #3
-
If the Display range setting of the dashboard is at its default of 1 hour, the metric trend panel associated with a root jobnet node displays seven days of data at one-day intervals.
In the integrated operation viewer, the dashboards that are automatically generated and displayed on the Dashboard tab when a node of Exporter or Fluentd or a node of a JP1/AJS3 root jobnet is selected differ as follows in how panels are displayed for the metrics associated with the terminal nodes under the selected node.
|
Selected node |
Terminal node under the node of Exporter or Fluentd |
Terminal node under the node of JP1/AJS3 root jobnet |
||
|---|---|---|---|---|
|
Panel view#1 |
Panel setting#1 |
Panel view#2 |
Panel setting#2 |
|
|
Terminal node |
Displays 1-hour trend data at one-minute intervals#3 |
|
Displays 7-day trend data at one-day intervals#4 |
|
|
Top nodes of the terminal node (except system node) |
No panel display |
Not applicable |
||
- #1
-
Applies to the panel display of metrics other than JP1/AJS3 job information.
- #2
-
Applies to the panel display of JP1/AJS3 job information metrics.
- #3
-
Assume that the dashboard display range is set to 1 hour.
- #4
-
The dashboard display range settings are configured as follows:
-
"Dashboard display range (start time or end time)" setting: Start time
-
"Past or future range of the time difference" setting: Past range
-
"Time difference" setting: 143h (5 days 23 hours)
(s) Whether Prometheus and Exporter support same-host and separate-host configurations
The following table shows whether Prometheus and Exporter can be used when they are configured on the same host and when they are configured on separate hosts.
|
Exporter type |
Configuring Prometheus and Exporter hosts |
||
|---|---|---|---|
|
Same host |
Separate host |
||
|
Exporter provided by JP1/IM - Agent |
Node exporter for AIX |
N |
Y |
|
Exporter other than the above |
Y |
N |
|
|
User-defined Exporter |
Y |
Y |
|
- Legend
-
Y: Supported
N: Not supported
The following configurations are not supported:
-
Configurations in which more than one Prometheus scrapes the same Exporter
-
Exporter# on a remote agent (the host running Exporter and the monitored host are separate hosts)
- #
-
An Exporter on a remote agent is an Exporter whose discovery configuration file contains the entry "jp1_pc_remote_monitor_instance", as sketched below.
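As a hedged sketch (the host name, port, and label value are assumptions; see the description of the discovery configuration file in the reference manual for the exact format), such an entry could look like:

    - targets:
        - "remote-host:20717"   # assumed Exporter address
      labels:
        jp1_pc_remote_monitor_instance: "monitored-instance"   # marks the Exporter as remote monitoring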
Also, if Prometheus and Exporter are configured on different hosts, it is assumed that the ports used by Exporter are protected by firewalls, network configuration, and so on, so that they cannot be accessed by anything other than the Prometheus server of JP1/IM - Agent (for example, by placing the integrated agent host and the Exporter hosts in the same network so that the ports cannot be accessed externally).
(2) Centralized management of performance data
This function allows the Prometheus server to store performance data collected from monitoring targets in the intelligent integrated management database of JP1/IM - Manager. It has the following function:
-
Remote write function
-
In addition, if JP1/IM - Agent 13-01 or later is newly installed, service monitoring performance data is centrally managed by default. When upgrading from JP1/IM - Agent 13-00 to 13-01 or later, you need to configure the settings for service monitoring. For details on where to find the setup instructions, see 3.15.1(1)(c) Windows exporter (Windows performance data collection capability) and 3.15.1(1)(d) Node exporter (Linux performance data collection capability).
(a) Remote write function
This is a function in which the Prometheus server sends performance data collected from monitoring targets to an external database suitable for long-term storage. JP1/IM - Agent uses this function to send performance data to JP1/IM - Manager.
The following shows how to define remote write.
-
Remote write definitions are described in the Prometheus server configuration file (jpc_prometheus_server.yml).
-
Download the Prometheus configuration file from the integrated operation viewer, edit it in a text editor to modify the remote write definition, and then upload it.
The following remote write settings are supported by JP1/IM - Agent. For details about the settings, see Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
|
Setting items |
Description |
|---|---|
|
Remote write destination (required) |
Set the endpoint URL for JP1/IM agent control base. |
|
Remote write timeout period (optional) |
You can set the timeout period if remote write takes a long time. Change it if you are not satisfied with the default value. |
|
Relabeling (optional) |
You can remove unwanted metrics and customize labels. |
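As a hedged sketch of how these settings map onto the standard Prometheus remote_write section (the endpoint URL is a placeholder, and the relabeling rule is an assumed example):

    # jpc_prometheus_server.yml (excerpt)
    remote_write:
      - url: "<endpoint URL of JP1/IM agent control base>"   # remote write destination (required)
        remote_timeout: 30s                                  # timeout period (optional)
        write_relabel_configs:                               # relabeling (optional)
          - source_labels: [__name__]
            regex: "go_.*"     # example: drop internal Go runtime metrics
            action: drop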
(3) Performance data monitoring notification function
This function allows the Prometheus server to monitor performance data collected from monitoring targets against thresholds and notify JP1/IM - Manager. It has the following three functions:
-
Alert evaluation function
-
Alert notification function
-
Notification suppression function
If you add a service to be monitored in an environment where an alert definition for service monitoring is set, the added service is also monitored. If you exclude from monitoring a service for which an alert has fired, you will receive a notification that the fired alert has been resolved.
For examples of defining alerts, see Alert definition example for metrics in Node exporter metric definition file and Alert definition example for metrics in Windows exporter metric definition file in Alert configuration file (jpc_alerting_rules.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. For Linux, alerts are defined differently depending on whether automatic startup (systemctl enable) is enabled for the monitored service. If you want to monitor a service for which automatic startup is disabled, you must create and configure an alert definition for each target.
- - When using the job monitoring function
-
If you want to monitor performance data for job information, the alert rule evaluation interval must be at least one minute. For details about how often alert rules are evaluated, see the evaluation_interval entry of Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
(a) Alert evaluation function
This function monitors performance data collected from monitoring targets at a threshold value.
Define alert rules so that alerts are evaluated, performance data is monitored against thresholds, and alerts are notified.
Alerts can be evaluated by comparing the time series data directly with the thresholds, or by comparing the thresholds with the results of formulas using PromQL#.
- #
-
For details about PromQL, see 2.7.4(7) About PromQL.
For each time series, or for each series generated by evaluating the PromQL expression, an alert status is managed according to the evaluation result, and notification actions are executed according to that status.
There are three alert states: pending, firing, and resolved. When the alert rule condition is first met, the alert enters the "pending" state. If the condition continues to be met (is not resolved) for the time specified in the "for" clause of the alert rule definition, the alert enters the "firing" state.
When the condition is no longer met (resolved), or when the time series disappears, the alert enters the "resolved" state.
The relationship between alert status and notification behavior is as below.
|
Alert status |
Description |
Notification behavior |
|---|---|---|
|
pending |
The threshold is exceeded, but the time specified in the "for" clause of the alert rule definition has not yet passed. |
Alerts are not notified. |
|
firing |
The threshold is exceeded, and the time specified in the "for" clause of the alert rule definition has passed. Alternatively, the threshold is exceeded and no "for" clause is specified for the alert. |
Alerts are notified. |
|
resolved |
The alert rule condition is no longer met. |
A recovery notification is sent. |
The following shows how to define an alert rule.
-
Alert rule definitions are described in the alert configuration file (jpc_alerting_rules.yml) (definitions can also be written in any YAML-format file).
-
Before applying the created definition file to the target environment, check its format and test the alert rules with the promtool command.
-
Download alert configuration file from integrated operation viewer, edit it in a text editor, change the definition of the alert rule, and then upload it.
The following settings apply to the alert rule definitions supported by JP1/IM - Agent. For details about the settings, see Alert configuration file (jpc_alerting_rules.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. There is no default alert rule definition.
|
Setting Item |
Description |
|---|---|
|
Alert Name (required) |
Set the alert name. |
|
Conditional expression (required) |
Set the alert condition expression (threshold). It can be configured using PromQL. |
|
Waiting time (required) |
Set the amount of time to wait after entering the "pending" state before changing to the "firing" state. Change it if you are not satisfied with the default value. |
|
Label (required) |
Set labels to add to alerts and recovery notifications. In JP1/IM - Agent, a specific label must be set. |
|
Annotation (required) |
Set to store additional information such as alert description and URL link. In JP1/IM - Agent, certain annotations must be set. |
Labels and annotations can use the following variables:
|
Variable# |
Description |
|---|---|
|
$labels |
A variable that holds the label key-value pairs of the alert instance. When time series data is specified in the conditional expression of the alert evaluation, any label that the data retains can be specified as the label key.
|
|
$value |
A variable that holds the evaluation value of the alert instance. When firing is notified, it expands to the value at the time firing was detected. When resolved is notified, it expands to the value at the time of the last firing before resolution (not the value at the time of resolution). |
|
$externalLabels |
This variable holds the label and value set in "external_labels" of item "global" in the Prometheus configuration file (jpc_prometheus_server.yml). |
- #
-
Variables are expanded by enclosing them in "{{" and "}}". The following is an example of how to use variables:
description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
■ Alert rule definition for converting to JP1 events
In order to convert the alert to be notified into a JP1 event on the JP1/IM - Manager side, the following information must be set in the alert rule definition.
|
Setting item |
Value to set |
Uses |
|---|---|---|
|
name |
Set any alert group definition name that is unique within the integrated agent. |
Alert group definition name |
|
alert |
Set any alert definition name that is unique within the integrated agent. |
Alert Definition Name |
|
expr |
Set the PromQL statement. It is recommended to set the PromQL statement described in the metric definition file. This way, when the JP1 event occurs, you can display trend information in the Integrated Operation Viewer. |
Firing conditions#
|
|
labels.jp1_pc_product_name |
Set "/HITACHI/JP1/JPCCS" as fixed. |
Set to the product name of the JP1 event. |
|
labels.jp1_pc_severity |
Set one of the following:
|
Set to JP1 event severity#.
|
|
labels.jp1_pc_eventid |
Set any value in the range of 0~1FFF,7FFF8000~7FFFFFFF. |
Set to the event ID of the JP1 event. |
|
labels.jp1_pc_metricname |
Set the metric name. For Yet another cloudwatch exporter, be sure to specify it. The JP1 event is associated with the IM management node in the AWS namespace corresponding to the metric name (or to the first metric name if multiple metric names are specified, separated by commas). |
Set to the metric name of the JP1 event. For yet another cloudwatch exporter, it is also used to correlate JP1 events. |
|
annotations.jp1_pc_firing_description |
Specify the value to be set for the message of the JP1 event when the firing condition of the alert is satisfied. If the length of the value is 1,024 bytes or more, set the string from the beginning to the 1,023rd byte. If the specification is omitted, the message content of the JP1 event is "The alert is firing. (alert = alert name)". You can also specify variables to embed job names and evaluation values. If a variable is used, the first 1,024 bytes of the expanded message are valid. |
It is set to the message of the JP1 event. |
|
annotations.jp1_pc_resolved_description |
Specify the value to be set for the message of the JP1 event when the firing condition of the alert is not satisfied. If the length of the value is 1,024 bytes or more, set the string from the beginning to the 1,023rd byte. If the specification is omitted, the content of the message in the JP1 event is "The alert is resolved. (alert = alert name)". You can also specify variables to embed job names and evaluation values. If a variable is used, the first 1,024 bytes of the expanded message are valid. |
It is set to the message of the JP1 event. |
For an example of setting an alert definition, see Definition example in alert configuration file (jpc_alerting_rules.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
For details about the properties of the corresponding JP1 event, see 3.2.3 Lists of JP1 events output by JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ How to operate in combination with trending-related functions
If you align the PromQL statement described in the metric definition file with the PromQL statement evaluated by the alert evaluation function, and describe the metric name of the corresponding trend data in annotations.jp1_pc_firing_description and annotations.jp1_pc_resolved_description of the alert definition in the alert configuration file, then when the JP1 event for the alert is issued you can check the past changes and current value of the performance value evaluated by the alert on the Trends tab of the integrated operation viewer.
For details about PromQL expression defined in trend displayed related capabilities, see 3.15.6(4) Return of trend data.
For example, if you want Node exporter to monitor CPU usage and to notify you when CPU usage exceeds 80%, create an alert configuration file (alert definition) and a metric definition file as shown in the following examples.
-
Example of description of alert configuration file (alert definition)
    groups:
      - name: node_exporter
        rules:
          - alert: cpu_used_rate(Node exporter)
            expr: 80 < (avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode="system"}[2m])) + avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode="user"}[2m]))) * 100
            for: 3m
            labels:
              jp1_pc_product_name: "/HITACHI/JP1/JPCCS2"
              jp1_pc_component: "/HITACHI/JP1/JPCCS/CONFINFO"
              jp1_pc_severity: "Error"
              jp1_pc_eventid: "0301"
              jp1_pc_metricname: "node_cpu_seconds_total"
            annotations:
              jp1_pc_firing_description: "CPU usage has exceeded the threshold (80%). value={{ $value }}%"
              jp1_pc_resolved_description: "CPU usage has fallen below the threshold (80%)."
-
Example of description of metric definition file
[ { "name":"cpu_used_rate", "default":true, "promql":"(avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode=\"system\"}[2m]) and $jp1im_TrendData_labels) + avg by (instance,job,jp1_pc_nodelabel,jp1_pc_exporter) (rate(node_cpu_seconds_total{mode=\"user\"}[2m]) and $jp1im_TrendData_labels)) * 100", "resource_en":{ "category":"platform_unix", "label":"CPU used rate", "description":"CPU usage.It also indicates the average value per processor. [Units: %]", "unit":"%" }, "resource_ja":{ "category":"platform_unix", "label":"CPU使用率", "description":"CPU使用率(%)。プロセッサごとの割合の平均値でもある。", "unit":"%" } } }When the conditions of the PromQL statement specified in expr of the alert definition are satisfied and the JP1 event of the alert is issued, the message "CPU usage has exceeded the threshold (80%). value = performance-value%" is set in the message of the JP1 event. Users can view this message to view "CPU Usage" trend information and see past changes and current values of CPU usage.
■ Behavior when the service is stopped
If the Alertmanager service is stopped, JP1 events for alerts are not issued. In addition, if the Prometheus server and Alertmanager services are running and an exporter whose alert is firing stops due to a failure, the alert becomes resolved and a normal JP1 event is issued.
When an alert is firing and the Prometheus server service is stopped while Alertmanager is running, a normal JP1 event notifying that the alert is resolved is issued.
For details, see About behavior when the Prometheus server is restarted or stopped while the Alertmanager is running.
■ About behavior when the service is restarted
Even if the Prometheus server, Alertmanager, or Exporter service is restarted while an alert is firing or resolved, no JP1 event is issued if the alert status after the restart is the same as before the restart.
When an alert is firing and the Prometheus server service is restarted while Alertmanager is running, a normal JP1 event notifying that the alert is resolved may be issued.
For details, see About behavior when the Prometheus server is restarted or stopped while the Alertmanager is running.
■ About Considering Performance Data Spikes
Performance data can momentarily spike (to large, small, or negative values). These sudden changes in performance data are commonly referred to as "spikes." In many cases, even if a spike momentarily produces an abnormal value, the value immediately returns to normal and does not need to be treated as an abnormality. A spike can also occur momentarily when performance data is reset, such as when the OS is restarted.
When monitoring such metrics, consider suppressing detection of momentary anomalies by specifying "for" (a grace period before an alert is treated as an anomaly) in the alert rule definition, as in the sketch below.
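The following minimal sketch shows the idea (group name, alert name, metric, and threshold are assumptions): with for: 3m, the condition must hold for three consecutive minutes before the alert enters the firing state, so a one-sample spike does not fire it.

    groups:
      - name: spike_tolerant_rules     # assumed group name
        rules:
          - alert: memory_usage_high   # assumed alert name
            expr: 90 < 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
            for: 3m   # grace period that absorbs momentary spikes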
■ About behavior when the Prometheus server is restarted or stopped while the Alertmanager is running
When an alert is firing and the Prometheus server service is restarted or stopped while Alertmanager is running, a normal JP1 event notifying that the alert is resolved may be issued.
A normal JP1 event is issued when the following condition is met:
-
The sum of the duration of the "for" clause# defined in the alert definition of the firing alert and the duration for which the Prometheus server service is not running (because it is stopped or reloading) is greater than the value of "evaluation_interval" defined in the Prometheus configuration file.
-
#: When no "for" clause is specified for the alert, treat the duration as 0.
■ About behavior when the service is reloaded
Even if the API that reloads the Prometheus server, Alertmanager, or Exporter service is executed while an alert is firing or resolved, no JP1 event is issued.
(b) Alert notification function
This function notifies you when the alert status becomes "firing" or "resolved" after the Prometheus server evaluates the alert.
If the state of an alert changes while JP1/IM - Manager (Intelligent Integrated Management Base) is stopped, notification of firing or resolved may not be performed.
The Prometheus server sends alerts one by one, and the sent alerts are forwarded to JP1/IM - Manager (Intelligent Integrated Management Base) via Alertmanager. Alerts are also retried one by one.
Alerts sent to JP1/IM - Manager are basically sent in the order in which they occurred, but the order may change when multiple alert rules meet their conditions at the same time or when a transmission error occurs and alerts are resent. However, because the alert information includes the time of occurrence, you can determine the order in which alerts occurred.
In addition, if the abnormal condition continues for 7 days, an alert will be re-notified.
The following shows how to define the notification destination of the alert.
-
Alert destinations are described in both the Prometheus configuration file (jpc_prometheus_server.yml) and the Alertmanager configuration file (jpc_alertmanager.yml).
In the Prometheus configuration file, specify the coexisting Alertmanager as the notification destination of the Prometheus server. In the Alertmanager configuration file, specify JP1/IM agent control base as the notification destination of Alertmanager.
-
Download each configuration file from the integrated operation viewer, edit it in a text editor to change the alert notification destination definition, and then upload it.
The following settings define the Prometheus server notification destinations supported by JP1/IM - Agent. For details about the settings, see Prometheus configuration file (jpc_prometheus_server.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
|
Setting items |
Description |
|---|---|
|
Notification destination (required) |
Configure the notification destination Alertmanager. If a host name or IP address is specified for --web.listen-address in the Alertmanager command line options, change localhost to the host name or IP address specified in --web.listen-address.
|
|
Label setting (optional) |
You can add labels. Configure as needed. |
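As a hedged sketch (the port number is an assumption), this notification destination corresponds to the standard Prometheus alerting section:

    # jpc_prometheus_server.yml (excerpt)
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - localhost:20714   # coexisting Alertmanager (assumed port)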
The following are the Alertmanager notification destination settings that JP1/IM - Agent supports. For details about the settings, see Alertmanager configuration file (jpc_alertmanager.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
|
Setting items |
Description |
|---|---|
|
Webhook settings (required) |
Set the endpoint URL for JP1/IM agent control base. |
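As a hedged sketch (the receiver name is an assumption and the URL is a placeholder), the webhook setting corresponds to the standard Alertmanager receiver configuration:

    # jpc_alertmanager.yml (excerpt)
    route:
      receiver: jp1_im_agent_control_base   # assumed receiver name
    receivers:
      - name: jp1_im_agent_control_base
        webhook_configs:
          - url: "<endpoint URL of JP1/IM agent control base>"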
(c) Notification suppression function
This function suppresses the notifications described in 3.15.1(3)(b) Alert notification function. It includes:
-
Silence function
Use this if you temporarily do not want to be notified of certain alerts.
■ Silence function
This function temporarily suppresses specific notifications. You can use it to avoid being notified of alerts that occur during temporary maintenance. Unlike suppression with the common exclusion conditions of JP1/IM - Manager, the notification suppression function does not notify JP1/IM - Manager at all.
While silence is enabled, you are not notified when the alert status changes. When silence is disabled, a notification is sent if the alert state has changed compared with the state before silence was enabled.
The following two examples show when notification occurs.
[Figure: the alert status changes from "abnormal" to "normal" while silence is enabled]
The above figure shows an example in which the alert status is "abnormal" when silence is enabled, changes to "normal" while silence is enabled, and silence is then disabled.
When the alert changes to "normal", no notification is sent because silence is enabled. When silence is disabled, a "normal" notification is sent because the alert status has changed from the "abnormal" status it had before silence was enabled.
[Figure: the alert status changes to "normal" and back to "abnormal" while silence is enabled]
The above figure shows an example in which, while silence was enabled, the alert status changed to "normal" once and then back to "abnormal", after which silence was disabled.
When silence is disabled, no notification is sent because the alert status is the same "abnormal" as before silence was enabled.
If an alert transmission has failed and is being retried, and silence is then enabled to suppress that alert, the alert is not retried.
- - How to Configure silence
-
Silence settings (enabling or disabling silence) and retrieval of the current silence settings are performed via the REST API (the GUI is not supported).
In addition, when configuring silence settings, the machine from which you operate must be able to communicate with the Alertmanager port on the integrated agent host.
For details about silence settings and REST API used to obtain current silence settings, see 5.22.3 Get silence list of Alertmanager, 5.22.4 Silence creation of Alertmanager, and 5.22.5 Silence Revocation of Alertmanager in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
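As a hedged sketch (all values are assumptions), the body of a silence-creation request sent to the Alertmanager REST API takes the following JSON form; the matchers select the alerts to silence, and startsAt and endsAt bound the suppression period:

    {
      "matchers": [
        { "name": "alertname", "value": "cpu_used_rate(Node exporter)", "isRegex": false }
      ],
      "startsAt": "2025-01-01T00:00:00Z",
      "endsAt": "2025-01-01T06:00:00Z",
      "createdBy": "jp1admin",
      "comment": "Temporary maintenance window"
    }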
(4) Communication function
(a) Communication protocols and authentication methods
The following shows the communication protocols and authentication methods used by integrated agent.
|
Connection source |
Connect to |
Protocol |
Authentication method |
|---|---|---|---|
|
Prometheus server |
JP1/IM agent control base |
HTTP |
No authentication |
|
Alertmanager |
|||
|
Prometheus server |
Alertmanager |
HTTP |
No authentication |
|
Exporter |
|||
|
Blackbox exporter |
Monitored target |
HTTP/HTTPS |
Basic Authentication |
|
Basic Authentication |
|||
|
No authentication |
|||
|
HTTPS |
Server Authentication |
||
|
With client authentication |
|||
|
No client authentication |
|||
|
ICMP#1 |
No authentication |
||
|
Yet another cloudwatch exporter |
Amazon CloudWatch |
HTTPS |
AWS IAM Authentication |
|
Promitor Scraper |
Azure Monitor |
HTTPS |
No client authentication |
|
Promitor Resource Discovery |
Azure Resource Graph |
HTTPS |
No client authentication |
|
Promitor Scraper |
Promitor Resource Discovery |
HTTP |
No authentication |
|
Prometheus |
Fluentd |
HTTP |
No authentication |
|
OracleDB exporter |
Oracle listener |
Oracle listener-specific (no encryption) |
Authentication by username/password |
|
Web scenario execution function |
Browser that invokes Web scenario-execution feature |
Chrome devtools protocol (CDP) |
No authentication |
|
Web Scenario Execute Function/Browser from which Web Scenario Execute Function starts |
Monitored server |
|
|
|
VMware exporter |
VMware ESXi |
Without SSL/TLS connection |
Authentication by Username and Password |
|
With SSL/TLS connection#4 |
|
||
|
SQL exporter |
Microsoft SQL Server |
Without TLS connection |
Authentication by username and password |
|
With TLS connection#5 |
Authentication by username and password |
- #1
-
ICMPv6 is not available.
- #2
-
The specific protocol depends on the target.
- #3
-
See Configuring authentication in 1.21.2(13)(a) Setting up JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Configuration Guide.
- #4
-
Only TLS 1.1 and TLS 1.2 connections can be used.
- #5
-
You must enable the option for TLS communication with Microsoft SQL Server in the connection information of the monitoring target set in the SQL exporter configuration file (jpc_sql_exporter.yml). For details, see SQL exporter configuration file (jpc_sql_exporter.yml) in Chapter 2. Definition Files in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
(b) Network configuration
Integrated agent can be used in an IPv4-only network configuration or in a network configuration that mixes IPv4 and IPv6 environments. In a mixed IPv4 and IPv6 network configuration, only IPv4 communication is supported.
You can use integrated agent in the following configurations without a proxy server:
|
Connection source |
Connect to |
Connection type |
|---|---|---|
|
Prometheus server |
JP1/IM agent control base |
No proxy server |
|
Alertmanager |
||
|
Prometheus server |
Alertmanager |
|
|
Exporter |
||
|
Blackbox exporter |
Monitoring targets (ICMP monitoring) |
|
|
Monitoring targets (HTTP monitoring) |
|
|
|
Yet another cloudwatch exporter |
Amazon CloudWatch |
|
|
Promitor Scraper |
Azure Monitor |
|
|
Promitor Resource Discovery |
Azure Resource Graph |
|
|
OracleDB exporter |
Oracle listener |
No proxy server |
|
Web Scenario Execute Function/Browser from which Web Scenario Execute Function starts |
Monitored server |
|
|
VMware exporter |
VMware ESXi |
No proxy server |
|
SQL exporter |
Microsoft SQL Server |
No proxy server |
Integrated agent transmits the following:
|
Connection source |
Connect to |
Transmitted data |
Authentication method |
|---|---|---|---|
|
Prometheus server |
JP1/IM agent control base |
Performance data in Protobuf format |
|
|
Alertmanager |
Alert information in JSON format#1 |
||
|
Prometheus server |
Exporter |
None |
|
|
Exporter |
Prometheus server |
Prometheus textual performance data#2 |
|
|
Blackbox exporter |
Monitored target |
Response for each protocol |
|
|
Yet another cloudwatch exporter |
Amazon CloudWatch |
CloudWatch data |
|
|
Promitor Scraper |
Azure Monitor |
Azure Monitor data (metrics information) |
|
|
Promitor Resource Discovery |
Azure Resource Graph |
Azure Resource Graph data (resources exploration results) |
|
|
OracleDB exporter |
Oracle listener |
Proprietary Oracle listener data |
|
|
Web scenario execution function |
Browser that invokes Web scenario-execution feature |
Browser operation data |
|
|
Web Scenario Execute Function/Browser from which Web Scenario Execute Function starts |
Monitored server |
Data that depends on the target |
|
|
Monitored server |
Web Scenario Execute Function/Browser from which Web Scenario Execute Function starts |
Data that depends on the target |
|
|
VMware exporter |
VMware ESXi |
VMware ESXi information |
|
|
VMware ESXi |
VMware exporter |
||
|
SQL exporter |
Microsoft SQL Server |
None |
|
|
Microsoft SQL Server |
SQL exporter |
Result of executing SQL statement |
|
- #1
-
For details, see the description of the message body for the request in 5.6.5 JP1 Event converter in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
- #2
-
For details, see the description of Prometheus text formatting in 5.24 API for scrape of Exporter used by JP1/IM - Agent in the JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.