Hitachi

JP1 Version 13 JP1/Integrated Management 3 - Manager Overview and System Design Guide


12.5.2 Performance monitoring capabilities


(1) Communication capability

(a) Communication protocol

The following table lists the communication protocols used by the IM Exporter add-on programs.

| Connected from | Connected to | Protocol | Authentication method |
|---|---|---|---|
| Yet another cloudwatch exporter | Amazon CloudWatch | See 9.5.3(1)(a) Communication protocols and authentication methods of JP1/IM - Agent. | See 9.5.3(1)(a). |
| Promitor Scraper | Azure Monitor | HTTPS | No client authentication |
| Promitor Resource Discovery | Azure Resource Graph | HTTPS | No client authentication |
| Promitor Scraper | Promitor Resource Discovery | HTTP | No authentication |
| Prometheus | Fluentd | HTTP | No authentication |

(b) Network configuration

The environments in which the IM Exporter add-on programs can be used conform to the JP1/IM standards. The following table shows the available proxy configurations.

| Connected from | Connected to | Available proxy configuration |
|---|---|---|
| Yet another cloudwatch exporter | Amazon CloudWatch | See 9.5.3(1)(b) Network configuration of JP1/IM - Agent. |
| Promitor Scraper | Azure Monitor | Without a proxy server; with a proxy server (no authentication); with a proxy server (with authentication) |
| Promitor Resource Discovery | Azure Resource Graph | Without a proxy server; with a proxy server (no authentication); with a proxy server (with authentication) |

The following table shows what data is transmitted by the IM Exporter add-on programs.

| Connected from | Connected to | Data to be transmitted | Authentication method |
|---|---|---|---|
| Yet another cloudwatch exporter | Amazon CloudWatch | See 9.5.3(1)(b) Network configuration of JP1/IM - Agent. | See 9.5.3(1)(b). |
| Promitor Scraper | Azure Monitor | Azure Monitor data (metrics information) | Service principal or managed ID |
| Promitor Resource Discovery | Azure Resource Graph | Azure Resource Graph data (resource discovery results) | Service principal or managed ID |

(2) Performance data collection capabilities

With these capabilities, Prometheus server collects performance data from monitoring targets. The following two capabilities are available.

For details, see 9.5.3(2) Performance data collection function of JP1/IM - Agent.

(a) Scraping capability

Scraping is defined per scraping job. In JP1/IM - Agent, scraping jobs whose names correspond to the types of Exporters are defined by default.

If you use a discovery configuration file for UAP monitoring, you must define the corresponding scraping jobs. Additional settings are also required in the scraping definitions for the log metrics feature.

For details on the scraping definitions for the log metrics feature, see 10.1.2(2) Setting up scraping definitions (required) of IM Exporter in the manual JP1/Integrated Management 3 - Manager Configuration Guide.

The following table lists the default scraping definition for each IM Exporter add-on program.

| Scraping job name | Scraping definition |
|---|---|
| jpc_windows | Scraping definition for Windows exporter |
| jpc_process | Scraping definition for Process exporter |
| jpc_cloudwatch | Scraping definition for Yet another cloudwatch exporter |
| jpc_promitor | Scraping definition for Promitor |
| jpc_script | Scraping definition for Script exporter |

When Prometheus server scrapes its targets, the metrics it receives depend on the type of Exporter. For details, see the description of the metric definition file for each Exporter under 10. IM Exporter definition files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
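As a rough illustration of what Prometheus server receives when it scrapes an Exporter, the following Python sketch parses simple lines in the Prometheus text exposition format. It is a minimal sketch, assuming sample lines of the form name{label="value",...} value; escaped characters in label values and other details of the format are not handled, and the function name and sample metric values are hypothetical.

```python
import re

# A simple sample line: metric name, optional {label="value",...}, then the value.
# Assumption: label values contain no escaped quotes; a trailing timestamp,
# if present, is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # a line shape this sketch does not handle
        name, label_text, value = match.groups()
        labels = dict(LABEL_RE.findall(label_text or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical response body from a Process exporter scrape.
scrape_body = """\
# HELP namedprocess_namegroup_num_procs number of processes in this group
# TYPE namedprocess_namegroup_num_procs gauge
namedprocess_namegroup_num_procs{groupname="httpd"} 5
namedprocess_namegroup_cpu_seconds_total{groupname="httpd",mode="user"} 12.5
"""

for name, labels, value in parse_exposition(scrape_body):
    print(name, labels, value)
```

The instance and job labels listed in the metric tables in this section do not appear in the Exporter's output; Prometheus server attaches them to each sample when it scrapes the target.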

(b) Operating data collection from monitoring targets

The following describes the capabilities of the IM Exporter add-on programs, which collect operating information (performance data) from monitoring targets.

(c) Windows exporter

Windows exporter, built into a monitored Windows host, collects operating information from that host. For details, see 9.5.3(2) Performance data collection function of JP1/IM - Agent.

In IM Exporter, Windows exporter can also collect operating information about processes, in addition to the capabilities of the Windows exporter that comes with JP1/IM - Agent. The process collector is enabled by default.

■ Key metric items

The key Windows exporter metric items are defined in the Windows exporter metric definition file (initial status). For details, see the description of Windows exporter metric definition file (metrics_windows_exporter.conf) of JP1/IM - Agent in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

In IM Exporter, the metric items listed in the table below can be added to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.

Metric name

Collector

Data to be obtained

Label

windows_process_start_time

process

Time of process start

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_cpu_time_total

process

Elapsed time that all of the threads of this process used the processor to execute instructions, by mode (privileged, user). An instruction is the basic unit of execution in a computer, a thread is the object that executes instructions, and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions is included in this count.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

mode: mode (privileged or user)

windows_process_io_bytes_total

process

Bytes issued to I/O operations in different modes (read, write, other). This property counts all I/O activity generated by the process, including file, network, and device I/O. The read and write modes include data operations; the other mode includes operations that do not involve data, such as control operations.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

mode: mode (read, write, or other)

windows_process_io_operations_total

process

I/O operations issued in different modes (read, write, other). This property counts all I/O activity generated by the process, including file, network, and device I/O. The read and write modes include data operations; the other mode includes operations that do not involve data, such as control operations.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

mode: mode (read, write, or other)

windows_process_page_faults_total

process

Page faults by the threads executing in this process. A page fault occurs when a thread refers to a virtual memory page that is not in its working set in main memory. A page fault does not necessarily cause the page to be fetched from disk: the page might already be in main memory on the standby list, or in use by another process that shares it.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_page_file_bytes

process

Current number of bytes this process has used in the paging file(s). Paging files are used to store pages of memory used by the process that are not contained in other files. Paging files are shared by all processes, and lack of space in paging files can prevent other processes from allocating memory.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_pool_bytes

process

Pool Bytes is the last observed number of bytes in the paged or nonpaged pool. The nonpaged pool is an area of system memory (physical memory used by the operating system) for objects that cannot be written to disk but must remain in physical memory as long as they are allocated. The paged pool is an area of system memory for objects that can be written to disk when they are not being used. The system-wide pool size is calculated differently from the per-process values, so it might not equal the total of the per-process values.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

pool: paged (pool paged) or nonpaged (pool non paged)

windows_process_priority_base

process

  • Current base priority of this process. Threads within a process can raise and lower their own base priority relative to the base priority of the process.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_private_bytes

process

Current number of bytes this process has allocated that cannot be shared with other processes.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID

windows_process_virtual_bytes

process

Current size, in bytes, of the virtual address space that the process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages. Virtual space is finite and, by using too much, the process can limit its ability to load libraries.

instance: instance-identifier-string

job: job-name

process: process-name

process_id: process-ID

creating_process_id: creator-process-ID
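Metrics such as windows_process_cpu_time_total, windows_process_io_bytes_total, and windows_process_page_faults_total are cumulative counters, so they are usually evaluated as a rate of increase (in PromQL, typically with the rate() function). The following Python sketch shows the idea in simplified form: it computes the average per-second increase over a series of samples and treats a decrease as a counter reset, for example after the process restarts. It is not the exact algorithm of PromQL rate(), which also extrapolates to the boundaries of the range; the function name and sample values are hypothetical, and the CPU time counter is assumed to be in seconds.

```python
def counter_rate(samples):
    """Average per-second increase of a cumulative counter.

    samples: list of (unix_timestamp_seconds, counter_value), oldest first.
    A drop in value is treated as a counter reset, so the post-reset value
    is counted as the increase since the reset (simplified handling).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical samples of windows_process_cpu_time_total{mode="user"}.
cpu_user = [(0, 100.0), (60, 112.0), (120, 130.0)]
# CPU seconds consumed per wall-clock second (0.25 = a quarter of one core).
print(counter_rate(cpu_user))
```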

■ Comparison with key performance data that can be collected by JP1/PFM - Agent for Platform

The following table shows whether Windows exporter can collect key performance data that can be collected by JP1/PFM - Agent for Platform as metrics, in comparison with the records JP1/PFM - Agent for Platform uses for collection.

Key performance data that can be collected by JP1/PFM - Agent for Platform

Whether Windows exporter can collect it as a metric

Record name

(Record ID)

Information stored in the record

Record is based on

What can be collected

What cannot be collected

Process Detail

(PD)

Performance data that shows the state of paging, memory usage, and time usage of one process at a point in time.

Process ID

This corresponds to the case where a node is created for each process_id.

  • Execution user/group

  • ID of a virtualization environment

  • Handle Count

  • Thread Count

  • Size of the memory used by processes

Process Detail Interval

(PDI)

Performance data that shows the state of paging, memory usage, and time usage of one process over a certain unit of time.

Process ID

The metrics to be collected are all included in PD.

Metrics that are derived by calculating an average or a frequency can be obtained by calculating them from the start time of the process instead of the collection interval.

--

Process End Detail

(PD_PEND)

Performance data that shows the state after the process ends.

Process ID

--

Information about processes that have ended cannot be collected.

Workgroup Summary

(PI_WGRP)

Performance data obtained by summarizing data stored in the Process Detail (PD) record at a point in time on a workgroup basis.

Workgroup

A workgroup is a JP1/PFM-specific unit, which cannot be collected.

--

Application Process Interval

(PD_APSI)

Performance data that shows the state of a process for which process monitoring has been configured, at a point in time.

Process ID

An arbitrary monitoring unit cannot be specified.

The metrics to be collected are all included in APS.

--

Application Process Overview

(PD_APS)

Performance data that shows the state of a process at a point in time.

Process ID

An arbitrary monitoring unit cannot be specified, but this corresponds to the case where a node is created for each process.

  • Command line

  • Execution user/group

  • ID of a virtualization environment

  • Handle Count

  • Thread Count

  • Size of the memory used by processes

Legend:

--: Not applicable
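The table notes that averages and frequencies can be calculated from the process start time rather than the collection interval. As a minimal sketch, assuming that windows_process_start_time is a Unix timestamp in seconds and that the CPU time counter is also in seconds, the following Python function divides a cumulative value by the lifetime of the process; the function name and the values are hypothetical.

```python
import time


def average_since_start(cumulative_value, start_time, now=None):
    """Average per-second rate of a cumulative value over the process lifetime.

    cumulative_value: e.g. windows_process_cpu_time_total summed over all modes.
    start_time: process start as Unix time, e.g. windows_process_start_time.
    now: evaluation time as Unix time (defaults to the current time).
    """
    if now is None:
        now = time.time()
    lifetime = now - start_time
    if lifetime <= 0:
        raise ValueError("evaluation time must be after the process start time")
    return cumulative_value / lifetime


# Hypothetical process: started 1,000 s before evaluation and used 150 s of
# CPU time in total (privileged + user).
print(average_since_start(150.0, start_time=1_700_000_000, now=1_700_001_000))
```

Because the result is averaged over the whole lifetime of the process, a long-running process can show a low average even if its recent CPU usage is high; a rate over a recent time window reflects recent behavior instead.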

(d) Process exporter

Process exporter, built into a monitored Linux host, collects operating information of processes running on that host.

Process exporter is installed on the same host as Prometheus server. When Prometheus server sends a scraping request, Process exporter collects operating information about the processes from the Linux OS on that host and returns it to Prometheus server.

With Process exporter, you can collect, from within the host, process-related operating information that cannot be obtained by monitoring from outside the host (for example, synthetic monitoring with URLs or CloudWatch).

■ Key metric items

The key Process exporter metric items are defined in the Process exporter metric definition file (initial status). For details, see Process exporter metric definition file (metrics_process_exporter.conf) of 10. IM Exporter definition files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add more metric items to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.

Metric name

Data to be obtained

Label

namedprocess_namegroup_num_procs

Number of processes in this group.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_cpu_seconds_total

CPU time in seconds, based on the /proc/[pid]/stat fields utime(14) and stime(15), that is, user and system time.

instance: instance-identifier-string

job: job-name

groupname: group-name

mode: user or system

namedprocess_namegroup_read_bytes_total

Bytes read, based on the /proc/[pid]/io field read_bytes. Because the kernel makes /proc/[pid]/io readable only by the process's user, Process exporter must run either as that user or as root to obtain these values. Otherwise, the values cannot be read and the metric is always 0.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_write_bytes_total

Bytes written based on /proc/[pid]/io field write_bytes.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_major_page_faults_total

Number of major page faults based on /proc/[pid]/stat field majflt(12).

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_minor_page_faults_total

Number of minor page faults based on /proc/[pid]/stat field minflt(10).

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_context_switches_total

Number of context switches based on /proc/[pid]/status fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. The extra label ctxswitchtype can have two values: voluntary and nonvoluntary.

instance: instance-identifier-string

job: job-name

groupname: group-name

ctxswitchtype: voluntary or nonvoluntary

namedprocess_namegroup_memory_bytes

Number of bytes of memory used. The extra label memtype can have three values:

  • resident: Field rss(24) from /proc/[pid]/stat. This counts only the pages that count toward text, data, or stack space. It does not include pages that have not been demand-loaded or that are swapped out.

  • virtual: Field vsize(23) from /proc/[pid]/stat, virtual memory size.

  • swapped: Field VmSwap from /proc/[pid]/status, translated from KB to bytes.

If collection of the smaps file is enabled, two additional values for memtype are added:

  • proportionalResident: Sum of Pss fields from /proc/[pid]/smaps

  • proportionalSwapped: Sum of SwapPss fields from /proc/[pid]/smaps

instance: instance-identifier-string

job: job-name

groupname: group-name

memtype: resident, virtual, swapped, proportionalResident, or proportionalSwapped

namedprocess_namegroup_open_filedesc

Number of file descriptors, based on counting how many entries are in the directory /proc/[pid]/fd.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_worst_fd_ratio

Highest ratio of open file descriptors to the file descriptor limit among all processes in the group. The limit is the soft limit for file descriptors, based on /proc/[pid]/limits.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_oldest_start_time_seconds

Epoch time (seconds since 1970/1/1) at which the oldest process in the group started. This is derived from field starttime(22) from /proc/[pid]/stat, added to boot time to make it relative to epoch.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_num_threads

Total number of threads of all processes in the group, based on the field num_threads(20) from /proc/[pid]/stat.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_states

Number of threads in the group in each of various states, based on the field state(3) from /proc/[pid]/stat.

The extra label state can have these values: Running, Sleeping, Waiting, Zombie, Other.

instance: instance-identifier-string

job: job-name

groupname: group-name

state: Running, Sleeping, Waiting, Zombie, or Other

namedprocess_namegroup_thread_count

Number of threads in this thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name

threadname: thread-name

namedprocess_namegroup_thread_cpu_seconds_total

Same as namedprocess_namegroup_cpu_seconds_total, but broken down by thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name

threadname: thread-name

mode: user or system

namedprocess_namegroup_thread_io_bytes_total

Same as namedprocess_namegroup_read_bytes_total and namedprocess_namegroup_write_bytes_total, but broken down by thread subgroup. Unlike those metrics, this metric uses the iomode label to distinguish between read and write bytes.

instance: instance-identifier-string

job: job-name

groupname: group-name

threadname: thread-name

iomode: read or write

namedprocess_namegroup_thread_major_page_faults_total

Same as namedprocess_namegroup_major_page_faults_total, but broken down by thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_thread_minor_page_faults_total

Same as namedprocess_namegroup_minor_page_faults_total, but broken down by thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name

namedprocess_namegroup_thread_context_switches_total

Same as namedprocess_namegroup_context_switches_total, but broken down by thread subgroup.

instance: instance-identifier-string

job: job-name

groupname: group-name
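Several metrics in the table above are derived from procfs files. To clarify what the cited field numbers refer to, the following Python sketch reads those fields from the text of /proc/[pid]/stat and /proc/[pid]/limits for a single process. It is an illustration under stated assumptions, not Process exporter's implementation: the clock-tick rate (commonly 100 on Linux; see os.sysconf("SC_CLK_TCK")), the page size, and the boot time are passed in as parameters, and the sample file contents (the stat line is shortened to its first 24 fields) and function names are hypothetical.

```python
def parse_stat(stat_text):
    """Map /proc/[pid]/stat field numbers (1-based, as in proc(5)) to strings.

    Field 2 (comm) is enclosed in parentheses and may itself contain spaces
    or parentheses, so the line is split at the last ')'.
    """
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, comm = head.partition(" (")
    fields = {1: pid_text.strip(), 2: comm}
    for number, value in enumerate(tail.split(), start=3):
        fields[number] = value
    return fields


def derived_metrics(fields, clk_tck, page_size, boot_time):
    """Values corresponding to several namedprocess_* metrics, for one process."""
    return {
        "cpu_user_seconds": int(fields[14]) / clk_tck,    # utime(14), in ticks
        "cpu_system_seconds": int(fields[15]) / clk_tck,  # stime(15), in ticks
        "minor_page_faults": int(fields[10]),             # minflt(10)
        "major_page_faults": int(fields[12]),             # majflt(12)
        "num_threads": int(fields[20]),                   # num_threads(20)
        # starttime(22) is in clock ticks since boot; add boot time for epoch.
        "start_time_seconds": boot_time + int(fields[22]) / clk_tck,
        "virtual_bytes": int(fields[23]),                 # vsize(23), bytes
        "resident_bytes": int(fields[24]) * page_size,    # rss(24), pages
    }


def fd_ratio(open_fd_count, limits_text):
    """Ratio of open file descriptors to the soft 'Max open files' limit."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft_limit = line.split()[3]  # columns: name words, soft, hard, units
            if soft_limit == "unlimited":
                return 0.0
            return open_fd_count / int(soft_limit)
    raise ValueError("no 'Max open files' line found")


stat_line = ("1234 (my app) S 1 1234 1234 0 -1 4194560 500 0 3 0 "
             "250 50 0 0 20 0 4 0 10000 104857600 2560")
limits = "Max open files            1024                 4096                 files"

fields = parse_stat(stat_line)
metrics = derived_metrics(fields, clk_tck=100, page_size=4096, boot_time=1_700_000_000)
print(fields[3], metrics)
print(fd_ratio(256, limits))
```

Process exporter then combines such per-process values for each group; for example, namedprocess_namegroup_worst_fd_ratio reports the largest per-process ratio in the group.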

Important
  • Processes whose name contains multi-byte characters cannot be monitored.

  • Process exporter continues to output information about processes that it has collected, even after those processes stop running. Therefore, if Process exporter is configured to collect information based on PIDs, new time-series data is added every time a process is restarted and its PID changes, resulting in large amounts of unnecessary data.

    In addition, using PIDs is not recommended for the open source software (OSS), so in version 13-00, Process exporter is configured by default not to collect PID information (in groupname). If you want to manage processes that have the same command line separately, we recommend operational measures, such as changing the order of their arguments, or using PIDs (in which case periodic restarts are needed to prevent collected information from accumulating continuously).

    Note that the information collected by Windows exporter differs from what Process exporter collects, because Windows exporter collects PID information. (To exclude PIDs from the collected information, use drop in the scraping definition of the Prometheus configuration file (jpc_prometheus_server.yml).)
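The growth in time-series data described above can be illustrated with a small simulation. In the following Python sketch, each time series is identified by its label set, and a hypothetical process is observed across three restarts, receiving a new PID each time. When the PID is embedded in groupname (the format httpd:PID is an assumption for illustration), each new PID produces a new series; when groupname contains only the process name, a single series is kept.

```python
def count_series(observations, include_pid):
    """Count distinct time series produced by a sequence of observations.

    observations: (process_name, pid) pairs seen across scrapes.
    include_pid: True if the PID is embedded in the groupname label.
    """
    series = set()
    for name, pid in observations:
        # Hypothetical groupname format; only this label varies here.
        groupname = f"{name}:{pid}" if include_pid else name
        series.add(("namedprocess_namegroup_num_procs", groupname))
    return len(series)


# One process observed over several scrapes; it is restarted three times
# and receives a new PID after each restart.
scrapes = [("httpd", 1200), ("httpd", 1200), ("httpd", 3410),
           ("httpd", 5522), ("httpd", 7781)]
print(count_series(scrapes, include_pid=True))   # 4: one series per PID
print(count_series(scrapes, include_pid=False))  # 1: a single series
```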

■ Comparison with key performance data that can be collected by JP1/PFM - Agent for Platform

The following table shows whether Process exporter can collect key performance data that can be collected by JP1/PFM - Agent for Platform as metrics, in comparison with the records JP1/PFM - Agent for Platform uses for collection.

Key performance data that can be collected by JP1/PFM - Agent for Platform

Whether Process exporter can collect it as a metric

Record name

(Record ID)

Information stored in the record

Record is based on

What can be collected

What cannot be collected

Process Detail

(PD)

Performance data that shows the state of a process at a point in time.

Process ID

The data can be collected on a process ID basis if groupname is specified such that it contains {{.PID}}.

It also corresponds to the case where a node is created for each process ID.

  • Parent process/child process information

  • Real group, effective group, real user, terminal information, and effective user ID

  • Memory usage of certain memory types

  • CPU usage for each CPU

  • ID of a virtualization environment

Process Detail Interval

(PDI)

Performance data of a process over a certain unit of time.

Process ID

The metrics to be collected are all included in PD.

Metrics that are derived by calculating an average or a frequency can be obtained by calculating them from the start time of the process instead of the collection interval.

--

Process Summary

(PD_PDS)

Performance data obtained by summarizing data stored in the Process Detail (PD) record at a point in time.

System

This can be aggregated on an instance (host) basis.

  • Part of the state of a process

  • Real user/terminal information

Program Summary

(PD_PGM)

Performance data obtained by summarizing data stored in the Process Detail (PD) record at a point in time on a program basis.

Program

The data can be collected on a program basis if groupname is specified based on a program (that is, use {{.ExeBase}} or {{.ExeFull}}).

--

Terminal Summary

(PD_TERM)

Performance data obtained by summarizing data stored in the Process Detail (PD) record at a point in time on a terminal basis.

Terminal

--

The data cannot be aggregated on a terminal basis because the terminal information cannot be collected.

User Summary

(PD_USER)

Performance data obtained by summarizing data stored in the Process Detail (PD) record at a point in time on a user basis.

User ID

The data can be aggregated on a user basis by including {{.Username}} in groupname, so that data for processes with the same user name is grouped together.

  • Effective user ID

Workgroup Summary

(PI_WGRP)

Performance data obtained by summarizing data stored in the Process Detail (PD) record at a point in time on a workgroup basis.

Workgroup

A workgroup is a JP1/PFM-specific unit, which cannot be collected.

--

Application Process Interval

(PD_APSI)

Performance data that shows the state of a process for which process monitoring has been configured, at a point in time.

Process ID

All the metrics to be collected, except for ApplicationName (which roughly corresponds to groupname in Process exporter), are included in APS.

Metrics that are derived by calculating an average or a frequency can be obtained by calculating them from the start time of the process instead of the collection interval.

--

Application Process Overview

(PD_APS)

Performance data of processor usage over a certain unit of time.

Process ID

This corresponds to the case where a node is created for each groupname.

The metrics for each process (process-ID-based) are the same as PD.

Same as PD.

Legend:

--: Not applicable
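The per-user and per-program aggregations in the table depend on how groupname is built. As an analogy only, the following Python sketch groups hypothetical per-process records by user name (like {{.Username}}) or by executable base name (like {{.ExeBase}}) and sums a per-process value in each group, which is roughly what User Summary (PD_USER) and Program Summary (PD_PGM) report. Process exporter itself builds groupname from Go templates in its configuration; the function and field names here are hypothetical.

```python
import posixpath
from collections import defaultdict


def group_totals(processes, key):
    """Sum per-process thread counts for each group key.

    processes: list of dicts with 'username', 'exe', and 'threads' keys.
    key: 'user' groups like {{.Username}}; 'program' groups like {{.ExeBase}}.
    """
    totals = defaultdict(int)
    for proc in processes:
        if key == "user":
            group = proc["username"]
        elif key == "program":
            group = posixpath.basename(proc["exe"])  # base name of executable
        else:
            raise ValueError(f"unsupported key: {key}")
        totals[group] += proc["threads"]
    return dict(totals)


procs = [
    {"username": "apache", "exe": "/usr/sbin/httpd", "threads": 10},
    {"username": "apache", "exe": "/usr/sbin/httpd", "threads": 12},
    {"username": "postgres", "exe": "/usr/bin/postgres", "threads": 3},
    {"username": "root", "exe": "/usr/sbin/httpd", "threads": 1},
]
print(group_totals(procs, "user"))     # threads per user name
print(group_totals(procs, "program"))  # threads per executable base name
```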

(e) Yet another cloudwatch exporter

Yet another cloudwatch exporter collects operating information of AWS services running on the cloud environment through Amazon CloudWatch. For details, see the description in 9.5.3(2) Performance data collection function of JP1/IM - Agent.

■ Key metric items

The key metric items of Yet another cloudwatch exporter are defined in the Yet another cloudwatch exporter metric definition file (initial status). For details, see the description under Yet another cloudwatch exporter metric definition file (metrics_ya_cloudwatch_exporter.conf) of JP1/IM - Agent definition files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ CloudWatch metrics you can collect

In addition to the AWS namespaces that Yet another cloudwatch exporter of JP1/IM - Agent supports as monitoring targets, IM Exporter can collect metrics in the AWS namespaces listed in the following table.

Table 12‒10: AWS namespaces supported by IM Exporter as extended monitoring targets

| AWS namespace | Metric category name on CloudWatch# | Dimension |
|---|---|---|
| AWS/EBS | Per-volume metrics | VolumeId |
| AWS/ECS | ClusterName, ServiceName | ClusterName, ServiceName |
| AWS/EFS | File system metrics | FileSystemId |
| AWS/EFS | File system storage metrics | FileSystemId, StorageClass |
| AWS/FSx | File system metrics | FileSystemId |
| AWS/RDS | Per-database metrics | DBInstanceIdentifier |
| AWS/RDS | DBClusterIdentifier | DBClusterIdentifier |
| AWS/SNS | Topic metrics | TopicName |

#

The name of the category into which CloudWatch groups metrics by dimension. You can view these names in the CloudWatch console.
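As an illustration of how the namespaces and dimensions in Table 12‒10 fit together, the following Python sketch transcribes the table into a lookup and builds a dimension filter in the list-of-Name/Value shape that the CloudWatch API uses for metric dimensions. It does not call AWS and is not how Yet another cloudwatch exporter is configured; the function name and the sample resource values are hypothetical.

```python
# Dimension names per (namespace, metric category), transcribed from Table 12-10.
DIMENSIONS = {
    ("AWS/EBS", "Per-volume metrics"): ["VolumeId"],
    ("AWS/ECS", "ClusterName, ServiceName"): ["ClusterName", "ServiceName"],
    ("AWS/EFS", "File system metrics"): ["FileSystemId"],
    ("AWS/EFS", "File system storage metrics"): ["FileSystemId", "StorageClass"],
    ("AWS/FSx", "File system metrics"): ["FileSystemId"],
    ("AWS/RDS", "Per-database metrics"): ["DBInstanceIdentifier"],
    ("AWS/RDS", "DBClusterIdentifier"): ["DBClusterIdentifier"],
    ("AWS/SNS", "Topic metrics"): ["TopicName"],
}


def build_dimensions(namespace, category, values):
    """Return [{'Name': ..., 'Value': ...}] for the given metric category.

    Raises KeyError if the category is unknown or a required value is missing.
    """
    names = DIMENSIONS[(namespace, category)]
    return [{"Name": name, "Value": values[name]} for name in names]


print(build_dimensions(
    "AWS/EFS", "File system storage metrics",
    {"FileSystemId": "fs-0123456789abcdef0", "StorageClass": "Standard"},
))
```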

(f) Promitor

Promitor, included in the integrated agent, collects operating information of Azure services on the cloud environment through Azure Monitor and Azure Resource Graph.

Promitor consists of Promitor Scraper and Promitor Resource Discovery. Promitor Scraper collects metrics on resources from Azure Monitor according to schedule settings and returns them.

Metrics can be collected from target resources in two ways: by specifying each target resource in a configuration file, or by detecting the resources automatically. If you choose automatic detection, Promitor Resource Discovery detects resources in a tenant through Azure Resource Graph, and Promitor Scraper collects metric information based on the results.

Promitor Scraper and Promitor Resource Discovery each require two configuration files: one defines runtime settings, such as authentication information, and the other defines the metric information to be collected.

■ Key metric items

The key Promitor metric items are defined in the Promitor metric definition file (initial status). For details, see the description under Promitor metric definition file (metrics_promitor.conf) of 10. IM Exporter definition files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Metrics you can collect

Promitor can collect metrics for the services listed in Table 12‒11.

You specify the metrics you want to collect in the Promitor Scraper configuration file (metrics-declaration.yaml).

To change the metrics specified in the Promitor Scraper configuration file, see Change monitoring metrics (optional) under (d) Configuring scraping targets (required) in 10.1.2(6) Setting up Promitor in the manual JP1/Integrated Management 3 - Manager Configuration Guide.

You can also add new metrics to the Promitor metric definition file, based on the metrics specified in the Promitor Scraper configuration file. Metrics defined in the Promitor Scraper configuration file can be specified in the PromQL statements written in the definition file.

Table 12‒11: Services supported as monitoring targets by Promitor

| Promitor resourceType name | Azure Monitor namespace | Automatic discovery support |
|---|---|---|
| VirtualMachine | Microsoft.Compute/virtualMachines | Y |
| FunctionApp | Microsoft.Web/sites | Y |
| ContainerInstance | Microsoft.ContainerInstance/containerGroups | -- |
| KubernetesService | Microsoft.ContainerService/managedClusters | Y |
| FileStorage | Microsoft.Storage/storageAccounts/fileServices | -- |
| BlobStorage | Microsoft.Storage/storageAccounts/blobServices | -- |
| ServiceBusNamespace | Microsoft.ServiceBus/namespaces | Y |
| CosmosDb | Microsoft.DocumentDB/databaseAccounts | Y |
| SqlDatabase | Microsoft.Sql/servers/databases | Y |
| SqlServer | Microsoft.Sql/servers/databases, Microsoft.Sql/servers/elasticPools | -- |
| SqlManagedInstance | Microsoft.Sql/managedInstances | Y |
| SqlElasticPool | Microsoft.Sql/servers/elasticPools | Y |
| LogicApp | Microsoft.Logic/workflows | Y |

Legend:

Y: Automatic discovery is supported.

--: Automatic discovery is not supported.

■ Checking how Azure SDKs used by Promitor are supported

Promitor uses the Azure SDK for .NET. The end of support for an Azure SDK is announced 12 months in advance. For details on the lifecycle of the Azure SDK, see the Lifecycle FAQ on the following website:

https://learn.microsoft.com/ja-jp/lifecycle/faq/azure#azure-sdk-----------

The lifecycle of each version of the Azure SDK libraries is listed on the following website:

https://azure.github.io/azure-sdk/releases/latest/all/dotnet.html

■ Credentials required for account information

Promitor can connect to Azure by using either the service principal method or the managed ID method. For details on the credentials assigned to the service principal and managed ID, see (a) Configuring the settings for establishing a connection to Azure (required) in 10.1.2(6) Setting up Promitor in the manual JP1/Integrated Management 3 - Manager Configuration Guide.

(g) Container monitoring

Container environment monitoring uses different methods to collect operating information depending on monitoring targets, as listed in the following table.

| Monitoring target | How to collect operating information |
|---|---|
| Red Hat OpenShift | User-specific Prometheus |
| Kubernetes | User-specific Prometheus |
| Amazon Elastic Kubernetes Service (EKS) | User-specific Prometheus |
| Azure Kubernetes Service (AKS) | Azure's monitoring feature (Promitor) |

The following describes how operating information is collected for each monitoring target.

(h) Red Hat OpenShift

In Red Hat OpenShift, Prometheus, one of the default monitoring components, collects operating information from its scraping targets (kube-state-metrics, node_exporter, and kubelet) and sends the information to JP1/IM - Manager.

■ Key metric items

The key metric items for Red Hat OpenShift are defined in the metric definition file (initial status) for each scraping target for container monitoring, as shown in the table below. For details, see the description on each metric definition file in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference (2. Definition Files or 10. IM Exporter Definition Files).

| Scraping target | Metric definition file |
|---|---|
| kube-state-metrics | Container monitoring metric definition file (metrics_kubernetes.conf) |
| node_exporter | Node exporter metric definition file (metrics_node_exporter.conf) |
| kubelet | Container monitoring metric definition file (metrics_kubernetes.conf) |

You can add more metric items to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.

- When kube-state-metrics is scraped

Metric name

Data to be obtained

Label

kube_cronjob_info

Information about the cronjob.

instance: instance-identifier-string

job: job-name

cronjob: cronjob-name

namespace: cronjob-namespace

schedule: schedule

concurrency_policy: concurrency-policy

kube_cronjob_labels

Kubernetes labels converted to Prometheus labels.

instance: instance-identifier-string

job: job-name

cronjob: cronjob-name

namespace: cronjob-namespace

label_CRONJOB_LABEL: CRONJOB_LABEL

kube_cronjob_created

Unix creation timestamp

instance: instance-identifier-string

job: job-name

cronjob: cronjob-name

namespace: cronjob-namespace

kube_cronjob_next_schedule_time

Next time the cronjob should be scheduled: the time after lastScheduleTime, or after the cron job's creation time if it has never been scheduled.

Use this to determine if the job is delayed.

instance: instance-identifier-string

job: job-name

cronjob: cronjob-name

namespace: cronjob-namespace

kube_cronjob_status_active

Active holds pointers to currently running jobs.

instance: instance-identifier-string

job: job-name

cronjob: cronjob-name

namespace: cronjob-namespace

kube_cronjob_status_last_schedule_time

LastScheduleTime records when the job was last successfully scheduled.

instance: instance-identifier-string

job: job-name

cronjob: cronjob-name

namespace=cronjob-namespace

kube_cronjob_spec_suspend

Suspend flag tells the controller to suspend subsequent executions.

instance: instance-identifier-string

job: job-name

cronjob: cronjob-name

namespace=cronjob-namespace

kube_cronjob_spec_starting_deadline_seconds

Deadline in seconds for starting the job if it misses scheduled time for any reason.

instance: instance-identifier-string

job: job-name

cronjob: cronjob-name

namespace=cronjob-namespace

kube_cronjob_metadata_resource_version

Resource version representing a specific version of the cronjob.

instance: instance-identifier-string

job: job-name

cronjob: cronjob-name

namespace=cronjob-namespace

kube_daemonset_created

Unix creation timestamp

instance: instance-identifier-string

job: job-name

daemonset=daemonset-name

namespace=daemonset-namespace

kube_daemonset_status_current_number_scheduled

The number of nodes that are running at least one daemon pod and are supposed to.

instance: instance-identifier-string

job: job-name

daemonset=daemonset-name

namespace=daemonset-namespace

kube_daemonset_status_desired_number_scheduled

The number of nodes that should be running the daemon pod.

instance: instance-identifier-string

job: job-name

daemonset=daemonset-name

namespace=daemonset-namespace

kube_daemonset_status_number_available

The number of nodes that should be running the daemon pod and have one or more daemon pods running and available.

instance: instance-identifier-string

job: job-name

daemonset=daemonset-name

namespace=daemonset-namespace

kube_daemonset_status_number_misscheduled

The number of nodes that are running a daemon pod but are not supposed to.

instance: instance-identifier-string

job: job-name

daemonset=daemonset-name

namespace=daemonset-namespace

kube_daemonset_status_number_ready

The number of nodes that should be running the daemon pod and have one or more daemon pods running and ready.

instance: instance-identifier-string

job: job-name

daemonset=daemonset-name

namespace=daemonset-namespace

kube_daemonset_status_number_unavailable

The number of nodes that should be running the daemon pod and have none of the daemon pods running and available.

instance: instance-identifier-string

job: job-name

daemonset=daemonset-name

namespace=daemonset-namespace

kube_daemonset_status_observed_generation

The most recent generation observed by the daemon set controller.

instance: instance-identifier-string

job: job-name

daemonset=daemonset-name

namespace=daemonset-namespace

kube_daemonset_status_updated_number_scheduled

The total number of nodes that are running the updated daemon pod.

instance: instance-identifier-string

job: job-name

daemonset=daemonset-name

namespace=daemonset-namespace

kube_daemonset_metadata_generation

Sequence number representing a specific generation of the desired state.

instance: instance-identifier-string

job: job-name

daemonset=daemonset-name

namespace=daemonset-namespace

kube_daemonset_labels

Kubernetes labels converted to Prometheus labels.

instance: instance-identifier-string

job: job-name

daemonset=daemonset-name

namespace=daemonset-namespace

label_DAEMONSET_LABEL=DAEMONSET_LABEL

kube_deployment_status_replicas

The number of replicas per deployment.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_deployment_status_replicas_ready

The number of ready replicas per deployment.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_deployment_status_replicas_available

The number of available replicas per deployment.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_deployment_status_replicas_unavailable

The number of unavailable replicas per deployment.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_deployment_status_replicas_updated

The number of updated replicas per deployment.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_deployment_status_observed_generation

The generation observed by the deployment controller.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_deployment_status_condition

The current status conditions of a deployment.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

condition=deployment-condition

status=true|false|unknown

kube_deployment_spec_replicas

Number of desired pods for a deployment.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_deployment_spec_paused

Whether the deployment is paused and will not be processed by the deployment controller.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_deployment_spec_strategy_rollingupdate_max_unavailable

Maximum number of unavailable replicas during a rolling update of a deployment.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_deployment_spec_strategy_rollingupdate_max_surge

Maximum number of replicas that can be scheduled above the desired number of replicas during a rolling update of a deployment.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_deployment_metadata_generation

Sequence number representing a specific generation of the desired state.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_deployment_labels

Kubernetes labels converted to Prometheus labels.

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

label_DEPLOYMENT_LABEL=DEPLOYMENT_LABEL

kube_deployment_created

Unix creation timestamp

instance: instance-identifier-string

job: job-name

deployment=deployment-name

namespace=deployment-namespace

kube_job_info

Information about job.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

kube_job_labels

Kubernetes labels converted to Prometheus labels.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

label_JOB_LABEL=JOB_LABEL

kube_job_owner

Information about the Job's owner.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

owner_kind=owner kind

owner_name=owner name

owner_is_controller=whether owner is controller

kube_job_spec_parallelism

The maximum desired number of pods the job should run at any given time.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

kube_job_spec_completions

The desired number of successfully finished pods the job should be run with.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

kube_job_spec_active_deadline_seconds

The duration in seconds relative to the startTime that the job may be active before the system tries to terminate it.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

kube_job_status_active

The number of actively running pods.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

kube_job_status_succeeded

The number of pods which reached Phase Succeeded.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

kube_job_status_failed

The number of pods which reached Phase Failed.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

reason=failure reason

kube_job_status_start_time

StartTime represents time when the job was acknowledged by the Job Manager.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

kube_job_status_completion_time

CompletionTime represents time when the job was completed.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

kube_job_complete

The job has completed its execution.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

condition=true|false|unknown

kube_job_failed

The job has failed its execution.

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

condition=true|false|unknown

kube_job_created

Unix creation timestamp

instance: instance-identifier-string

job: job-name

job_name=job-name

namespace=job-namespace

kube_replicaset_status_replicas

The number of replicas per ReplicaSet.

instance: instance-identifier-string

job: job-name

replicaset=replicaset-name

namespace=replicaset-namespace

kube_replicaset_status_fully_labeled_replicas

The number of fully labeled replicas per ReplicaSet.

instance: instance-identifier-string

job: job-name

replicaset=replicaset-name

namespace=replicaset-namespace

kube_replicaset_status_ready_replicas

The number of ready replicas per ReplicaSet.

instance: instance-identifier-string

job: job-name

replicaset=replicaset-name

namespace=replicaset-namespace

kube_replicaset_status_observed_generation

The generation observed by the ReplicaSet controller.

instance: instance-identifier-string

job: job-name

replicaset=replicaset-name

namespace=replicaset-namespace

kube_replicaset_spec_replicas

Number of desired pods for a ReplicaSet.

instance: instance-identifier-string

job: job-name

replicaset=replicaset-name

namespace=replicaset-namespace

kube_replicaset_metadata_generation

Sequence number representing a specific generation of the desired state.

instance: instance-identifier-string

job: job-name

replicaset=replicaset-name

namespace=replicaset-namespace

kube_replicaset_labels

Kubernetes labels converted to Prometheus labels.

instance: instance-identifier-string

job: job-name

replicaset=replicaset-name

namespace=replicaset-namespace

label_REPLICASET_LABEL=REPLICASET_LABEL

kube_replicaset_created

Unix creation timestamp

instance: instance-identifier-string

job: job-name

replicaset=replicaset-name

namespace=replicaset-namespace

kube_replicaset_owner

Information about the ReplicaSet's owner.

instance: instance-identifier-string

job: job-name

replicaset=replicaset-name

namespace=replicaset-namespace

owner_kind=owner kind

owner_name=owner name

owner_is_controller=whether owner is controller

kube_statefulset_status_replicas

The number of replicas per StatefulSet.

instance: instance-identifier-string

job: job-name

statefulset=statefulset-name

namespace=statefulset-namespace

kube_statefulset_status_replicas_current

The number of current replicas per StatefulSet.

instance: instance-identifier-string

job: job-name

statefulset=statefulset-name

namespace=statefulset-namespace

kube_statefulset_status_replicas_ready

The number of ready replicas per StatefulSet.

instance: instance-identifier-string

job: job-name

statefulset=statefulset-name

namespace=statefulset-namespace

kube_statefulset_status_replicas_updated

The number of updated replicas per StatefulSet.

instance: instance-identifier-string

job: job-name

statefulset=statefulset-name

namespace=statefulset-namespace

kube_statefulset_status_observed_generation

The generation observed by the StatefulSet controller.

instance: instance-identifier-string

job: job-name

statefulset=statefulset-name

namespace=statefulset-namespace

kube_statefulset_replicas

Number of desired pods for a StatefulSet.

instance: instance-identifier-string

job: job-name

statefulset=statefulset-name

namespace=statefulset-namespace

kube_statefulset_metadata_generation

Sequence number representing a specific generation of the desired state for the StatefulSet.

instance: instance-identifier-string

job: job-name

statefulset=statefulset-name

namespace=statefulset-namespace

kube_statefulset_created

Unix creation timestamp

instance: instance-identifier-string

job: job-name

statefulset=statefulset-name

namespace=statefulset-namespace

kube_statefulset_labels

Kubernetes labels converted to Prometheus labels.

instance: instance-identifier-string

job: job-name

statefulset=statefulset-name

namespace=statefulset-namespace

label_STATEFULSET_LABEL=STATEFULSET_LABEL

kube_statefulset_status_current_revision

Indicates the version of the StatefulSet used to generate Pods in the sequence [0,currentReplicas).

instance: instance-identifier-string

job: job-name

statefulset=statefulset-name

namespace=statefulset-namespace

revision=statefulset-current-revision

kube_statefulset_status_update_revision

Indicates the version of the StatefulSet used to generate Pods in the sequence [replicas-updatedReplicas,replicas)

instance: instance-identifier-string

job: job-name

statefulset=statefulset-name

namespace=statefulset-namespace

revision=statefulset-update-revision

kube_namespace_created

Unix creation timestamp

instance: instance-identifier-string

job: job-name

namespace=namespace-name

kube_namespace_labels

Kubernetes labels converted to Prometheus labels

instance: instance-identifier-string

job: job-name

namespace=namespace-name

label_NS_LABEL=NS_LABEL

kube_namespace_status_phase

Kubernetes namespace status phase

instance: instance-identifier-string

job: job-name

namespace=namespace-name

phase=Active|Terminating

kube_node_info

Information about a cluster node

instance: instance-identifier-string

job: job-name

node=node-address

kernel_version=kernel-version

os_image=os-image-name

container_runtime_version=container-runtime-and-version-combination

kubelet_version=kubelet-version

kubeproxy_version=kubeproxy-version

pod_cidr=pod-cidr

provider_id=provider-id

system_uuid=system-uuid

internal_ip=internal-ip

kube_node_labels

Kubernetes labels converted to Prometheus labels

instance: instance-identifier-string

job: job-name

node=node-address

label_NODE_LABEL=NODE_LABEL

kube_node_spec_unschedulable

Whether a node can schedule new pods

instance: instance-identifier-string

job: job-name

node=node-address

kube_node_spec_taint

The taint of a cluster node.

instance: instance-identifier-string

job: job-name

node=node-address

key=taint-key

value=taint-value

effect=taint-effect

kube_node_status_capacity

The capacity for different resources of a node

instance: instance-identifier-string

job: job-name

node=node-address

resource=resource-name

unit=resource-unit

kube_node_status_allocatable

The allocatable for different resources of a node that are available for scheduling

instance: instance-identifier-string

job: job-name

node=node-address

resource=resource-name

unit=resource-unit

kube_node_status_condition

The condition of a cluster node

instance: instance-identifier-string

job: job-name

node=node-address

condition=node-condition

status=true|false|unknown

kube_node_created

Unix creation timestamp

instance: instance-identifier-string

job: job-name

node=node-address

kube_pod_info

Information about pod

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

host_ip=host-ip

pod_ip=pod-ip

node=node-name

created_by_kind=created_by_kind

created_by_name=created_by_name

uid=pod-uid

priority_class=priority_class

host_network=host_network

kube_pod_start_time

Start time in unix timestamp for a pod

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

ip=pod-ip-address

ip_family=4 OR 6

uid=pod-uid

kube_pod_completion_time

Completion time in unix timestamp for a pod

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_owner

Information about the Pod's owner

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

owner_kind=owner kind

owner_name=owner name

owner_is_controller=whether owner is controller

uid=pod-uid

kube_pod_labels

Kubernetes labels converted to Prometheus labels

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

label_POD_LABEL=POD_LABEL

uid=pod-uid

kube_pod_status_phase

The pod's current phase

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

phase=Pending|Running|Succeeded|Failed|Unknown

uid=pod-uid

kube_pod_status_ready

Describes whether the pod is ready to serve requests

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

condition=true|false|unknown

uid=pod-uid

kube_pod_status_scheduled

Describes the status of the scheduling process for the pod

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

condition=true|false|unknown

uid=pod-uid

kube_pod_container_info

Information about a container in a pod

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

image=image-name

image_id=image-id

image_spec=image-spec

container_id=container-id

uid=pod-uid

kube_pod_container_status_waiting

Describes whether the container is currently in waiting state

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_container_status_waiting_reason

Describes the reason the container is currently in waiting state

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

reason=container-waiting-reason

uid=pod-uid

kube_pod_container_status_running

Describes whether the container is currently in running state

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_container_state_started

Start time in unix timestamp for a pod container

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_container_status_terminated

Describes whether the container is currently in terminated state

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_container_status_ready

Describes whether the container's readiness check succeeded

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_container_status_restarts_total

The number of container restarts per container (counter)

container=container-name

namespace=pod-namespace

instance: instance-identifier-string

job: job-name

pod=pod-name

uid=pod-uid

kube_pod_created

Unix creation timestamp

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_restart_policy

Describes the restart policy in use by this pod

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

type=Always|Never|OnFailure

uid=pod-uid

kube_pod_init_container_info

Information about an init container in a pod

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

image=image-name

image_id=image-id

image_spec=image-spec

container_id=container-id

uid=pod-uid

kube_pod_init_container_status_waiting

Describes whether the init container is currently in waiting state

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_init_container_status_running

Describes whether the init container is currently in running state

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_init_container_status_terminated

Describes whether the init container is currently in terminated state

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_init_container_status_ready

Describes whether the init container's readiness check succeeded

instance: instance-identifier-string

job: job-name

container=container-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_init_container_status_restarts_total

The number of restarts for the init container

instance: instance-identifier-string

job: job-name

container=container-name

namespace=pod-namespace

pod=pod-name

uid=pod-uid

kube_pod_spec_volumes_persistentvolumeclaims_info

Information about persistentvolumeclaim volumes in a pod

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

volume=volume-name

persistentvolumeclaim=persistentvolumeclaim-claimname

uid=pod-uid

kube_pod_spec_volumes_persistentvolumeclaims_readonly

Describes whether a persistentvolumeclaim is mounted read only

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

volume=volume-name

persistentvolumeclaim=persistentvolumeclaim-claimname

uid=pod-uid

kube_pod_status_scheduled_time

Unix timestamp when pod moved into scheduled status

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

kube_pod_status_unschedulable

Describes the unschedulable status for the pod

instance: instance-identifier-string

job: job-name

pod=pod-name

namespace=pod-namespace

uid=pod-uid

- When node_exporter is to be scraped

See Key metric items in 9.5.3(2)(d) Node exporter.

- When kubelet is to be scraped

Metric name

Data to be obtained

Label

container_blkio_device_usage_total

Block I/O (blkio) device usage, in bytes

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

major: major-identifier

minor: minor-identifier

operation: operation (Async, Sync, Discard, Read, Write, or Total)

container_cpu_cfs_periods_total

Number of elapsed enforcement period intervals

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_cpu_cfs_throttled_periods_total

Number of throttled period intervals

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_cpu_cfs_throttled_seconds_total

Total time duration the container has been throttled

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_cpu_system_seconds_total

Cumulative system CPU time consumed, in seconds

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_cpu_usage_seconds_total

Cumulative CPU time consumed, in seconds

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

cpu: CPU-name

container_cpu_user_seconds_total

Cumulative user CPU time consumed, in seconds

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_fs_inodes_free

Number of available inodes

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_inodes_total

Total number of inodes

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_io_current

Number of I/Os currently in progress

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_io_time_seconds_total

Cumulative count of seconds spent doing I/Os

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_io_time_weighted_seconds_total

Cumulative weighted I/O time

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_limit_bytes

Number of bytes that can be consumed by the container on this filesystem

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_reads_bytes_total

Cumulative count of bytes read

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_read_seconds_total

Cumulative count of seconds spent reading

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_reads_merged_total

Cumulative count of reads merged

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_reads_total

Cumulative count of reads completed

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_sector_reads_total

Cumulative count of sector reads completed

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_sector_writes_total

Cumulative count of sector writes completed

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_usage_bytes

Number of bytes that are consumed by the container on this filesystem

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_writes_bytes_total

Cumulative count of bytes written

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_write_seconds_total

Cumulative count of seconds spent writing

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_writes_merged_total

Cumulative count of writes merged

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_fs_writes_total

Cumulative count of writes completed

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

device: device-name

container_memory_cache

Total page cache memory

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_memory_failcnt

Number of times memory usage hit the limit

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_memory_failures_total

Cumulative count of memory allocation failures

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

failure_type: cause-of-failure (pgfault or pgmajfault)

scope: scope (container or hierarchy)

container_memory_mapped_file

Size of memory mapped files

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_memory_max_usage_bytes

Maximum memory usage recorded

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_memory_rss

Size of RSS (resident set size)

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_memory_swap

Container swap usage

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_memory_usage_bytes

Current memory usage, including all memory regardless of when it was accessed

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_memory_working_set_bytes

Current working set

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_spec_cpu_period

CPU period of the container

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_spec_cpu_quota

CPU quota of the container

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_spec_cpu_shares

CPU share of the container

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_spec_memory_limit_bytes

Memory limit for the container

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_spec_memory_reservation_limit_bytes

Memory reservation limit for the container

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name

container_spec_memory_swap_limit_bytes

Memory swap limit for the container

id: container-identifier

name: container-name

image: image-name

container: container-name (as defined in Kubernetes)

namespace: namespace

pod: pod-name
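The CFS counters in the kubelet table (container_cpu_cfs_periods_total and container_cpu_cfs_throttled_periods_total) are cumulative. A throttling ratio therefore has to be computed from the change between two samples. In PromQL this is commonly written as `rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m])`.

The following Python sketch shows the same arithmetic on two hypothetical scrape samples. It assumes the following:

- The `CfsSample` type and `throttle_ratio` function are illustrative names, not part of JP1/IM.
- A counter reset is simply rejected here, rather than compensated for as PromQL's `rate()` does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CfsSample:
    """One scrape of two cAdvisor counters for a single container (hypothetical type)."""
    periods_total: float            # value of container_cpu_cfs_periods_total
    throttled_periods_total: float  # value of container_cpu_cfs_throttled_periods_total


def throttle_ratio(earlier: CfsSample, later: CfsSample) -> float:
    """Fraction of CFS enforcement periods in which the container was throttled
    between two samples (0.0 = never throttled, 1.0 = throttled in every period)."""
    periods = later.periods_total - earlier.periods_total
    throttled = later.throttled_periods_total - earlier.throttled_periods_total
    if periods < 0 or throttled < 0:
        # A counter went backwards, for example because the container restarted.
        raise ValueError("counter reset between samples; discard this interval")
    if periods == 0:
        # No enforcement periods elapsed (for example, no CPU quota is set).
        return 0.0
    return throttled / periods


if __name__ == "__main__":
    # 600 periods elapsed, 150 of them throttled
    print(throttle_ratio(CfsSample(1000, 50), CfsSample(1600, 200)))
```

A ratio that stays close to 1.0 suggests that the container's CPU quota (container_spec_cpu_quota) may be too small for its workload.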

(i) Kubernetes

In Kubernetes, a Prometheus server that the user provides to monitor the Kubernetes environment collects operating information from the scraping targets (kube-state-metrics, node_exporter, and kubelet) and sends the information to JP1/IM - Manager.

The following table lists the names of the components that are monitored in Kubernetes.

Configuration component name

Monitoring target

Component name

Cluster

Y

Cluster

Control Plane

Host

Y#1

Node

Service (such as apiserver)

--

--

Worker node

Host

Y#1

Node

Service (such as apiserver)

--

--

Container

--

--

Namespace

Y#1

Namespace

Workload#2

Y#1

See the table in #2.

Pod

Y

Pod

Legend:

Y: Monitored, --: Not monitored

#1

Not supported by AKS.

#2

The workloads can be divided into the six types shown in the following table.

Type of workload    Component name
CronJob             CronJob
Job                 Job
DaemonSet           DaemonSet
Deployment          Deployment
ReplicaSet          ReplicaSet
StatefulSet         StatefulSet

■ Key metric items

See Key metric items in 12.5.2(2)(h) Red Hat OpenShift.

(j) Amazon Elastic Kubernetes Service (EKS)

In Amazon Elastic Kubernetes Service (EKS), Prometheus or an AWS Distro for OpenTelemetry (ADOT) agent (which uses the Prometheus receiver and exporter) collects information from scraping targets (kube-state-metrics, node_exporter, and kubelet) and sends the information to JP1/IM - Manager.

When you monitor EKS on Fargate, you must use the ADOT agent to collect the performance data of pods, as shown in the following table.

                   Service to be monitored
Collection tool    EKS on EC2    EKS on Fargate
Prometheus         Y             C
ADOT agent         Y             Y

Legend:

Y: The tool can collect metrics, including the performance data of pods.

C (conditional): The tool can collect metrics, but not the performance data of pods.

■ Key metric items

See Key metric items in 12.5.2(2)(h) Red Hat OpenShift.

(k) Azure Kubernetes Service (AKS)

To monitor Azure Kubernetes Service (AKS), the Azure monitoring capability (Promitor) is used to collect default AKS information. For details on Promitor, see 12.5.2(2)(f) Promitor.

■ Key metric items

The key metric items when Promitor monitors AKS are defined in the Promitor metric definition file (initial status). For details, see Promitor metric definition file (metrics_promitor.conf) of 10. IM Exporter Definition Files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add more metric items to the Promitor metric definition file. For details on the AKS-monitoring metrics you can specify with PromQL statements used within the definition file, see Metrics you can collect in 12.5.2(2)(f) Promitor.

(l) Log metrics

This capability can generate and measure log metrics from log files created by monitoring targets.

■ Key metric items

In the log metrics definition file (fluentd_any-name_logmetrics.conf), you define which values to derive from the log files created by your monitoring targets. Based on these definitions, the quantified data (log metrics) can be collected as metric items.

For details on the log metrics definition file, see Log metrics definition file (fluentd_any-name_logmetrics.conf) of 10. IM Exporter Definition Files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

■ Sample files

The following describes sample files for using the log metrics feature. If you copy the sample files, be careful about line-feed codes. For details, see the description of each file in 2. Definition Files and 10. IM Exporter Definition Files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. These sample files are based on the assumptions described in Assumptions of the sample files. Copy each file and change the settings according to your monitoring targets.

- Assumptions of the sample files

The sample files described here assume that HostA, a monitored host (integrated agent host), exists, that JP1/IM - Agent is installed on it, and that WebAppA, an application running on HostA, creates the following log file.

- ControllerLog.log

When WebAppA starts processing a request to one of its HTTP endpoints, it writes a log message such as target log message 1, which identifies the endpoint being used. The log message also indicates the number of records handled by the request.

Target log message 1:

...
2022-10-19 10:00:00 [INFO] c.b.springbootlogging.LoggingController : endpoint "/register" started. Target record: 5.
...

In the sample files, a regular expression matches target log message 1, and the number of log messages that match the expression is counted. The count is displayed in the Trends tab of the JP1/IM integrated operation viewer as log metric 1, Requests to the register Endpoint.

The definition for log metric 1 uses counter as its log metric type.

In addition, the same regular expression extracts the number that follows Target record in target log message 1, and the extracted numbers are summed. The total is displayed in the Trends tab of the JP1/IM integrated operation viewer as log metric 2, Number of Registered Records.

The definition for log metric 2 uses counter as its log metric type.
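The counting described above can be illustrated with a short Python sketch. It applies a Python translation of the regular expression from the sample log metrics definition file to a list of lines and computes what the two counters would hold: log metric 1 adds 1 per matching message, and log metric 2 adds the extracted record count. The function name aggregate and the two extra log lines are hypothetical, and the literal dots are escaped, which is slightly stricter than the sample. Fluentd itself keeps these counters cumulatively while it tails the file rather than computing them in one pass.

```python
import re
from typing import Iterable, Tuple

# Python translation of the sample regular expression. Python spells named
# groups (?P<name>...) instead of Ruby's (?<name>...); the level and class
# parts are matched but not captured because this sketch does not use them.
PATTERN = re.compile(
    r'^(?P<logtime>[^\[]*) \[[^\]]*\] [^\[]* : '
    r'endpoint "/register" started\. Target record: (?P<record_num>\d[^\[]*)\.$'
)


def aggregate(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (log metric 1, log metric 2) for the given log lines."""
    requests = 0  # counter without a key: +1 per matching message
    records = 0   # counter with key record_num: +record_num per message
    for line in lines:
        m = PATTERN.match(line.rstrip("\n"))
        if m is None:
            continue  # non-matching lines change neither counter
        requests += 1
        records += int(m.group("record_num"))  # like types record_num:integer
    return requests, records


# The first line is target log message 1; the other two are hypothetical
# additions that show a non-matching and a second matching message.
sample = [
    '2022-10-19 10:00:00 [INFO] c.b.springbootlogging.LoggingController : endpoint "/register" started. Target record: 5.',
    '2022-10-19 10:00:05 [INFO] c.b.springbootlogging.LoggingController : endpoint "/login" started. Target record: 1.',
    '2022-10-19 10:01:00 [INFO] c.b.springbootlogging.LoggingController : endpoint "/register" started. Target record: 12.',
]
print(aggregate(sample))  # (2, 17)
```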

Fluentd requires one worker (multi-process workers feature) for each log file to be monitored. For details on the worker settings related to the log metrics feature, see the log metrics definition file (fluentd_any-name_logmetrics.conf). Here, it is assumed that 11 Fluentd workers are running and that ControllerLog.log is monitored by the worker whose worker ID is 10.

These sample files also assume the tree structure consisting of the following IM management nodes:

All Systems
 + Host A
    + Application Server
       + WebAppA
- Target files in this example

The target files used in this example are as follows:

  • Integrated manager host

    - User-specific metric definition file

  • Integrated agent host

    - Prometheus configuration file

    - User-specific discovery configuration file

    - Log metrics definition file

    - Fluentd log monitoring target definition file

- Sample user-specific metric definition file

- File name: metrics_logmetrics1.conf

- Written code

[
  {
    "name":"logmetrics_request_endpoint_register",
    "default":true,
    "promql":"logmetrics_request_endpoint_register and $jp1im_TrendData_labels",
    "resource_en":{
      "category":"HTTP",
      "label":"request_num_of_endpoint_register",
      "description":"The request number of endpoint register",
      "unit":"request"
    },
    "resource_ja":{
      "category":"HTTP",
      "label":"Requests to the register Endpoint",
      "description":"The request number of endpoint register",
      "unit":"request"
    }
  },
  {
    "name":"logmetrics_num_of_registeredrecord",
    "default":true,
    "promql":"logmetrics_num_of_registeredrecord and $jp1im_TrendData_labels",
    "resource_en":{
      "category":"DB",
      "label":"logmetrics_num_of_registeredrecord",
      "description":"The number of registered record",
      "unit":"record"
    },
    "resource_ja":{
      "category":"DB",
      "label":"Number of Registered Records",
      "description":"The number of registered record",
      "unit":"record"
    }
  }
]
Note

The storage directory, written code, and file name follow the format of the user-specific metric definition file (metrics_any-Prometheus-trend-name.conf).

- Sample Prometheus configuration file

- File name: jpc_prometheus_server.yml

- Written code

global:
  ...
(omitted)
  ...
scrape_configs:
  - job_name: 'LogMetrics'
    
    file_sd_configs:
      - files:
        - 'user/user_file_sd_config_logmetrics.yml'
    
    relabel_configs:
      - target_label: jp1_pc_nodelabel
        replacement: Log trapper(Fluentd)
    
    metric_relabel_configs:
      - target_label: jp1_pc_nodelabel
        replacement: ControllerLog
      - source_labels: ['__name__']
        regex: 'logmetrics_request_endpoint_register|logmetrics_num_of_registeredrecord'
        action: 'keep'
      - regex: (jp1_pc_multiple_node|jp1_pc_agent_create_flag)
        action: labeldrop
 
  ...
(omitted)
  ...
Note

The storage directory and written code follow the format of the Prometheus configuration file (jpc_prometheus_server.yml). You do not have to create a new file. Instead, add the job for the log metrics feature to the scrape_configs section of the Prometheus configuration file (jpc_prometheus_server.yml) created during installation.
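To show how the metric_relabel_configs rules in the sample Prometheus configuration file treat scraped series, the following Python sketch applies the same three rules in order to hypothetical label sets. It mimics Prometheus relabeling semantics (regular expressions anchored at both ends, replace as the default action, keep on __name__, labeldrop on label names) and is not Prometheus code; the function names and the sample label values are assumptions for illustration.

```python
import re
from typing import Dict, Optional

Labels = Dict[str, str]

# Regular expressions copied from metric_relabel_configs of the 'LogMetrics' job.
KEEP_NAMES = r"logmetrics_request_endpoint_register|logmetrics_num_of_registeredrecord"
DROP_LABELS = r"(jp1_pc_multiple_node|jp1_pc_agent_create_flag)"


def full_match(pattern: str, value: str) -> bool:
    """Prometheus anchors relabel regexes at both ends, like re.fullmatch."""
    return re.fullmatch(pattern, value) is not None


def apply_metric_relabel(labels: Labels) -> Optional[Labels]:
    """Return the relabeled series, or None if the 'keep' rule drops it."""
    out = dict(labels)
    # Rule 1: default action 'replace' with no source_labels. The empty source
    # value matches the default regex (.*), so the label is always overwritten.
    out["jp1_pc_nodelabel"] = "ControllerLog"
    # Rule 2: action 'keep' on __name__; non-matching series are discarded.
    if not full_match(KEEP_NAMES, out.get("__name__", "")):
        return None
    # Rule 3: action 'labeldrop' removes labels whose *name* matches.
    return {k: v for k, v in out.items() if not full_match(DROP_LABELS, k)}


# Hypothetical target labels, as attached from the discovery file and
# relabel_configs before metric relabeling runs (values are illustrative).
target_labels = {
    "jp1_pc_nodelabel": "Log trapper(Fluentd)",
    "jp1_pc_exporter": "logmetrics",
    "jp1_pc_multiple_node": "{__name__=~'logmetrics_.*'}",
    "jp1_pc_agent_create_flag": "false",
}
kept = apply_metric_relabel(
    {"__name__": "logmetrics_num_of_registeredrecord",
     "loggroup": "WebAppA", "log": "ControllerLog", **target_labels})
dropped = apply_metric_relabel({"__name__": "hypothetical_other_metric", **target_labels})
print(kept)
print(dropped)  # None
```

The ordering matters: the node label is rewritten before the keep rule runs, and the helper labels from the discovery file are removed last, so only the two log metrics reach storage, without those helper labels.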

- Sample user-specific discovery configuration file

- File name: user_file_sd_config_logmetrics.yml

- Written code

- targets:
  - HostA:24830
  labels:
    jp1_pc_exporter: logmetrics
    jp1_pc_category: WebAppA
    jp1_pc_trendname: logmetrics1
    jp1_pc_multiple_node: "{__name__=~'logmetrics_.*'}"
    jp1_pc_agent_create_flag: false
Note

The storage directory and written code follow the format of the user-specific discovery configuration file (file_sd_config_any-name.yml).

ControllerLog.log is monitored by the Fluentd worker whose worker ID is 10. Thus, if port is set to 24820 in the sample log metrics definition file, the worker monitoring ControllerLog.log uses port 24820 + 10 = 24830.
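The port arithmetic above can be expressed as a small Python helper that builds the scrape target for the user-specific discovery configuration file. The helper name scrape_target is hypothetical. It assumes, as the text does, that each worker listens on the configured port plus its worker ID, with Fluentd worker IDs numbered from 0.

```python
def scrape_target(host: str, base_port: int, worker_id: int, num_workers: int) -> str:
    """Return the host:port that Prometheus should scrape for one Fluentd worker.

    Assumes worker IDs run from 0 to num_workers - 1 and that each worker's
    prometheus input listens on base_port + worker_id.
    """
    if not 0 <= worker_id < num_workers:
        raise ValueError(f"worker ID {worker_id} is outside 0..{num_workers - 1}")
    return f"{host}:{base_port + worker_id}"


# Values from the example: 11 workers, ControllerLog.log on worker 10.
print(scrape_target("HostA", 24820, 10, 11))  # HostA:24830
```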

- Sample log metrics definition file

- File name: fluentd_WebAppA_logmetrics.conf

- Written code

## Input
<worker 10>
  <source>
    @type prometheus
    bind '0.0.0.0'
    # Base port. Worker N listens on this port + N (24820 + 10 = 24830 for worker 10).
    port 24820
    metrics_path /metrics
  </source>
</worker>
## Extract target log message 1
<worker 10>
  <source>
    @type tail
    @id logmetrics_counter
    path /usr/lib/WebAppA/ControllerLog/ControllerLog.log
    tag WebAppA.ControllerLog
    pos_file ../data/fluentd/tail/ControllerLog.pos
    read_from_head true
    <parse>
      @type regexp
      expression /^(?<logtime>[^\[]*) \[(?<loglevel>[^\]]*)\] (?<class>[^\[]*) : endpoint "\/register" started. Target record: (?<record_num>\d[^\[]*).$/
      time_key logtime
      time_format %Y-%m-%d %H:%M:%S
      types record_num:integer
    </parse>
  </source>
 
## Output
## Define log metrics 1 and 2
  <match WebAppA.ControllerLog>
    @type prometheus
    <metric>
      name logmetrics_request_endpoint_register
      type counter
      desc The request number of endpoint register
    </metric>
    <metric>
      name logmetrics_num_of_registeredrecord
      type counter
      desc The number of registered record
      key record_num
      <labels>
      loggroup ${tag_parts[0]}
      log ${tag_parts[1]}
      </labels>
    </metric>
  </match>
</worker>
Note

The storage directory and written code follow the format of the log metrics definition file (fluentd_any-name_logmetrics.conf).

- Sample Fluentd log monitoring target definition file

- File name: jpc_fluentd_common_list.conf

- Written code

## [Target Settings]
  ...
(omitted)
  ...
@include user/fluentd_WebAppA_logmetrics.conf
Note

The storage directory and written code follow the format of the Fluentd log monitoring target definition file (jpc_fluentd_common_list.conf) in JP1/IM - Agent definition files. You do not have to create a new file. Instead, add an @include line for the log metrics definition file to the Fluentd log monitoring target definition file (jpc_fluentd_common_list.conf) created during installation.

(m) Script exporter

Script exporter runs scripts on a host and gets results.

Script exporter is installed on the same host as Prometheus. When it receives a scrape request from the Prometheus server, it runs a script on the host and returns the result to the server.

By developing a script that obtains UAP information and converts it to metrics, and adding the script to Script exporter, you can monitor applications that are not supported by any existing Exporter.

■ Key metric items

The key Script exporter metric items are defined in the Script exporter metric definition file (initial status). For details, see Script exporter metric definition file (metrics_script_exporter.conf) of 10. IM Exporter Definition Files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.

You can add more metric items to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.

Metric name                Data to be obtained
script_success             Script exit status (0 = error, 1 = success)
script_duration_seconds    Script execution time, in seconds
script_exit_code           Exit code of the script

Each of these metrics has the following labels:

instance: instance-identifier-string

job: job-name

script: script-name
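As a sketch of what the three metrics in the table represent, the following Python code runs a command and renders script_success, script_duration_seconds, and script_exit_code in the Prometheus text exposition format. It is not Script exporter's implementation: the function run_and_describe, the 10-second timeout, the -1 exit code for timeouts, and the rule that exit code 0 means success are assumptions for illustration. Only the script label is emitted, because the instance and job labels are normally attached by Prometheus when it scrapes the target.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def run_and_describe(name: str, argv: Sequence[str], timeout: float = 10.0) -> List[str]:
    """Run argv and describe the run as three exposition-format lines."""
    start = time.monotonic()
    try:
        exit_code = subprocess.run(argv, capture_output=True, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        exit_code = -1  # hypothetical convention for a script that timed out
    duration = time.monotonic() - start
    success = 1 if exit_code == 0 else 0  # assumption: exit code 0 means success
    labels = f'{{script="{name}"}}'
    return [
        f"script_success{labels} {success}",
        f"script_duration_seconds{labels} {duration:.6f}",
        f"script_exit_code{labels} {exit_code}",
    ]


# Usage: a script that succeeds and one that exits with code 3.
for argv in ([sys.executable, "-c", "print('ok')"],
             [sys.executable, "-c", "import sys; sys.exit(3)"]):
    for line in run_and_describe("demo", argv):
        print(line)
```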