12.5.2 Performance monitoring capabilities
(1) Communication capability
(a) Communication protocol
The following table lists the communication protocols used by the IM Exporter add-on programs.

| Connected from | Connected to | Protocol | Authentication method |
|---|---|---|---|
| Yet another cloudwatch exporter | Amazon CloudWatch | See 9.5.3(1)(a) Communication protocols and authentication methods of JP1/IM - Agent. | |
| Promitor Scraper | Azure Monitor | HTTPS | No client authentication |
| Promitor Resource Discovery | Azure Resource Graph | HTTPS | No client authentication |
| Promitor Scraper | Promitor Resource Discovery | HTTP | No authentication |
| Prometheus | Fluentd | HTTP | No authentication |
(b) Network configuration
The environments where the IM Exporter add-on programs are available follow the standards for JP1/IM. The following table shows the proxy configurations that are available.

| Connected from | Connected to | Available proxy configuration |
|---|---|---|
| Yet another cloudwatch exporter | Amazon CloudWatch | See 9.5.3(1)(b) Network configuration of JP1/IM - Agent. |
| Promitor Scraper | Azure Monitor | |
| Promitor Resource Discovery | Azure Resource Graph | |
The following table shows what data is transmitted by the IM Exporter add-on programs.

| Connected from | Connected to | Data to be transmitted | Authentication method |
|---|---|---|---|
| Yet another cloudwatch exporter | Amazon CloudWatch | See 9.5.3(1)(b) Network configuration of JP1/IM - Agent. | |
| Promitor Scraper | Azure Monitor | Azure Monitor data (metrics information) | |
| Promitor Resource Discovery | Azure Resource Graph | Azure Resource Graph data (resource exploration results) | |
(2) Performance data collection capabilities
With these capabilities, Prometheus server collects performance data from monitoring targets. There are two capabilities available as follows:
- Scraping (Prometheus server)
- Operating data collection from monitoring targets (Exporters)
For details, see 9.5.3(2) Performance data collection function of JP1/IM - Agent.
(a) Scraping capability
Scraping is defined on a scraping job basis. In JP1/IM - Agent, scraping jobs with names that correspond to the types of Exporters are defined by default.
If a discovery configuration file is used for UAP monitoring, you must define the corresponding scraping jobs yourself. In addition, the scraping definitions for the log metrics feature require additional settings.
For details on the scraping description of the log metrics feature, see 10.1.2(2) Setting up scraping definitions (required) of IM Exporter in the manual JP1/Integrated Management 3 - Manager Configuration Guide.
The following table lists the default scraping definition for each IM Exporter add-on program.

| Scraping job name | Scraping definition |
|---|---|
| jpc_windows | Scraping definition for Windows exporter |
| jpc_process | Scraping definition for Process exporter |
| jpc_cloudwatch | Scraping definition for Yet another cloudwatch exporter |
| jpc_promitor | Scraping definition for Promitor |
| jpc_script | Scraping definition for Script exporter |
Prometheus server scrapes targets and receives different metrics from the Exporters depending on the types of Exporters. For details, see the description of the metric definition file for each Exporter under 10. IM Exporter definition files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
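
The following is a minimal sketch of what a scraping job entry of this kind might look like in the Prometheus configuration file (jpc_prometheus_server.yml). The job name jpc_process is taken from the table above; the scrape interval, target host, and port are placeholders, not the defaults actually shipped with the product.

```yaml
# Hypothetical excerpt from jpc_prometheus_server.yml; the default definitions
# shipped with JP1/IM - Agent may differ.
scrape_configs:
  - job_name: jpc_process                     # scraping job name from the table above
    scrape_interval: 60s                      # placeholder collection interval
    static_configs:
      - targets:
          - "agent-host.example.com:9999"     # placeholder Process exporter address and port
```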
(b) Operating data collection from monitoring targets
The following describes the capabilities of the IM Exporter add-on programs, which collect operating information (performance data) from monitoring targets.
(c) Windows exporter
Windows exporter, built into a monitored Windows host, collects operating information from that host. For details, see 9.5.3(2) Performance data collection function of JP1/IM - Agent.
In addition to the capabilities of the Windows exporter provided with JP1/IM - Agent, IM Exporter can collect operating information about processes. The process collector is enabled by default among the available collectors.
■ Key metric items
The key Windows exporter metric items are defined in the Windows exporter metric definition file (initial status). For details, see the description of Windows exporter metric definition file (metrics_windows_exporter.conf) of JP1/IM - Agent in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
In IM Exporter, the metric items listed in the following table can be added to the metric definition file and specified in the PromQL statements used within it.

| Metric name | Collector | Data to be obtained | Label |
|---|---|---|---|
| windows_process_start_time | process | Time of process start | instance: instance-identifier-string job: job-name process: process-name process_id: process-ID creating_process_id: creator-process-ID |
| windows_process_cpu_time_total | process | Returns elapsed time that all of the threads of this process used the processor to execute instructions by mode (privileged, user). An instruction is the basic unit of execution in a computer, a thread is the object that executes instructions, and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions is included in this count. | instance: instance-identifier-string job: job-name process: process-name process_id: process-ID creating_process_id: creator-process-ID mode: mode (privileged or user) |
| windows_process_io_bytes_total | process | Bytes issued to I/O operations in different modes (read, write, other). This property counts all I/O activity generated by the process to include file, network, and device I/Os. Read and write mode includes data operations; other mode includes those that do not involve data, such as control operations. | instance: instance-identifier-string job: job-name process: process-name process_id: process-ID creating_process_id: creator-process-ID mode: mode (read, write, or other) |
| windows_process_io_operations_total | process | I/O operations issued in different modes (read, write, other). This property counts all I/O activity generated by the process to include file, network, and device I/Os. Read and write mode includes data operations; other mode includes those that do not involve data, such as control operations. | instance: instance-identifier-string job: job-name process: process-name process_id: process-ID creating_process_id: creator-process-ID mode: mode (read, write, or other) |
| windows_process_page_faults_total | process | Page faults by the threads executing in this process. A page fault occurs when a thread refers to a virtual memory page that is not in its working set in main memory. This does not necessarily cause the page to be fetched from disk: the page might be on the standby list and hence already in main memory, or it might be in use by another process with which the page is shared. | instance: instance-identifier-string job: job-name process: process-name process_id: process-ID creating_process_id: creator-process-ID |
| windows_process_page_file_bytes | process | Current number of bytes this process has used in the paging file(s). Paging files are used to store pages of memory used by the process that are not contained in other files. Paging files are shared by all processes, and lack of space in paging files can prevent other processes from allocating memory. | instance: instance-identifier-string job: job-name process: process-name process_id: process-ID creating_process_id: creator-process-ID |
| windows_process_pool_bytes | process | Pool Bytes is the last observed number of bytes in the paged or nonpaged pool. The nonpaged pool is an area of system memory (physical memory used by the operating system) for objects that cannot be written to disk, but must remain in physical memory as long as they are allocated. The paged pool is an area of system memory (physical memory used by the operating system) for objects that can be written to disk when they are not being used. Nonpaged pool bytes is calculated differently than paged pool bytes, so it might not equal the total of paged pool bytes. | instance: instance-identifier-string job: job-name process: process-name process_id: process-ID creating_process_id: creator-process-ID pool: paged or nonpaged |
| windows_process_priority_base | process | | instance: instance-identifier-string job: job-name process: process-name process_id: process-ID creating_process_id: creator-process-ID |
| windows_process_private_bytes | process | Current number of bytes this process has allocated that cannot be shared with other processes. | instance: instance-identifier-string job: job-name process: process-name process_id: process-ID creating_process_id: creator-process-ID |
| windows_process_virtual_bytes | process | Current size, in bytes, of the virtual address space that the process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages. Virtual space is finite and, by using too much, the process can limit its ability to load libraries. | instance: instance-identifier-string job: job-name process: process-name process_id: process-ID creating_process_id: creator-process-ID |
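
As a concrete illustration, the following PromQL expressions show how metrics from this table might be used. They are hedged examples only, not the definitions actually shipped in metrics_windows_exporter.conf; the 2m range and the process name are placeholders.

```promql
# Per-process CPU usage in user mode over the last 2 minutes (placeholder range and labels)
rate(windows_process_cpu_time_total{mode="user"}[2m]) * 100

# Private bytes of a specific process; the process name is a placeholder
windows_process_private_bytes{process="sample_app"}
```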
■ Comparison with key performance data that can be collected by JP1/PFM - Agent for Platform
The following table shows whether Windows exporter can collect key performance data that can be collected by JP1/PFM - Agent for Platform as metrics, in comparison with the records JP1/PFM - Agent for Platform uses for collection.

| Record name (Record ID) | Information stored in the record | Record is based on | What can be collected | What cannot be collected |
|---|---|---|---|---|
| Process Detail (PD) | Performance data that shows the state of paging, memory usage, and time usage of one process at a point in time. | Process ID | This corresponds to where a node is created on a process_id basis. | |
| Process Detail Interval (PDI) | Performance data that shows the state of paging, memory usage, and time usage of one process at a point in time. | Process ID | The metrics to be collected are all included in PD. The metrics that are determined through the calculation of the average or frequency can be obtained by calculating using the start time of the process, not the collection interval. | -- |
| Process End Detail (PD_PEND) | Performance data that shows the state after the process ends. | Process ID | -- | The information of any ended process cannot be collected. |
| Workgroup Summary (PI_WGRP) | Performance data obtained by summarizing a record stored in the Process Detail (PD) record at a point in time on a workgroup basis. | Workgroup | A workgroup is a JP1/PFM-specific unit, which cannot be collected. | -- |
| Application Process Interval (PD_APSI) | Performance data that shows the state of a process for which process monitoring has been configured, at a point in time. | Process ID | A given unit cannot be specified. The metrics to be collected are all included in APS. | -- |
| Application Process Overview (PD_APS) | Performance data that shows the state of a process at a point in time. | Process ID | A given unit cannot be specified, but this corresponds to where a node is created on a process basis. | |
Legend:
--: Not applicable
(d) Process exporter
Process exporter, built into a monitored Linux host, collects operating information of processes running on that host.
Installed in the same host as Prometheus server, Process exporter collects operating information of the processes from the Linux OS on the host when triggered by scraping requests from Prometheus server, and returns it to the server.
Process exporter allows you to collect, from within the host, process-related operating information that cannot be obtained by monitoring from outside the host (for example, synthetic monitoring with URLs or CloudWatch).
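
A minimal sketch of a process-grouping definition follows, using the matching-template notation of the OSS process-exporter (for example {{.Comm}}); the grouping rule shown here is an assumption for illustration, and the configuration file name and defaults used by the product may differ.

```yaml
# Hypothetical grouping definition in the process-exporter configuration file.
process_names:
  - name: "{{.Comm}}"      # group processes by executable name; becomes the groupname label
    cmdline:
      - ".+"               # match all processes
```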
■ Key metric items
The key Process exporter metric items are defined in the Process exporter metric definition file (initial status). For details, see Process exporter metric definition file (metrics_process_exporter.conf) of 10. IM Exporter definition files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
You can add more metric items to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.

| Metric name | Data to be obtained | Label |
|---|---|---|
| namedprocess_namegroup_num_procs | Number of processes in this group. | instance: instance-identifier-string job: job-name groupname: group-name |
| namedprocess_namegroup_cpu_seconds_total | CPU usage based on /proc/[pid]/stat fields utime(14) and stime(15), that is, user and system time. | instance: instance-identifier-string job: job-name groupname: group-name mode: user or system |
| namedprocess_namegroup_read_bytes_total | Bytes read based on /proc/[pid]/io field read_bytes. Because /proc/[pid]/io is readable only by the process's user, Process exporter must run either as that user or as root to obtain these values; otherwise the values cannot be read and the metric is constantly 0. | instance: instance-identifier-string job: job-name groupname: group-name |
| namedprocess_namegroup_write_bytes_total | Bytes written based on /proc/[pid]/io field write_bytes. | instance: instance-identifier-string job: job-name groupname: group-name |
| namedprocess_namegroup_major_page_faults_total | Number of major page faults based on /proc/[pid]/stat field majflt(12). | instance: instance-identifier-string job: job-name groupname: group-name |
| namedprocess_namegroup_minor_page_faults_total | Number of minor page faults based on /proc/[pid]/stat field minflt(10). | instance: instance-identifier-string job: job-name groupname: group-name |
| namedprocess_namegroup_context_switches_total | Number of context switches based on /proc/[pid]/status fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. The extra label ctxswitchtype can have two values: voluntary and nonvoluntary. | instance: instance-identifier-string job: job-name groupname: group-name ctxswitchtype: voluntary or nonvoluntary |
| namedprocess_namegroup_memory_bytes | Number of bytes of memory used. The extra label memtype can have three values: resident, virtual, and swapped. If gathering of the smaps file is enabled, two additional values for memtype are added: proportionalResident (sum of Pss fields from /proc/[pid]/smaps) and proportionalSwapped (sum of SwapPss fields from /proc/[pid]/smaps). | instance: instance-identifier-string job: job-name groupname: group-name memtype: resident, virtual, swapped, proportionalResident, or proportionalSwapped |
| namedprocess_namegroup_open_filedesc | Number of file descriptors, based on counting how many entries are in the directory /proc/[pid]/fd. | instance: instance-identifier-string job: job-name groupname: group-name |
| namedprocess_namegroup_worst_fd_ratio | Worst ratio of open filedescs to filedesc limit, amongst all the procs in the group. The limit is the fd soft limit based on /proc/[pid]/limits. | instance: instance-identifier-string job: job-name groupname: group-name |
| namedprocess_namegroup_oldest_start_time_seconds | Epoch time (seconds since 1970/1/1) at which the oldest process in the group started. This is derived from field starttime(22) from /proc/[pid]/stat, added to boot time to make it relative to epoch. | instance: instance-identifier-string job: job-name groupname: group-name |
| namedprocess_namegroup_num_threads | Sum of the number of threads of all processes in the group. Based on field num_threads(20) from /proc/[pid]/stat. | instance: instance-identifier-string job: job-name groupname: group-name |
| namedprocess_namegroup_states | Number of threads in the group in each of various states, based on the field state(3) from /proc/[pid]/stat. The extra label state can have these values: Running, Sleeping, Waiting, Zombie, Other. | instance: instance-identifier-string job: job-name groupname: group-name state: Running, Sleeping, Waiting, Zombie, or Other |
| namedprocess_namegroup_thread_count | Number of threads in this thread subgroup. | instance: instance-identifier-string job: job-name groupname: group-name threadname: thread-name |
| namedprocess_namegroup_thread_cpu_seconds_total | Same as cpu_user_seconds_total and cpu_system_seconds_total, but broken down per thread subgroup. | instance: instance-identifier-string job: job-name groupname: group-name threadname: thread-name mode: user or system |
| namedprocess_namegroup_thread_io_bytes_total | Same as read_bytes_total and write_bytes_total, but broken down per thread subgroup. Unlike read_bytes_total/write_bytes_total, the label iomode is used to distinguish between read and write bytes. | instance: instance-identifier-string job: job-name groupname: group-name threadname: thread-name iomode: read or write |
| namedprocess_namegroup_thread_major_page_faults_total | Same as major_page_faults_total, but broken down per thread subgroup. | instance: instance-identifier-string job: job-name groupname: group-name |
| namedprocess_namegroup_thread_minor_page_faults_total | Same as minor_page_faults_total, but broken down per thread subgroup. | instance: instance-identifier-string job: job-name groupname: group-name |
| namedprocess_namegroup_thread_context_switches_total | Same as context_switches_total, but broken down per thread subgroup. | instance: instance-identifier-string job: job-name groupname: group-name |
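
For example, a PromQL expression such as the following could be written against these metrics; the group name jpc_sample is a placeholder, not a value defined by the product.

```promql
# Number of running processes in the group; jpc_sample is a placeholder group name
namedprocess_namegroup_num_procs{groupname="jpc_sample"}
```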
- Important
  - Processes whose names contain multi-byte characters cannot be monitored.
  - Process exporter continues to output information about processes that it has collected once, even after those processes stop running. Therefore, if Process exporter is configured to collect information on a PID basis, new time-series data is added every time a process is restarted and its PID changes, resulting in a large amount of unnecessary data.
    Furthermore, because the use of PIDs is not recommended in the open source software (OSS), version 13-00 is configured by default not to include PID information in groupname. If you want to manage processes that run with the same command line separately, we recommend operational measures such as changing the order of arguments or using PIDs (in the latter case, periodic restarts are needed to prevent the collected information from accumulating indefinitely).
    Note that the information collected by Windows exporter differs from what Process exporter collects, because Windows exporter does collect PID information. (If you want to exclude the PIDs from the collected information, use drop in the scraping definition of the Prometheus configuration file (jpc_prometheus_server.yml) to exclude them; a sketch follows this note.)
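
The following is a minimal sketch of how such an exclusion might look in the scraping definition of jpc_prometheus_server.yml. The note above only says to use drop in the scraping definition; whether to drop the PID labels (labeldrop, shown here) or entire series (drop) depends on your purpose, and the label names are assumptions based on the Windows exporter table above.

```yaml
# Hypothetical excerpt; verify the actual label names and policy for your environment.
scrape_configs:
  - job_name: jpc_windows              # other settings of this job are omitted
    metric_relabel_configs:
      - action: labeldrop              # remove PID-related labels after scraping
        regex: process_id|creating_process_id
```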
■ Comparison with key performance data that can be collected by JP1/PFM - Agent for Platform
The following table shows whether Process exporter can collect key performance data that can be collected by JP1/PFM - Agent for Platform as metrics, in comparison with the records JP1/PFM - Agent for Platform uses for collection.

| Record name (Record ID) | Information stored in the record | Record is based on | What can be collected | What cannot be collected |
|---|---|---|---|---|
| Process Detail (PD) | Performance data that shows the state of a process at a point in time. | Process ID | The data can be collected on a process ID basis if groupname is specified such that it contains {{.PID}}. It also corresponds to cases where a node is created on a process ID basis. | |
| Process Detail Interval (PDI) | Performance data of a process over a certain unit of time. | Process ID | The metrics to be collected are all included in PD. The metrics that are determined through the calculation of the average or frequency can be obtained by calculating using the start time of the process, not the collection interval. | -- |
| Process Summary (PD_PDS) | Performance data obtained by summarizing data stored in the Process Detail (PD) record at a point in time. | System | This can be aggregated on an instance (host) basis. | |
| Program Summary (PD_PGM) | Performance data obtained by summarizing data stored in the Process Detail (PD) record at a point in time on a program basis. | Program | The data can be collected on a program basis if groupname is specified based on a program (that is, use {{.ExeBase}} or {{.ExeFull}}). | -- |
| Terminal Summary (PD_TERM) | Performance data obtained by summarizing data stored in the Process Detail (PD) record at a point in time on a terminal basis. | Terminal | -- | The data cannot be aggregated on a terminal basis because the terminal information cannot be collected. |
| User Summary (PD_USER) | Performance data obtained by summarizing data stored in the Process Detail (PD) record at a point in time on a user basis. | User ID | The data can be aggregated on a user basis by putting data having the same user name together with {{.Username}} contained in groupname. | |
| Workgroup Summary (PI_WGRP) | Performance data obtained by summarizing data stored in the Process Detail (PD) record at a point in time on a workgroup basis. | Workgroup | A workgroup is a JP1/PFM-specific unit, which cannot be collected. | -- |
| Application Process Interval (PD_APSI) | Performance data that shows the state of a process for which process monitoring has been configured, at a point in time. | Process ID | All the metrics to be collected, except for ApplicationName (which nearly corresponds to groupname of Process exporter), are included in APS. The metrics that are determined through the calculation of the average or frequency can be obtained by calculating using the start time of the process, not the collection interval. | -- |
| Application Process Overview (PD_APS) | Performance data of processor usage over a certain unit of time. | Process ID | This corresponds to cases where a node is created on a groupname basis. The metrics for each process (process-ID-based) are the same as PD. | Same as PD. |
Legend:
--: Not applicable
(e) Yet another cloudwatch exporter
Yet another cloudwatch exporter collects operating information about AWS services running in the cloud environment through Amazon CloudWatch. For details, see the description in 9.5.3(2) Performance data collection function of JP1/IM - Agent.
■ Key metric items
The key metric items of Yet another cloudwatch exporter are defined in the Yet another cloudwatch exporter metric definition file (initial status). For details, see the description under Yet another cloudwatch exporter metric definition file (metrics_ya_cloudwatch_exporter.conf) of JP1/IM - Agent definition files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ CloudWatch metrics you can collect
In addition to the AWS namespaces that Yet another cloudwatch exporter of JP1/IM - Agent supports as monitoring targets, IM Exporter can collect metrics for the AWS namespaces listed in the following table.

| AWS namespace | Metric category name on CloudWatch# | Dimension |
|---|---|---|
| AWS/EBS | Per-volume metrics | VolumeId |
| AWS/ECS | ClusterName, ServiceName | ClusterName, ServiceName |
| AWS/EFS | File system metrics | FileSystemId |
| AWS/EFS | File system storage metrics | FileSystemId, StorageClass |
| AWS/FSx | File system metrics | FileSystemId |
| AWS/RDS | Per-database metrics | DBInstanceIdentifier |
| AWS/RDS | DBClusterIdentifier | DBClusterIdentifier |
| AWS/SNS | Topic metrics | TopicName |
#: The name of a class into which metrics are categorized by dimension in Amazon CloudWatch. You can view these classes on the CloudWatch website.
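
As a hedged illustration only: assuming Yet another cloudwatch exporter's usual naming convention (aws_namespace_metricname_statistic, in lowercase with underscores), a metric from the AWS/EBS namespace might be queried as follows. The metric name is an assumption for illustration and is not taken from this manual; verify the name actually exposed in your environment.

```promql
# Hypothetical example; verify the actual metric name exported in your environment
sum(aws_ebs_volume_read_bytes_sum)
```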
(f) Promitor
Promitor, included in the integrated agent, collects operating information about Azure services in the cloud environment through Azure Monitor and Azure Resource Graph.
Promitor consists of Promitor Scraper and Promitor Resource Discovery. Promitor Scraper collects metrics on resources from Azure Monitor according to schedule settings and returns them.
Metrics can be collected from target resources in two ways: one method is to specify the target resources separately in a configuration file and the other is to detect the resources automatically. If you choose to detect them automatically, Promitor Resource Discovery detects resources in a tenant through Azure Resource Graph, and based on the results, Promitor Scraper collects metric information.
In addition, Promitor Scraper and Promitor Resource Discovery each require two configuration files: one defines runtime settings, such as authentication information, and the other defines the metric information to be collected.
■ Key metric items
The key Promitor metric items are defined in the Promitor metric definition file (initial status). For details, see the description under Promitor metric definition file (metrics_promitor.conf) of 10. IM Exporter definition files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ Metrics you can collect
Promitor can collect metrics for the monitoring target services listed in the table below.
You specify the metrics you want to collect in the Promitor Scraper configuration file (metrics-declaration.yaml).
If you want to change the metrics specified in the Promitor Scraper configuration file, see Change monitoring metrics (optional) under 10.1.2(6) Setting up Promitor (d) Configuring scraping targets (required) in the manual JP1/Integrated Management 3 - Manager Configuration Guide.
You can also add new metrics to the Promitor metric definition file, based on the metrics specified in the Promitor Scraper configuration file. Metrics defined in the Promitor Scraper configuration file can be specified in the PromQL statements written in the definition file. (A configuration sketch follows the table below.)

| Promitor resourceType name | Azure Monitor namespace | Automatic discovery support |
|---|---|---|
| VirtualMachine | Microsoft.Compute/virtualMachines | Y |
| FunctionApp | Microsoft.Web/sites | Y |
| ContainerInstance | Microsoft.ContainerInstance/containerGroups | -- |
| KubernetesService | Microsoft.ContainerService/managedClusters | Y |
| FileStorage | Microsoft.Storage/storageAccounts/fileServices | -- |
| BlobStorage | Microsoft.Storage/storageAccounts/blobServices | -- |
| ServiceBusNamespace | Microsoft.ServiceBus/namespaces | Y |
| CosmosDb | Microsoft.DocumentDB/databaseAccounts | Y |
| SqlDatabase | Microsoft.Sql/servers/databases | Y |
| SqlServer | Microsoft.Sql/servers/databases, Microsoft.Sql/servers/elasticPools | -- |
| SqlManagedInstance | Microsoft.Sql/managedInstances | Y |
| SqlElasticPool | Microsoft.Sql/servers/elasticPools | Y |
| LogicApp | Microsoft.Logic/workflows | Y |
Legend:
Y: Automatic discovery is supported.
--: Automatic discovery is not supported.
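
The following is a minimal sketch of a metric declaration in the Promitor Scraper configuration file (metrics-declaration.yaml), based on the general Promitor declaration format. The metric name, tenant, subscription, resource group, and resource names are placeholders, and the file actually shipped with the product may structure these settings differently. The name declared here (azure_virtual_machine_percentage_cpu in this sketch) is what could then be referenced in the PromQL statements of the Promitor metric definition file.

```yaml
# Hypothetical metrics-declaration.yaml excerpt; all IDs and names are placeholders.
azureMetadata:
  tenantId: 00000000-0000-0000-0000-000000000000
  subscriptionId: 00000000-0000-0000-0000-000000000000
  resourceGroupName: sample-resource-group
metricDefaults:
  scraping:
    schedule: "0 * * ? * *"                          # placeholder collection schedule
metrics:
  - name: azure_virtual_machine_percentage_cpu        # Prometheus metric name, usable in PromQL
    description: "Average CPU utilization of a virtual machine"
    resourceType: VirtualMachine                      # resourceType name from the table above
    azureMetricConfiguration:
      metricName: Percentage CPU                      # Azure Monitor metric to collect
      aggregation:
        type: Average
    resources:
      - virtualMachineName: sample-vm                 # placeholder resource
```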
■ Checking how Azure SDKs used by Promitor are supported
Promitor employs the Azure SDK for .NET. The end of support for an Azure SDK is announced 12 months in advance. For details on the Azure SDK lifecycle, see the Lifecycle FAQ at the following website:
https://learn.microsoft.com/ja-jp/lifecycle/faq/azure#azure-sdk-----------
You can find the lifecycles of individual Azure SDK library versions at the following website:
https://azure.github.io/azure-sdk/releases/latest/all/dotnet.html
■ Credentials required for account information
Promitor can connect to Azure by using either the service principal method or the managed identity method. For details on the credentials assigned to the service principal or managed identity, see (a) Configuring the settings for establishing a connection to Azure (required) under 10.1.2(6) Setting up Promitor in the manual JP1/Integrated Management 3 - Manager Configuration Guide.
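
As a rough sketch, the service principal method might appear in a Promitor runtime configuration as follows. The key names follow the general Promitor runtime format and are assumptions; the file name, layout, and the way the secret is supplied in the product may differ.

```yaml
# Hypothetical Promitor runtime configuration excerpt.
authentication:
  mode: ServicePrincipal                              # a managed identity mode can be used instead
  identityId: 00000000-0000-0000-0000-000000000000    # placeholder application (client) ID
```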
(g) Container monitoring
Container environment monitoring uses different methods to collect operating information depending on monitoring targets, as listed in the following table.

| Monitoring target | How to collect operating information |
|---|---|
| Red Hat OpenShift | User-specific Prometheus |
| Kubernetes | User-specific Prometheus |
| Amazon Elastic Kubernetes Service (EKS) | User-specific Prometheus |
| Azure Kubernetes Service (AKS) | Azure's monitoring feature (Promitor) |
The following describes how operating information is collected for each monitoring target.
(h) Red Hat OpenShift
In Red Hat OpenShift, Prometheus as a default monitoring component collects operating information from scraping targets (kube-state-metrics, node_exporter, and kubelet) and sends the information to JP1/IM - Manager.
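
The following is a minimal sketch of how a user-specific Prometheus might forward the collected information, using Prometheus's standard remote_write mechanism. The URL is a placeholder; the actual endpoint, port, and authentication settings for sending data to JP1/IM - Manager are described in the configuration guide, not here.

```yaml
# Hypothetical remote_write excerpt for a user-specific Prometheus.
remote_write:
  - url: "https://jp1im-manager.example.com:<port>/<remote-write-endpoint>"   # placeholder
```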
■ Key metric items
The key metric items for Red Hat OpenShift are defined in the metric definition file (initial status) for each scraping target for container monitoring, as shown in the table below. For details, see the description on each metric definition file in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference (2. Definition Files or 10. IM Exporter Definition Files).

| Scraping target | Metric definition file |
|---|---|
| kube-state-metrics | Container monitoring metric definition file (metrics_kubernetes.conf) |
| node_exporter | Node exporter metric definition file (metrics_node_exporter.conf) |
| kubelet | Container monitoring metric definition file (metrics_kubernetes.conf) |
You can add more metric items to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.
- When kube-state-metrics is to be scraped

| Metric name | Data to be obtained | Label |
|---|---|---|
| kube_cronjob_info | Info about cronjob. | instance: instance-identifier-string job: job-name cronjob: cronjob-name namespace=cronjob-namespace schedule=schedule concurrency_policy=concurrency-policy |
| kube_cronjob_labels | Kubernetes labels converted to Prometheus labels. | instance: instance-identifier-string job: job-name cronjob: cronjob-name namespace=cronjob-namespace label_CRONJOB_LABEL=CRONJOB_LABEL |
| kube_cronjob_created | Unix creation timestamp | instance: instance-identifier-string job: job-name cronjob: cronjob-name namespace=cronjob-namespace |
| kube_cronjob_next_schedule_time | Next time the cronjob should be scheduled. The time after lastScheduleTime, or after the cron job's creation time if it's never been scheduled. Use this to determine if the job is delayed. | instance: instance-identifier-string job: job-name cronjob: cronjob-name namespace=cronjob-namespace |
| kube_cronjob_status_active | Active holds pointers to currently running jobs. | instance: instance-identifier-string job: job-name cronjob: cronjob-name namespace=cronjob-namespace |
| kube_cronjob_status_last_schedule_time | LastScheduleTime keeps information of when was the last time the job was successfully scheduled. | instance: instance-identifier-string job: job-name cronjob: cronjob-name namespace=cronjob-namespace |
| kube_cronjob_spec_suspend | Suspend flag tells the controller to suspend subsequent executions. | instance: instance-identifier-string job: job-name cronjob: cronjob-name namespace=cronjob-namespace |
| kube_cronjob_spec_starting_deadline_seconds | Deadline in seconds for starting the job if it misses scheduled time for any reason. | instance: instance-identifier-string job: job-name cronjob: cronjob-name namespace=cronjob-namespace |
| kube_cronjob_metadata_resource_version | Resource version representing a specific version of the cronjob. | instance: instance-identifier-string job: job-name cronjob: cronjob-name namespace=cronjob-namespace |
| kube_daemonset_created | Unix creation timestamp | instance: instance-identifier-string job: job-name daemonset=daemonset-name namespace=daemonset-namespace |
| kube_daemonset_status_current_number_scheduled | The number of nodes running at least one daemon pod and are supposed to. | instance: instance-identifier-string job: job-name daemonset=daemonset-name namespace=daemonset-namespace |
| kube_daemonset_status_desired_number_scheduled | The number of nodes that should be running the daemon pod. | instance: instance-identifier-string job: job-name daemonset=daemonset-name namespace=daemonset-namespace |
| kube_daemonset_status_number_available | The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available | instance: instance-identifier-string job: job-name daemonset=daemonset-name namespace=daemonset-namespace |
| kube_daemonset_status_number_misscheduled | The number of nodes running a daemon pod but are not supposed to. | instance: instance-identifier-string job: job-name daemonset=daemonset-name namespace=daemonset-namespace |
| kube_daemonset_status_number_ready | The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready. | instance: instance-identifier-string job: job-name daemonset=daemonset-name namespace=daemonset-namespace |
| kube_daemonset_status_number_unavailable | The number of nodes that should be running the daemon pod and have none of the daemon pod running and available | instance: instance-identifier-string job: job-name daemonset=daemonset-name namespace=daemonset-namespace |
| kube_daemonset_status_observed_generation | The most recent generation observed by the daemon set controller. | instance: instance-identifier-string job: job-name daemonset=daemonset-name namespace=daemonset-namespace |
| kube_daemonset_status_updated_number_scheduled | The total number of nodes that are running updated daemon pod | instance: instance-identifier-string job: job-name daemonset=daemonset-name namespace=daemonset-namespace |
| kube_daemonset_metadata_generation | Sequence number representing a specific generation of the desired state. | instance: instance-identifier-string job: job-name daemonset=daemonset-name namespace=daemonset-namespace |
| kube_daemonset_labels | Kubernetes labels converted to Prometheus labels. | instance: instance-identifier-string job: job-name daemonset=daemonset-name namespace=daemonset-namespace label_DAEMONSET_LABEL=DAEMONSET_LABEL |
| kube_deployment_status_replicas | The number of replicas per deployment. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_deployment_status_replicas_ready | The number of ready replicas per deployment. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_deployment_status_replicas_available | The number of available replicas per deployment. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_deployment_status_replicas_unavailable | The number of unavailable replicas per deployment. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_deployment_status_replicas_updated | The number of updated replicas per deployment. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_deployment_status_observed_generation | The generation observed by the deployment controller. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_deployment_status_condition | The current status conditions of a deployment. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace condition=deployment-condition status=true\|false\|unknown |
| kube_deployment_spec_replicas | Number of desired pods for a deployment. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_deployment_spec_paused | Whether the deployment is paused and will not be processed by the deployment controller. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_deployment_spec_strategy_rollingupdate_max_unavailable | Maximum number of unavailable replicas during a rolling update of a deployment. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_deployment_spec_strategy_rollingupdate_max_surge | Maximum number of replicas that can be scheduled above the desired number of replicas during a rolling update of a deployment. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_deployment_metadata_generation | Sequence number representing a specific generation of the desired state. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_deployment_labels | Kubernetes labels converted to Prometheus labels. | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace label_DEPLOYMENT_LABEL=DEPLOYMENT_LABEL |
| kube_deployment_created | Unix creation timestamp | instance: instance-identifier-string job: job-name deployment=deployment-name namespace=deployment-namespace |
| kube_job_info | Information about job. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace |
| kube_job_labels | Kubernetes labels converted to Prometheus labels. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace label_JOB_LABEL=JOB_LABEL |
| kube_job_owner | Information about the Job's owner. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace owner_kind=owner kind owner_name=owner name owner_is_controller=whether owner is controller |
| kube_job_spec_parallelism | The maximum desired number of pods the job should run at any given time. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace |
| kube_job_spec_completions | The desired number of successfully finished pods the job should be run with. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace |
| kube_job_spec_active_deadline_seconds | The duration in seconds relative to the startTime that the job may be active before the system tries to terminate it. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace |
| kube_job_status_active | The number of actively running pods. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace |
| kube_job_status_succeeded | The number of pods which reached Phase Succeeded. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace |
| kube_job_status_failed | The number of pods which reached Phase Failed. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace reason=failure reason |
| kube_job_status_start_time | StartTime represents time when the job was acknowledged by the Job Manager. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace |
| kube_job_status_completion_time | CompletionTime represents time when the job was completed. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace |
| kube_job_complete | The job has completed its execution. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace condition=true\|false\|unknown |
| kube_job_failed | The job has failed its execution. | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace condition=true\|false\|unknown |
| kube_job_created | Unix creation timestamp | instance: instance-identifier-string job: job-name job_name=job-name namespace=job-namespace |
| kube_replicaset_status_replicas | The number of replicas per ReplicaSet. | instance: instance-identifier-string job: job-name replicaset=replicaset-name namespace=replicaset-namespace |
| kube_replicaset_status_fully_labeled_replicas | The number of fully labeled replicas per ReplicaSet. | instance: instance-identifier-string job: job-name replicaset=replicaset-name namespace=replicaset-namespace |
| kube_replicaset_status_ready_replicas | The number of ready replicas per ReplicaSet. | instance: instance-identifier-string job: job-name replicaset=replicaset-name namespace=replicaset-namespace |
| kube_replicaset_status_observed_generation | The generation observed by the ReplicaSet controller. | instance: instance-identifier-string job: job-name replicaset=replicaset-name namespace=replicaset-namespace |
| kube_replicaset_spec_replicas | Number of desired pods for a ReplicaSet. | instance: instance-identifier-string job: job-name replicaset=replicaset-name namespace=replicaset-namespace |
| kube_replicaset_metadata_generation | Sequence number representing a specific generation of the desired state. | instance: instance-identifier-string job: job-name replicaset=replicaset-name namespace=replicaset-namespace |
| kube_replicaset_labels | Kubernetes labels converted to Prometheus labels. | instance: instance-identifier-string job: job-name replicaset=replicaset-name namespace=replicaset-namespace label_REPLICASET_LABEL=REPLICASET_LABEL |
| kube_replicaset_created | Unix creation timestamp | instance: instance-identifier-string job: job-name replicaset=replicaset-name namespace=replicaset-namespace |
| kube_replicaset_owner | Information about the ReplicaSet's owner. | instance: instance-identifier-string job: job-name replicaset=replicaset-name namespace=replicaset-namespace owner_kind=owner kind owner_name=owner name owner_is_controller=whether owner is controller |
| kube_statefulset_status_replicas | The number of replicas per StatefulSet. | instance: instance-identifier-string job: job-name statefulset=statefulset-name namespace=statefulset-namespace |
| kube_statefulset_status_replicas_current | The number of current replicas per StatefulSet. | instance: instance-identifier-string job: job-name statefulset=statefulset-name namespace=statefulset-namespace |
| kube_statefulset_status_replicas_ready | The number of ready replicas per StatefulSet. | instance: instance-identifier-string job: job-name statefulset=statefulset-name namespace=statefulset-namespace |
| kube_statefulset_status_replicas_updated | The number of updated replicas per StatefulSet. | instance: instance-identifier-string job: job-name statefulset=statefulset-name namespace=statefulset-namespace |
| kube_statefulset_status_observed_generation | The generation observed by the StatefulSet controller. | instance: instance-identifier-string job: job-name statefulset=statefulset-name namespace=statefulset-namespace |
| kube_statefulset_replicas | Number of desired pods for a StatefulSet. | instance: instance-identifier-string job: job-name statefulset=statefulset-name namespace=statefulset-namespace |
| kube_statefulset_metadata_generation | Sequence number representing a specific generation of the desired state for the StatefulSet. | instance: instance-identifier-string job: job-name statefulset=statefulset-name namespace=statefulset-namespace |
| kube_statefulset_created | Unix creation timestamp | instance: instance-identifier-string job: job-name statefulset=statefulset-name namespace=statefulset-namespace |
| kube_statefulset_labels | Kubernetes labels converted to Prometheus labels. | instance: instance-identifier-string job: job-name statefulset=statefulset-name namespace=statefulset-namespace label_STATEFULSET_LABEL=STATEFULSET_LABEL |
| kube_statefulset_status_current_revision | Indicates the version of the StatefulSet used to generate Pods in the sequence [0,currentReplicas). | instance: instance-identifier-string job: job-name statefulset=statefulset-name namespace=statefulset-namespace revision=statefulset-current-revision |
| kube_statefulset_status_update_revision | Indicates the version of the StatefulSet used to generate Pods in the sequence [replicas-updatedReplicas,replicas) | instance: instance-identifier-string job: job-name statefulset=statefulset-name namespace=statefulset-namespace revision=statefulset-current-revision |
| kube_namespace_created | Unix creation timestamp | instance: instance-identifier-string job: job-name namespace=namespace-name |
| kube_namespace_labels | Kubernetes labels converted to Prometheus labels | instance: instance-identifier-string job: job-name namespace=namespace-name label_NS_LABEL=NS_LABEL |
| kube_namespace_status_phase | kubernetes namespace status phase | instance: instance-identifier-string job: job-name namespace=namespace-name phase=Active\|Terminating |
| kube_node_info | Information about a cluster node | instance: instance-identifier-string job: job-name node=node-address kernel_version=kernel-version os_image=os-image-name container_runtime_version=container-runtime-and-version-combination kubelet_version=kubelet-version kubeproxy_version=kubeproxy-version pod_cidr=pod-cidr provider_id=provider-id system_uuid=system-uuid internal_ip=internal-ip |
| kube_node_labels | Kubernetes labels converted to Prometheus labels | instance: instance-identifier-string job: job-name node=node-address label_NODE_LABEL=NODE_LABEL |
| kube_node_spec_unschedulable | Whether a node can schedule new pods | instance: instance-identifier-string job: job-name node=node-address |
| kube_node_spec_taint | The taint of a cluster node. | instance: instance-identifier-string job: job-name node=node-address key=taint-key value=taint-value effect=taint-effect |
| kube_node_status_capacity | The capacity for different resources of a node | instance: instance-identifier-string job: job-name node=node-address resource=resource-name unit=resource-unit |
| kube_node_status_allocatable | The allocatable for different resources of a node that are available for scheduling | instance: instance-identifier-string job: job-name node=node-address resource=resource-name unit=resource-unit |
| kube_node_status_condition | The condition of a cluster node | instance: instance-identifier-string job: job-name node=node-address condition=node-condition status=true\|false\|unknown |
| kube_node_created | Unix creation timestamp | instance: instance-identifier-string job: job-name node=node-address |
| kube_pod_info | Information about pod | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace host_ip=host-ip pod_ip=pod-ip node=node-name created_by_kind=created_by_kind created_by_name=created_by_name uid=pod-uid priority_class=priority_class host_network=host_network |
| kube_pod_start_time | Start time in unix timestamp for a pod | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace ip=pod-ip-address ip_family=4 OR 6 uid=pod-uid |
| kube_pod_completion_time | Completion time in unix timestamp for a pod | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_owner | Information about the Pod's owner | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace owner_kind=owner kind owner_name=owner name owner_is_controller=whether owner is controller uid=pod-uid |
| kube_pod_labels | Kubernetes labels converted to Prometheus labels | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace label_POD_LABEL=POD_LABEL uid=pod-uid |
| kube_pod_status_phase | The pods current phase | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace phase=Pending\|Running\|Succeeded\|Failed\|Unknown uid=pod-uid |
| kube_pod_status_ready | Describes whether the pod is ready to serve requests | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace condition=true\|false\|unknown uid=pod-uid |
| kube_pod_status_scheduled | Describes the status of the scheduling process for the pod | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace condition=true\|false\|unknown uid=pod-uid |
| kube_pod_container_info | Information about a container in a pod | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace image=image-name image_id=image-id image_spec=image-spec container_id=containerid uid=pod-uid |
| kube_pod_container_status_waiting | Describes whether the container is currently in waiting state | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_container_status_waiting_reason | Describes the reason the container is currently in waiting state | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace reason=container-waiting-reason uid=pod-uid |
| kube_pod_container_status_running | Describes whether the container is currently in running state | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_container_state_started | Start time in unix timestamp for a pod container | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_container_status_terminated | Describes whether the container is currently in terminated state | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_container_status_ready | Describes whether the containers readiness check succeeded | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_container_status_restarts_total | The number of container restarts per container (Counter) | container=container-name namespace=pod-namespace instance: instance-identifier-string job: job-name pod=pod-name uid=pod-uid |
| kube_pod_created | Unix creation timestamp | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_restart_policy | Describes the restart policy in use by this pod | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace type=Always\|Never\|OnFailure uid=pod-uid |
| kube_pod_init_container_info | Information about an init container in a pod | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace image=image-name image_id=image-id image_spec=image-spec container_id=containerid uid=pod-uid |
| kube_pod_init_container_status_waiting | Describes whether the init container is currently in waiting state | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_init_container_status_running | Describes whether the init container is currently in running state | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_init_container_status_terminated | Describes whether the init container is currently in terminated state | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_init_container_status_ready | Describes whether the init containers readiness check succeeded | instance: instance-identifier-string job: job-name container=container-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_init_container_status_restarts_total | The number of restarts for the init container | instance: instance-identifier-string job: job-name container=container-name namespace=pod-namespace pod=pod-name uid=pod-uid |
| kube_pod_spec_volumes_persistentvolumeclaims_info | Information about persistentvolumeclaim volumes in a pod | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace volume=volume-name persistentvolumeclaim=persistentvolumeclaim-claimname uid=pod-uid |
| kube_pod_spec_volumes_persistentvolumeclaims_readonly | Describes whether a persistentvolumeclaim is mounted read only | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace volume=volume-name persistentvolumeclaim=persistentvolumeclaim-claimname uid=pod-uid |
| kube_pod_status_scheduled_time | Unix timestamp when pod moved into scheduled status | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace uid=pod-uid |
| kube_pod_status_unschedulable | Describes the unschedulable status for the pod | instance: instance-identifier-string job: job-name pod=pod-name namespace=pod-namespace uid=pod-uid |
- When node_exporter is to be scraped
See Key metric items in 9.5.3(2)(d) Node exporter.
- When kubelet is to be scraped

| Metric name | Data to be obtained | Label |
|---|---|---|
| container_blkio_device_usage_total | Blkio device bytes usage | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name major: major-identifier minor: minor-identifier operation: operation (Async, Sync, Discard, Read, Write, or Total) |
| container_cpu_cfs_periods_total | Number of elapsed enforcement period intervals | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_cpu_cfs_throttled_periods_total | Number of throttled period intervals | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_cpu_cfs_throttled_seconds_total | Total time duration the container has been throttled | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_cpu_system_seconds_total | Cumulative system cpu time consumed | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_cpu_usage_seconds_total | Cumulative cpu time consumed | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name cpu: CPU-name |
| container_cpu_user_seconds_total | Cumulative user cpu time consumed | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_fs_inodes_free | Number of available Inodes | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_inodes_total | Total number of Inodes | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_io_current | Number of I/Os currently in progress | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_io_time_seconds_total | Cumulative count of seconds spent doing I/Os | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_io_time_weighted_seconds_total | Cumulative weighted I/O time | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_limit_bytes | Number of bytes that can be consumed by the container on this filesystem | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_reads_bytes_total | Cumulative count of bytes read | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_read_seconds_total | Cumulative count of seconds spent reading | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_reads_merged_total | Cumulative count of reads merged | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_reads_total | Cumulative count of reads completed | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_sector_reads_total | Cumulative count of sector reads completed | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_sector_writes_total | Cumulative count of sector writes completed | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_usage_bytes | Number of bytes that are consumed by the container on this filesystem | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_writes_bytes_total | Cumulative count of bytes written | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_write_seconds_total | Cumulative count of seconds spent writing | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_writes_merged_total | Cumulative count of writes merged | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_fs_writes_total | Cumulative count of writes completed | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name device: device-name |
| container_memory_cache | Total page cache memory | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_memory_failcnt | Number of memory usage hits limits | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_memory_failures_total | Cumulative count of memory allocation failures | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name failure_type: cause-of-failure (pgfault or pgmajfault) scope: scope (container or hierarchy) |
| container_memory_mapped_file | Size of memory mapped files | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_memory_max_usage_bytes | Maximum memory usage recorded | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_memory_rss | Size of RSS | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_memory_swap | Container swap usage | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_memory_usage_bytes | Current memory usage, including all memory regardless of when it was accessed | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_memory_working_set_bytes | Current working set | id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
| container_spec_cpu_period | CPU period of the container | |
id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
container_spec_cpu_quota |
CPU quota of the container |
id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
container_spec_cpu_shares |
CPU share of the container |
id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
container_spec_memory_limit_bytes |
Memory limit for the container |
id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
container_spec_memory_reservation_limit_bytes |
Memory reservation limit for the container |
id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
container_spec_memory_swap_limit_bytes |
Memory swap limit for the container |
id: container-identifier name: container-name image: image-name container: container-name (defined as kubernetes) namespace: namespace pod: pod-name |
(i) Kubernetes
In Kubernetes, the user-specific Prometheus that monitors the Kubernetes environment collects operating information from scraping targets (kube-state-metrics, node_exporter, and kubelet) and sends the information to JP1/IM - Manager.
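The following is a minimal, hypothetical sketch of such a user-specific Prometheus configuration, shown only to illustrate the data flow; it is not a shipped file. The scrape target addresses and the remote_write URL are placeholders that depend on your environment and on the write endpoint provided by your JP1/IM - Manager setup.

  # Hypothetical sketch only; all addresses and the remote_write URL are placeholders.
  scrape_configs:
    - job_name: 'kube-state-metrics'
      static_configs:
        - targets: ['kube-state-metrics.kube-system.svc:8080']   # placeholder address
    - job_name: 'node_exporter'
      static_configs:
        - targets: ['node1:9100', 'node2:9100']                  # placeholder addresses
    - job_name: 'kubelet'
      scheme: https
      kubernetes_sd_configs:
        - role: node
      tls_config:
        insecure_skip_verify: true    # simplification for this sketch; configure TLS properly in practice
  remote_write:
    - url: 'http://<manager-host>:<port>/<write-endpoint>'       # placeholder URL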
The following table lists the Kubernetes components to be monitored.
Configuration component name | Monitoring target | Component name |
---|---|---|
Cluster | Y | Cluster |
Control Plane - Host | Y#1 | Node |
Control Plane - Service (such as apiserver) | -- | -- |
Worker node - Host | Y#1 | Node |
Worker node - Service (such as apiserver) | -- | -- |
Worker node - Container | -- | -- |
Namespace | Y#1 | Namespace |
Workload#2 | Y#1 | See the table in #2. |
Pod | Y | Pod |
- Legend:
  Y: Monitored, --: Not monitored
- #1
  Not supported by AKS.
- #2
  The workloads can be divided into the six types shown in the following table.
Type of workload | Component name |
---|---|
CronJob | CronJob |
Job | Job |
DaemonSet | DaemonSet |
Deployment | Deployment |
ReplicaSet | ReplicaSet |
StatefulSet | StatefulSet |
■ Key metric items
See Key metric items in 12.5.2(2)(h) Red Hat OpenShift.
(j) Amazon Elastic Kubernetes Service (EKS)
In Amazon Elastic Kubernetes Service (EKS), Prometheus or an AWS Distro for OpenTelemetry (ADOT) agent (which uses Prometheus receiver and exporter) collects information from scraping targets (kube-state-metrics, node_exporter, and kubelet) and sends the information to JP1/IM - Manager.
To collect performance data of pods when you monitor the EKS on Fargate service, you must use the ADOT agent, as shown in the following table. (A configuration sketch for the ADOT agent follows the legend below.)
Collection tool | Service to be monitored: EKS on EC2 | Service to be monitored: EKS on Fargate |
---|---|---|
Prometheus | Y | C |
ADOT agent | Y | Y |
- Legend:
  Y: The tool can collect metrics (pods' performance data can also be collected).
  C (conditional): The tool can collect metrics, but pods' performance data cannot be collected.
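As an illustration of the ADOT agent approach mentioned above, the following is a minimal sketch of an OpenTelemetry Collector configuration with a Prometheus receiver and a Prometheus remote write exporter; it is not a shipped file. The scrape targets and the endpoint URL are placeholders that depend on your environment and on the write endpoint provided by your JP1/IM - Manager setup.

  # Hypothetical sketch only; targets and the endpoint URL are placeholders.
  receivers:
    prometheus:
      config:
        scrape_configs:
          - job_name: 'kube-state-metrics'
            static_configs:
              - targets: ['kube-state-metrics.kube-system.svc:8080']   # placeholder address
          - job_name: 'kubelet'
            scheme: https
            kubernetes_sd_configs:
              - role: node
            tls_config:
              insecure_skip_verify: true    # simplification for this sketch
  exporters:
    prometheusremotewrite:
      endpoint: 'http://<manager-host>:<port>/<write-endpoint>'        # placeholder URL
  service:
    pipelines:
      metrics:
        receivers: [prometheus]
        exporters: [prometheusremotewrite]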
■ Key metric items
See Key metric items in 12.5.2(2)(h) Red Hat OpenShift.
(k) Azure Kubernetes Service (AKS)
To monitor Azure Kubernetes Service (AKS), the Azure monitoring capability (Promitor) is used to collect default AKS information. For details on Promitor, see 12.5.2(2)(f) Promitor.
■ Key metric items
The key metric items when Promitor monitors AKS are defined in the Promitor metric definition file (initial status). For details, see Promitor metric definition file (metrics_promitor.conf) of 10. IM Exporter Definition Files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
You can add more metric items to the Promitor metric definition file. For details on the AKS-monitoring metrics you can specify with PromQL statements used within the definition file, see Metrics you can collect in 12.5.2(2)(f) Promitor.
(l) Log metrics
This capability can generate and measure log metrics from log files created by monitoring targets.
■ Key metric items
In the log metrics definition file (fluentd_any-name_logmetrics.conf), you define what values you need from the log files created by your monitoring targets. These definitions allow you to obtain quantified data (log metrics) as metric items.
For details on the log metrics definition file, see Log metrics definition file (fluentd_any-name_logmetrics.conf) of 10. IM Exporter Definition Files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
■ Sample files
The following provides descriptions of sample files for when you use the log metrics feature. If you copy the sample files, be careful of the linefeed codes. For details, see the description of each file of 2. Definition Files and 10. IM Exporter Definition Files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference. These sample files are based on the assumptions in Assumptions of the sample files. Copy each file and change the settings according to your monitoring targets.
- Assumptions of the sample files
  The sample files described here assume that HostA, a monitored host (integrated agent host), exists with JP1/IM - Agent installed on it, and that WebAppA, an application running on HostA, creates the following log file.
- ControllerLog.log
  As shown in target log message 1, a log message is written at the start of processing of a request to an HTTP endpoint in WebAppA, indicating that the endpoint was used. The log message also indicates the number of records handled for that request.
Target log message 1:
... 2022-10-19 10:00:00 [INFO] c.b.springbootlogging.LoggingController : endpoint "/register" started. Target record: 5. ...
In the sample files, a regular expression that matches target log message 1 is used, and the number of log messages that match the expression is counted. The count is then displayed in the Trends tab of the JP1/IM integrated operation viewer as log metric 1, Requests to the register Endpoint.
The definition for log metric 1 uses counter as its log metric type.
In addition, the same regular expression extracts the number indicated as Target record from target log message 1, and the extracted numbers are summed up. The total is then displayed in the Trends tab of the JP1/IM integrated operation viewer as log metric 2, Number of Registered Records.
The definition for log metric 2 uses counter as its log metric type.
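For illustration only, assuming the sample definitions shown later in this subsection, after target log message 1 has been written once, the two log metrics exposed on the Fluentd worker's /metrics endpoint might look roughly as follows (before Prometheus server attaches labels such as instance and the jp1_pc_* labels):

  # Log metric 1: counts the log messages that match the regular expression
  logmetrics_request_endpoint_register 1
  # Log metric 2: sums the extracted Target record values (5 in target log message 1)
  logmetrics_num_of_registeredrecord{loggroup="WebAppA",log="ControllerLog"} 5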
You need as many Fluentd workers (multi-process workers feature) as there are log files to be monitored. For details on the worker settings related to the log metrics feature, see the log metrics definition file (fluentd_any-name_logmetrics.conf). In this example, it is assumed that 11 Fluentd workers are running and that ControllerLog.log is monitored by the worker whose worker ID is 10.
These sample files also assume a tree structure consisting of the following IM management nodes:
All Systems
  + HostA
    + Application Server
      + WebAppA
- Target files in this example
  The target files used in this example are as follows:
  - Integrated manager host
    - User-specific metric definition file
  - Integrated agent host
    - Prometheus configuration file
    - User-specific discovery configuration file
    - Log metrics definition file
    - Fluentd log monitoring target definition file
- Sample user-specific metric definition file
  - File name: metrics_logmetrics1.conf
  - Written code
    [
      {
        "name":"logmetrics_request_endpoint_register",
        "default":true,
        "promql":"logmetrics_request_endpoint_register and $jp1im_TrendData_labels",
        "resource_en":{
          "category":"HTTP",
          "label":"request_num_of_endpoint_register",
          "description":"The request number of endpoint register",
          "unit":"request"
        },
        "resource_ja":{
          "category":"HTTP",
          "label":"Requests to the register Endpoint",
          "description":"The request number of endpoint register",
          "unit":"request"
        }
      },
      {
        "name":"logmetrics_num_of_registeredrecord",
        "default":true,
        "promql":"logmetrics_num_of_registeredrecord and $jp1im_TrendData_labels",
        "resource_en":{
          "category":"DB",
          "label":"logmetrics_num_of_registeredrecord",
          "description":"The number of registered record",
          "unit":"record"
        },
        "resource_ja":{
          "category":"DB",
          "label":"Number of Registered Records",
          "description":"The number of registered record",
          "unit":"record"
        }
      }
    ]
- Note
  The storage directory, written code, and file name follow the format of the user-specific metric definition file (metrics_any-Prometheus-trend-name.conf).
- Sample Prometheus configuration file
  - File name: jpc_prometheus_server.yml
  - Written code
    global:
      ... (omitted) ...
    scrape_configs:
      - job_name: 'LogMetrics'
        file_sd_configs:
          - files:
            - 'user/user_file_sd_config_logmetrics.yml'
        relabel_configs:
          - target_label: jp1_pc_nodelabel
            replacement: Log trapper(Fluentd)
        metric_relabel_configs:
          - target_label: jp1_pc_nodelabel
            replacement: ControllerLog
          - source_labels: ['__name__']
            regex: 'logmetrics_request_endpoint_register|logmetrics_num_of_registeredrecord'
            action: 'keep'
          - regex: (jp1_pc_multiple_node|jp1_pc_agent_create_flag)
            action: labeldrop
      ... (omitted) ...
- Note
  The storage directory and written code follow the format of the Prometheus configuration file (jpc_prometheus_server.yml). You do not have to create a new file. Instead, you add the scrape_configs section for the log metrics feature to the Prometheus configuration file (jpc_prometheus_server.yml) created during installation.
- Sample user-specific discovery configuration file
  - File name: user_file_sd_config_logmetrics.yml
  - Written code
    - targets:
        - HostA:24830
      labels:
        jp1_pc_exporter: logmetrics
        jp1_pc_category: WebAppA
        jp1_pc_trendname: logmetrics1
        jp1_pc_multiple_node: "{__name__=~'logmetrics_.*'}"
        jp1_pc_agent_create_flag: false
- Note
  The storage directory and written code follow the format of the user-specific discovery configuration file (file_sd_config_any-name.yml).
  ControllerLog.log is monitored by the worker whose Fluentd worker ID is 10. Thus, when 24820 is set for port in the Sample log metrics definition file, the port number of the worker monitoring ControllerLog.log is 24820 + 10 = 24830.
- Sample log metrics definition file
  - File name: fluentd_WebAppA_logmetrics.conf
  - Written code
    ## Input
    <worker 10>
      <source>
        @type prometheus
        bind '0.0.0.0'
        port 24820
        metrics_path /metrics
      </source>
    </worker>

    ## Extract target log message 1
    <worker 10>
      <source>
        @type tail
        @id logmetrics_counter
        path /usr/lib/WebAppA/ControllerLog/ControllerLog.log
        tag WebAppA.ControllerLog
        pos_file ../data/fluentd/tail/ControllerLog.pos
        read_from_head true
        <parse>
          @type regexp
          expression /^(?<logtime>[^\[]*) \[(?<loglevel>[^\]]*)\] (?<class>[^\[]*) : endpoint "\/register" started. Target record: (?<record_num>\d[^\[]*).$/
          time_key logtime
          time_format %Y-%m-%d %H:%M:%S
          types record_num:integer
        </parse>
      </source>

      ## Output
      ## Define log metrics 1 and 2
      <match WebAppA.ControllerLog>
        @type prometheus
        <metric>
          name logmetrics_request_endpoint_register
          type counter
          desc The request number of endpoint register
        </metric>
        <metric>
          name logmetrics_num_of_registeredrecord
          type counter
          desc The number of registered record
          key record_num
          <labels>
            loggroup ${tag_parts[0]}
            log ${tag_parts[1]}
          </labels>
        </metric>
      </match>
    </worker>
- Note
  The storage directory and written code follow the format of the log metrics definition file (fluentd_any-name_logmetrics.conf).
- Sample Fluentd log monitoring target definition file
  - File name: jpc_fluentd_common_list.conf
  - Written code
    ## [Target Settings]
    ... (omitted) ...
    @include user/fluentd_WebAppA_logmetrics.conf
- Note
  The storage directory and written code follow the format of the Fluentd log monitoring target definition file (jpc_fluentd_common_list.conf) in JP1/IM - Agent definition files. You do not have to create a new file. Instead, you add the include section for the log metrics feature to the Fluentd log monitoring target definition file (jpc_fluentd_common_list.conf) created during installation.
(m) Script exporter
Script exporter runs scripts on a host and collects their results.
Script exporter is installed on the same host as Prometheus server. When triggered by a scraping request from Prometheus server, it runs a script on that host, obtains the result, and returns the result to the server.
By developing a script that collects UAP information and converts it into metrics, and then adding the script to Script exporter, you can monitor applications that are not supported by any Exporter in the way you want.
■ Key metric items
The key Script exporter metric items are defined in the Script exporter metric definition file (initial status). For details, see Script exporter metric definition file (metrics_script_exporter.conf) of 10. IM Exporter Definition Files in the manual JP1/Integrated Management 3 - Manager Command, Definition File and API Reference.
You can add more metric items to the metric definition file. The following table shows the metrics you can specify with PromQL statements used within the definition file.
Metric name |
Data to be obtained |
Label |
---|---|---|
script_success |
Script exit status (0 = error, 1 = success) |
instance: instance-identifier-string job: job-name script: script-name |
script_duration_seconds |
Script execution time, in seconds. |
instance: instance-identifier-string job: job-name script: script-name |
script_exit_code |
The exit code of the script. |
instance: instance-identifier-string job: job-name script: script-name |
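As an illustration only, assuming that the Script exporter metric definition file uses the same JSON structure as the user-specific metric definition file shown in the log metrics samples above (the shipped file may already define similar entries), an added trend item based on script_duration_seconds might look like the following. The script name monitor_uap used in the PromQL statement is a hypothetical example.

  [
    {
      "name":"script_duration_seconds_monitor_uap",
      "default":false,
      "promql":"script_duration_seconds{script='monitor_uap'} and $jp1im_TrendData_labels",
      "resource_en":{
        "category":"script",
        "label":"monitor_uap execution time",
        "description":"Execution time of the hypothetical monitor_uap script",
        "unit":"seconds"
      },
      "resource_ja":{
        "category":"script",
        "label":"monitor_uap execution time",
        "description":"Execution time of the hypothetical monitor_uap script",
        "unit":"seconds"
      }
    }
  ]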