Hitachi

JP1 Version 13 JP1/Integrated Management 3 - Manager Command, Definition File and API Reference


Prometheus configuration file (jpc_prometheus_server.yml)

Format

Write in YAML format.

global:
  scrape_interval:      1m
  scrape_timeout:      10s
  evaluation_interval:  1m
  external_labels:
    jp1_pc_prome_hostname: "Monitoring agent host name"
  :
(Abbreviated)
  :
scrape_configs:#
  - job_name: Scrape Job Name
    
    file_sd_configs:
      - files:
        - Discovery configuration file name
    
    relabel_configs:
      - target_label: jp1_pc_nodelabel
        replacement: Node exporter
      - regex: (jp1_pc_category|jp1_pc_trendname)
        action: labeldrop
  :
(Abbreviated)
  :
remote_write:
  - url: http://host-name-of-JP1/IM - Agent:20727/ima/api/v1/proxy/service/promscale/api/v1/write
    remote_timeout: 30s
    send_exemplars: false
    queue_config:
      capacity: 10000
      max_shards: 200
      min_shards: 4
      max_samples_per_send: 3000
      batch_send_deadline: 10s
      min_backoff: 100ms
      max_backoff: 10s
#

When Script exporter discovery uses the HTTP-based http_sd_config method, specify the discovery endpoint directly in http_sd_configs.url under scrape_configs in this file.

The following shows the HTTP-based service discovery endpoint for the Script exporter specified in http_sd_configs.url in scrape_configs.

http://installation-host-name:Script-exporter-port-number/discovery

Unlike file_sd_configs, http_sd_configs cannot add labels on its own. Add the labels by using relabel_configs. For details on the labels required, see 1.21.2(10)(b) Scraping definition for Script exporter in the JP1/Integrated Management 3 - Manager Configuration Guide.
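
As a minimal sketch of the points above, a Script exporter scrape definition that uses http_sd_configs might look like the following. The job name jpc_script is borrowed from the notes later on this page. The label name example_label and its value are hypothetical and stand in for the labels listed in the Configuration Guide. The URL keeps the placeholders shown above and must be replaced with real values.

```yaml
scrape_configs:
  - job_name: jpc_script
    http_sd_configs:
      # Placeholders from this page: replace with the actual host name and port number.
      - url: http://installation-host-name:Script-exporter-port-number/discovery
    relabel_configs:
      # http_sd_configs cannot attach labels itself, so the label is added here.
      # With no source_labels and the default action (replace), the rule simply
      # sets target_label to the replacement string.
      - target_label: example_label      # hypothetical label name (assumption)
        replacement: example-value       # hypothetical label value (assumption)
```

A rule of this shape sets the same label value on every target returned by the discovery endpoint.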

File

jpc_prometheus_server.yml

jpc_prometheus_server.yml.model (model file)

Storage directory

■Integrated agent host

In Windows:
  • For a physical host

    Agent-path\conf\

  • For a logical host

    shared-folder\jp1ima\conf\

In Linux:
  • For a physical host

    /opt/jp1ima/conf/

  • For a logical host

    shared-directory/jp1ima/conf/

Description

This is a configuration file that defines the operation of the Prometheus server.

Character code

UTF-8 (without BOM)

Line feed code

In Windows: CR+LF

In Linux: LF

When the definitions are applied

The definitions take effect in the Prometheus server when you run the Prometheus server reload API or restart the Prometheus server.

In addition, if you change the value of the jp1_pc_prome_hostname label or a scrape definition (a definition under scrape_configs), the change appears in the tree displayed in the integrated operation viewer after you perform the above operation and then execute the jddcreatetree command and the jddupdatetree command.

Information that is specified

For definitions of common placeholders used in the table below, see About definition of common placeholders for descriptive items in yml file.

Item

Description

Changeability

What to set up in JP1/IM - Agent

JP1/IM - Agent default value

global:

--

N

--

--

[ scrape_interval: <duration> | default = 1m ]

Specifies the interval at which targets are scraped, from 15 seconds to 24 hours.

The value is specified in numbers and units. The units that can be specified are s (seconds), m (minutes), and h (hours).

<Configuration Example>

global:
  scrape_interval: 5m

Y

Specifies the scrape interval.#

scrape_interval: 1m

[ scrape_timeout: <duration> | default = 10s ]

Specifies the scrape request timeout period, ranging from 10 seconds to 60 minutes.

The value is specified in numbers and units. The units that can be specified are s (seconds) and m (minutes).

You must specify a value that is less than global.scrape_interval.

<Configuration Example>

global:
  scrape_timeout: 20s

Y

Configure as needed.

scrape_timeout: 10s

[ evaluation_interval: <duration> | default = 1m ]

Specify the evaluation interval for the alert rule, ranging from 15 seconds to 48 hours.

The value is specified in numbers and units. You can specify the following units: s (seconds), m (minutes), and h (hours).

<Configuration Example>

global:
  evaluation_interval: 15s

Y

Configure as needed.

evaluation_interval: 1m

external_labels:

Specifies labels to add when sending data by remote write or notifying Alertmanager. You can specify up to 30 labels.

N

--

--

[ <labelname>: <labelvalue> ... ]

Specify the label name and label value. The label name and label value can be up to 255 bytes each.

Do not delete the jp1_pc_prome_hostname label, which is set by default.

<Configuration Example>

global:
  external_labels:
    labelname1: valuename1
    labelname2: valuename2

Y

Since it is set by the installation script of the monitoring module, it is usually not necessary to change it.

In a clustered environment, manually set the logical host name.

external_labels:
  jp1_pc_prome_hostname: "host-name"

rule_files:

Specify the alert rule file. You can specify up to 30 of them.

N

--

--

[ - <filepath_glob> ... ]

Specify a file name. The file name can be up to 255 bytes.

<Configuration Example>

rule_files:
  - "jpc_alerting_rules.yml"
  - "alerting_rules2.yml"

Y

You can change, add, and delete rule file names.

Normally, no changes are required.

rule_files:
  - "jpc_alerting_rules.yml"

scrape_configs:

Specifies the scrape definition. You can specify up to 30 of them.

N

--

--

[ - <scrape_config> ... ]

See the description of <scrape_config> below.

Y

You can add scrape definitions.

If you have your own Exporter, add a definition.

Normally, no changes are required.

The following Exporter definitions are pre-populated:

  • node_exporter

  • windows_exporter

  • blackbox_exporter(http)

  • blackbox_exporter(icmp)

  • yet_another_cloudwatch_exporter

alerting:

Configure the settings related to Alertmanager.

N

--

--

alert_relabel_configs:

Set up relabeling for alert notifications.

N

--

--

[ - <relabel_config> ... ]

See the description of <relabel_config> below.

Y

Specify this if you want to add or change the label of the alert.

--

alertmanagers:

Configure the alert notification destination Alertmanager.

N

--

--

[ - <alertmanager_config> ... ]

See the description of <alertmanager_config> below.

Y

Specify the Alertmanager that runs on the same host as the alert notification destination.

--

remote_write:

Configure settings related to remote writing.

N

--

--

url: <string>

Specify the endpoint to which the remote write is sent.

<Configuration Example>

remote_write:
  - url: http://integrated-agent-host-name:20727/ima/api/v1/proxy/service/promscale/write

R

Specifies the remote write endpoint for imagent on the same host.

Change the host name and port number to suit your environment.

url: http://localhost:20727/ima/api/v1/proxy/service/promscale/write

[ remote_timeout: <duration> | default = 30s ]

Specify the remote write timeout period in the range of 30 seconds to 60 minutes.

The value is specified in numbers and units. The units that can be specified are s (seconds) and m (minutes).

<Configuration Example>

remote_write:
  - url: http://localhost:20727/ima/api/v1/proxy/service/promscale/write
    remote_timeout: 1m

Y

If the remote write times out, increase the value.

remote_timeout: 30s

write_relabel_configs:

Set up relabeling during remote write.

N

--

--

[ - <relabel_config> ... ]

See the description of <relabel_config> below.

<Configuration Example>

The following example excludes the node_boot_time_seconds and node_context_switches_total metrics collected by node_exporter from remote write.

remote_write:
  - url: http://localhost:20727/ima/api/v1/proxy/service/promscale/write
    write_relabel_configs:
      - source_labels: ['__name__']
        regex: '(node_boot_time_seconds|node_context_switches_total)'
        action: 'drop'

Y

Specify if you do not want to remotely write a specific metric.

--

[ send_exemplars: <boolean> | default = false ]

Specifies whether to send exemplars by remote write.

N

--

send_exemplars: false

queue_config:

Set up a queue for remote write.

N

--

--

[ capacity: <int> | default = 2500 ]

Specifies the number of samples to buffer.

N

--

capacity: 10000

[ min_shards: <int> | default = 1 ]

Specify the lower limit for the number of parallel executions of remote write.

N

--

min_shards: 4

[ max_samples_per_send: <int> | default = 500 ]

Specifies the maximum number of samples to send at one time.

N

--

max_samples_per_send: 3000

[ batch_send_deadline: <duration> | default = 5s ]

Specifies the amount of time to wait before flushing the remaining queued samples.

N

--

batch_send_deadline: 10s

[ min_backoff: <duration> | default = 30ms ]

Specifies the lower limit of the wait time for transmission retries.

N

--

min_backoff: 100ms

[ max_backoff: <duration> | default = 100ms ]

Specifies the upper limit of the wait time for transmission retries.

N

--

max_backoff: 10s

Legend:

R: Required, Y: Changeable, N: Not changeable, --: Not applicable

#

When changing this value from the initial value (1m), review the range vector selector (the time range enclosed in square brackets [ ]) specified in the PromQL statements of the metric definition file. For the range vector selector, specify a value that is at least twice the scrape interval. If the value is less than twice the scrape interval, trend information cannot be obtained, or cannot be obtained for some periods.

Also, when monitoring with Yet another cloudwatch exporter, do not specify a value greater than 10m. If you do, the configuration might not be retrieved when the jddcreatetree command is executed.
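
The following excerpt is a hedged sketch of how several changeable items from the table might be combined in one file. All values are illustrative, and "logical-host-name" is a placeholder. Keys that are omitted here (for example, scrape_configs and queue_config) are assumed to stay as they are in the model file.

```yaml
# Illustrative excerpt of jpc_prometheus_server.yml (not a complete file).
global:
  scrape_interval: 5m          # changed from the 1m default; within 15s to 24h
  scrape_timeout: 20s          # must be less than scrape_interval
  evaluation_interval: 1m
  external_labels:
    # Placeholder value; in a clustered environment, set the logical host name manually.
    jp1_pc_prome_hostname: "logical-host-name"

rule_files:
  - "jpc_alerting_rules.yml"

remote_write:
  - url: http://localhost:20727/ima/api/v1/proxy/service/promscale/write
    remote_timeout: 1m         # raised from 30s, for the case where remote write times out
    write_relabel_configs:
      # Exclude two node_exporter metrics from remote write.
      - source_labels: ['__name__']
        regex: '(node_boot_time_seconds|node_context_switches_total)'
        action: 'drop'
```

With a scrape interval of 5m, the note above implies that range vector selectors in the metric definition file should cover at least 10 minutes (for example, [10m]). The value also stays within the 10m limit that applies when Yet another cloudwatch exporter is monitored.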

#
  • In case of jpc_node:

    - source_labels: ['__name__']
      regex: 'node_network_receive_bytes_total|node_network_transmit_bytes_total|node_disk_read_time_seconds_total|node_disk_write_time_seconds_total|node_boot_time_seconds|node_context_switches_total|node_cpu_seconds_total|node_disk_io_now|node_disk_io_time_seconds_total|node_disk_read_bytes_total|node_disk_reads_completed_total|node_disk_writes_completed_total|node_disk_written_bytes_total|node_filesystem_avail_bytes|node_filesystem_files|node_filesystem_files_free|node_filesystem_free_bytes|node_filesystem_size_bytes|node_intr_total|node_load1|node_load15|node_load5|node_memory_Active_file_bytes|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_Inactive_file_bytes|node_memory_MemAvailable_bytes|node_memory_MemFree_bytes|node_memory_MemTotal_bytes|node_memory_SReclaimable_bytes|node_memory_SwapFree_bytes|node_memory_SwapTotal_bytes|node_netstat_Icmp6_InMsgs|node_netstat_Icmp_InMsgs|node_netstat_Icmp6_OutMsgs|node_netstat_Icmp_OutMsgs|node_netstat_Tcp_InSegs|node_netstat_Tcp_OutSegs|node_netstat_Udp_InDatagrams|node_netstat_Udp_OutDatagrams|node_network_flags|node_network_iface_link|node_network_mtu_bytes|node_network_receive_errs_total|node_network_receive_packets_total|node_network_transmit_colls_total|node_network_transmit_errs_total|node_network_transmit_packets_total|node_time_seconds|node_uname_info|node_vmstat_pswpin|node_vmstat_pswpout|node_systemd_unit_state'
      action: 'keep'
    - source_labels: ['__name__']
      regex: 'node_systemd_unit_.*'
      target_label: 'jp1_pc_trendname'
      replacement: 'node_exporter_service'
    - source_labels: ['__name__']
      regex: 'node_systemd_unit_.*'
      target_label: 'jp1_pc_category'
      replacement: 'service'
    - source_labels: ['__name__','name']
      regex: 'node_systemd_unit_.*;(.*)'
      target_label: 'jp1_pc_nodelabel'
      replacement: ${1}
    - regex: jp1_pc_multiple_node
      action: labeldrop

  • In case of jpc_windows:

    - source_labels: ['__name__']
      regex: 'windows_cs_physical_memory_bytes|windows_cache_copy_read_hits_total|windows_cache_copy_reads_total|windows_cpu_time_total|windows_logical_disk_free_bytes|windows_logical_disk_idle_seconds_total|windows_logical_disk_read_bytes_total|windows_logical_disk_read_latency_seconds_total|windows_logical_disk_read_seconds_total|windows_logical_disk_reads_total|windows_logical_disk_requests_queued|windows_logical_disk_size_bytes|windows_logical_disk_write_bytes_total|windows_logical_disk_write_latency_seconds_total|windows_logical_disk_write_seconds_total|windows_logical_disk_writes_total|windows_memory_available_bytes|windows_memory_cache_bytes|windows_memory_cache_faults_total|windows_memory_page_faults_total|windows_memory_pool_nonpaged_allocs_total|windows_memory_pool_paged_allocs_total|windows_memory_swap_page_operations_total|windows_memory_swap_pages_read_total|windows_memory_swap_pages_written_total|windows_memory_system_cache_resident_bytes|windows_memory_transition_faults_total|windows_net_bytes_received_total|windows_net_bytes_sent_total|windows_net_bytes_total|windows_net_packets_sent_total|windows_net_packets_received_total|windows_system_context_switches_total|windows_system_processor_queue_length|windows_system_system_calls_total|windows_process_start_time|windows_process_cpu_time_total|windows_process_handles|windows_process_io_bytes_total|windows_process_io_operations_total|windows_process_page_faults_total|windows_process_page_file_bytes|windows_process_pool_bytes|windows_process_priority_base|windows_process_private_bytes|windows_process_threads|windows_process_virtual_bytes|windows_process_working_set_private_bytes|windows_process_working_set_peak_bytes|windows_process_working_set_bytes|windows_service_state'
      action: 'keep'
    - source_labels: ['__name__']
      regex: 'windows_process_.*'
      target_label: 'jp1_pc_trendname'
      replacement: 'windows_exporter_process'
    - source_labels: ['__name__','process']
      regex: 'windows_process_.*;(.*)'
      target_label: 'jp1_pc_nodelabel'
      replacement: ${1}
    - source_labels: ['__name__']
      regex: 'windows_service_.*'
      target_label: 'jp1_pc_trendname'
      replacement: 'windows_exporter_service'
    - source_labels: ['__name__']
      regex: 'windows_service_.*'
      target_label: 'jp1_pc_category'
      replacement: 'service'
    - source_labels: ['__name__','name']
      regex: 'windows_service_.*;(.*)'
      target_label: 'jp1_pc_nodelabel'
      replacement: ${1}
    - regex: jp1_pc_multiple_node
      action: labeldrop

  • In case of jpc_blackbox_http:

    - source_labels: ['__name__']
      regex: 'probe_http_duration_seconds|probe_http_content_length|probe_http_uncompressed_body_length|probe_http_redirects|probe_http_ssl|probe_http_status_code|probe_ssl_earliest_cert_expiry|probe_ssl_last_chain_expiry_timestamp_seconds|probe_ssl_last_chain_info|probe_tls_version_info|probe_http_version|probe_failed_due_to_regex|probe_http_last_modified_timestamp_seconds|probe_success|probe_duration_seconds'
      action: 'keep'

  • In case of jpc_blackbox_icmp:

    - source_labels: ['__name__']
      regex: 'probe_icmp_duration_seconds|probe_icmp_reply_hop_limit|probe_success|probe_duration_seconds'
      action: 'keep'

  • In case of jpc_cloudwatch:

    - regex: 'tag_(jp1_pc_.*)'
      replacement: ${1}
      action: labelmap
    - regex: 'tag_(jp1_pc_.*)'
      action: 'labeldrop'
    - source_labels: ['__name__','jp1_pc_nodelabel']
      regex: '(aws_ec2_cpuutilization_average|aws_ec2_disk_read_bytes_sum|aws_ec2_disk_write_bytes_sum|aws_lambda_errors_sum|aws_lambda_duration_average|aws_s3_bucket_size_bytes_sum|aws_s3_5xx_errors_sum|aws_dynamodb_consumed_read_capacity_units_sum|aws_dynamodb_consumed_write_capacity_units_sum|aws_states_execution_time_average|aws_states_executions_failed_sum|aws_sqs_approximate_number_of_messages_delayed_sum|aws_sqs_number_of_messages_deleted_sum|aws_ebs_volume_read_bytes_sum|aws_ebs_volume_write_bytes_sum|aws_ecs_cpuutilization_average|aws_ecs_memory_utilization_average|aws_efs_total_iobytes_average|aws_efs_storage_bytes_average|aws_fsx_data_read_bytes_sum|aws_fsx_data_write_bytes_sum|aws_fsx_free_storage_capacity_average|aws_rds_cpuutilization_average|aws_rds_read_iops_average|aws_rds_write_iops_average|aws_sns_number_of_notifications_failed_sum|aws_sns_number_of_notifications_filtered_out_sum);.+$'
      action: 'keep'
    - source_labels: ['__name__','dimension_ClusterName']
      target_label: jp1_pc_nodelabel
      regex: 'aws_ecs_.+;(.+)'
      replacement: ${1}
    - source_labels: ['__name__','dimension_ServiceName']
      target_label: jp1_pc_nodelabel
      regex: 'aws_ecs_.+;(.+)'
      replacement: ${1}

  • For jpc_process

    - source_labels: [groupname]
      regex: ([^;]*?);([^;]*?);(.*)
      target_label: program
      replacement: ${1}
    - source_labels: [groupname]
      regex: ([^;]*?);([^;]*?);(.*)
      target_label: user
      replacement: ${2}
    - source_labels: [groupname]
      regex: ([^;]*?);([^;]*?);(.*)
      target_label: command_line
      replacement: ${3}
    - source_labels: [program]
      target_label: jp1_pc_nodelabel
    - source_labels: ['__name__']
      regex: 'namedprocess_namegroup_num_procs|namedprocess_namegroup_cpu_seconds_total|namedprocess_namegroup_read_bytes_total|namedprocess_namegroup_write_bytes_total|namedprocess_namegroup_major_page_faults_total|namedprocess_namegroup_minor_page_faults_total|namedprocess_namegroup_context_switches_total|namedprocess_namegroup_memory_bytes|namedprocess_namegroup_open_filedesc|namedprocess_namegroup_worst_fd_ratio|namedprocess_namegroup_oldest_start_time_seconds|namedprocess_namegroup_num_threads|namedprocess_namegroup_states|namedprocess_namegroup_thread_count|namedprocess_namegroup_thread_cpu_seconds_total|namedprocess_namegroup_thread_io_bytes_total|namedprocess_namegroup_thread_major_page_faults_total|namedprocess_namegroup_thread_minor_page_faults_total|namedprocess_namegroup_thread_context_switches_total'
      action: 'keep'
    - regex: (jp1_pc_multiple_node|jp1_pc_agent_create_flag)
      action: labeldrop

  • For jpc_promitor

    - source_labels: [resource_uri]
      regex: ([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/(.*)
      target_label: jp1_pc_nodelabel
      replacement: ${8}
    - source_labels: ['__name__']
      regex: 'azure_virtual_machine_disk_read_bytes_total|azure_virtual_machine_disk_write_bytes_total|azure_virtual_machine_percentage_cpu_average|azure_blob_storage_availability_average|azure_blob_storage_blob_capacity_average|azure_function_app_http5xx_total|azure_function_app_http_response_time_average|azure_cosmos_db_total_request_units_total|azure_logic_app_runs_failed_total|azure_container_instance_cpu_usage_average|azure_container_instance_memory_usage_average|azure_kubernetes_service_kube_pod_status_phase_average|azure_file_storage_availability_average|azure_file_storage_file_capacity_average|azure_service_bus_namespace_deadlettered_messages_average|azure_sql_database_cpu_percent_average|azure_sql_database_dtu_used_average|azure_sql_database_storage_maximum|azure_sql_elastic_pool_cpu_percent_average|azure_sql_elastic_pool_e_dtu_used_average|azure_sql_elastic_pool_storage_used_average|azure_sql_managed_instance_avg_cpu_percent_average|azure_sql_managed_instance_io_bytes_read_average|azure_sql_managed_instance_io_bytes_written_average|azure_sql_managed_instance_storage_space_used_mb_average'
      action: 'keep'
    - regex: jp1_pc_rm_agent_create_flag
      action: labeldrop
    - source_labels: ['__name__','phase']
      regex: (azure_kubernetes_service_kube_pod_status_phase_average);Failed
      target_label: __name__
      replacement: ${1}_failed
    - source_labels: ['__name__','phase']
      regex: (azure_kubernetes_service_kube_pod_status_phase_average);Pending
      target_label: __name__
      replacement: ${1}_pending
    - source_labels: ['__name__','phase']
      regex: (azure_kubernetes_service_kube_pod_status_phase_average);Unknown
      target_label: __name__
      replacement: ${1}_unknown

  • For jpc_script

    - source_labels: ['__name__']
      regex: 'script_success|script_duration_seconds|script_exit_code'
      action: 'keep'
    - source_labels: [jp1_pc_script]
      target_label: jp1_pc_nodelabel
    - regex: (jp1_pc_script|jp1_pc_multiple_node|jp1_pc_agent_create_flag)
      action: labeldrop

  • For jpc_node_aix

    - source_labels: ['__name__']
      regex: 'node_context_switches|node_cpu|aix_diskpath_wblks|aix_diskpath_rblks|aix_disk_rserv|aix_disk_rblks|aix_disk_wserv|aix_disk_wblks|aix_disk_time|aix_disk_xrate|aix_disk_xfers|node_filesystem_avail_bytes|node_filesystem_files|node_filesystem_files_free|node_filesystem_free_bytes|node_filesystem_size_bytes|node_intr|node_load1|node_load5|node_load15|aix_memory_real_avail|aix_memory_real_free|aix_memory_real_inuse|aix_memory_real_total|aix_netinterface_mtu|aix_netinterface_ibytes|aix_netinterface_ierrors|aix_netinterface_ipackets|aix_netinterface_obytes|aix_netinterface_collisions|aix_netinterface_oerrors|aix_netinterface_opackets|aix_memory_pgspins|aix_memory_pgspouts'
      action: 'keep'
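
The rule lists above share a pattern: a keep rule on __name__ that limits the collected metrics, optional rules that derive jp1_pc_* labels, and a final labeldrop for labels that are no longer needed. As a hedged sketch, a scrape definition for a user's own Exporter might reuse that pattern as follows. The job name, the discovery file name, and the metric names are hypothetical. Placing the rules under metric_relabel_configs is also an assumption, based on the rules matching __name__, which is known only after a scrape.

```yaml
scrape_configs:
  - job_name: my_exporter                    # hypothetical job name
    file_sd_configs:
      - files:
          - my_exporter_targets.yml          # hypothetical discovery configuration file
    metric_relabel_configs:                  # assumed placement for rules that match __name__
      # Keep only the listed metrics; every other scraped series is discarded.
      - source_labels: ['__name__']
        regex: 'my_requests_total|my_errors_total'   # hypothetical metric names
        action: 'keep'
      # Remove a label after it has served its purpose, as the predefined lists do.
      - regex: jp1_pc_multiple_node
        action: labeldrop
```

Because a keep rule discards every series whose name does not match, a new metric has to be added to the regex before it can be collected.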
