Hitachi

JP1 Version 13 JP1/Integrated Management 3 - Manager Command, Definition File and API Reference


promtool


Function

This command checks the format of Prometheus server definition files and tests alert rules.

Format

promtool check config Prometheus-configuration-file-name
         check rules alert-configuration-file-name
         test rules test-file-name ...

Execution permission

None

Storage directory

In Windows:

Agent-path\tools\

In Linux:

/opt/jp1ima/tools/

Arguments

check config Prometheus-configuration-file-name

Checks the Prometheus configuration file for format errors.

For details about the Prometheus configuration file, see Prometheus configuration file (jpc_prometheus_server.yml).

check rules alert-configuration-file-name

Checks the alert configuration file for format errors.

For details about the alert configuration file, see Alert configuration file (jpc_alerting_rules.yml).

test rules test-file-name ...

Runs the alert rule tests written in the test file. You can specify up to 10 test files.

Return values

0

The format is correct, or the alert rule test succeeded.

Other than 0

The format is incorrect, or the alert rule test failed.
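
The return values can also be checked from a script. The following is a minimal sketch and not part of the product: the function names build_args and run_promtool are hypothetical, and promtool is assumed to be on the search path (otherwise, pass the full path of the command in the storage directory).

```python
import subprocess

MAX_TEST_FILES = 10  # upper limit on test files stated under "Arguments"


def build_args(subcommand, files):
    """Build the argument list for one promtool subcommand.

    subcommand is "check config", "check rules", or "test rules".
    """
    if subcommand not in ("check config", "check rules", "test rules"):
        raise ValueError(f"unsupported subcommand: {subcommand}")
    if subcommand == "test rules" and len(files) > MAX_TEST_FILES:
        raise ValueError("at most 10 test files can be specified")
    return subcommand.split() + list(files)


def run_promtool(args, promtool="promtool"):
    """Run promtool and return True when its return value is 0."""
    completed = subprocess.run([promtool, *args])
    return completed.returncode == 0
```

For example, run_promtool(build_args("check rules", ["jpc_alerting_rules.yml"])) returns True only when the command returns 0.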

Alert rule test file contents

Format

Write in YAML format.

File

Any-name.yml

Description

Item name

Description

Default value

rule_files:

[ - <file_name> ]

Specify the list of rule files to be tested.

File names can contain wildcards.

--

[ evaluation_interval: <duration> | default = 1m ]

Specify the evaluation interval for the alert rule.

1m

group_eval_order:

[ - <group_name> ]

Specify group names in the order in which the rule groups are to be evaluated at a given evaluation time.

The order is guaranteed only for the groups that are listed.

You do not have to list every group.

You can specify the evaluation order of the rule file described in rule_files:, as shown in the following example.

<Description example>

group_eval_order:

- test02.yml

- test01.yml

--

tests:

[ - <test_group> ]

List all the tests to run.

--

Legend:

--: Not applicable

  • <test_group>

    Item name

    Description

    Default value

    interval: <duration>

    Specify the time interval between successive values in input_series:.

    --

    input_series:

    [ - <series> ]

    Specify the data for the series.

    --

    [ name: <string> ]

    Specify a name for the <test_group>.

    --

    alert_rule_test:

    [ - <alert_test_case> ]

    Specify the tests for the alert rules.

    The alert rules are taken from the files specified in rule_files:.

    --

Legend:

--: Not applicable

  • <series>

    Item name

    Description

    Default value

    series: <string>

    Specify the data for the series in the following format:

    'metric-name{label-name="label-value", ...}'

    <Description example>

    series_name{label1="value1", label2="value2"}

    go_goroutines{job="prometheus", instance="localhost:9090"}

    --

    values: <string>

    Specify the sample values of the series, separated by spaces.

    You can use the following expansion notation:

    <Example of expanded notation>

    'a+bxc' becomes 'a a+b a+(2*b) a+(3*b) ... a+(c*b)'

    'a-bxc' becomes 'a a-b a-(2*b) a-(3*b) ... a-(c*b)'

    <Description example>

    '-2+4x3' becomes '-2 2 6 10'

    ' 1-2x4' becomes '1 -1 -3 -5 -7'

    --

Legend:

--: Not applicable
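
The expansion notation for values: can be illustrated with a short sketch. This is not the parser used by promtool; the function name expand_values is hypothetical, and the sketch handles numeric values only.

```python
import re

# Matches tokens of the form a+bxc or a-bxc, for example -2+4x3 or 1-2x4.
_EXPANDING = re.compile(r"^(-?\d+(?:\.\d+)?)([+-])(\d+(?:\.\d+)?)x(\d+)$")


def expand_values(values):
    """Expand a values: string into the list of sample values it denotes."""
    expanded = []
    for token in values.split():
        match = _EXPANDING.match(token)
        if match is None:
            # A plain number is used as is.
            expanded.append(float(token))
            continue
        start = float(match.group(1))
        step = float(match.group(3))
        if match.group(2) == "-":
            step = -step
        count = int(match.group(4))
        # a+bxc yields c + 1 values: a, a+b, ..., a+(c*b)
        expanded.extend(start + i * step for i in range(count + 1))
    return expanded


print(expand_values("-2+4x3"))  # [-2.0, 2.0, 6.0, 10.0]
print(expand_values(" 1-2x4"))  # [1.0, -1.0, -3.0, -5.0, -7.0]
```

Note that a+bxc produces c + 1 values, because the starting value a is included.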

  • <alert_test_case>

    Item name

    Description

    Default value

    eval_time: <duration>

    Specify the time elapsed since "time = 0s" at which the test is evaluated (alerts are checked).

    The input data is evaluated at the interval specified in interval: of <test_group>, up to the elapsed time specified in eval_time:.

    For example, if interval: in <test_group> is 1m and eval_time: is 5m, the test is evaluated when the sixth value specified in input_series: is acquired.

    --

    alertname: <string>

    Specify the name of the alert to test.

    Specify the value described in alert: in the alert configuration file.

    --

    exp_alerts:

    [ - <alert> ]

    Specify the list of alerts that you expect to be firing at the time specified in eval_time:.

    To test that the alert rule does not fire, leave exp_alerts: empty.

    --

Legend:

--: Not applicable
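
The relationship between interval: and eval_time: described above can be worked through as follows. This is a minimal sketch: the function names are hypothetical, and only single-unit durations in seconds, minutes, or hours (such as 30s, 5m, or 1h) are handled.

```python
import re

# Seconds per unit for the single-unit durations handled by this sketch.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(duration):
    """Convert a duration such as "5m" into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def values_acquired(interval, eval_time):
    """Number of input_series values acquired by eval_time.

    The first value is acquired at time = 0s, so one is added to the
    number of whole intervals that have elapsed.
    """
    return to_seconds(eval_time) // to_seconds(interval) + 1


print(values_acquired("1m", "5m"))  # 6
```

With interval: 1m and eval_time: 5m, the values at 0m, 1m, 2m, 3m, 4m, and 5m have been acquired, which is why the sixth value is the one being evaluated.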

  • <alert>

    Item name

    Description

    Default value

    exp_labels:

    [ <labelname>: <string> ]

    Specify the labels of the alert that you expect to be firing.

    These labels include the labels of the sample associated with the alert.

    --

    exp_annotations:

    [ <labelname>: <string> ]

    Specify the annotations of the alert that you expect to be firing.

    --

Legend:

--: Not applicable

Sample alert configuration file and alert rule test file

- Examples of "up" metrics
  • Example of an alert configuration file

groups:
  - name: alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  • Example of an alert rule test file

rule_files:
    - jpc_alerting_rules.yml
 
evaluation_interval: 1m
 
tests:
    # Test
    - interval: 1m
      # Series data.
      input_series:
          - series: 'up{job="prometheus", instance="localhost:9090"}'
            values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0' # success
 
      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 5m
            alertname: InstanceDown
            # alertname: AnotherInstanceDown
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: page
                      instance: localhost:9090
                      job: prometheus
                  exp_annotations:
                      summary: "Instance localhost:9090 down"
                      description: "localhost:9090 of job prometheus has been down for more than 5 minutes."
- Examples of CPU
  • Example of an alert configuration file

groups:
  - name: alerts
    rules:
      - alert: cpu_alert_rule
        expr: sum(rate(windows_cpu_time_total{mode!="idle"}[1m])) < 50
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "CPU alert rule summary"
          description: "CPU alert rule description"
  • Example of an alert rule test file

rule_files:
    - jpc_alerting_rules.yml
 
evaluation_interval: 1m
 
tests:
    # Test
    - interval: 1m
      # Series data.
      input_series:
          - series: 'windows_cpu_time_total'
            #values: '0+3000x10' # fail
            values: '0+2800x10'  # success
 
      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 7m
            alertname: cpu_alert_rule
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: page
                  exp_annotations:
                      summary: "CPU alert rule summary"
                      description: "CPU alert rule description"
- Examples of memory (#1)
  • Example of an alert configuration file

groups:
  - name: alerts
    rules:
      - alert: memory_alert_rule
        expr: windows_memory_available_bytes < 1073741824
        for: 3m
        labels:
          severity: page
        annotations:
          summary: "Memory alert rule summary"
          description: "Memory alert rule description"
  • Example of an alert rule test file

rule_files:
    - jpc_alerting_rules.yml
 
evaluation_interval: 1m
 
tests:
    # Test
    - interval: 1m
      # Series data.
      input_series:
          - series: 'windows_memory_available_bytes'
            values: '1073741822 1073741826 1073741823 1073741823 1073741823 1073741823 1073741828 1073741822' # success
            #values: '1073741822 1073741822 1073741826 1073741823 1073741823 1073741823 1073741828 1073741822' # fail
 
      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 5m
            alertname: memory_alert_rule
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: page
                  exp_annotations:
                      summary: "Memory alert rule summary"
                      description: "Memory alert rule description"
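
The "success" and "fail" comments in the memory test file follow from the for: 3m condition. The sketch below is a simplified model of that behavior, not the evaluation logic of Prometheus itself: it assumes one evaluation per minute and a rule that compares each value with a threshold by using <, and the function name is_firing is hypothetical.

```python
def is_firing(values, threshold, for_minutes, eval_minute):
    """Return True if the alert is firing at eval_minute in this simplified model.

    values[i] is the sample acquired at minute i. The alert becomes pending
    when the value drops below the threshold, returns to inactive when the
    condition stops holding, and fires once it has been pending for at
    least for_minutes.
    """
    pending_since = None
    for minute, value in enumerate(values[: eval_minute + 1]):
        if value < threshold:
            if pending_since is None:
                pending_since = minute
        else:
            pending_since = None  # condition no longer holds, so reset
    return pending_since is not None and eval_minute - pending_since >= for_minutes


THRESHOLD = 1073741824
success = [1073741822, 1073741826, 1073741823, 1073741823,
           1073741823, 1073741823, 1073741828, 1073741822]
fail = [1073741822, 1073741822, 1073741826, 1073741823,
        1073741823, 1073741823, 1073741828, 1073741822]

print(is_firing(success, THRESHOLD, 3, 5))  # True
print(is_firing(fail, THRESHOLD, 3, 5))     # False
```

In the "success" data, the value stays below the threshold from minute 2 through minute 5, which satisfies for: 3m at eval_time: 5m. In the "fail" data, the value at minute 2 is above the threshold, so the condition has held for only 2 minutes at eval_time: 5m, and the expected alert is not raised.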
- Examples of memory (#2)
  • Example of an alert configuration file

groups:
  - name: alerts
    rules:
      - alert: memory_alert_rule
        expr: windows_memory_available_bytes < 1073741824
        for: 3m
        labels:
          severity: page
        annotations:
          summary: "Memory alert rule summary"
          description: "Memory alert rule description"
  • Example of an alert rule test file

rule_files:
    - jpc_alerting_rules.yml
 
evaluation_interval: 1m
 
tests:
    # Test
    - interval: 1m
      # Series data.
      input_series:
          - series: 'windows_memory_available_bytes'
            values: '1073741822 1073741826 1073741823 1073741823 1073741823 1073741823 1073741828 1073741822' # success
            #values: '1073741822 1073741822 1073741826 1073741823 1073741823 1073741823 1073741828 1073741822' # fail
 
      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 5m
            alertname: memory_alert_rule
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: page
                  exp_annotations:
                      summary: "Memory alert rule summary"
                      description: "Memory alert rule description"
- Examples of interrupt
  • Example of an alert configuration file

groups:
  - name: alerts
    rules:
      - alert: alerts_rules_increase
        expr: increase(windows_cpu_interrupts_total{core="0,0"}[3m]) > 100
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Alert test case of increase."
          description: "Use increase function in unit test."
 
      - alert: alerts_rules_increase_all
        expr: sum(increase(windows_cpu_interrupts_total[5m])) > 240
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Alert test case of increase."
          description: "Use increase function in unit test."
  • Example of an alert rule test file

rule_files:
    - jpc_alerting_rules.yml
 
evaluation_interval: 1m
 
tests:
    # Test
    - interval: 1m
      # Series data.
      input_series:
          - series: 'windows_cpu_interrupts_total{core="0,0"}'
            #values: '0 10 20 30 110 120 130 140 150 160 170' #fail
            values: '0+10x3 111+10x6'  #success
          - series: 'windows_cpu_interrupts_total{core="0,1"}'
            values: '0+10x3 98+13x6'
 
      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 4m
            alertname: alerts_rules_increase
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      core: "0,0"
                      severity: critical
                  exp_annotations:
                      summary: "Alert test case of increase."
                      description: "Use increase function in unit test."
          - eval_time: 8m
            alertname: alerts_rules_increase_all
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: critical
                  exp_annotations:
                      summary: "Alert test case of increase."
                      description: "Use increase function in unit test."

Examples