promtool

Organization of this page

Function
Format
Execution permission
Storage directory
Arguments
Return values
Alert rule test file contents
Sample alert configuration file and alert rule test file
Examples

Function

This command checks the format of Prometheus server definition files and tests alert rules.

Format

promtool check config Prometheus-configuration-file-name
         check rules alert-configuration-file-name
         test rules test-file-name ...

Execution permission

None

Storage directory

In Windows:: Agent-path\tools\
In Linux:: /opt/jp1ima/tools/

Arguments

check config Prometheus-configuration-file-name

Check the format of the Prometheus configuration file for errors.

For details about the Prometheus configuration file, see Prometheus configuration file (jpc_prometheus_server.yml).

check rules alert-configuration-file-name

Check for errors in the format of the alert configuration file.

For details about the alert configuration file, see Alert configuration file (jpc_alerting_rules.yml).

test rules test-file-name ...

Test the alert rules as described in the test file. You can specify up to 10 test files.

Return values

0	The format is correct, or the alert rule test succeeded.
other than 0	The format is incorrect, or the alert rule test failed.
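
As a rough illustration of how a script might act on these return values, the following Python sketch runs promtool and treats 0 as success and any other value as failure. The function name promtool_succeeded is hypothetical, and the default path ./promtool is an assumption taken from the execution examples on this page; adjust it to your storage directory.

```python
import subprocess
import sys


def promtool_succeeded(arguments, promtool="./promtool"):
    """Run promtool with the given arguments and report whether it returned 0.

    The default promtool path is an assumption; pass the real path if it differs.
    """
    completed = subprocess.run(
        [promtool, *arguments], capture_output=True, text=True
    )
    # Pass promtool's own messages through so that failure details stay visible.
    sys.stdout.write(completed.stdout)
    sys.stderr.write(completed.stderr)
    # 0 means the check or test succeeded; any other value means it failed.
    return completed.returncode == 0
```

For example, promtool_succeeded(["check", "rules", "jpc_alerting_rules.yml"]) would return True only when the alert configuration file passes the format check.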

Alert rule test file contents

Format

Write in YAML format.

File

Any-name.yml

Description

Item name	Description	Default value
rule_files: [ - <file_name> ]	Specify the list of rule files to test. File names can include wildcards.	--
[ evaluation_interval: <duration> | default = 1m ]	Specify the evaluation interval for the alert rules.	1m
group_eval_order: [ - <group_name> ]	Specify the order in which the rule groups are evaluated at a given evaluation time, as a list of group names. The order is guaranteed only for the groups that are listed; you do not have to list every group. You can specify the evaluation order of the rule files listed in `rule_files:`, as shown in the following example. <Description example> `group_eval_order:` `- test02.yml` `- test01.yml`	--
tests: [ - <test_group> ]	List all tests.	--

Legend:: --: Not applicable

<test_group>

Item name	Description	Default value
interval: <duration>	Specify the interval between the samples in `input_series:`.	--
input_series: [ - <series> ]	Specify the series data.	--
[ name: <string> ]	Specify a name for the <test_group>.	--
alert_rule_test: [ - <alert_test_case> ]	Specify the tests for the alert rules. The alert rules in the files specified in `rule_files:` are evaluated.	--

Legend:: --: Not applicable

<series>

Item name	Description	Default value
series: <string>	Specify the series in the following format: `'metric-name{label-name="label-value", ...}'` <Description example> `series_name{label1="value1", label2="value2"}` `go_goroutines{job="prometheus", instance="localhost:9090"}`	--
values: <string>	Specify the sample values, separated by spaces. You can use the following expanding notation: <Example of expanding notation> 'a+bxc' becomes 'a a+b a+(2*b) a+(3*b) ... a+(c*b)' 'a-bxc' becomes 'a a-b a-(2*b) a-(3*b) ... a-(c*b)' <Description example> '-2+4x3' becomes '-2 2 6 10' '1-2x4' becomes '1 -1 -3 -5 -7'	--

Legend:: --: Not applicable
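
The expanding notation for values: can be checked by hand with a short Python sketch such as the following. It is an illustration only, not promtool's own parser: the function name expand_values is hypothetical, and the sketch handles only plain numbers and the 'a+bxc' and 'a-bxc' forms described above.

```python
import re

# Matches the expanding notation 'a+bxc' or 'a-bxc' described above.
_EXPANDING = re.compile(r"^(-?\d+(?:\.\d+)?)([+-])(\d+(?:\.\d+)?)x(\d+)$")


def expand_values(values):
    """Expand a values: string into the list of samples it stands for."""
    samples = []
    for token in values.split():
        match = _EXPANDING.match(token)
        if match is None:
            # A plain number stands for a single sample.
            samples.append(float(token))
            continue
        start = float(match.group(1))
        step = float(match.group(3))
        if match.group(2) == "-":
            step = -step
        count = int(match.group(4))
        # 'a+bxc' yields c + 1 samples: a, a+b, ..., a+(c*b).
        samples.extend(start + i * step for i in range(count + 1))
    return samples


print(expand_values("-2+4x3"))  # [-2.0, 2.0, 6.0, 10.0]
print(expand_values("1-2x4"))   # [1.0, -1.0, -3.0, -5.0, -7.0]
```

Note that 'a+bxc' produces c + 1 samples, because the starting value a is included.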

<alert_test_case>

Item name	Description	Default value
eval_time: <duration>	Specify the elapsed time, counted from "`time = 0s`", at which the test is evaluated (alerts are checked). Samples are generated at the interval specified in `interval:` in <test_group>, up to the elapsed time specified in `eval_time:`. For example, if `interval:` in <test_group> is `1m` and `eval_time:` is `5m`, the test is evaluated when the sixth value specified in `input_series:` is acquired.	--
alertname: <string>	Specify the name of the alert to test. Use the value of `alert:` in the alert configuration file.	--
exp_alerts: [ - <alert> ]	Specify the list of alerts that you expect to be raised. To test that the alert rule does not raise an alert, leave `exp_alerts` empty.	--

Legend:: --: Not applicable
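
The relationship between interval: and eval_time: can be sketched in Python as follows. The helper names parse_duration and sample_number are hypothetical, and the sketch assumes simple durations made of one whole number and one unit (s, m, or h).

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Convert a simple duration such as '5m' or '30s' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def sample_number(interval, eval_time):
    """Return the 1-based position of the newest sample available at eval_time."""
    # The first sample belongs to time = 0s, so one is added to the quotient.
    return parse_duration(eval_time) // parse_duration(interval) + 1


print(sample_number("1m", "5m"))  # 6
```

With interval: 1m and eval_time: 5m, the samples belong to 0m, 1m, 2m, 3m, 4m, and 5m, which is why the sixth value is the one evaluated.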

<alert>

Item name	Description	Default value
exp_labels: [ <labelname>: <string> ]	Specify the labels of the alert that you expect to be raised. Include the labels of the sample associated with the alert.	--
exp_annotations: [ <labelname>: <string> ]	Specify the annotations of the alert that you expect to be raised.	--

Legend:: --: Not applicable

Sample alert configuration file and alert rule test file

- Examples of "up" metrics

Example alert configuration file

groups:
- name: alerts
  rules:
 
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
        severity: page
    annotations:
        summary: "Instance {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

Example alert rule test file

rule_files:
    - jpc_alerting_rules.yml
 
evaluation_interval: 1m
 
tests:
    # Test
    - interval: 1m
      # Series data.
      input_series:
          - series: 'up{job="prometheus", instance="localhost:9090"}'
            values: '0 0 0 0 0 0 0 0 0 0 0 0 0 0 0' # success
 
      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 5m
            alertname: InstanceDown
            # alertname: AnotherInstanceDown
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: page
                      instance: localhost:9090
                      job: prometheus
                  exp_annotations:
                      summary: "Instance localhost:9090 down"
                      description: "localhost:9090 of job prometheus has been down for more than 5 minutes."

- Examples of CPU

Example alert configuration file

groups:
  - name: alerts
    rules:
      - alert: cpu_alert_rule
        expr: sum(rate(windows_cpu_time_total{mode!="idle"}[1m])) < 50
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "CPU alert rule summary"
          description: "CPU alert rule description"

Example alert rule test file

rule_files:
    - jpc_alerting_rules.yml
 
evaluation_interval: 1m
 
tests:
    # Test
    - interval: 1m
      # Series data.
      input_series:
          - series: 'windows_cpu_time_total'
            #values: '0+3000x10' # fail
            values: '0+2800x10'  # success
 
      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 7m
            alertname: cpu_alert_rule
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: page
                  exp_annotations:
                      summary: "CPU alert rule summary"
                      description: "CPU alert rule description"
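
The success and fail comments in the test file above follow from simple arithmetic, sketched here in Python. The sketch assumes that, for a counter that grows by a fixed step at every 1m interval, the value of rate(...[1m]) works out to the step divided by 60 seconds. It does not reproduce Prometheus's actual rate calculation, and the function name per_second_rate is hypothetical.

```python
INTERVAL_SECONDS = 60  # interval: 1m in the test file
THRESHOLD = 50         # expr: ... < 50 in the alert configuration file


def per_second_rate(step):
    """Approximate per-second rate of a counter that grows by step every interval."""
    return step / INTERVAL_SECONDS


for step in (2800, 3000):
    rate = per_second_rate(step)
    print(f"step={step}: rate={rate:.2f}, alert condition met: {rate < THRESHOLD}")
```

With a step of 2800 the rate is about 46.67, which is below 50, so the alert is raised as expected. With a step of 3000 the rate is 50, which is not below 50, so the expected alert is not raised and the test fails.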

- Examples of memory (#1)

Example alert configuration file

groups:
  - name: alerts
    rules:
      - alert: memory_alert_rule
        expr: windows_memory_available_bytes < 1073741824
        for: 3m
        labels:
          severity: page
        annotations:
          summary: "Memory alert rule summary"
          description: "Memory alert rule description"

Example alert rule test file

rule_files:
    - jpc_alerting_rules.yml
 
evaluation_interval: 1m
 
tests:
    # Test
    - interval: 1m
      # Series data.
      input_series:
          - series: 'windows_memory_available_bytes'
            values: '1073741822 1073741826 1073741823 1073741823 1073741823 1073741823 1073741828 1073741822' # success
            #values: '1073741822 1073741822 1073741826 1073741823 1073741823 1073741823 1073741828 1073741822' # fail
 
      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 5m
            alertname: memory_alert_rule
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: page
                  exp_annotations:
                      summary: "Memory alert rule summary"
                      description: "Memory alert rule description"

- Examples of memory (#2)

Example alert configuration file

groups:
  - name: alerts
    rules:
      - alert: memory_alert_rule
        expr: windows_memory_available_bytes < 1073741824
        for: 3m
        labels:
          severity: page
        annotations:
          summary: "Memory alert rule summary"
          description: "Memory alert rule description"

Example alert rule test file

rule_files:
    - jpc_alerting_rules.yml
 
evaluation_interval: 1m
 
tests:
    # Test
    - interval: 1m
      # Series data.
      input_series:
          - series: 'windows_memory_available_bytes'
            values: '1073741822 1073741826 1073741823 1073741823 1073741823 1073741823 1073741828 1073741822' # success
            #values: '1073741822 1073741822 1073741826 1073741823 1073741823 1073741823 1073741828 1073741822' # fail
 
      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 5m
            alertname: memory_alert_rule
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: page
                  exp_annotations:
                      summary: "Memory alert rule summary"
                      description: "Memory alert rule description"

- Examples of interrupt

Example alert configuration file

groups:
  - name: alerts
    rules:
      - alert: alerts_rules_increase
        expr: increase(windows_cpu_interrupts_total{core="0,0"}[3m]) > 100
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Alert test case of increase."
          description: "Use increase function in unit test."
 
      - alert: alerts_rules_increase_all
        expr: sum(increase(windows_cpu_interrupts_total[5m])) > 240
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Alert test case of increase."
          description: "Use increase function in unit test."

Example alert rule test file

rule_files:
    - jpc_alerting_rules.yml
 
evaluation_interval: 1m
 
tests:
    # Test
    - interval: 1m
      # Series data.
      input_series:
          - series: 'windows_cpu_interrupts_total{core="0,0"}'
            #values: '0 10 20 30 110 120 130 140 150 160 170' #fail
            values: '0+10x3 111+10x6'  #success
          - series: 'windows_cpu_interrupts_total{core="0,1"}'
            values: '0+10x3 98+13x6'
 
      # Unit test for alerting rules.
      alert_rule_test:
          # Unit test 1.
          - eval_time: 4m
            alertname: alerts_rules_increase
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      core: "0,0"
                      severity: critical
                  exp_annotations:
                      summary: "Alert test case of increase."
                      description: "Use increase function in unit test."
          - eval_time: 8m
            alertname: alerts_rules_increase_all
            exp_alerts:
                # Alert 1.
                - exp_labels:
                      severity: critical
                  exp_annotations:
                      summary: "Alert test case of increase."
                      description: "Use increase function in unit test."

Examples

Example of performing a format check on a Prometheus configuration file (if the format is correct)

# ./promtool check config jpc_prometheus_server.yml
Checking jpc_prometheus_server.yml
  SUCCESS: 1 rule files found
 
Checking jpc_alerting_rules.yml
  SUCCESS: 1 rules found

Example of performing a format check on a Prometheus configuration file (if the format is incorrect)

# ./promtool check config jpc_prometheus_server.yml
Checking jpc_prometheus_server.yml
  FAILED: parsing YAML file jpc_prometheus_server.yml: yaml: line 42: did not find expected key

Example of performing a format check of the alert configuration file (if the format is correct)

# ./promtool check rules jpc_alerting_rules.yml
Checking jpc_alerting_rules.yml
  SUCCESS: 1 rules found

Example of performing a format check of the alert configuration file (if the format is incorrect)

# ./promtool check rules jpc_alerting_rules.yml
Checking jpc_alerting_rules.yml
  FAILED:
jpc_alerting_rules.yml: yaml: unmarshal errors:
  line 10: field aannotations not found in type rulefmt.RuleNode

Example of running an alert rule test (if the test is successful)

# ./promtool test rules alerts_rules_unit_test.yml
Unit Testing:  alerts_rules_unit_test.yml
  SUCCESS

Example of running an alert rule test (if the test fails, part 1)

# ./promtool test rules alerts_rules_unit_test.yml
Unit Testing:  alerts_rules_unit_test.yml
  FAILED:
    alertname:InstanceDown, time:10m,
        exp:"[Labels:{alertname=\"InstanceDown\", instance=\"localhost:9090\", job=\"prometheus\", severity=\"page\"} Annotations:{description=\"localhost:9090 of job prometheus has been down for more than 5 minutes.\", summary=\"Instance localhost:9090 down\"}]",
        got:"[]

If the test fails, the alerts that were expected are output in exp and the alerts that were actually raised are output in got. In the execution example above, one alert was expected (exp), but no alert was actually raised (got).

Example of running an alert rule test (if the test fails, part 2)

# ./promtool test rules test_rule_file.yml
Unit Testing:  test_rule_file.yml
  FAILED:
    alertname:InstanceDown, time:5m,
        exp:"[Labels:{alertname=\"InstanceDown\", instance=\"localhost:9090\", job=\"prometheus\", severity=\"warn\"} Annotations:{description=\"localhost:9090 of job prometheus has been down for more than 5 minutes.\", summary=\"Instance localhost:9090 down\"}]",
        got:"[Labels:{alertname=\"InstanceDown\", instance=\"localhost:9090\", job=\"prometheus\", severity=\"page\"} Annotations:{description=\"localhost:9090 of job prometheus has been down for more than 5 minutes.\", summary=\"Instance localhost:9090 down\"}]

If the test fails, the alerts that were expected are output in exp and the alerts that were actually raised are output in got. In the execution example above, the test expected the value of severity to be warn (exp), but the actual value of severity is page (got).
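
When exp and got are long, it can help to compare the label sets mechanically. The following Python sketch is one way to do that by hand; the function name differing_labels is hypothetical, and the labels are copied manually from the execution example rather than parsed from promtool output.

```python
def differing_labels(expected, got):
    """Return {label: (expected value, actual value)} for labels that differ."""
    names = sorted(set(expected) | set(got))
    return {
        name: (expected.get(name), got.get(name))
        for name in names
        if expected.get(name) != got.get(name)
    }


# Labels copied from the exp and got lines of the execution example.
expected = {
    "alertname": "InstanceDown",
    "instance": "localhost:9090",
    "job": "prometheus",
    "severity": "warn",
}
got = {
    "alertname": "InstanceDown",
    "instance": "localhost:9090",
    "job": "prometheus",
    "severity": "page",
}

print(differing_labels(expected, got))  # {'severity': ('warn', 'page')}
```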
